We are seeing a rapid increase in the adoption of data warehouse on the Microsoft Azure platform. In a modern data warehouse architecture, a data warehouse is a centralized repository of integrated data from one or more disparate sources and stores current and historical data. This data then can be used for reporting and analysis of the data.
Very often while working with our customers, we have seen use cases for both Azure Synapse Analytics and Azure SQL database as the right fit for a data store. Ahead in this article, we will help to identify which technology option is best for you.
Azure Synapse Analytics is a cloud-based Platform as a Service (PaaS) offering on Azure platform which provides limitless analytics service using either serverless on-demand or provisioned resources—at scale. The key components are Synapse SQL pools, Spark, Synapse pipelines and studio experience. This article focuses on Synapse SQL pool which refers to the enterprise data warehousing features (OLAP) that are generally available in Azure Synapse. Azure Synapse SQL pool has been designed as a Massive Parallel Processing (MPP) System and leverages a scale-out Architecture to distribute computational processing of data across multiple nodes.
On the other hand, the Azure SQL database (Azure SQL DB) is a fully managed PaaS databased engine that handles most of the database management functions and mostly suitable for OLTP workloads based on Symmetric Multi-Processing (SMP) system. Azure SQL DB offer deployment options as a single database, elastic pool, and as a Managed Instance. This article covers a single database as the Azure SQL DB deployment option for comparison with Azure Synapse.
The next part covers some of the key considerations to help deciding which database option is best for your case. There may be other considerations not covered below which needs to investigated case-by-case basis.
Although it is not an individual metrics to identify for a minimum recommended size, Azure Synapse is an appropriate fit for data size and workload for 1 TB and more. This further provides a maximum of 240TB database limit for the rowstore and unlimited storage for columnstore tables making it a great fit for a larger database with lots of data and complex queries. As your data warehouse starts reaching near 1 TB or higher, Azure SQL Synapse should be considered.
For smaller data sizes An Azure SQL database should be considered which can scale-up efficiently for such smaller workloads. Azure SQL DB has a size limit for 8TB (General Purpose Tier) or 4TB (Business-critical tier) at this stage.
Azure Synapse is a great fit for the OLAP workload with set volume of reads and writes. With this approach, complex jobs (queries or load) are broken down into pieces and executed in parallel and enabling large data loads and complex queries to execute faster. The data often here is stored in a de-normalised form with a schema.
For a high volume of small read and writes and smaller data loads, An Azure SQL database can handle such tasks more efficiently. This is also useful in the case of normalised data stored in various tables.
Concurrent query users:
It’s important to take into the account for maximum concurrency available for both Azure SQL DB and Synapse. Azure Synapse provides up to a maximum of 128 concurrent queries depending on the service tier. If the usage pattern involves a high number of concurrent queries this can be supported or extended with Azure SQL DB or Azure Analysis service.
Azure SQL DB can handle a scenario where you have a large number of active users and logins for reporting and dashboard purposes on a smaller data size. For higher service tiers in Azure SQL DB Max concurrent workers requests can be up to 6400 and max concurrent session at 30000
Scale demands and cost:
Azure PaaS database services enable the ability to scale the service tier as per with workload requirements.
With Azure Synapse this gets more granular as it provides the ability to scale SQL pool compute by increasing and decreasing data warehouse units in a linear fashion to suit key operations such as complex aggregations, row scan and ingesting large data loads. This further has the ability to pause compute (independently of the storage) when no queries are run on data warehouse saving significant cost on compute charges.
Azure SQL DB has a set service tier configured for the appropriate compute. If data warehouse query patterns are simple, with small data loads, Azure SQL DB provides an easy to maintain data store with predictable cost patterns.
Table features and queries support:
Even with same underlying SQL engine for both Azure Synapse and SQL DB, there are few minor differences while querying or creating the tables in Azure Synapse in comparison with Azure SQL DB such as:
- Azure Synapse doesn’t support cross-database queries
- Few unsupported table features or constraints in Azure Synapse such as ALTER TABLE. A detailed list can be found at this link
- Merge statements are not currently supported with Azure Synapse and limitations with insert and delete operations for the logging purpose
Always Encrypted is the feature designed to protect sensitive data. This allows for data to be available to client-side applications and not to database engine or administrators. Azure SQL DB supports Always Encrypted while Azure Synapse does not support it.
Snapshot, backup and replication:
Azure Synapse provides a data warehouse snapshot functionality. This can be leveraged to re-create the data to suit business continuity and disaster recovery requirements. Further, this is very useful in a scenario where you have to recreate a copy of your data warehouse for test and development purposes. This is a built-in feature and allows for both automatic and user-defined restore points. This currently supports an eight-hour recovery point objective (RPO) with snapshots available for the past 7 days in a primary Azure region. A geo-back up is available once a day at this stage.
Azure SQL database provides inbuilt back up which can be configured for longer-term retention for compliance and auditing reasons. Azure SQL DB also supports active geo-replication (as compared to Azure Synapse which is primarily based on storage replication and not synchronised with primary)
PolyBase allows access to data external to the database using T-SQL. This is quite useful while importing data from Azure Blob storage or Data lake. Azure Synapse supports PolyBase to facilitate such a scenario.
Azure SQL database doesn’t support PolyBase at this stage.
Azure SQL database is a good fit for a data warehouse with a small data size and low volume data loads. It provides ease of maintenance, predictable cost and flexible RPOs. On the other hand, Azure Synapse with SQL pool is able to support a large data size for a data warehouse with greater complexity.
Although both Azure SQL DB and Azure Synapse are PaaS options on the Microsoft Azure platform, the original purposes are slightly different. Azure SQL DB is well suited for the OLTP workloads and Azure Synapse for OLAP. However, this doesn’t mean that Azure Synapse is always required for Data warehouse purposes. WARDY IT Solutions can help you choose the right fit for your data warehouse and reporting needs based on various factors such as data size, Complexity, cost, and feature support.