What Exactly is Microsoft Synapse?
The other day an exalted customer shared that they’ve acquired Synapse and are now ready to implement semantic models with Power BI. The client wasn’t sure how to give business users access to Synapse so that cool self-service BI could finally start. During the conversation, it became clear that they had opened Synapse Studio and were left with the impression that Synapse has semantic modeling features. This is what happens when Marketing gets involved and people get confused about what a tool actually does. Let’s attempt to clear up this confusion.
What’s Synapse?
Think of Synapse (aka Azure Synapse Analytics) as an umbrella name that spans multiple unrelated (or rather, loosely related) services that are sold separately but are bundled together to fulfill a vision of a “unified analytical platform”. This vision is further emphasized by Synapse Studio – an online tool to work with and monitor the Synapse services.
Let’s explain each service in the order it’s listed in the Azure pricing calculator. Again, each service has its own pricing model, and I don’t think that bundling them together gives you any price break.
- Data Integration – This is Azure Data Factory, which is typically acquired and installed as a standalone service. Why you would want to create ADF pipelines inside Synapse Studio instead of ADF Studio is beyond me. Another caveat to watch for regarding data integration is that Microsoft seemingly emphasizes the role of ADF data flows (at least there is a separate “Data flows” section in Synapse Studio), even though the ELT pattern is the best practice for loading data into the dedicated SQL pool.
- Data Warehousing – Synapse comes with a preconfigured “serverless” pool that can be used to virtualize data stored in Azure Data Lake. This is a very useful service that allows you to query data in ADLS files using T-SQL. Check this case study to learn how Prologika used this feature in a real-life project. This tab also provides pricing for a dedicated SQL pool but since there is a separate tab for it, I’ll cover it further down.
- Big Data Analytics – You can optionally provision a Spark pool to process data or apply ML at scale using the Microsoft implementation of Apache Spark.
- Log and Telemetry Analysis – A recently introduced type of pool for analyzing large volumes of streaming data (i.e. log and telemetry data) from applications, websites, or IoT devices using Kusto Query Language (KQL).
- Dedicated SQL Pool – Previously known as Azure SQL DW, this is your SQL Server (or rather Azure SQL Database) on steroids for storing and querying massive data volumes. While you gain scalability, you lose various T-SQL features, so don’t assume that you can seamlessly migrate your on-prem SQL databases to Synapse. Also, for now, a dedicated pool is limited to a single database.
- Azure Synapse Link – Another recently introduced service to automatically synchronize data from Azure Cosmos DB and SQL Server 2022 (without using change data capture).
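To make the serverless pool item above more concrete, here is a minimal sketch of the kind of T-SQL you would send to it to query ADLS files. It only builds the statement text around OPENROWSET and does not connect to anything. The helper name build_openrowset_query, the storage account contosolake, and the folder path are hypothetical and used for illustration only.

```python
def build_openrowset_query(storage_url: str, file_format: str = "PARQUET", top: int = 10) -> str:
    """Build a T-SQL statement that reads data lake files through OPENROWSET.

    Hypothetical helper for illustration; it returns text and runs nothing.
    """
    # This sketch only covers formats that need no extra parser options.
    allowed_formats = {"PARQUET", "DELTA"}
    fmt = file_format.upper()
    if fmt not in allowed_formats:
        raise ValueError(f"Unsupported format in this sketch: {file_format}")
    if top <= 0:
        raise ValueError("top must be a positive integer")

    # Double any single quotes so the URL stays inside the T-SQL string literal.
    safe_url = storage_url.replace("'", "''")

    return (
        f"SELECT TOP {int(top)} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        f"    FORMAT = '{fmt}'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    # Hypothetical storage account and path, not taken from a real project.
    print(build_openrowset_query(
        "https://contosolake.dfs.core.windows.net/sales/orders/*.parquet",
        top=5,
    ))
```

You would run the resulting statement against the workspace's serverless SQL endpoint from Synapse Studio or any SQL client; the connection details depend on your environment and are left out here.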
What Isn’t Synapse?
- Synapse is not a semantic modeling tool. Although you’ll see a Power BI section in the Develop tab of Synapse Studio, modeling is still done with Power BI Desktop (or other professional tools) and published to Power BI. As with ADF, why a developer would want to register Power BI artifacts in Synapse Studio is another thing that escapes me.
- Synapse is not a data integration tool, master data management tool, or data cataloging tool.
- Synapse shouldn’t be your default option for data warehousing in the cloud. In my experience, Synapse would be overkill for the data processing needs of most companies: with smaller data volumes, there are more cost-effective options for SQL Server in the cloud.