Posts

Prologika Newsletter Winter 2023

I’ve covered my first impression about Microsoft Fabric in a series of blogs starting with this one. In the recap, I said that certain features might benefit organizations that are already on Power BI Premium because all Fabric features are available in Power BI Premium plans. However, as they stand, features such as Lakehouse Copy to Table (no automated synchronization), ADF pipelines (no connectivity to on-prem data sources), Data Warehouse (missing important T-SQL features), Direct Lake (only available online), are half-baked, thus barring wide-spread Fabric adoption. This newsletter covers a good case for Fabric that I’ve recommended for a large insurer that has invested heavily in Microsoft Dynamics 365 and Power BI Premium. I’ll be in a better position to confirm after a POC with larger datasets so follow my blog for future updates.

The Dynamics Integration Saga

Dynamics Online, aka Dynamics 365, epitomizes the customer’s struggle to export their data hosted in SaaS cloud offerings for analytics or other purposes. Since unfortunately Microsoft doesn’t provide direct access to the Dynamics native storage (Azure SQL Database), which often is the best solution, Dynamics has “compensated” throughout the years by provided the following options:

OptionProsCons
OData APIOriginal interfaceSlow, no query folding, currently deprecated
Data Export ServiceExport to Azure SQL DatabaseDeprecated
TDS endpointReal-time direct access to Dynamics Azure SQL DatabaseNot designed for massive extracts; slow
Dataverse APICurrently recommended by MicrosoftAs far as I can tell, no REST operations (see below); questionable performance
Dataverse connectorSame as Dataverse APISame as Dataverse API
Export to ADLSAutomatic synchronizationPerpetually in preview, files stored in proprietary CDM folder format
Fabric linkAutomatic synchronization; delta tables; SQL joinsRequires Fabric licensing; up to 60 min data sync delay; TBD

One of my clients, a large insurance company, has invested heavily in Dynamics 365. They have implemented various Power BI reports that would import millions of rows directly from common CRM entities, such as Account, Opportunity, as well as custom entities, using the OData API. The main struggle has always been that dataset refreshes take hours to complete. Because API calls don’t fold in Power Query, they apply REST operations to select a subset of columns, join tables, and filter data. They rely heavily on Power Query to shape the data which has worked reasonably well.

Dataverse Link to Fabric

Dataverse Link to Fabric looks promising in addressing some of the above pitfalls. Once you set it up, it extracts Dynamics data as Delta tables in a Fabric lakehouse. From there, you can use the SQL endpoint to join and filter tables.

After the initial synchronization, it might take up to 60 minutes to synchronize data changes in Dynamics so not really real-time data access but much better than the current state.

While working through the setup, keep the following in mind:

  1. The Link to Microsoft Fabric menu is found under the Analyze menu in the ribbon and not at the table level as the video shows. That’s because it exports all the tables.
  2. Referring to the comparison table between Fabric Link and Synapse Link, “No copy, no ETL direct integration with Microsoft Fabric” should be read as “No ETL” because the data is still copied. Further, “Data stays in Dataverse” should read as “Data is automatically copied to Fabric lakehouse”.
  3. As far as prerequisites, the most important one is “the workspace you choose to link with Dataverse must be assigned to a premium capacity in the same region as your Dataverse environment” with the emphasis on the “same region”. Not having them in the same region is a show stopper.

Cutting Cost

What if you are not on Power BI Premium and you are on a tight budget? Although I haven’t tested, this approach (suggested by Riccardo Perico) might work because Link to Fabric is available in all fabric capacities:

  1. Purchase a lower Fabric capacity, such as F2 or F4.
  2. Configure Dataverse Link to Fabric as usual.
  3. After the initial synchronization, pause and resume the capacity when not in use, such by using this approach.
  4. Import the data in a Power BI semantic model.

Dataverse Link to Fabric should queue the updates until the capacity is resumed.

Benefits

Again more testing is needed, but at this point I see the following advantages of using the Dataverse Link to Fabric:

  1. CRM data is staged in open file format (Delta Parquet tables) and available for any reporting needs without impacting Dynamics.
  2. No need to use APIs. Instead, you can use SQL to join and filter tables, such as to look up columns or filter data, which is much better than using the API syntax. This will also help you centralize column lookups in SQL views instead of doing this over and over for each report.
  3. Potentially much faster Power BI dataset refreshes
  4. No additional cost to Power BI Premium customers because the Fabric features are available in Power BI Premium.


Teo Lachev
Prologika, LLC | Making Sense of Data
logo

A Case for Microsoft Fabric: Dynamics 365 Analytics

I might have identified at last a good case for Microsoft Fabric, but I’ll be in a better position to confirm after a POC with larger datasets. Dynamics Online, aka Dynamics 365, epitomizes the customer’s struggle to export their data hosted in SaaS cloud offerings for analytics or other purposes. Since unfortunately Microsoft doesn’t provide direct access to the Dynamics native storage (Azure SQL Database), which often could be the simplest and fastest solution, Dynamics has “compensated” throughout the years by introducing and sunsetting various options:

OptionProsCons
OData APIOriginal interfaceSlow, no query folding, currently deprecated
Data Export ServiceExport to Azure SQL DatabaseDeprecated
TDS endpointReal-time direct access to Dynamics Azure SQL DatabaseNot designed for massive extracts; slow
Dataverse APICurrently recommended by MicrosoftAs far as I can tell, no REST operations (see below); questionable performance
Dataverse connectorSame as Dataverse APISame as Dataverse API
Export to ADLSAutomatic synchronizationPerpetually in preview, files stored in proprietary CDM folder format
Fabric linkAutomatic synchronization; delta tables; SQL joinsRequires Fabric licensing; up to 60 min data sync delay; TBD

One of my clients, a large insurance company, has invested heavily in Dynamics 365. They have implemented various Power BI reports that would import millions of rows directly (without data staging) from common CRM entities, such as Account, Opportunity, as well as custom entities, using the OData API. The main struggle has always been that dataset refreshes take hours to complete. Because API calls don’t fold in Power Query, they apply REST operations to select a subset of columns, join tables, and filter data. They rely heavily on Power Query to shape the data which has worked reasonably well.

Dataverse Link to Fabric looks promising in addressing some of the above pitfalls. Once you set it up, it extracts Dynamics data as Delta tables in a Fabric lakehouse. From there, you can use the SQL endpoint to join and filter tables. After the initial synchronization, it might take up to 60 minutes to synchronize data changes in Dynamics so not really real-time data access but much better than the current state.

While working through the setup, keep the following in mind:

  1. The Link to Microsoft Fabric menu is found under the Analyze menu in the ribbon and not at the table level as the video shows. That’s because it exports all the tables.

  1. Referring to the comparison table between Fabric Link and Synapse Link, “No copy, no ETL direct integration with Microsoft Fabric” should be read “No ETL” because the data is still copied. Further, “Data stays in Dataverse” should read “Data is automatically copied to Fabric lakehouse”. Dataverse is just a wrapper on top of Azure SQL Database.
  2. As far as prerequisites, the most important one is “the workspace you choose to link with Dataverse must be assigned to a premium capacity in the same region as your Dataverse environment” with the emphasis on the “same region”. Not having them in the same region is a show stopper.

Again more testing is needed, but at this point I see the following advantages of using the Dataverse Link to Fabric:

  1. CRM data is staged in open file format (Delta Parquet tables) and available for any reporting needs without impacting Dynamics.
  2. No need to use APIs. Instead, you can use SQL to join and filter tables, such as to look up columns or filter data, which is much better than using the API syntax. This will also help you centralize column lookups in SQL views instead of doing this over and over for each report.
  3. Potentially much faster Power BI dataset refreshes
  4. No additional cost to Power BI Premium customers because the Fabric features are available in Power BI Premium.
  5. In a long term bypass importing data and providing users with almost real time (CRM data changes might take up to 60 minutes to synchronize) when the Microsoft DirectLake technology matures. I’m personally very interested in this option but I’ll wait until Power BI Desktop supports it.

Microsoft Dataverse: A Verse Without Rhymes

The cloud is supposed to make things easier, right? Well, not necessarily, as a client and I have recently discovered. They use Dynamics Online 365 and they are facing long refresh times for Power BI datasets that import data from Dynamics. Dynamics saves its data in an Azure SQL Database, which now goes by the name Dataverse (previously known as Common Data Service for Apps or CDS-A). I wrote two years ago about the pros and cons of CDS-A. The ugly award back then went to getting the data out of Dynamics. I wrote “For years people were complaining that after migrating from the on-premises Dynamics to the cloud, they lost the ability to connect to its database directly and they had to rely on the REST APIs (slow) or Data Export Service to export the data to an SQL Server Database (fast but requires additional effort and budget).”

Have things changed in the past two years? Yes, but not necessarily for better. Microsoft now has three ways of getting the data out of Dynamics (staging to Azure SQL Database has been deprecated).

OptionProsCons
Connect to Dynamics REST APIsAbility to pass predicates on the REST API call, such as column lists, joins, and filtersSlow, no query folding
Staging to ADLSMicrosoft preferred approach, automatic data synchronizationNo query folding, no DirectQuery, not enough compute resources
DTS endpointImport and DirectQuery modes, query folding worksNot enough compute resources, not a strategic option

Let’s take a more in-depth look at the new options: staging to ADLS and the DTS endpoint.

Staging to ADLS

Microsoft invests heavily and recommends staging the Dynamics data out to Azure Data Lake Storage Gen 2 (ADLS). Once you set up your data lake, you go to the Power Platform Admin Center and specify which entities you want to stage out. Once the initial snapshot completes, changes to Dynamics entities are automatically propagated in almost real time. From there, you can use the Power BI “Azure Data Lake Storage Gen 2” connector (make sure to expand the “cdm” folder) to import the data. Microsoft explains the process in more detail in this video. What a clumsy solution though! From a perfect store (SQL Server) that supports querying, joining, query folding, etc, we stage the data out and save it as flat files and lose all of these. You would think that the Power BI refreshes will be blazingly fast. After all, we read from a bunch of CSV files. Unfortunately, I ran some tests to import an entity that had around 1 million rows and I had to cancel it after one hour. By comparison, the original query that used the REST API clocked at 30 minutes. How come?

The moment you stage data to flat files you embrace a very inefficient way of consuming that data. Since there is no “server”, Power Query can’t pass predicates to the data source. So, if you have an entity with a million rows and you need only a few columns and a subset of rows, Power Query would load all the data from the data lake before it applies the predicates.

As we’ve found, ADLS will not cut through this much data as it does not have the compute and Power Query itself does not bring enough compute for large datasets and joins, etc. Microsoft’s solution? Throw in more compute by using the Synapse SQL on-demand connector so the architecture now becomes Dynamics -> ADLS -> Synapse (OD) -> Power BI! Of course, this will entail an additional investment on your part in the “compute” layer, but this is the price to pay when you want your data fast, right?

Consider the ADLS staging only for small entities unless you want to invest in Synapse. I personally decided against pursuing this path as it’s an overkill for the simple request to get access to the data.

TDS Endpoint

I rejoiced when I recently learned that we know have a TDS endpoint to Dataverse. TDS is the native protocol that SQL Server uses to communicate with the client app. The TDS endpoint is essentially a wrapper on top of the Dataverse Azure SQL Database. And Power BI has a Dataverse connector that supports import and DirectQuery modes. Time to celebrate, right? This is exactly what we wanted since now we don’t have to worry about staging the data out and running out of compute thanks to the horribly inefficient way of loading data from ADLS. We now have real-time access to data and Power Query can pass predicates to SQL Server.

Unfortunately, no luck importing that entity as the refresh timed out. As it turned out, TDS endpoint is meant for real-time reporting over relatively smaller (no definition from Microsoft about “smaller”) datasets. With large datasets and joins involved queries would require even more compute and processing time. So, back to square one. We don’t have enough compute there too and there is no workaround. And Microsoft doesn’t plan to invest in the TDS endpoint since staging to ADLS is their preferred solution.

I failed to understand what prevents Microsoft from assigning more “compute” to the TDS endpoint. Surely, customers spending millions in licensing fees deserve an option for a dedicated environment with enough compute like Power BI offers with premium capacities. If the concern is that direct access can impact production loads, this is something the client should worry about and address, such by connecting to the read-only replica of the Azure SQL Database, which I’m sure it’s privately available.

I always consider it a travesty when SaaS apps don’t give you access to the data in its native storage, which usually is a relational database, and therefore a perfect store.  Join my quest to persuade Microsoft in providing direct access with enough compute to your data in the cloud as you have with their on-prem offerings.