Notes on Fabric F2 Performance: Report Load

What a better way to spend a lazy holiday afternoon than to do more Fabric performance testing? In my previous post, I shared my results from a single-threaded ETL load test to gauge the F2 ingest performance and F2 did pretty well (or at least outperformed Azure SQL DB). Will F2 hold as parallelism increases? Throughput testing is especially important for report loads because parallel tasks can run within a report, such as visuals executing DAX queries in parallel, and across reports, such as when concurrent report requests overlap.

I used the artifacts are included in the “Microsoft Dashboard in a Day” for this test and load tested only the first report page.

The Sales fact table in the semantic model has over seven million rows so it represents a good size dataset. Naturally, the more involved the report is and the more data the semantic model has, the more CPU power and parallelism are needed. I used the Microsoft Power BI Dedicated Capacity Load Assessment Tool and configured it to filter the report on different years in order to avoid report caching. I ran four tests for 1, 2, 3 and 4 virtual users with no think time, and each test ran for 10 minutes.

Here are the results:

UsersTotal report rendersRenders per user
1631×63
21002×50
31353×45
41644×41

Here are some additional findings:

  1. Within the first minute or so, F2 generates reports fast, presumably because bursting comes into play to let F2 borrow and recruit more CPU resources. As the sustained load continues, Fabric starts scaling back and throttling CPU.
  2. As time goes by, report executions are getting increasingly slower. While during the first minute a report can take 1-2 seconds for example, later it might take as much as 40 seconds to render (users are unlikely to tolerate this). This happens even with one virtual user, presumably because the quarter of the core is insufficient.
  3. I haven’t encountered any errors. All report executions succeeded irrespective of how long they take.

What all these tests mean is that if Fabric is appealing to you, F2 can be a viable option for smaller organizations where report users are expected to run reports sporadically. In case of a sudden load, such as everyone running reports at 8 AM on Monday, Fabric bursting can elevate the pressure for the first minute or so. Of course, you need to weigh in many other factors, such as relenting control to Microsoft, waiting for the technology to mature, avoiding lock-in, evaluating budget (note that viewers would still need at least Power BI Pro individual licenses for capacities lower than F64), and various other considerations I covered in my previous Fabric-related posts.

Notes on Fabric F2 Performance: Warehouse ETL

As inspired by Amir Netz‘s encouragement to partners to test the Fabric F2 capacity performance, I got on a quest to test what it would do to ETL loads for Fabric Warehouse. I must admit that I was skeptical that a quarter of a core would take a warehouse off the ground, but as usual, life proved me wrong and “wrong” is a big understatement of what happened.

After provisioning a Fabric F2 capacity and a warehouse, I settled on the Retail Data Model for World Wide Importers sample star schema dataset consisting of five dimension tables and one fact table. In terms of performance, I was mostly interested in how long it would take for the ADF copy activity to insert all the data (50 million rows) in the fact table. Granted, it’s a limited test but enough to rule out the technology for real-life projects. Then, I compared the performance against Azure SQL Database Serverless running on up to 2 cores and provisioned by the free trial offer that Microsoft has on Azure. To exclude impact on data transfer between regions, both technologies were provisioned on East US 2 data region, which is the region where my Power BI tenant is hosted on.

Much to my surprise, it took less than two minutes to load all 50 million rows in F2, whereas it took 1 hour to load to load 27 million rows to Azure SQL Database before the maximum 30 GB disk space was exhausted! I couldn’t believe it so I ran the test three times to confirm. Surely, bursting helps a lot! Now, throughput would be a different story, but as far as the warehouse is concerned, it doesn’t matter because in most cases, data will be imported in a Power BI semantic model and the warehouse will be out of the picture. As a next step, I plan to test the report throughput to see what concurrent report load would saturate the F2 capacity.

In conclusion, the Fabric lowest capacity F2 ($262.80 monthly cost) could be a viable option for smaller organizations willing to make their foray in the Fabric world. On the downside, we must leave Fabric to marinate for a few months and add needed features, including surrogate keys and MERGE for Warehouse and on-prem connectivity for Azure Data Factory, in order to be in consideration for real-life projects. More tests are needed to gauge the F2 report throughput.

UPDATE 12/24/2023 I was curious how much loading the same dataset from a CSV file would impact performance. It took much longer: 18 minutes. The most significant factor was that loading from CVS requires staging to a data lake although this appears redundant because the CVS file was in a lakehouse in the same Power BI workspace. ADF spent a total of 18 minutes in the two-staged copy (ten minutes to stage the data and eight minutes to load the fact table from the staged copy). Therefore, Parquet outperformed significantly CVS, probably because the Microsoft-provided Parquet file was compressed.

Prologika Newsletter Winter 2023

I’ve covered my first impression about Microsoft Fabric in a series of blogs starting with this one. In the recap, I said that certain features might benefit organizations that are already on Power BI Premium because all Fabric features are available in Power BI Premium plans. However, as they stand, features such as Lakehouse Copy to Table (no automated synchronization), ADF pipelines (no connectivity to on-prem data sources), Data Warehouse (missing important T-SQL features), Direct Lake (only available online), are half-baked, thus barring wide-spread Fabric adoption. This newsletter covers a good case for Fabric that I’ve recommended for a large insurer that has invested heavily in Microsoft Dynamics 365 and Power BI Premium. I’ll be in a better position to confirm after a POC with larger datasets so follow my blog for future updates.

The Dynamics Integration Saga

Dynamics Online, aka Dynamics 365, epitomizes the customer’s struggle to export their data hosted in SaaS cloud offerings for analytics or other purposes. Since unfortunately Microsoft doesn’t provide direct access to the Dynamics native storage (Azure SQL Database), which often is the best solution, Dynamics has “compensated” throughout the years by provided the following options:

OptionProsCons
OData APIOriginal interfaceSlow, no query folding, currently deprecated
Data Export ServiceExport to Azure SQL DatabaseDeprecated
TDS endpointReal-time direct access to Dynamics Azure SQL DatabaseNot designed for massive extracts; slow
Dataverse APICurrently recommended by MicrosoftAs far as I can tell, no REST operations (see below); questionable performance
Dataverse connectorSame as Dataverse APISame as Dataverse API
Export to ADLSAutomatic synchronizationPerpetually in preview, files stored in proprietary CDM folder format
Fabric linkAutomatic synchronization; delta tables; SQL joinsRequires Fabric licensing; up to 60 min data sync delay; TBD

One of my clients, a large insurance company, has invested heavily in Dynamics 365. They have implemented various Power BI reports that would import millions of rows directly from common CRM entities, such as Account, Opportunity, as well as custom entities, using the OData API. The main struggle has always been that dataset refreshes take hours to complete. Because API calls don’t fold in Power Query, they apply REST operations to select a subset of columns, join tables, and filter data. They rely heavily on Power Query to shape the data which has worked reasonably well.

Dataverse Link to Fabric

Dataverse Link to Fabric looks promising in addressing some of the above pitfalls. Once you set it up, it extracts Dynamics data as Delta tables in a Fabric lakehouse. From there, you can use the SQL endpoint to join and filter tables.

After the initial synchronization, it might take up to 60 minutes to synchronize data changes in Dynamics so not really real-time data access but much better than the current state.

While working through the setup, keep the following in mind:

  1. The Link to Microsoft Fabric menu is found under the Analyze menu in the ribbon and not at the table level as the video shows. That’s because it exports all the tables.
  2. Referring to the comparison table between Fabric Link and Synapse Link, “No copy, no ETL direct integration with Microsoft Fabric” should be read as “No ETL” because the data is still copied. Further, “Data stays in Dataverse” should read as “Data is automatically copied to Fabric lakehouse”.
  3. As far as prerequisites, the most important one is “the workspace you choose to link with Dataverse must be assigned to a premium capacity in the same region as your Dataverse environment” with the emphasis on the “same region”. Not having them in the same region is a show stopper.

Cutting Cost

What if you are not on Power BI Premium and you are on a tight budget? Although I haven’t tested, this approach (suggested by Riccardo Perico) might work because Link to Fabric is available in all fabric capacities:

  1. Purchase a lower Fabric capacity, such as F2 or F4.
  2. Configure Dataverse Link to Fabric as usual.
  3. After the initial synchronization, pause and resume the capacity when not in use, such by using this approach.
  4. Import the data in a Power BI semantic model.

Dataverse Link to Fabric should queue the updates until the capacity is resumed.

Benefits

Again more testing is needed, but at this point I see the following advantages of using the Dataverse Link to Fabric:

  1. CRM data is staged in open file format (Delta Parquet tables) and available for any reporting needs without impacting Dynamics.
  2. No need to use APIs. Instead, you can use SQL to join and filter tables, such as to look up columns or filter data, which is much better than using the API syntax. This will also help you centralize column lookups in SQL views instead of doing this over and over for each report.
  3. Potentially much faster Power BI dataset refreshes
  4. No additional cost to Power BI Premium customers because the Fabric features are available in Power BI Premium.


Teo Lachev
Prologika, LLC | Making Sense of Data
logo

A Case for Microsoft Fabric: Dynamics 365 Analytics

I might have identified at last a good case for Microsoft Fabric, but I’ll be in a better position to confirm after a POC with larger datasets. Dynamics Online, aka Dynamics 365, epitomizes the customer’s struggle to export their data hosted in SaaS cloud offerings for analytics or other purposes. Since unfortunately Microsoft doesn’t provide direct access to the Dynamics native storage (Azure SQL Database), which often could be the simplest and fastest solution, Dynamics has “compensated” throughout the years by introducing and sunsetting various options:

OptionProsCons
OData APIOriginal interfaceSlow, no query folding, currently deprecated
Data Export ServiceExport to Azure SQL DatabaseDeprecated
TDS endpointReal-time direct access to Dynamics Azure SQL DatabaseNot designed for massive extracts; slow
Dataverse APICurrently recommended by MicrosoftAs far as I can tell, no REST operations (see below); questionable performance
Dataverse connectorSame as Dataverse APISame as Dataverse API
Export to ADLSAutomatic synchronizationPerpetually in preview, files stored in proprietary CDM folder format
Fabric linkAutomatic synchronization; delta tables; SQL joinsRequires Fabric licensing; up to 60 min data sync delay; TBD

One of my clients, a large insurance company, has invested heavily in Dynamics 365. They have implemented various Power BI reports that would import millions of rows directly (without data staging) from common CRM entities, such as Account, Opportunity, as well as custom entities, using the OData API. The main struggle has always been that dataset refreshes take hours to complete. Because API calls don’t fold in Power Query, they apply REST operations to select a subset of columns, join tables, and filter data. They rely heavily on Power Query to shape the data which has worked reasonably well.

Dataverse Link to Fabric looks promising in addressing some of the above pitfalls. Once you set it up, it extracts Dynamics data as Delta tables in a Fabric lakehouse. From there, you can use the SQL endpoint to join and filter tables. After the initial synchronization, it might take up to 60 minutes to synchronize data changes in Dynamics so not really real-time data access but much better than the current state.

While working through the setup, keep the following in mind:

  1. The Link to Microsoft Fabric menu is found under the Analyze menu in the ribbon and not at the table level as the video shows. That’s because it exports all the tables.

  1. Referring to the comparison table between Fabric Link and Synapse Link, “No copy, no ETL direct integration with Microsoft Fabric” should be read “No ETL” because the data is still copied. Further, “Data stays in Dataverse” should read “Data is automatically copied to Fabric lakehouse”. Dataverse is just a wrapper on top of Azure SQL Database.
  2. As far as prerequisites, the most important one is “the workspace you choose to link with Dataverse must be assigned to a premium capacity in the same region as your Dataverse environment” with the emphasis on the “same region”. Not having them in the same region is a show stopper.

Again more testing is needed, but at this point I see the following advantages of using the Dataverse Link to Fabric:

  1. CRM data is staged in open file format (Delta Parquet tables) and available for any reporting needs without impacting Dynamics.
  2. No need to use APIs. Instead, you can use SQL to join and filter tables, such as to look up columns or filter data, which is much better than using the API syntax. This will also help you centralize column lookups in SQL views instead of doing this over and over for each report.
  3. Potentially much faster Power BI dataset refreshes
  4. No additional cost to Power BI Premium customers because the Fabric features are available in Power BI Premium.
  5. In a long term bypass importing data and providing users with almost real time (CRM data changes might take up to 60 minutes to synchronize) when the Microsoft DirectLake technology matures. I’m personally very interested in this option but I’ll wait until Power BI Desktop supports it.