Power BI – Prologika


TMDL View in Power BI Desktop

January 22, 2025/in Blog/by Prologika - Teo Lachev

I had the privilege of participating in the early preview program of the new TMDL View in Power BI Desktop, which is now in public preview in the January 2025 release of Power BI Desktop. Without reiterating what was said in the announcement, I’d like to mention three main benefits of this feature:

Ability to access the entire model metadata – This includes features that don’t have a user interface in Power BI Desktop. Traditionally, BI developers have relied on Tabular Editor to do so. Now you have another option, although it requires knowing the TMDL language. Alas, TMDL doesn’t come with a user interface, although it does support autocomplete.
Ability to copy specific model features from one Power BI Desktop file to another – For example, in the screenshot below, I have scripted a calculation group. Now, I can open another Power BI Desktop file, copy the script and apply it. Of course, the target model must include the referenced entities, otherwise I’ll get an error.
Automating tasks – Hopefully, in the near future TMDL View will support add-ins that automate certain tasks, much like creating macros in Excel by programming the Excel VBA object model. For example, a developer should be able to use the Tabular Object Model (TOM) API to create TMDL scripts and apply them to a semantic model.

Prologika Newsletter Winter 2024

December 14, 2024/in Newsletter/by Prologika - Teo Lachev

I recently conducted an assessment for a client facing memory pressure in Power BI Premium. You know these pesky out-of-memory errors when refreshing a biggish dataset. They started with P1, moved to P2, and are now on P3, but still more memory is needed to satisfy the appetite of a full refresh. The runtime memory footprint of the problematic semantic model with imported data is 45 GB, and they’ve done their best to optimize it. This newsletter outlines a few strategies to tackle excessive memory consumption with large semantic models. Unfortunately, given the current state of Power BI boxed capacities, no option is perfect, and in the end a compromise will probably be needed somewhere between latency and performance.

Why I don’t like Premium licensing

Since its beginning, Power BI Pro per-user licensing (and later Premium Per User (PPU) licensing) has been very attractive. Many organizations with a limited number of report users flocked to Power BI to save cost. However, organizations with more BI consumers gravitated toward premium licensing, where they could have an unlimited number of report readers for a fixed monthly fee starting at a list price of $5,000/mo for P1. Sounds like a great deal, right?

I must admit that I detest the premium licensing model because it boxes you into certain resource constraints, such as 8 backend cores and 25 GB of RAM for P1. There are no custom configurations to let you balance compute and memory needs. And while there is an auto-scale compute model, it’s very coarse and applies only to processing cores. The memory constraints are especially problematic given that imported models are memory resident and require more than twice the memory for a full refresh. From the outside, these memory constraints seem artificially low to force clients into perpetual upgrades. The new Fabric F capacities that supersede the P plans are even more expensive, justifying the price increase with the added flexibility to pause the capacity, which is often impractical.
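To make the full-refresh point concrete, here is a back-of-the-envelope sketch. It assumes a full refresh needs about twice the resident footprint (the post says "more than twice", so treat 2.0 as a floor and vary it), and it uses the per-dataset limits of 25 GB for P1 (stated above) and 50 GB and 100 GB for P2 and P3 (taken from the F128 and F256 rows of the Fabric capacity limits table on this page). The function names are hypothetical helpers, not part of any Power BI API.

```python
from typing import Optional

# Per-dataset memory limits (GB). P1 = 25 GB is stated in this post; the P2 and
# P3 values come from the F128 (50 GB) and F256 (100 GB) rows of the Fabric
# capacity limits table, which map to P2 and P3.
P_SKU_MEMORY_GB = {"P1": 25, "P2": 50, "P3": 100}

# Assumption: during a full refresh the old copy of the model stays online while
# the new copy is built, so peak memory is at least ~2x the resident footprint.
FULL_REFRESH_FACTOR = 2.0


def estimated_refresh_peak_gb(resident_gb: float, factor: float = FULL_REFRESH_FACTOR) -> float:
    """Rough peak memory (GB) needed to fully refresh an imported model."""
    return resident_gb * factor


def smallest_p_sku(resident_gb: float, factor: float = FULL_REFRESH_FACTOR) -> Optional[str]:
    """Smallest P SKU whose per-dataset limit covers the estimated refresh peak."""
    peak = estimated_refresh_peak_gb(resident_gb, factor)
    for sku, limit_gb in sorted(P_SKU_MEMORY_GB.items(), key=lambda item: item[1]):
        if peak <= limit_gb:
            return sku
    return None  # nothing fits: time for the memory-saving strategies


if __name__ == "__main__":
    # The 45 GB model from this newsletter at a 2x and a 2.5x refresh factor.
    for factor in (2.0, 2.5):
        print(factor, estimated_refresh_peak_gb(45, factor), smallest_p_sku(45, factor))
```

At 2x the 45 GB model fits P3 only on paper (90 of 100 GB, with no headroom for queries running during the refresh); a modestly higher factor pushes it past every P SKU, which is consistent with the out-of-memory errors described above.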

It looks to me like premium licensing is a pretty good deal for Microsoft. Outgrew the 25 GB of RAM in P1? Time to shell out another $5K per month for 25 GB more, even if you don’t need more compute power. Meanwhile, 32 GB of RAM costs less than $100, and the price keeps falling.

It would be great if at some point Power BI introduced custom capacities. Even better, how about auto-scaling, where the capacity resources (both memory and CPU) scale up and down on demand within minutes, such as adding more memory during refresh and reducing it when the refresh is over?

Strategies to combat out-of-memory scenarios

So, what should you do if you are strapped for cash? Consider evaluating and adopting one or more of the following memory-saving techniques:

Switching to PPU licensing with a limited number of report users. PPU is equivalent to P3 and grants 100 GB of RAM per dataset.
Optimizing aggressively the model storage when possible, such as removing high-cardinality columns
Configuring aggressive incremental refresh policies with polling expressions
Moving large fact tables to a separate semantic model (remember that the memory constraints are per dataset and not across all the datasets in the capacity)
Implementing DirectQuery features, such as composite models and hybrid tables
Switching to a hybrid architecture with on-prem semantic model(s) hosted in SQL Server Analysis Services, where you control the hardware configuration and you’re not charged for more memory.
Lobbying Microsoft for much larger memory limits or to bring your own memory (good luck with that but it might be an option if you work for a large and important company)
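Why do aggressive incremental refresh policies help? A rough way to reason about it: only the partitions inside the refresh window are rebuilt, so the temporary extra memory scales with the refreshed data rather than with the whole model. The sketch below is an estimate under assumptions I’m making for illustration only (a 2x factor for rebuilt partitions, evenly sized monthly partitions); `refresh_peak_gb` is a hypothetical helper, and real peaks also depend on dictionaries, compression, and concurrent queries.

```python
from typing import Sequence


def refresh_peak_gb(partition_sizes_gb: Sequence[float], refreshed: int, factor: float = 2.0) -> float:
    """Approximate peak memory (GB) when refreshing the newest `refreshed` partitions.

    Assumptions (illustrative, not documented engine behavior):
    - partitions are ordered oldest to newest;
    - untouched partitions stay resident at their current size;
    - each refreshed partition briefly needs `factor` x its size (old + new copy).
    """
    if not 0 <= refreshed <= len(partition_sizes_gb):
        raise ValueError("refreshed must be between 0 and the number of partitions")
    split = len(partition_sizes_gb) - refreshed
    untouched = sum(partition_sizes_gb[:split])
    rebuilt = sum(partition_sizes_gb[split:])
    return untouched + rebuilt * factor


if __name__ == "__main__":
    # Hypothetical layout: a 45 GB model split into 36 monthly partitions.
    months = [1.25] * 36
    print("full refresh:", refresh_peak_gb(months, refreshed=36))   # every partition rebuilt
    print("last 3 months:", refresh_peak_gb(months, refreshed=3))   # incremental policy
```

Under these assumptions, the full refresh peaks at about 90 GB, while refreshing only the last three months peaks at about 49 GB. The same per-model arithmetic is why moving a large fact table into its own semantic model can help: each model is measured against the limit separately.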

Considering Direct Lake storage

If Fabric is in your future, one relatively new option to tackle out-of-memory scenarios that deserves to be evaluated and added to the list is semantic models configured for Direct Lake storage. Direct Lake on-demand loading should utilize memory much more efficiently for interactive operations, such as Power BI report execution. This comes on top of the fact that Direct Lake models don’t require a data refresh. Eliminating refresh could save a tremendous amount of memory to start with, even if you apply advanced techniques such as incremental refresh or hybrid tables to models with imported data.

I did limited testing to compare performance of import and Direct Lake and posted detailed results in the “Fabric Direct Lake: Memory Utilization with Interactive Operations” blog.

I concluded that if Direct Lake is an option for you, it should be at the forefront of your efforts to combat out-of-memory errors with large datasets.

On the downside, more than likely you’ll have to implement ETL processes to synchronize your data warehouse to a Fabric lakehouse, unless your data is in Fabric to start with or you use Fabric database mirroring for the currently supported data sources (Azure SQL DB, Cosmos DB, and Snowflake). I’m not counting the data synchronization time as a downside because it could replace the time you currently spend in model refresh.

Teo Lachev
Prologika, LLC | Making Sense of Data

Atlanta Microsoft BI Group Meeting on December 2nd (Semantic Modeling as Code)

November 26, 2024/in Blog, Events/by Prologika - Teo Lachev

Atlanta BI fans, please join us online for our next meeting on Monday, December 2nd at 5PM ET (please note the change to our usual meeting time to accommodate the presenter). Rui Romano (Product Manager at Microsoft) will discuss how the new TMDL language for Power BI models can unlock new scenarios that previously weren’t possible. For more details and to sign up, visit our group page.

Presentation: “Semantic Modeling as Code” with TMDL using Power BI Desktop Developer Mode (PBIP) and VS Code
Delivery: Online
Level: Intermediate to Advanced

Overview: The landscape for developing enterprise-scale models has never been more exciting than it is now! Developer mode in Power BI Desktop and the new TMDL language unlock new scenarios that previously weren’t possible, such as great source control and co-development experiences with Git integration. Additionally, the TMDL Visual Studio Code extension offers a new, powerful and efficient, code-first semantic modeling experience. Join us to discover the new and powerful ways you can leverage TMDL to accelerate your model development and get a sneak peek into the TMDL roadmap from the Power BI product team.

Speaker: Rui Romano is an experienced Microsoft Professional with a deep passion for data and analytics. He has spent the last decade helping companies make better data-driven decisions and is known for his innovative and practical solutions to complex problems. He currently works as a Product Manager at Microsoft on the Power BI product team, focusing on Pro-BI experiences.

Atlanta Microsoft BI Group Meeting on November 4th (Accelerating your Fabric Data Estate with AI & Copilot)

October 29, 2024/in Blog, Events/by Prologika - Teo Lachev

Atlanta BI fans, please join us in person for our next meeting on Monday, November 4th at 6:30 PM ET. Stacey Jones (Principal Data & AI Cross-Solution Architect at Microsoft) and Elayne Jones (Solutions Architect at Coca-Cola Bottlers Sales and Services) will explore the AI and Copilot capabilities within Microsoft Fabric. I’ll also help you catch up on the latest in Microsoft BI. I will sponsor the event, which marks the 14th anniversary of the Atlanta Microsoft BI Group! For more details and to sign up, visit our group page.

Details

Presentation: Accelerating your Fabric Data Estate with AI & Copilot
Delivery: In-person
Date: November 4th, 2024
Time: 18:30 – 20:30 ET
Level: Beginner to Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Venue
Improving Office
11675 Rainwater Dr
Suite #100
Alpharetta, GA 30009

Overview: In this presentation, we will explore the groundbreaking AI and Copilot capabilities within Microsoft Fabric, a comprehensive platform designed to enhance productivity and collaboration. By leveraging advanced machine learning algorithms and natural language processing, Microsoft Fabric’s AI/Copilot not only streamlines workflows but also provides intelligent insights and automation, empowering users to achieve more with less effort. Join us as we delve into the features and functionalities that make Microsoft Fabric an indispensable tool for modern enterprises.

Sponsor: CloudStaff.ai

Implementing Role-playing Dimensions in Power BI

October 11, 2024/in Blog/by Prologika - Teo Lachev

Role-playing dimensions are a popular business requirement, yet challenging to implement in Power BI (and Tabular) due to a long-standing limitation that two tables can’t be joined multiple times with active relationships. Declarative relationships are both a blessing and a curse, and in this case we are confronted with their limitations. Had Power BI allowed multiple active relationships, the user would have to be prompted which path to take. Interestingly, a long time ago Microsoft considered a user interface for the prompting but dropped the idea for unknown reasons.

Given the existing technology limitations, you have two choices for implementing additional role-playing dimensions: duplicating the dimension table (either in the DW or in the semantic model) or denormalizing the dimension fields into the fact table. The following table presents the pros and cons of each option:

Option	Pros	Cons
Duplicate the dimension table in the semantic model or DW	No or minimal impact on ETL; minimal maintenance in the semantic model; all dimension attributes are available	Metadata complexity and confusion (potentially mitigated with perspectives that filter metadata for a specific subject area)
Denormalize dimension fields into the fact table	Avoids role-playing dimension instances; more intuitive model for business users	Increased fact table size and memory footprint; impact on ETL; limited number of dimension attributes; dimension changes are effectively tracked as Type 2 with incremental extraction (while they could be Type 1); if applicable, inability to reuse the role-playing dimension for another fact table and do cross-fact table analysis

So, which approach should you take? The middle path might make sense. If you need only a limited number of fields for the second role-playing dimension, you could add them to the fact table to avoid another dimension and confusion. For example, if you have a DimEmployee dimension and you need a second instance for the person making the changes to the fact table, you can add the administrator’s full name to the fact table assuming you need only this field from DimEmployee.
By contrast, if you need most of the fields in the role-playing instances, then cloning might make more sense. For example, analyzing fact data by shipped date or due date that requires the established hierarchies in DimDate, could be addressed by cloning DimDate. Then to avoid confusion, consider using Tabular Editor to create perspectives for each subject area where each perspective includes only the role-playing dimensions applicable to that subject area.

Yet a third, narrower option exists when you only need role-playing measures, such as SalesAmountByShipDate and SalesAmountByDueDate. This scenario can be addressed by adding inactive relationships between the fact table and the dimension and forcing DAX measures to “travel” an inactive relationship by using the USERELATIONSHIP function.
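For readers who think in code rather than DAX, the sketch below mimics what a measure like SalesAmountByShipDate does: the fact table carries several date keys, the date dimension exists only once, and each measure picks which key to join on, the way USERELATIONSHIP activates an inactive relationship for the duration of a CALCULATE. The FactSales and DimDate tables, their columns, and the rows are hypothetical, and the DAX in the docstring assumes an inactive relationship between FactSales[ShipDateKey] and DimDate[DateKey] already exists in the model. The Python is only an analogy, not how the engine evaluates relationships.

```python
from collections import defaultdict

# Hypothetical fact rows: a single date dimension plays three roles via three keys.
FACT_SALES = [
    {"OrderDateKey": 20240105, "ShipDateKey": 20240110, "DueDateKey": 20240201, "SalesAmount": 100.0},
    {"OrderDateKey": 20240120, "ShipDateKey": 20240203, "DueDateKey": 20240215, "SalesAmount": 250.0},
    {"OrderDateKey": 20240210, "ShipDateKey": 20240212, "DueDateKey": 20240301, "SalesAmount": 75.0},
]

# One date dimension (DimDate) keyed by DateKey; only a Month attribute is modeled.
DIM_DATE = {
    key: {"Month": key // 100}  # e.g. 20240105 -> 202401
    for row in FACT_SALES
    for key in (row["OrderDateKey"], row["ShipDateKey"], row["DueDateKey"])
}


def sales_by_month(role_key: str = "OrderDateKey") -> dict:
    """Sum SalesAmount by DimDate Month, joining on the chosen date role.

    The default mimics the active relationship (order date). Passing
    "ShipDateKey" mimics a measure that activates an inactive relationship:
        SalesAmountByShipDate :=
            CALCULATE(
                SUM(FactSales[SalesAmount]),
                USERELATIONSHIP(FactSales[ShipDateKey], DimDate[DateKey])
            )
    """
    totals = defaultdict(float)
    for row in FACT_SALES:
        month = DIM_DATE[row[role_key]]["Month"]
        totals[month] += row["SalesAmount"]
    return dict(totals)


if __name__ == "__main__":
    for role in ("OrderDateKey", "ShipDateKey", "DueDateKey"):
        print(role, sales_by_month(role))
```

The same fact rows land in different months depending on the role, without a second copy of the date dimension, which is exactly why this option only works for measures: dimension attributes still slice through the one active relationship.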

Fabric Direct Lake: Memory Utilization with Interactive Operations

August 15, 2024/in Blog/by Prologika - Teo Lachev

As I mentioned in my “Power BI and Fabric Capacities: Thinking Outside the Box” blog, memory limits of Fabric capacities could be rather restrictive for large semantic models with imported data. If Fabric is in your future, one relatively new option to combat out-of-memory scenarios that deserves to be evaluated and added to the list is semantic models configured for Direct Lake storage. This blog covers the results of limited testing I did comparing side by side the memory utilization of two identical semantic models, the first configured to import data and the second to use Direct Lake storage. If you need a Direct Lake primer, Chris Webb has done a great job covering its essentials in two of his blog posts. As a disclaimer, the emphasis is on limited, as these results reflect my personal observations based on some isolated tests I’ve done lately. Your results may, and probably will, vary considerably.

Understanding the Tests

My starting hypothesis was that Direct Lake on-demand loading would utilize memory much more efficiently for interactive operations, such as Power BI report execution. This comes on top of the fact that Direct Lake models don’t require a data refresh. Eliminating refresh could save a tremendous amount of memory to start with, even if you apply advanced techniques such as incremental refresh or hybrid tables to models with imported data. Therefore, the tests that follow focus on memory utilization with interactive operations.

To test my hypothesis, I imported the first three months of 2016 from the NY yellow taxi Azure open dataset into a lakehouse backed by a Fabric F2 capacity. This resulted in 34.5 million rows distributed across several Delta Parquet files. I limited the data to three months because F2 ran out of memory around the 50 million rows mark with the error “This operation was canceled because there wasn’t enough memory to finish running it. Either reduce the memory footprint of your dataset by doing things such as limiting the amount of imported data, or if using Power BI Premium, increase the memory of the Premium capacity where this dataset is hosted. More details: consumed memory 2851 MB, memory limit 2851 MB, database size before command execution 220 MB”.

The error is descriptive enough and in line with the F2 memory limit of 3 GB per semantic model. I used Power BI Desktop to import all that data into a YellowTaxiImported semantic model, which I published to Power BI Service and configured for large storage format. Then, I created a second semantic model, YellowTaxiDirectLake, online and configured it for Direct Lake storage mapped directly to the data in the lakehouse. I went back to Power BI Desktop to whip up a few analytical (aggregate) queries and a few detail-level queries. Finally, I ran a few tests using DAX Studio.

Analyzing Import Mode

Even after a capacity restart, the YellowTaxiImported model immediately reported 1.4 GB of memory. My conclusion was that the primary focus of Power BI Premium on-demand loading, which was introduced a while back, was to speed up the first query after the model was evicted from memory. Indeed, I saw that many segments were memory resident and many weren’t, but using queries to touch the non-resident columns didn’t increase the memory footprint. The following table lists the query execution times with “Clear On Run” enabled in DAX Studio (to avoid skewing due to cached query data).

Naturally, the more detailed the queries get, the slower they get, because VertiPaq is a columnar database. However, the important observation is that the memory footprint remains constant. Please note that Fabric allocates additional memory to execute the queries, so the memory footprint should grow as the report load increases.

Query	Duration (ms)
Analytical query 1	124
Analytical query 2	148
Analytical query 3	114
Analytical query 4	80
Detail query 1	844
Detail query 2	860
Detail query 3	1,213
Detail query 4 (all columns)	4,240

The DAX queries used in the tests:

Analytical query 1:
EVALUATE
ROW(
    "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
    "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])),
    "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount]))
)

Analytical query 2:
DEFINE
    VAR __DS0Core = SUMMARIZECOLUMNS('nyc_yellowtaxi'[puMonth], "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])))
    VAR __DS0BodyLimited = SAMPLE(3502, __DS0Core, 'nyc_yellowtaxi'[puMonth], 1)
EVALUATE __DS0BodyLimited
ORDER BY 'nyc_yellowtaxi'[puMonth]

Analytical query 3:
DEFINE
    VAR __SQDS0Core = SUMMARIZECOLUMNS('nyc_yellowtaxi'[doLocationId], "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance])))
    VAR __SQDS0BodyLimited = TOPN(50, __SQDS0Core, [AveragetripDistance], 0)
    VAR __DS0Core = SUMMARIZECOLUMNS('nyc_yellowtaxi'[doLocationId], __SQDS0BodyLimited, "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance])))
    VAR __DS0PrimaryWindowed = TOPN(1001, __DS0Core, [AveragetripDistance], 0, 'nyc_yellowtaxi'[doLocationId], 1)
EVALUATE __DS0PrimaryWindowed
ORDER BY [AveragetripDistance] DESC, 'nyc_yellowtaxi'[doLocationId]

Analytical query 4:
DEFINE
    VAR __SQDS0Core = SUMMARIZECOLUMNS('nyc_yellowtaxi'[doLocationId], "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance])))
    VAR __SQDS0BodyLimited = TOPN(50, __SQDS0Core, [AveragetripDistance], 0)
    VAR __DS0Core = SUMMARIZECOLUMNS('nyc_yellowtaxi'[rateCodeId], __SQDS0BodyLimited, "AveragetipAmount", CALCULATE(AVERAGE('nyc_yellowtaxi'[tipAmount])))
    VAR __DS0BodyLimited = SAMPLE(3502, __DS0Core, 'nyc_yellowtaxi'[rateCodeId], 1)
EVALUATE __DS0BodyLimited
ORDER BY 'nyc_yellowtaxi'[rateCodeId]

Detail query 1:
DEFINE
    VAR __DS0FilterTable = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0)
    VAR __DS0FilterTable2 = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])), 'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2))
    VAR __DS0Core = SUMMARIZECOLUMNS(
        ROLLUPADDISSUBTOTAL(ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal"),
        __DS0FilterTable,
        __DS0FilterTable2,
        "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
        "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
        "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount]))
    )
    VAR __DS0PrimaryWindowed = TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'nyc_yellowtaxi'[startLat], 1, 'nyc_yellowtaxi'[startLon], 1)
EVALUATE __DS0PrimaryWindowed
ORDER BY [IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]

Detail query 2:
DEFINE
    VAR __DS0FilterTable = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0)
    VAR __DS0FilterTable2 = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])), 'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2))
    VAR __DS0Core = SUMMARIZECOLUMNS(
        ROLLUPADDISSUBTOTAL(ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal"),
        __DS0FilterTable,
        __DS0FilterTable2,
        "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
        "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
        "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount]))
    )
    VAR __DS0PrimaryWindowed = TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'nyc_yellowtaxi'[startLat], 1, 'nyc_yellowtaxi'[startLon], 1)
EVALUATE __DS0PrimaryWindowed
ORDER BY [IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]

Detail query 3:
DEFINE
    VAR __DS0FilterTable = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0)
    VAR __DS0FilterTable2 = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])), 'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2))
    VAR __DS0Core = SUMMARIZECOLUMNS(
        ROLLUPADDISSUBTOTAL(ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal"),
        __DS0FilterTable,
        __DS0FilterTable2,
        "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
        "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
        "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount]))
    )
    VAR __DS0PrimaryWindowed = TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'nyc_yellowtaxi'[startLat], 1, 'nyc_yellowtaxi'[startLon], 1)
EVALUATE __DS0PrimaryWindowed
ORDER BY [IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]

Detail query 4 (all columns):
DEFINE
    VAR __DS0FilterTable = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0)
    VAR __DS0FilterTable2 = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])), 'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2))
    VAR __ValueFilterDM1 = FILTER(
        KEEPFILTERS(
            SUMMARIZECOLUMNS(
                'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon], 'nyc_yellowtaxi'[paymentType], 'nyc_yellowtaxi'[vendorID], 'nyc_yellowtaxi'[improvementSurcharge], 'nyc_yellowtaxi'[doLocationId], 'nyc_yellowtaxi'[puLocationId], 'nyc_yellowtaxi'[storeAndFwdFlag], 'nyc_yellowtaxi'[tpepDropoffDateTime], 'nyc_yellowtaxi'[tpepPickupDateTime],
                __DS0FilterTable,
                __DS0FilterTable2,
                "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
                "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
                "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])),
                "SumpassengerCount", CALCULATE(SUM('nyc_yellowtaxi'[passengerCount])),
                "SumtripDistance", CALCULATE(SUM('nyc_yellowtaxi'[tripDistance])),
                "SumendLat", CALCULATE(SUM('nyc_yellowtaxi'[endLat])),
                "SumendLon", CALCULATE(SUM('nyc_yellowtaxi'[endLon])),
                "Sumextra", CALCULATE(SUM('nyc_yellowtaxi'[extra])),
                "SumfareAmount", CALCULATE(SUM('nyc_yellowtaxi'[fareAmount])),
                "SummtaTax", CALCULATE(SUM('nyc_yellowtaxi'[mtaTax])),
                "SumpuMonth", CALCULATE(SUM('nyc_yellowtaxi'[puMonth])),
                "SumpuYear", CALCULATE(SUM('nyc_yellowtaxi'[puYear])),
                "CountrateCodeId", CALCULATE(COUNTA('nyc_yellowtaxi'[rateCodeId]))
            )
        ),
        [SumtollsAmount] > 0
    )
    VAR __DS0Core = SUMMARIZECOLUMNS(
        ROLLUPADDISSUBTOTAL(
            ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon], 'nyc_yellowtaxi'[paymentType], 'nyc_yellowtaxi'[vendorID], 'nyc_yellowtaxi'[improvementSurcharge], 'nyc_yellowtaxi'[doLocationId], 'nyc_yellowtaxi'[puLocationId], 'nyc_yellowtaxi'[storeAndFwdFlag], 'nyc_yellowtaxi'[tpepDropoffDateTime], 'nyc_yellowtaxi'[tpepPickupDateTime]),
            "IsGrandTotalRowTotal"
        ),
        __DS0FilterTable,
        __DS0FilterTable2,
        __ValueFilterDM1,
        "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
        "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
        "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])),
        "SumpassengerCount", CALCULATE(SUM('nyc_yellowtaxi'[passengerCount])),
        "SumtripDistance", CALCULATE(SUM('nyc_yellowtaxi'[tripDistance])),
        "SumendLat", CALCULATE(SUM('nyc_yellowtaxi'[endLat])),
        "SumendLon", CALCULATE(SUM('nyc_yellowtaxi'[endLon])),
        "Sumextra", CALCULATE(SUM('nyc_yellowtaxi'[extra])),
        "SumfareAmount", CALCULATE(SUM('nyc_yellowtaxi'[fareAmount])),
        "SummtaTax", CALCULATE(SUM('nyc_yellowtaxi'[mtaTax])),
        "SumpuMonth", CALCULATE(SUM('nyc_yellowtaxi'[puMonth])),
        "SumpuYear", CALCULATE(SUM('nyc_yellowtaxi'[puYear])),
        "CountrateCodeId", CALCULATE(COUNTA('nyc_yellowtaxi'[rateCodeId]))
    )
    VAR __DS0PrimaryWindowed = TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'nyc_yellowtaxi'[startLat], 1, 'nyc_yellowtaxi'[startLon], 1, 'nyc_yellowtaxi'[paymentType], 1, 'nyc_yellowtaxi'[vendorID], 1, 'nyc_yellowtaxi'[improvementSurcharge], 1, 'nyc_yellowtaxi'[doLocationId], 1, 'nyc_yellowtaxi'[puLocationId], 1, 'nyc_yellowtaxi'[storeAndFwdFlag], 1, 'nyc_yellowtaxi'[tpepDropoffDateTime], 1, 'nyc_yellowtaxi'[tpepPickupDateTime], 1)
EVALUATE __DS0PrimaryWindowed
ORDER BY [IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon], 'nyc_yellowtaxi'[paymentType], 'nyc_yellowtaxi'[vendorID], 'nyc_yellowtaxi'[improvementSurcharge], 'nyc_yellowtaxi'[doLocationId], 'nyc_yellowtaxi'[puLocationId], 'nyc_yellowtaxi'[storeAndFwdFlag], 'nyc_yellowtaxi'[tpepDropoffDateTime], 'nyc_yellowtaxi'[tpepPickupDateTime]

Analyzing Direct Lake

After another restart, the resident memory footprint of the YellowTaxiDirectLake model was only 22.4 KB! Indeed, the $System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV showed that only system-generated RowNumber columns were memory resident.

For each query, I recorded two runs to understand how much time is spent in on-demand loading of columns into memory. The Import Mode column was added for convenience to compare the second run duration with the corresponding query duration from the Import Mode tests. Finally, the Model Resident Memory column records the memory footprint of the Direct Lake model.

Query	First Run (ms)	Second Run (ms)	Import Mode (ms)	Model Resident Memory (MB)
//Analytical query 1	79	75	124	14
//Analytical query 2	79	76	148	14.3
//Analytical query 3	382	133	114	68.1
//Analytical query 4	209	130	80	68.13
//Detail query 1	7,763	1,023	844	669.13
//Detail query 2	1,484	1,453	860	670.53
//Detail query 3	1,881	1,463	1,213	670.6
//Detail query 4	9,663	3,668	4,240	1,270
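To put numbers on the cost of on-demand loading, the sketch below takes the measurements from the table above and derives two values per query: the first run minus the second run (a rough proxy for column loading time, since the first run may also include other cold-start work) and the ratio of the warm Direct Lake run to the import-mode run. Only the arithmetic is new; the figures are the recorded ones, and the helper names are my own.

```python
# (first run ms, second run ms, import mode ms, model resident memory MB),
# copied from the Direct Lake results table.
RESULTS = {
    "Analytical query 1": (79, 75, 124, 14),
    "Analytical query 2": (79, 76, 148, 14.3),
    "Analytical query 3": (382, 133, 114, 68.1),
    "Analytical query 4": (209, 130, 80, 68.13),
    "Detail query 1": (7763, 1023, 844, 669.13),
    "Detail query 2": (1484, 1453, 860, 670.53),
    "Detail query 3": (1881, 1463, 1213, 670.6),
    "Detail query 4": (9663, 3668, 4240, 1270),
}


def load_overhead_ms(first: int, second: int) -> int:
    """Rough cost of on-demand column loading: cold first run minus warm second run."""
    return first - second


def warm_vs_import(second: int, import_ms: int) -> float:
    """Warm Direct Lake duration relative to import mode (< 1 means Direct Lake was faster)."""
    return second / import_ms


if __name__ == "__main__":
    for name, (first, second, import_ms, memory_mb) in RESULTS.items():
        print(f"{name:20} overhead {load_overhead_ms(first, second):>6} ms  "
              f"warm/import {warm_vs_import(second, import_ms):.2f}  resident {memory_mb} MB")
```

For example, Detail query 1 spent roughly 6.7 seconds more on its first run (7,763 - 1,023 ms), while its warm run was about 21% slower than import mode; the two simplest analytical queries were faster in Direct Lake on both runs.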

Conclusion

To sum up this long post, the following observations can be made:

As expected, the more columns a query touches, the higher the memory footprint of Direct Lake. For example, the last query requested all the columns, and the resulting memory footprint was on par with import mode.
It’s important to note that when Fabric is under memory pressure, such as when the report load increases, Direct Lake will start paging out columns with low temperature. The exact thresholds and rules are not documented, but I’d expect the eviction mechanism to be much more granular and intelligent than evicting entire datasets with import mode.
I didn’t see Direct Lake paging out memory because I was still left with plenty (1.27 GB consumed out of 3 GB). It doesn’t make sense to evict data if there is no memory pressure, since memory is the fastest storage.
You’ll pay a certain price the first time a column is loaded on demand with Direct Lake. The more columns, the longer the wait. Subsequent runs, however, will be much faster if the column is still mapped in memory.
Some queries will execute faster in import mode and some will execute slower. Overall, queries touching memory-resident columns should be comparable.

Therefore, if Direct Lake is an option for you, it should be at the forefront of your efforts to combat out-of-memory errors with large datasets. On the downside, more than likely you’ll have to implement ETL processes to synchronize your data warehouse to a Fabric lakehouse, unless your data is in Fabric to start with, or you use Fabric database mirroring for the currently supported data sources (Azure SQL DB, Cosmos, and Snowflake). I’m not counting the data synchronization time as a downside because it could supersede the time you currently spend in model refresh.

Fabric Capacity Limits

August 14, 2024/in Blog/by Prologika - Teo Lachev

Here is a table that is getting more and more difficult to find, as searching for Fabric capacity limits returns results about capacity units (CUs), which are for the most part meaningless in my opinion. I embed it below in a searchable format before it vanishes from the Internet. The most important column for semantic modeling is the max memory, which denotes the upper limit of memory Fabric will grant a semantic model.

SKU	Max memory (GB)	Max concurrent DirectQuery connections (per semantic model)	Max DirectQuery parallelism	Live connection (per second)	Max memory per query (GB)	Model refresh parallelism	Direct Lake rows per table (in millions)	Max Direct Lake model size on OneLake (GB)
F2	3	5	1	2	1	1	300	10
F4	3	5	1	2	1	2	300	10
F8	3	10	1	3.75	1	5	300	10
F16	5	10	1	7.5	2	10	300	20
F32	10	10	1	15	5	20	300	40
F64	25	50	4-8	30	10	40	1,500	Unlimited
F128	50	75	6-12	60	10	80	3,000	Unlimited
F256	100	100	8-16	120	10	160	6,000	Unlimited
F512	200	200	10-20	240	20	320	12,000	Unlimited
F1024	400	200	12-24	480	40	640	24,000	Unlimited
F2048	400	200		960	40	1,280	24,000	Unlimited
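To use the table for sizing, the sketch below encodes its memory and Direct Lake columns and returns the smallest F SKU that satisfies a requirement. For an imported model, pass the estimated refresh peak rather than the resident size, since imported models need more than twice the memory for a full refresh. The helper names are hypothetical, the other columns (DirectQuery connections, parallelism) and compute are ignored, and you should leave headroom for query memory.

```python
from typing import NamedTuple, Optional


class FabricSku(NamedTuple):
    name: str
    max_memory_gb: int                    # max memory per semantic model
    dl_rows_per_table_m: int              # Direct Lake rows per table (millions)
    dl_max_model_size_gb: Optional[int]   # None = Unlimited


# Values copied from the capacity limits table above.
FABRIC_SKUS = [
    FabricSku("F2", 3, 300, 10),
    FabricSku("F4", 3, 300, 10),
    FabricSku("F8", 3, 300, 10),
    FabricSku("F16", 5, 300, 20),
    FabricSku("F32", 10, 300, 40),
    FabricSku("F64", 25, 1500, None),
    FabricSku("F128", 50, 3000, None),
    FabricSku("F256", 100, 6000, None),
    FabricSku("F512", 200, 12000, None),
    FabricSku("F1024", 400, 24000, None),
    FabricSku("F2048", 400, 24000, None),
]


def smallest_sku_for_import(peak_memory_gb: float) -> Optional[str]:
    """Smallest SKU whose per-model memory limit covers the given peak."""
    for sku in FABRIC_SKUS:
        if peak_memory_gb <= sku.max_memory_gb:
            return sku.name
    return None


def smallest_sku_for_direct_lake(largest_table_rows_m: float, model_size_gb: float) -> Optional[str]:
    """Smallest SKU whose Direct Lake row and OneLake size limits are both met."""
    for sku in FABRIC_SKUS:
        size_ok = sku.dl_max_model_size_gb is None or model_size_gb <= sku.dl_max_model_size_gb
        if largest_table_rows_m <= sku.dl_rows_per_table_m and size_ok:
            return sku.name
    return None


if __name__ == "__main__":
    print(smallest_sku_for_import(90))              # e.g. a 45 GB model at a 2x refresh peak
    print(smallest_sku_for_direct_lake(34.5, 1))    # the 34.5 million-row taxi table
```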

The same page lists another important table that shows the backend CPU cores assigned to a capacity. Although bursting, overages, smoothing, and throttling make Fabric capacity compute resources a whole lot more difficult to figure out, think of your capacity as a VM that has that many cores for backend loads, including loads from interactive operations, such as report queries, and loads from background operations, such as dataset refreshes. I’m not sure why it shows N/A for F2 and F4. If memory serves me right, I’ve previously seen 0.25 cores for F2 and 0.5 cores for F4.

SKU	Capacity Units (CU)	Power BI SKU	Power BI v-cores
F2	2	N/A	N/A
F4	4	N/A	N/A
F8	8	EM1/A1	1
F16	16	EM2/A2	2
F32	32	EM3/A3	4
F64	64	P1/A4	8
F128	128	P2/A5	16
F256	256	P3/A6	32
F512	512	P4/A7	64
F1024	1,024	P5/A8	128
F2048	2,048	N/A	N/A
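The rows that do list v-cores follow a simple pattern: 8 capacity units per Power BI v-core (F8 = 1, F64 = 8, F1024 = 128). Extrapolating that ratio to F2 and F4 gives 0.25 and 0.5 v-cores, which agrees with the numbers mentioned above. Treat this as an inference from the table, not an official figure, since the table itself shows N/A for those SKUs.

```python
# Inferred from the rows that list v-cores, e.g. F64 = 64 CU = 8 v-cores.
CU_PER_VCORE = 8

# (capacity units, Power BI v-cores) for the rows above that list both.
KNOWN_ROWS = {8: 1, 16: 2, 32: 4, 64: 8, 128: 16, 256: 32, 512: 64, 1024: 128}


def estimated_vcores(capacity_units: int) -> float:
    """Estimate backend v-cores from capacity units (an extrapolation, not official)."""
    return capacity_units / CU_PER_VCORE


if __name__ == "__main__":
    # The ratio reproduces every row that lists v-cores...
    assert all(estimated_vcores(cu) == vcores for cu, vcores in KNOWN_ROWS.items())
    # ...and extrapolates to the small SKUs the table shows as N/A.
    for sku, cu in (("F2", 2), ("F4", 4)):
        print(sku, estimated_vcores(cu))
```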

Atlanta Microsoft BI Group Meeting on September 3rd (Create Code Copilots with Large Language Models)

August 12, 2024/in Blog, Events/by Prologika - Teo Lachev

Atlanta BI fans, please join us in person for our next meeting on Tuesday, September 3rd at 6:30 PM ET. Your humble correspondent will show you how to use Large Language Models, such as ChatGPT, to create your own copilots for Text2SQL and Text2DAX. I’ll also help you catch up on the latest in Microsoft BI. I will sponsor the event, which marks the 14th anniversary of the Atlanta Microsoft BI Group! For more details and to sign up, visit our group page.

Details

Presentation: Create Code Copilots with Large Language Models
Delivery: In-person
Date: September 3rd, 2024
Time: 18:30 – 20:30 ET
Level: Beginner to Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Venue
Improving Office
11675 Rainwater Dr
Suite #100
Alpharetta, GA 30009

Overview: Resistance is futile! Instead of fearing that AI will take over our jobs, embrace it and apply it to outsource mundane work and create a new class of applications that were not possible before. In this session, I’ll introduce you to the fascinating world of Large Language Models (LLMs) and one of their practical applications: creating Text2SQL and Text2DAX copilots. I’ll demonstrate how LLMs open new opportunities for intelligent exploration. As an optional challenge, bring your laptop, download the code from my website (use the download link below), and follow along using your favorite AI chat, such as ChatGPT (which I’ll use for the demos), Microsoft Copilot, Meta AI, Google Gemini, or Perplexity.ai. You’ll also discover how you can automate LLM-powered copilots using Python and Azure OpenAI.

Code download link: Create Code Copilots with Large Language Models

Speaker: Teo Lachev is a BI consultant, author, and mentor. Through his Atlanta-based company Prologika (https://prologika.com) he designs and implements innovative solutions that bring tremendous value to his clients and help them make sense of data. Teo has authored and co-authored many books, and he has been leading the Atlanta Microsoft Business Intelligence group since he founded it in 2010. Microsoft has recognized Teo’s contributions to the community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) Data Platform status for 15 years. Microsoft selected Teo as one of only 30 FastTrack Solution Architects for Power Platform worldwide.

Sponsor: Prologika (https://prologika.com)

Power BI and Fabric Capacities: Thinking Outside the Box

August 3, 2024/0 Comments/in Blog/by Prologika - Teo Lachev

I’m conducting an assessment for a client facing memory pressure in Power BI Premium. You know these pesky out-of-memory errors when refreshing a biggish dataset. They started with P1, moved to P2, and now are on P3 but still more memory is needed. The runtime memory footprint of the problematic semantic model with imported data is 45 GB and they’ve done their best to optimize it.

I must admit that I detest the premium licensing model because it boxes you into certain resource constraints, such as 8 backend cores and 25 GB RAM for P1. There are no custom configurations that let you balance compute and memory needs. And while there is an auto-scale compute model, it’s very coarse. The memory constraints are especially problematic given that imported models are memory-resident and require more than twice the memory for a full refresh. From the outside, these memory constraints seem artificially low to force clients into perpetual upgrades. The new Fabric F capacities that supersede the P plans are even more expensive, justifying the price increase with the added flexibility to pause the capacity, which is often impractical.
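To put the refresh math above in numbers, here is a rough Python sketch. It assumes that peak refresh memory is simply a multiple of the resident model size (2x as a baseline for "more than twice") and uses per-dataset memory limits of 25, 50, and 100 GB for P1, P2, and P3 and 100 GB for PPU. Only the P1 and PPU figures come from this post; the P2/P3 values are Microsoft's published per-model limits as I understand them, and the fixed factor is a simplification, since real overhead depends on the model, concurrent queries, and the refresh type.

```python
# Rough estimate of whether a full refresh of an imported semantic model fits
# within a capacity's per-dataset memory limit.
# Assumptions: peak refresh memory = factor x resident model size (2.0 as a
# baseline); limits are per-dataset maximums and ignore query memory/overhead.

SKU_MEMORY_GB = {"P1": 25, "P2": 50, "P3": 100, "PPU": 100}


def refresh_peak_gb(model_gb: float, factor: float = 2.0) -> float:
    """Estimated peak memory during a full refresh."""
    return model_gb * factor


def headroom_gb(model_gb: float, sku: str, factor: float = 2.0) -> float:
    """Memory left under the SKU limit; negative means the refresh likely fails."""
    return SKU_MEMORY_GB[sku] - refresh_peak_gb(model_gb, factor)


if __name__ == "__main__":
    model_gb = 45  # runtime footprint of the model in this assessment
    for sku in SKU_MEMORY_GB:
        print(f"{sku}: headroom {headroom_gb(model_gb, sku):+.2f} GB at 2x")
    # "More than twice": a hypothetical 2.25x factor already exceeds P3.
    print(f"P3 at 2.25x: {headroom_gb(model_gb, 'P3', 2.25):+.2f} GB")
```

With a 45 GB model, even the 2x baseline leaves only about 10 GB of headroom on P3, so any additional overhead tips the full refresh over the limit.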

So, what should you do if you are strapped for cash? Consider evaluating and adopting one or more of the following techniques, including:

Switching to PPU licensing with a limited number of report users. PPU is the equivalent of P3 and grants 100 GB RAM per dataset.
Aggressively optimizing model storage when possible, such as by removing high-cardinality columns
Configuring aggressive incremental refresh policies with polling expressions
Moving large fact tables to a separate semantic model (remember that the memory constraints are per dataset and not across all the datasets in the capacity)
Implementing DirectQuery features, such as composite models and hybrid tables
Switching to a hybrid architecture with on-prem semantic model(s) hosted in SQL Server Analysis Services, where you control the hardware configuration and aren’t charged extra for more memory.
Lobbying Microsoft for much larger memory limits or to bring your own memory 🙂
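Several of the techniques in the list above work by shrinking what has to be reloaded at once. The Python sketch below compares estimated peak refresh memory for a full refresh, an incremental refresh that reloads only the most recent partitions of the fact table, and moving the fact table into its own semantic model. It rests on stated assumptions: the 36 GB / 9 GB table sizes and the 25% refreshed share are hypothetical, reloaded data is assumed to need a second in-memory copy while the old data stays online, and the split ignores the dimension tables the fact model would typically still need.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """A table in an imported semantic model (sizes are hypothetical)."""
    name: str
    size_gb: float
    refreshed_share: float = 1.0  # share of the table reloaded per refresh


def peak_gb(tables: list[Table], incremental: bool = False) -> float:
    """Resident size plus a second copy of whatever is being reloaded.

    A full refresh reloads every table entirely; an incremental refresh
    reloads only each table's refreshed_share (e.g. the latest partitions).
    """
    resident = sum(t.size_gb for t in tables)
    if incremental:
        reloaded = sum(t.size_gb * t.refreshed_share for t in tables)
    else:
        reloaded = resident
    return resident + reloaded


fact = Table("Sales", 36, refreshed_share=0.25)  # hypothetical: last 25% of partitions
dims = Table("Dimensions", 9)                    # hypothetical: always fully refreshed

full = peak_gb([fact, dims])
incremental = peak_gb([fact, dims], incremental=True)
# Per-dataset limits apply to each model separately after the split.
split = [peak_gb([fact]), peak_gb([dims])]

print(f"Full refresh: {full} GB")
print(f"Incremental refresh: {incremental} GB")
print(f"Split models: {split} GB")
```

Under these hypothetical numbers, incremental refresh lowers the estimated peak from 90 GB to 63 GB, while the split keeps each model's peak at or below 72 GB, which matters because the limit is enforced per dataset.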

It would be great if at some point Power BI introduces customized capacities. Even better, how about auto-scaling, where the capacity resources scale up and down on demand within minutes, such as adding more memory during refresh and releasing it when the refresh is over?

Atlanta Microsoft BI Group Meeting on August 5th (Elevate Program Management with Power BI & DevOps)

July 30, 2024/0 Comments/in Blog, Events/by Prologika - Teo Lachev

Atlanta BI fans, please join us in person for the next meeting on Monday, August 5th at 6:30 PM ET. Elayne Jones and Matt Kim (Solutions Architects at Coca-Cola) will show us how to bring Azure DevOps data to life by creating data models and interactive reports in Power BI. Your humble correspondent will help you catch up on the latest in Microsoft BI. CloudStaff.ai will sponsor the event. For more details and sign up, visit our group page.

Details

Presentation: Elevate Program Management with Power BI & DevOps
Delivery: In-person
Date: August 5, 2024
Time: 18:30 – 20:30 ET
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Venue
Improving Office
11675 Rainwater Dr
Suite #100
Alpharetta, GA 30009

Overview: Have you ever opened Azure DevOps and felt overwhelmed by the vast sea of program management options? In large organizations, tracking progress across disparate projects and work items can be challenging. In this session, find out how to bring Azure DevOps data to life by creating data models and interactive reports in Power BI. Sleek Power BI visuals make even the most technical DevOps content both accessible to executives and actionable for project managers.

Speaker: Elayne Jones and Matt Kim are both Solutions Architects at Coca-Cola Bottlers Sales and Services. Elayne and Matt specialize in developing solutions that drive efficiency within organizations by utilizing the full set of Power Platform technologies. Elayne and Matt work together on a team focusing on designing and implementing automated solutions to enhance both internal and external stakeholders’ user experiences and to enforce consistency in reporting data.

Sponsor: Cloudstaff.ai
