Posts

Atlanta Microsoft BI Group Meeting on June 2nd (Power BI monitoring with Log Analytics Workspace)

Atlanta BI fans, please join us in person for our next meeting on Monday, June 2nd at 18:30 ET. Akshata Revankar (Senior Data & Applied Scientist with Microsoft) will show us how to streamline and optimize your Power BI environment using Azure Data Explorer (ADX) and Kusto Query Language (KQL). And your humble correspondent will walk you through some of the latest Power BI and Fabric enhancements. For more details and sign up, visit our group page.

Delivery: In-person

Level: Intermediate

Food: Pizza and drinks will be provided

 

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

Overview: Are you an admin overseeing multiple Power BI workspaces and dealing with the challenge of monitoring activities and performance across all of them in a unified view? This session is designed precisely for you. Join us as we delve into a comprehensive guide on how to streamline and optimize your Power BI environment using Azure Data Explorer (ADX) and Kusto Query Language (KQL).

Speaker: Akshata Revankar (Senior Data & Applied Scientist with Microsoft) has 20+ years of experience in the data engineering and data reporting space. She has worked with Oracle Database, SQL Server, SSIS, Informatica PowerCenter, Hadoop systems, Qlik, and Power BI. Akshata enjoys being in the data space and learning new things.

Sponsor: At CloudStaff, we believe technology should make work easier, not more complicated. We help big companies use AI and automation to solve real problems and make employees’ lives better (CloudStaff.ai).


First Look at Fabric Translytical Task Flows

The May release of Power BI Desktop includes a new feature called Translytical Task Flows, which aims to augment Power BI reports with rudimentary writeback capabilities, such as making corrections to the data behind a report. Previously, one way to accomplish this was to integrate the report with Power Apps, as I demonstrated a while back here. My claim to fame was that Microsoft liked this demo so much that it was running for years on big monitors in the local Microsoft office!

Are translytical flows a better way to implement report writeback? I followed the steps to test this feature and here are my thoughts.

The Good

I like that translytical flows don’t require external integration or additional licensing. By contrast, the Power Apps integration required implementing an app and incurring additional licensing costs.

I like that Microsoft is getting serious about report writeback and has extended Power BI Desktop with specific features to support it, such as action buttons, new slicers, and button-triggered report refresh.

I like that you can configure the action button to refresh the report after writeback so you can see the changes immediately (assuming DirectQuery or Direct Lake semantic models). I tested the feature with a report connected to a published dataset and it works. Of course, if the model imports data, refreshing the report won’t show the latest data.

The Bad

Currently, you must use the new button, text, or list slicers, which can provide only a single value to the writeback function. There is no validation, unless you use a list slicer. From an end-user experience standpoint, every modifiable field requires a separate text box. Ideally, I’d like to see Microsoft extend the Table visual to allow editing in place.

Translytical flows require a Python function to propagate the changes. Although this opens new possibilities (you can do more things with custom code), what happened to the low-code, no-code mantra?
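
To make this concrete, below is a minimal sketch of what such a function might look like, patterned after the fabric.functions user data functions library; the connection alias, table, and parameter names are hypothetical:

import fabric.functions as fn

udf = fn.UserDataFunctions()

# The alias must match a connection configured on the function item in Fabric (hypothetical here).
@udf.connection(argName="sqlDB", alias="SalesDB")
@udf.function()
def update_discount(sqlDB: fn.FabricSqlConnection, product_key: int, discount: float) -> str:
    # Each writeback-enabled slicer supplies a single value that the report binds to a parameter.
    connection = sqlDB.connect()
    cursor = connection.cursor()
    cursor.execute(
        "UPDATE dbo.Product SET Discount = ? WHERE ProductKey = ?",
        (discount, product_key),
    )
    connection.commit()
    cursor.close()
    connection.close()
    return f"Updated product {product_key}"

In the report, an action button binds the slicer values to the function parameters and invokes the function on click.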

The Ugly

Currently, the data can be written to only four destinations:

  • Fabric SQL DB (Azure SQL DB provisioned in Fabric)
  • Fabric Lakehouse
  • Fabric Warehouse
  • Fabric Mirrored DB

Notice the “Fabric” prefix in all four options? Customers will probably interpret this as another attempt to force them into Fabric. Not to mention that this limitation excludes 99% of real-life scenarios where the data is in other data sources. I surely hope Microsoft will open this feature to external data sources in the future.

So, is the Power Apps integration for writeback obsolete? Not really, because it is more flexible and provides a better user experience at the expense of additional licensing costs.

In summary, as they stand today, translytical task flows attempt to address basic writeback needs within Fabric, such as changing a limited number of report fields or performing mass updates on a single field. They are heavily dependent on Fabric and support writeback only to Fabric data sources. Keep an eye on this feature with the hope that it will evolve over time into something more useful.


 

Atlanta Microsoft BI Group Meeting on May 5th (Optimize Power BI Reports and Semantic Models Using External Tools)

Atlanta BI fans, please join us in person for our next meeting on Monday, May 5th at 18:30 ET. Lakshmana R Boya will show us how to optimize Power BI reports by identifying performance and size issues using the “Measure Killer” community app. Your humble correspondent will walk you through some of the Power BI and Fabric enhancements of late. For more details and sign up, visit our group page.

Delivery: In-person

Level: Intermediate

Food: Pizza and drinks will be provided

 

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

Overview: We will deep dive into how we can optimize Power BI reports and semantic models by identifying the performance and size issues using the “Measure Killer” app.

Sponsor: Prologika (https://prologika.com) helps organizations of all sizes to make sense of data by delivering tailored BI solutions that drive actionable insights and maximize ROI. Your BI project will be your best investment, we guarantee it!


Two Takeaways from FabCon 2025

The 2025 Fabric Community Conference has just concluded. There were many announcements, but I found two of them intriguing enough to make the case for Fabric more compelling should a suitable project come around:

  • OneSecurity – Analysis Services Tabular and Power BI semantic models have long supported comprehensive and flexible row-level security (RLS), such as scenarios requiring dynamically denying rows. Unfortunately, column-level security (aka object-level security) has been less flexible, as I explain in more detail here. I hope OneSecurity will prove more flexible over time. Currently, it doesn’t support dynamic security, but I hope it eventually will, such as by allowing a SELECT statement to evaluate runtime conditions. So, keep an eye on OneSecurity as it evolves if you face more complex column-level security requirements.
  • Copilots for all F SKUs – Previously, I complained that Microsoft had established that AI must require premium pricing starting at $5,000/mo (F64 and above). But now they’ve promised that by the end of April copilots will go mainstream and become available in all F SKUs should you need AI assistance.

Alas, there were no major announcements about Fabric Warehouse. As with Synapse, Microsoft went back to the drawing board with Fabric Warehouse. Consequently, its T-SQL surface area doesn’t support various features I use. For example, IDENTITY and MERGE were promised long ago but unfortunately didn’t materialize. For the most part, I personally don’t care about relational-like lakehouses or feature bundles. I don’t dare even dream about “advanced” features that SQL Server has had for a long time, such as multiple databases, temporal tables, etc. Without robust warehousing, I still find it difficult to substantiate a good case for Fabric for most of my projects.


Atlanta Microsoft BI Group Meeting on April 7th (Source Control for Power BI)

Atlanta BI fans, please join us in person for our next meeting on Monday, April 7th at 18:30 ET. Neel Pathak will explain how to do source control for Power BI with Git integration. Your humble correspondent will walk you through some of the Power BI and Fabric enhancements of late. For more details and sign up, visit our group page.

Presentation: Source Control for Power BI
Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: The days of sharing pbix files and copying in changes are gone. Git integration now enables development teams to collaborate and provides version control/rollbacks.

Speaker: Neel Pathak (Senior Consultant with 3Cloud) is a BI and analytics professional with expertise in Power BI and guiding clients through their analytics transformation journeys.

Sponsor: 3Cloud is the #1 Microsoft Azure partner. No other partner combines 3Cloud’s expertise in Azure with a deep understanding of infrastructure, data and analytics, and app innovation. We have proven our impact and are trusted with clients’ most complex business problems. We look for people who share our passion for technology and want to unleash their potential.


Atlanta Microsoft BI Group Meeting on February 3rd (Fabric Warehouse vs. Fabric Lakehouse)

Atlanta BI fans, please join us in person for our next meeting on Monday, February 3rd at 18:30 ET. Jeff Levy (Data Architect @ Protiviti) will present the core concepts, architectures, and use cases of the Fabric Warehouse and Fabric Lakehouse. Your humble correspondent will walk you through the most significant Power BI and Fabric enhancements of late. For more details and sign up, visit our group page.

Presentation: Fabric Warehouse vs. Fabric Lakehouse: Choosing the Right Architecture for Your Environment
Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: The modern data landscape demands scalable, flexible, and efficient architectures to support diverse business needs. With Microsoft Fabric, two leading paradigms have emerged to address these challenges: the Fabric Warehouse and the Fabric Lakehouse. While both tools aim to provide robust solutions for data storage, processing, and analytics, their approaches, strengths, and trade-offs differ significantly.

This presentation explores the core concepts, architectures, and use cases of the Fabric Warehouse and Fabric Lakehouse. I will compare their performance in areas such as data integration, scalability, and cost-efficiency. Attendees will gain insights into how these approaches align with specific business objectives and workloads, enabling informed decisions about which model best suits their organization’s data strategy.

Sponsor: Protiviti (www.protiviti.com) is a global consulting firm that delivers deep expertise, objective insights, a tailored approach and unparalleled collaboration to help leaders confidently face the future. Protiviti and its independent and locally owned member firms provide clients with consulting and managed solutions in finance, technology, operations, data, digital, legal, HR, risk and internal audit through a network of more than 90 offices in over 25 countries.


Prologika Newsletter Winter 2024

I recently conducted an assessment for a client facing memory pressure in Power BI Premium. You know, those pesky out-of-memory errors when refreshing a biggish dataset. They started with P1, moved to P2, and are now on P3, but still more memory is needed to satisfy the appetite of a full refresh. The runtime memory footprint of the problematic semantic model with imported data is 45 GB, and they’ve done their best to optimize it. This newsletter outlines a few strategies to tackle excessive memory consumption with large semantic models. Unfortunately, given the current state of Power BI boxed capacities, no option is perfect, and in the end a compromise will probably be needed somewhere between latency and performance.

Why I don’t like Premium licensing

Since its beginning, Power BI Pro per-user licensing (and later Premium Per User (PPU) licensing) has been very attractive. Many organizations with a limited number of report users flocked to Power BI to save cost. However, organizations with more BI consumers gravitated toward premium licensing, where they could have an unlimited number of report readers for a fixed monthly fee starting at a list price of $5,000/mo for P1. Sounds like a great deal, right?

I must admit that I detest the premium licensing model because it boxes you into certain resource constraints, such as 8 backend cores and 25 GB of RAM for P1. There are no custom configurations to let you balance between compute and memory needs. And while there is an auto-scale compute model, it’s very coarse and applies only to processing cores. The memory constraints are especially problematic given that imported models are memory resident and require more than twice the memory for a full refresh. From the outside, these memory constraints seem artificially low to force clients into perpetual upgrades. The new Fabric F capacities that supersede the P plans are even more expensive, justifying the price increase with the added flexibility to pause the capacity, which is often impractical.
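
To see how quickly the boxed limits bite, here is a back-of-the-envelope sketch in Python, assuming the rule of thumb above (a full refresh needs more than twice the model’s resident memory) and the per-dataset memory limits of 25/50/100 GB for P1/P2/P3:

# Can a premium capacity fully refresh an imported model of a given size?
CAPACITY_LIMIT_GB = {"P1": 25, "P2": 50, "P3": 100}  # per-dataset memory limits

def fits_full_refresh(model_size_gb: float, sku: str) -> bool:
    # Rule of thumb: a full refresh needs more than twice the resident size.
    return 2 * model_size_gb < CAPACITY_LIMIT_GB[sku]

model_size_gb = 45  # the client's semantic model from this newsletter

for sku, limit in CAPACITY_LIMIT_GB.items():
    verdict = "fits" if fits_full_refresh(model_size_gb, sku) else "out of memory"
    print(f"{sku} ({limit} GB): needs more than {2 * model_size_gb} GB -> {verdict}")
# Only P3 squeaks by (90+ GB needed out of 100 GB), leaving little headroom for queries.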

It looks to me that premium licensing is a pretty good deal for Microsoft. Outgrown 25 GB of RAM in P1? Time to shell out another $5K per month for 25 GB more, even if you don’t need more compute power. Meanwhile, the price of 32 GB of RAM is less than $100 and falling.

It would be great if at some point Power BI introduced custom capacities. Even better, how about auto-scaling, where the capacity resources (both memory and CPU) scale up and down on demand within minutes, such as adding more memory during refresh and reducing it when the refresh is over?

Strategies to combat out-of-memory scenarios

So, what should you do if you are strapped for cash? Consider evaluating and adopting one or more of the following memory-saving techniques:

  • Switching to PPU licensing with a limited number of report users. PPU is the equivalent of P3 and grants 100 GB of RAM per dataset.
  • Optimizing aggressively the model storage when possible, such as removing high-cardinality columns
  • Configuring aggressive incremental refresh policies with polling expressions
  • Moving large fact tables to a separate semantic model (remember that the memory constraints are per dataset and not across all the datasets in the capacity)
  • Implementing DirectQuery features, such as composite models and hybrid tables
  • Switching to a hybrid architecture with on-prem semantic model(s) hosted in SQL Server Analysis Services, where you can control the hardware configuration and you’re not charged for more memory.
  • Lobbying Microsoft for much larger memory limits or to bring your own memory (good luck with that, but it might be an option if you work for a large and important company)

Considering Direct Lake storage

If Fabric is in your future, one relatively new option to tackle out-of-memory scenarios that deserves to be evaluated and added to the list is semantic models configured for Direct Lake storage. Direct Lake on-demand loading should utilize memory much more efficiently for interactive operations, such as Power BI report execution. This is a bonus on top of the fact that Direct Lake models don’t require refresh. Eliminating refresh could save a tremendous amount of memory to start with, even if you apply advanced techniques, such as incremental refresh or hybrid tables, to models with imported data.

I did limited testing to compare performance of import and Direct Lake and posted detailed results in the “Fabric Direct Lake: Memory Utilization with Interactive Operations” blog.

I concluded that if Direct Lake is an option for you, it should be at the forefront of your efforts to combat out-of-memory errors with large datasets.

On the downside, more than likely you’ll have to implement ETL processes to synchronize your data warehouse to a Fabric lakehouse, unless your data is in Fabric to start with, or you use Fabric database mirroring for the currently supported data sources (Azure SQL DB, Cosmos, and Snowflake). I’m not counting the data synchronization time as a downside.


Teo Lachev
Prologika, LLC | Making Sense of Data

Atlanta Microsoft BI Group Meeting on December 2nd (Semantic Modeling as Code)

Atlanta BI fans, please join us online for our next meeting on Monday, December 2nd at 5PM ET (please note the change to our usual meeting time to accommodate the presenter). Rui Romano (Product Manager at Microsoft) will discuss how the new TMDL language for Power BI models can unlock new scenarios that previously weren’t possible. For more details and sign up, visit our group page.

Presentation: “Semantic Modeling as Code” with TMDL using Power BI Desktop Developer Mode (PBIP) and VS Code
Delivery: Online
Level: Intermediate to Advanced

Overview: The landscape for developing enterprise-scale models has never been more exciting than it is now! Developer mode in Power BI Desktop and the new TMDL language unlock new scenarios that previously weren’t possible, such as great source control and co-development experiences with Git integration. Additionally, the TMDL Visual Studio Code extension offers a new, powerful and efficient, code-first semantic modeling experience. Join us to discover the new and powerful ways you can leverage TMDL to accelerate your model development and get a sneak peek into the TMDL roadmap from the Power BI product team.

Speaker: Rui Romano is an experienced Microsoft professional with a deep passion for data and analytics. He has spent the last decade helping companies make better data-driven decisions and is known for his innovative and practical solutions to complex problems. He currently works as a Product Manager at Microsoft on the Power BI product team, focusing on Pro-BI experiences.


Atlanta Microsoft BI Group Meeting on November 4th (Accelerating your Fabric Data Estate with AI & Copilot)

Atlanta BI fans, please join us in person for our next meeting on Monday, November 4th at 6:30 PM ET. Stacey Jones (Principal Data & AI Cross-Solution Architect at Microsoft) and Elayne Jones (Solutions Architect at Coca-Cola Bottlers Sales and Services) will explore the AI and Copilot capabilities within Microsoft Fabric. And I’ll help you catch up on the latest in Microsoft BI. I will sponsor the event, which marks the 14th anniversary of the Atlanta Microsoft BI Group! For more details and sign up, visit our group page.

Details

Presentation: Accelerating your Fabric Data Estate with AI & Copilot
Delivery: In-person
Date: November 4th, 2024
Time: 18:30 – 20:30 ET
Level: Beginner to Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Venue
Improving Office
11675 Rainwater Dr
Suite #100
Alpharetta, GA 30009

Overview: In this presentation, we will explore the groundbreaking AI and Copilot capabilities within Microsoft Fabric, a comprehensive platform designed to enhance productivity and collaboration. By leveraging advanced machine learning algorithms and natural language processing, Microsoft Fabric’s AI/Copilot not only streamlines workflows but also provides intelligent insights and automation, empowering users to achieve more with less effort. Join us as we delve into the features and functionalities that make Microsoft Fabric an indispensable tool for modern enterprises.

Sponsor: CloudStaff.ai


Fabric Direct Lake: Memory Utilization with Interactive Operations

As I mentioned in my “Power BI and Fabric Capacities: Thinking Outside the Box” post, the memory limits of Fabric capacities can be rather restrictive for large semantic models with imported data. If Fabric is in your future, one relatively new option to combat out-of-memory scenarios that deserves to be evaluated is semantic models configured for Direct Lake storage. This blog covers the results of limited testing I did comparing side by side the memory utilization of two identical semantic models, the first configured to import data and the second to use Direct Lake storage. If you need a Direct Lake primer, Chris Webb has done a great job covering its essentials here and here. As a disclaimer, the emphasis is on limited, as these results reflect my personal observations based on some isolated tests I’ve done lately. Your results may, and probably will, vary considerably.

Understanding the Tests

My starting hypothesis was that Direct Lake on-demand loading would utilize memory much more efficiently for interactive operations, such as Power BI report execution. This is a bonus on top of the fact that Direct Lake models don’t require refresh. Eliminating refresh could save a tremendous amount of memory to start with, even if you apply advanced techniques, such as incremental refresh or hybrid tables, to models with imported data. Therefore, the tests that follow focus on memory utilization with interactive operations.

To test my hypothesis, I imported the first three months of year 2016 from the NY yellow taxi Azure open dataset into a lakehouse backed by a Fabric F2 capacity. This resulted in 34.5 million rows distributed across several Delta Parquet files. I limited the data to three months because F2 ran out of memory around the 50-million-row mark with the error: “This operation was canceled because there wasn’t enough memory to finish running it. Either reduce the memory footprint of your dataset by doing things such as limiting the amount of imported data, or if using Power BI Premium, increase the memory of the Premium capacity where this dataset is hosted. More details: consumed memory 2851 MB, memory limit 2851 MB, database size before command execution 220 MB.”

Descriptive enough, and in line with the F2 memory limit of a maximum of 3 GB per semantic model. I used Power BI Desktop to import all that data into a YellowTaxiImported semantic model, which I published to Power BI Service and configured for large storage format. Then I created online a second YellowTaxiDirectLake semantic model configured for Direct Lake storage, mapped directly to the data in the lakehouse. I went back to Power BI Desktop to whip up a few analytical (aggregate) queries and a few detail-level queries. Finally, I ran a few tests using DAX Studio.
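
For reference, here is a sketch of how the lakehouse table could be loaded from a Fabric notebook, assuming the azureml-opendatasets package and the nyc_yellowtaxi table name referenced by the queries below:

from datetime import datetime
from azureml.opendatasets import NycTlcYellow

# Pull January through March 2016 of the NY yellow taxi Azure open dataset.
taxi = NycTlcYellow(start_date=datetime(2016, 1, 1), end_date=datetime(2016, 3, 31))
df = taxi.to_spark_dataframe()

# Persist as a Delta table in the lakehouse; the Direct Lake model maps to it directly.
df.write.mode("overwrite").format("delta").saveAsTable("nyc_yellowtaxi")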

Analyzing Import Mode

Even after a capacity restart, the YellowTaxiImported model immediately reported 1.4 GB of memory. My conclusion was that the primary focus of the Power BI Premium on-demand loading introduced a while back was to speed up the first query after the model was evicted from memory. Indeed, I saw that many segments were memory resident and many weren’t, but using queries to touch the non-resident columns didn’t increase the memory footprint. The following table lists the query execution times with “Clear On Run” enabled in DAX Studio (to avoid skewing due to cached query data).

Naturally, the more detailed the queries get, the slower they run because VertiPaq is a columnar database. However, the important observation is that the memory footprint remains constant. Please note that Fabric allocates additional memory to execute the queries, so the memory footprint should grow as the report load increases.

//Analytical query 1

EVALUATE
ROW(
    "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
    "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])),
    "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount]))
)

Duration: 124 ms
//Analytical query 2

DEFINE
    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            'nyc_yellowtaxi'[puMonth],
            "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount]))
        )

    VAR __DS0BodyLimited =
        SAMPLE(3502, __DS0Core, 'nyc_yellowtaxi'[puMonth], 1)

EVALUATE
__DS0BodyLimited

ORDER BY
'nyc_yellowtaxi'[puMonth]

Duration: 148 ms
//Analytical query 3

DEFINE
    VAR __SQDS0Core =
        SUMMARIZECOLUMNS(
            'nyc_yellowtaxi'[doLocationId],
            "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance]))
        )

    VAR __SQDS0BodyLimited =
        TOPN(50, __SQDS0Core, [AveragetripDistance], 0)

    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            'nyc_yellowtaxi'[doLocationId],
            __SQDS0BodyLimited,
            "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance]))
        )

    VAR __DS0PrimaryWindowed =
        TOPN(1001, __DS0Core, [AveragetripDistance], 0, 'nyc_yellowtaxi'[doLocationId], 1)

EVALUATE
__DS0PrimaryWindowed

ORDER BY
[AveragetripDistance] DESC, 'nyc_yellowtaxi'[doLocationId]

Duration: 114 ms
//Analytical query 4

DEFINE
    VAR __SQDS0Core =
        SUMMARIZECOLUMNS(
            'nyc_yellowtaxi'[doLocationId],
            "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance]))
        )

    VAR __SQDS0BodyLimited =
        TOPN(50, __SQDS0Core, [AveragetripDistance], 0)

    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            'nyc_yellowtaxi'[rateCodeId],
            __SQDS0BodyLimited,
            "AveragetipAmount", CALCULATE(AVERAGE('nyc_yellowtaxi'[tipAmount]))
        )

    VAR __DS0BodyLimited =
        SAMPLE(3502, __DS0Core, 'nyc_yellowtaxi'[rateCodeId], 1)

EVALUATE
__DS0BodyLimited

ORDER BY
'nyc_yellowtaxi'[rateCodeId]

Duration: 80 ms
//Detail query 1

DEFINE
    VAR __DS0FilterTable =
        FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0)

    VAR __DS0FilterTable2 =
        FILTER(
            KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])),
            'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2)
        )

    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            ROLLUPADDISSUBTOTAL(
                ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal"
            ),
            __DS0FilterTable,
            __DS0FilterTable2,
            "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
            "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
            "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount]))
        )

    VAR __DS0PrimaryWindowed =
        TOPN(
            502,
            __DS0Core,
            [IsGrandTotalRowTotal],
            0,
            'nyc_yellowtaxi'[startLat],
            1,
            'nyc_yellowtaxi'[startLon],
            1
        )

EVALUATE
__DS0PrimaryWindowed

ORDER BY
[IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]

Duration: 844 ms
//Detail query 2

DEFINE
    VAR __DS0FilterTable =
        FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0)

    VAR __DS0FilterTable2 =
        FILTER(
            KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])),
            'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2)
        )

    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            ROLLUPADDISSUBTOTAL(
                ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal"
            ),
            __DS0FilterTable,
            __DS0FilterTable2,
            "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
            "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
            "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount]))
        )

    VAR __DS0PrimaryWindowed =
        TOPN(
            502,
            __DS0Core,
            [IsGrandTotalRowTotal],
            0,
            'nyc_yellowtaxi'[startLat],
            1,
            'nyc_yellowtaxi'[startLon],
            1
        )

EVALUATE
__DS0PrimaryWindowed

ORDER BY
[IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]

Duration: 860 ms
//Detail query 3

DEFINE
    VAR __DS0FilterTable =
        FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0)

    VAR __DS0FilterTable2 =
        FILTER(
            KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])),
            'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2)
        )

    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            ROLLUPADDISSUBTOTAL(
                ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal"
            ),
            __DS0FilterTable,
            __DS0FilterTable2,
            "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
            "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
            "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount]))
        )

    VAR __DS0PrimaryWindowed =
        TOPN(
            502,
            __DS0Core,
            [IsGrandTotalRowTotal],
            0,
            'nyc_yellowtaxi'[startLat],
            1,
            'nyc_yellowtaxi'[startLon],
            1
        )

EVALUATE
__DS0PrimaryWindowed

ORDER BY
[IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]

Duration: 1,213 ms
//Detail query 4 (All Columns)

DEFINE
    VAR __DS0FilterTable =
        FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0)

    VAR __DS0FilterTable2 =
        FILTER(
            KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])),
            'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2)
        )

    VAR __ValueFilterDM1 =
        FILTER(
            KEEPFILTERS(
                SUMMARIZECOLUMNS(
                    'nyc_yellowtaxi'[startLat],
                    'nyc_yellowtaxi'[startLon],
                    'nyc_yellowtaxi'[paymentType],
                    'nyc_yellowtaxi'[vendorID],
                    'nyc_yellowtaxi'[improvementSurcharge],
                    'nyc_yellowtaxi'[doLocationId],
                    'nyc_yellowtaxi'[puLocationId],
                    'nyc_yellowtaxi'[storeAndFwdFlag],
                    'nyc_yellowtaxi'[tpepDropoffDateTime],
                    'nyc_yellowtaxi'[tpepPickupDateTime],
                    __DS0FilterTable,
                    __DS0FilterTable2,
                    "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
                    "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
                    "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])),
                    "SumpassengerCount", CALCULATE(SUM('nyc_yellowtaxi'[passengerCount])),
                    "SumtripDistance", CALCULATE(SUM('nyc_yellowtaxi'[tripDistance])),
                    "SumendLat", CALCULATE(SUM('nyc_yellowtaxi'[endLat])),
                    "SumendLon", CALCULATE(SUM('nyc_yellowtaxi'[endLon])),
                    "Sumextra", CALCULATE(SUM('nyc_yellowtaxi'[extra])),
                    "SumfareAmount", CALCULATE(SUM('nyc_yellowtaxi'[fareAmount])),
                    "SummtaTax", CALCULATE(SUM('nyc_yellowtaxi'[mtaTax])),
                    "SumpuMonth", CALCULATE(SUM('nyc_yellowtaxi'[puMonth])),
                    "SumpuYear", CALCULATE(SUM('nyc_yellowtaxi'[puYear])),
                    "CountrateCodeId", CALCULATE(COUNTA('nyc_yellowtaxi'[rateCodeId]))
                )
            ),
            [SumtollsAmount] > 0
        )

    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            ROLLUPADDISSUBTOTAL(
                ROLLUPGROUP(
                    'nyc_yellowtaxi'[startLat],
                    'nyc_yellowtaxi'[startLon],
                    'nyc_yellowtaxi'[paymentType],
                    'nyc_yellowtaxi'[vendorID],
                    'nyc_yellowtaxi'[improvementSurcharge],
                    'nyc_yellowtaxi'[doLocationId],
                    'nyc_yellowtaxi'[puLocationId],
                    'nyc_yellowtaxi'[storeAndFwdFlag],
                    'nyc_yellowtaxi'[tpepDropoffDateTime],
                    'nyc_yellowtaxi'[tpepPickupDateTime]
                ), "IsGrandTotalRowTotal"
            ),
            __DS0FilterTable,
            __DS0FilterTable2,
            __ValueFilterDM1,
            "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])),
            "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])),
            "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])),
            "SumpassengerCount", CALCULATE(SUM('nyc_yellowtaxi'[passengerCount])),
            "SumtripDistance", CALCULATE(SUM('nyc_yellowtaxi'[tripDistance])),
            "SumendLat", CALCULATE(SUM('nyc_yellowtaxi'[endLat])),
            "SumendLon", CALCULATE(SUM('nyc_yellowtaxi'[endLon])),
            "Sumextra", CALCULATE(SUM('nyc_yellowtaxi'[extra])),
            "SumfareAmount", CALCULATE(SUM('nyc_yellowtaxi'[fareAmount])),
            "SummtaTax", CALCULATE(SUM('nyc_yellowtaxi'[mtaTax])),
            "SumpuMonth", CALCULATE(SUM('nyc_yellowtaxi'[puMonth])),
            "SumpuYear", CALCULATE(SUM('nyc_yellowtaxi'[puYear])),
            "CountrateCodeId", CALCULATE(COUNTA('nyc_yellowtaxi'[rateCodeId]))
        )

    VAR __DS0PrimaryWindowed =
        TOPN(
            502,
            __DS0Core,
            [IsGrandTotalRowTotal],
            0,
            'nyc_yellowtaxi'[startLat],
            1,
            'nyc_yellowtaxi'[startLon],
            1,
            'nyc_yellowtaxi'[paymentType],
            1,
            'nyc_yellowtaxi'[vendorID],
            1,
            'nyc_yellowtaxi'[improvementSurcharge],
            1,
            'nyc_yellowtaxi'[doLocationId],
            1,
            'nyc_yellowtaxi'[puLocationId],
            1,
            'nyc_yellowtaxi'[storeAndFwdFlag],
            1,
            'nyc_yellowtaxi'[tpepDropoffDateTime],
            1,
            'nyc_yellowtaxi'[tpepPickupDateTime],
            1
        )

EVALUATE
__DS0PrimaryWindowed

ORDER BY
[IsGrandTotalRowTotal] DESC,
'nyc_yellowtaxi'[startLat],
'nyc_yellowtaxi'[startLon],
'nyc_yellowtaxi'[paymentType],
'nyc_yellowtaxi'[vendorID],
'nyc_yellowtaxi'[improvementSurcharge],
'nyc_yellowtaxi'[doLocationId],
'nyc_yellowtaxi'[puLocationId],
'nyc_yellowtaxi'[storeAndFwdFlag],
'nyc_yellowtaxi'[tpepDropoffDateTime],
'nyc_yellowtaxi'[tpepPickupDateTime]

Duration: 4,240 ms

Analyzing Direct Lake

After another restart, the resident memory footprint of the YellowTaxiDirectLake model was only 22.4 KB! Indeed, the $System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV showed that only the system-generated RowNumber columns were memory resident.
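
If you prefer scripting this check over running the DMV in DAX Studio, here is a sketch using the semantic-link (sempy) Python package from a Fabric notebook; that evaluate_dax passes DMV queries through to the XMLA endpoint is an assumption on my part:

import sempy.fabric as fabric

# ISRESIDENT and TEMPERATURE indicate which column segments are loaded and how hot they are.
dmv = "SELECT * FROM $System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS"
segments = fabric.evaluate_dax(dataset="YellowTaxiDirectLake", dax_string=dmv)  # assumes DMV pass-through
print(segments.head())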

For each query, I recorded two runs to understand how much time is spent in on-demand loading of columns into memory. The Import Mode column was added for convenience to compare the second run duration with the corresponding query duration from the Import Mode tests. Finally, the Model Resident Memory column records the memory footprint of the Direct Lake model.

Query                 First Run (ms)   Second Run (ms)   Import Mode (ms)   Model Resident Memory (MB)
Analytical query 1           797                5              124                 14
Analytical query 2           797                6              148                 14.3
Analytical query 3           382              133              114                 68.1
Analytical query 4           209              130               80                 68.13
Detail query 1             7,763            1,023              844                669.13
Detail query 2             1,484            1,453              860                670.53
Detail query 3             1,881            1,463            1,213                670.6
Detail query 4             9,663            3,668            4,240              1,270

Conclusion

To sum up this long post, the following observations can be made:

  1. As expected, the more columns the query touches, the higher the memory footprint of Direct Lake. For example, the last query requested all the columns, and the resulting memory footprint was on a par with import mode.
  2. It’s important to note that when Fabric is under memory pressure, such as when the report load increases, Direct Lake will start paging out columns with low temperature. The exact thresholds and rules are not documented, but I’d expect the eviction mechanism to be much more granular and intelligent than evicting entire datasets, as happens with import mode.
  3. The reason I didn’t see Direct Lake paging out memory is that I still had plenty left (1.27 GB consumed out of 3 GB). It doesn’t make sense to evict data when there is no memory pressure, since memory is the fastest storage.
  4. You’ll pay a certain price the first time a column is loaded on demand with Direct Lake. The more columns, the longer the wait. Subsequent runs, however, will be much faster if the column is still mapped in memory.
  5. Some queries will execute faster in import mode and some will execute slower. Overall, queries touching memory-resident columns should perform comparably.

Therefore, if Direct Lake is an option for you, it should be at the forefront of your efforts to combat out-of-memory errors with large datasets. On the downside, more than likely you’ll have to implement ETL processes to synchronize your data warehouse to a Fabric lakehouse, unless your data is in Fabric to start with, or you use Fabric database mirroring for the currently supported data sources (Azure SQL DB, Cosmos, and Snowflake). I’m not counting the data synchronization time as a downside because it would replace the time you currently spend on model refresh.
