Fabric OneLake: The Good, the Bad, and the Ugly

To know a tool is to know its limitations – Teo’s proverb

In a previous post, I shared my overall impression of Fabric. In this post, I’ll continue exploring Fabric, this time sharing my thoughts on OneLake. If you need a quick intro to Fabric OneLake, Josh Caplan’s “Build 2023: Eliminate data silos with OneLake, the OneDrive for Data” presentation provides a great overview of OneLake, its capabilities, and the vision behind it from a Microsoft perspective. If you prefer a shorter narrative, you can find it in the “Microsoft OneLake in Fabric, the OneDrive for data” post. As always, we are all learning, and constructive criticism would be appreciated if I missed or misinterpreted something.

What’s Fabric OneLake?

In a nutshell, OneLake is a Microsoft-provisioned storage layer where the ingested data and the data from the analytical (compute) engines are stored (see the screenshot below). Currently, Power BI datasets (Analysis Services models) are not saved in OneLake, although one could anticipate that in the long run all Power BI data artifacts could (and should) end up in OneLake for Fabric-enabled workspaces.

Behind the scenes, OneLake uses Azure Data Lake Storage (ADLS) Gen2, but there is an application wrapper on top of it that handles various tasks, such as management (you don’t have to provision storage accounts or external tables), security (Power BI security is enforced), and governance. OneLake is tightly coupled with the Power BI catalog and security. In fact, you can’t create folders outside the Power BI catalog, so proper catalog planning is essential. When you create a Fabric workspace in Power BI, Microsoft provisions an empty blob container in OneLake for that workspace. As you provision additional Fabric services, Fabric creates more folders and saves data in these folders. For example, I can use Azure Storage Explorer to connect to a Fabric Lakehouse Tutorial workspace (https://onelake.blob.fabric.microsoft.com/fabric lakehouse tutorial) and see that I have provisioned a data warehouse called dw and a lakehouse called wwilakehouse. Microsoft has also added two additional system folders for data staging.

The Good

I like having the data from the compute engines saved in one place and being able to access that data as delta Parquet files. There is something serene about having this transparency after decades of not being able to peek under the hood of some proprietary file format.

I also very much like the ability to use different engines to access that data (Microsoft calls this OneCopy, although a more appropriate term in my opinion would be OneShare).

For example, I can query a table in the lakehouse directly from the data warehouse (such as select * from [wwilakehouse].dbo.dimension_customer) without having to create PolyBase external tables or use the serverless endpoint.
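For instance, assuming the warehouse has a hypothetical dbo.fact_sale table, a cross-engine join against the lakehouse could look like the following minimal T-SQL sketch (the dimension_customer table comes from the Fabric lakehouse tutorial; all other table and column names are illustrative and should be adjusted to your own items):

-- Cross-engine query: the warehouse engine reads the lakehouse delta table in place.
-- dbo.fact_sale is assumed to be a warehouse table; column names are illustrative.
SELECT TOP (10)
    c.Customer,
    SUM(s.TotalIncludingTax) AS TotalSales
FROM dbo.fact_sale AS s                          -- warehouse table
JOIN [wwilakehouse].dbo.dimension_customer AS c  -- lakehouse table, no external table needed
    ON s.CustomerKey = c.CustomerKey
GROUP BY c.Customer
ORDER BY TotalSales DESC;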

I welcome the possibility for better data virtualization both within and outside the organization. For example, in a lakehouse I can create shortcuts to other Fabric lakehouses, ADLS folders, and even external storage systems, such as Amazon S3 (with Google and Dataverse coming up).

I like the governance aspect and the possibilities it enables, such as lineage tracking and data security. In fact, OneSecurity is on the roadmap, where once you secure the data in OneLake, the security bubbles up. It will be interesting to see how this works and what its limitations are, as I’d imagine it won’t be as flexible as Power BI RLS.

The Bad

An API wrapper is a double-edged sword because it wraps and abstracts things. You really have to embrace the Power BI catalog and its security model along with their limitations, because there is no way around it.

For example, you can’t use Azure Storage Explorer to change the ACL permissions on a folder or create folders where OneLake doesn’t like them (OneLake simply ignores certain APIs). Isn’t this a good thing, though, to centralize security at a higher level, such as the Power BI workspace, and prevent users from painting outside the canvas? Well, how about granting some external vendor permission to write to OneLake, such as to upload files? From what I can see, even Shared Access Signature is not allowed. A storage account key is not an option either, as you don’t have access to the storage account. So, to get around such limitations, you’d have to create separate storage accounts, but this defeats the promise of one lake.

Perhaps this is an example you don’t care much about. How about the medallion architecture pattern, which is getting popular nowadays for lakehouse-centric implementations? While you can create subfolders in the lakehouse “unmanaged” zone (the Files folder), no such luck with the Tables folder for organizing your delta lake tables into Bronze, Silver, and Gold zones. I can’t see another solution but to create a Power BI workspace for each zone.

Further, what if you must address the data residency requirements of some litigation-prone country? The above-cited presentation correctly states that content can be geo-located, but that will require purchasing more Power BI capacities, irrespective of the fact that ADLS storage accounts can be created in different geo locations. Somehow, I’m starting to long for bringing your own data lake, and I don’t think that provisioning storage accounts is that difficult (in fact, a Power BI workspace already has a feature to link to a BYO ADLS storage account to save the output of Power Query dataflows). Speaking of accounts, I’m looking forward to seeing how Fabric will address DevOps, such as Development and Production environments. Currently, a best practice is to separate all services, so more than likely Microsoft will enhance Power BI deployment pipelines to handle all Fabric content.

The Ugly

How can I miss another opportunity to harp again on Power BI for the lack of hierarchical workspaces? The domain feature that Microsoft shows in the presentation to logically group workspaces could be avoided to a great extent if Finance could organize content in subfolders and break security inheritance when needed, instead of ending up with workspaces, such as Finance North America, Finance EMEA, etc. Since OneLake is married to the Power BI catalog, more catalog flexibility is essential.

Given that OneLake and ADLS in general would be a preferred location for business users to store reference data, such as Excel files, Microsoft should enhance Office to support ADLS. Currently, Office apps can’t directly open and save files in ADLS (the user must download the file somehow, edit it, and then upload it back to ADLS). Consequently, business users favor other tools, such as SharePoint Online, which leads to decentralizing the data required for analytics.

Atlanta Microsoft BI Group Meeting on June 5th (How to Fix an Inherited Power BI Report)

Please join us for the next meeting on Monday, June 5th, at 6:30 PM ET. Kristyna Hughes (Senior Data & Analytics Consultant at 3Cloud) will join us remotely and share best practices for dissecting a complicated and broken Power BI report, along with a checklist of how to fix it. Improving will sponsor the meeting. Your humble correspondent will demonstrate how Power BI can integrate better with SharePoint and OneDrive. For more details and to sign up, visit our group page.

WE NOW MEET BOTH IN-PERSON AND ONLINE. WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE AND BECAUSE AN EMPTY AUDIENCE IS DISCOURAGING TO SPEAKERS AND SPONSORS. ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE HERE.

PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING AND PLAN TO EAT

Presentation: How to Fix an Inherited Power BI Report

Date: June 5th

Time: 18:30 – 20:30 ET

Place: Onsite and online

Level: Intermediate

Food: Food and drinks will be available for this meeting

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

 ONSITE (RECOMMENDED)

Improving Office

11675 Rainwater Dr

Suite #100

Alpharetta, GA 30009

ONLINE

Click here to join the meeting

Overview: A common request in the realm of reporting is “hey, we have this report that a previous VIP used, but the report is really slow and the new VIP would like to revamp the report to answer their questions”. Great, you think, I can knock this out no problem! Then, you open the report. Fifty tables, over a hundred measures, and six calculated tables later you start to panic.

This session will go into best practices for dissecting a complicated report and a checklist for quick wins. No need to panic, this session will help build a toolbelt to tackle any reconstruction project. 

Speaker: Kristyna Hughes is a senior data & analytics consultant at 3Cloud. Her experience includes implementing and managing enterprise-level Power BI instances, training teams on reporting best practices, and building templates for scalable analytics. Passionate about participating in and growing the data community, she enjoys co-writing on Data on Wheels (dataonwheels.com) and has recently co-founded Data on Rails (dataonrailsblog.com). She is also on the board of directors for PASS MN and is a co-organizer for the Lexington Data Technology Group.

Sponsor: Improving



A First Look at Microsoft Fabric: The Good, the Bad, and the Ugly

“May you live in interesting times” – Chinese proverb

Microsoft Fabric is upon us with grand fanfare. You can get a good overview of its vision and capabilities by watching the Microsoft Fabric Launch Digital Event (Day 1) and Microsoft Fabric Launch Digital Event (Day 2) recordings. Consultants and experts are extolling its virtues and are busy fully aligning with Microsoft. There is a lot of stuff going on in Fabric, and I’m planning to cover the technologies I work with and care about in more detail in future posts as Microsoft reveals more of what’s under the kimono. This post is about my overall impression of Fabric, in an attempt to cut through the dopamine and adrenaline-infused marketing hype. As always, please feel free to disagree and provide constructive criticism.

The Good

Let’s just say that after 30 years of working with Microsoft technologies, I’m very, very skeptical when I hear loaded terms, like “revolutionary”, “one-something”, “never been done before”, etc. We’ve all witnessed impressive launches for products that wouldn’t last a year. But it looks like this time Microsoft got their act together and put together something that may pass the test of time and that I could recommend or use to help clients. For starters, I’m glad that we’ve finally settled on a common and open storage format (delta and Parquet) after years of experimenting with proprietary and open formats (CDM folders anyone?). This common storage has several advantages, including accessibility, portability, and virtualization.

I also like very much that Microsoft doesn’t enforce or propel a specific architecture or data flow pattern. If you want a lakehouse, sure, you can have it. Care about medallion file organization? Sure, you can do that. Don’t want a lakehouse but a data warehouse, because you don’t deal with files and don’t like a notebook with a blinking cursor? Not a problem. Want to skip staging data as files in the lake and load it directly into the warehouse? Fine. This is a very different approach from the one other vendors take, such as promoting data warehousing on top of lakehouses and/or ruling out relational databases altogether (read my thoughts on this here).

It’s obvious that a gigantic effort has taken place to unify and in some cases rewrite products, such as Analysis Services and Synapse Data Warehouse, to adhere to this new platform and vision. Basically, Fabric is the focal point of decades of hard work from all Microsoft teams involved in analytics to at least make a complicated data estate easier to access and manage.

The Bad

Going back to the presentation and my skepticism, I wish Microsoft would dial down some promises, like “one copy” of data. Anyone who has implemented a data warehouse of decent complexity knows that data duplication is necessary. Data exists in the source systems, needs to be staged, and then transformed. Right there we have three copies. True, virtualization might help us avoid some data movement scenarios, such as accessing data directly in S3 buckets or importing it into a Power BI dataset (for most companies a few extra minutes for refreshing datasets is not an issue).

Speaking of companies, it’s clear that Fabric (and the presenters in the videos) targets the needs of large organizations with complex integration scenarios. But for most organizations, “Big Data” is a few million rows, and the most common integration task is analyzing data from one or multiple ERPs. Should they care about Fabric? I guess it would really depend on its value proposition and their budgets, but Fabric pricing hasn’t been announced yet. If Fabric is not available in PPU (Premium Per User), it would probably be dead on arrival for smaller organizations, as they can get modern analytics by spending less than $200/month on infrastructure, excluding Power BI per-user licenses.

Finally, although the presenters highlighted avoiding vendor lock-in as one of the major benefits of Fabric, you’re going to put all your eggs in one basket: Power BI/Fabric. Making Power BI a one-stop destination for analytics of course makes a lot of sense to Microsoft and increases its revenue potential (nothing wrong with revenue if it brings value). But for you, Fabric would be a long-term commitment, and you’d better make sure you avoid Microsoft-proprietary features as much as you can, such as Power Query dataflows and Azure Data Factory dataflows, should you one day decide to divest from Fabric, Power BI, or even Azure. Otherwise, you might find yourself in a similar situation as this client who had to migrate hundreds of Alteryx flows.

The Ugly

Confusion has descended upon the BI land as Microsoft throws out and abandons products left and right. In fact, the Fabric documentation has sections to help you choose a product, such as Lakehouse, the new Synapse data warehouse, or Power BI datamart (that one is easy: stay away from it, especially if you plan to adopt Fabric). Should we add Synapse Dedicated Pools and Azure SQL Database to the comparison table?

Further, rewriting these engines means that we must go back to square one and wait for features. For example, the new Synapse data warehouse lacks so many T-SQL features that it falls outside my plans for any near-term projects. Just when I thought Synapse SQL dedicated pools were catching up on T-SQL parity, someone moved my cheese… Well, good things happen to those who wait, so let’s give Fabric a year or so.

 


Atlanta MS BI and Power BI Group Meeting on May 1st (Getting started with Power BI Deployment Pipelines)

Please join us for the next meeting on Monday, May 1st, at 6:30 PM ET.  Akshata Revankar (Data Engineer, Specialist at McKinsey & Company) will show you how to leverage Power BI deployment pipelines to promote content changes between environments, such as DEV and PROD. For more details and sign up, visit our group page.

WE NOW MEET BOTH IN-PERSON AND ONLINE. WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE AND BECAUSE AN EMPTY AUDIENCE IS DISCOURAGING TO SPEAKERS AND SPONSORS. ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE HERE.

PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING AND PLAN TO EAT

Presentation: Getting started with Power BI Deployment Pipelines
Date: May 1st
Time: 18:30 – 20:30 ET
Place: Onsite and online
Level: Intermediate
Food: Food and drinks will be available for this meeting

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

ONSITE (RECOMMENDED)
Improving Office
11675 Rainwater Dr.
Suite #100
Alpharetta, GA 30009

ONLINE
Click here to join the meeting

Overview: If you are wondering, “Is CI/CD possible in Power BI?”, the answer is YES! This can be achieved with Deployment Pipelines. With Deployment Pipelines, it’s now possible to move content smoothly from Development/Test to QA to Production, helping enterprise BI teams bring application lifecycle management to their Power BI environments.
· Understand deployment pipelines
· Create a deployment pipeline
· Deploy content with rules
· Compare content
· Move content from one stage to the next

Speaker: Akshata Revankar (Data Engineer, Specialist at McKinsey & Company) has 16+ years of experience in the data engineering and data reporting space. She has worked with Oracle Database, SQL Server, SSIS, Informatica PowerCenter, Hadoop systems, Qlik, and Power BI, and enjoys being in the data space and learning new things.

Sponsor: TBD


Fixing SSIS Crashes

I’ve spent hours on this, so someone else might find the solution useful. I developed an SSIS package that uses a ForEach Loop container. Then I closed Visual Studio and reopened it. The SSIS designer opens the package, thinks for a few seconds about whether it likes it or not, and then crashes Visual Studio. I noticed that the VS status bar shows a message that it’s validating the ForEach Loop container, which was an important clue.

How do we fix this horrible issue? Initially, I was thinking that it was interference from Visual Studio 2022, which someone else had recently installed. So, I upgraded, uninstalled, repaired, tried VS 2022, etc., to no avail.

Finally, I opened the package code in a text editor and added DTS:DelayValidation="True" to the container task to disable the upfront validation on package open.

This fixed the issue although I had no idea what caused the crash.

<DTS:Executable
  DTS:refId="Package\Foreach Loop Container"
  DTS:CreationName="STOCK:FOREACHLOOP"
  DTS:DelayValidation="True"
  ...

Atlanta MS BI and Power BI Group Meeting on April 3rd (Power BI Dashboard in an Hour)

Please join us for the next meeting on Monday, April 3rd, at 6:30 PM ET.  Your humble correspondent will revisit Power BI important fundamentals in a demo-packed session. For more details and sign up, visit our group page.

PLEASE NOTE THAT OUR IN-PERSON MEETING LOCATION HAS CHANGED! WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE. ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE AT HTTPS://BIT.LY/ATLANTABIRECS. PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING.

Presentation: Power BI Dashboard in an Hour (DIAH)

Date: April 3rd

Time: 18:30 – 20:30 ET

Place: Onsite and online

Level: Beginner

Food: Food and drinks will be available for this meeting

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

 

ONSITE (RECOMMENDED)

Improving Office

11675 Rainwater Dr.

Suite #100

Alpharetta, GA 30009

 

ONLINE

Click here to join the meeting

Overview: Dashboard in an Hour (DIAH)

Targeting novice Power BI users, this hands-on, no-slide session covers important Power BI fundamentals and best practices. If you’re already a Power BI user, you’ll probably learn a new trick or two. And if you like a challenge, bring your laptop and try to keep up through the steps to create a Power BI dashboard!

Join us and learn how to:

  • Design your BI model
  • Acquire and transform data
  • Turn data into valuable and interactive insights
  • Share your visualizations with others

Speaker: Teo Lachev is a consultant, author, and mentor, with a focus on Microsoft BI. Through his Atlanta-based company Prologika he designs and implements innovative solutions that bring tremendous value to his clients. Teo has authored and co-authored several books, and he has been leading the Atlanta Microsoft Business Intelligence group since he founded it in 2010. Microsoft has recognized Teo’s contributions to the community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) Data Platform status for 15 years. Microsoft selected Teo as one of only 30 FastTrack Solution Architects for Power BI worldwide.

Sponsor: Improving

Prototypes with Pizza: Power BI Latest with Teo Lachev


Working with Large Tables in SQL Server

Warning: This blog contains old tricks of an old dog.

Scenario: Suppose you have a large table in SQL Server, e.g. hundreds of millions or even a billion rows. DML operations (SELECT, INSERT, UPDATE, DELETE) take a long time. How do you speed them up? Do you split the large table into multiple tables? Or do you ask for better hardware? Or do you start looking for a new job with less data?

Solution: It’s nothing new but I see clients struggle with this all the time because they don’t know any better.

The solution is to partition the table and use partition switching that SQL Server has supported since time immemorial.

Cathrine Wilhelmsen has a great step-by-step blog covering different scenarios, but the process goes like this (a minimal T-SQL sketch follows below):

  1. Configure page compression for the large table (see the benefits here).
  2. Partition the large table, such as by month.
  3. Create a non-partitioned staging table that has the same indexes and compression as the large table.
  4. Find the corresponding partition in the large table that will require DML, such as by using this script.
  5. If the data requires updates, switch out the affected partition to the staging table and perform the updates there. For full loads where rows will only be inserted, you don’t have to switch out the partition (see the second scenario in Cathrine’s blog).
  6. Switch the staging table back in to the corresponding partition of the large table. This should take a few seconds.

As a bonus, the SQL Server query processor could eliminate partitions for SELECTs, thus improving the query performance.
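To make the steps concrete, here is a minimal T-SQL sketch, assuming a hypothetical dbo.FactSales table partitioned by month on a DateKey column; the table, partition function, and constraint names (and the matching clustered index, omitted for brevity) are illustrative and must be adapted to your schema:

-- 1. Page compression for the large table.
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);

-- 2. Monthly partitioning (shown for completeness; in practice the large table
--    is created on the partition scheme from the start).
CREATE PARTITION FUNCTION pfMonthly (date)
    AS RANGE RIGHT FOR VALUES ('2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01');
CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);

-- 3. Non-partitioned staging table with the same structure, indexes, and compression,
--    on the same filegroup as the target partition.
CREATE TABLE dbo.FactSales_Staging (
    DateKey      date  NOT NULL,
    CustomerKey  int   NOT NULL,
    SalesAmount  money NOT NULL
) WITH (DATA_COMPRESSION = PAGE);

-- 4. Find the partition that holds the month to be reloaded.
DECLARE @p int = $PARTITION.pfMonthly('2023-03-01');

-- 5. Switch the affected partition out to the staging table (a metadata-only operation)
--    and perform the updates/inserts against the much smaller staging table.
ALTER TABLE dbo.FactSales SWITCH PARTITION @p TO dbo.FactSales_Staging;
-- ... UPDATE/INSERT dbo.FactSales_Staging here ...

-- 6. Switch the staging table back in. A check constraint matching the partition
--    boundaries is required so SQL Server can trust that all rows belong there.
ALTER TABLE dbo.FactSales_Staging
    ADD CONSTRAINT ckMarch CHECK (DateKey >= '2023-03-01' AND DateKey < '2023-04-01');
ALTER TABLE dbo.FactSales_Staging SWITCH TO dbo.FactSales PARTITION @p;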

Atlanta MS BI and Power BI Group Meeting on March 6th (The Semantic Lakehouse: Power BI and Databricks)

Please join us for the next meeting on Monday, March 6th, at 6:30 PM ET.  Leo Furlong (Senior Solutions Architect at Databricks) will share their point of view on why “the best data warehouse is a lakehouse.” For more details and sign up, visit our group page.

PLEASE NOTE THAT OUR IN-PERSON MEETING LOCATION HAS CHANGED! WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE. ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE AT HTTPS://BIT.LY/ATLANTABIRECS. PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING.

Presentation: The Semantic Lakehouse: Power BI and Databricks

Date: March 6th

Time: 18:30 – 20:30 ET

Place: Onsite and online

Level: Intermediate

Food: Food and drinks will be available for this meeting

 

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

 

ONSITE (RECOMMENDED)

Improving Office

11675 Rainwater Dr.

Suite #100

Alpharetta, GA 30009

 

ONLINE

Click here to join the meeting

Overview: The team from Databricks will come and share their point of view on why “the best data warehouse is a lakehouse.” We’ll go over lakehouse 101, when you might (or might not!) need a lakehouse, some best practices for operating a BI solution with Databricks, and walk through a demo highlighting how Power BI’s and Databricks’ SQL capabilities complement each other.

Speaker: Leo Furlong is a Senior Solutions Architect at Databricks. Leo is a seasoned data and analytics professional with 15 years of consulting experience building data warehousing and BI solutions using SQL Server, Power BI, and Azure technologies prior to joining Databricks in 2021. An Atlanta native, Leo is a Georgia Tech and Georgia State grad and lives in the Smyrna/Vinings area with his 4 kids and 4 dogs.

Sponsor: Databricks

Prototypes with Pizza: Power BI Latest with Teo Lachev


Presenting at SQL Saturday Atlanta 2023

I’m presenting at SQL Saturday Atlanta 2023 – BI & Data Analytics Edition on February 25th at 9 AM (the very first slot in the very first room for the very first early birds). I’ll do a Power BI Dashboard in an Hour this time to revisit the basics. I hope to see some of you there.

Targeting novice Power BI users, this hands-on, no-slide session covers important Power BI fundamentals and best practices. If you’re already a Power BI user, you’ll probably learn a new trick or two. And if you like a challenge, bring your laptop and try to keep up through the steps to create a Power BI dashboard! Join us and learn how to:

•    Design your BI model

•    Acquire and transform data

•    Turn data into valuable and interactive insights

•    Share your visualizations with others

Download the session files from here.

Demystifying Power BI Dataset Scale-out

Microsoft announced a public preview of Power BI dataset scale-out (DSO) for Power BI Premium, Premium Per User (PPU), and Power BI Embedded. In the comments below the announcement, the article implies that this feature is a replacement for the Azure Analysis Services scale-out: “If you have an AAS scale out and you migrate your databases (aka models aka datasets aka cubes) to Power BI Premium, you get scale out automatically and at no extra cost.” Scaling out for free? Sure, where do I sign?

But then further down the comments, we have this clarification: “[Power BI DSO happens] if a dataset is on peak load and the vcores of your capacity aren’t maxed out. Keep in mind that scalability on a single instance isn’t linear. By scaling out, we can achieve a better utilization of available CPU resources for high workloads. On the other hand, if your vcores are already maxed out, then scaling out brings no further perf benefit.” Confused? So was I, and I reached out to Microsoft for clarification. Below is my best understanding of what happens behind the scenes.

First, let’s start with a definition. By scaling out, we mean distributing the load to more machines (presumably when scaling up has saturated one server). AAS scale-out is a true scale-out because you provision additional VMs (replicas) and the system distributes queries across the available replicas to achieve a linear load (to the point of saturation, of course). The big downside is that you pay for each replica, and that cost can surely make a dent in your budget (a customer recently incurred $30K for one month with the maximum 7 replicas).

Power BI DSO gets trickier. First, Power BI monitors your overall CPU usage to make sure that you don’t abuse the system and exceed the provisioned number of cores. For example, P1 limits you to 4 background cores. If your capacity is consistently saturated, DSO doesn’t bring any benefits. Power BI will throttle all replicas if it sees sustained saturation. You paid for 4 cores, and you get 4 cores overall, irrespective of the fact that you now have additional read replica(s).

But then, suppose the system is in a relatively quiet state. Suddenly, at 8 AM, a burst of queries comes along from sales reps checking the latest commissions. Now, Power BI will distribute these queries across the replicas to answer them as quickly as possible without throttling. There is also some smoothing that happens as your queries execute in Gen2. If you get 100 queries at the same time, they can all execute together with no throttling and on different replicas. After they finish, their CPU cost may cause latency for the next set of queries, but if there’s enough of a gap between this burst of queries and the next one, the impact will be minimal and you would just see far better experiences for the end users.

An additional performance boost can be realized by enabling Power BI core auto-scale. As your CPU usage grows beyond the provisioned cores, Power BI will provision more cores. And now you will benefit from a second replica, because a new VM is available with its own set of physical cores. By contrast, without DSO the extra cores wouldn’t really help the performance of that dataset, because the first replica is running on a VM with all provisioned cores.

For now, DSO targets refresh isolation (one write plus one read replica), and this enables scenarios where you want to do a full refresh of a large dataset that wouldn’t otherwise fit in memory. In the future, Power BI promises to scale DSO out to more read replicas. However, I see no reason against using this feature for large datasets, such as organizational semantic models, even in its current state (free, remember?).

Finally, unlike AAS, replica synchronization is automatic if you schedule the dataset refresh in Power BI. If you go through the back door, such as to process a dataset via the XMLA endpoint, you’re on your own synchronizing the replicas.

The following table summarizes the DSO features for AAS and Power BI.

Feature | Azure Analysis Services | Power BI
Core throttling | No | Yes, with sustained loads
Primary scenario | Better query performance | Currently, refresh isolation
Synchronization | Explicit synchronization required | Automatic (Power BI dataset refresh) or explicit when the XMLA endpoint is used
Cost | Pay per replica | No additional cost