Posts

Demystifying Power BI Dataset Scale-out

Microsoft announced a public preview of Power BI Dataset scale-out (DSO) for Power Premium, Premium per User (PPU), and Power BI Embedded. In the comments below the announcement, the article implies that this feature is a replacement for the Azure Analysis Services scale-out. “If you have an AAS scale out and you migrate your databases (aka models aka datasets aka cubes) to Power BI Premium, you get scale out automatically and at no extra cost.” Scaling out for free? Sure, where do I sign?

But then further down the comments, we have this clarification “[Power BI DSO happens] if a dataset is on peak load and the vcores of your capacity aren’t maxed out. Keep in mind that scalability on a single instance isn’t linear. By scaling out, we can achieve a better utilization of available CPU resources for high workloads. On the other hand, if your vcores are already maxed out, then scaling out brings no further perf benefit.” Confused? So was I, and I reached for clarification to Microsoft. Below, is my best understanding of what happens behind the scenes.

First, let’s start with a definition. By scaling out, we mean distributing the load to more machines (presumably when scaling up has saturated one server). AAS scale-out is a true scale-out because you provision additional VMs (replicas) and the system distributes queries across the available replica to achieve a linear load (to the point of saturation of course). The big downside is that you pay for each replica and that cost can surely a dent in your budget (a customer recently incurred 30K for one month with the maximum 7 replicas).

Power BI DSO gets trickier. First, Power BI monitors your overall CPU usage to make sure that you don’t abuse the system and exceed the provisioned number of cores. For example, P1 limits you to 4 background cores. If your capacity is consistently saturated, DSO doesn’t bring any benefits. Power BI will throttle all replicas if it sees sustained saturation. You paid for 4 cores and you get 4 cores overall, irrespective that you now have other read replica(s).

But then, suppose the system is in a relatively quiet state. Suddenly, at 8 AM in the morning, a burst of queries comes along from sales reps checking the latest commissions. Now, Power BI will distribute these queries across the replicas to answer them as quickly as possible without throttling. But there is also some smoothing that happens as your queries execute in Gen2. If you get 100 queries at the same time, then they can all execute together with no throttling and on different replicas. After they finish, their CPU cost may cause latency for the next set of queries – but if there’s enough of a gap between this burst of queries and the next burst of queries, then the impact will be minimal and you would just see far better experiences for the end users.

Additional performance boost can be realized by enabling Power BI core auto-scale. As your CPU usage grows beyond the provisioned cores, Power BI will provision more cores. And now you will benefit from a second replica, because a new VM is available with its own set of physical cores. By contrast, without QSO the extra cores wouldn’t really help performance of that dataset because the first replica is running on a VM with all provisioned cores.

For now, DSO targes refresh isolation (one write + one read replica) and this enables scenarios where you want to do a full refresh on a large dataset that wouldn’t otherwise fit in memory. But in future, Power BI promises DSO to scale out to more read replicas. However, I see no reason against using this feature for large datasets, such as organizational semantic models, even in its current state (free, remember?).

Finally, unlike AAS, replica synchronization is automatic if you schedule the dataset refresh in Power BI. If you go through the back door, such as to process a dataset via the XMLA endpoint, you’re on your own synchronizing the replicas.

The following table summarizes the DSO features for AAS and Power BI.

Azure Analysis ServicesPower BI
Core throttlingNoYes, with sustained loads
Primary scenarioBetter query performanceCurrently, refresh isolation
SynchronizationExplicit synchronization requiredAutomatic (Power BI dataset refresh) or explicit when XMLA endpoint is used
CostPay per replicaNo additional cost

Atlanta MS BI and Power BI Group Meeting on January 9th (Integrating Azure Synapse Analytics and Power BI)

The Atlanta MS BI and Power BI Group is resuming in-person meetings! Please join us for the next meeting on Monday, January 9th, at 6:30 PM ET.  Elayne Jones (Data Engineer at 3Cloud) will show you how to integrate Synapse with Power BI. For more details and sign up, visit our group page.

WE ARE RESUMING IN-PERSON MEETINGS AT THE MICROSOFT OFFICE IN ALPHARETTA. WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE. PLEASE NOTE THAT GUESTS ENTERING MICROSOFT BUILDINGS IN THE U.S. MUST PROVIDE PROOF OF VACCINATION OR SELF-ATTEST WITH HEALTHCHECK (HTTPS://AKA.MS/HEALTHCHECK). ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE AT HTTPS://BIT.LY/ATLANTABIRECS. PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING.

Presentation: Integrating Azure Synapse Analytics and Power BI

Date: January 9th

Time: 6:30 – 8:30 PM ET

Place: Onsite and online

 

ONSITE

Microsoft Office (Alpharetta)

8000 Avalon Boulevard Suite 900

Alpharetta, GA 30009

ONLINE

Click here to join the meeting

 

Overview: Combining the forces of Azure Synapse Analytics and Microsoft Power BI allows you to weave together the full lifecycle of data ingestion, transformation, and visualization. Synapse encompasses the traditional processes of data warehousing, cleansing, and visualizing all within Synapse Studio, fostering unity among teams and driving efficiency across organizations.

Speaker: Elayne Jones is a Data Engineer at 3Cloud. She specializes in data visualization and data modeling using Power BI. She has expertise developing Power Apps and creating Power Platform solutions that drive efficiency within organizations. Elayne is also experienced querying data using the DAX and SQL languages. Elayne has delivered numerous BI trainings and written blog posts on various BI and reporting topics.

PowerBILogo

Atlanta MS BI and Power BI Group Meeting on December 5th (Automate and Improve Planning, Budgeting and Forecasting)

The Atlanta MS BI and Power BI Group is resuming in-person meetings! Please join us for the next meeting on Monday, December 5th, at 6:30 PM ET.  Your humble correspondent will show you how to implement a custom solution for automating planning, budgeting, and forecasting based on a real-life project. For more details and sign up, visit our group page.

WE ARE RESUMING IN-PERSON MEETINGS STARTING DECEMBER 5, 2022, AT THE MICROSOFT OFFICE IN ALPHARETTA. WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE. PLEASE NOTE THAT GUESTS ENTERING MICROSOFT BUILDINGS IN THE U.S. MUST PROVIDE PROOF OF VACCINATION OR SELF-ATTEST WITH HEALTHCHECK (HTTPS://AKA.MS/HEALTHCHECK). ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE AT HTTPS://BIT.LY/ATLANTABIRECS. PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING.

Presentation: Automate and Improve Budgeting, Planning, and Forecasting

Date: December 5th

Time: 6:30 – 8:30 PM ET

Place: Onsite and online

 

ONSITE

Microsoft Office (Alpharetta)

8000 Avalon Boulevard Suite 900

Alpharetta, GA 30009

ONLINE

Click here to join the meeting

Overview: Business Performance Management (BPM) is a methodology to help the company predict its performance. An integral part of a BPM strategy is a process for Budgeting, Planning, and Forecasting which is typically performed by the Finance department. When it comes to Finance, nothing is simple, and budgeting is no exception.  The temptation is to buy expensive prepackaged software but even that route would require a lot of customization and compromises.

Join this session to learn how to implement your own home-grown solution using Microsoft Analysis Services and Excel. I’ll share lessons learned from a real-life project.

Speaker: Teo Lachev is a consultant, author, and mentor, with a focus on Microsoft BI. Through his Atlanta-based company Prologika he designs and implements innovative solutions that bring tremendous value to his clients. Teo has authored and co-authored several books, and he has been leading the Atlanta Microsoft Business Intelligence group since he founded it in 2010. Microsoft has recognized Teo’s contributions to the community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) Data Platform status for 15 years. Microsoft has selected Teo as one of only 30 FastTrack Solution Architects for Power BI worldwide.

Sponsor: Prologika (a Microsoft Gold Partner in Data Analytics and Data Platform and Power BI Red Carpet Partner) helps organizations of all sizes to make sense of data. Your BI project will be your best investment, we guarantee it! prologika.com

Prototypes with Pizza: Power BI latest news

Download presentation

PowerBILogo

Atlanta MS BI and Power BI Group Meeting on November 7th (Data Science for Power BI Developers)

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, November 7th, at 6:30 PM ET.  Sandeep Pawar will join us again to show us the Power BI data science features and how you can create ML models. For more details and sign up, visit our group page.

Presentation:Data Science for Power BI Developers
Date:November 7th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:The use of Data Science tools and techniques has grown in the last few years and BI developers are collaborating closely with the Data Scientists. In this session, we will learn about some of the core concepts in data science, key terminologies and the overall process so that you as a BI developer can effectively collaborate with your Data Science team. You will learn about analogous features in Power BI, their pitfalls, the overall Machine Learning process and the questions you should be asking the next time you are consuming the ML model in Power BI.

 

This session won’t teach you how to create Machine Learning models, rather it will show you how you should consume the Machine Learning models as a BI developer to help your business users.

Speaker:Sandeep Pawar is a Sr. Power BI Architect at Hitachi Solutions America. He has extensive experience in creating data analytics and data science solutions. You can read more about him and his passion for data at www.pawarbi.com or follow him on twitter at @PawarBI.
Prototypes without PizzaPower BI Latest

PowerBILogo

Testing RLS with Power BI Shared Datasets

In a typical engagement, I create an organizational semantic model(s) and “report packs”, such as Sales Report Pack, Inventory Report Pack, etc. These report packs are typically implemented as Power BI reports connected to the semantic model as a shared dataset using the Power BI Datasets connector. Reports sanctioned by IT are published to a dedicated workspace, such as Corporate BI. Departmental reports are deployed to their respective workspace, such as Sales, to enforce content-level security. Usually, the semantic model has row-level security (RLS) roles defined to enforce restricted access to data depending on the identity of the interactive user.

Although not immediately obvious, here is how you can test RLS to ensure that connected reports produce expected results under someone else’s identity directly in Power BI Service:

  1. In powerbi.com, go to the workspace where the shared dataset is published.
  2. In the dataset page, click the Datasets tab, and then click the ellipsis button (…) next to the shared dataset. In the dataset Settings, click Security.
  3. Expand the … button next to the Restricted Access role, and then click “Test as role”. This will open whatever report is included in the shared dataset pbix file (I typically have some reports there for testing).
  4. You can expand the Restricted Access dropdown and enter the user email of the user whose RLS permissions you want to test or the role you want to apply. In the screenshot below, I’m testing under my identity applying the rules in the Restricted Access role. If I wanted to test what a specific user would see, I’d type in that user’s email.
  5. To test other reports for this user, expand the powerbi.com left-navigation pane and then click the desired report. Power BI will navigate you to that report, but it will honor RLS.

Resolving Tabular Conversion Errors

A scheduled SSIS job that executes a massive DAX query to an on-prem Tabular server (Power BI can also generate this error) one day decided to throw an error “Source: “Microsoft OLE DB Provider for Analysis Services.” Hresult: 0x80004005 Description: “MdxScript(Model) (2020, 98) Calculation error in measure ‘Account Snapshot'[Average utilisation % of all CR active current accounts last 3 months]: The result of a conversion or arithmetic operation is either too large or too small.” At least we know the offending measure, but which row is causing the error? The query requests some 300+ measures for 120 million customers, so I thought someone might find the troubleshooting technique useful. Let’s ignore what the measure does for now except mentioning that it performs a division of two other measures.

We can use the DAX ISERROR function to check if a measure throws an error. So, the first step is to wrap the measure with ISERROR and execute the query for all customers.

EVALUATE
FILTER (
 CALCULATETABLE (
 ADDCOLUMNS (
VALUES ( 'Customer'[Customer Registry ID] ),
"x", ISERROR ( [Average utilisation % of all CR active current accounts last 3 months] )
 ),
 FILTER ( VALUES ( 'Date'[Date] ), 'Date'[Date] = DATEVALUE ( "2022-10-14" ) )
 ),
 [x]
)

The query uses the ADDCOLUMNS function to add the measure so that it’s evaluated for each customer as of the date when the job ran. Make sure to add dependent measures that references the measure in the error description. ISERROR will return TRUE if the measure throws an error. The final step is to wrap CALCULATEDTABLE with FILTER that returns only the offending customer(s) where the measure is TRUE.

Using this technique, I’ve managed to narrow the error to data quality issues concerning large negative credit limits for some customers. BTW, the Tabular engine raises this error when converting a double precision floating point number to an integer, but the former is out of range of valid integer numbers.

Embedding Power BI Reports in Internal Portals

This question pops up over and over. What’s the easiest way to embed a Power BI or paginated report hosted in Power BI Service in an internal portal, such as a custom site developed by your team or on-prem SharePoint Server, so internal users can see all reports in one place? Use the “Embed report” feature that will give you a link or iframe code that you can readily add to the portal without any coding (notice that there is a SharePoint Online option because SharePoint Online has a special webpart for this purpose).

Will you get a single sign-on (SSO) so that the interactive user credentials automatically flow to Power BI? Nope. To the best of my knowledge, Power BI doesn’t support single sign-on even if your organization has extended its active directory to Azure, but let me know if you have found a workaround. The first time the user opens a Power BI report in the browser session, the user will get prompted to enter credentials. Then the user won’t be asked to authenticate when viewing subsequent reports in the browser session because the authentication token will be cashed in the browser session. The browser can also mitigate the issue by saving the user credentials until the password changes.

What if you want to avoid the prompt under any circumstance? Write code and use Power BI Embedded. Down the custom code rabbit hole we go… The upside is that you’ll get more control over the embedded reports thanks to the Power BI JavaScript library, such as if you want to replace the Filter Pane with your filtering mechanism.

Atlanta MS BI and Power BI Group Meeting on October 3rd (Auto-provision embedded report delivery for your customers)

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, October 3rd, at 6:30 PM ET.  Tom Huguelet (Lucient) will present a real-world case study and the challenges involved in bringing a Power BI application from initial Excel-based POC to an ISV platform that auto-provisions for new clients, using Power BI Embedded Azure Premium Capacity.  For more details and sign up, visit our group page.

Presentation:Auto-provision embedded report delivery for your customers
Date:October 3rd
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:In this session we’ll look at a real-world case study and the challenges involved in bringing a Power BI application from initial Excel-based POC to an ISV platform that auto-provisions for new clients, using Power BI Embedded Azure Premium Capacity.

We will cover:

•            Initial Business Case and Project Team

•            Testing Embedded in a Sandbox Environment (Authentication)

•            Enabling User-Customized Experiences in Embedded

•            Using REST APIs for automatic Provisioning

•            Lessons learned and challenges encountered

Speaker:Tom Huguelet has been engaged in using Business Intelligence, and Dashboards & Analytics to enable Corporate Performance Management since 1999. While coming from a technological background, Tom has always focused on the people and process aspects of how Business Intelligence can create positive changes within an organization. In his role as a Practice Director, he leads and mentors technical teams as well as advises business stakeholders on how to build the cross-disciplinary teams necessary to be successful with Business Intelligence. Tom holds several Microsoft certifications including Microsoft Certified Trainer, is a frequent speaker at industry events, has co-authored 3 books on SQL Server and Data Warehousing, and helped start a BI Roundtable group in Chicago.
Prototypes without PizzaPower BI Latest

PowerBILogo

Power BI Visual Help Tooltips

Want to display a visual-left hint to perplexed users that explains what your visual is supposed to reveal? Like me, you’ve probably missed the handy Power BI help tooltips feature that allows you to pop up some helpful text for each visual. On the Format tab of the visualization pane, expand the Help Tooltip section, and enter the text. Then, a question mark glyph shows up in the visual header when the user hovers. Besides showing static text, you could alternatively redirect the user to a report page that could provide more context.

Unfortunately, the tooltip tip can’t be expression-based due to a long-standing and overdue Power BI limitation that only a small set of properties that can be bound to measures.

Prologika Newsletter Fall 2022

BI implementation shortcuts are tempting and disguise themselves as cost-effective. True, you can slap a Power BI dataset on top of your data (the cornerstone of self-service BI), but most BI implementations will greatly benefit from a datamart, irrespective of the propaganda of self-service BI vendors. I explain the benefits of datamarts vs Power BI datasets directly on data sources in my blog “Datasets vs. Datamarts”. This newsletter discusses a newly released Power BI feature that attempts to simplify the implementation of datamarts by business users.

 

Understanding Datamarts

Let’s define a datamart as a centralized repository for hosting clean and trusted data that is typically organized in a star schema, i.e. as a set of fact tables and dimension (lookup) tables. BI pros would typically host a datamart in a relational database. Nowadays, a cloud-based datamart, such as a datamart hosted in Azure SQL Database, Azure SQL Managed Instance, or Azure Synapse, is the norm. Once the datamart schema is designed, a BI pro would use a professional tool, such as Azure Data Factory (ADF) to load the datamart periodically, such as once every night.

As Microsoft announced here, Power BI datamarts are mean to democratize datamart implementations by business users. In my opinion, this is yet another premium feature, spearheaded with great vision and effort, but questionable practical value. In a nutshell, a Power BI datamart is a combo of Power BI Premium and a Microsoft-hosted Azure SQL Database aiming to simplify the implementation of a departmental datamart.

The Good

Unlike other vendors, such as Domo and their proprietary and overly expensive stack, Microsoft has decided to go with somewhat open solution consisting of tools that Power BI users already know: Power Query, Power BI Desktop (for the first time some of its modeling features, such as relationships and DAX measures, made it to the cloud), and SQL Server. Microsoft provisions the database for you although surrounds it with some red tape (more on this in a moment). Thus, a business users aiming for “no code, low code” experience will whip out Power Query dataflows that populate the database and then build a model (dataset) directly in Power BI Service. Obviously, the main goal is to simplify the experience as much as possible where all the action happens online.

It’s nice that Microsoft chose hosting the data in a SQL database instead of a “lakehouse”. Apparently, they learned some painful lessons from Power Query CDM folders. The database size is up to 100 GB which is not bad at all. I also like that some Power BI modeling features finally made it to Power BI Service, but unfortunately only in the confines of the datamart feature.

The screenshot below shows how a business user can use Power Query dataflow to transform and save the data into a Power BI datamart.

The Bad

From the announcement, “Best of all, IT doesn’t have to worry about getting all data into centrally governed data sources, thus providing discipline at the core and flexibility at the edge.” I failed to see how this will provide “discipline at the core” – a tenant that Microsoft learned from their own pain points after tilting too much toward self-service BI. I’ve also seen statements online that business users don’t have to “consult with IT anymore” when implementing datamarts. Really? What happened to managed self-service BI? I’m sure IT will be thrilled having corporate data in Microsoft-owned databases that they can’t manage and queries running amuck and consuming precious premium resources. Luckily, the admin portal has a switch to control who can create these datamarts. I hope at least we have a BYO (bring your own) database feature at some point.

The elastic Azure SQL database that Microsoft provisions is read-only, meaning that you can’t extend it. I’m a big fan of pushing calculations as much upstream as possible to SQL Server, such as by implementing SQL views, but we can’t do that. Instead, we would use Power Query (what else of course) for all data transforms. But I have serious reservations against Power Query dataflows – a tool that is known to cause performance issues without providing any troubleshooting and maintenance insights.

The Ugly

Do we really need this feature? I would argue that what was really needed was extending Power Query with “destinations” where the user can specify where the data would land. If that was implemented, IT could selectively let business users augment the infrastructure set up by IT with self-service ETL (more than likely temporary) that sinks the data into an IT-sanctioned database.

Further, it would have gotten us out of another proprietary mess that forces dataflows to save their output into CDM folders that make sense only to Microsoft (see my “Power BI Dataflows vs ADF Mapping Data Flows” blog for the gory details). Want to save dataflow data somewhere else? For now, you must use Power BI datamarts because this is the only way you can have your data in a (Microsoft-provided) relational database and nowhere else.

Conclusion

Recently, an enterprise client decided to migrate all self-service Alteryx flows to IT-governed ADF pipelines. More than likely, Power BI datamarts will be heading in that direction. Be very careful about any pure self-service features, as you might find yourself in a bigger mess that you tried to solve.


Teo Lachev
Prologika, LLC | Making Sense of Data
logo