Posts

Atlanta MS BI and Power BI Group Meeting on October 5th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, October 5th, at 6:30 PM. Paco Gonzalez (CEO of SolidQ) will present and demonstrate the Power BI AI and ML capabilities. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation: AI, ML and Power BI
Date: October 5th, 2020
Time: 6:30 – 8:30 PM ET
Place: Join Microsoft Teams Meeting

Overview: Perhaps some of the most powerful features of Power BI involve the Artificial Intelligence and Machine Learning functionality built into the platform. In this session, we will review the AI and ML capabilities built into Power BI Desktop and the Power BI Service. From Data Visualizations to Data Preparation and Data Modeling, we will discuss the various ways that AI and ML can be easily implemented in your reporting.
Speaker: Paco Gonzalez is the CEO of SolidQ North America and a Microsoft Data Platform MVP. Focused on Business Analytics and Artificial Intelligence, he specializes in helping organizations become data-driven from a strategic and technical perspective. Paco speaks at small and large conferences such as PASS Summit, Ignite, and Business Applications Summit, and he has published several books and whitepapers. He is based in Atlanta, GA.
Prototypes without pizza: Power BI latest features

Organizational Semantic Model

I am delivering a data governance assessment for an enterprise client. As part of an effort to migrate reporting from MicroStrategy to Power BI, the client wants to improve its data analytics. The gap analysis interviews with the business leaders revealed the common pitfalls: no single version of truth, data that is hard to come by, business users who don't know what data sources exist and who spend more time on data wrangling than on analytics, bad data quality, IT overwhelmed with report requests, report proliferation and duplication, and so on…

Sound familiar? As I have mentioned many times on this blog, an enterprise data warehouse (EDW) plays a critical role in overcoming the above challenges, but it's not enough. A semantic model is also needed, and I extolled its virtues in my "Why Semantic Layer?" newsletter. In the Microsoft BI world, Analysis Services Tabular is commonly used to implement such models, which are typically layered on top of the EDW. In general, there are two ways to approach the model implementation:

  1. (Self-service BI path) Business users create self-service semantic models using Power BI Desktop. Behind the scenes, Power BI creates databases hosted in the Analysis Services Tabular server from the *.pbix files.
  2. (Organizational BI path) BI developers implement organizational semantic models.

Since both implementation paths lead to the same technology, it boils down to ownership, vision, and purpose.

Because IT is overwhelmed, the temptation is to transfer the semantic model development to business users. The issue with this approach is that business users seldom have the skills, time, and vision to do it well. In the end, mini "semantic models" ("spreadmarts") are produced, and the same problems are perpetuated.

In most cases, my recommendation is for IT to own the semantic model because IT owns the data warehouse and the "Discipline at the Core" vision. And yes, unless operational and security requirements dictate otherwise, IT should strive for a single centralized semantic model that spans all subject areas. If the technology you use for semantic modeling can't deliver acceptable performance with large models, then it's time to change it.

Of course, not all data exists or will exist in the EDW. This is where the self-service "Flexibility at the Edge" comes in. I have high expectations for the forthcoming "Composite models over Power BI datasets and Azure Analysis Services" mega feature (public preview expected in November 2020). It will enable the following scenario, which Power BI cannot deliver today:

  1. Business user starts by connecting live to corporate data in the organizational semantic model. Every new report requirement should start with evaluating if all or some of the data is in the semantic model, and if so, instructing the user to connect to the semantic model to avoid data modeling and data duplication.
  2. The business user then mashes up this data with data that is not in the EDW by retaining the live connection to the organizational semantic model and importing (or connecting live via DirectQuery to) other datasets.

This will give us the best of both worlds and help us achieve "Discipline at the Core, Flexibility at the Edge".

Presenting Analytics in a Day Workshop on August 20th

In partnership with Microsoft, I'm delivering a complimentary one-day "Analytics in a Day" virtual workshop on August 20th, 9 AM – 5 PM Eastern Time. Targeting BI developers, architects, and technology decision makers interested in achieving a single version of truth with organizational BI, this workshop is designed to guide and accelerate your journey towards a modern data warehouse that powers your business with Azure Synapse, Azure Data Factory, Azure Data Lake, and Power BI.

The first half of the day (9 AM – 1 PM) will help you better understand how to:

  • Create an analytics solution that goes from data ingestion to insights using Azure Synapse Analytics and Power BI
  • Empower self-service analytics
  • Enable a truly data-driven culture in your business

Part of the workshop will be dedicated to hands-on training to help you get started on your cloud analytics journey.

The second half of the day (1 PM – 5 PM) is optional and will focus on best practices for designing Power BI + Azure Synapse Analytics solutions that enable "Discipline at the core and flexibility at the edge".

Say goodbye to data silos. Analytics in a Day is designed to simplify and accelerate your journey towards a modern data warehouse to power your business. Reserve your seat today at http://bit.ly/aid202008.

Atlanta MS BI and Power BI Group Meeting on August 3rd

Our group celebrates its 10th anniversary! Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, August 3rd, at 6:30 PM. Chris Hamill from the Power BI CAT team will share techniques on creating performant reports without sacrificing design. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation: Power BI Report Design Techniques for Performance
Date: August 3rd, 2020
Time: 6:30 – 8:30 PM ET
Place: Join Microsoft Teams Meeting

Overview: As you have likely observed, a performant model with optimized DAX can suffer greatly if the front-end design is too heavy. Front-end report developers are often challenged with balancing performance against the richness of user requirements. Chris Hamill from the Power BI CAT team will share techniques for creating performant reports without sacrificing design.
Speaker: Chris is a Sr. Program Manager on Microsoft’s Power BI CAT team with a primary focus on front-end reporting and user experience. In his day job, he works with enterprise customers to help skill up report authors and guide design for high-impact reporting. Check out his blog at www.alluringbi.com for tips and tricks for the Power BI creator.
Prototypes without pizza: Power BI latest features

Why You Need a Trusted Advisor

I’ve been providing advisory services to a Fortune 500 organization for a few months now. Like all large organizations, they adopted Power BI Premium. However, they have provisioned only one Power BI Premium P1 node, which has been showing signs of overutilization. In the process, I discovered that they had purchased 40 Power BI Premium cores, leaving 32 cores unutilized! In other words, they were using one fifth of what they had been paying Microsoft in Power BI Premium fees. How did they arrive at this unfortunate situation?

A year or so ago, they used the Power BI Premium Calculator to estimate the licensing cost on their own. They plugged in 10,000 users and got a recommendation for 5 P1 nodes (or 40 cores). And that’s what they bought, assuming they would get a cluster of five P1 nodes that would load-balance the reports across nodes. But when they set up Power BI Premium, they provisioned only one P1 capacity, and all the important reports went in there. And these reports have been running just fine for a long time with thousands of users … The other 32 cores? Sitting ducks.

Power BI Premium doesn’t currently load-balance across nodes. Once you’ve licensed a certain number of cores, it does give you the flexibility to provision nodes of different sizes. So, with 40 licensed cores, you can set up five P1 nodes, or two P2 nodes and one P1 node, or one P3 node and one P1 node. Unfortunately, the flexibility ends there. Once you provision the capacity, the meter is running whether you use it or not. And if you’re not using all your licensed cores, you still pay unless you manually scale down the capacity. I hope to see a Power BI Premium serverless option sometime in the future, where the service scales up and down on demand and you pay for what you actually consume. This would remove the guesswork about how much capacity and how many nodes you need, probably save you a lot of money, and make Power BI Premium more affordable.
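
To make the node arithmetic concrete, here is a minimal Python sketch that enumerates the node mixes for a given number of licensed cores. It assumes the published Premium SKU sizes (P1 = 8, P2 = 16, P3 = 32 v-cores); the function name is mine.

```python
# Enumerate the Premium node combinations that exactly consume the licensed cores.
# Assumes the published SKU sizes: P1 = 8, P2 = 16, P3 = 32 v-cores.
from itertools import product

SKU_CORES = {"P1": 8, "P2": 16, "P3": 32}

def node_combinations(licensed_cores):
    """Yield every mix of P1/P2/P3 nodes that uses all the licensed cores."""
    ranges = [range(licensed_cores // cores + 1) for cores in SKU_CORES.values()]
    for counts in product(*ranges):
        total = sum(n * cores for n, cores in zip(counts, SKU_CORES.values()))
        if total == licensed_cores:
            yield dict(zip(SKU_CORES, counts))

for combo in node_combinations(40):
    print(combo)
# Prints: 1xP1 + 1xP3, 1xP1 + 2xP2, 3xP1 + 1xP2, and 5xP1
```

Besides the three mixes mentioned above, the sketch surfaces a fourth valid one (three P1 nodes plus one P2 node).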

The moral of this story? Hire a trusted advisor and technology expert, and save a lot!

Designing Responsive Power BI Reports (Part 2)

In my previous post about responsive reports, I said that changes to the report design can dramatically reduce the report load time. But you will undoubtedly reach a point where you cannot optimize the report any further. After all, a page with a single visual might be fast, but I doubt it would deliver much business value. What else can be done to speed up the reports and, ideally, achieve a page load time of a few seconds? Our next stop for that project was to revisit the solution architecture. The client initially opted for a hybrid architecture where the reports use a live connection to on-prem SSAS Tabular models. However, significant time was spent with Power BI showing “loading data” before rendering the report. What took so long?

In this case, Power BI was hosted in the Azure West region, the Tabular server was in the company data center in Atlanta, and most users were in the East region. This resulted in round trips crossing the country for each visual:

  1. A user in the East region requests a report which is hosted in the West region.
  2. Each visual generates a query which is sent to Tabular in the company data center.
  3. SSAS returns data back to the West region.
  4. Report payload is sent to the user in the East region.

Ideally, users, reports, and data should be in the same region. However, a long-standing limitation of Power BI is that once the first user signs up for Power BI, the data region is determined based on that user’s location, and you can’t move it unless you open a support ticket with Microsoft. Power BI Premium will let you create premium capacities in a different region, but by default these capacities are created in the original Power BI region, and you can’t change the region once the capacity is created either. The chart below shows that the round-trip latency doubles if the user is in the East region but Power BI is in the West.

We decided to move the semantic model to Power BI so that Power BI owns the data. Besides potentially improving the report load time, this architecture has other important advantages (to learn more, read my “Power BI Large Datasets: The Good, the Bad, and the Ugly” post). If you’re not on Power BI Premium, that “move” might not be easy if you have opted to use Visual Studio or Tabular Editor for development. That’s because Power BI Pro doesn’t expose the XMLA endpoint, so your only option is to migrate the model to Power BI Desktop. But migrating an SSAS Tabular project to Power BI Desktop is not officially supported, and there is no automatic migration path.

Thanks to the enhanced dataset metadata (a better name would have been “metadata done right this time”), you could opt for cracking open the Power BI Desktop file and replacing the model schema (my “Power BI Source Control” blog has the steps to get to the schema). You might get lucky, especially if your Tabular project uses structured data sources (the only data source type supported by Power BI Desktop). However, if it uses legacy data sources, you must change every table to use a structured data source, and making changes to the JSON schema by hand, with Power BI Desktop refusing to load the model at every typo, is no fun.
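
If you go the schema-cracking route, the mechanics are straightforward. Here is a minimal Python sketch for pulling the schema out for inspection; it assumes the template layout used by current Power BI Desktop builds (a .pbit is a ZIP archive whose DataModelSchema entry holds the model JSON encoded as UTF-16), and “Sales.pbit” is a hypothetical file name:

```python
# Extract the model schema (JSON) from a Power BI template for inspection or editing.
# Assumptions: the .pbit is a ZIP archive with a "DataModelSchema" entry encoded as
# UTF-16 LE (observed in current Desktop builds); adjust if your build differs.
import json
import zipfile

def read_model_schema(pbit_path):
    with zipfile.ZipFile(pbit_path) as archive:
        raw = archive.read("DataModelSchema")
    return json.loads(raw.decode("utf-16-le").lstrip("\ufeff"))  # strip BOM if present

schema = read_model_schema("Sales.pbit")    # hypothetical file name
print(schema["compatibilityLevel"])         # 1500+ indicates the enhanced metadata
for table in schema["model"]["tables"]:     # TMSL-style database definition
    print(table["name"])
```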

On the other hand, Power BI Premium lets you deploy the SSAS model to the workspace XMLA endpoint. The prerequisites are:

  1. Upgrade the Tabular model in Visual Studio to 1500 compatibility mode. This adds special annotations to the data source. If you don’t upgrade to 1500, you’ll find that you can’t schedule the dataset for refresh because the Power BI Service won’t show any connections (see the refresh sketch after this list).
  2. Enable the XMLA endpoint for read-write in the capacity settings.
  3. Enable the “Export data” tenant setting in Power BI (to learn more about this setting, read my “A False Sense of Data Security” blog).
  4. As if the “Export data” battle weren’t enough, we found that we also had to beg the Security group to enable the “Analyze in Excel with on-premise datasets” tenant setting. I have no idea what this setting has to do with the XMLA endpoint. Luckily, Fiddler showed us the exact error message stating that this setting must be enabled.
  5. In Visual Studio, change the model processing option to “Do not process”. The “Troubleshoot XMLA endpoint connectivity” article has the details.
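
Once the model is deployed and the prerequisites above are in place, you can sanity-check that the dataset actually refreshes. Here is a minimal Python sketch against the documented Power BI REST API refresh endpoints; acquiring the Azure AD access token (e.g., with MSAL) is out of scope, and group_id and dataset_id are placeholders:

```python
# Queue a refresh for the deployed dataset and read back the latest refresh status.
# Assumes you already hold an Azure AD access token with dataset write permissions.
import requests

API = "https://api.powerbi.com/v1.0/myorg"

def refresh_and_check(token, group_id, dataset_id):
    headers = {"Authorization": f"Bearer {token}"}
    url = f"{API}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    # POST queues an asynchronous refresh; 202 Accepted means it was queued.
    requests.post(url, headers=headers,
                  json={"notifyOption": "MailOnFailure"}).raise_for_status()
    # GET returns the refresh history, newest first.
    history = requests.get(f"{url}?$top=1", headers=headers).json()
    print(history["value"][0]["status"])  # "Unknown" while running, then "Completed"/"Failed"
```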

Moving the semantic model paid big dividends: we saw “loading data” for only a moment. We also found that the network speed of the user’s connection and the browser play a significant role. For best performance, users should use Chrome or the new Edge and have a broadband connection. We reduced the load time of the first page further by enabling query caching for the published dataset, which is only available in Power BI Premium.

Atlanta MS BI and Power BI Group Meeting on July 6th

MS BI fans, please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, July 6th, at 6:30 PM. David Eldersveld, a Microsoft MVP, will demonstrate cool visualization techniques using DAX. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation: DAX Visualization Techniques for the Power BI Table/Matrix
Date: July 6th, 2020
Time: 6:30 – 8:30 PM ET
Place: Join Microsoft Teams Meeting

Overview: Did you know that DAX is more than a modeling language and can also be used for data visualization? From sparklines to bullet charts, Unicode text to SVG, icons to maps; learn different techniques to build visuals within visuals using DAX. In this session, David Eldersveld guides you through techniques that go beyond basic conditional formatting to enhance your table and matrix visuals.
Speaker: David is a Solution Architect and Microsoft MVP who has employed skills in technology development, data integration, data analysis, and Microsoft BI for over ten years. David enjoys building BI and analytics solutions with technologies in Microsoft Azure and the Power Platform. In addition to his work with the Microsoft Data Platform at BlueGranite, he also blogs at dataveld.com.
Prototypes without pizza: Power BI latest features

A False Sense of Data Security

UPDATE 11/26/2020: After the latest update to the tenant settings, I’m glad that the new “Export to …” settings no longer affect external connections. There is now a separate “Allow live connections” setting to control external connectivity. Read “Updated Export Settings” to learn more.

The Power BI “Export data” setting is horrible. I wish Microsoft had never come up with it; no other Microsoft tool has it. Apparently, the premise is to prevent users from exporting the data behind a report so that it doesn’t end up in the wrong hands. But all it accomplishes is a false sense of data security, and it does more damage than good. As if users can’t take screenshots and share them wherever they please. Or print and mail reports. Unless, of course, you disable printing too, but where is that path of not trusting your users going? Why not just disallow them from viewing reports at all? “No data, no problem”, right?

Besides blocking “exports”, disabling this setting effectively disables all external connections to published datasets. Therefore, the following Power BI features won’t work:

  1. Creating reports from published datasets, which is a best practice.
  2. Publishing semantic models from Visual Studio via the Power BI Premium XMLA endpoint. Even if you enable the XMLA endpoint for read/write, “Export data” will override it because it is the higher-level setting.
  3. Connecting other tools to published datasets, such as SSMS, Tabular Editor, DAX Studio, Profiler, and third-party reporting tools.
  4. Using the “Analyze in Excel” feature for creating Excel reports connected to published datasets.

So, leave “Export data” enabled. If you really want to prevent sensitive data from leaving your organization, consider applying sensitivity labels, which integrate with Office 365 information protection.

A Gateway to Hell

“Life is suffering, and suffering is unavoidable.” Buddha must have dealt with data gateways. Here are two “gotchas” that surfaced recently after countless hours of troubleshooting:

  1. A bug in the data gateway AD lookup cache – In my “Gateway AD Lookups” post, I described a very useful gateway feature that looks up the on-prem user identity from a cloud identity so that you don’t have to create explicit mappings. The gateway maintains an internal dictionary cache so that it doesn’t have to query Windows Active Directory for every request. This cache can’t be disabled. The issue is that if you have an older gateway build, e.g. December 2019, there is a bug (and a security vulnerability) with concurrent access to the cache, which may cause the gateway to map the interactive user to the wrong AD user under heavy load. The issue was fixed in a later build, but the fix introduced another problem by imposing a very coarse lock (probably over the entire cache), which may reduce concurrency with many users running reports. In other words, under heavy load, user requests queue up waiting for the cache lock to be released (see the sketch after this list). This issue will be fixed in the July 2020 gateway build.
  2. Beware of stale data sources registered in the gateway – With the concurrency bug addressed, the same client reported a new issue: every hour or so, users couldn’t render reports connected to the gateway for a couple of minutes, and then the issue resolved itself. The culprit turned out to be neglected Analysis Services data sources registered in the gateway. By default, Power BI dashboards query these connections every hour to cache tile data (the frequency can be changed on the dataset settings page, but the caching can’t be disabled). The problem was that the master account used to connect to Analysis Services had changed its password. While the team updated the connections in the actively used data sources, it did not do so for the neglected ones (no one is using them, right?). However, the dashboards still submitted queries to update the tile cache, and the gateway asked AD to authenticate the master account. The domain controller then locked the master account for a few minutes after a certain number of attempts, since it rejected the logon attempts with the old password. The solution was to remove the unused data sources from the gateway.
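
To illustrate the locking tradeoff in the first gotcha, here is a toy Python sketch (not the gateway’s actual code) contrasting one global lock over the whole lookup cache with lock striping, where lookups for different users don’t queue behind one another:

```python
# Toy illustration of the lock-granularity tradeoff described above -- NOT the
# gateway's implementation. A single lock over the entire cache serializes every
# AD lookup; striping locks by key lets unrelated lookups proceed in parallel.
import threading

class StripedLookupCache:
    def __init__(self, lookup, stripes=16):
        self._lookup = lookup                      # e.g., the Windows AD lookup
        self._stripes = [({}, threading.Lock()) for _ in range(stripes)]

    def get(self, key):
        data, lock = self._stripes[hash(key) % len(self._stripes)]
        with lock:                                 # blocks only one stripe
            if key not in data:
                data[key] = self._lookup(key)
            return data[key]

# The coarse-lock variant guards one shared dict with a single Lock --
# correct, but every concurrent request waits in line for the same lock.
```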

Designing Responsive Power BI Reports

I’m currently providing advisory services to an enterprise client for architecting and implementing an executive dashboard. As is typical for a dashboard, the UX design included various KPIs that looked like these:

The Power BI report implementation followed the design, and the above visualization was implemented as four cards and two textboxes. Including other UX elements, such as icons and labels, the most important summary page of the report ended up having more than 100 visuals. It took 25 seconds on average to render the page, which is horrible performance. Performance Analyzer showed that the DAX queries were very fast and that most of the time was spent in the “Other” category. Because JavaScript is single-threaded, visual rendering is sequential, and this is where the time went. Indeed, SQL Server Profiler revealed that out of the 25 seconds, the first 15 were spent elsewhere before Analysis Services Tabular started receiving DAX queries. The Performance Analyzer documentation provides the essential coverage of how Power BI renders visuals, but the bottom line is this:

The more visuals a page has, the slower it will be. If you find that most of the time on page refresh is spent in the “Other” category, more than likely the cause is Power BI serializing the visual rendering. If this is the case, the best course of action is to reduce the number of visuals.

For example, the above visualization can be rendered with a single visual (Matrix) and some black-belt visualization techniques. Matrix supports rendering measures on rows (in the Values section of the Format tab, turn on “Show on rows”). Spacing out the rows can be achieved by increasing the row padding, and icons can be rendered with conditional formatting. Another technique for reducing visuals and speeding up rendering is replacing many labels with a page background image. What didn’t help was disabling visual interactions, although it doesn’t hurt to disable them if you don’t need them. We also found that browsers differ in how fast they render visuals: Internet Explorer was the slowest, while Chrome and the new Edge were the fastest.

How fast did the report get after applying these optimizations? Between 50 and 60% faster.

True, Power BI has report design limitations. Developers with an SSRS background will miss more advanced features, such as nesting visuals (e.g., a bullet graph inside a matrix) and asymmetrical crosstab layouts. Still, techniques like the ones I shared above will help you create more responsive reports.