Atlanta MS BI and Power BI Group Meeting on September 10th

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on September 10, Tuesday, at 6:30 PM at the Microsoft office in Alpharetta. This is a meeting not to miss. A DAX founder, Jeffrey Wang (Principal Software Engineer Manager at the Power BI product group) is coming all the way from Seattle to share DAX best practices with our group! Captech Consulting will sponsor the event. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation:Common DAX Patterns
Date:September 10, 2019
Time6:30 – 8:30 PM ET
Place:Microsoft Office (Alpharetta)

8000 Avalon Boulevard Suite 900

Alpharetta, GA 30009

Overview:We will learn common DAX patterns by examining several issues frequently reported by DAX users through PowerBI customer support tickets. Users will be able to:

1.      Learn from mistakes made by other DAX users so you can avoid them in your own work.

2.      Learn the common techniques to debug DAX problems.

3.      Learn the best DAX patterns to solve common problems.

4.      Gain deeper understanding of DAX by learning relevant history and underlying design of the programming language.

5.      Ask me their own DAX questions

Speaker:Currently working as Principal Software Engineer Manager at Microsoft, Jeffrey Wang stumbled upon BI after the Y2K bubble burst, fell in love with the field, and stayed in the industry ever since. After I joined Microsoft Analysis Services engine team in 2004, I quickly discovered that working on the programming language is the best way to stay close to the end users, so I joined the MDX formula engine team. After shipping a couple of releases of SQL Server Analysis Services, I joined the committee that created DAX. Today I lead the development effort of the DAX engine and the modeling engine inside the Power BI product group.

DCI_PowerBI

Implementing Asymmetric Crosstab in Power BI

A recent requirement called for an asymmetric crosstab report in Power BI. Transitioned to Adventure Works, the final report looks like this.

The Internet column shows the sales amount from FactInternetSales. Then, the matrix pivots on the BusinessType column in the FactResellerSales. Because, Internet sales don’t relate to BusinessType, it doesn’t make sense to pivot it. Instead, we want to show Internet sales in a single static column before the crosstab portion starts.

Implementing such a report in SSRS is easy thanks to its support of adjacent groups and static columns but not so much in Power BI. The issue is that Matrix would happily pivot both measures and the InternetSalesAmount would be repeated for each business type.

Fortunately, with some blackbelt modeling and DAX, we can achieve the desired effect. I attached the pbix file and here are the high-level implementation steps:

  1. Add a ReportCategory table (you can use the Enter Data feature), with a single column Category and two values: Internet and Reseller. Add a Category calculated column to FactInternetSales with a value of “Internet”. Add a Category calculated column to FactResellerSales with a value of “Reseller”. Create relationships FactInternetSales[Category]->ReportCategory and FactResellerSales[Category]->ReportCategory. These relationships are required because Matrix doesn’t support unrelated tables. Your model schema should look like this:

  2. Create a SalesAmount measure with the following formula:
     SalesAmount = IF(SELECTEDVALUE('ReportCategory'[Category]) = "Internet" && NOT ISINSCOPE('FactResellerSales'[BusinessType]),
     SUM(FactInternetSales[InternetSalesAmount]),
     SUM(FactResellerSales[ResellerSalesAmount]))

    The IF condition checks if the measure is under the Internet category. NOT ISINSCOPE(‘FactResellerSales'[BusinessType]) will return TRUE only for the total column in the Internet section. Otherwise, it will return the reseller sales. Because there are no reseller sales in the Internet category, only Internet sales will be shown. Note that SELECTEDVALUE will return FALSE for empty cells which is why I had to create the ReportCategory table.

  3. Add Matrix viz and bind Product[Color] to rows, ReportCategory[Category] and ResellerSales[BusinessType] on columns, and SalesAmount in Values. Expand the columns to the next level (to show the BusinessType level).
  4. In the Matrix format properties, go Subtotals, enable the “Per column” setting, and turn off the Category subtotal to avoid showing the reseller sales total twice.

Although Power BI has made great progress on the visualization side of things, it still lacks in flexibility. The two features that miss the most from SSRS are nesting visuals (to create repeat sections) and the flexible Matrix layout. However, with some black belt modeling and DAX, workarounds are possible.

High Memory Usage and Calculated Columns

Scenario: A client reports a memory spike during processing. They have a Tabular semantic model deployed to Azure Analysis Services. They fully process the model daily. The model normally takes less than 50 GB RAM but during processing, it spikes five times and Azure Analysis Services terminates the processing task complaining that it “reached the maximum allowable memory in our pricing tier”. Normally, fully processing the model should take about twice the memory but five times?

high_memory_usageSolution: Upon expecting the model design, I discovered that the client has decided to add (many) calculated columns to the two fact tables in the model. Most of these columns are used to calculated variances to prior year. The formulas contain DATESYTD and other DAX date-related functions. After data is read, Tabular processes calculated columns, relationships and hierarchies. In this case, the spike was due to calculations involving large time ranges and ineffective DAX expressions. Converting these columns to measures resolved the issue.

As a best practice, abstain from using calculated columns (especially in fact tables). Make sure you understand the difference between measures and calculated columns (I cover this extensively in my latest book “Applied DAX with Power BI“). If you do need expression-based columns, such to materialize expensive calculations, consider defining them upstream, such as in SQL views or Power Query.

Gateway AD Lookups

A data gateway is required for a cloud service, such as Power BI, to access on-prem data, either for live connections or for refreshing imported data. The gateway picked up an interesting feature to look up Active Directory on one property and return the value of another, which proved extremely useful for one client of mine.

Scenario: An enterprise client has adopted a hybrid architecture I set up for them, where Power BI reports connect directly to an on-prem Analysis Services Tabular model. The issue is that besides the regular network login, every user has a cloud identity (the employee number) to access cloud data sources. Before the gateway enhancement, this required explicit per-user mappings in the gateway data source. Imagine you must do this for thousands of users. and you’ll understand the maintenance issue they faced rolling Power BI across the enterprise.

Solution: Instead of explicit mappings, all they need to do now is configure the following two settings in the Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config file:

<setting name=”ADUserNameReplacementProperty” serializeAs=”String”>

<value>UserPrincipalName</value>

</setting>

<setting name=”ADUserNameLookupProperty” serializeAs=”String”>

<value>employeeNumber</value>

</setting>

When a Power BI user opens a report connected live to the on-prem Analysis Services Tabular model, the gateway receives the request mapped to the email that the user logs in Power BI. Then, the gateway queries the Active Directory to look up the user principal name (login) based on the employeeNumber which is what the Power BI login is. This query should complete fairly quickly so no worries about report queries. Make sure you restart the gateway after the configuration change.

AD lookups could save you configuring per-user mappings for live connections to on-prem data sources. Adam Saxton has a Power BI User lookup with the gateway using Active Directory video on this topic that goes into more detail.

Embedding Power BI Reports in JavaScript

A recent engagement brought an interesting dilemma. An ISV wanted to embed Power BI reports in a SharePoint Online portal accessible by their customers. The app handles user authentication and authorization by following the “App owns data” pattern. However, apparently inspired by the Stalin’s “No man, no problem” motto, SharePoint has decided to throw away custom server-side code, so we couldn’t add an ASP.NET page to obtain the embed token. This left us with two choices:

  • A pure JavaScript implementation – The main advantage of this option is that we could simply write a SharePoint client web part that handles the entire Power BI Embedded integration in JavaScript only. The major downside is that any JavaScript code is not secure, and this approach has a security vulnerability because you must store passwords in client-side code.
  • Implement a separate app – You can implement a separate service, such as an Azure Function app to handle tokens. Eventually, we ended up going this path because it’s more secure.

I attach the Node.js code if you need a pure JavaScript solution. The code uses a service principal authentication (follow the steps in the “Service principal with Power BI” document to configure it).

Important: If you decide to follow this path, please do your best to obfuscate the client secret because if the hacker gets access to the client-side code, the hacker can obtain that secret.

One could argue that if a hacker gets access to the user’s computer, you have a much bigger issue than going through JavaScript files to get the secret, but there is a security vulnerability, nonetheless.

MVP Again!

Got awarded for the elite Microsoft Most Valuable Professional (MVP) – Data Platform award again. I’ve been recognized by Microsoft for my expertise and contributions to the community for every year since 2004! Learn more about the MVP program at https://mvp.microsoft.com/en-us/Overview.

MVP_Logo_Horizontal_Secondary_Blue288_CMYK_72ppi

The Power BI Viewer Role

As I discussed in “Power BI Sharing is Getting Better” and “Power BI Sharing is Getting Better 2“, the Power BI sharing saga has been improving and Microsoft is closing gaps. As recently announced, Microsoft added an important Viewer role that grants view permissions to the workspace content.

The most important part in the announcement is:

“Viewer role requires a Pro license or for the content to be in Power BI Premium. If your workspace is in Premium, users with Viewer role who don’t have a Pro license can view the workspace in the workspaces list, navigate to the workspace, and view the content without getting a Pro Trial prompt.”

Therefore, the Viewer role lifts a significant limitation that forced you to use report/dashboard sharing or apps to share content out of a premium workspace with viewers. So, if you want to share the entire content of the workspace, now you don’t have to use apps. You can simply add users or groups as viewers to the workspace.

Speaking of apps and selective sharing, my joy will be complete once workspaces supports nesting (folders) with the ability to overwrite the securing permissions at a folder level so we can have SSRS-like folder-based security. This will allow you to organize content anyway you want (instead of resorting to a flatten list of workspaces) and grant selective permissions to different groups of users. Now I have a much better Power BI sharing story to tell!

Why Is SQL Server Not Using My Index?

Scenario: A partitioned Tabular table generates SELECT statements that queries a massive range from the base table on a date column. The processing job is taking long time. The SELECT query plan shows Clustered Index Scan but there is an index on the date column.

Why is SQL Server not using the index? Surely, it will speed things up, right? This is where the blaming game starts…

Analysis: SQL Server might ignore the index if the range is too wide. For example, these two queries will probably hit the index on the LastUpdated column in a 300 million rows table because the range is very narrow.

select from FactAccountSnapshot with (index(IX_FactAccountSnapshot_LastUpdated)) where LastUpdated ‘1/1/2019’

select from FactAccountSnapshot with (index(IX_FactAccountSnapshot_LastUpdated)) where LastUpdated between ‘1/1/2019’ and ‘1/2/2019’

This query will probably not:

select from FactAccountSnapshot where LastUpdated between ‘1/1/2019’ and ‘1/31/2019’

Why? Because the server might determine that is much faster to scan the entire table instead of doing many lookups against the index. To test this out, you can force the query to use the index:

select from FactAccountSnapshot with (index(IX_FactAccountSnapshot_LastUpdated)) where LastUpdated between ‘1/1/2019’ and ‘1/31/2019’

Only the find out that it takes x5 more to fetch the data (in this example, 1:37 minutes versus 24 seconds with the table scan).

Conclusion: SQL Server is smarter that you might think. Indexes may benefit queries with narrow ranges. The wider the range, the less likely it will result in an index hit. And this is a good thing. BTW, in this particular case, the performance issue was not related to SQL Server at all.

Tip: Assuming good hardware and proper conditions (good query throughput (no massive joins in views), good network throughput, narrow fact table, etc.), Analysis Services should be able to process 100,000 rows/sec on a single thread. So, a 300 million rows table should process in less than an hour. Even better, if you partition it, each partition will process in parallel so with five partitions (five parallel processing threads), you should aim at around 15-20 minutes to fully process this table. Even better, process only the partitions with data changes to bring the query processing window further down.

Multi-level Column Unpivoting with Power Query

An interesting challenge today come up while mentoring finance users in Power BI. Consider the following Excel layout.

061919_0107_MultilevelC1.png

The user wants to unpivot the data by rotating the three header rows (Scenario Type, Month, and Year) from columns to rows. The issue is that the headers span three rows. If you just select these columns and unpivot, you’ll end up with a mess. And Power Query operates on row at the time so you can’t reference previous rows, such as to concatenate Scenario, Month, and Year. We can do the concatenation in Excel so we have one row with column headers, such as Actuals-Jan-2018, Actuals-Feb-2018, and so on, which we can easily unpivot in Power Query. But if we can’t or don’t want to modify the Excel file, such as to avoid the same steps every time a new file comes in?

One way to achieve a Power Query-only solution is to transpose the rows. Then, we can apply the required transforms, such as filling down, creating a new column that concatenates the three columns and so on.

061919_0107_MultilevelC2.png

Then, we transpose back and unpivot. You can download the Excel file with the Power Query transforms using this link.
UnpivotBudget

Prologika Newsletter Summer 2019

090918_1951_DAXVariable1.pngAfter learning how to model the data properly (the most important skill), DAX would be the next hurdle in your self-service or organizational BI journey. You won’t get far in Microsoft BI without DAX. This letter shares a tip on how to identify DAX performance bottlenecks by using a recent feature in Power BI called Performance Analyzer.

Learning DAX

DAXBefore I get to Performance Analyzer, I’m excited to announce my latest book: Applied DAX with Power BI: From zero to hero with 15-minute lessons. This book was born out of necessity. In my consulting practice, I had been teaching and implementing Power BI and Analysis Services Tabular, and people were constantly asking for DAX book recommendations. Indeed, DAX is not an easy topic and has its ways to humble even experienced practitioners. There are a few good reference books out there, but they could be somewhat overwhelming for novice users. So, I turned my classroom and consulting experience into this book and designed it as a self-paced guide to help you learn DAX one lesson at a time.

This one was a different book to write. Unlike my other books, which tend to be heavy, this book is smaller (220 pages). Regardless, it took me a while to write it. The book also adopts a lesson-based approach, so the reader can learn and practice a specific DAX concept quickly. There are usually three sections in each lesson: Practice (with step-by-step guidance), Outcome (a visual that shows the result), and Analysis (explaining why DAX works this way). Most lessons are five to six pages long, and it should take no more than 15 minutes to complete the lesson’s exercises (you can download and view a sample lesson from the book page).

And if you do one lesson per day, you’ll be a DAX expert in a month!

Analyzing DAX Queries

“This report is slow!” I hope you never hear this, but the chances are that you will sooner or later. No one likes watching a spinning progress indicator and waiting for the report to show up. Tucked in the View ribbon of the Power BI Desktop recent builds is a handy tool called Performance Analyzer that can help you identify slow queries. It’s easy to use:

  1. In Power BI Desktop, click the View ribbon and check the Performance Analyzer setting. This will open the Performance Analyzer pane.
  2. Click Start Recording in the Performance Analyzer pane. Once you start recording, any action that requires refreshing a visual, such as filtering or cross-highlighting, will populate the Performance Analyzer pane. You’ll see the statistics of each visual logged in the load order with its corresponding load duration.
  3. You can click the “Refresh visuals” link in Performance Analyzer to refresh all visuals on the page and capture all queries. However, once you are in a recording mode, every visual adds a new icon to help you refresh only that visual.

PerformanceAnalyzer

Performance Analyzer captures the following information:

  • DAX query – The length of time to execute the query.
  • Visual display – How long it took for the visual to render on the screen after the query is executed.
  • Other – This is the time that the visual spent in other tasks, such as preparing queries, waiting for other visuals to complete, or doing some other background processing.

Although not as comprehensive as DAX Studio, Performance Analyzer is a quick and easy way to get an overall idea of what impacts your report page performance. Performance Analyzer is currently only available in Power BI Desktop and it’s not available in Power BI Service.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo