DAX – Page 3 – Prologika

Posts

Atlanta MS BI and Power BI Group Meeting on September 10th

September 3, 2019/0 Comments/in Blog, Events/by Prologika - Teo Lachev

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on September 10, Tuesday, at 6:30 PM at the Microsoft office in Alpharetta. This is a meeting not to miss. A DAX founder, Jeffrey Wang (Principal Software Engineer Manager at the Power BI product group) is coming all the way from Seattle to share DAX best practices with our group! Captech Consulting will sponsor the event. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation:	Common DAX Patterns
Date:	September 10, 2019
Time	6:30 – 8:30 PM ET
Place:	Microsoft Office (Alpharetta) 8000 Avalon Boulevard Suite 900 Alpharetta, GA 30009
Overview:	We will learn common DAX patterns by examining several issues frequently reported by DAX users through PowerBI customer support tickets. Users will be able to: 1. Learn from mistakes made by other DAX users so you can avoid them in your own work. 2. Learn the common techniques to debug DAX problems. 3. Learn the best DAX patterns to solve common problems. 4. Gain deeper understanding of DAX by learning relevant history and underlying design of the programming language. 5. Ask me their own DAX questions
Speaker:	Currently working as Principal Software Engineer Manager at Microsoft, Jeffrey Wang stumbled upon BI after the Y2K bubble burst, fell in love with the field, and stayed in the industry ever since. After I joined Microsoft Analysis Services engine team in 2004, I quickly discovered that working on the programming language is the best way to stay close to the end users, so I joined the MDX formula engine team. After shipping a couple of releases of SQL Server Analysis Services, I joined the committee that created DAX. Today I lead the development effort of the DAX engine and the modeling engine inside the Power BI product group.

Implementing Asymmetric Crosstab in Power BI

August 24, 2019/0 Comments/in Blog/by Prologika - Teo Lachev

A recent requirement called for an asymmetric crosstab report in Power BI. Transitioned to Adventure Works, the final report looks like this.

The Internet column shows the sales amount from FactInternetSales. Then, the matrix pivots on the BusinessType column in the FactResellerSales. Because, Internet sales don’t relate to BusinessType, it doesn’t make sense to pivot it. Instead, we want to show Internet sales in a single static column before the crosstab portion starts.

Implementing such a report in SSRS is easy thanks to its support of adjacent groups and static columns but not so much in Power BI. The issue is that Matrix would happily pivot both measures and the InternetSalesAmount would be repeated for each business type.

Fortunately, with some blackbelt modeling and DAX, we can achieve the desired effect. I attached the pbix file and here are the high-level implementation steps:

Add a ReportCategory table (you can use the Enter Data feature), with a single column Category and two values: Internet and Reseller. Add a Category calculated column to FactInternetSales with a value of “Internet”. Add a Category calculated column to FactResellerSales with a value of “Reseller”. Create relationships FactInternetSales[Category]->ReportCategory and FactResellerSales[Category]->ReportCategory. These relationships are required because Matrix doesn’t support unrelated tables. Your model schema should look like this:
Create a SalesAmount measure with the following formula:
```
 SalesAmount = IF(SELECTEDVALUE('ReportCategory'[Category]) = "Internet" && NOT ISINSCOPE('FactResellerSales'[BusinessType]),
```
```
 SUM(FactInternetSales[InternetSalesAmount]),
```
```
 SUM(FactResellerSales[ResellerSalesAmount]))
```
The IF condition checks if the measure is under the Internet category. NOT ISINSCOPE(‘FactResellerSales'[BusinessType]) will return TRUE only for the total column in the Internet section. Otherwise, it will return the reseller sales. Because there are no reseller sales in the Internet category, only Internet sales will be shown. Note that SELECTEDVALUE will return FALSE for empty cells which is why I had to create the ReportCategory table.
Add Matrix viz and bind Product[Color] to rows, ReportCategory[Category] and ResellerSales[BusinessType] on columns, and SalesAmount in Values. Expand the columns to the next level (to show the BusinessType level).
In the Matrix format properties, go Subtotals, enable the “Per column” setting, and turn off the Category subtotal to avoid showing the reseller sales total twice.

Although Power BI has made great progress on the visualization side of things, it still lacks in flexibility. The two features that miss the most from SSRS are nesting visuals (to create repeat sections) and the flexible Matrix layout. However, with some black belt modeling and DAX, workarounds are possible.

High Memory Usage and Calculated Columns

August 18, 2019/0 Comments/in Blog/by Prologika - Teo Lachev

Scenario: A client reports a memory spike during processing. They have a Tabular semantic model deployed to Azure Analysis Services. They fully process the model daily. The model normally takes less than 50 GB RAM but during processing, it spikes five times and Azure Analysis Services terminates the processing task complaining that it “reached the maximum allowable memory in our pricing tier”. Normally, fully processing the model should take about twice the memory but five times?

Solution: Upon expecting the model design, I discovered that the client has decided to add (many) calculated columns to the two fact tables in the model. Most of these columns are used to calculated variances to prior year. The formulas contain DATESYTD and other DAX date-related functions. After data is read, Tabular processes calculated columns, relationships and hierarchies. In this case, the spike was due to calculations involving large time ranges and ineffective DAX expressions. Converting these columns to measures resolved the issue.

As a best practice, abstain from using calculated columns (especially in fact tables). Make sure you understand the difference between measures and calculated columns (I cover this extensively in my latest book “Applied DAX with Power BI“). If you do need expression-based columns, such to materialize expensive calculations, consider defining them upstream, such as in SQL views or Power Query.

Prologika Newsletter Summer 2019

June 15, 2019/0 Comments/in Blog, Newsletter/by Prologika - Teo Lachev

After learning how to model the data properly (the most important skill), DAX would be the next hurdle in your self-service or organizational BI journey. You won’t get far in Microsoft BI without DAX. This letter shares a tip on how to identify DAX performance bottlenecks by using a recent feature in Power BI called Performance Analyzer.

Learning DAX

DAX Before I get to Performance Analyzer, I’m excited to announce my latest book: Applied DAX with Power BI: From zero to hero with 15-minute lessons. This book was born out of necessity. In my consulting practice, I had been teaching and implementing Power BI and Analysis Services Tabular, and people were constantly asking for DAX book recommendations. Indeed, DAX is not an easy topic and has its ways to humble even experienced practitioners. There are a few good reference books out there, but they could be somewhat overwhelming for novice users. So, I turned my classroom and consulting experience into this book and designed it as a self-paced guide to help you learn DAX one lesson at a time.

This one was a different book to write. Unlike my other books, which tend to be heavy, this book is smaller (220 pages). Regardless, it took me a while to write it. The book also adopts a lesson-based approach, so the reader can learn and practice a specific DAX concept quickly. There are usually three sections in each lesson: Practice (with step-by-step guidance), Outcome (a visual that shows the result), and Analysis (explaining why DAX works this way). Most lessons are five to six pages long, and it should take no more than 15 minutes to complete the lesson’s exercises (you can download and view a sample lesson from the book page).

And if you do one lesson per day, you’ll be a DAX expert in a month!

Analyzing DAX Queries

“This report is slow!” I hope you never hear this, but the chances are that you will sooner or later. No one likes watching a spinning progress indicator and waiting for the report to show up. Tucked in the View ribbon of the Power BI Desktop recent builds is a handy tool called Performance Analyzer that can help you identify slow queries. It’s easy to use:

In Power BI Desktop, click the View ribbon and check the Performance Analyzer setting. This will open the Performance Analyzer pane.
Click Start Recording in the Performance Analyzer pane. Once you start recording, any action that requires refreshing a visual, such as filtering or cross-highlighting, will populate the Performance Analyzer pane. You’ll see the statistics of each visual logged in the load order with its corresponding load duration.
You can click the “Refresh visuals” link in Performance Analyzer to refresh all visuals on the page and capture all queries. However, once you are in a recording mode, every visual adds a new icon to help you refresh only that visual.

Performance Analyzer captures the following information:

DAX query – The length of time to execute the query.
Visual display – How long it took for the visual to render on the screen after the query is executed.
Other – This is the time that the visual spent in other tasks, such as preparing queries, waiting for other visuals to complete, or doing some other background processing.

Although not as comprehensive as DAX Studio, Performance Analyzer is a quick and easy way to get an overall idea of what impacts your report page performance. Performance Analyzer is currently only available in Power BI Desktop and it’s not available in Power BI Service.

Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

“Applied DAX with Power BI” Book Available

May 12, 2019/3 Comments/in Blog, News/by Prologika - Teo Lachev

I’m excited to announce my latest book: “Applied DAX with Power BI: From zero to hero with 15-minute lessons“. This book was born out of necessity. In my consulting practice, I had been teaching and implementing Power BI and Analysis Services Tabular, and people were constantly asking for DAX book recommendations. Indeed, DAX is not an easy topic and has its ways to humble even experienced practitioners. There are a few good reference books out there, but they could be somewhat overwhelming for novice users. So, I turned my classroom and consulting experience into this book and designed it as a self-paced guide to help you learn DAX one lesson at a time.

This one was a different book to write. Unlike my other books, which tend to be heavy, this book is smaller (220 pages). Regardless, it took me a while to write it. The book also adopts a lesson-based approach, so the reader can learn and practice a specific DAX concept quickly. There are usually three sections in each lesson: Practice (with step-by-step guidance), Outcome (a visual that shows the result), and Analysis (explaining why it works this way). Most lessons are five to six pages long, and it should take no more than 15 minutes to complete the lesson’s exercises (you can download and view a sample lesson from the book page).

And if you do one lesson per day, you’ll be a DAX expert in a month!

Currently, the book is available on Amazon (paperback and Kindle eBook) and it’s making its way to other outlets. As always, I’m awaiting feedback on the discussion forum at the bottom of the book page.

Fixing Random Problems with DAX

May 10, 2019/1 Comment/in Blog/by Prologika - Teo Lachev

Scenario: You have a dimension table and a fact table, and you want to generate some test data, such by randomly looking up a value from the dimension table for each row in the fact table.

The following DAX formula should get the job done but RANDBETWEEN doesn’t work (although it works as a standalone function outside LOOKUPVALUE). Specifically, you get either the same value or an empty value.

Rand_Value_From_Other_Table =
LOOKUPVALUE (
Dim_Table[Fruit],
Dim_Table[Fruit_ID], ( RANDBETWEEN ( 1, MAX ( Dim_Table[Fruit_ID] ) ) )
)

Solution: I think the issue is that like Power Query, DAX assumes that functions are idempotent. Given the same arguments, they should produce the same result, which isn’t true for RANDBETWEEN and Power Query Number.Random functions. The simple solution is to change the first argument to perform a dummy operation that doesn’t use a constant value, such as:

Rand_Value_From_Other_Table =
LOOKUPVALUE (
Dim_Table[Fruit],
Dim_Table[Fruit_ID], ( RANDBETWEEN ( 1 + Fact_table[Fact_table_id] – Fact_table[Fact_table_id], MAX ( Dim_Table[Fruit_ID] ) ) )
)

Thanks to Stacey Jones who came up with this interesting puzzle.

Implementing Input Table in DAX Query

March 7, 2019/0 Comments/in Blog/by Prologika - Teo Lachev

Scenario: If you want to pass an input table to a DAX query. For example, you might have an application that lets the user specify a subset of customers and as-of dates for each customer. Your query needs to calculate certain metrics, such as Sales, for the list of customers but as of the user-specified date for each customer, and potentially other parameters. It’s impractical to save these parameters in a physical table in your database because this requires a processing task. You may also want to avoid Analysis Services placing locks on the database for each refresh and thus blocking pending queries. Instead, you want to simply pass a collection of all the parameters the query needs to do its job, inside the query.

Solution: DAX supports static tables using the DATATABLE function, but the resulting data table is very limited in its support of features. First, you can’t name the table in your query, so you need to resort to using a variable. More importantly, many DAX operations that reference columns, such as attempting to compute MAX of a table column to get the “current” value, will error with “Table variable ‘_filter’ cannot be used in current context because a base table is expected”. You can’t also create physical relationships to a custom data table.

However, you can use TREATAS to establish virtual relationships based on the input parameters in the input table. The following query has a _filter variable that points to a custom data table that contains two customers with corresponding as-of dates. Then, the query uses TREATAS to evaluate the sales and order quantity for each customer as of the specified date. You can run this query against the AdventureWorks Tabular Model that comes with the SSAS samples.

DEFINE
VAR _filter =
DATATABLE ( “Customer Id”, STRING, “Date”, DATETIME,
{ { “AW00011000”, “7/22/2007”}, {“AW00011001”, “7/18/2005” } } )
EVALUATE
ADDCOLUMNS (
TREATAS( _filter , Customer[Customer Id], ‘Date'[Date] ),
“Sales”, CALCULATE(SUM(‘Internet Sales'[Sales Amount]) ),
“Quantity”, CALCULATE(SUM(‘Internet Sales'[Order Quantity]) )
)

One cautionary note here is that the cost of such “per-row” virtual relationships could be very expensive. When possible, you should always favor “batch”-oriented execution where the query uses global filters and physical relationships.

Thanks to Darren Gosbell and Marco Russo for helping with this.

DAX Calculation Groups

March 5, 2019/4 Comments/in Blog/by Prologika - Teo Lachev

Just when I started thinking that there won’t be any new BI features in SQL Server 2019, Christian Wade announced DAX calculation groups in CTP 2.3. His excellent post helped me try them out with Tabular Editor and Power BI Desktop and whip out this cool report.

As Chris Webb pointed out, DAX calculation groups let us implement time calculations as in Multidimensional with a “shell” time dimension. They probably won’t help you reducing the number of measures because you’d still need to flatten the “shell” dimension for maximum flexibility and avoid forcing the user to use the Time Calculations slicer to select the time calculations they want on the report. So, you would still have to project Sales Amount YTD as a separate measure:

[Sales Amount YTD] = CALCULATE (SUM ( 'Reseller Sales'[Sales Amount] ), 'Time Intelligence'[Time Calculation] = "YTD")

What calculation groups help with is avoiding copying and pasting the time intelligence formulas all over the place. A future CTP will also help centralize measure formatting where you’ll have the option to inherit the current measure format string (by leaving it blank), or override it.

To me, DAX calculation groups is a nice to have feature, which I could have happily traded for more significant ones. Speaking of which, I hope Microsoft adds more important enhancements to Tabular 2019.

Here is my top 10:

Anything that can help the poor developer debug and understand in what context a DAX expression is evaluated as DAX tends to have more exceptions than rules.
DAX script – to show and organize all measures in one place. I miss also the MD scope assignments.
Better Excel support
Aggregations as in Power BI
Composite models as in Power BI
Recursive DAX measures
Writeback
Default members
Actions (or make the JSON schema extensible so the community can fill in blanks)
Full support for Tabular in Power BI, including Quick Insights, explain Increase/decrease, etc. My blog “Power BI Feature Discrepancies for Data Acquisition” has the details.

How Filters Affect DAX Measures

January 26, 2019/0 Comments/in Blog/by Prologika - Teo Lachev

xVelocity (the storage engine behind Tabular, Power BI and columnstore indexes) is an in-memory columnar database, and it’s such it’s not suitable for detail-level reports, such as transaction-level reports. I wrote about performance implications with xVelocity and detail-level reporting here and here. In general, the lower the report grain and the more columns you add to the report (customer first name, last name, invoice number, etc.), the slower the performance will be as more and more columns would need to be scanned and cross-joined.

A recent Tabular project brought another twist. Most measures (about 250 total) were produced on top of a biggish snapshot table (250 million rows) and Customer table (12 million rows). As a best practice, base measures were created with common filters and then other measures would piggy back on the base measures like Russian dolls. It’s a best practice because if you must change a filter, you need to change only the base measures. For example, a [Number of Open Accounts] measure would filter the snapshot to find how many open accounts the customer has. Then, a [Number of Open Accounts: Banks] would apply another filter:

[Number of open accounts ever] (‘Creditor'[Subscriber Segment] = “Bank”)

Then, a web app will let the end user specify the customer identifiers and send a DAX query to the model to request all the 250 measures. The client complains that the query takes a long time and there isn’t much difference in the query execution for one customer or 100. To Microsoft credit, had the measures have the same filter, a “fusion cache” (introduced I believe in 2016 under “super DAX”) would kick in to reduce queries to the storage engine. For example, if the query requests SalesAmount, OrderQuantity, and other measures and they all have the same filters, this fusion cache will cache queries to the storage engine and help with the query execution. But the cache doesn’t work if measures have different filters irrespective if their definitions are nested or not. Consequently, each measure sends a separate query to the storage engine. So, if the query requests 100 measures for one customer, the storage engine will be queried 100 times and the 250 million snapshot table will be scanned 100 times for that one customer even if there are only a few transactions for that customer. This is a terribly inefficient way!

The first thing a database developer would do to solve this straight in SQL is to create a temporary table to extract the data for the selected customers before calculating the measures. But Tabular is not SQL Server. It would be great if at some point, it could introduce a caching mechanism to let the developer copy a subset of “rows” in the cache (in this case, extract rows from the Account Snapshot for the selected customers), and then transparently redirecting the snapshot queries to the cache (like how Power BI aggregations work today).

As things stand today, one way you can reduce the query execution time is to eliminate large joins. In this case, the DAX query filters the Customer table because some measures are produced from other tables. But you can split the query into multiple queries for the different tables and then apply the customer filter directly on the table bypassing the Customer table. So, instead of:

FILTER (VALUES (Customer[Customer ID] ), ‘Customer‘[Customer ID] IN { … }

We want to the filter for measures involving only the snapshot to be:

FILTER ( VALUES ( ‘Account Snapshot’[Customer ID] ), ‘Account Snapshot’[Customer ID] IN { … }

In this case, eliminating the Account Snapshot -> Customer join resulted in 50% improvement in the query execution time.

But you should also consider using the right tool for the job at hand and for detail-level reports this could be a relational database which reads data by rows and not columns.

T-SQL, Power Query, or DAX?

January 20, 2019/4 Comments/in Blog/by Prologika - Teo Lachev

I taught my Applied Power BI class last week to a group of smart data analysts. All of them were knowledgeable of T-SQL, which they have been using extensively for years to shape and transform the data and produce SSRS, Qlik and now Power BI reports. They haven’t previously used Power Query, but they liked my overview of its features. However, they were rightfully confused when to use T-SQL, Power Query, or DAX. Starting with the data source and moving up the chain, Power BI lets you use expressions in:

Data source (custom SQL Query, SQL view)
Power Query
DAX calculated columns
DAX measures (the uppermost level)

My advice is to invest time and learn all these capabilities. When you have diverse skills, you can use the best tool for the task at hand.

Although is hard to generalize as every data transformation is different, I recommend you shape the data as upstream (closer to the data source) as possible. So, if you are familiar with T-SQL and that’s your primary data source, then use T-SQL. As one of the Microsoft best products, SQL Server has been enjoying 30 years of continuous improvements. Not only can you apply the skills you already have, but you will gain in performance when you use T-SQL and delegate data crunching to SQL Server which is what it’s designed to do. You can also benefit from the rich data manipulation features of T-SQL. These are the same reasons why I favor the ELT (Extract, Load and Transform) pattern in organizational BI solutions instead of SSIS data flow transforms. And if you find T-SQL lacking in features, Power Query can supplement it nicely, such as to fill down missing values, quickly unpivot data, or apply fuzzy lookup.

When you connect to data sources that don’t support SQL, such as flat files or Excel, the next natural place is Power Query to shape and transform the data. The choice between Power Query and DAX calculated columns is a tricky one and typically involves a compromise between performance and skill set. In many cases, you’ll find that a custom column can be implemented both in Power Query and DAX calculated columns. In general, if performance is OK, use Power Query, which is further upstream than DAX calculated columns. As a bonus, your data model will see the custom columns as regular columns and compress them equally well. Consider DAX calculated columns (one level up Power Query) when:

You must use DAX features that Power Query lacks, such as ranking.
When Power Query transforms lead to long data refresh times, such as a lookup between two large tables.

Finally, while custom columns can be often implemented in any of the first three layers, DAX measures are unique, and they typically can be implemented in DAX only. For example, if you need the expression to reflect the end user filter selection, you must use a DAX measure because only DAX measures are evaluated at run time and can access runtime conditions, such as values from filters and slicers.