Posts

New “Applied DAX with Power BI” Workshop

Data Analysis Expressions (DAX) is the expression language of Power BI, Power Pivot, and Analysis Services Tabular. It’s very powerful, but it’s perceived as complex, requiring a steep learning curve. I’m excited to announce a new two-day “Applied DAX with Power BI” workshop that I designed to help you become proficient with DAX. You’ll learn practical skills that will help you tackle a wide range of reporting requirements. We’ll start with DAX fundamentals, such as calculated columns and measures, and then progress to more advanced concepts, including context transitions, variables, filters, time intelligence, advanced relationships, row-level security, query optimization, and much more. Think of this workshop as Advanced Power BI and the next level after my “Applied Power BI” class. The target audience is data analysts and BI developers wanting to hone their DAX skills with Power BI, Power Pivot, or Tabular.

Here is my entire training catalog with a brief description and link to each course page.

Applied Power BI (2 days)
This two-day workshop is designed to help you become proficient with Power BI and acquire the necessary skills to work with online and on-premises data, implement data models on a par with professional models created by BI pros, unlock the power of data by creating interactive reports and dashboards, and share insights with other users. No prior data modeling or reporting knowledge is assumed. Students are welcome to bring their own data to the second day of the class.

Applied DAX with Power BI (2 days) NEW!
Data Analysis Expressions (DAX) is the expression language of Power BI, Power Pivot, and Analysis Services Tabular. It’s very powerful, but it’s usually perceived as complex, requiring a steep learning curve. This two-day class is designed to help you become proficient with implementing business calculations in DAX.

Applied BI Semantic Model (5 days)
Targeting BI developers, this intensive 5-day onsite class is designed to help you become proficient with Analysis Services and acquire the necessary skills to implement Tabular and Multidimensional semantic models. Use the opportunity to ask questions and study best practices that will help you achieve a single version of the truth by implementing scalable and secure organizational models. Bring your organizational BI to the next level by learning these two powerful BI technologies in one class!

Applied SQL Fundamentals (2 days)
SQL Server is the most deployed and popular database today. Different types of users need to query data stored in SQL Server data structures. This 2-day, instructor-led course provides you with the necessary skills to query Microsoft SQL Server databases with Transact-SQL. The course starts with the basics of the SELECT statement and its syntax, and progresses to teach you how to join, aggregate, and convert data.

Applied Microsoft BI (End to End) (4 days)
This four-day class is designed to help you become proficient with the Microsoft BI toolset and acquire the necessary skills to implement an organizational BI solution. You’ll learn how to design a star schema, use SQL Server Integration Services to transform data, and implement a Tabular semantic model. Depending on the students’ skillset, the class can be customized to reduce coverage of specific technologies, replace them with other topics of interest (such as Multidimensional instead of Tabular), or cover additional topics, such as Power BI or Reporting Services.

Applied Reporting Services (3 days)
Microsoft SQL Server Reporting Services has evolved into a sophisticated reporting platform that lets you present and analyze data consistently, quickly, and reliably. This intensive 3-day class is designed to help you become proficient with Reporting Services and acquire the necessary skills to author, manage, and deliver reports.

Applied Analysis Services-Multidimensional (4 days)
This intensive four-day class is designed to help you become proficient with Analysis Services (Multidimensional) and acquire the necessary skills to implement OLAP and data mining solutions. Learn how to build a cube from scratch. Use the opportunity to ask questions and study best practices!

Applied Excel and Analysis Services (2 days)
If your organization has Analysis Services Multidimensional cubes or Tabular models and you want to gain valuable insights from them in Excel, then this course is for you. Designed as a step-by-step tour, this course teaches business users how to become data analysts and unlock the hidden power of data. You’ll learn how to apply the Excel desktop BI capabilities to create versatile reports and dashboards for historical and trend analysis.

Applied Power BI with Excel (2 days)
Power BI is a suite of products for personal business intelligence (BI). It brings the power of Microsoft’s Business Intelligence platform to business users. At the same time, Power BI lets IT monitor and manage published models to track their usage and security, and to estimate hardware and software resources. With Power BI, anyone can easily build personal BI models using the most popular tool, Excel, and share them on premises or in the cloud.

Applied Microsoft Visualization Tools (2 days)
This two-day class is designed to help you learn the visualization tools that are included in the Microsoft Data Analytics Platform. We’ll start by exploring the Excel reporting capabilities, including pivot (PivotTable and PivotChart) and Power View reports. Then, you’ll learn how to explore data interactively with Power BI Desktop. During the second day of the class, we’ll focus on learning how to create paginated reports with Reporting Services. The class can be customized to discuss other tools, such as SSRS mobile reports or Power Map reports.

Applied Master Data Management (2 days)
This two-day class is designed to help you become proficient with Master Data Services (MDS) and Data Quality Services (DQS). IT and business users learn how to design MDS models and extend them with business rules, attribute groups, and hierarchies. IT will learn how to integrate MDS with upstream and downstream systems and how to enforce secured access. Business users will learn how to use Excel to manage data with the tool they love most!

DAX Variables to the Rescue

Scenario: DAX has its own share of idiosyncrasies that can humble both novice and experienced users. Consider a common example where a measure attempts to return sales for the last date in the Date table. What makes this common is that many real-life calculations require measures that evaluate as of a user-specified date (the “as-of” date), which is what MAX('Date'[Date]) returns.

=CALCULATE( SUM(Sales[SalesAmount]), 'Date'[CalendarYear] = MAX('Date'[CalendarYear]) )

As innocent and logically correct as this measure looks, it fails with “A function ‘MAX’ has been used in a True/False expression that is used as a table filter expression. This is not allowed.” It’s also amusing to see even Microsoft struggling to explain the reason for this error, which at some point was on the internal top 10 Microsoft list for DAX “ease of use”. This is what the documentation states about this error:

“The filter expression, MAX('Date'[CalendarYear]), attempts to return the largest numeric value in the CalendarYear column. However, in context of the measure expression, it cannot be passed as a table filter expression to the CALCULATE function, causing an error.”

This, of course, is incorrect. First, the MAX function doesn’t return a table but a scalar value. Second, the CALCULATE function is perfectly capable of taking Boolean expressions. The actual issue is that the Boolean expression is surrounded by a hidden CALCULATE, so it’s ambiguous in what context the maximum date should be evaluated. To be consistent with the way filters propagate, it should be evaluated in the filter context outside of CALCULATE, but the hidden CALCULATE would instead evaluate MAX in the row context of each candidate date, which context transition turns into a filter context. Because this is probably not what you would expect, DAX fails safe with the error.
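To see the ambiguity, it helps to write out what the Boolean filter is shorthand for. The following is a rough sketch of the problematic interpretation (not the engine’s literal rewrite): the hidden CALCULATE forces MAX to evaluate per candidate year, so the predicate would always be true.

-- rough expansion sketch; the hidden CALCULATE makes MAX evaluate
-- in the (transitioned) context of each year, returning that same year
=CALCULATE (
    SUM ( Sales[SalesAmount] ),
    FILTER (
        ALL ( 'Date'[CalendarYear] ),
        'Date'[CalendarYear] = CALCULATE ( MAX ( 'Date'[CalendarYear] ) )
    )
)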

The workaround suggested by the documentation is to filter the Date table and pass it as a table filter to CALCULATE. This requires first ignoring the filter context on the Date table, only to overwrite it later with the as-of date.

=CALCULATE( SUM(Sales[SalesAmount]), FILTER( ALL( 'Date'[CalendarYear] ), 'Date'[CalendarYear] = MAX('Date'[CalendarYear]) ) )

Solution: A better approach is to use a variable. This example defines an EOP (End of Period) variable.

= VAR EOP = MAX('Date'[CalendarYear]) RETURN CALCULATE( SUM(Sales[SalesAmount]), 'Date'[CalendarYear] = EOP )

Because the EOP variable is evaluated where it’s defined, there is no hidden context and the measure works. Unfortunately, as it stands today, DAX doesn’t allow us to create a global session variable for such scenarios, so you’d need to include this variable in every measure that requires it.
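For example, a companion as-of measure would repeat the same pattern. Here is a minimal sketch reusing the tables from the example above; the Sales[SalesOrderNumber] column is an assumption for illustration.

-- hypothetical second measure repeating the same EOP pattern
OrderCountAsOfDate :=
VAR EOP = MAX ( 'Date'[CalendarYear] )
RETURN
    CALCULATE ( DISTINCTCOUNT ( Sales[SalesOrderNumber] ), 'Date'[CalendarYear] = EOP )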

Speaking of issues with the DAX documentation and variables, here is another example (last example on that page) that doesn’t work:

YoY% = var Sales = SUM(SalesTable[SalesAmount]) var SalesLastYear = CALCULATE(Sales, SAMEPERIODLASTYEAR('Calendar'[Date])) return
if(Sales, DIVIDE(Sales - SalesLastYear, Sales))

This doesn’t work because the SalesLastYear variable attempts to overwrite the context of the Sales variable. In other words, it attempts to treat it as a measure, but a variable is evaluated once, where it’s defined, so SalesLastYear would always return this year’s sales. The correct example should be:

YoY% = var Sales = SUM(SalesTable[SalesAmount]) var SalesLastYear = CALCULATE(SUM(SalesTable[SalesAmount]), SAMEPERIODLASTYEAR('Calendar'[Date])) return
if(Sales, DIVIDE(Sales - SalesLastYear, Sales))

The documentation should say that although one variable can reference another, it can’t overwrite its context. Also, to avoid a logical bug, the example should actually be:

YoY% = var Sales = SUM(SalesTable[SalesAmount]) var SalesLastYear = CALCULATE(SUM(SalesTable[SalesAmount]), SAMEPERIODLASTYEAR('Calendar'[Date])) return
if(AND(Sales, SalesLastYear), DIVIDE(Sales - SalesLastYear, Sales))

Because it’s meaningless to calculate YoY if this year’s or last year’s sales are missing.


Notes on Analysis Services Performance and Parallelism

As part of a semantic model assessment, I got to experiment with different configurations of Azure Analysis Services. In this case, the largest fact table has almost two billion rows, so it’s a good-size model. I was particularly interested in the correlation between a higher performance tier and query performance. One thing that I like about PaaS is that it’s easy to change and experiment with different configurations. The thing that I don’t like is that I can’t peek under the hood. This is probably done on purpose to keep people like me out, just like the green nets surrounding construction zones. So, I don’t know what hardware AAS is running on or how it’s configured, and I failed to get clarification from Microsoft. I might be out of line here, but I thought my findings are worth sharing, albeit not officially vetted.

AAS Basic and Standard tiers are advertised as dedicated tiers running on dedicated virtual machines preconfigured with a certain number of cores: one core per 20 QPUs of the tier. For example, the S2 tier (200 QPUs) comes with 10 cores, while S9 (640 QPUs) comes with 32 cores. If you examine the advanced properties of the AAS instance in SSMS, you’ll see Group Affinity masks that confirm this. For example, the mask for S2 is 0x3FF (or 0xFFC), which converted to binary is 1111111111 (ten 1’s), whereas the mask for S9 is 0xFFFFFFFF (thirty-two 1’s). I don’t know why masks are used given that the VM already has a predefined number of cores. One explanation, and the most probable one, is that Microsoft co-hosts multiple Analysis Services instances on the same VM. When you provision a new AAS instance, Azure finds a VM that has enough capacity and spins up the new service there. If you upgrade AAS, it could get relocated to a different VM if the old one doesn’t have enough capacity. Or, Microsoft might allocate more cores to the VM and a subset of them to AAS.

Anyone who’s done parallel programming knows that it’s not easy. Locks need to be used to synchronize access to shared resources. The AS xVelocity engine (aka VertiPaq) is a multi-threaded application that accesses RAM as a shared resource. As each thread competes for a global lock for memory allocation/deallocation, scalability decreases. Because processor cores reserve memory in chunks, called cache lines, additional synchronization is required when threads access memory locations that are close to each other. Before SQL Server 2016 SP1, this was a serious issue with Tabular. As we’ve learned, starting with SQL Server 2016 SP1, Microsoft switched to using the Intel Threading Building Blocks (TBB) C++ library. Specifically, Tabular now uses the TBB scalable memory allocator. Mind you, TBB is not a CPU-specific extension, so any modern Intel CPU should get these benefits. “Analysis Services SP1 uses an Intel TBB-based scalable allocator that provides separate memory pools for every core. As the number of cores increases, the system can scale almost linearly” and “The Intel TBB-based scalable allocator is also expected to help mitigate performance problems due to heap fragmentation that have been shown to occur with the Windows Heap”. Further, starting with SP1, Tabular is NUMA aware on 4-node NUMA systems, but I don’t know if the AAS VMs are preconfigured for NUMA.

These changes seem to be very beneficial, and I do see the promised linear scalability (or close to it) as the number of cores increases. After all, how would you explain to a customer that query times remain the same when they switch to a higher tier? I used the following query to measure the memory scan performance (the filter is used to prevent Tabular from short-circuiting the count by using internal statistics):

EVALUATE CALCULATETABLE ( ROW ( "result", DISTINCTCOUNT ( 'Table'[day_id] ) ), 'Table'[day_id] > 0 )

The day_id column has a total column size of 2.5 GB, with 875 distinct values. The query execution time decreased 2.5 times when switching from S2 to S9, while the storage engine scan time decreased 3 times. Of course, more complicated queries would carry additional overhead in the formula engine, which, as we know, is single-threaded.

AAS Tier         Storage Engine Parallelism   Storage Engine Time (ms)   Total Execution Time (ms)
S2 (200 QPUs)    9x                           1,563                      1,922
S9 (640 QPUs)    29x                          516                        812

Tabular does an excellent job parallelizing the storage engine queries and maxing out all available cores in short bursts of time. Starting with SQL Server 2016 SP1, you should see substantial performance gains as the number of cores increases if the query is storage engine bound (which it will probably be with larger models).

I plan to update this post when the results from the official load testing we plan to do with this semantic model are available.

Too Many Measures?

I’m doing an assessment of a data mart and semantic layer for an organizational BI solution. I was given the source of the Analysis Services Tabular model. It took Visual Studio several minutes to open the project locally. Even worse, after the project finally loaded, my laptop’s CPU utilization went to 80-90% and all 16 GB of memory got exhausted rather quickly. With an empty workspace database! What’s going on?

As it turned out, a business requirement asked for all measures to be in the same table, so developers introduced a Metrics DAX calculated table and assigned it as the home table for all measures, organized in display folders. Nothing wrong with that. However, the number of DAX measures was 2,774! It’s common practice to “flatten” DAX measures, such as having separate measures for time calculations: SalesAmount, SalesAmountQTD, SalesAmountYTD, and so on. This can easily result in a measure explosion, but I hadn’t experienced such a performance hit before with other clients.

As it turned out, each measure requires a data structure even with no data. When you select a table in Visual Studio, Visual Studio sends a DAX query asking for the value of each measure in the table: EVALUATE CALCULATETABLE(ROW("Measure1", [Measure1], "Measure2", [Measure2]…. It does so because the Measure Grid shows not only the measure formulas but also their values. Currently, there is no way to suppress this query, not even if you hide the Measure Grid. When the server that hosts the workspace database receives the query, it allocates a data structure for each measure. When the query asks for thousands of measures, the server allocates a lot of CPU and memory resources just to evaluate the query. In this case, the server would exhaust all the memory on my laptop and return an “Out of memory” exception to the client. Visual Studio doesn’t give up and then switches to evaluating each measure separately, sending a separate EVALUATE CALCULATETABLE query measure by measure! Once the project is finally loaded, design performance is bad, especially in the Diagram view. Every task, even selecting a table or moving the Visual Studio window from one monitor to another, results in a CPU utilization spike, and it takes a few seconds until the UI becomes responsive.
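For illustration only (the measure names are placeholders), the batch query and the measure-by-measure fallback look roughly like this:

-- rough shape of the batch query Visual Studio sends for the Measure Grid
EVALUATE
CALCULATETABLE (
    ROW ( "Measure1", [Measure1], "Measure2", [Measure2] )
)

-- rough shape of the per-measure fallback after the batch query fails
EVALUATE
CALCULATETABLE ( ROW ( "Measure1", [Measure1] ) )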

I asked Microsoft to introduce a switch to suppress the EVALUATE CALCULATETABLE query, such as when the Measure Grid is disabled. I personally don’t care about the measure values in the grid, or about the Measure Grid at all for that matter. These values are meaningless in most cases anyway, such as when they show blank values for time calculations because there is no date context at design time.

While Microsoft is investigating and working on improving the SSDT performance, you can mitigate the performance issue with many measures by doing the following:

  1. If possible, assign measures to multiple home tables.
  2. Switch your project to Manual Calculation Mode (Model menu, Calculation Options, Manual Calculation). Note that manual calculation won’t process calculated columns and relationships automatically. You’d have to switch back to automatic calculation or explicitly process the model with Process Default when you make design changes.
  3. Create perspectives to reduce the number of tables in Diagram view.
  4. Give the Tabular Editor community tool a try.


Visual-level Formatting in Power BI

Scenario: You want to overwrite the formatting of a field in a Power BI visual, such as showing the field with decimals in one visual and without decimals in another. Or, you might want to overwrite the default format when connected live to a semantic model. You search left and right, bing the Internet, and still have no clue as to why Power BI doesn’t have this feature.

Answer (that you probably won’t like): Most visuals already support display units and decimals. Table and matrix will probably get this feature very soon. There is no ETA for a full custom format string override (per visual). Your vote counts, so vote on ideas.powerbi.com.

As a best practice, I recommend applying format settings in the model so that it’s applied consistently to all reports.

In the narrower case of formatting measures, you can create a new measure and change its formatting. This also works when Power BI Desktop connects live to a Tabular semantic model, because you can create local DAX measures on top of the semantic model. Just create a measure that piggybacks on the existing measure whose format you want to change. Then, use the Formatting section to change its format.
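For example, assuming the model exposes an [Internet Total Sales] measure (the name is just for illustration), the local wrapper measure is a one-liner; you’d then set its display units and decimal places in the Formatting section:

-- hypothetical report-level wrapper measure; format it without decimals
Internet Total Sales (Whole) = [Internet Total Sales]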


Reporting on Concatenated Field in DAX

Scenario: You have a concatenated field stored in a table. For example, a medical claim might have several denial reasons. Instead of representing this as a many-to-many relationship, you’ve decided to store it as a comma-delimited field, such as to allow the user to see all codes on one row. However, users are asking for counts grouped by each reason code.

Solution: Follow these steps to implement a DAX measure that dynamically parses the string.

  1. Implement a DenialReason table with a single column DenialReason that stores the distinct reason codes. Add the table to your Power BI Desktop/Tabular model. Leave it hanging, without a relationship.
  2. Add a CountByDenialReason DAX measure that parses the string:

    CountByDenialReason :=
    CALCULATE (
        SUMX (
            Claim,
            IF (
                NOT ISEMPTY (
                    FILTER (
                        VALUES ( DenialReason[DenialReason] ),
                        PATHCONTAINS (
                            SUBSTITUTE ( Claim[DenialReason], ",", "|" ),
                            DenialReason[DenialReason]
                        )
                    )
                ),
                1
            )
        )
    )

The NOT ISEMPTY clause checks if the row contains any reasons. PATHCONTAINS checks if the row has the denial reason whose context is passed from the report. In my case, the DenialReason field has a comma-delimited string when multiple reasons are stored on the row. Because PATHCONTAINS requires a pipe (“|”) delimiter, I use the SUBSTITUTE function to replace commas with pipes. If a match is found for that reason, the IF operator returns 1, which SUMX sums to return the actual count.
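To sanity-check the measure, you can run a query that groups it by the unrelated DenialReason table, which is essentially what a report visual would do. A minimal sketch, using the tables defined above:

EVALUATE
SUMMARIZECOLUMNS (
    DenialReason[DenialReason],
    "Claim Count", [CountByDenialReason]
)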

If you have a large table, you might get better performance if you do the replacement in a wrapper SQL view or in Query Editor and avoid SUBSTITUTE altogether.

SSRS Multivalue Parameters in DAX

UPDATE: Microsoft has added support for multi-valued parameters in the DAX Query Designer. Please read the discussion thread below for more info. You can use the approach discussed in this article if you need more control over the parameter handling, but the Microsoft-provided way should suffice for most cases and it’s easier to implement.

Déjà vu today, with a twist: SSRS multivalue parameters in SSAS reports, but this time in DAX. Now that the SSRS report designer supports DAX queries, we should be able to do everything we were able to do in MDX, right? Unfortunately, as you will quickly discover, Microsoft “forgot” about multivalue parameters when working on the DAX Query Designer.

You can use MDX (no shame there) and write queries the old way, but if you are a DAX purist, you’d need to take the road less traveled, which goes through the DAX rabbit hole.

Here are the high-level steps in the SSDT Report Designer/Report Builder. I tried my best to simplify them as much as I can:

  1. Do as much drag and drop as you can in the graphical DAX Query Designer to auto-generate the DAX query, as you won’t have another chance once you switch to text mode. You can also use the graphical mode to declare your parameter(s) and to let SSRS auto-generate the report-level parameters and queries.
  2. If you let the DAX Query Designer auto-generate the parameter queries, change the Available Values of the report-level parameters to use the ParameterCaption field (not the ParameterValue field). If the parameter uses default values, change the Default Values tab to set the default values by caption (not in the pipe-delimited format that the DAX Query Designer auto-generates). Again, that’s because we’ll use the parameter caption.
  3. Go to the properties of the main dataset, flip to the Parameters tab, and change the expression to concatenate the parameter values with a pipe (“|”), such as =Join(Parameters!DateFiscalYear.Value, "|"). You’ll see why in a moment.
  4. Now open the main DAX report query and switch to Text mode. Promise yourself never to go back to the Graphical mode (the one that lets you drag and drop). Click the Parameters button and reconfigure the parameter by selecting the empty value in the Dimension column. For testing the query inside the query designer, you might want to enter some pipe-delimited values in the Default column.
  5. Change the main query to support multivalue parameters. The following query highlights the important changes:

    EVALUATE
    SUMMARIZECOLUMNS (
        'Date'[Fiscal Year],
        FILTER (
            VALUES ( 'Date'[Fiscal Year] ),
            OR (
                ( @DateFiscalYear = "All" ),
                PATHCONTAINS ( @DateFiscalYear, 'Date'[Fiscal Year] )
            )
        ),
        "Internet Total Sales", [Internet Total Sales],
        "Reseller Total Sales", [Reseller Total Sales]
    )

I removed the variable (VAR) definitions (not needed after simplifying the query). The filter expression uses an OR condition. If the user selects the parameter’s “All” value, then all rows are returned. If specific values are selected, the PATHCONTAINS function returns TRUE for each row in the filtered column (‘Date'[Fiscal Year] in this case) that matches one of the selected values. If you have more parameters, simply add more FILTER clauses, as the sketch below shows.
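For example, if the report also had a hypothetical multivalue @ProductCategory parameter (joined with a pipe in its dataset Parameters expression, as in step 3), the query would gain a second FILTER clause:

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Fiscal Year],
    FILTER (
        VALUES ( 'Date'[Fiscal Year] ),
        OR (
            ( @DateFiscalYear = "All" ),
            PATHCONTAINS ( @DateFiscalYear, 'Date'[Fiscal Year] )
        )
    ),
    -- hypothetical second parameter, handled identically
    FILTER (
        VALUES ( 'Product'[Category] ),
        OR (
            ( @ProductCategory = "All" ),
            PATHCONTAINS ( @ProductCategory, 'Product'[Category] )
        )
    ),
    "Internet Total Sales", [Internet Total Sales],
    "Reseller Total Sales", [Reseller Total Sales]
)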

I attached a report to demonstrate the changes.

Power BI What-if Analysis

Veteran Excel users might have used the Excel What-If feature that lets you try several different sets of values in one or more formulas to explore various outcomes (scenarios). The August release of Power BI Desktop introduces a similar feature. You start by defining a What If parameter from the Modeling tab.

Don’t confuse the What If parameter with a query parameter, which is used in query parameter-driven properties, such as to change the data source the query connects to.

In the process of configuring the parameter, you define hardcoded minimum, maximum, and increment values. The outcome is a calculated table that generates the scale and a measure that returns the selected value.

Discount Percentage = GENERATESERIES(0, 0.5, 0.05)

Discount Percentage Value = SELECTEDVALUE('Discount Percentage'[Discount Percentage])

Next, you can use the “Value” measure like any other DAX measure. Typically, you would create a target measure that somehow depends on the What If parameter, e.g.:

Sales After Discount = SUM(InternetSales[SalesAmount]) - (SUM(InternetSales[SalesAmount]) * 'Discount Percentage'[Discount Percentage Value])

Now you can see how the target value changes when you change the What If value.

Power BI What If parameters allow you to parameterize your DAX measures. Continuing on the same path, I’d like to see expression-based properties and filters, as well as the ability to parameterize fields on the report, such as presenting the user with a list of measures and rebinding visualizations after the user selects which measure to see on the report.

Power BI Report Measures Over Tabular Models

The May release of Power BI Desktop adds the ability to define DAX calculated measures when Power BI Desktop is connected live to a Tabular model or Power BI datasets. This is conceptually similar to defining MDX calculated members in Excel connected to a cube. The measure definitions are local to the Power BI Desktop model (the Tabular model is not modified). You can do all measure-related tasks as when you define measures in the data model, such as changing the data type and formatting the measure or changing the home table. In the screenshot below, I’ve defined a YTD report measure over the Adventure Works Tabular model.


Behind the scenes, the DAX query generated by Power BI Desktop adds the measures as query-scoped measures in the /* USER DAX BEGIN/END */ section:

DEFINE MEASURE 'Reseller Sales'[Reseller Sales YTD] =
    (/* USER DAX BEGIN */
    TOTALYTD(SUM('Reseller Sales'[Sales Amount]), 'Date'[Date])
    /* USER DAX END */)
EVALUATE
ROW(
    "Reseller_Total_Sales", 'Reseller Sales'[Reseller Total Sales],
    "Reseller Sales YTD", 'Reseller Sales'[Reseller Sales YTD]
)

Report-level measures are a welcome enhancement. Taking this further, I’d like to see the ability to define report-level measures using the Quick Measure feature. Another feature that I’m waiting for is the ability to use custom measures (both defined in the model and report-level) in the new numeric range slicer (currently in preview).

Tabular DAX Editor

The Tabular toolset is getting better. One thing that I miss from Multidimensional is the cube script that lets you view all custom calculations in one place so that you can organize them any way you want, add comments, etc. This is why I contributed to the DAX Editor tool. Microsoft has taken notice and introduced a tool (also called DAX Editor) in the latest SSDT release. Read Kay Unkroth’s announcement here.

The Microsoft DAX Editor supports the old XML-based schema and the new JSON schema. On the upside, it gives you a break from the Measure Grid and the formula bar. On the downside, you can work on only one measure at a time. So, let’s leave it to marinate a few more months with the hope that we can finally have a Tabular script. As Kay commented at the end of his blog post, there is hope:

Yes, we are hearing this a lot from you guys! Having all expressions in a single document makes it easy to find and replace, search, etc. It’s on the backlog, but not yet on the top of the priorities. Looking at the higher prio work we still need to get done, it’s more mid-termish. But we know how we want to achieve this and we are laying down the foundation with this DAX Editor. Btw. it is much, much more than just an editor window. That is really just the tip of the iceberg. Same with the DAX query window in SSMS. The real beauty (and complexity) is in the DAX parser behind these windows, and a few other features like IntelliSense. It’s coming together. Brick by brick!

Meanwhile, use the community DAX Editor.
