Posts

How Filters Affect DAX Measures

xVelocity (the storage engine behind Tabular, Power BI and columnstore indexes) is an in-memory columnar database, and it’s such it’s not suitable for detail-level reports, such as transaction-level reports. I wrote about performance implications with xVelocity and detail-level reporting here and here. In general, the lower the report grain and the more columns you add to the report (customer first name, last name, invoice number, etc.), the slower the performance will be as more and more columns would need to be scanned and cross-joined.

A recent Tabular project brought another twist. Most measures (about 250 total) were produced on top of a biggish snapshot table (250 million rows) and Customer table (12 million rows). As a best practice, base measures were created with common filters and then other measures would piggy back on the base measures like Russian dolls. It’s a best practice because if you must change a filter, you need to change only the base measures. For example, a [Number of Open Accounts] measure would filter the snapshot to find how many open accounts the customer has. Then, a [Number of Open Accounts: Banks] would apply another filter:

[Number of open accounts ever] (‘Creditor'[Subscriber Segment] = “Bank”)

Then, a web app will let the end user specify the customer identifiers and send a DAX query to the model to request all the 250 measures. The client complains that the query takes a long time and there isn’t much difference in the query execution for one customer or 100. To Microsoft credit, had the measures have the same filter, a “fusion cache” (introduced I believe in 2016 under “super DAX”) would kick in to reduce queries to the storage engine. For example, if the query requests SalesAmount, OrderQuantity, and other measures and they all have the same filters, this fusion cache will cache queries to the storage engine and help with the query execution. But the cache doesn’t work if measures have different filters irrespective if their definitions are nested or not. Consequently, each measure sends a separate query to the storage engine. So, if the query requests 100 measures for one customer, the storage engine will be queried 100 times and the 250 million snapshot table will be scanned 100 times for that one customer even if there are only a few transactions for that customer. This is a terribly inefficient way!

The first thing a database developer would do to solve this straight in SQL is to create a temporary table to extract the data for the selected customers before calculating the measures. But Tabular is not SQL Server. It would be great if at some point, it could introduce a caching mechanism to let the developer copy a subset of “rows” in the cache (in this case, extract rows from the Account Snapshot for the selected customers), and then transparently redirecting the snapshot queries to the cache (like how Power BI aggregations work today).

As things stand today, one way you can reduce the query execution time is to eliminate large joins. In this case, the DAX query filters the Customer table because some measures are produced from other tables. But you can split the query into multiple queries for the different tables and then apply the customer filter directly on the table bypassing the Customer table. So, instead of:

FILTER (VALUES (Customer[Customer ID] ), ‘Customer[Customer ID] IN { }

We want to the filter for measures involving only the snapshot to be:

FILTERVALUES ( ‘Account Snapshot’[Customer ID] ), Account Snapshot’[Customer ID] IN { }

In this case, eliminating the Account Snapshot -> Customer join resulted in 50% improvement in the query execution time.

But you should also consider using the right tool for the job at hand and for detail-level reports this could be a relational database which reads data by rows and not columns.

T-SQL, Power Query, or DAX?

I taught my Applied Power BI class last week to a group of smart data analysts. All of them were knowledgeable of T-SQL, which they have been using extensively for years to shape and transform the data and produce SSRS, Qlik and now Power BI reports. They haven’t previously used Power Query, but they liked my overview of its features. However, they were rightfully confused when to use T-SQL, Power Query, or DAX. Starting with the data source and moving up the chain, Power BI lets you use expressions in:

  1. Data source (custom SQL Query, SQL view)
  2. Power Query
  3. DAX calculated columns
  4. DAX measures (the uppermost level)

My advice is to invest time and learn all these capabilities. When you have diverse skills, you can use the best tool for the task at hand.

Although is hard to generalize as every data transformation is different, I recommend you shape the data as upstream (closer to the data source) as possible. So, if you are familiar with T-SQL and that’s your primary data source, then use T-SQL. As one of the Microsoft best products, SQL Server has been enjoying 30 years of continuous improvements. Not only can you apply the skills you already have, but you will gain in performance when you use T-SQL and delegate data crunching to SQL Server which is what it’s designed to do. You can also benefit from the rich data manipulation features of T-SQL. These are the same reasons why I favor the ELT (Extract, Load and Transform) pattern in organizational BI solutions instead of SSIS data flow transforms. And if you find T-SQL lacking in features, Power Query can supplement it nicely, such as to fill down missing values, quickly unpivot data, or apply fuzzy lookup.

When you connect to data sources that don’t support SQL, such as flat files or Excel, the next natural place is Power Query to shape and transform the data. The choice between Power Query and DAX calculated columns is a tricky one and typically involves a compromise between performance and skill set. In many cases, you’ll find that a custom column can be implemented both in Power Query and DAX calculated columns. In general, if performance is OK, use Power Query, which is further upstream than DAX calculated columns. As a bonus, your data model will see the custom columns as regular columns and compress them equally well. Consider DAX calculated columns (one level up Power Query) when:

  • You must use DAX features that Power Query lacks, such as ranking.
  • When Power Query transforms lead to long data refresh times, such as a lookup between two large tables.

Finally, while custom columns can be often implemented in any of the first three layers, DAX measures are unique, and they typically can be implemented in DAX only. For example, if you need the expression to reflect the end user filter selection, you must use a DAX measure because only DAX measures are evaluated at run time and can access runtime conditions, such as values from filters and slicers.

The Cost of xVelocity Relationships

How expensive is an xVelocity relationship? The answer depends on several factors, such as column cardinality, DAX calculations, and query itself. But it general, a relationship can add a significant overhead. Consider two tables: Customer (7.1 million rows) and Account Snapshot (187 million rows) and a relationship ‘Account Snapshot'[CustomerKey] -> Customer[CustomerKey]. I’ll use two queries for the test. The Relationship query uses the ‘Account Snapshot'[CustomerKey]->Customer[CustomerKey] relationship, while the No Relationship query scans directly the Account Snapshot table bypassing the relationship.

Relationship No Relationship
EVALUATE
CALCULATETABLE (
ADDCOLUMNS (
FILTER (
VALUES ( ‘Customer'[CustomerKey], … ),
‘Customer'[CustomerKey] IN { 731102730822895922 }
),
“Balance”, [Balance]
),
FILTER ( VALUES ( ‘Date'[Date] ), ‘Date'[Date] = DATE ( 2018, 3, 31 ) )
)
EVALUATE
CALCULATETABLE (
ADDCOLUMNS (
FILTER (
VALUES ( ‘Account Snapshot'[CustomerKey], … ),
‘Account Snapshot'[CustomerKey] IN { 731102730822895922}
),
“Balance”, [Balance]
),
FILTER ( VALUES ( ‘Date'[Date] ), ‘Date'[Date] = DATE ( 2018, 3, 31 ) )
)
1 customer: 198 (181/17) 1 customer: 141 (84/57)
1,000 customers: 901(547/354) 1,000 customers: 1,194 (315/877)

Each of these DAX queries results in two significant queries sent to the storage engine. The first calculates the measure for each customer and the second returns the selected customers (the filter clause in the query).  In the case of the Relationship query, this is what the first SE call looks like:

SELECT
‘Customer'[CustomerKey],
SUM ( ‘Account Snapshot'[Balance Base] )
FROM ‘Account Snapshot’
LEFT OUTER JOIN ‘Customer’ ON ‘Account Snapshot'[CustomerKey]=’Customer'[CustomerKey]

And, of course, there is no join for the No Relationship query. The statistics below the queries shows the query execution for 1 and 1,000 customers (customer identifiers were comma-separated and added to the IN clause). The first number shows the overall execution time in the storage engine while the number in parenthesis show the breakdown of the two SE queries.

As we can see, for one customer the relationship adds more than twice of overhead (181 vs 84 milliseconds). But for 1,000 customers, the overall execution time for the No Relationship query is higher 1,194 vs 901. How come? The first storage query is still faster (almost x2) but the second SE query is a way slower. As it turns out, the storage engine locates 1,000 rows in the smaller table (Customer) much faster than in the snapshot table when it executes the filter query. However, if we add more measures than at some point the No Relationship query would become faster. In a real-life project where the query requested some 200 measures, queries without relationship executed x3 faster than the ones with relationships.

xVelocity relationships between large tables could be expensive especially when the query requests many measures. If your query semantics allows it, consider denormalizing attributes from large dimension tables into fact tables to eliminate relationships.

One case where the query semantics might require a relationship is if it requests measures from multiple fact tables. But if you find that eliminating the relationship results in a significant performance boost, consider breaking down the query to request measures from each table and then union the result.

2-Day Applied DAX with Power BI Workshop in Atlanta (Oct 15-16)

Struggling with DAX? Data Analysis Expressions (DAX) is the expression language of Power BI, Power Pivot, and Analysis Services Tabular. It’s very powerful but it’s usually perceived as complex and requiring a steep learning curve. Taught by an established expert, this two-day workshop is designed to help you become proficient with DAX. Think of this workshop as advanced Power BI training. Reserve your seat today to attend this insightful 2-day workshop for only $999, when Teo Lachev (CEO of Prologika and Microsoft Data Platform MVP) teaches you the necessary DAX skills that you can immediately apply to your job.

Event Summary

  • Date: October 15 and 16
  • Time: 8:30 AM – 5 PM
  • Location: Microsoft Office in Alpharetta
  • Catering: lunch provided
  • Syllabus: available here
  • Price: $999 (use coupon SQL201810 for an instant 5% discount if you sign up two or more attendees from your company)
  • For more information and to register, go to http://bit.ly/daxworkshop201810

Key Benefits

  • Understand best practices for model design
  • Create advanced calculated columns
  • Master measures and evaluation contexts
  • Apply time intelligence
  • Work with advanced relationships and data security
  • Learn how to troubleshoot performance issues
  • Brainstorm your DAX puzzles
  • and much more…

091818_1243_2DayApplied1.png

New “Applied DAX with Power BI” Workshop

Data Analysis Expressions (DAX) is the expression language of Power BI, Power Pivot, and Analysis Services Tabular. It’s very powerful but it’s perceived as complex, requiring a steep learning curve. I’m excited to announce a new 2-day “Applied DAX with Power BI” workshop that I designed to help you become proficient with DAX. You’ll learn practical skills that will help you tackle a wide range of reporting requirements. We’ll start with DAX fundamentals, such as calculated columns and measures, and then progress to more advanced concepts, including such as context transitions, variables, filters, time intelligence, advanced relationships, row-level security, query optimization, and much more. Think of this workshop as Advanced Power BI and the next level from my “Applied Power BI” class. The target audience is data analysts and BI developers wanting to hone their DAX skills with Power BI, Power Pivot, or Tabular.

Here is my entire training catalog with a brief description and link to each course page.

COURSE DESCRIPTION

DURATION
(days)

Applied Power BI
This two-day workshop is designed to help you become proficient with Power BI and acquire the necessary skills to work with online and on-premises data, implement data models on a par with professional models created by BI pros, unlock the power of data by creating interactive reports and dashboards, and share insights with other users. No prior data modeling or reporting knowledge is assumed. Students are welcome to bring their own data to the second day of the class.

2

Applied DAX with Power BI
NEW!
Data Analysis Expressions (DAX) is the expression language of Power BI, Power Pivot, and Analysis Services Tabular. It’s very powerful but it’s usually perceived as complex requiring a steep learning curve. This two-day class is designed to help you become proficient with implementing business calculations with Data Analysis Services (DAX).

2

Applied BI Semantic Model Targeting BI developers, this intensive 5-day onsite class is designed to help you become proficient with Analysis Services and acquire the necessary skills to implement Tabular and Multidimensional semantic models. Use the opportunity to ask questions and study best practices that will help you achieve a single version of the truth by implementing scalable and secure organizational models. Bring your organizational BI to the next level by learning these two powerful BI technologies in one class!

5

Applied SQL Fundamentals SQL Server is the most deployed and popular database today. Different types of users need to query data stored in SQL Server data structures. This 2-day instructor led course provides you with the necessary skills to query Microsoft SQL Server databases with Transact-SQL. This course starts with the basics of a SELECT statement and its syntax, and progresses to teach you how to join, aggregate, and convert data.

2

Applied Microsoft BI (End to End) This four-day class is designed to help you become proficient with the Microsoft BI toolset and acquire the necessary skills to implement an organizational BI solution. You’ll learn how to design a star schema, use SQL Server Integration Services to transform data, and implement a Tabular semantic model. Depending on the students’ skillset, it can be customized, such as to reduce coverage of specific technologies, replace them with other topics of interest, such as Multidimensional instead of Tabular, or cover additional topics, such as Power BI or Reporting Services.

4

Applied Reporting Services Microsoft SQL Server Reporting Services has evolved into a sophisticated reporting platform that lets you present and analyze data consistently, quickly, and reliably. This intensive 3-day class is designed to help you become proficient with Reporting Services and acquire the necessary skills to author, manage, and deliver reports.

3

Applied Analysis Services-Multidimensional This intensive four-day class is designed to help you become proficient with Analysis Services (Multidimensional) and acquire the necessary skills to implement OLAP and data mining solutions. Learn how to build a cube from scratch. Use the opportunity to ask questions and study best practices!

4

Applied Excel and Analysis Services If your organization have Analysis Services Multidimensional cubes or Tabular models and you want to gain valuable insights from them in Excel, then this course is for you. Designed as a step-by-step tour, this course teaches business users how to become data analysts and unlock the hidden power of data. You’ll learn how to apply the Excel desktop BI capabilities to create versatile reports and dashboards for historical and trend analysis.

2

Applied Power BI with Excel Power BI is a suite of products for personal business intelligence (BI). It brings the power of Microsoft’s Business Intelligence platform to business users. At the same time, Power BI lets IT monitor and manage published models to track their usage, security, and estimate hardware and software resources. With Power BI, anyone can easily build personal BI models using the most popular tool – Excel and share them on premises or the cloud.

2

Applied Microsoft Visualization Tools This two-day class is designed to help you learn the visualization tools that are included in the Microsoft Data Analytics Platform. We’ll start by exploring the Excel reporting capabilities that include pivot (PivotTable and PivotChart) and Power View reports. Then, you’ll learn how to explore data interactively with Power BI Desktop. During the second day of the class, we’ll focus on learning how to create paginated reports with Reporting Services. The class can be customized to discuss other tools, such as SSRS mobile reports or Power Map reports.

2

Applied Master Data Management This two-day class is designed to help you become proficient with Master Data Services (MDS) and Data Quality Services (DQS). IT and business users learn how to design MDS models and extend them with business rules, attribute groups, and hierarchies. IT will learn how to integrate MDS with upstream and downstream systems and how to enforce secured access. Business users will learn how to use Excel to manage data with the tool they love most!

2

DAX Variables to Rescue

Scenario: DAX has its own share of idiosyncrasies that can humble both novice and experienced users. Consider a common example where a measure attempts to return sales for the last date in the Date table. What makes this common is that many real-life calculations require measures that evaluate as of the user-specified date (as of date), which is what MAX(‘Date'[Date]) returns.

=CALCULATE(SUM(Sales[SalesAmount]), MAX(Date[CalendarYear]) )

As innocent and logically correct this measure is, it fails with “A function ‘MAX’ has been used in a True/False expression that is used as a table filter expression. This is not allowed.” It’s also amusing to see even Microsoft struggling to explain the reason of this error which at some point was on the internal top 10 Microsoft list for DAX “ease of use”. This is what the documentation states about this error:

“The filter expression, MAX(‘Date'[CalendarYear]) attempts to return the largest numeric value in the CalendarYear column. However, in context of the measure expression, it cannot be passed as a table filter expression to the CALCULATE function, causing an error.”

This of course is incorrect. First, the MAX function doesn’t return a table but a scalar value. Second, the CALCULATE function is perfectly capable of taking Boolean expressions. The actual issue is that the Boolean expression is surrounded by a hidden CALCULATE and it’s ambiguous in what context the maximum date should be evaluated. To be consistent with the way filters propagate, it should be in the filter context outside of CALCULATE, but in the row context of the as-of date, which becomes a filter context with MAX expression. But this is not what you would expect, so DAX fails safe with the error.

The workaround suggested by the documentation is to filter the Date table and pass it as a table filter to calculate. This requires ignoring first the filter context on the Date table, only to overwrite it later with the ‘as of date’.

=CALCULATE( SUM(Sales[SalesAmount]), FILTER( ALL( ‘Date’[CalendarYear]), [CalendarYear] = MAX(‘Date’[CalendarYear]) ) )

Solution: A better solution is to use a variable. This example defines an EOP (End of Period) variable.

= VAR EOP = MAX(Date[CalendarYear]) RETURN CALCULATE(SUM(Sales[SalesAmount]), [CalendarYear] = EOP )

Because the EOP variable is evaluated where it’s defined there is no hidden context and the measure works. Unfortunately, as it stands today, DAX doesn’t allow us to create a global session variable for such scenarios. So, you’d need to include this variable in every measure that requires it.

Speaking of issues with the DAX documentation and variables, here is another example (last example on that page) that doesn’t work:

YoY% = var Sales = SUM(SalesTable[SalesAmount]) var SalesLastYear=CALCULATE(Sales, SAMEPERIODLASTYEAR(‘Calendar‘[Date])) return
if(Sales, DIVIDE(Sales – SalesLastYear, Sales))

This doesn’t work because the SalesLastYear var attempts to overwrite the context of the Sales var. In other words, it attempts to treat it as a measure but SalesLastYear would always return this year sales. The correct example should be:

YoY% = var Sales = SUM(SalesTable[SalesAmount]) var SalesLastYear=CALCULATE(SUM(SalesTable[SalesAmount]), SAMEPERIODLASTYEAR(‘Calendar‘[Date])) return
if(Sales, DIVIDE(Sales – SalesLastYear, Sales))

The documentation should say that although one variable can reference another, it can’t overwrite its context. Also, to avoid a logical bug, the example should actually be:

YoY% = var Sales = SUM(SalesTable[SalesAmount]) var SalesLastYear=CALCULATE(SUM(SalesTable[SalesAmount]), SAMEPERIODLASTYEAR(‘Calendar‘[Date])) return
if(AND(Sales, SalesLastYear), DIVIDE(Sales – SalesLastYear, Sales))

Because it’s meaningless to calculate YoY if this year or last year sales are missing.


Notes on Analysis Services Performance and Parallelism

As a part of a semantic model assessment, I got to experiment with different configurations of Azure Analysis Services. In this case, the largest fact table has almost 2 billion rows so it’s a good size model. I was particularly interested the correlation between a higher performance tier and query performance. One thing that I like about PaaS is that it’s easy to change and experiment with different configurations. The thing that I don’t like is that I can’t peek under the hood. This is probably done on purpose to keep people like me out, just like these green nets surrounding construction zones. So, I don’t know what hardware AAS is running on and how it’s configured. And I failed to get clarification from Microsoft. So, I might be out of line here, but I thought my findings are worth sharing albeit not officially vetted.

AAS Basic and Standard tiers are advertised as dedicated tiers running on dedicated virtual machines preconfigured with a certain number of cores that are a 20x multiple of the tier QPUs. For example, S2 tier (200 QPUs) comes with 10 cores while S9 (640 QPUs) comes with 32 cores. If you examine the advanced properties of the AAS instance in SSMS, you’ll see Group Affinity masks that confirms this. For example, the mask for S2 is x3FF (or xFFC) which converted to binary is 1111111111, whereas the mask for S9 is FFFFFFFFFFFF for 32 1’s. I don’t know why masks are used given that the VM has already a predefined number of cores. One, and most probable, explanation is that Microsoft co-hosts multiple Analysis Services instances on the same VM. When you provision a new AAS instance, Azure finds a VM that has enough capacity and spins the new service there. If you upgrade AAS, it could get relocated to a different VM if the old one doesn’t have enough capacity. Or, Microsoft might allocates more cores to the VM and a subset to AAS.

Anyone who’s done parallel programming knows that it’s not easy. Safe access and locks need to be used to synchronize access and protect shared resources. The AS xVelocity engine (aka Vertipaq) is a multi-threaded application that accesses RAM as a shared resource. As each thread competes for a global lock for memory allocation/deallocation, scalability decreases. Because processor cores reserve memory in chunks, called cache lines, additional synchronization is required when threads access memory location that are close to each other. Before SQL Server 2016 SP1, this was a serious issue with Tabular. As we’ve learned, starting with SQL Server 2016 SP1, Microsoft switched to using the Intel Threading Building Clocks (TBB) C++ library. Specifically, Tabular now uses the TBB scalable memory allocator. Mind you that TBB is not a CPU-specific extension so any modern Intel CPU should get these benefits. “Analysis Services SP1 uses an Intel TBB-based scalable allocator that provides separate memory pools for every core. As the number of cores increases, the system can scale almost linearly” and “The Intel TBB-based scalable allocator is also expected to help mitigate performance problems due to heap fragmentation that have been shown to occur with the Windows Heap”. Further, starting with SP1, Tabular is NUMA aware with 4-node NUMA system but I don’t know if the AAS VMs are preconfigured for NUMA.

These changes seem to be very beneficial and I do see the promised linear scalability (or close to it) as the number of cores increase. After all, how would you explain a customer that the query times remain the same when they switch to a higher tier? I used this query as a sample query to measure the memory scan performance (the filter is used to prevent Tabular short-circuiting the count by using internal statistics):

EVALUATE CALCULATETABLE (ROW ( “result”, DISTINCTCOUNT ( ‘Table'[day_id] ) ),    ‘Table'[day_id] > 0)

The day_id column has a 2.5 GB of total column size with 875 distinct values. The query execution time decreased 2.5 times when switching from S2 to S9, while the storage engine scan time decreased 3 times. Of course, more complicated queries would carry additional overhead in the formula engine which as we know is single threaded.

AAS Tier Storage Engine
Parallelism
Storage Engine Time (ms) Total Execution Time
(ms)
S2 (200 QPUs) 9x 1,563 1,922
S9 (640 QPUs) 29x 516 812

Tabular does an excellent job parallelizing the storage engine queries and maxing out all available cores in short bursts of time. Staring with SQL Server 2016 SP1, you should see substantial performance gains as the number of cores increase if the query is storage engine bound (which it will probably be with larger models).

I plan to update this blog when results from official load testing we plan to do with this semantic model are available.

Too Many Measures?

I’m doing an assessment of a data mart and semantic layer for an organizational BI solution. I was given the source of the Analysis Services Tabular model. Upon opening it locally, it took Visual Studio several minutes to open the project. Even worse, after the project was finally loaded, my laptop CPU utilization went into 80-90% and all the 16 GB of memory got exhausted rather quickly. With an empty workspace database! What’s going on?

As it turned out, a business requirement asked for all measures to be in the same table. So, developers introduced a Metrics DAX calculated table and assign it as a home table for all measures organized in display folders. Nothing wrong with that. However, the number of DAX measures were 2,774! It’s common practice to “flatten” DAX measures, such as to have separate measure for time calculations: SalesAmount, SalesAmountQTD, SalesAmountYTD, and so on. This can easily result in a measure explosion, but I haven’t experienced such a performance hit before with other clients.

As it turned out, each measure requires a data structure even with no data. When you select a table in Visual Studio, Visual Studio sends a DAX query asking for the value of each measure in the table: EVALUATE CALCULATETABLE(ROW, “Measure1”, [Measure1], “Measure2”, [Measure2]… It does so because the Measure Grid shows not only the measure formulas but also their values. Currently, there is no way to suppress this query, not even if you hide the Measure Grid. When the server that hosts the workspace database receives the query, it allocates a data structure for each measure. When the query asks for thousands of measures, the server would allocate a lot of CPU and memory resources just to evaluate the query. In this case, the server would exhaust all the memory on my laptop and return an “Out of memory” exception to the client. Visual Studio doesn’t give up and switches then to evaluating each measure separately by sending separate EVALUATE CALCULATETABLE query measure by measure! Once the project is finally loaded, design performance is bad, especially in the Diagram view. Every task, even selecting a table or moving the Visual Studio window from one monitor to another, results in a CPU utilization spike and takes a few seconds until UI becomes responsive.

I asked Microsoft to introduce a switch to suppress the EVALUATE CALCULATETABLE query, such as when the Measure Grid is disabled. I personally don’t care at all about measures values in the grid and I don’t care about the measure grid at all for that matter. These values are meaningless in most cases anyway, such as when they show blank values for time calculations since there is no date context at design time.

While Microsoft is investigating and working on improving the SSDT performance, you can mitigate the performance issue with many measures by doing the following:

  1. If possible, assign measures to multiple home tables.
  2. Switch your project to Manual Calculation Mode (Model menu, Calculation Options, Manual Calculation). Note that manual calculation won’t process calculated columns and relationship automatically. You’d have to switch to automatic calculation or explicitly process the model with Process Default when you make design changes.
  3. Create perspectives to reduce the number of tables in Diagram view.
  4. Give the Tabular Editor community tool a try.

031818_1803_TooManyMeas1.png

Visual-level Formatting in Power BI

Scenario: You want to overwrite the formatting of some field in a Power BI Visual, such as in the case of showing the field in one visual with decimals and  then in another without decimals. Or, you might want to overwrite the default format when connected live to a semantic model. You search left and right, bing the Internet, and still no clue as to why Power BI doesn’t have this feature.

Answer (that you probably won’t like): Most visuals already support display units and decimals. Table and matrix would probably get this feature very soon. We don’t have ETA for full custom format string override (per visual). Your vote counts so vote on ideas.powerbi.com.

As a best practice, I recommend applying format settings in the model so that it’s applied consistently to all reports.

In the narrower case of formatting measures, you can create a new measure and change its formatting. This also works when Power BI Desktop connects to a Tabular semantic model because you can create local DAX measures on top of the semantic model. Just create a measure that piggy-backs on an existing measure, whose format you want to change. Then, use the Formatting section to change its format.

020718_1703_Visuallevel1.png

Reporting on Concatenated Field in DAX

Scenario: You have a concatenated field stored in a table. For example, a medical claim might have several denial reasons. Instead of representing this as a Many-to-Many relationship, you’ve decided to store this a comma-delimited field, such as to allow the user to see all codes on one row. However, users are asking to produce counts grouped by each reason code, such as this one:

Solution: Follow these steps to implement a DAX measure that dynamically parses the string.

  1. Implement a DenialReason table with a single column DenialReason that stores the distinct reason codes. Add the table to your Power BI Desktop/Tabular model. Leaving it hanging without a relationship.
  2. Add a CountByDenialReason DAX measure that parses the string:

    CountByDenialReason :=
    CALCULATE (
        SUMX (
    Claim,
            IF (
                NOT ISEMPTY (
                    FILTER (
                        VALUES ( DenialReason[DenialReason] ),
                        PATHCONTAINS (
                            SUBSTITUTE ( Claim[DenialReason], “,”“|” ),
    DenialReason[DenialReason]
                        )
                    )
                ),
                1
            )
        )
    )

The NOT ISMPTY clause checks if the row contains any reasons. The PATHCONTAINS checks if the row has the denial reason whose context is passed from the report. In my case, the DenialReason field has a comma-delimited string if multiple reasons are stored on the row. Because PATHCONTAINS requires a pipe “|” delimiter, I use the SUBSTITUTE function to replace commas with pipes. If match is found for that reason, the IF operator returns one, which is summed in SUMX to return the actual count.

If you have a large table, you might get a better performance if you do the replacement in a wrapper SQL view or in Query Editor and avoid SUBSTITUTE altogether.

Events

Nothing Found

Sorry, no posts matched your criteria