Tag Archive for: Performance

Posts

Fabric Direct Lake: Memory Utilization with Interactive Operations

August 15, 2024/0 Comments/in Blog/by Prologika - Teo Lachev

As I mentioned in my “Power BI and Fabric Capacities: Thinking Outside the Box” post, the memory limits of Fabric capacities can be rather restrictive for large semantic models with imported data. One relatively new option for combating out-of-memory scenarios, which deserves to be evaluated if Fabric is in your future, is semantic models configured for Direct Lake storage. This blog covers the results of limited testing that I did to compare, side by side, the memory utilization of two identical semantic models, the first configured to import data and the second to use Direct Lake storage. If you need a Direct Lake primer, Chris Webb has done a great job covering its essentials here and here. As a disclaimer, the emphasis is on “limited”, as these results reflect my personal observations based on some isolated tests I’ve done lately. Your results may and probably will vary considerably.

Understanding the Tests

My starting hypothesis was that Direct Lake on-demand loading would utilize memory much more efficiently for interactive operations, such as Power BI report execution. This is a bonus on top of the fact that Direct Lake models don’t require data refresh. Eliminating refresh could save a tremendous amount of memory to start with, even if you apply advanced techniques such as incremental refresh or hybrid tables to models with imported data. Therefore, the tests that follow focus on memory utilization with interactive operations.

To test my hypothesis, I imported the first three months of 2016 from the NY yellow taxi Azure open dataset into a lakehouse backed by a Fabric F2 capacity. This resulted in 34.5 million rows distributed across several Delta Parquet files. I limited the data to three months because F2 ran out of memory around the 50 million row mark with the error “This operation was canceled because there wasn’t enough memory to finish running it. Either reduce the memory footprint of your dataset by doing things such as limiting the amount of imported data, or if using Power BI Premium, increase the memory of the Premium capacity where this dataset is hosted. More details: consumed memory 2851 MB, memory limit 2851 MB, database size before command execution 220 MB”.

The error is descriptive enough and in line with the F2 memory limit of 3 GB maximum per semantic model. I used Power BI Desktop to import all that data into a YellowTaxiImported semantic model, which I published to Power BI Service and configured for large storage format. Then, I created online a second semantic model, YellowTaxiDirectLake, configured for Direct Lake storage and mapped directly to the data in the lakehouse. I went back to Power BI Desktop to whip up a few analytical (aggregate) queries and a few detail-level queries. Finally, I ran a few tests using DAX Studio.

Analyzing Import Mode

Even after a capacity restart, the YellowTaxiImported model immediately reported 1.4 GB of memory. My conclusion was that the primary focus of Power BI Premium on-demand loading, which was introduced a while back, was to speed up the first query after the model was evicted from memory. Indeed, I saw that many segments were memory resident and many weren’t, but using queries to touch the non-resident columns didn’t increase the memory footprint. The following table lists the query execution times with “Clear On Run” enabled in DAX Studio (to avoid skewing due to cached query data).

Naturally, the more detailed the queries get, the slower they run, because VertiPaq is a columnar database. However, the important observation is that the memory footprint remains constant. Please note that Fabric allocates additional memory to execute the queries, so the memory footprint should grow as the report load increases.

Query	Duration (ms)
//Analytical query 1 EVALUATE ROW( "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])), "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])), "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])) )	124
//Analytical query 2 DEFINE VAR __DS0Core = SUMMARIZECOLUMNS( 'nyc_yellowtaxi'[puMonth], "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])) ) VAR __DS0BodyLimited = SAMPLE(3502, __DS0Core, 'nyc_yellowtaxi'[puMonth], 1) EVALUATE __DS0BodyLimited ORDER BY 'nyc_yellowtaxi'[puMonth]	148
//Analytical query 3 DEFINE VAR __SQDS0Core = SUMMARIZECOLUMNS( 'nyc_yellowtaxi'[doLocationId], "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance])) ) VAR __SQDS0BodyLimited = TOPN(50, __SQDS0Core, [AveragetripDistance], 0) VAR __DS0Core = SUMMARIZECOLUMNS( 'nyc_yellowtaxi'[doLocationId], __SQDS0BodyLimited, "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance])) ) VAR __DS0PrimaryWindowed = TOPN(1001, __DS0Core, [AveragetripDistance], 0, 'nyc_yellowtaxi'[doLocationId], 1) EVALUATE __DS0PrimaryWindowed ORDER BY [AveragetripDistance] DESC, 'nyc_yellowtaxi'[doLocationId]	114
//Analytical query 4 DEFINE VAR __SQDS0Core = SUMMARIZECOLUMNS( 'nyc_yellowtaxi'[doLocationId], "AveragetripDistance", CALCULATE(AVERAGE('nyc_yellowtaxi'[tripDistance])) ) VAR __SQDS0BodyLimited = TOPN(50, __SQDS0Core, [AveragetripDistance], 0) VAR __DS0Core = SUMMARIZECOLUMNS( 'nyc_yellowtaxi'[rateCodeId], __SQDS0BodyLimited, "AveragetipAmount", CALCULATE(AVERAGE('nyc_yellowtaxi'[tipAmount])) ) VAR __DS0BodyLimited = SAMPLE(3502, __DS0Core, 'nyc_yellowtaxi'[rateCodeId], 1) EVALUATE __DS0BodyLimited ORDER BY 'nyc_yellowtaxi'[rateCodeId]	80
//Detail query 1 DEFINE VAR __DS0FilterTable = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0) VAR __DS0FilterTable2 = FILTER( KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])), 'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2) ) VAR __DS0Core = SUMMARIZECOLUMNS( ROLLUPADDISSUBTOTAL( ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal" ), __DS0FilterTable, __DS0FilterTable2, "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])), "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])), "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])) ) VAR __DS0PrimaryWindowed = TOPN( 502, __DS0Core, [IsGrandTotalRowTotal], 0, 'nyc_yellowtaxi'[startLat], 1, 'nyc_yellowtaxi'[startLon], 1 ) EVALUATE __DS0PrimaryWindowed ORDER BY [IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]	844
//Detail query 2 DEFINE VAR __DS0FilterTable = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0) VAR __DS0FilterTable2 = FILTER( KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])), 'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2) ) VAR __DS0Core = SUMMARIZECOLUMNS( ROLLUPADDISSUBTOTAL( ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal" ), __DS0FilterTable, __DS0FilterTable2, "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])), "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])), "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])) ) VAR __DS0PrimaryWindowed = TOPN( 502, __DS0Core, [IsGrandTotalRowTotal], 0, 'nyc_yellowtaxi'[startLat], 1, 'nyc_yellowtaxi'[startLon], 1 ) EVALUATE __DS0PrimaryWindowed ORDER BY [IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]	860
//Detail query 3 DEFINE VAR __DS0FilterTable = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0) VAR __DS0FilterTable2 = FILTER( KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])), 'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2) ) VAR __DS0Core = SUMMARIZECOLUMNS( ROLLUPADDISSUBTOTAL( ROLLUPGROUP('nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]), "IsGrandTotalRowTotal" ), __DS0FilterTable, __DS0FilterTable2, "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])), "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])), "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])) ) VAR __DS0PrimaryWindowed = TOPN( 502, __DS0Core, [IsGrandTotalRowTotal], 0, 'nyc_yellowtaxi'[startLat], 1, 'nyc_yellowtaxi'[startLon], 1 ) EVALUATE __DS0PrimaryWindowed ORDER BY [IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon]	1,213
//Detail query 4 (All Columns) DEFINE VAR __DS0FilterTable = FILTER(KEEPFILTERS(VALUES('nyc_yellowtaxi'[startLat])), 'nyc_yellowtaxi'[startLat] <> 0) VAR __DS0FilterTable2 = FILTER( KEEPFILTERS(VALUES('nyc_yellowtaxi'[tpepPickupDateTime])), 'nyc_yellowtaxi'[tpepPickupDateTime] < DATE(2016, 1, 2) ) VAR __ValueFilterDM1 = FILTER( KEEPFILTERS( SUMMARIZECOLUMNS( 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon], 'nyc_yellowtaxi'[paymentType], 'nyc_yellowtaxi'[vendorID], 'nyc_yellowtaxi'[improvementSurcharge], 'nyc_yellowtaxi'[doLocationId], 'nyc_yellowtaxi'[puLocationId], 'nyc_yellowtaxi'[storeAndFwdFlag], 'nyc_yellowtaxi'[tpepDropoffDateTime], 'nyc_yellowtaxi'[tpepPickupDateTime], __DS0FilterTable, __DS0FilterTable2, "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])), "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])), "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])), "SumpassengerCount", CALCULATE(SUM('nyc_yellowtaxi'[passengerCount])), "SumtripDistance", CALCULATE(SUM('nyc_yellowtaxi'[tripDistance])), "SumendLat", CALCULATE(SUM('nyc_yellowtaxi'[endLat])), "SumendLon", CALCULATE(SUM('nyc_yellowtaxi'[endLon])), "Sumextra", CALCULATE(SUM('nyc_yellowtaxi'[extra])), "SumfareAmount", CALCULATE(SUM('nyc_yellowtaxi'[fareAmount])), "SummtaTax", CALCULATE(SUM('nyc_yellowtaxi'[mtaTax])), "SumpuMonth", CALCULATE(SUM('nyc_yellowtaxi'[puMonth])), "SumpuYear", CALCULATE(SUM('nyc_yellowtaxi'[puYear])), "CountrateCodeId", CALCULATE(COUNTA('nyc_yellowtaxi'[rateCodeId])) ) ), [SumtollsAmount] > 0 ) VAR __DS0Core = SUMMARIZECOLUMNS( ROLLUPADDISSUBTOTAL( ROLLUPGROUP( 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon], 'nyc_yellowtaxi'[paymentType], 'nyc_yellowtaxi'[vendorID], 'nyc_yellowtaxi'[improvementSurcharge], 'nyc_yellowtaxi'[doLocationId], 'nyc_yellowtaxi'[puLocationId], 'nyc_yellowtaxi'[storeAndFwdFlag], 'nyc_yellowtaxi'[tpepDropoffDateTime], 'nyc_yellowtaxi'[tpepPickupDateTime] ), "IsGrandTotalRowTotal" ), __DS0FilterTable, __DS0FilterTable2, __ValueFilterDM1, "SumtotalAmount", CALCULATE(SUM('nyc_yellowtaxi'[totalAmount])), "SumtipAmount", CALCULATE(SUM('nyc_yellowtaxi'[tipAmount])), "SumtollsAmount", CALCULATE(SUM('nyc_yellowtaxi'[tollsAmount])), "SumpassengerCount", CALCULATE(SUM('nyc_yellowtaxi'[passengerCount])), "SumtripDistance", CALCULATE(SUM('nyc_yellowtaxi'[tripDistance])), "SumendLat", CALCULATE(SUM('nyc_yellowtaxi'[endLat])), "SumendLon", CALCULATE(SUM('nyc_yellowtaxi'[endLon])), "Sumextra", CALCULATE(SUM('nyc_yellowtaxi'[extra])), "SumfareAmount", CALCULATE(SUM('nyc_yellowtaxi'[fareAmount])), "SummtaTax", CALCULATE(SUM('nyc_yellowtaxi'[mtaTax])), "SumpuMonth", CALCULATE(SUM('nyc_yellowtaxi'[puMonth])), "SumpuYear", CALCULATE(SUM('nyc_yellowtaxi'[puYear])), "CountrateCodeId", CALCULATE(COUNTA('nyc_yellowtaxi'[rateCodeId])) ) VAR __DS0PrimaryWindowed = TOPN( 502, __DS0Core, [IsGrandTotalRowTotal], 0, 'nyc_yellowtaxi'[startLat], 1, 'nyc_yellowtaxi'[startLon], 1, 'nyc_yellowtaxi'[paymentType], 1, 'nyc_yellowtaxi'[vendorID], 1, 'nyc_yellowtaxi'[improvementSurcharge], 1, 'nyc_yellowtaxi'[doLocationId], 1, 'nyc_yellowtaxi'[puLocationId], 1, 'nyc_yellowtaxi'[storeAndFwdFlag], 1, 'nyc_yellowtaxi'[tpepDropoffDateTime], 1, 'nyc_yellowtaxi'[tpepPickupDateTime], 1 ) EVALUATE __DS0PrimaryWindowed ORDER BY [IsGrandTotalRowTotal] DESC, 'nyc_yellowtaxi'[startLat], 'nyc_yellowtaxi'[startLon], 'nyc_yellowtaxi'[paymentType], 'nyc_yellowtaxi'[vendorID], 'nyc_yellowtaxi'[improvementSurcharge], 'nyc_yellowtaxi'[doLocationId], 'nyc_yellowtaxi'[puLocationId], 'nyc_yellowtaxi'[storeAndFwdFlag], 'nyc_yellowtaxi'[tpepDropoffDateTime], 'nyc_yellowtaxi'[tpepPickupDateTime]	4,240

Analyzing Direct Lake

After another restart, the resident memory footprint of the YellowTaxiDirectLake model was only 22.4 KB! Indeed, the $System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV showed that only system-generated RowNumber columns were memory resident.

For each query, I recorded two runs to understand how much time is spent in on-demand loading of columns into memory. The Import Mode column was added for convenience to compare the second run duration with the corresponding query duration from the Import Mode tests. Finally, the Model Resident Memory column records the memory footprint of the Direct Lake model.

Query	First Run (ms)	Second Run (ms)	Import Mode (ms)	Model Resident Memory (MB)
//Analytical query 1	79	75	124	14
//Analytical query 2	79	76	148	14.3
//Analytical query 3	382	133	114	68.1
//Analytical query 4	209	130	80	68.13
//Detail query 1	7,763	1,023	844	669.13
//Detail query 2	1,484	1,453	860	670.53
//Detail query 3	1,881	1,463	1,213	670.6
//Detail query 4	9,663	3,668	4,240	1,270
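
To make the table easier to read, here is a minimal Python sketch that derives three numbers per query from the figures above: the cold-run overhead (first run minus second run, which only approximates the time spent loading columns on demand), the warm Direct Lake duration relative to import mode, and the growth in resident memory since the previous query. The function and variable names are mine and purely illustrative, and the starting footprint is treated as roughly zero because the model reported only 22.4 KB after the restart.

```python
# Timings (ms) and resident memory (MB) copied from the tables above.
# Columns: (query, first_run_ms, second_run_ms, import_ms, resident_mb)
RESULTS = [
    ("Analytical query 1", 79, 75, 124, 14.0),
    ("Analytical query 2", 79, 76, 148, 14.3),
    ("Analytical query 3", 382, 133, 114, 68.1),
    ("Analytical query 4", 209, 130, 80, 68.13),
    ("Detail query 1", 7763, 1023, 844, 669.13),
    ("Detail query 2", 1484, 1453, 860, 670.53),
    ("Detail query 3", 1881, 1463, 1213, 670.6),
    ("Detail query 4", 9663, 3668, 4240, 1270.0),
]


def summarize(results):
    """Return one dict of derived metrics per query, in table order."""
    rows = []
    previous_mb = 0.0  # assumption: ~0 MB resident after the restart (22.4 KB)
    for name, first, second, imported, resident in results:
        rows.append({
            "query": name,
            # Cold run minus warm run: a rough proxy for on-demand loading time.
            "load_overhead_ms": first - second,
            # Warm Direct Lake run relative to import mode (> 1 means slower).
            "warm_vs_import": round(second / imported, 2),
            # Resident memory added since the previous query in the table.
            "memory_growth_mb": round(resident - previous_mb, 2),
        })
        previous_mb = resident
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print(row)
```

Read this way, the largest cold-run overheads coincide with the largest jumps in resident memory (detail queries 1 and 4), which is what you would expect if the extra time goes into loading newly touched columns.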

Conclusion

To sum up this long post, the following observations can be made:

As expected, the more columns the query touches, the higher the memory footprint of Direct Lake. For example, the last query requested all the columns, and the resulting memory footprint was on par with import mode.
It’s important to note that when Fabric is under memory pressure, such as when the report load increases, Direct Lake will start paging out columns with low temperature. The exact thresholds and rules are not documented, but I’d expect the eviction mechanism to be much more granular and intelligent than evicting entire datasets with import mode.
I didn’t see Direct Lake paging out memory because I was still left with plenty (1.27 GB consumed out of 3 GB). It doesn’t make sense to evict data if there is no memory pressure, since memory is the fastest storage.
You’ll pay a certain price the first time a column is loaded on demand with Direct Lake. The more columns, the longer the wait. Subsequent runs, however, will be much faster if the column is still mapped in memory.
Some queries will execute faster in import mode and some will execute slower. Overall, queries touching memory-resident columns should be comparable.

Therefore, if Direct Lake is an option for you, it should be at the forefront of your efforts to combat out-of-memory errors with large datasets. On the downside, more than likely you’ll have to implement ETL processes to synchronize your data warehouse to a Fabric lakehouse, unless your data is in Fabric to start with, or you use Fabric database mirroring for the currently supported data sources (Azure SQL DB, Cosmos, and Snowflake). I’m not counting the data synchronization time as a downside because it could supersede the time you currently spend in model refresh.

Designing Responsive Power BI Reports (Part 2)

July 6, 2020/0 Comments/in Blog/by Prologika - Teo Lachev

In my previous post about responsive reports, I said that changes to the report design can dramatically reduce the report load time. But you will undoubtedly reach a point where you cannot optimize the report any further. After all, having a page with a single visual might be fast, but I doubt it would deliver much business value. What else can be done to speed up the reports and ideally achieve a page load time of a few seconds? Our next stop for that project was to revisit the solution architecture. The client initially opted for a hybrid architecture where the reports use a live connection to on-prem SSAS Tabular models. However, significant time was spent in Power BI showing “loading data” before rendering the report. What took so long?

In this case, Power BI was hosted in the Azure West region, the Tabular server in the company data center in Atlanta, and most users were in the East region. This resulted in round trips crossing the country for each visual.

A user in the East region requests a report which is hosted in the West region.
Each visual generates a query which is sent to Tabular in the company data center.
SSAS returns data back to the West region.
Report payload is sent to the user in the East region.

Ideally, users, reports, and data should be in the same region. However, a long-standing limitation of Power BI is that once the first user signs up for Power BI, Power BI determines the data region based on the user’s location, and you can’t move it unless you open a support ticket with Microsoft. Power BI Premium will let you create premium capacities in a different region, but by default these capacities will be created in the original Power BI region, and you can’t change the region once the capacity is created either. The chart below shows that the round-trip latency doubles if the user is in the East region but Power BI is in the West.
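
As a rough illustration of why the path above hurts, the following Python sketch adds up the network hops for a single visual. The latency figures and the same-region location names are hypothetical placeholders that I made up for the example, not measurements from this project; the point is only that every extra cross-region hop is paid twice per request.

```python
# Hypothetical one-way network latencies in milliseconds (illustrative only,
# not measured values from the project described in this post).
ONE_WAY_MS = {
    ("user_east", "powerbi_west"): 35,
    ("powerbi_west", "ssas_atlanta"): 30,
    ("user_east", "powerbi_east"): 5,
    ("powerbi_east", "data_east"): 2,
}


def one_way(a, b):
    """Look up the latency between two locations, in either direction."""
    if (a, b) in ONE_WAY_MS:
        return ONE_WAY_MS[(a, b)]
    return ONE_WAY_MS[(b, a)]


def round_trip_ms(path):
    """Network time to travel along the path and back again."""
    hops = zip(path, path[1:])
    return 2 * sum(one_way(a, b) for a, b in hops)


if __name__ == "__main__":
    # The hybrid path described above: user -> Power BI (West) -> SSAS (Atlanta).
    hybrid = round_trip_ms(["user_east", "powerbi_west", "ssas_atlanta"])
    # The ideal case: users, reports, and data all in the same region.
    same_region = round_trip_ms(["user_east", "powerbi_east", "data_east"])
    print(f"hybrid: {hybrid} ms, same region: {same_region} ms")
```

The sketch counts network time for one request only. It says nothing about query execution or rendering, and a page with many visuals repeats the data hop for each query.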

We decided to move the semantic model to Power BI so that Power BI owns the data. Besides potentially improving the report load time, this architecture also has other important advantages (to learn more, read my “Power BI Large Datasets: The Good, the Bad, and the Ugly” post). If you’re not on Power BI Premium, that “movement” might not be easy if you have opted to use Visual Studio or Tabular Editor for development. That’s because Power BI Pro doesn’t expose the XMLA endpoint, so your only option is to migrate the model to Power BI Desktop. But migrating an SSAS Tabular project to Power BI Desktop is not officially supported and there is no automatic migration path.

Thanks to the enhanced dataset metadata (a better name would have been “metadata done right this time”), you could opt for cracking the Power BI Desktop file and replacing the model schema (my “Power BI Source Control” blog has the steps to get to the schema). You might get lucky, especially if your Tabular project uses structured data sources (the only data source type supported by Power BI Desktop). However, if it uses legacy data sources, you must change every table to use a structured data source. Making changes to the JSON schema by hand, with Power BI Desktop refusing to load the model on every typo, is no fun.

On the other hand, Power BI Premium lets you deploy the SSAS model to the workspace XMLA endpoint. The prerequisites are:

Upgrade the Tabular model in Visual Studio to 1500 compatibility mode. This adds special annotations to the data source. If you don’t upgrade to 1500, you’ll find that you won’t be able to schedule the dataset for refresh as Power BI Service won’t show any connections.
Enable the XMLA endpoint for read-write in the capacity settings.
Enable the “Export data” tenant setting in Power BI (to learn more about this setting, read my “A False Sense of Security” blog).
As if the “Export data” battle wasn’t enough, we found that we also had to beg the security group to enable the “Analyze in Excel with on-premises datasets” tenant setting. I have no idea what this setting has to do with the XMLA endpoint. Luckily, Fiddler showed us the exact error message stating that this setting must be enabled.
In Visual Studio, change the model processing option to “Do not process”. The “Troubleshoot XMLA endpoint connectivity” article has the details.

Moving the semantic model paid big dividends. We could only see “loading data” for a moment. We also found that the network speed of the user connection and the browser play a significant role. For best performance, users should use Chrome or the new Edge and have a broadband connection. We were able to further reduce the load time of the first page by enabling query caching for the published dataset, which is only available in Power BI Premium.

Designing Responsive Power BI Reports

June 21, 2020/1 Comment/in Blog/by Prologika - Teo Lachev

I’m currently providing advisory services to an enterprise client for architecting and implementing an executive dashboard. As a typical dashboard, the UX design included various KPIs that look like these:

The Power BI report implementation followed the design and the above visualization was implemented as four cards and two textboxes. Including other UX elements, such as icons, labels, etc., the most important summary page of the report ended up having more than 100 visuals. It took 25 seconds to render the page on average, which is horrible performance. Performance Analyzer showed that the DAX queries were very fast and most of the time was spent in the “Other” category. This means that because JavaScript is single-threaded, visual rendering is sequential. Indeed, SQL Server Profiler revealed that out of the 25 seconds, the first 15 seconds were spent elsewhere before Analysis Services Tabular started receiving DAX queries. The Performance Analyzer document provides the essential coverage about how Power BI renders visuals, but the bottom line is this:

The more visuals the page has, the slower it will be. If you find that most of the time is spent in the “Other” category on page refresh, more than likely this is caused by Power BI serializing the visual rendering. If this is the case, the best course of action would be to reduce the number of visuals.

For example, the above visualization can be rendered with one visual only (Matrix) and some blackbelt visualization techniques. Matrix supports rendering measures on rows (in the Values section of the Format tab, turn on “Show on rows”). Spacing the rows can be achieved by increasing row padding. And icons can be rendered with conditional formatting. Another technique for reducing visuals and rendering faster is replacing many labels with a page background image. What didn’t help was disabling visual interactions, although it doesn’t hurt to disable them if you don’t need them. We’ve also found that browsers differ in how fast they render visuals. Internet Explorer was the slowest while Chrome and the new Edge were the fastest.

How fast did their report get after applying such optimizations? Between 50 and 60% faster.

True, Power BI has report design limitations. Developers with an SSRS background will miss more advanced features, such as nesting visuals (e.g. a bullet graph inside a matrix) and asymmetrical crosstab layouts. Still, techniques such as the one I shared above will help you create more responsive reports.

Load Testing Tabular

April 17, 2016/0 Comments/in Blog/by Prologika - Teo Lachev

A while back I did a TechEd presentation, “Can Your BI Solution Scale?”, in which I discussed a methodology for load testing SSAS and SSRS. A customer wanted to ensure that its Tabular model can scale to thousands of deployed users when it goes live.

You can still use the excellent Microsoft-originated AS Load Sim framework that I demonstrated in the presentation to load test Tabular. And you can use it to send both MDX and DAX queries.

One aspect that deserves more attention is how to tweak the framework to parameterize DAX queries. The framework was designed to parameterize MDX queries with tuples. For example, if you want to parameterize an MDX query by month, you can specify the set NonEmpty( [Date].[Calendar].[Month].Members, [Measures].[Internet Sales Amount] ). Then, the framework executes the set and assigns tuples from the set at random so you don’t just get cached results from the same query.

However, you need to make a small change to the framework to parameterize DAX queries. Because DAX queries don’t support the MDX UniqueName syntax for filtering, you can’t pass the UniqueName of the tuple member as is; you need to extract only the name from it. You can use the DAX MID function for this purpose. For example, if you want to filter the Customer[Customer Name] column on the actual name, e.g. Acme, you can use the following expression:

Customer[Customer Name] = MID("([Customer].[Customer Name].&[Acme])", SEARCH("&[", "([Customer].[Customer Name].&[Acme])") + 2, SEARCH("])", "([Customer].[Customer Name].&[Acme])") - SEARCH("&[", "([Customer].[Customer Name].&[Acme])") - 2)

Basically, this expression extracts the string “Acme” from ([Customer].[Customer Name].&[Acme]). Since the customer names will vary, it’s a generic, if rather convoluted, expression to extract a string surrounded by “&[” and “])”.
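
If the position arithmetic is hard to follow, the Python sketch below mimics it step by step. The helpers dax_search and dax_mid are hypothetical stand-ins that I wrote to imitate the 1-based behavior of the DAX SEARCH and MID functions for plain substrings (no wildcard support); they are not part of the load testing framework.

```python
def dax_search(find_text: str, within_text: str) -> int:
    """Imitate DAX SEARCH for plain substrings: 1-based, case-insensitive."""
    index = within_text.lower().find(find_text.lower())
    if index < 0:
        raise ValueError(f"{find_text!r} not found in {within_text!r}")
    return index + 1


def dax_mid(text: str, start_num: int, num_chars: int) -> str:
    """Imitate DAX MID: 1-based start position and a character count."""
    return text[start_num - 1 : start_num - 1 + num_chars]


def member_name(unique_name: str) -> str:
    """Extract the text between '&[' and '])', as the DAX expression does."""
    # Skip the two characters of '&[' to land on the first letter of the name.
    start = dax_search("&[", unique_name) + 2
    # Distance between the two markers, minus the two characters of '&['.
    length = dax_search("])", unique_name) - dax_search("&[", unique_name) - 2
    return dax_mid(unique_name, start, length)


if __name__ == "__main__":
    print(member_name("([Customer].[Customer Name].&[Acme])"))
```

For the sample tuple, “&[” is found at position 29 and “])” at position 35, so the expression takes 35 - 29 - 2 = 4 characters starting at position 31, which is “Acme”.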
