Cube vs. VertiPaq Query Performance

This is a big topic, and frankly it’s too ambitious on my part to tackle it in a single post. Assuming equivalent multidimensional (BISM Multidimensional) and tabular (BISM Tabular) models, I was curious how a multidimensional cube fares against VertiPaq in terms of query performance. To be fair to VertiPaq, I decided to use native DAX queries. As you’ve probably heard, BISM Tabular in SQL Server Denali will include a variant of DAX for querying tabular models deployed to SharePoint and to SSAS running in VertiPaq mode. Chris Webb has a good writeup about DAX queries here. The DAX EVALUATE construct allows external clients to query tabular models using native DAX syntax instead of MDX. Since BISM Tabular speaks DAX natively, DAX queries are likely to be more efficient than MDX queries against tabular models. At this point, only Crescent generates native DAX queries. The DAX query syntax is:

DEFINE
    MEASURE Table1[measure1] = <DAX_Expression>
    MEASURE Table2[measure2] = <DAX_Expression>
EVALUATE <DAX Table Expression>
ORDER BY
    <DAX_Expression> [ASC | DESC],
    <DAX_Expression> [ASC | DESC]
START AT
    Value_or_Parameter, Value_or_Parameter, …

To have a more sizable dataset, I used the Contoso cube for my tests. I created a BISM Tabular model and imported the Contoso data. Since you probably don’t have Contoso, the queries I show here target the Adventure Works cube instead. I started with the following unoptimized MDX query, which calculates the average sales amount by date across the products whose daily sales exceed their sales for the same date in the previous month:

WITH
MEMBER [Measures].SlowAvg AS
    Avg
    (
        Filter
        (
            [Product].[Product].[Product].MEMBERS
            ,[Measures].[Sales Amount] > ([Measures].[Sales Amount], ParallelPeriod([Date].[Calendar].[Month]))
        )
        ,[Measures].[Sales Amount]
    )
SELECT
    [Measures].SlowAvg ON 0,
    [Date].[Calendar].[Date].Members ON 1
FROM [Adventure Works];

Then, I optimized the query to take advantage of block computation mode, as follows:

WITH
MEMBER diff as
    iif ([Measures].[Sales Amount] > ([Measures].[Sales Amount], ParallelPeriod([Date].[Calendar].[Month])), [Measures].[Sales Amount], null)
MEMBER [Measures].SlowAvg AS
    Avg
    (
        [Product].[Product].[Product].MEMBERS, diff
    )
SELECT
    [Measures].SlowAvg ON 0,
    [Date].[Calendar].[Date].Members ON 1
FROM [Adventure Works];
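
The two MDX forms return the same number because Avg skips empty cells, so replacing the non-qualifying products with null has the same effect as filtering them out. Here is a minimal Python sketch of that reasoning (not MDX, and not how the engine works internally). The product names and sales figures are made up for illustration, and avg_non_empty is a hypothetical helper that only mimics how Avg treats empty cells.

```python
from statistics import mean

# Hypothetical daily sales per product: (current day, same day in the prior period).
sales = {
    "Bike": (120.0, 100.0),
    "Helmet": (30.0, 45.0),
    "Gloves": (20.0, 10.0),
}


def avg_non_empty(values):
    """Average that skips None, mimicking how MDX Avg ignores empty cells."""
    kept = [v for v in values if v is not None]
    return mean(kept) if kept else None


# First form: filter the products, then average what is left.
filtered = [cur for cur, prev in sales.values() if cur > prev]
slow_avg = avg_non_empty(filtered)

# Second form: turn non-qualifying products into empty cells, then average them all.
diff = [cur if cur > prev else None for cur, prev in sales.values()]
fast_avg = avg_non_empty(diff)

print(slow_avg, fast_avg)
```

Both variables end up holding the same average. The difference on the server is only in how the work is done: the second form lets the engine evaluate the whole product set at once instead of one cell at a time.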

Finally, the equivalent DAX query that I used to measure performance was:

define
    measure FactResellerSales[TotalSales] = Sum([SalesAmount])
    measure FactResellerSales[TotalSales – LastYear] = [TotalSales](SamePeriodLastYear(DimDate[FullDateAlternateKey]), All(DimDate))
    measure FactResellerSales[AverageSales] = AverageX(Filter(Values(DimProduct[ProductKey]), [TotalSales] > [TotalSales – LastYear]), [TotalSales])
evaluate addcolumns(filter(values(DimDate[DateKey]), not isblank([AverageSales])), "AverageSalesAmount", [AverageSales])
order by [DateKey]

And here are the findings from the tests:

  1. Unoptimized MDX query (cell-by-cell calculation mode), both on cold cache and when executed a second time: 33 sec
  2. Optimized MDX query (block computation mode) on cold cache: 4.8 sec
  3. Optimized MDX query (block computation mode) executed a second time: 0.7 sec
  4. DAX query, both on cold cache and when executed a second time: 6.4 sec

Here are some take-home notes:

  1. The fact that VertiPaq is an in-memory database doesn’t mean that it will perform much better than a multidimensional cube. The formula engine of BISM Multidimensional does cache query results in memory. So does the Windows OS. In fact, the more the cube is used, the higher the chances that its data will end up in memory.
  2. VertiPaq might give you good performance without special tuning. All DAX calculations run in block computation mode.
  3. Optimized MDX queries might outperform VertiPaq, especially if the results are cached.
  4. DAX query results are never cached, which explains why the DAX query takes the same time on subsequent executions.

The fact that VertiPaq gives you a head start doesn’t mean that you cannot write inefficient DAX queries. For example, the following DAX measure definition returns the same results but it’s twice as slow.

measure FactResellerSales[AverageSales] = AverageX(Filter(AddColumns(Values(DimProduct[ProductKey]), "x", [TotalSales]), [x] > [TotalSales – LastYear]), [x])

Again, this is an isolated test case and your mileage might vary greatly depending on queries, data volumes, hardware, and so on. But I hope you can use it as a starting point to run your own tests while waiting for a VertiPaq performance guide.
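
If you want to repeat this kind of test, a small harness that records the first run separately from the repeated runs makes the cold versus warm difference easy to see. Below is a minimal Python sketch of that idea. The function names (time_runs, summarize, placeholder_query) are hypothetical, the placeholder stands in for whatever code actually sends your MDX or DAX query to the server, and clearing the server cache before the first run is left to you.

```python
import time


def time_runs(run_query, runs=3):
    """Return the elapsed seconds for each consecutive execution of run_query."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Split timings into the first (cold) run and the best repeated (warm) run."""
    cold = timings[0]
    warm = min(timings[1:]) if len(timings) > 1 else None
    return {"cold": cold, "warm": warm}


def placeholder_query():
    """Stand-in for the call that executes the real query (assumption)."""
    sum(range(1_000_000))


print(summarize(time_runs(placeholder_query)))
```

A query whose results are cached should show a warm time well below the cold time, while a query that is never cached should show roughly the same number for both.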

The Load Balancing Act

I had some fun lately setting up a Reporting Services farm of two SSRS 2008 R2 nodes and a hardware load balancer. We followed the steps in BOL only to find out that the report server would sporadically return empty pages or MAC viewstate validation errors, although the machine key was identical on both servers. We fixed the issues by:

  1. Enabling sticky sessions in the load balancer (not documented).
  2. Configuring the ReportServerURL setting (BOL says “Do not modify ReportServerUrl”).

Despite what BOL says or doesn’t say, it appears that sticky sessions are required with R2, probably because of the AJAX-based ReportViewer. Here is an example configuration that shows the three settings you need to change in rsreportserver.config:

<UrlRoot>http://atltstssrsibo/reportserver</UrlRoot>

<Hostname>atltstssrsibo</Hostname>

<ReportServerUrl>http://atltstbir02ibo/reportserver</ReportServerUrl>

On each node, set ReportServerUrl to point to the node itself. In this scenario, atltstssrsibo is the load balancer name and atltstbir02ibo is the node server name.
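
To make the per-node difference explicit, here is a minimal Python sketch that derives the three values for each node from the load balancer name and the node name. The node_settings helper is hypothetical, and so is the second node name (atltstbir01ibo), since only one node is named above. The sketch only builds the values; it does not edit rsreportserver.config.

```python
def node_settings(load_balancer, node):
    """Build the three rsreportserver.config values for one farm node."""
    return {
        # UrlRoot and Hostname are the same on every node: the load balancer name.
        "UrlRoot": f"http://{load_balancer}/reportserver",
        "Hostname": load_balancer,
        # ReportServerUrl differs per node: it points to the node itself.
        "ReportServerUrl": f"http://{node}/reportserver",
    }


# atltstbir01ibo is a made-up name for the second node (assumption).
for node in ("atltstbir01ibo", "atltstbir02ibo"):
    print(node, node_settings("atltstssrsibo", node))
```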

Applied Analysis Services 2008 and PowerPivot Onsite Class

I am partnering with Data Education to deliver an intensive five-day onsite class on Analysis Services and PowerPivot. The class will be held September 19-23 at the Microsoft Technology Center in Boston. The class doesn’t assume any experience with Analysis Services. We’ll start from zero and build a multidimensional cube, sharing as many best practices as possible along the way. More information about the class and registration details is available here.

Using the IF Operator in Scope Assignments

UDM scope assignments are incredibly useful because they let you write to the cube space, such as to implement custom aggregation, allocations, currency conversion, date calculations, and so on. Unfortunately, scope assignments have limitations. One of them is that more complicated scope expressions result in the “an arbitrary shape of the sets is not allowed in the current context” error. Recently, I tried to use a scope assignment to zero out a cube subspace that the end user shouldn’t see and that cannot be protected via dimension data security. I tried the following scope assignment (translated to Adventure Works):

Scope (
    Employee.Employees.Members - Exists([Employee].[Employees].Members, <another set>, "<Measure Group Name>"),
    <another attribute hierarchy>
);
    this = null;
End Scope;

This produces the above error, which is caused by the employee set expression, and as far as I know there is nothing you can do to rewrite the scope expression to avoid it. In a Eureka moment, I recalled the IF operator and rewrote the scope as:

Scope (
    <another attribute hierarchy>
);
    IF Intersect([Employee].[Employees].CurrentMember,
        Exists([Employee].[Employees].Members, <another set>, "<Measure Group Name>")
    ).Count = 0 THEN
        this = null
    END IF;
End Scope;

This worked and appeared to perform well. But knowing a thing or two about scope assignments, I was concerned that there might be hidden performance traps ahead. Jeffrey Wang from the Analysis Services team was kind enough to provide some input and even write a detailed blog about the performance implications of the IF operator. BTW, you are missing a lot if you are not following Jeffrey’s MDX and DAX blog.

As it turned out, the philosophical answer about the IF performance impact is “it depends”. Let’s say you have another scope assignment X. Instead of calculating X directly, you will be calculating something like IIF(not condition, X, NULL). So we are talking about the extra overhead of an IIF function, which can still deliver good performance if X is in block mode. The bottom line is that the server will try to apply block mode logic to IF statements. In my case, the condition would result in cell-by-cell mode, but I might still get good performance if the other scope assignment (X) operates in block mode.
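
The IF to IIF rewrite can be pictured with a minimal Python sketch. It models only the result for each cell, not the block or cell-by-cell evaluation modes. The employee names, the values produced by X, and the visible set are all made up for illustration, and iif here is a plain Python stand-in for the MDX function.

```python
def iif(condition, when_true, when_false):
    """Plain Python stand-in for the MDX IIF function."""
    return when_true if condition else when_false


# Hypothetical values produced by another scope assignment X, keyed by employee.
x = {"Amy": 100.0, "Bob": 250.0, "Cho": 75.0}
# Hypothetical set of employees the user is allowed to see.
visible = {"Amy", "Cho"}


def with_if_statement(employee):
    """Model of: IF <condition> THEN this = null END IF, applied on top of X."""
    value = x[employee]
    if employee not in visible:
        value = None
    return value


def with_iif(employee):
    """Model of what gets calculated instead: IIF(not condition, X, NULL)."""
    condition = employee not in visible
    return iif(not condition, x[employee], None)


for name in x:
    print(name, with_if_statement(name), with_iif(name))
```

Both functions return the same value for every employee, which is why the cost of the IF statement comes down to the cost of evaluating the condition and X inside an IIF.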

Although it initially appeared that there was no performance impact in my case, more testing revealed a severe performance hit. The cube also had scope assignments for currency conversion. These assignments interfered with the IF logic and the cube performance took a big hit. Consequently, I had to scrap this approach.

I hope Jeffrey will write a new blog about what causes the very annoying “an arbitrary shape of the sets is not allowed in the current context” error and how to avoid it if possible.