Transactional Reporting with BISM Tabular

Continuing on BISM Tabular performance (see my Cube vs. VertiPaq Query Performance blog), I was disappointed to find that there are no performance improvements in PowerPivot/BISM Tabular in CTP3 in the area of detailed (transactional-level) reporting, an issue that has plagued me since the R2 release. Consider the following report built on top of an Adventure Works PowerPivot model.


This report shows all Adventure Works customers and their order information by combining data from the Customer (18,000 rows) and InternetSales (60,000 rows) tables. Specifically, the report includes two columns from the Customer table and four columns from the InternetSales table. The report has a slicer to let the user filter by customer. Given that PowerPivot is an end-user tool, it's likely that end users will build such reports.

Unfortunately, each time you click the slicer or filter a column, the report query takes about ten minutes (!) to execute. One would surely expect better performance from the “engine of the devil”. To be fair to VertiPaq, the Excel OLAP PivotTable is designed for summarized reports and not transactional reports. I am not excluding the possibility of a bug in the VertiPaq engine, because a similar report connected to the Adventure Works cube takes milliseconds to execute. Here is the MDX statement generated by the PowerPivot report when I filter on Aaron:

SELECT {[Measures].[Sum of OrderQuantity 2],[Measures].[Sum of SalesAmount 2]} DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS , NON EMPTY Hierarchize(DrilldownMember(DrilldownMember(DrilldownMember(CrossJoin({[Customer].[FirstName].[All],[Customer].[FirstName].[FirstName].AllMembers}, {([Customer].[LastName].[All],[InternetSales].[SalesOrderNumber].[All],[InternetSales].[SalesOrderLineNumber].[All])}), [Customer].[FirstName].[FirstName].AllMembers, [Customer].[LastName]), [Customer].[LastName].[LastName].AllMembers, [InternetSales].[SalesOrderNumber]), [InternetSales].[SalesOrderNumber].[SalesOrderNumber].AllMembers, [InternetSales].[SalesOrderLineNumber])) DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON ROWS FROM (SELECT ({[Customer].[FirstName].&[Aaron]}) ON COLUMNS FROM [Sandbox]) CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

We would probably need to wait for the next version of Office to fix this performance issue at its core. Meanwhile, avoid Excel pivot reports and consider Crescent for detail-level reporting. Crescent generates native DAX queries and I verified that detail-level reporting is efficient.
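
For comparison, here is a sketch of a flat, detail-level DAX query over the same model. The table and column names follow the Adventure Works model described above, but this is my own illustration, not the statement Crescent actually generates:

evaluate
filter(
    summarize(
        InternetSales,
        Customer[FirstName],
        Customer[LastName],
        InternetSales[SalesOrderNumber],
        InternetSales[SalesOrderLineNumber],
        "Sum of OrderQuantity", sum(InternetSales[OrderQuantity]),
        "Sum of SalesAmount", sum(InternetSales[SalesAmount])
    ),
    Customer[FirstName] = "Aaron"
)
order by Customer[LastName], InternetSales[SalesOrderNumber], InternetSales[SalesOrderLineNumber]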

Please vote for the connect feedback I’ve submitted on this subject.

UPDATE 8/14/2011

The slowdown is caused by the subselects added to the MDX query when applying a slicer or column filtering. Specifically, when the query requests calculations, the server applies Autoexists before doing NON EMPTY, which can be rather expensive over a large space. Filtering on a field added to the Report Filter zone is very fast because the query uses a WHERE clause.
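
To see the difference side by side, here is a stripped-down sketch (not the exact statements Excel generates) of the two filter shapes against the same model:

-- Slicer or column filter: the filter becomes a subselect, so Autoexists runs before NON EMPTY
SELECT {[Measures].[Sum of SalesAmount 2]} ON COLUMNS,
NON EMPTY [InternetSales].[SalesOrderNumber].[SalesOrderNumber].Members ON ROWS
FROM (SELECT {[Customer].[FirstName].&[Aaron]} ON COLUMNS FROM [Sandbox])

-- Report Filter: the filter becomes a WHERE clause, which the server resolves much faster
SELECT {[Measures].[Sum of SalesAmount 2]} ON COLUMNS,
NON EMPTY [InternetSales].[SalesOrderNumber].[SalesOrderNumber].Members ON ROWS
FROM [Sandbox]
WHERE ([Customer].[FirstName].&[Aaron])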

PowerPivot Configuration Tool

One great enhancement in SQL Server 11 (Denali) is the PowerPivot for SharePoint setup refactoring. Previously, the PowerPivot for SharePoint setup and configuration was included and performed at the end of the SQL Server setup. Consequently, it was very difficult to figure out what went wrong. In Denali, you need to perform the PowerPivot for SharePoint configuration as a post-installation step after the SQL Server setup completes. You can do this using the new PowerPivot Configuration Tool, SharePoint Central Administration, or PowerShell.

You can find the PowerPivot Configuration Tool in the SQL Server Denali CTP3 → Configuration Tools program group. When you open the tool, it examines the SharePoint setup and presents a list of tasks (if any) that need to be done to configure PowerPivot for SharePoint so it’s operational. Once you validate the tasks and run them, the tool performs the steps one at a time and indicates which step is currently taking place. You can select a task and click the Script tab to see a PowerShell script for that task, or the Output tab to see an execution log.

If a step fails, the tool stops and the rest of the steps are not executed. You need to examine what went wrong, fix it, and rerun the tool to execute the remaining tasks. You can also use the tool to remove features, services, applications, and solutions.


SharePoint is a great product, but it’s also very complex and difficult to troubleshoot. The PowerPivot Configuration Tool will simplify the management effort to configure PowerPivot for SharePoint.

Cube vs. VertiPaq Query Performance

This is a big topic and frankly it’s too ambitious on my part to tackle it. Assuming equivalent multidimensional (BISM Multidimensional) and tabular (BISM Tabular) models, I was curious how a multidimensional cube fares against VertiPaq in terms of performance. To be fair to VertiPaq, I decided to use native DAX queries. As you’ve probably heard, BISM Tabular in SQL Denali will include a variant of DAX to query tabular models deployed to SharePoint and SSAS running in VertiPaq mode. Chris Webb has a good writeup about DAX queries here. The DAX EVALUATE construct allows external clients to query tabular models using native DAX syntax instead of MDX. Since BISM Tabular speaks DAX, DAX queries are likely to be more efficient and give you better performance when querying tabular models. At this point, only Crescent generates native DAX queries. The DAX query syntax is:

DEFINE
    MEASURE Table1[measure1] = <DAX_Expression>
    MEASURE Table2[measure2] = <DAX_Expression>
EVALUATE <DAX Table Expression>
ORDER BY
    <DAX_Expression> [ASC | DESC],
    <DAX_Expression> [ASC | DESC]
START AT
    Value_or_Parameter, Value_or_Parameter, …
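
In its simplest form, a DAX query is just EVALUATE over a table expression. Here is a minimal sketch against the DimDate table used in the query later in this post (assuming the standard DimDate[CalendarYear] column):

evaluate values(DimDate[CalendarYear])
order by DimDate[CalendarYear]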

To have a more sizable dataset, I used the Contoso cube for my tests. I created a BISM Tabular model and imported the Contoso data. Since you probably don’t have Contoso, I provide queries that target the Adventure Works cube. I started with the following unoptimized MDX query, which calculates the average sales amount by date across products whose daily sales exceed the daily sales for the same date in the previous month:

WITH
MEMBER [Measures].SlowAvg AS
    Avg
    (
        Filter
        (
            [Product].[Product].[Product].MEMBERS,
            [Measures].[Sales Amount] > ([Measures].[Sales Amount], ParallelPeriod([Date].[Calendar].[Month]))
        ),
        [Measures].[Sales Amount]
    )
SELECT
    [Measures].SlowAvg ON 0,
    [Date].[Calendar].[Date].Members ON 1
FROM [Adventure Works];

Then, I optimized the query to take advantage of block computation mode, as follows:

WITH
MEMBER [Measures].diff AS
    IIF
    (
        [Measures].[Sales Amount] > ([Measures].[Sales Amount], ParallelPeriod([Date].[Calendar].[Month])),
        [Measures].[Sales Amount],
        null
    )
MEMBER [Measures].SlowAvg AS
    Avg
    (
        [Product].[Product].[Product].MEMBERS,
        [Measures].diff
    )
SELECT
    [Measures].SlowAvg ON 0,
    [Date].[Calendar].[Date].Members ON 1
FROM [Adventure Works];

Finally, here is the equivalent DAX query that I used to measure performance:

define measure FactResellerSales[TotalSales] = Sum([SalesAmount])
measure FactResellerSales[TotalSales - LastYear] = [TotalSales](SamePeriodLastYear(DimDate[FullDateAlternateKey]), All(DimDate))
measure FactResellerSales[AverageSales] = AverageX(Filter(Values(DimProduct[ProductKey]), [TotalSales] > [TotalSales - LastYear]), [TotalSales])
evaluate addcolumns(filter(values(DimDate[DateKey]), not isblank([AverageSales])), "AverageSalesAmount", [AverageSales])
order by [DateKey]

And, the findings from the tests:

  1. MDX query un-optimized (cell calculation mode), both on cold cache and executed a second time – 33 sec
  2. MDX query optimized (block computation mode) on cold cache – 4.8 sec
  3. MDX query optimized (block computation mode) executed a second time – 0.7 sec
  4. DAX query, both on cold cache and executed a second time – 6.4 sec
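
A note on methodology: cold-cache numbers like these are typically obtained by clearing the Analysis Services caches before each run. A minimal sketch of the XMLA command (substitute the DatabaseID of the database you are testing; a placeholder is shown here):

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW</DatabaseID>
  </Object>
</ClearCache>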

Here are some take-home notes:

  1. The fact that VertiPaq is an in-memory database doesn’t mean that it will perform much better than a multidimensional cube. The formula engine of BISM Multidimensional does cache query results in memory, and so does the Windows OS. In fact, the more the cube is used, the higher the chances that its data will end up in memory.
  2. VertiPaq might give you good performance without special tuning. All DAX calculations run in block computation mode.
  3. Optimized MDX queries might outperform VertiPaq, especially if results are cached.
  4. DAX queries are never cached, which explains why DAX queries perform the same when executed subsequently.

The fact that VertiPaq gives you a head start doesn’t mean that you cannot write inefficient DAX queries. For example, the following DAX measure definition returns the same results but it’s twice as slow.

measure FactResellerSales[AverageSales] = AverageX(Filter(AddColumns(Values(DimProduct[ProductKey]), "x", [TotalSales]), [x] > [TotalSales - LastYear]), [x])

Again, this is an isolated test case and your mileage might vary greatly depending on queries, data volumes, hardware, etc. But I hope you can use it as a starting point for your own tests while waiting for a VertiPaq performance guide.

The Load Balancing Act

I had some fun lately setting up a Reporting Services farm of two SSRS 2008 R2 nodes and a hardware load balancer. We followed the steps in BOL only to find that the report server returned sporadic empty pages or MAC viewstate validation errors, even though the machine key was identical on both servers. We fixed the issues by:

  1. Enabling sticky sessions in the load balancer (not documented).
  2. Configuring the ReportServerURL setting (BOL says “Do not modify ReportServerUrl”).

Despite what BOL says or doesn’t say, it appears that sticky sessions are required with R2, probably due to the AJAX-based ReportViewer. Here is an example configuration that demonstrates the three settings you need to change in rsreportserver.config:

<UrlRoot>http://atltstssrsibo/reportserver</UrlRoot>

<Hostname>atltstssrsibo</Hostname>

<ReportServerUrl>http://atltstbir02ibo/reportserver</ReportServerUrl>

On each node, set ReportServerUrl to point to the node itself. In this example, atltstssrsibo is the load balancer name and atltstbir02ibo is the node server name.
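
For reference, “identical machine key” means that every node carries the same <machineKey> element, typically in the <system.web> section of the Report Manager web.config. A sketch with the key values elided; generate your own keys and copy the same pair to every node:

<machineKey validationKey="[same value on every node]" decryptionKey="[same value on every node]" validation="SHA1" decryption="AES" />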

PowerPivot and SQL Server Denali CTP3 Release Notes Available

Microsoft published the PowerPivot and SQL Server Denali Community Technology Preview (CTP) 3 release notes. This means that the CTP3 release is imminent, although the download link is not live yet. The release notes cover the new features pretty well.

UPDATE 7/12/2011

CTP3 got released today.

Applied Analysis Services 2008 and PowerPivot Onsite Class

I am partnering with Data Education to deliver an intensive five-day onsite class on Analysis Services and PowerPivot. The class will be held September 19-23 at the Microsoft Technology Center in Boston. The class doesn’t assume any experience with Analysis Services. We’ll start from zero and build a multidimensional cube, sharing as many best practices as possible along the way. More information about the class and registration details is available here.

Using the IF Operator in Scope Assignments

UDM scope assignments are incredibly useful because they let you write to the cube space, such as to implement custom aggregation, allocations, currency conversion, data calculations, and so on. Unfortunately, scope assignments have limitations. One of them is that more complicated scope expressions result in the “an arbitrary shape of the sets is not allowed in the current context” error. Recently, I tried to use a scope assignment to zero out a cube subspace that the end user shouldn’t see and that cannot be protected via dimension data security. I tried the following scope assignment (translated to Adventure Works):

Scope (
    Employee.Employees.Members - Exists([Employee].[Employees].Members, <another set>, "<Measure Group Name>"),
    <another attribute hierarchy>
);
    this = null;
End Scope;

This produces the above error, caused by the employee set expression, and as far as I know there is nothing you can do to rewrite the scope expression to avoid it. In a moment of Eureka, I recalled the IF operator and rewrote the scope as follows:

Scope (
    <another attribute hierarchy>
);
    IF Intersect(
        [Employee].[Employees].CurrentMember,
        Exists([Employee].[Employees].Members, <another set>, "<Measure Group Name>")
    ).Count = 0 THEN
        this = null
    END IF;
End Scope;

This worked, and it appears to perform well. But knowing scope assignments, I was concerned that there might be hidden performance traps ahead. Jeffrey Wang from the Analysis Services team was kind enough to provide some input and even wrote a detailed blog post about the performance implications of the IF operator. BTW, you are missing a lot if you are not following Jeffrey’s MDX and DAX blog.

As it turned out, the philosophical answer about the IF performance impact is “it depends”. Let’s say you have another scope assignment X. Instead of calculating X directly, you will be calculating something like IIF(not condition, X, NULL). So we are talking about the extra overhead of an IIF function, which can still deliver good performance if X is evaluated in block mode. The bottom line is that the server will try to apply block-mode logic to IF statements. In my case, the condition evaluates in cell-by-cell mode, but I might still get good performance if the other scope assignment (X) operates in block mode.

Although initially it appeared that there was no performance impact in my case, more testing revealed a severe performance hit. The cube also had scope assignments for currency conversion. These assignments interfered with the IF logic, and the cube performance took a big hit. Consequently, I had to abandon this approach.

I hope Jeffrey will write a new blog post about what causes the very annoying “an arbitrary shape of the sets is not allowed in the current context” error and how to avoid it if possible.

Atlanta BI Record Attendance Last Night

We had a blast last night at Atlanta BI and ran out of space with a record attendance of some 60+ people. This is phenomenal given that we are in the middle of vacation season. Jonathan Lacefield from Daugherty gave us a great intro presentation on Analysis Services. Michael Clifford showed cool Integration Services tips. And Beth Lenoir from Daugherty was kind enough to sponsor the event and arrange for some great food. Whether it was Jonathan’s presentation, Michael’s tips, or the food, the atmosphere was electrifying. Thanks to everybody for making last night a fantastic success!


Performance Considerations for Securing Parent-Child Dimensions

Parent-child dimensions are a very useful feature of Analysis Services to model recursive hierarchies, such as an organizational hierarchy. Recently, I had to apply dimension data security to a parent-child dimension. The allowed set expression used the factless fact table approach (see my Protect UDM with Dimension Data Security article for more details), as follows (translated to the Adventure Works schema):

IIF
(
    -- check if the user has full access
    StrToMember("[User].[Login].&[" + Username + "]").Properties("FullDimensionAccess", TYPED) = True,
    [Employee].[Employees].[All],
    Generate
    (
        Exists([Employee].[Employee].[Employee].Members, StrToMember("[User].[Login].&[" + Username + "]"), "Bridge Employee Security"),
        {LinkMember(Employee.Employee.CurrentMember, Employee.Employees)}
    )
)

To avoid loading the factless fact table with all dimension members for users with full dimension access, the expression checks if the user is flagged as such. Since Analysis Services doesn’t provide a way to ignore dimension data security from the allowed/denied set, the expression returns the hierarchy top member [Employee].[Employees].[All], and this is where the issue lies. Returning the All member causes the server to scan the entire hierarchy recursively and perform AutoExists against all other attributes within the P-C dimension. Unlike regular dimensions, where the server can infer attribute relationships quickly, a P-C hierarchy requires dynamically checking each member. This can slow the user connect time significantly. In my case, the P-C dimension had about 300,000 members and the connect time for users with full dimension access exceeded two minutes. Needless to say, users were not happy.

There are several workarounds (none of them ideal):

  1. Move users with full dimension access to a separate role to avoid the IIF check (the allowed set for the restricted role then reduces to the Generate branch alone, as sketched after this list). This wasn’t an option in our case.
  2. Remove as many attributes as possible from the P-C dimension to reduce the number of attributes for the AutoExists check.
  3. Add a new hidden P-C dimension that has the dimension key and P-C attribute. Apply dimension data security on this dimension and leave the original P-C dimension intact.
  4. Refactor the P-C dimension and move its leaf members to a new dimension in order to reduce the number of members in the P-C dimension.
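
For reference, here is what workaround 1 boils down to: the full-access role simply gets unrestricted read access (no allowed set), while the restricted role keeps only the Generate branch of the original expression as its allowed set:

Generate
(
    Exists([Employee].[Employee].[Employee].Members, StrToMember("[User].[Login].&[" + Username + "]"), "Bridge Employee Security"),
    {LinkMember(Employee.Employee.CurrentMember, Employee.Employees)}
)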

I logged a new wish on connect.microsoft.com and requested the ability to ignore dimension data security. Please vote so we can have a more elegant way to address such requirements without refactoring the cube design.

Dimension Data Security and Cube Script

If you have worked with Analysis Services dimension data security for a while, you know that MDX allowed/denied set expressions can become rather complex. You might be tempted to externalize these expressions in the cube script for brevity or reusability reasons. For example, you might decide to add a calculated member that returns the list of partners the user is authorized to see, based on a factless fact table, as follows:

CREATE MEMBER CURRENTCUBE.[Measures].[PartnerSecurityFilter] AS "Exists([Partner].[Partner].[Partner].MEMBERS, StrToMember(""[User].[Login].&[" + Username + "]""), 'Bridge Partner Security')";

Then, you might try reusing the member definition in the allowed set expression, such as:

StrToSet([Measures].[PartnerSecurityFilter])

This sounds like a great idea, but it doesn’t work. Specifically, you will find that it works for the first member of the SSAS role who connects to the server, but it doesn’t work for any other members of the same role who connect after that. To make things worse, the other users inherit the same security access as the first user, which also opens a security vulnerability.

While I am not excluding the possibility of a bug, I think what’s going on here is related to the fact that the MDX script is evaluated after dimension data security is applied. What’s weird is that even dynamic expressions, such as StrToMember or StrToSet, don’t work. I wasn’t able to find a workaround besides keeping the entire logic in the allowed/denied set expression and not referencing the cube script at all. If dimension data security becomes complicated, consider externalizing it to an Analysis Services stored procedure.
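
For example, the allowed set can call a .NET stored procedure registered on the server. The assembly and method names below are hypothetical placeholders, just to illustrate the shape of the call; the stored procedure is assumed to return the allowed partner set for the current user:

-- PartnerSecurity is a hypothetical .NET assembly registered on the SSAS server;
-- GetAllowedPartners is assumed to return the allowed set for the passed user name
PartnerSecurity.GetAllowedPartners(Username)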