Optimizing Arbitrary Shaped Sets

I'm working on optimizing a fairly large cube (2.5 TB) with some 25 billion rows in the fact table. The customer was complaining about long-running queries. Most queries specify a time range that is passed to the query filter. For example, if the user wants to see the aggregated data from May 27th 20:00 to June 2nd 19:00, the query WHERE clause would look like this:

WHERE
(
    {
        [Date].[Date].[Date].&[20140527] * [Hour].[Hour].&[20] : [Hour].[Hour].&[23],
        [Date].[Date].[Date].&[20140528] : [Date].[Date].[Date].&[20140601] * [Hour].[Hour].&[0] : [Hour].[Hour].&[23],
        [Date].[Date].[Date].&[20140602] * [Hour].[Hour].&[0] : [Hour].[Hour].&[19]
    }
    … additional dimension filters here
)

This is an example of using arbitrary-shaped sets, which Mosha wrote about here and Thomas Kejser wrote about here.

"The resulting union set has interesting shape – it cannot be represented as pure crossjoin of attribute hierarchies. Such sets are nicknamed "arbitrary shaped sets" in Analysis Services, because they represent shapes of arbitrary complexity. Analysis Services in general doesn't like arbitrary shaped sets, and there are often performance issues when using them, but if the business requirement calls for them – they can be used and will work correctly, although not most efficiently."

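To make the "arbitrary shape" concrete, here is a minimal sketch using the same Date and Hour members as the query above. A pure crossjoin pairs every date in a range with every hour in a range and produces a "rectangular" set, which the storage engine handles well. The set in the WHERE clause above uses different hour ranges for the first and last day, so it can only be expressed as a union of crossjoins, that is, an arbitrary-shaped set.

// Pure crossjoin: every date in the range paired with every hour in the range
{ [Date].[Date].[Date].&[20140528] : [Date].[Date].[Date].&[20140601] } *
{ [Hour].[Hour].&[0] : [Hour].[Hour].&[23] }

// Arbitrary-shaped set: the hour range differs per day, so no single crossjoin covers it
{
    [Date].[Date].[Date].&[20140527] * { [Hour].[Hour].&[20] : [Hour].[Hour].&[23] },
    { [Date].[Date].[Date].&[20140528] : [Date].[Date].[Date].&[20140601] } * { [Hour].[Hour].&[0] : [Hour].[Hour].&[23] },
    [Date].[Date].[Date].&[20140602] * { [Hour].[Hour].&[0] : [Hour].[Hour].&[19] }
}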
 

Indeed, the server doesn't like them, and it happily scans all partitions. In this case, the cube is partitioned by hour and data is kept for 40 days, so there are 960 partitions. Although the above query spans only 144 hours (partitions), a Profiler trace shows that the server scans all 960 partitions, resulting in an enormous amount of data being read. As it turns out, the WHERE clause is not optimized to project the filter onto the partitions. However, SUBSELECT filters are optimized because most reporting tools, such as Excel and SSRS, use them. The solution was simple: replacing the WHERE clause with a SUBSELECT brought the query execution time down from minutes to seconds:

FROM
(
    SELECT
    {
        [Date].[Date].[Date].&[20140527] * [Hour].[Hour].&[20] : [Hour].[Hour].&[23],
        [Date].[Date].[Date].&[20140528] : [Date].[Date].[Date].&[20140601] * [Hour].[Hour].&[0] : [Hour].[Hour].&[23],
        [Date].[Date].[Date].&[20140602] * [Hour].[Hour].&[0] : [Hour].[Hour].&[19]
    } ON 0
    FROM [<Cube>]
)
WHERE (… additional dimension filters here)
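
For completeness, here is a minimal sketch of how the full rewritten query fits together. The measure name is a placeholder (the query's axes aren't shown in the post), and the inner set is the same arbitrary-shaped Date/Hour set shown above, abbreviated here:

SELECT
    [Measures].[<Measure>] ON 0    // placeholder measure, not from the original query
FROM
(
    SELECT
    {
        [Date].[Date].[Date].&[20140527] * [Hour].[Hour].&[20] : [Hour].[Hour].&[23]
        // … remaining Date/Hour ranges from the set above
    } ON 0
    FROM [<Cube>]
)
WHERE (… additional dimension filters here)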

Posted Mon, Jun 16 2014 10:04 PM by tlachev