Optimizing Arbitrary Shaped Sets

I’m working on optimizing a fairly large cube (2.5 TB) with some 25 billion rows in the fact table. The customer was complaining about long-running queries. Most queries would specify a time range that was passed to the query filter. For example, if the user wants to see the aggregated data from May 25th 20:00 to June 2nd 19:00, the query WHERE clause would like this:

WHERE

(

{

[Date].[Date].[Date].&[20140527] * [Hour].[Hour].&[20] : [Hour].[Hour].&[23],

[Date].[Date].[Date].&[20140528]:[Date].[Date].[Date].&[20140601] * [Hour].[Hour].&[0] : [Hour].[Hour].&[23],

[Date].[Date].[Date].&[20140602] * [Hour].[Hour].&[0] : [Hour].[Hour].&[19]

}

… additional dimension filters here

)

This an example of using arbitrary-shaped sets which Mosha wrote about here and Thomas Kejser here.

“The resulting union set has interesting shape – it cannot be represented as pure crossjoin of attribute hierarchies. Such sets are nicknamed “arbitrary shaped sets” in Analysis Services, because they represent shapes of arbitrary complexity. Analysis Services in general doesn’t like arbitrary shaped sets, and there are often performance issues when using them, but if the business requirement calls for them – they can be used and will work correctly, although not most efficiently.”

Indeed, the server doesn’t like them and it would happily scan all partitions. In this case, the cube is partitioned by hour and data is kept for 40 days so there are 960 partitions. Although the above query spans 120 hours (partitions), the profiler would show that the server scans all 960 partitions, resulting in enormous amount of data being read. As it turns out, the WHERE clause is not optimized to project the filter on partitions. However, SUBSELECT filters are optimized because most reporting tools, such as Excel and SSRS, use them. The solution was simple: replace the WHERE clause with SUBSELECT to bring the query execution time from minutes to seconds:

FROM

(

SELECT

{

[Date].[Date].[Date].&[20140527] * [Hour].[Hour].&[20] : [Hour].[Hour].&[23],

[Date].[Date].[Date].&[20140528]:[Date].[Date].[Date].&[20140601] * [Hour].[Hour].&[0] : [Hour].[Hour].&[23],

[Date].[Date].[Date].&[20140602] * [Hour].[Hour].&[0] : [Hour].[Hour].&[19]

}

ON 0 FROM [<Cube>]

)

WHERE (… additional dimension filters here)