Prologika Forums
Making sense of data
ProcessUpdate and Partition Scans

Blogs

Prologika (Teo Lachev's Weblog)

Books

Applied Microsoft SQL Server 2012 Analysis Services (Tabular Modeling)Learn PowerPivot and Analysis Services-Tabular at your own pace with our latest book Applied Microsoft SQL Server 2012 Analysis Services (Tabular Modeling). It is designed as an easy-to-follow guide for learning how to implement BI solutions spanning the entire personal-team-organizational BI spectrum.

Training

Applied Microsoft SQL Server 2008 Reporting Services

We offer onsite and online Business Intelligence classes!  Contact us about in-person training for groups of five or more students.

For more information or to register click here! 

Syndication

I'm working on minimizing the cube processing for a cube with some two billion rows in a fact table. I put together an ETL package that processes the dimensions with ProcessUpdate following by processing the latest partitions. I've noticed that processing one of the dimensions , which happens to be the largest (some 1 million rows) and most complicated dimension, resulted in partitions scans. The SQL Profiler showed the scans with the following events:

Finished processing the '<partition name>' partition

These partition scans don't result in SQL queries to the database and normally should execute pretty fast. In this case, however, the scans were taking plenty of time resulting in some 15 minutes delay in the incremental processing flow. With some help from T.K. Anand, it turned out that I had a design issue with that dimension. I discovered the issue by changing the KeyDuplicate error in the dimension ErrorConfiguration to True and fully processing the dimension. Dimension processing failed because of the attribute relationships implied a one-to-many relationship between two attributes but the data didn't support it. Somewhere along the design cycle, I turned off KeyDuplicate probably to get around the same issue in order to process the dimension successfully.

So, the moral of this story is:

  1. Always check your dimensions with the Dimension Health Check feature of the BIDS Helper.
  2. Don't turn off KeyDuplicate.

BTW, the problem with duplicate key errors is that the server can move members around and that would cause indexes and aggregations to require being rebuilt. For example, if you have the following members in the source data:

Attr1Key    Attr2Key

55        32

55        35

 

The first time, the engine may store 32 as the related member. But the second time it sees the rows during ProcessUpdate, it would potentially choose 35 as the related member. And that's going to cause indexes and aggregations to need rebuilding.


Posted Tue, Feb 7 2012 11:42 AM by Teo Lachev