ProcessUpdate and Partition Scans

I’m working on minimizing the cube processing for a cube with some two billion rows in a fact table. I put together an ETL package that processes the dimensions with ProcessUpdate following by processing the latest partitions. I’ve noticed that processing one of the dimensions , which happens to be the largest (some 1 million rows) and most complicated dimension, resulted in partitions scans. The SQL Profiler showed the scans with the following events:

Finished processing the ‘<partition name>‘ partition

These partition scans don’t result in SQL queries to the database and normally should execute pretty fast. In this case, however, the scans were taking plenty of time resulting in some 15 minutes delay in the incremental processing flow. With some help from T.K. Anand, it turned out that I had a design issue with that dimension. I discovered the issue by changing the KeyDuplicate error in the dimension ErrorConfiguration to True and fully processing the dimension. Dimension processing failed because of the attribute relationships implied a one-to-many relationship between two attributes but the data didn’t support it. Somewhere along the design cycle, I turned off KeyDuplicate probably to get around the same issue in order to process the dimension successfully.

So, the moral of this story is:

  1. Always check your dimensions with the Dimension Health Check feature of the BIDS Helper.
  2. Don’t turn off KeyDuplicate.

BTW, the problem with duplicate key errors is that the server can move members around and that would cause indexes and aggregations to require being rebuilt. For example, if you have the following members in the source data:

Attr1Key    Attr2Key

55        32

55        35

The first time, the engine may store 32 as the related member. But the second time it sees the rows during ProcessUpdate, it would potentially choose 35 as the related member. And that’s going to cause indexes and aggregations to need rebuilding.