As I mentioned in my TechEd Memoirs post, Microsoft is scaling back from the “super” cube approach in favor of more granular data “islands”. Since this took me by suprise, I approached the SSAS team for clarification. Here are the answers I managed to get:
1. What performance benefits could be realized by splitting UDM in data islands?
At this point, we don't have detailed performance numbers we can share. But to give an indication - a cube that had around 15 measure groups, and a very large number of attributes (over 100 cube dimensions, many with more than 100 attributes) showed a performance improvement of about 6 fold when split into separate cubes (one per measure group), for queries that only referenced a single measure group. The main benefit came from the reduced cube space - the individual cubes only needed to include dimensions related to the measure group, and therefore paid no overhead for unrelated dimensions. There was also some benefit from the MDX script being simpler for each of the individual cubes.
2. What guidelines should be followed when partitioning UDM?
The general principle is to have multiple cubes, maybe even one per fact table. Then have other cubes, using linked measure groups that can combine into the complete view. Use the small cubes when they meet the needs, and query the full cube when need to use measures from >1 measure group.
Some general guidelines are:
- If the dimensionality of all the measure groups is (much) the same, it would limit the benefit of splitting up (though there is some benefit of the simpler MDX script per cube)
- If queries very commonly reference measures from >1 measure group, then there would be no value in splitting those measure groups into separate cubes
- The perf benefit must be balanced against the negative impact on the end user view, if frequently a question using measures from just one measure group becomes a question where a measure from another measure group is added. For the user to switch cubes in the middle would be tiresome
- A single large cube is easier to manage, as there is no need to duplicate scripts
For calculations that span cross measure groups e.g. where many measure groups have calculations using exchange rate, certainly would not want to use LookupCube. Could either keep all those measure groups in the same cube, or split up and again use linked measure groups to utilize exchange rates in multiple cubes
3. By a “data island” do you mean leveraging the linked dimensions and measure group features? If so, any known limitations and restrictions?
Yes, uses linked measure groups, as described above. The dimensions are simply shared, and need not be linked. As far as limitations, there is overhead of having to define calculations etc. on the cube that combines the smaller cubes. Also, Report Builder does not handle such databases that well, as the linked measure groups appear as separate entities in the report model (there is nothing saying the Sales in one cube is exactly the same as Sales in another)
4. When can we expect a UDM performance whitepaper?
One is being worked on.
Wed, Jun 28 2006 3:11 AM