Linked Attribute Hierarchies

In UDM, the cube space is defined by attribute hierarchies. Dimensions are just logical containers of attribute hierarchies. It will be great if the next release of SQL Server (Katmai) could expand further on the attribute nature of UDM and solve some nagging issues that modelers currently face.

Let’s consider an example. Say, you have Geography and Customer dimension tables (see the AdventureWorksDW database). It is logical to expect that end users may be willing to browse data by the Geography-Customer hierarchy. As common this requirement is, modelers today need to make some tradeoffs to meet this requirement.

To start with, you could embrace the star schema and build two UDM dimensions (Geography and Customer) on top of the corresponding dimension tables. Then, the end users can drop these two dimensions side-by-side in their favorite OLAP browser to slice the data by Geography and Customer. Behind the scenes, the server will cross-join these two dimensions . Although cross-joing dimensions may meet the business need, there are a couple of well-known issues with this approach. First, you won’t be able to define easily useful hierarchy-related calculations, such as ratios of children to parent because there are no user-defined hierarchies. Second, cross-joining large dimensions may impact performance.

Alternatively, you could decide to create a user-defined hierarchy (aka multi-level hierarchy) that spans Geography and Customer attributes. This is where the things get trickier. Assuming a star schema, both dimension tables would join the fact table. To build a Customer dimension spanning both tables, you need to bridge them via the fact table – a definite no-no situation. The other solution is to duplicate the Geography columns to the Customer dimension table but this leads to duplication of database and UDM storage and will complicate the ETL processes.

At this point, ditching the star schema and showflaking the Geography table off the Customer table starts to look appealing. Now, you could build the Customer dimension on top of the Geography and Customer dimension tables – the approach that the sample AdventureWorks UDM takes. The first tradeoff is that you may need to fight an uphill battle with fans of the “classic” dimensional modeling that swear by star schemas. Second, this approach still results in duplication of UDM storage. That’s because, when the Geography and Customer dimensions are processed, the server will still build all dimension attribute hierarchies irrespective of the fact that some of them of identical. For example, if you have a City attribute (from the Geography table) in the Customer dimension, the server will build two City attribute hierarchies – one when building the Geography dimension and another one when building the Customer dimension.

It will be nice if a future SSAS version supports linking attributes from one dimension to another. With this feature, the database schema (star or snowflake) could become irrelevant. The City attribute from the Geography dimension could be linked to the Customer dimension. Of course, the cube schema will still be constrained by the underlying database schema and table relationships. However, cross-dimension attribute relationships will certainly enable more flexible scenarios.