Analysis Services Relationship Limitations and Workarounds

Relationships in analytical models (Multidimensional and Tabular) are both a blessing and a curse. Once defined by the modeler, relationships power “slicing and dicing” data. The analytical tool doesn’t have to worry about and include the relationships in the query because they are defined in the model and the server “knows” how to use them to aggregate the data. On the other hand, as they stand, all analytical model flavors (Multidimensional, Tabular, and PowerPivot) have inherent limitations. The following table compares relationship limitations between Multidimensional and Tabular (PowerPivot).

MultidimensionalTabular and PowerPivot
Many-to-ManySupportedNot supported
Parent-childSupportedRequires flattening the hierarchy
Role-playingSupportedNot supported
Multi-grainSupportedNot supported
Fact-to-factNot supportedSupported but rarely useful

 

Let’s discuss these limitations in more details.

Many-to-Many

The classic example of a many-to-many relationship is a joint bank account that is owned by multiple customers. Many-to-many relationships are natively supported in Multidimensional and the server produces correct results when aggregating the data. Tabular and PowerPivot don’t support many-to-many out of the box. When the relationship One-To-Many flow is reversed through a bridge table, Tabular will let you continue but produce wrong results. In PowerPivot, this limitation manifests with the “Relationships might be needed” warning in the PowerPivot Field List. The workaround is to introduce DAX explicit measures for each column that requires aggregation using the formula =CALCULATE (SUM (FactTable[Column] ), BridgeTable). On the upside, due to the Tabular memory-resident nature, expect M2M over a large fact table to perform much better in comparison with Multidimensional.

Parent-Child

A parent-child relationship defines a recursive join between two columns, such as an employee and a manager. This relationship type is supported natively in Multidimensional but requires flattening the hierarchy using DAX calculated columns for each level in the hierarchy. Flattening the hierarchy is accomplished by using DAX PATH-related functions.

Role-playing

A role-playing relationship allows a dimension (lookup) table to be joined multiple times to a fact table, such as to join a Date dimension multiple times to a Sales fact table in order to support aggregating the fact data by OrderDate, ShipDate, and DueDate. Multidimensional supports role-playing relationships natively. Tabular doesn’t and flags only one of the relationships as active. The first (and recommended for most scenarios) workaround is simply to reimport the Date table as many times as needed. This gives you the flexibility to control the schema and data between the Date table instances, e.g. the ShipDate table includes only valid ship dates, at the expense of storage and increased model complexity. The second more complicated workaround is to create calculated measures, such as ShipAmount, DueAmount, etc., that use the DAX USERELATIONSHIP function to traverse the inactive relationships. This approach presents maintenance challenges because the chances are that the fact table will have many columns and you’ll have to create role-playing variants for each measure that needs to be aggregated.

Multi-grain

A multi-grain relationship exists when a dimension joins a fact table at a higher level than the dimension key. For example, you might have an Organization table that goes all the way down to business unit. However, an Expenses fact table might record expenses at a higher level in the organizational hierarchy, such as Division. Consequently, the modeler needs to set up a relationship from the Expenses fact table to the Division column as opposed to the Business Unit column (dimension key). Multidimensional supports this scenario out of the box. Tabular doesn’t because the joined column on the One side of the relationship must be the dimension key which uniquely identifies each row in the dimension table. The workaround is to break the Organization dimension table into two tables: Organization and OrganizationDivision in order to join the OrganizationDivision table to the Expense fact table on the Division column which is now the key column. Then, you need to join the OrganizationDivision table to the Organization table for consolidated reporting purposes, such as to see Sales and Expenses side by side.

Fact-to-fact

Suppose that your model has two fact tables: OrderHeaders and OrderLineItems. You would like to have a transactional report that shows the line items for each order. Let’s say that that the OrderHeaders fact table has an OrderID column and the OrderLineItems have a matching OrderID column. On a first thought, you might attempt to solve the problem by creating a relationship between the OrderHeaders and OrderLineItems tables. However, Multidimensional doesn’t support fact-to-fact relationships. That’s because the classic OLAP model insists on having relationships between dimensions and fact tables only. To solve this predicament in Multidimensional you need a PO Order dimension that joins both OrderHeaders and OrderLineItems tables. However, the chances are that you might end up with another complication if you take this route. This might not be applicable to the order header-line items scenario but what if the rows in the two fact tables have different dates. If a Date dimension joins them and you attempt to slice by date, the business question will be “Show me orders and order line items that have matching dates”. This might or might not be correct depending on your situation. So, this is where you’ll find that although very useful, relationship in analytical models might not be that flexible for ad hoc analysis.

 

Thanks to its more relaxed schema requirements, Tabular doesn’t care about the table type (dimension or fact) and it will let you set up a relationship between OrderHeaders and OrderLineItems as long as they comply to the One-To-Many rule . However, you’ll face another limitation that Tabular doesn’t support closed-loop relationships. More than likely, the OrderHeaders and OrderLineItems will have relationships to other lookup tables already, such as Account, Organization, etc. Once you create a fact-to-fact relationship, you’ll close the relationship loop. Tabular and PowerPivot will detect this and automatically deactivate the fact-to-fact relationship (its line will be dotted in the diagram view). Consequently, you must resort to the Multidimensional workaround to create an OrderHeader lookup table. Another workaround is to merge the OrderHeaders and OrderLineItems tables into a single table. Or, create a model that has only the OrderHeaders and OrderLineItems tables so you can create the fact-to-fact relationship.

 

Relationships are a challenging concept to grasp. As a modeler, you need to understand their limitations and workarounds while hoping that a future version of Analysis Services will relax or remove some of the existing constraints.

Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services

A really impressive white paper/book with insights about Tabular performance and inner-workings. Kudos to the authors for producing this colossal resource.

“Tabular models hosted in SQL Server 2012 Analysis Service provide a comparatively lightweight, easy to build and deploy solution for business intelligence. However, as you increase the data load, add more users, or run complex queries, you need to have a strategy for maintaining and tuning performance. This paper describes strategies and specific techniques for getting the best performance from your tabular models, including processing and partitioning strategies, DAX query tuning, and server tuning for specific workloads.”

Speaking at Atlanta Code Camp

I’ll be speaking at Atlanta Code Camp on August 24th. Given the developer-oriented nature of the event, I settled on a more code-intensive topic: Real-time BI with StreamInsight.

“So, you have (or heard of) classic BI, self-service BI, Big Data BI, descriptive, predictive or even prescriptive BI, but do you have real-time BI? Real-time business intelligence (BI) has become one of the top priorities for data analytics. The ability of having near real-time access to large amounts of data across heterogeneous data sources is one of the key factors to enable effective business decisions. Join this talk to learn how to implement real-time analytics over streams of events using StreamInsight and .NET. You’ll see how these technologies come together in a Big Data Twitter demo that allows you to analyze the Twitter stream in real time.”

Solving Laptop Overheating Problems

Off the BI path but I wanted to save this tip for future generation and my sake in case I forget and I run into the same issue in Windows.NEXT or a new laptop. My $3,500 HP 3-year old EliteBook 8540w is prone to overheating. If a task puts a sustained pressure on the CPU, e.g. 10% or more, the fan engages and produces a noise like a howling wind in winter. It’s not so bad but it’s good enough to destroy my faith in HP… and embarrass me during meetings. The bottom of the laptop gets hot to a point where it can’t be used a laptop anymore. Eight months after I bought the laptop it became so bad that it would shut down from overheating so HP had to replace its motherboard and heat sink. And, now that the 3-year warranty expired it’s getting worse again.

Anyway, what got me curious is what causes CPU activity when the laptop is idle so I could minimize the level of its noise and embarrassment. I open the Windows 8 resource monitor (resmon.exe) and Task Manager side by side. I’ve noticed the problem gets worse when Office 2013 is running. This causes the Windows Search Index (SearchIndexer.exe) and SearchProtocolHost.exe to spring in the background, drawing more and more CPU time over time. I ran across a tip on the Internet that this happens if the Outlooks folders are above 2 GB in size. True, one of the Outlook archive folders and my main folder were above 2 GB so I had to do some pruning.

Another thing I’ve noticed is that the Antilmalware Service Executable (MsMpEng.exe), aka Windows Defender, will poke his head every now and then and draw some CPU cycles contributing to the CPU overhead. The Resource Monitor would show that the Antilmalware Service Executable would scan C:\Temp\fslogvw (the screenshot is meant to show how you can check what files a process accesses).

071513_0155_SolvingLapt1

When this happens, the Task Manager would show mscorsvw.exe surging in CPU utilization. Mscorsvw.exe is precompiling .NET assemblies in the background, e.g. after a Windows update, so this is normal. When this happens, it generates .NET native images in C:\Temp\fslogvw so I had to exclude this folder and c:\windows\assembly from the Microsoft Defender real-time protection.

071513_0155_SolvingLapt2

I left the Search Indexer to scan the new Outlook folders. When it was done it appeared that the CPU utilization is back to normal when the laptop was idle. One more tip. Disassemble the laptop and remove the fan cover as explained in the HP Maintenance and Service Guide. Then, clean the dust in the fan socket. In my case, there wasn’t much dust but you should probably start with cleaning if you run into overheating issues.

Power BI Expands Microsoft Cloud Self-service BI

Expanding its Office 365 feature set, Microsoft introduced Power BI. Power BI is an umbrella name of the following BI offerings in the cloud: Power Pivot, PowerView, Power Query (previously known as Data Explorer), and Power Map (previously known as GeoFlow), plus more. Microsoft demonstrated Power BI at WPC yesterday (scroll to the 30th minute marker to see the Amir’s heart attack-provoking demo) or see the YouTube extract.

The “more” is the intriguing part to me:

  • Data Management Gateway – not sure how exactly this will work but it will provide some sort of connectivity to on-premise data so you can schedule PowerPivot data refreshes between the Microsoft cloud and your corporate data. Expect your ISP speed provider speed to be a limiting factor here so don’t hope for fast refreshes of millions of rows.
  • BI Sites – a dedicated SharePoint site that will support finding and collaborating on BI artifacts.
  • Mobile support – Power View is going HTML5!
  • Natural language query – Amir covered this interesting feature pretty well. I don’t know how it works behind the scenes yet (probably a natural language-to-DAX translator).
  • Data views – data stewards can publish datasets so they can be easily discoverable by Power Query.
  • King of the Hill – an interesting new PowerView visualization (demonstrated by Amir toward the end of the demo).

We don’t know at this point when these features will make to on-premise SharePoint, which probably would have been a better starting point for these enhancements. And, pricing hasn’t been announced yet. Look for Power BI toward the end of the summer if cloud self-service BI makes sense to your organization.

PerformancePoint Display Condition Not Working

Issue: A client reports an issue with our PerformancePoint dashboard where the clicking a KPI in the scorecard doesn’t show the supporting analytical grid. Normally, when the user clicks a row in the Operational Scorecard (see the image below), the SharePoint connectable architecture triggers an update of the Operational Detail Report Months analytical report that shows more details about the selected KPI.

 

Workaround: Finding the cause was tricky. To make things even trickier, the SQL Server Profiler would show “Query (1, 36) Parser: The syntax for ‘,’ is incorrect.” error so you would believe that there is something wrong with the MDX queries. This turned out to be an “innocent” error if there is such a thing. The reason for the issue with the report refresh not working was actually a PerformancePoint Display Condition bug presumably introduced by a recent SharePoint upgrade (more than likely the April CU). The display condition actually works if you put the page in Edit mode. While waiting for a fix from Microsoft, the workaround I found is to enable a default display condition:

  1. Open the dashboard in the Dashboard Designer.
  2. In the Analytical Report web part, right-click the Display Condition and then Edit Condition.
  3. Check the Default checkbox and redeploy the dashboard.

070313_1254_Performance2

This displays the default configuration of the Analytical Grid report which the user will see when opening the dashboard page. For some reason, this makes the KPI row clicks work. Sometimes, things get worse before getting better…

Microsoft SQL Server 2014 Community Technology Preview 1 (CTP1) Released

Microsoft released the Community Technology Preview 1 (CTP1) of SQL Server 2014. As I mentioned in my previous post, the accompanying Product Guide should get you started with the new features.

Microsoft SQL Server 2014 CTP1 Product Guide Released

In anticipation of SQL Server 2014 CTP1, which should be out soon, Microsoft released Microsoft SQL Server 2014 CTP1 Product Guide – a collection of white papers, slides, and other resources that discuss the new features to help you evaluate SQL Server 2014.

“The SQL Server 2014 CTP1 Product Guide is now officially available to customers and partners. The guide is intended to help you get the most value out of Microsoft SQL Server 2014 CTP1.”

Atlanta BI Group Meeting on Monday

The Atlanta Microsoft BI Group will have a meeting tomorrow, June 24th.

Main Presentation: Developing a Custom Task in SSIS 2012 Level: Intermediate

Date: June 24th Time 6:30 – 8:30 PM ET

Place: South Terraces Building (Auditorium Room) 115 Perimeter Center Place Atlanta, GA 30346

Overview: Integration Services uses tasks to perform units of work in support of the extraction, transformation, and loading of data. Integration Services includes a variety of tasks that perform the most frequently used actions, from executing an SQL statement to downloading a file from an FTP site. If the included tasks and supported actions do not completely meet your requirements, you can create a custom task. In this session we will demonstrate to you how to create custom SSIS tasks.

Speaker: Aneel Ismaily was born and raised in Karachi, Pakistan. He moved to the United States at the age of 18. Since then he has lived in Atlanta, GA. Aneel did his undergrad in Computer Science (BS) from Georgia State University (GSU) with concentration in Database Systems. He recently graduated with a professional MBA degree from Georgia State University with concentration in Organization Management and Entrepreneurship. Aneel owns MSBI Consulting, an IT consulting firm. MSBI Consulting provides Business Intelligence solutions to its customers. Prior to MSBI Consulting, Aneel was employed with Intellinet where he was working as a Principal Consultant. Before that he worked at RDA Corporation where he was working as a Sr. Software Engineer and before RDA he worked as a BI Solution Developer at BCD Travel. You can learn more about Aneel at http://www.linkedin.com/in/aismaily.

Sponsor: 3Sage Consulting Founded and led by real consultants who really care about the end deliverable, 3sage is untangling some of the most complex data issues in business today.

Prototypes with Pizza: Real-time BI with Big Data Demo by Teo Lachev

So, you have classic BI, self-service BI, Big Data BI, predictive BI, but do you have real-time BI? To demonstrate how classic BI, Big Data, and real-time BI can play together, Microsoft put together a great sample – Big Data Twitter Demo.

Big Data Twitter Demo

So, you have (or heard of) classic BI, self-service BI, Big Data BI, descriptive BI, predictive BI or even prescriptive BI, but do you have real-time BI? I’ve been doing quite bit of research and work in that area lately. As you could imagine, real-time BI requires a different architecture that is capable of processing streams of data (sometimes thousands of events) in real time. The Microsoft premium technology for Complex Event Processing (CEP) is StreamInsight (requires a SQL Server license). Microsoft has also a lightweight, open-source .NET library called RX which does event streaming but it doesn’t have many of the StreamInsight features, such as windowing. To demonstrate how classic BI, Big Data, and real-time BI can play together, Microsoft put together a great sample – Big Data Twitter Demo.

The demo allows you to subscribe to one or more Twitter topics of interest. It uses StreamInsight to listen to the Twitter activity and extract tweets that match the topics you “subscribe” to. In the screenshot below, I’m intercepting tweets about Microsoft and SQL Server. Then the demo saves the results to a SQL server table for offline analysis with PowerPivot and Power View (a sample Excel workbook with reports is included). In addition, the demo stores the results in a SQL Azure Hadoop cluster (HDInsight). I guess the idea is to truncate the operational SQL Server store on a regular basis while archiving all data on Hadoop for future analysis. The demo also includes a dashboard that displays the matching tweets and hit rate in real time. Behind the scenes, the application uses Web Sockets (IMO, SignalR would have been a better choice here since Web Sockets have limited browser and platform support) to communicate with the JQuery code on the client which updates the dashboard content. For more information about how all of this work, Mike Wilmot covers the demo in more details.

062013_2135_BigDataTwit1

This is a very impressive demo and I can imagine how much effort went into building it. I personally believe that we’ll see more demand for real-time applications, especially coupled with predictive analytics, such as detecting outliers or forecasting volumes.On the downside, a few days after Microsoft released the demo, Twitter discontinued Basic Authentication, which the demo uses to authenticate with Twitter (you need a Twitter account to run it). Twitter now uses OAuth so I had to tweak the code. Specifically, I added the OAuthTokens.cs and WebRequestBuilder.cs from the Patrick Smith’s Twitterizer library to the StreamInsight.Demos.Twitter.Common class library in the demo. In the same library, I changed the Read method in the TwitterStreaming class as follows:

public TextReader Read() {

var url = GetURL();

// Basic Authencation – Obsolete

//var request = HttpWebRequest.Create(url);

//request.Timeout = _config.Timeout; 

//request.Credentials = new NetworkCredential(_config.Username, _config.Password);

//var response = request.GetResponse();

// Twitter uses OAuth now which is much more complex to implement so you need wrapper classes, such as Twitterizer

OAuthTokens tokens = new OAuthTokens();

tokens.ConsumerKey = “<your Twitter consumer key>”;

tokens.ConsumerSecret = “<your Twitter consumer secret>”;

tokens.AccessToken = “<your Twitter access token>”;

tokens.AccessTokenSecret = “<your Twitter access token secret>”

WebRequestBuilder requestBuilder = new WebRequestBuilder(new Uri(url), HTTPVerb.GET, tokens);

var response = requestBuilder.ExecuteRequest();

return new StreamReader(response.GetResponseStream());

}

And, to get the dashboard to work, I had to use Safari since for some obscure reason Web Sockets won’t work for me with IE 10 on Windows 8. If I have time, I plan to cover StreamInsight and real-time BI in more details in future posts.