Building a Custom ETL Framework with SSIS 2012

Even though the SSIS catalog and the new project deployment model in SQL Server 2012 take care of many of the mundane monitoring needs, a custom ETL framework is still important for handling features such as restartability and parallelism. Join our next Atlanta BI User Group meeting on Monday, March 31st, when Aneel Ismaily will talk about this subject in his “Building a Custom ETL Framework with SSIS 2012” presentation.

“ETL Frameworks are the foundation of any data warehouse/data mart implementation. In this session we will discuss building a heavy-duty enterprise ETL Framework and will talk in detail about some major ETL enhancements available in SQL Server 2012. Topics include auditing, logging, designing for fault tolerance and recoverability, handling orchestration, parallelism and finally scheduling.”

X-IO will sponsor the meeting and present their Intelligent Storage Element fast storage product.

Dynamic Dimensions

Scenario

You need to assign a dimension dynamically to a cube measure group, such as to calculate binning. In the extreme case, you might need to perform fact row-level evaluation to determine where the dimension “fits in”. Consider the following example from the insurance industry. An ODS-style Claim table treats every change to a claim as a Type 2 change by creating a new row and expiring the previous row for that claim.

Claim_Key | Start_Date | End_Date   | Claim_Number | Status
1         | 1/1/2010   | 5/1/2010   | C00001       | New
2         | 5/2/2010   | 3/8/2012   | C00001       | Open
3         | 3/9/2012   | 12/31/9999 | C00001       | Closed


Options

Given this scenario, you might want to count the number of claims as of any date. You have two options:

  1. Create a Claim Snapshot fact table taken at a daily or monthly interval.
  2. Dynamically associate the Date dimension. In Multidimensional, this requires creating a Claim dimension and a Claim measure group, both bound to the same Claim table. Then, you can use the following script to “associate” the Date dimension (added as an unrelated dimension to the cube) to the Claim measure group:

CREATE MEMBER CURRENTCUBE.[Measures].[ClosingAccountingDate] AS [Date].[Calendar].MemberValue, VISIBLE = 0;

Scope
(
    (
        MeasureGroupMeasures("Fact Claim") -
        {
            [Measures].[Start Date],
            [Measures].[End Date]
        }
    ),
    [Date].[Calendar].Members,
    Leaves([Claim])
);

    this = iif ([Measures].[ClosingAccountingDate] >= [Measures].[Start Date] and [Measures].[ClosingAccountingDate] <= [Measures].[End Date],
        Measures.CurrentMember, null);

End Scope;

To simplify the evaluation, the MemberValue property of all attributes of the Date dimension is bound to the closing date of the period, e.g. 12/31/2014 for year 2014, 3/31/2014 for any day in Q1, and 1/31/2014 for any day in January 2014. The scope assignment includes all measures in the Fact Claim measure group except the Start Date and End Date measures, since we don’t want to overwrite them. It also scopes on all members of the Calendar hierarchy of the Date dimension so that the calculation is performed at any level of the Calendar hierarchy. For example, if the user selects 2014 as a year, the calculation is performed as of 12/31/2014. Moreover, the Leaves([Claim]) scope positions the assignment at the measure leaves so we can evaluate each row in the fact table by checking whether the ‘as of’ date falls between that row’s Start Date and End Date.
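To see the assignment in action, here is a minimal query sketch. The [Claim Count] measure, the [Claims] cube name, and the Calendar Year level name are hypothetical placeholders; substitute whatever names your Claim measure group and Date dimension actually use:

-- Counts are evaluated as of each year's closing date (e.g. 12/31/2014 for year 2014)
SELECT
    [Measures].[Claim Count] ON COLUMNS,
    [Date].[Calendar].[Calendar Year].Members ON ROWS
FROM [Claims];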


Are we happy? We might be, but end users might not be so excited. The problem is that this deceptively simple scope assignment can put a remarkable dent in query performance. For any date, the server has to perform the evaluation for every row in the fact table. With a Claim dimension and a measure group of 500,000 rows, calculating the counts executes relatively fast (within a few seconds). However, running a trend query that requests counts for every day in a quarter might take a minute or so. The larger the Claim dimension and the cube subspace that the query requires, the worse the performance gets. Performance suffers further if you have other calculations layered on top of the claim counts.
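For reference, the kind of daily trend query that exposes the problem might look like the sketch below. The member key, the [Claim Count] measure, and the [Claims] cube name are again hypothetical:

-- One cell per day in the quarter; each cell forces a leaf-level evaluation over the Claim dimension
SELECT
    [Measures].[Claim Count] ON COLUMNS,
    Descendants([Date].[Calendar].[Calendar Quarter].&[2014]&[1], [Date].[Calendar].[Date]) ON ROWS
FROM [Claims];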

Scope assignments are a very powerful and useful feature of Multidimensional, but you shouldn’t abuse them. Keep the scope as narrow as possible to limit the cube space where the calculation is applied. Although it might require ETL effort, materialize whenever you can. For example, although it might result in millions of rows, a snapshot would perform much better in the above scenario. Moreover, a snapshot’s performance can be further improved with partitions and aggregations, while dynamic granular calculations can’t benefit from them.

KPIs on Smartphone

Some of you might be familiar with PushBI, the mobile BI offering of Extended Results. The company was recently acquired by Tibco. Now rebranded as Tibco Spotfire Metrics, the offering is available for the most popular mobile platforms, including Windows Phone and Windows 8. As its documentation explains, Spotfire Metrics supports surfacing KPIs from a variety of data sources, including Analysis Services. If you’re looking for ways to present KPIs on mobile phones, Spotfire Metrics could fill the gap.


Analyzing Power Pivot Models with Tableau

Now that Tableau has been highly praised by Gartner, you might be looking for ways to get the best of both self-service BI tools: the great data modeling capabilities and programming richness of Power Pivot and the visualization strengths of Tableau. Interestingly, besides Power Pivot workbooks published to SharePoint, Tableau can connect to local Power Pivot models and send MDX queries (as it does when connected to Multidimensional and Tabular).


This works thanks to the Tableau PowerPivot Connector wrapper (Microsoft SQL Server 2012 PowerPivot for Excel) that you can install from the Tableau website. Read this document to decide which one to use.


2014 BI Outlook: Who’s Hot, Who’s Not

While Microsoft has fallen a bit in Gartner’s eyes, according to InformationWeek it’s the hottest BI vendor in 2014. I guess the reason is that Gartner focuses predominantly on visualization, which is just one of the BI manifestations, while InformationWeek looks at the entire BI ecosystem.


Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms 2014 Released

Gartner released the 2014 update of the Business Intelligence and Analytics Platforms Magic Quadrant. Interestingly, Gartner moved the predictive capabilities to a new Magic Quadrant for Advanced Analytics Platforms and dropped scorecards. The most interesting aspect of this report for me was the Market Overview section at the end. According to Gartner, the most prevalent future BI trends will be:

  1. 7% BI annual growth all the way until 2017
  2. Visual data discovery
  3. Easier data preparation
  4. Collaboration and social analysis
  5. Growth in Cloud BI
  6. Real-time BI
  7. Deemphasizing the BIG DATA hype

Join me at the Atlanta MS BI Group meeting tonight to discuss item 5 and at SQL Saturday in Atlanta on May 3rd to talk about item 6.


Atlanta MS BI Group Meeting on February 24th

Atlanta BI fans,

Join us for the next Atlanta MS BI Group meeting on Monday, February 24th. Lance England will present “Introduction to MDX”. Computer Associates will sponsor our meeting and show us the latest ERwin release. I’ll also briefly present Power BI now that it has gone live. Spread the word and invite your coworkers, but RSVP on our website so we can plan food accordingly. Also, don’t forget to register for the SQL Saturday precons that will take place on May 2nd, the day before SQL Saturday Atlanta.

Main Presentation: MDX Fundamentals
Level: Intermediate
Date: Monday, February 24th, 2014
Time: 6:30 – 8:30 PM ET
Place: South Terraces Building (Auditorium Room), 115 Perimeter Center Place, Atlanta, GA 30346
Overview: MDX is a powerful language, but the price of that power is a higher degree of complexity. The key to understanding MDX is to learn the fundamental concepts and have a mental model of how it works. This presentation is for anyone who has struggled with the MDX learning curve, or is interested in starting down the MDX path.
Speaker: Lance England is an Atlanta-area data professional specializing in data warehousing and business intelligence with a Microsoft focus. He welcomes learning and networking with others. His blog is at http://lance-england.com and his Twitter handle is @lanceengland.
Sponsor: CA Technologies

CA Technologies’ portfolio of software and services helps our customers speed time to value, ensure critical application performance and simplify security and operations so they can drive innovation and quality for their customers. CA ERwin Data Modeler is an industry-leading data modeling solution that provides a simple, visual interface to manage your complex data environment both on premises and cloud.

When Dynamic Sets Are Not Dynamic

In my “Optimizing Dimension Data Security” blog, I mentioned that converting static sets to dynamic could reduce the connection initialization time. To my surprise, using Excel to connect to the cube triggered the execution of the dynamic sets, and end users reported long wait times for the connection to initialize. It turned out that when Excel connects to the cube, it issues DISCOVER statements to query the metadata. One of these statements queries MDSCHEMA_SETS to find out what sets are available in the cube. For some obscure reason, this triggers the actual set evaluation for dynamic sets. If the set is expensive, e.g. it touches all partitions, the DISCOVER statement can noticeably delay the connect time.

Other users have experienced the same issue even when security is not applied. Greg Galloway found a workaround and extended the Analysis Services Stored Procedure project with a CurrentCommandIsDiscover method. You can use this method in your dynamic set definition to check whether the tool sent a DISCOVER command and short-circuit the set evaluation, like so:

Create Dynamic Set CurrentCube.[Top 25 Selling Products] As
IIf
(
    ASSP.CurrentCommandIsDiscover(),
    Head([Product].[Product].[Product].Members, 0),
    TopCount
    (
        [Product].[Product].[Product].Members,
        25,
        [Measures].[Sales Amount]
    )
), Display_Folder = 'Sets';
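To verify the workaround, run a regular query that references the set. Because the query arrives as an EXECUTE rather than a DISCOVER command, the TopCount branch is evaluated as before. The query below is just a sketch against the Adventure Works sample cube; adjust the cube name to your own:

-- DISCOVER requests get the empty Head(...) branch; this EXECUTE query gets the real TopCount
SELECT
    [Measures].[Sales Amount] ON COLUMNS,
    [Top 25 Selling Products] ON ROWS
FROM [Adventure Works];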

The Office Click-To-Run Setup

As you’ve probably heard, Office 2013 now supports two installation options: the traditional MSI-based installation and the new Click-To-Run streaming installation. Chris Webb mentioned it here and Melissa Coates describes how it works in more detail here. The MSI setup is perpetual (you pay for a version once and you’re entitled to fixes for it), while the C2R setup is a subscription-based Office 365 setup (you continuously pay for using the software and you’re entitled to fixes and the latest features within the SKU you’re subscribed to). Perpetual installations will get updates (cumulative updates and service packs) just as they’re used to, but these are meant primarily to be fixes rather than new features. On the other hand, Office 365 subscribers get both fixes and new features as long as their subscription is active. Currently, there is no way to switch an existing Office installation from MSI to subscription-based or vice versa; you must uninstall Office 2013 and reinstall. Once you do this, you’ll find that there is no difference as far as the user experience goes. C2R still installs Office on the desktop, although in a different location.

The C2R setup has important ramifications for self-service BI. C2R users will have an always up-to-date service, allowing Microsoft to add new functionality to the Office applications at a much faster rate. We’ve already seen this with the synonyms feature that is used for natural language queries, aka Q&A (Q&A requires Power BI). Although I initially dismissed the streaming installation, the C2R option now seems very attractive. I already have an Office 365 subscription for e-mail and SharePoint. Shelling out a few more bucks to upgrade to Power BI and stay on the latest and greatest seems like a good value proposition because I don’t have to wait for Office.NEXT to get the latest features. More information about Office 365 plans can be found here. As an extension to Office 365, Power BI charges a premium as explained here.

A roadmap of planned C2R features is currently in the works, so customers will know the details of what’s coming and when.

Best Visualization Tool for Dashboards?

More and more organizations are planning and adopting dashboards, and the question “which visualization tool is best for dashboards?” is being asked more frequently. The visualization aspect of BI has been evolving rapidly. There are plenty of vendors and a plethora of tools out there. They keep leap-frogging each other, and each one has its strengths and limitations. More than likely, the darlings of today will be forgotten in a few years. So, a quest to find the perfect tool that does it all is likely to fail or be short-lived.

I’m not a visualization expert. When it comes to visualizations, I listen to guidance from experts. If you navigate to slide 30 of my “Best Practices for Implementing Enterprise BI Solution” presentation, you’ll see a mockup of a sales dashboard by Stephen Few, taken from his excellent “Information Dashboard Design” book. This is an example of a pixel-perfect layout with a lot of customizations, such as conditional formatting. No fancy gauges, no bouncing needles; the focus is on intuitive interpretation of data and not on visualization fluff. If you decide to adopt such visualization standards, you probably already own a great tool for implementing dashboards – SQL Server Reporting Services – which supports bullet graphs, sparklines, and a high level of customization.

So, instead of investing in a myriad of tools and hoping that they will solve your BI challenges, my advice is to spend your money on a solid architecture that allows you to support multiple visualization tools and swap tools as new ones come on board. For example, if you like and decide to adopt Tableau for interactive data visualization and exploration, you’ll find that it integrates nicely with SSAS. Here is a dashboard that was built without asking end users to take care of the data logistics and business calculations, since this work had already been done in the backend layer. You could also easily implement a similar interactive dashboard with Power View, which might not have all the visualizations that third-party tools now offer but gains in other areas. Please don’t take this dashboard as a best practice; it’s meant only to show the possibility of tool integration.


But you don’t have the time and budget for beautiful architectures, right? Since you don’t get much help from IT (if you have IT at all), you want to delegate BI and let the business take things into its own hands. However, no matter what visualization vendors tell you, sooner or later you’ll find that self-service BI still requires a proper foundation, as Kay Unkroth explains nicely in his “Self-Service Business Intelligence Governance” presentation. At minimum, a department needs some sort of data repository that integrates and cleans the data before letting end users in. But if you’ve gone that far, before implementing dashboards, why not add a semantic layer that centralizes business logic and is supported by the most popular visualization tools out there?