With all the tremendous interest around BI, new vendors and tools are emerging almost every day. In general, you can approach your BI needs in two ways.
You can try a top-down approach starting with the presentation layer, hoping that a cool data visualization tool and self-service BI will somehow solve your challenges. Lots of vendors out there would love to take you down that path.
You can follow a bottom-up approach that starts with a solid data foundation and a semantic layer that enables a single version of the truth and is supported by the most popular visualization tools.
This week I had the pleasure of teaching a class for the HR department of one of the largest and most successful companies. They have an ambitious goal to establish a modern data analytics platform to gain insights into all aspects of their workforce. Their manager told me that they had unsuccessfully tried the top-down approach and multiple vendor tools before they realized that the focus should be on the data first. And I agree completely. There are no shortcuts.
On this note, join me for a full-day precon session “Deep Dive into the Microsoft BI Semantic Model (BISM)” at SQL Saturday Atlanta on May 2nd. You'll find out how Microsoft BI helps you deliver a modern BI platform, and we'll discuss the toolset's strengths and challenges.
Where is Your Focus? (Prologika - Teo Lachev, 2014-04-06 23:17:49; updated 2016-02-15 07:38:50)
The self-service BI journey starts with the business user importing data. With Microsoft Power Pivot, we encourage the user to import tables and create relationships among these tables, similar to what they would do with Microsoft Access. This brings tremendous flexibility because it allows the user to incrementally add new datasets and implement sophisticated models for consolidated reporting, such as for analyzing reseller and internet sales side by side. True, Power Pivot relationships have limitations, including:
The lookup table must have a primary key, and the relationship must be established using this key. As it stands, Power Pivot doesn't support multi-grain relationships, where the fact table joins the lookup table at a higher grain than the primary key.
Many-to-many relationships (such as a joint bank account) are not natively supported. They currently require simple DAX formulas to resolve the relationship, such as CALCULATE ( SUM ( Table[Column] ), BridgeTable ).
Closed loop relationships (Customer->Sales->Orders->Customer) are not allowed, etc.
Chances are that these constraints will be relaxed over time as Power Pivot and Tabular evolve. That would let you meet even more involved requirements, on a par with organizational BI models.
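The bridge-table pattern is easier to reason about with numbers. The Python below is a hypothetical illustration of the joint-account case, not DAX, and the account and customer names are made up.

- Filtering the accounts through the bridge counts each account once. That is the effect the CALCULATE pattern above is meant to achieve.
- Summing after a plain join double-counts an account that has two owners.

```python
# Hypothetical joint-account data: one account can belong to several customers.
ACCOUNTS = {"A1": 100, "A2": 50}                             # account -> balance
BRIDGE = [("Alice", "A1"), ("Bob", "A1"), ("Alice", "A2")]   # (customer, account)


def balance_for(customers):
    """Sum balances of the accounts reachable from the selected customers via the bridge.

    Collecting the accounts into a set first means each account contributes once,
    no matter how many of the selected customers own it.
    """
    accounts = {acct for cust, acct in BRIDGE if cust in customers}
    return sum(ACCOUNTS[a] for a in accounts)


def naive_join_total(customers):
    """Summing after a plain customer-bridge-account join repeats a joint account per owner."""
    return sum(ACCOUNTS[acct] for cust, acct in BRIDGE if cust in customers)


if __name__ == "__main__":
    print(balance_for({"Alice"}))               # 150
    print(balance_for({"Bob"}))                 # 100
    print(balance_for({"Alice", "Bob"}))        # 150, not 250
    print(naive_join_total({"Alice", "Bob"}))   # 250: A1 counted twice
```

Note that the customer-level values (150 and 100) don't add up to the grand total (150). This non-additive behavior is expected in many-to-many models.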
With the recent popularity of other self-service BI tools, I took a closer look at their data features. Tableau, for example, encourages users to work with a single dataset. If the user wants to import data from multiple tables, the user must create table joins to relate the tables before the data is imported, producing one dataset that has all the required data. This probably meets 80% of the self-service BI requirements out there, although it might preclude consolidated analysis.

Suppose you want to analyze reseller sales by customer. With Tableau, you need to join the Customer and ResellerSales tables to prepare the dataset, and that's probably OK for this requirement. At some point, however, suppose you also need to bring in Internet sales, which are stored in a separate table. Now you have an issue. If you change the dataset to join Customer, ResellerSales, and InternetSales, you'll end up with duplicated data because you can't join ResellerSales to InternetSales (they don't have a common field). As a result, each reseller sale is repeated for every Internet sale of the same customer.

In Power Pivot, you could simply address this by importing Customer, ResellerSales, and InternetSales as separate tables and creating two relationships: Customer->ResellerSales and Customer->InternetSales. Notice that a Power Pivot relationship is created after the tables are imported, although it's of course still possible to establish a data source join during the data import. A Tableau workaround for the above scenario would be to use “data blending”. Speaking of which…
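To see where the duplication comes from, here is a minimal Python sketch with made-up rows: one customer with two reseller sales and three Internet sales. The table and column names are illustrative assumptions.

- Flattening all three tables into one dataset repeats each reseller row for every Internet row of the same customer.
- Aggregating each fact table separately per customer keeps the totals correct. Conceptually, this is what two independent relationships allow.

```python
from collections import defaultdict

# Hypothetical sample rows: (CustomerKey, SalesAmount)
CUSTOMERS = [1]
RESELLER_SALES = [(1, 10), (1, 20)]
INTERNET_SALES = [(1, 5), (1, 5), (1, 5)]


def flat_join_totals():
    """Join Customer, ResellerSales, and InternetSales into one dataset, then sum each amount."""
    rows = [
        (c, r_amt, i_amt)
        for c in CUSTOMERS
        for rc, r_amt in RESELLER_SALES if rc == c
        for ic, i_amt in INTERNET_SALES if ic == c
    ]
    # 2 reseller rows x 3 internet rows = 6 rows, so every amount is repeated.
    return sum(r for _, r, _ in rows), sum(i for _, _, i in rows)


def related_table_totals():
    """Aggregate each fact table on its own and relate both results to Customer."""
    reseller, internet = defaultdict(int), defaultdict(int)
    for c, amt in RESELLER_SALES:
        reseller[c] += amt
    for c, amt in INTERNET_SALES:
        internet[c] += amt
    return {c: (reseller[c], internet[c]) for c in CUSTOMERS}


if __name__ == "__main__":
    print(flat_join_totals())      # (90, 30): inflated
    print(related_table_totals())  # {1: (30, 15)}: correct
```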
It's also interesting how Tableau addresses combining data from separate data sources. Referred to as “data blending”, this scenario requires the user to designate one of the data sources as primary and to specify the matching fields from the two data sources to perform the join. Behind the scenes, Tableau executes a post-aggregate join that aggregates data at the required grain.

Let's say you import the Customer table from Data Source A and the InternetSales table from Data Source B. Suppose that both the Customer and InternetSales datasets have a State field that you want to use for the join. With data blending, Tableau will first aggregate the InternetSales dataset at the State level (think of SELECT STATE, SUM(SALES) FROM … GROUP BY STATE). It then joins this dataset to the Customer dataset on the State field.

The advantage of this approach is that you can join any two datasets as long as they have a common field (there is no need for a primary key in the Customer table). The disadvantage is that the secondary data source can't be joined to another (third) data source. Further, this approach precludes more complicated data entity relationships, such as many-to-many.
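The post-aggregate join can be sketched in a few lines of Python. This is a conceptual model of the behavior described above, not Tableau's implementation, and the sample rows are hypothetical. It works in two steps:

1. Each source is aggregated at the State grain.
2. The primary source drives a left-join-style lookup.

As a result, a state that exists only in the secondary source (NY here) drops out, and a primary state with no match (TX) gets an empty value.

```python
from collections import Counter, defaultdict

# Hypothetical rows. Primary: Customer (CustomerName, State). Secondary: InternetSales (State, Sales).
CUSTOMER = [("Ann", "GA"), ("Ben", "GA"), ("Cal", "FL"), ("Dee", "TX")]
INTERNET_SALES = [("GA", 100), ("GA", 50), ("FL", 30), ("NY", 70)]


def blend_on_state(primary, secondary):
    """Aggregate both sources by State, then look up secondary totals for each primary State."""
    customers_by_state = Counter(state for _, state in primary)
    sales_by_state = defaultdict(int)
    for state, sales in secondary:
        sales_by_state[state] += sales   # like SUM(SALES) ... GROUP BY STATE
    # The primary source determines which States appear; .get() returns None for no match.
    return {state: (count, sales_by_state.get(state))
            for state, count in customers_by_state.items()}


if __name__ == "__main__":
    print(blend_on_state(CUSTOMER, INTERNET_SALES))
    # {'GA': (2, 150), 'FL': (1, 30), 'TX': (1, None)}
```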
When choosing a self-service BI tool, you need to carefully evaluate its features, and one of the most important criteria is the tool's data capabilities. Some tools are designed for one-off analysis on top of a single dataset and might require importing the same data multiple times to meet different reporting requirements. Power Pivot gives you more flexibility, but it requires end users to know more about data modeling, tables, and relationships.
Data Models for Self-Service BI (Prologika - Teo Lachev, 2014-03-30 19:52:00; updated 2016-02-15 07:41:39)
Even though the SSIS catalog and the new project deployment mode in SQL Server 2012 take care of many of the mundane monitoring needs, a custom ETL framework is still important for handling features such as restartability and parallelism. Join our next Atlanta BI User Group meeting on Monday, March 31st when Aneel Ismaily will talk about this subject in his “Building a Custom ETL Framework with SSIS 2012” presentation.
“ETL Frameworks are the foundation of any data warehouse/data mart implementation. In this session we will discuss building a heavy-duty enterprise ETL Framework and will talk in detail about some major ETL enhancements available in SQL Server 2012. Topics include auditing, logging, designing for fault tolerance and recoverability, handling orchestration, parallelism and finally scheduling.”
X-IO will sponsor the meeting and present their Intelligent Storage Element fast storage product.
Building a Custom ETL Framework with SSIS 2012 (Prologika - Teo Lachev, 2014-03-24 20:36:04; updated 2016-02-15 07:42:45)
Sometimes you need to assign a dimension dynamically to a cube measure group, such as to calculate binning. In the extreme case, you might need to perform a row-level evaluation of the fact table to determine where the dimension “fits” in. Consider the following example from the insurance industry. An ODS-style Claim table treats every change to a claim as a Type 2 change: it creates a new row and expires the previous row of the claim.
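| Claim_Key | Start_Date | End_Date   | Claim_Number | Status |
|-----------|------------|------------|--------------|--------|
| 1         | 1/1/2010   | 5/1/2010   | C00001       | New    |
| 2         | 5/2/2010   | 3/8/2012   | C00001       | Open   |
| 3         | 3/9/2012   | 12/31/9999 | C00001       | Closed |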
Options
Given this scenario, you might want to count the number of claims as of any date. You have two options:
Create a Claim Snapshot fact table taken at a daily or monthly interval.
Dynamically associate the Date dimension. In Multidimensional, this requires creating a Claim dimension and a Claim measure group, both bound to the same Claim table. Then, you can use the following script to “associate” the Date dimension (added as an unrelated dimension to the cube) with the Claim measure group:
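// Hidden helper measure: the closing date of the currently selected Date member
CREATE MEMBER CURRENTCUBE.[Measures].[ClosingAccountingDate] AS [Date].[Calendar].MemberValue, VISIBLE = 0;
Scope
(
    (
        // All Fact Claim measures except the two date measures used in the test
        MeasureGroupMeasures("Fact Claim")
        -
        {
            [Measures].[Start Date],
            [Measures].[End Date]
        }
    ),
    [Date].[Calendar].Members,
    Leaves([Claim])
);
    // Keep the value only for fact rows whose validity range contains the "as of" date
    this = IIf([Measures].[ClosingAccountingDate] >= [Measures].[Start Date] AND [Measures].[ClosingAccountingDate] <= [Measures].[End Date],
        Measures.CurrentMember, null);
End Scope;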
To simplify the evaluation, the MemberValue property of all attributes of the Date dimension is bound to the closing date of the period: 12/31/2014 for the year 2014, 3/31/2014 for Q1 2014, and 1/31/2014 for January 2014. The scope assignment includes all measures in the Fact Claim measure group except the Start Date and End Date measures, since we don't want to overwrite them. It also scopes on all members of the Calendar hierarchy of the Date dimension, so the calculation is performed at any level of that hierarchy. For example, if the user selects 2014 as the year, the calculation is performed as of 12/31/2014. Finally, Leaves([Claim]) scopes the assignment to the leaf members of the Claim dimension. This lets us evaluate each fact row by checking whether the “as of” date falls between that row's Start Date and End Date.
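The row-level test that the scope assignment performs can be sketched outside the cube. The Python below is a hypothetical illustration, not how Analysis Services evaluates the script, and it uses the rows from the table above.

- `closing_date` mimics the MemberValue binding described above (the last day of the selected year, quarter, or month).
- `claim_counts_as_of` keeps only the claim version whose Start Date/End Date range contains that date.

```python
import calendar
from datetime import date

# The Type 2 claim rows from the table above: (Claim_Key, Start_Date, End_Date, Claim_Number, Status)
CLAIMS = [
    (1, date(2010, 1, 1), date(2010, 5, 1), "C00001", "New"),
    (2, date(2010, 5, 2), date(2012, 3, 8), "C00001", "Open"),
    (3, date(2012, 3, 9), date(9999, 12, 31), "C00001", "Closed"),
]


def closing_date(year, quarter=None, month=None):
    """Hypothetical stand-in for the Date MemberValue: the last day of the selected period."""
    if month is not None:
        last_month = month
    elif quarter is not None:
        last_month = quarter * 3
    else:
        last_month = 12
    last_day = calendar.monthrange(year, last_month)[1]
    return date(year, last_month, last_day)


def claim_counts_as_of(as_of, rows=CLAIMS):
    """Count claims by status, using only the row version whose range contains as_of."""
    counts = {}
    for _key, start, end, _number, status in rows:
        if start <= as_of <= end:  # the same range test as the IIf in the scope assignment
            counts[status] = counts.get(status, 0) + 1
    return counts


if __name__ == "__main__":
    print(claim_counts_as_of(closing_date(2010, month=3)))  # {'New': 1}
    print(claim_counts_as_of(closing_date(2010)))           # {'Open': 1}
    print(claim_counts_as_of(closing_date(2014)))           # {'Closed': 1}
```

The loop visits every row for every requested date. That is exactly why the cube-side version gets slow as the fact table and the date range grow.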
Are we happy? We might be, but end users might not be so excited. The problem is that this deceptively simple scope assignment can put a serious dent in query performance. For any date, the server has to perform the evaluation for every row in the fact table. With a Claim dimension and a measure group of 500,000 rows, calculating the counts is relatively fast (within a few seconds). However, a trend query that requests counts for every day in a quarter might take a minute or so. The larger the Claim dimension and the cube subspace that the query requires, the worse the performance. Performance suffers further if you have other calculations on top of the claim counts.
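A rough count shows why the trend query is so much slower. Assume a 91-day quarter and the 500,000-row measure group mentioned above, and assume the server tests each fact row once per requested day. The daily trend query then needs, for each measure in the scope:

```latex
\underbrace{91}_{\text{days}} \times \underbrace{500{,}000}_{\text{fact rows}} = 45{,}500{,}000 \ \text{range checks}
```

This is a back-of-the-envelope estimate under those assumptions, not a measurement of the server's actual work.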
Scope assignments are a very powerful and useful feature of Multidimensional, but you shouldn't abuse them. Keep the scope as narrow as possible to limit the cube space where the calculation is applied. Although it might require ETL effort, materialize whenever you can. For example, even though it might result in millions of rows, a snapshot would perform much better in the above scenario. Moreover, a snapshot fact table can benefit from partitions and aggregations, while dynamic granular calculations can't.
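Materializing the snapshot is mostly an ETL loop. Below is a minimal Python sketch that assumes a monthly grain and reuses the sample claim rows from the table above; the function names are hypothetical. It expands the Type 2 rows into one row per claim per month end. The "as of" count then becomes a plain group-by over rows that were filtered once, during the load.

```python
import calendar
from collections import Counter
from datetime import date

# Sample Type 2 claim rows: (Claim_Key, Start_Date, End_Date, Claim_Number, Status)
CLAIMS = [
    (1, date(2010, 1, 1), date(2010, 5, 1), "C00001", "New"),
    (2, date(2010, 5, 2), date(2012, 3, 8), "C00001", "Open"),
    (3, date(2012, 3, 9), date(9999, 12, 31), "C00001", "Closed"),
]


def month_ends(first_year, last_year):
    """Yield the last day of every month in the requested years."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            yield date(year, month, calendar.monthrange(year, month)[1])


def monthly_snapshot(rows, first_year, last_year):
    """Materialize (SnapshotDate, Claim_Number, Status): one row per claim version valid at each month end."""
    return [
        (as_of, number, status)
        for as_of in month_ends(first_year, last_year)
        for _key, start, end, number, status in rows
        if start <= as_of <= end
    ]


def counts_for(snapshot, as_of):
    """At query time, counting is a simple filter/group-by on the snapshot date."""
    return Counter(status for d, _number, status in snapshot if d == as_of)


if __name__ == "__main__":
    snap = monthly_snapshot(CLAIMS, 2010, 2014)
    print(len(snap))                               # 60 rows: one claim x 60 month ends
    print(dict(counts_for(snap, date(2012, 3, 31))))  # {'Closed': 1}
```

A daily grain works the same way with more rows. That trade-off (storage and ETL time for query speed) is the one described above.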
Some of you might be familiar with the PushBI mobile BI offering of Extended Results. The company was recently acquired by Tibco. The offering has been rebranded as Tibco Spotfire Metrics and is now available for the most popular mobile platforms, including Windows Phone and Windows 8. As its documentation explains, Spotfire Metrics supports surfacing KPIs from a variety of data sources, including Analysis Services. If you're looking for ways to present KPIs on mobile phones, Spotfire Metrics could fill the gap.
Now that Tableau has been highly praised by Gartner, you might be looking for ways to get the best of both self-service BI tools: the great data modeling capabilities and programming richness of Power Pivot, and the visualization strengths of Tableau. Interestingly, Tableau can connect not only to Power Pivot workbooks published to SharePoint but also to local Power Pivot models. It sends them MDX queries, as it does when connected to Multidimensional and Tabular.
This works thanks to the Tableau PowerPivot Connector wrapper (Microsoft SQL Server 2012 PowerPivot for Excel), which you can install from the Tableau website. Check the Tableau documentation to decide which connector version to use.
Analyzing Power Pivot Models with Tableau (Prologika - Teo Lachev, 2014-02-24 20:06:00; updated 2021-02-16 04:29:48)
While Microsoft has fallen a bit in Gartner's eyes, InformationWeek ranks it as the hottest BI vendor of 2014. I guess the reason is that Gartner focuses predominantly on visualization, which is just one of the BI manifestations, while InformationWeek looks at the entire BI ecosystem.
2014 BI Outlook: Who’s Hot, Who’s Not (Prologika - Teo Lachev, 2014-02-24 19:25:00; updated 2016-02-15 07:59:05)
Gartner released the 2014 update of the Business Intelligence and Analytics Platforms Magic Quadrant. Interestingly, Gartner moved the predictive capabilities to a new Magic Quadrant for Advanced Analytics Platforms and dropped scorecards. The most interesting aspect of this report for me was the Market Overview section at the end. According to Gartner, the most prevalent future BI trends will be:
7% annual BI growth through 2017
Visual data discovery
Easier data preparation
Collaboration and social analysis
Growth in Cloud BI
Real-time BI
Deemphasizing the BIG DATA hype
Join me at the Atlanta MS BI Group meeting tonight to discuss cloud BI, and at SQL Saturday Atlanta on May 3rd to talk about real-time BI.
Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms 2014 Released (Prologika - Teo Lachev, 2014-02-24 19:15:00; updated 2016-02-15 08:51:03)
Join us for the next Atlanta MS BI Group meeting on Monday, February 24th. Lance England will present “Introduction to MDX”. Computer Associates will sponsor our meeting and show us the latest ERwin release. I'll also briefly present Power BI now that it has gone live. Spread the word and invite your coworkers, but RSVP on our website so we can plan food accordingly. Also, don't forget to register for the SQL Saturday precons that will take place on May 2nd, the day before SQL Saturday Atlanta.
Main Presentation:
MDX Fundamentals
Level: Intermediate
Date:
Monday, February 24th, 2014
Time:
6:30 – 8:30 PM ET
Place:
South Terraces Building (Auditorium Room)
115 Perimeter Center Place
Atlanta, GA 30346
Overview:
MDX is a powerful language, but the price of that power is a higher degree of complexity. The key to understanding MDX is to learn the fundamental concepts and have a mental model of how it works. This presentation is for anyone who has struggled with the MDX learning curve, or is interested in starting down the MDX path.
Speaker:
Lance England is an Atlanta-area data professional specializing in data warehousing and business intelligence with a Microsoft focus. He welcomes learning and networking with others. His blog is at http://lance-england.com and his Twitter handle is @lanceengland.
Sponsor:
CA Technologies
CA Technologies’ portfolio of software and services helps our customers speed time to value, ensure critical application performance, and simplify security and operations so they can drive innovation and quality for their customers. CA ERwin Data Modeler is an industry-leading data modeling solution that provides a simple, visual interface to manage your complex data environment, both on premises and in the cloud.
Atlanta MS BI Group Meeting on February 24th (Prologika - Teo Lachev, 2014-02-21 15:35:00; updated 2016-02-15 08:53:28)
In my “Optimizing Dimension Data Security” blog, I mentioned that converting static sets to dynamic sets could reduce the connection initialization time. To my surprise, using Excel to connect to the cube triggered the execution of dynamic sets, and end users reported long wait times for the connection to initialize. It turned out that when Excel connects to the cube, it issues DISCOVER statements to query the metadata. One of these statements queries MDSCHEMA_SETS to find out what sets are available in the cube. For some obscure reason, this triggers the actual evaluation of dynamic sets. If a set is expensive (for example, it touches all partitions), the DISCOVER statement can significantly delay the connection.
It turns out that other users have experienced the same issue even when security is not applied. Greg Galloway found a workaround and extended the Analysis Services Stored Procedure project with a CurrentCommandIsDiscover method. You can call this method in your dynamic set definition to check whether the tool is sending a DISCOVER command and, if so, short-circuit the set evaluation, like so:
Create Dynamic Set CurrentCube.[Top 25 Selling Products] As
IIf(
ASSP.CurrentCommandIsDiscover(),
Head([Product].[Product].[Product].Members,0),
TopCount
(
[Product].[Product].[Product].Members,
25,
[Measures].[Sales Amount]
)
),
Display_Folder = 'Sets';
When Dynamic Sets Are Not Dynamic (Prologika - Teo Lachev, 2014-02-19 00:54:00; updated 2016-02-15 08:56:45)