Best Practices for Implementing Enterprise BI Solution at SQL Saturday 2013

SQL Saturday 2013 in Atlanta was a raging success. Some 555 people attended, which made it the most attended SQL Saturday event ever. I’ve uploaded the slides from my presentation “Best Practices for Implementing Enterprise BI Solution” to the Prologika site and SlideShare. The presentation was well attended and well reviewed. Thanks to everyone who attended; it was great meeting all of you!

“Learn best practices to make your organization a center of BI excellence! I’ll walk you through lessons learned during our implementation of an enterprise end-to-end BI solution. Working experience with dimensional modeling and the Microsoft BI stack is assumed.”

Leveraging Data to Revolutionize a Mature Business by Bijal Patel at Atlanta BI Group on May 20th

Due to Memorial Day, we’ll move our next Atlanta BI Group meeting to May 20th. The meeting will be sponsored by Strategy Analyzer. Our speaker will be Bijal Patel (Director of Data and Integration Services at Cox Media Group) and the topic is “Leveraging Data to Revolutionize a Mature Business”.

“Learn how the integration, management and analysis of data is helping Cox Media Group reinvent its business model. Leveraging SSIS, BizTalk, MDS and several other leading technologies, Cox Media Group is finding new ways to reach its customers and deliver innovative media products and services to its established customer base.”

This looks like a very interesting “from the trenches” presentation, given the speaker’s experience and his organization’s line of business. I hope you can join us. Please register and RSVP on the Atlanta BI Group website.

Presenting at SQL Saturday in Atlanta

I’ll present at SQL Saturday in Atlanta on May 18th. Initially, I was planning to talk about dashboard options with the Microsoft BI stack, but the organizers put out a call for more advanced content. To accommodate this request, I’ll present “Best Practices for Implementing an Enterprise BI Solution”, where I’ll share proven practices harvested from real-life projects.

“Learn best practices to make your organization a center of BI excellence! I’ll walk you through lessons learned during our implementation of an enterprise end-to-end BI solution, which is discussed in the Records Management Firm Saves $1 Million, Gains Faster Data Access with Microsoft BI case study published by Microsoft. Working experience with dimensional modeling and the Microsoft BI stack is assumed.”

I’m looking forward to seeing you on May 18th.

Default Parameters in Power View

Scenario: You need to configure a Power View report to show data for a dynamic date, such as the current date or the last date with data. However, as it stands, Power View doesn’t support expressions.

Workaround: Add a Boolean calculated column to the Date table that returns TRUE for the date of interest. For example, if you want the report to show data for today’s date, the expression might be:

=IF([Date] = TODAY(), TRUE(), FALSE())

where [Date] is the column of the Date data type. Then, use this column as a filter in Power View.
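
Similarly, if the report should default to the last date with data rather than today, a calculated column along these lines might work (a sketch assuming a hypothetical Sales table with an OrderDate column related to the Date table):

=[Date] = MAX(Sales[OrderDate])

In a calculated column, an aggregation without CALCULATE ignores the row context, so MAX returns the latest order date across the entire Sales table and the expression evaluates to TRUE only for that date’s row.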

Unfortunately, this workaround has a significant limitation. If the report needs another filter on the Date table, such as one that lets the user override the default filter on the current date, this approach won’t work because filters on multiple attributes are interpreted as an AND condition (Date is 1/1/2013 AND TodayDate=True). The inability to specify an OR condition between attribute filters is another Power View limitation. So, users need to be trained to use either the TodayDate or the Date filter, but not both.

Looking ahead, it would be nice if Power View supported VB or DAX expressions, as regular SSRS reports do.

Many thanks to Darren Gosbell for suggesting the workaround.

Best Practice and Creative Data Visualization by Jen Underwood for Atlanta BI

It looks like Atlanta BI Group will have a record attendance tonight with 74 people registered! Jen Underwood will present Best Practice and Creative Data Visualization.

This fun, informative, and inspirational session covers both best practices and creative options for data visualization. We will showcase data visualization techniques in Excel, Power View, Reporting Services, Visio, and a variety of open-source projects and third-party data visualization offerings. The possibilities are endless with the right mix of tools, tips, and tricks.

Windows Azure Infrastructure Services

Microsoft announced today the availability of Windows Azure Infrastructure Services, a collective name for running Virtual Machines and Virtual Networks in the cloud. Scott Guthrie’s blog on the subject is very informative. Pricing was slashed too, to be competitive with Amazon. What’s interesting is that these cloud VMs can be configured as an extension to your existing network. In the BI world, this would allow us to host the ETL, data mart, and SSAS layers (or just the analytical layer) in the cloud while sourcing data from existing on-premises data sources, albeit probably over a much slower connection. Speaking of connection speeds, the Azure bandwidth is actually good (5-15 GB/s), although it’s likely you’ll hit a bottleneck with your ISP on the way to and from the cloud.

And, speaking of BI, check the SQL Server Business Intelligence in Windows Azure Virtual Machines document if you’re interested in cloud BI deployments. Notice that there are VM templates that install SSRS (native mode), SSAS (Multidimensional), and SharePoint 2013, but you can manually install the other components, such as Tabular, if you need to. Microsoft recommends the Extra Large VM size for BI deployments, although its memory capacity (14 GB) might be on the lower end, especially for Tabular.

About Tableau 8

Jen Underwood highly recommended I attend the Tableau 8 tour on Tuesday to witness its new features firsthand. Naturally, I couldn’t resist comparing everything I saw with Microsoft BI. I took some time after the presentation to take 8.0 for a spin and reconfirm my understanding. Here are the top five things I liked about Tableau.

  1. Simplicity – A few months ago, I blogged about my top 5 Microsoft BI wish list. My number 1 wish was a continued focus on integration and simplification. Tableau 8.0 nails simplicity, at least on the visualization side of things. One desktop tool and its server-based counterpart. A tool whose sole purpose is BI, not something bolted onto another tool as a BI add-on. No layers to integrate and configure on the server side, and no error logs to sift through.
  2. Visualization – Visualizations are at once simple and powerful. The presenter was quick to point out the areas where the tool excels in comparison with Power View. Tree maps and bubble maps were all over the place, as was the ability to customize them, e.g. change the color of a pie slice. I liked the ability to dynamically group items (similar to custom MDX sets in Excel; see the example after this list). For example, the presenter lassoed a bunch of cities in North America and created a North America dynamic group that was subsequently used to analyze sales. Another interesting out-of-the-box feature is the ability to forecast data. In the Microsoft world, this would require some flavor of data mining (not a native feature in either Excel or Power View).
  3. Mobility – This is one area where Tableau has at least a year’s lead over Microsoft BI (see my number 2 wish item in the above blog). Once a dashboard is published to the server, it can be viewed and edited on mobile devices (an iPad was demonstrated). By contrast, due to its Silverlight nature, Power View is currently supported only in Internet Explorer. However, as we’ve heard at conferences, Microsoft is hard at work to change this.
  4. JavaScript API – Developers creating web applications can integrate and embed interactive Tableau content into their applications via the new JavaScript API.
  5. Data reach – Apparently, Tableau customers are asking for specialized connectors to cloud data and emerging data sources. Version 8.0 introduces connectors for Salesforce.com, Google Analytics, Hadoop, SAP HANA, etc.
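
For comparison, a custom named set in an Excel OLAP PivotTable (e.g. via Fields, Items, & Sets in Excel 2013) is defined with an MDX set expression along these lines (a sketch; the dimension and member names are hypothetical):

{[Geography].[City].&[Seattle], [Geography].[City].&[Vancouver], [Geography].[City].&[Toronto]}

Once defined, the set behaves like a reusable group on rows or columns, which is roughly what Tableau’s dynamic grouping achieves with a lasso gesture.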

In my opinion, the backend is where Tableau, being a self-service BI tool, trails behind Microsoft BI, including:

  1. No continuum from self-service to organizational BI – While you can publish a workbook to the server, you can’t upgrade the workbook to an organizational BI model. Nor can you connect to a published workbook as a data source, e.g. by using Excel as a front end.
  2. In-memory engine – Based on my experiments, xVelocity (the in-memory engine of PowerPivot and Tabular) outperforms the Tableau in-memory technology in both compression and speed.
  3. Programming – Tableau supports custom calculations, but the potential is not even close to what you can do with DAX. For example, I couldn’t find a way to model a many-to-many relationship (not natively supported in PowerPivot either, but it can be handled with DAX formulas, as sketched after this list).
  4. Scalability – I haven’t tested the server edition yet but I would expect Analysis Services to scale much better than Tableau due to the desktop origin of the latter.
  5. Security – Tableau supports action-level security, e.g. allowing the user to edit workbooks, and basic user filters that grant access to specific members, such as Bob can see only USA. There is no dynamic data security.
  6. Usability – Besides the presentation layer, where Tableau excels, I’ve noticed areas that are surprisingly difficult for a self-service BI tool. For example, there isn’t an option to visualize tables and relationships in a diagram view. For some obscure reason (I guess so it knows what to aggregate and what not to), Tableau insists on separating entities into dimensions and measures. Tableau stores data and connection definitions as external files, and maintains connections per entity. For example, I couldn’t find a way to import more tables using the same connection definition.
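
For reference, here is the classic DAX pattern for handling a many-to-many relationship in PowerPivot (a sketch assuming hypothetical Customer, CustomerAccount (bridge), Account, and Transactions tables, where Transactions relates to Account):

Total Amount M2M := CALCULATE(SUM(Transactions[Amount]), SUMMARIZE(CustomerAccount, Account[AccountKey]))

A filter on Customer reaches the bridge table, SUMMARIZE collects the visible account keys, and CALCULATE applies them as a filter on the fact table. As for dynamic data security, in Tabular it’s a one-line row filter, e.g. =Customer[SalesRepLogin] = USERNAME() (the column name is hypothetical).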

In summary, choosing between Tableau and PowerPivot for self-service BI would require careful examination of your requirements and a comparison of the pros and cons of the two tools. On the corporate BI side of things, you should definitely consider Tableau as a front end to Analysis Services if you need an alternative to Microsoft Excel and/or Power View.

Data Warehouse Physical Design Best Practices Session by Carlos Rodrigues at Atlanta BI Tonight

It looks like Carlos will deliver another slam dunk tonight, with a record number of some 63 people registered and counting… Join our Atlanta BI meeting at 6:30 to see his Data Warehouse Physical Design Best Practices presentation.

Query Options for Analytical Models and Transactional Reporting

Requesting both historical and transactional reports is a very common requirement. As they stand, neither Multidimensional nor Tabular is designed to support transactional (detail-level) reporting well with large datasets, as I discussed in my Transactional Reporting with Tabular blog post. There are two query options that might provide some relief with transactional reporting, and both have limitations.

DRILLTHROUGH QUERIES

You can use a DRILLTHROUGH query to request data at the lowest level. The syntax differs somewhat between Multidimensional and Tabular. Here is a sample drillthrough query against the Adventure Works cube:

DRILLTHROUGH MAXROWS 1000
SELECT FROM [Adventure Works]
WHERE ([Measures].[Reseller Sales Amount],[Product].[Category].[Accessories],[Date].[Calendar Year].&[2006])
RETURN
[$Product].[Product],
[$Date].[Date],
[Reseller Sales].[Reseller Sales Amount],
[Reseller Sales].[Reseller Tax Amount]

And a similar query against the Adventure Works Tabular model:

DRILLTHROUGH MAXROWS 1000
SELECT FROM [Model]
WHERE ([Measures].[Reseller Total Sales],[Product].[Category].[Category].&[Accessories],[Date].[Calendar].[Year].&[2005])
RETURN
[$Product].[Product Name],
[$Date].[Date],
[$Reseller Sales].[Sales Amount],
[$Reseller Sales].[Tax Amount]

Notice that the WHERE clause specifies the coordinate that is drilled through. The optional RETURN statement enumerates the columns you need in the output with dimensions prefixed with a dollar sign. Since Tabular doesn’t have measure groups, you treat all tables as dimensions.

Drillthrough queries are very fast. They honor data security (both dimension data security in Multidimensional and row filters in Tabular). Disallowed members simply don’t exist as far as DRILLTHROUGH is concerned. For example, attempting to request a dimension member outside the allowed set returns the following error:

Drillthrough failed because the coordinate identified by the SELECT clause is out of range.

This is the same error you will get if you request a non-existent or wrong dimension member.

DRILLTHROUGH queries have limitations:

  1. You can’t request measures from different measure groups in Multidimensional, such as Internet Sales and Reseller Sales. If you attempt to do so, you’ll get the following error:
    Errors from the SQL query module: Drillthrough columns must belong to exactly one measure group. The columns requested by the drillthrough statement are referencing multiple measure groups.
  2. The WHERE coordinate must resolve to a single coordinate. In other words, you can’t specify a set, such as {[Product].[Category].[Accessories],[Product].[Category].[Bikes]}. You can avoid the error by using SUBSELECT instead of WHERE, but you’ll find that the SUBSELECT clause doesn’t set the default member and you’ll get all products. This limitation will prevent you from using multi-select parameters in SSRS.
  3. Related to the previous item, you can’t specify FILTER conditions either, e.g. where the transaction date falls within a range of dates. Again, you can only use a tuple in the WHERE clause.
  4. You must use the Analysis Services OLE DB provider in SSRS. If you need to parameterize the query, you need to use an expression-based query statement (see the sketch after this list).
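
For illustration, an expression-based DRILLTHROUGH query in SSRS might look along these lines (a sketch; the ProductCategory parameter is hypothetical and is assumed to hold the member’s unique name, such as [Product].[Category].[Accessories]):

="DRILLTHROUGH MAXROWS 1000 SELECT FROM [Adventure Works] WHERE ([Measures].[Reseller Sales Amount]," & Parameters!ProductCategory.Value & ") RETURN [$Product].[Product], [$Date].[Date], [Reseller Sales].[Reseller Sales Amount]"

The expression concatenates the parameter value into the query string, which is the usual way to parameterize queries that the provider can’t parameterize natively.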

SQL QUERIES

You can also use SQL-like queries, as Chris Webb explains in his “Can I run SQL against an Analysis Services cube?” blog post. Here is a SQL query against the Adventure Works cube:

select
[Adventure Works].[$Customer].[City],
[Adventure Works].[Internet Sales].[Internet Sales Amount]
from [Adventure Works].[Internet Sales]
natural join [Adventure Works].[$Date]
natural join [Adventure Works].[$Customer]
where [Adventure Works].[$Date].[Calendar Year] = 'CY 2008'

SQL queries have limitations as well:

  1. You must specify SQLQueryMode=DataKeys in the connection string if you connect to Tabular (see the sample connection string after this list).
  2. SQL queries don’t work with data security (row filters) in Tabular but they appear to work with dimension data security in Multidimensional. Attempting to execute a SQL query against a Tabular model with row filters will throw the following exception:
    Tabular queries are not supported with row level security.
  3. Unlike T-SQL, SSAS SQL queries support only the equals operator. Anything else, and you’ll get the following error. This limitation precludes more complicated filtering, such as returning only the transactions within a given date range.
    Errors from the SQL query module: There is an unexpected operator in the WHERE clause. Only the Equals operator ( = ) is supported.
  4. You must use the Analysis Services OLE DB provider in SSRS. Unlike DRILLTHROUGH, you can use AND in the WHERE clause to support multi-valued parameters. However, since the Analysis Services OLE DB provider doesn’t support parameters, you must resort to using an expression-based query statement.
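
For reference, a connection string with this setting might look like the following (the server and database names are hypothetical):

Provider=MSOLAP;Data Source=localhost\TABULAR;Initial Catalog=Adventure Works Tabular;SQLQueryMode=DataKeys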

Feel free to join me in escalating this issue to Microsoft because historical and transactional reporting are both very common scenarios. Thanks to its in-memory nature and more relaxed schema requirements, Tabular is best positioned to support both in the long run, so I hope that Microsoft will make improvements in this area. For now, when SSAS doesn’t give you the performance you need, consider delegating transactional reporting to the Database Engine.

Transactional Reporting with Tabular

Scenario: We had a requirement to replace the existing implementation of transactional reporting over large data volumes. By “transactional reporting” I mean allowing the user to query individual transactions as they’re stored in the fact table. In some cases, this style of reporting requires simply reading data without aggregations. This, of course, is what an RDBMS is designed for, but in our case the requirements called for extending the reporting model with metadata about the item, such as the item barcode, alias, etc., as well as supporting fast aggregation analysis (for trend reports) and data security. In general, when implementing BI solutions, you should always have an analytical layer between the database and the presentation layer for the reasons I discussed in the “Why an Analytical Layer?” blog post.

Challenges: Because of the in-memory nature of Tabular and its more relaxed schema requirements, such as no separation of dimension and fact tables and the ability to keep text-based columns in fact tables, we opted for Tabular on top of a data warehouse with a star schema. Our initial schema included an Item dimension table (700 million rows), where the item metadata is kept, and a Transactions fact table (1.5 billion rows) that stores the item transactions. However, as we quickly found out, this design presented challenges for Tabular:

  1. Excessive processing time – It took very long to process the initial model. By default, SSMS generates a transactional processing batch that supports processing multiple tables in parallel. However, if there is an error during processing, the entire transaction is rolled back and you need to start from scratch. To avoid this, we tried scripting the batch and changing it to non-transactional (Transaction='false'; a sketch of such a batch follows this list). This option preserves the processed partitions, but it doesn’t allow tables to be processed in parallel. This is by design, to avoid unexpected interactions due to potential cross-object dependencies in the recalculation chain. Since both the Item and Transactions tables were partitioned, the net effect was that processing was serialized by table and partition when two scripts for processing Item and Transactions were executed simultaneously (partition 1 of Transactions is processed, partition 1 of Item is processed, partition 2 of Transactions is processed, partition 2 of Item is processed, and so on). To make things worse, there is currently a bug in Tabular where the server builds relationships even when Process Data is used as a processing option, which further slows down processing.
  2. Excessive memory footprint – For some reason, Tabular used a lot of memory with the two-table approach. Granted, we had to load the dataset twice (once for Item and a second time for Transactions), but it still should have taken less memory (see the memory footprint in the Solution section). The development server had 80 GB of RAM. We ran out of memory halfway through the full load and ended up with the in-memory model taking 65 GB of RAM.
  3. Excessive query time – This was the worst. When we tried Power View as a reporting tool, each time the Item table was involved in a query, the report took more than a minute to finish. The report filter would return only a few rows from the Transactions table, but when the join was made to the Item table (on the one side of the Item-Transactions relationship), the report performance would degrade significantly. As it turned out, Tabular hasn’t been tuned for such scenarios, where queries involve large dimension tables and the filter is applied on the fact table.
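
For reference, a non-transactional processing batch might look along these lines (a sketch; the database and table identifiers are hypothetical, and note that in XMLA a Tabular table is addressed as a dimension):

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" Transaction="false">
  <Process>
    <Object>
      <DatabaseID>OurModel</DatabaseID>
      <DimensionID>Transactions</DimensionID>
    </Object>
    <Type>ProcessData</Type>
  </Process>
</Batch>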

Solution: To work around the Tabular limitations, we merged the two datasets by joining the Item and Transactions tables in a SQL view (sketched after the list below). This approach resolved all the issues:

  1. Processing time was reduced tenfold because Tabular had to process only one large table.
  2. The memory footprint of the entire database with all the data was reduced to 13.5 GB.
  3. The query times went down to seconds.
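
A minimal sketch of such a merging view, assuming hypothetical table and column names:

CREATE VIEW dbo.vItemTransactions
AS
SELECT t.TransactionID, t.TransactionDate, t.Quantity, t.Amount,
       i.ItemBarcode, i.ItemAlias
FROM dbo.FactTransactions AS t
INNER JOIN dbo.DimItem AS i ON i.ItemKey = t.ItemKey;

The Tabular model then imports the view as a single table instead of two related tables.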

The only caveat is that processing incurs a hit because of the join between the two large tables, but we’d rather spend more time during processing than sacrifice query performance.

At least for now, don’t be misled by the “relational” nature of Tabular. It’s still optimized for queries that request a restricted set of columns and aggregate data. Currently, it’s not optimized for a transactional (detail-level) style of reporting that requires many columns and requests data at the lowest level. Microsoft is aware of this and will probably introduce optimizations in time.

In a future blog, I’ll discuss techniques to optimize the query performance with transactional reporting and Tabular. Many thanks to Marius Dumitru (Principal Architect on the SSAS team) for answering my questions.