Divorce Your Methodology

At the Atlanta BI meeting last night, there was a question from the audience about differences between Inmon and Kimball and which methodology should be followed when implementing a data warehouse. I’ll recapture my thoughts and the feedback I shared.

As BI practitioners, most of us use methodologies and it’s easy to fall in love with a specific methodology. But sometimes methodologies conflict each other. So, don’t feel very strongly about methodologies. Instead, study them and try to synthetize their best. The best methodology is the one that delivers a business solution in the most practical and simple way. Back to Inmon vs Kimball, I have deep respect for both of them. They both contributed a lot to data warehousing that forms the backbone of modern BI. Both of these two methodologies aim to consolidate data and promote a single version of the truth. Both of them are “pure” database-focused and vendor-neutral methodologies for designing data structures. But they also differ in significant ways. This table summarizes the high level differences between the two methodologies.

InmonKimball
APPROACHTop-down
Data warehouse first, data marts later
Bottom-up
Data marts first, data warehouse later
SCHEMANormalized (3NF) schemaDenormalized (star) schema
HISTORYAll history needs to be capturedDepends on business requirements
Type 1 vs Type 2 dimensions

 

So, which one to follow? My answer is that there is a place for both. Consider the following BI architectural view.

012914_0241_DivorceYour1

Data comes from veriety of data sources. Some data, such as Products, Customers, Organizations, represents master data that should be ideally maintained in a separate repository, e.g. in Master Data Services. However, most of the source data is not master data and must be staged before it’s imported in the data warehouse. Instead of a transient staging database whose data is truncated with each ETL run, consider an ODS-style staging database that maintains historical changes.

Start_DateEnd_DateStoreProductDeleted_Flag
1/1/20105/1/2010AtlantaMountain Bike 1
5/2/20103/8/2012AtlantaMountain Bike 2
3/9/201212/31/9999NorcrossMountain Bike 2

The Start_Date and End_Date columns are used to record the lifespan of each record and create row new versions each time the source row is changed. The example shows three changes that a given product has undergone. This design offers two main benefits:

1.    It maintains a history of all changes that were made to all columns to all tables. Typically, OLTP systems don’t keep track of changes so the staging database can be used to record changes.

2.    It maintains a full backup of the data. If a data warehouse needs to be reloaded, its history can be recreated from the staging database.

This ODS design is effectively your Inmon methodology in practice. For the data warehouse, I’d go with Kimball dimensional modelling. Dimensional modelling is a practical design technique whose goal is to produce a simple schema that is optimized for reporting. As far as data marts, I’m not so excited about moving data in or out of the data warehouse. In most cases, the most practical approach would be to implement a single data warehouse database and extend it as more subject areas come onboard. However, large organizations might benefit from data marts. For example, a large organization might have an enterprise data warehouse but for whatever reasons (usually IT not having enough resources), it might be difficult or not practical to extend it with a new subject areas. Then, this department might spin off its own data mart, such as on a separate database server (even from a different vendor, e.g. DW on Oracle and DM on SQL Server).

Ideally, the data mart should be able to reuse some of conformant dimensions from the data warehouse instead of implementing them anew. In reality, though, the enterprise bus could remain a wishful thinking. Having been left on his own devices, that department would probably need to implement the dimensions from the data they work with. For example, if this is an HR data mart, they would probably source an Organization dimension from PeopleSoft which is where their core data might come from.

With the risk of repeating myself, I want to reemphasize the role of the semantic layer which plays a critical role in every BI architecture. If you are successful implementing an enterprise bus consisting of a data warehouse and data marts (hub and spokes architecture), the semantic layer can provide a unified view the combines these data structures. For more information about the semantic layer benefits, refer to my newsletter “Why Semantic Layer“.

Is SharePoint Overkill for BI?

As you know, Microsoft BI is tightly interwoven with Excel on the desktop and SharePoint on the server. Most complaints about SharePoint fall in two categories:

  • Complexity – SharePoint is like the ocean. The further you go, the deeper it gets. Troubleshooting is no fun. You know it and I know it.
  • Cost – Most BI features require SharePoint Server Enterprise. Besides the product cost, each user requires a client access (CAL) license (about $100 per user). With larger companies, cost quickly adds up when you roll BI to the masses. You can use the Microsoft License Advisor to estimate your investment but be sure to include additional cost for Windows and SQL servers.

So, is SharePoint overkill for BI? My answer is it depends. Let’s take a look at the graph that shows the correlation between the organization size and SharePoint features.

 

012414_0229_IsSharePoin1

A small organization might not need SharePoint at all. It’s likely that all BI needs of such a company will be met by SQL Server and Excel. As the company size increases, however, SharePoint might look attractive for what’s designed for. This company might decide to adopt SharePoint Foundation (freely available) for document storage and collaboration. Unfortunately, out of all Microsoft BI tools, only Reporting Services works on SharePoint Foundation (except Power View and Alerts), as explained in Microsoft Books Online.

Supported FeaturesMinimum SharePoint Edition Required
General Reporting Services report viewing and feature integration with SharePointSharePoint Foundation
Power View and AlertsSharePoint Server Enterprise

As the company continues to grow, its appetite for more BI features increases. This is the middle zone in the S curve. At this point, the company might be interested in team BI, including Power View and Power Pivot for SharePoint. At the same time, the company might not be interested in the SharePoint enterprise features but it’s out of luck. The BI features require SharePoint Enterprise. This is where it could be useful if Microsoft introduces a SharePoint BI Edition with a more attractive pricing model, such as an edition that has all BI features, doesn’t have the rest of the SharePoint enterprise features and it doesn’t require CAL licensing. SharePoint Online can mitigate some of the cost concerns but as it stands it’s limited to self-service BI (Power Pivot and Power View) only.

Moving to the upper mid-size and enterprise sector, you’re likely to be interested in more and more SharePoint enterprise features. For example, we’re currently working on a set of best practice documents for a Fortune 50 organization that’s rolling out Microsoft BI across the enterprise. They’re interested in data governance, taxonomy, retention, content approval, manageability, security, scalability, risk management, auditing, and others. You’ll be hard pressed to find such features in popular third-party BI offerings. And, one great thing about SharePoint is that it doesn’t discriminate among documents. As far as it’s concerned, a BI artifact, such as an SSRS report or Excel workbook, is just a document…all features just work. So, SharePoint Enterprise might be an attractive platform for larger companies although its CAL licensing might be difficult to swallow.

This table shows an expanded list of the Microsoft BI features but, as I mentioned, no further investigation is necessary. All BI features except general SSRS reporting requires SharePoint Enterprise.

"Bus Matrix – the Foundation of your Dimensional Data Model" Atlanta BI Presentation

Our next Atlanta Microsoft BI meeting will be on Monday, January 27th. The main presentation is “Bus Matrix – the Foundation of your Dimensional Data Model” the speaker is Bill Anton. The meeting will be sponsored by TEK Systems. I hope you can make it.

The Bus Matrix is the cornerstone of a successful Dimensional Data Modeling strategy. It serves many purposes: from communicating requirements, capabilities, and expectations with the business users down to the prioritization and delegation of tasks across the development team. Join me in this session and learn what a Bus Matrix is, why it is the single most important document in your Data Warehouse project, and what can go wrong without it. We’ll also cover several approaches for creating and maintaining the Bus Matrix.

Bill Anton is an independent consultant whose primary focus is designing and developing Data Warehouses and Business Intelligence solutions using the Microsoft BI stack. When he’s not working with clients to solve their data-related challenges, he can usually be found answering questions on the MSDN forums, attending PASS meetings, or writing blog posts over at byoBI.com.

Auto-generating SCHEMA.INI

Power Pivot (and Tabular) uses the Access Connectivity Engine (ACE) provider to import from text files. This provider was originally designed for Microsoft Access databases and it’s limited to import files no larger than 2 GB (or 4 GB with the Office cumulative update from April 2012) as explained here. The Power Pivot Table Import Wizard allows you to import from delimited files and only a subset of delimiters are available for you to choose from.

011814_0250_Autogenerat1

However, the ACE provider supports a SCHEMA.INI file that can describe other formats, such as fixed-length files, or delimited files that use other delimiters, such as a double-pipe character “||”. If a SCHEMA.INI file exists in the same folder where the source file is located, Power Pivot will use the file.

Generating the schema file can be tedious, especially if the source file has many columns. However, the ODBC Data Source Administrator tool can auto-generate the file for you.

  1. Open ODBC Data Source Administrator (32 or 64 bit) and click Add.
  2. In the Create New Data Source, select the Microsoft Text Driver.

011814_0250_Autogenerat3

  1. Click Finish. In the ODBC Text Setup, specify the folder where the source file that you want to import from is located.
  2. Click the Define Format button.
  3. The tool is capable of interfering the format by clicking the Guess button.

 

  1. Once you click OK, the tool generates the schema.ini file in the same folder where the source file is located.

For some obscure reason that this support article refers to, you might get an error “Failed to save table attributes of (null) into (null)” but the schema.ini file will be still generated.

Pick List in SSRS

A customer is moving from Impromptu to SSRS. One of the SSRS missing features that was very important for this customer was the Imprompu pick list that allows users to copy a list of parameter values, such as a list entered in Notepad, to the parameter prompt. For example, the user might get a list of account codes via e-mail and this list may vary from day to day. Instead of hardcoding the list in the query WHERE clause or in the parameter default values, the user wants an easy way to copy and paste the parameter values.

Actually, SSRS supports this scenario just fine and without any custom coding on your part. You just need to add a multivalued parameter of type Text and code your SQL query to use an IN clause as usual. The attached report shows how this is done. Just copy the codes below and paste in the parameter to test the report.

AR-5381

BA-8327

BE-2349

BE-2908

[View:https://prologika.com/CS/cfs-file.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/blog/2313.Pick-List.rdl:550:0]

 

Teo’s Best and Worst of Microsoft BI in 2013

In the spirit of the last day of 2013, I’m sipping eggnog and contemplating on BI in 2013.

BEST

  1. Microsoft leads the Gartner’s 2013 Business Intelligence Magic Quadrant.
  2. Microsoft leads the Gartner’s 2013 Data Warehouse Magic Quadrant.
  3. Power View connectivity to Multidimensional (DAXMD) released in CU4 of SQL Server 2012 SP1.
  4. SQL Server 2012 Parallel Data Warehouse appliance released with Hadoop support at very attractive pricing.
  5. Microsoft is the first vendor to offer natural queries (unfortunately only in the cloud at this point).

WORST

  1. Microsoft not doing enough to compete effectively on the presentation/visualization front against third party vendors. I’ll throw poor mobile support into this bucket too.
  2. Slow pace of BI enhancements across product groups. Example: Excel’s still lacking Power View support for Multidimensional (DAXMD was released in May).
  3. Microsoft pushing cloud BI and deciding to prioritize cloud enhancements, such as natural query, over on premises.
  4. Microsoft’s ongoing preoccupation with Tabular (it’s difficult to find a good case for it as it’s caught now in the middle between columnstore indexes and Multidimensional) and self-service BI (focus should be 80% on organizational BI and 20% on personal).
  5. Still no metadata lineage and change impact. What happened to Project Barcelona?

Happy New Year!