Gartner’s 2017 BI and Data Analytics Magic Quadrant Shows Microsoft Leading

Power BI is enjoying tremendous momentum and unprecedented popularity. Just within this month, your humble correspondent has been teaching Power BI four times in a row. It looks like industry observers are taking notice of this momentum. As Kamal Hathi (General Manager, Microsoft BI) announced, the newly released Gartner Magic Quadrant for Business Intelligence and Data Analytics gave Microsoft a very high score. The image below shows Microsoft’s lift between last year and this year in the Gartner Magic Quadrant.

I’m not surprised about the Qlik drop given that the company was sold. What’s still surprising to me is that Gartner ranked Tableau and Microsoft almost the same on the ability to execute. Although the report is not out yet, judging by the stub, Gartner used the same 14 criteria as last year, but added one more which is unknown at this point (probably real-time, where Microsoft can score very high as well). Here are my comments on where Microsoft stands on these 14 criteria. You might also find my two-part blog about Tableau vs. Microsoft useful if you are tasked with comparing vendors.

Capability | Teo’s Rank for MS BI | Comments

Infrastructure
BI Platform Administration: Capabilities that enable scaling the platform, optimizing performance and ensuring high availability and disaster recovery. | High | On premises or cloud, I think the MS BI Platform is second to none.
Cloud BI: Platform-as-a-service and analytic-application-as-a-service capabilities for building, deploying and managing analytics and analytic applications in the cloud, based on data both in the cloud and on-premises. | High | Power BI supports both pure cloud and hybrid architectures.
Security and User Administration: Capabilities that enable platform security, administering users, and auditing platform access and utilization. | Medium | More work is required to support external users in Power BI, Power BI Embedded, and SSRS.
Data Source Connectivity: Capabilities that allow users to connect to the structured and unstructured data contained within various types of storage platforms, both on-premises and in the cloud. | High | As of this time, Power BI supports close to 70 connectors to let you connect to cloud and on-premises data sources. No scripting required.

Data Management
Governance and Metadata Management: Tools for enabling users to share the same systems-of-record semantic model and metadata. These should provide a robust and centralized way for administrators to search, capture, store, reuse and publish metadata objects, such as dimensions, hierarchies, measures, performance metrics/key performance indicators (KPIs) and report layout objects, parameters and so on. Administrators should have the ability to promote a business-user-defined data model to a system-of-record metadata object. | Medium | Power BI has done a good job of providing auditing and admin oversight, but more work is required for proactive monitoring and improving its data governance capabilities.
Self-Contained Extraction, Transformation and Loading (ETL) and Data Storage: Platform capabilities for accessing, integrating, transforming and loading data into a self-contained storage layer, with the ability to index data and manage data loads and refresh scheduling. | Medium | SSIS is the most popular on-premises ETL tool. More work is required to bring similar capabilities to the cloud (I think Azure Data Factory is a step backwards).
Self-Service Data Preparation: The drag-and-drop, user-driven data combination of different sources, and the creation of analytic models such as user-defined measures, sets, groups and hierarchies. Advanced capabilities include semantic autodiscovery, intelligent joins, intelligent profiling, hierarchy generation, data lineage and data blending on varied data sources, including multistructured data. | High | Power BI Desktop and Excel have a fantastic query editor (originated from Power Query) that scores big with business users. Tableau doesn’t have such native capabilities. Power BI and Excel have best-in-class self-service modeling capabilities (much better than Tableau). Azure Data Catalog can be used for dataset autodiscovery.

Analysis and Content Creation
Embedded Advanced Analytics: Enables users to easily access advanced analytics capabilities that are self-contained within the platform itself or available through the import and integration of externally developed models. | High | Not sure what is meant here by “advanced analytics capabilities”. Power BI supports integration with R, Azure Machine Learning, clustering, forecasting, binning, but I might be missing something.
Analytic Dashboards: The ability to create highly interactive dashboards and content, with visual exploration and embedded advanced and geospatial analytics, to be consumed by others. | High | “Highly interactive dashboards and content” is what Power BI is all about.
Interactive Visual Exploration: Enables the exploration of data via the manipulation of chart images, with the color, brightness, size, shape and motion of visual objects representing aspects of the dataset being analyzed. This includes an array of visualization options that go beyond those of pie, bar and line charts, to include heat and tree maps, geographic maps, scatter plots and other special-purpose visuals. These tools enable users to analyze the data by interacting directly with a visual representation of it. | High | According to Gartner’s definition, Power BI should score high, but more work is required on the visualization side of things, such as the ability to drill through a chart point as we can do in SSRS.
Mobile Exploration and Authoring: Enables organizations to develop and deliver content to mobile devices in a publishing and/or interactive mode, and takes advantage of mobile devices’ native capabilities, such as touchscreen, camera, location awareness and natural-language query. | High | Native apps for iOS, Android and Windows surface both Power BI and SSRS reports.

Sharing of Findings
Embedding Analytic Content: Capabilities including a software developer’s kit with APIs and support for open standards for creating and modifying analytic content, visualizations and applications, embedding them into a business process, and/or an application or portal. These capabilities can reside outside the application (reusing the analytic infrastructure), but must be easily and seamlessly accessible from inside the application without forcing users to switch between systems. The capabilities for integrating BI and analytics with the application architecture will enable users to choose where in the business process the analytics should be embedded. | High | An Azure cloud service, Power BI Embedded allows you to do this with an appealing cost-effective licensing model.
Publishing Analytic Content: Capabilities that allow users to publish, deploy and operationalize analytic content through various output types and distribution methods, with support for content search, storytelling, scheduling and alerts. | Medium | Power BI supports subscriptions and data alerts, but we can do better, such as allowing an admin to subscribe other users. “Storytelling” can mean different things, but I thought the integration with Narrative Science could fall into this category.
Collaboration and Social BI: Enables users to share and discuss information, analysis, analytic content and decisions via discussion threads, chat and annotations. | High | Power BI supports this with workspaces and Office 365 unified groups.

Of course, there are many competing definitions of what constitutes a BI and Analytics platform. Again, it looks to me that Gartner has predominantly focused on the self-service BI aspect of it (even there Microsoft should have scored higher) and ignored the SQL Server BI features and all the cloud BI-related products (Azure SQL Database, SQL Data Warehouse, Azure ML, Data Catalog, HDInsight, Stream Analytics). If we take them into consideration, where will that dot be?

Types of Power BI Real-time Datasets

Everyone wants real-time BI, even when it doesn’t have to be really “real time”. Today Microsoft announced General Availability of Power BI Real-Time Streaming Datasets. There are actually three types of Power BI real-time datasets, as mentioned in the documentation.

  • Push – Power BI permanently stores the data, enabling historic analysis and report creation atop the dataset. Behind the scenes, Power BI provisions an Azure SQL instance when the dataset is created. New data is pushed into SQL. Power BI then connects to that dataset via DirectQuery. Query Refresh (sending new queries to Azure SQL to update dashboard visuals) occurs whenever data is pushed in. When you create the dataset programmatically, you can specify a retention policy (defaultRetentionPolicy setting). When defaultRetentionPolicy is set to None, the dataset accumulates data up to the maximum allowed Power BI limit (currently 1 GB). When set to basicFIFO, the dataset holds up to 200,000 rows, and after that older rows are pushed out as new ones come in.
  • Streaming – Power BI stores the data only in a transient cache. This means report creation and historic analysis are disabled, but in return there is consistently lower latency between when the data is pushed in and when the visuals update. The data flows into a Redis cache, and the dashboard visuals pull data directly from that Redis cache. Therefore, consider streaming datasets when you want the lowest latency (we are talking about milliseconds here), but keep in mind that you are limited to the few pre-defined visualizations supported by the Power BI dashboard real-time tiles. You can’t create custom reports.
  • Hybrid – Hybrid datasets send data to both the “push” and “streaming” endpoints, thereby getting the benefits of both at the expense of duplicate storage.

Unless you use Azure Stream Analytics (currently, it supports only push datasets) or PubNub (supports streaming datasets), you must create the dataset programmatically using the Power BI REST APIs. Currently, you can’t use Power BI Desktop to create real-time datasets.
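To make the programmatic route more concrete, here is a minimal Python sketch that only builds the request for creating a real-time dataset and sends nothing. It assumes the create-dataset endpoint lives under https://api.powerbi.com/v1.0/myorg/datasets, that defaultRetentionPolicy is passed as a query-string parameter, and that the dataset type is chosen with a defaultMode property whose values are Push, Streaming and PushStreaming (for hybrid). The table name RealTimeData and the helper name are made up for illustration, so check the current REST API reference before relying on any of this.

```python
import json
from urllib.parse import urlencode

# Assumed root of the Power BI REST API; verify against the official reference.
API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def build_create_dataset_request(name, columns, mode="Push", retention="basicFIFO"):
    """Return (url, json_body) for a create-dataset call. Nothing is sent."""
    if mode not in ("Push", "Streaming", "PushStreaming"):
        raise ValueError("mode must be Push, Streaming or PushStreaming")
    if retention not in ("None", "basicFIFO"):
        raise ValueError("retention must be None or basicFIFO")
    body = {
        "name": name,
        "defaultMode": mode,
        "tables": [
            {
                "name": "RealTimeData",  # hypothetical table name
                "columns": [{"name": n, "dataType": t} for n, t in columns],
            }
        ],
    }
    url = f"{API_ROOT}/datasets?" + urlencode({"defaultRetentionPolicy": retention})
    return url, json.dumps(body)

url, body = build_create_dataset_request(
    "Sensor readings",
    [("ReadingTime", "DateTime"), ("Temperature", "Double")],
)
```

You would then POST the body to the URL with an Azure AD bearer token in the Authorization header. The sketch leaves that step out because it depends on how your application authenticates.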

Atlanta MS BI Group Meeting on January 30th

MS BI fans, join me for the next Atlanta MS BI and Power BI Group meeting on Monday, January 30th at 6:30 PM. One of our most experienced consultants, Neal Waterstreet, will share his real-life experience in master data management with SQL Server MDS. I’ll update you on the latest with Power BI and SQL Server 2017. Prologika will sponsor the meeting.

Rate this meeting http://aka.ms/PUGSurvey, PUG ID: 104
Presentation: Master Data Management with SQL Server 2016 MDS
Level: Intermediate
Date: January 30th, 2017
Time: 6:30 – 8:30 PM ET
Place: South Terraces Building (Auditorium Room), 115 Perimeter Center Place, Atlanta, GA 30346

Overview: In this presentation we’ll first discuss the position Master Data Management plays in an organization’s overall data strategy. We’ll review the key concepts, different roles and responsibilities that members of the team typically play, potential risks, and best practices to help you get your organization moving forward with MDM. We’ll then take a look at some of the features of MDS 2016 such as the different ways of processing data, security improvements, Changesets and Entity Sync that make it an excellent tool for MDM.
Speaker: Neal Waterstreet is a BI Architect/Consultant with Prologika. He has more than 20 years of industry experience. Neal is skilled in the entire BI spectrum, including dimensional modeling, ETL design and development using Integration Services (SSIS), designing and developing multidimensional cubes and Tabular models using Analysis Services (SSAS) and Master Data Management using Master Data Services (MDS). He’s also involved with the database community and is the co-founder and co-leader of the PASS Healthcare Virtual Chapter and the Atlanta Modern Excel User Group.
Sponsor: Prologika is one of the most trusted names in Data Analytics. Our clients, from small businesses to Fortune 100 enterprises, derive tremendous value from our services. Our mission is to help organizations make sense of data by applying the latest technologies for descriptive and predictive analytics and get actionable insights. Your organization will spend less time mining for information and be better equipped to make sound business decisions.
Prototypes with Pizza: “Power BI subscriptions” and “Update on deploying Power BI reports to SSRS” by Teo Lachev

Unblocking the On-premises Data Gateway

Scenario: You have configured the Power BI on-premises data gateway for centralized data access and verified that its data sources test just fine. DirectQuery connections work. However, when you go to Power BI Service and attempt to schedule a data refresh for a dataset, you might find that the data gateway is disabled.

Solution: The most common reasons for Power BI to disable the on-premises data gateway for refresh are:

  1. Unlike the personal gateway, the on-premises data gateway requires you to register data sources. You must go to the gateway properties and create data sources for all data sources used in your Power BI Desktop file. Unfortunately, as it stands Power BI doesn’t allow you to select which data sources in the Power BI Desktop file will be refreshed and which ones don’t require a refresh. It’s an all-or-nothing proposition. So, if one data source is not compatible or can’t be refreshed, the gateway will be disabled.
  2. The connection strings in data sources in the Power BI Desktop file might differ from the settings of the data sources you registered in the on-premises gateway. For example, in Power BI Desktop you might have imported data from a local Excel file. Then, you might have moved the file to a network share and established a gateway data source to point to the network share. Because the connection strings differ, Power BI Service won’t find an on-premises gateway to serve the Excel file and it will disable the gateway for refresh. So, triple-verify that the data sources match.
  3. You might have manually added a table to your model and entered some data using the Power BI Desktop “Enter Data” feature. Because custom tables can’t refresh, Power BI disables the gateway.
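The first two checks boil down to comparing two lists: the data sources the Power BI Desktop file uses and the data sources registered on the gateway. The sketch below is purely illustrative. Power BI doesn’t expose this function, the (kind, path) tuples are a made-up representation that you would fill in by hand, and the case-insensitive comparison is my assumption.

```python
def unmatched_sources(model_sources, gateway_sources):
    """Return the model data sources that have no matching gateway data source.

    Both arguments are lists of (kind, path) tuples, a hypothetical
    representation of a data source. Matching ignores letter case.
    """
    registered = {(kind.lower(), path.lower()) for kind, path in gateway_sources}
    return [
        (kind, path)
        for kind, path in model_sources
        if (kind.lower(), path.lower()) not in registered
    ]

# The Excel file was imported from a local path but registered as a network share.
model = [("File", r"C:\Data\Sales.xlsx"), ("Sql", "srv01;SalesDW")]
gateway = [("File", r"\\share\Data\Sales.xlsx"), ("Sql", "SRV01;SalesDW")]

problems = unmatched_sources(model, gateway)
# Because refresh is all or nothing, a single unmatched source is enough
# for the gateway to be disabled for this dataset.
gateway_usable = not problems
```

In this example the local Excel path is reported as unmatched, which mirrors the second scenario in the list above.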


Customer Success Case – ZynBit

One of our customers, ZynBit, made the Power BI blog today! Initially, ZynBit was considering Tableau but abandoned it in favor of Power BI because of Power BI’s superior data modeling capabilities and the cost-effective licensing model of Power BI Embedded. Prologika helped ZynBit transition their solution to Power BI, including designing the data model and integrating reports with Power BI Embedded. Read our case study here.

Monitoring Progress of UPDATE

How do you monitor the progress of an UPDATE statement sent to SQL Server? Unfortunately, SQL Server currently doesn’t support an option to monitor the progress of DML operations. In the case of UPDATE against large tables, it might be much faster to recreate the table, e.g. with SELECT … INTO. But suppose that INSERT could take a very long time too and you prefer to update the data instead. Here is how to “monitor” the progress while the UPDATE statement is doing its job.

Suppose you are updating the entire table and it has 138,145,625 rows (consider doing the update in batches to avoid running out of log space). Let’s say the UPDATE statement changes the RowStartDate column to the first day of the month:

UPDATE bi.FactAccountSnapshot
SET RowStartDate = DATEADD(MONTH, DATEDIFF(MONTH, 0, RowStartDate), 0);

Use these statements to monitor the remaining work by using a reverse WHERE clause, that is, a predicate that matches only the rows the UPDATE hasn’t changed yet. Make sure to execute both statements (SET TRAN and SELECT) together so that the SELECT statement can read the uncommitted changes.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT CAST(1 - CAST(COUNT(*) AS DECIMAL) / 138145625 AS DECIMAL(5, 2)) AS PercentComplete, COUNT(*) AS RowsRemaining
FROM bi.FactAccountSnapshot
WHERE RowStartDate <> DATEADD(MONTH, DATEDIFF(MONTH, 0, RowStartDate), 0);
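If you want to try the batching and reverse WHERE ideas without touching a real server, here is a small self-contained Python sketch. It is not SQL Server code: it swaps in an in-memory SQLite database so that it runs anywhere, uses SQLite’s date(..., 'start of month') in place of the DATEADD/DATEDIFF expression, and works on a ten-row stand-in table. Because every batch is committed, it doesn’t need a dirty read to see the progress.

```python
import sqlite3

# Reverse WHERE clause: matches only the rows the update has not changed yet.
REMAINING_SQL = (
    "SELECT COUNT(*) FROM FactAccountSnapshot "
    "WHERE RowStartDate <> date(RowStartDate, 'start of month')"
)

# Update at most N not-yet-updated rows per statement to keep transactions small.
BATCH_UPDATE_SQL = (
    "UPDATE FactAccountSnapshot "
    "SET RowStartDate = date(RowStartDate, 'start of month') "
    "WHERE rowid IN (SELECT rowid FROM FactAccountSnapshot "
    "WHERE RowStartDate <> date(RowStartDate, 'start of month') LIMIT ?)"
)

def rows_remaining(conn):
    return conn.execute(REMAINING_SQL).fetchone()[0]

def percent_complete(remaining, total):
    return round(1 - remaining / total, 2)

def update_in_batches(conn, batch_size):
    """Run the update batch by batch, yielding the rows remaining after each commit."""
    while True:
        cursor = conn.execute(BATCH_UPDATE_SQL, (batch_size,))
        conn.commit()
        if cursor.rowcount == 0:
            break
        yield rows_remaining(conn)

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FactAccountSnapshot (RowStartDate TEXT)")
    rows = [(f"2017-01-{day:02d}",) for day in range(2, 12)]  # 10 mid-month dates
    conn.executemany("INSERT INTO FactAccountSnapshot VALUES (?)", rows)
    conn.commit()
    total = len(rows)
    return [(left, percent_complete(left, total)) for left in update_in_batches(conn, 4)]
```

Calling demo() returns one (RowsRemaining, PercentComplete) pair per batch. On SQL Server the same pattern is usually written as a loop around UPDATE TOP (n) with the reverse predicate in the WHERE clause.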

What about INSERT? If you know how many rows you are inserting, you can simply check the current count of the inserted rows by using the sp_spaceused stored procedure:
sp_spaceused 'tableName';

Make BI Great Again!

…with the second edition of my “Applied Microsoft Power BI” book. After seven books and starting from scratch every time, I finally got to write a revision! Thoroughly revised to reflect the current state of Power BI, it adds more than 20% new content, and probably that much content was rewritten to keep up with the ever-changing world of Power BI. Because I had to draw a line somewhere, Applied Microsoft Power BI (2nd Edition) covers all features that were released by early January 2017 (including subscriptions). As with my previous books, I’m committed to helping my readers with book-related questions and welcome all feedback on the book discussion forum on the book page. While you are there, feel free to check out the book resources (sample chapter, front matter, and more). Consider also following my blog at http://prologika.com/blog and subscribing to my newsletter at http://prologika.com to stay up to date on the latest in Power BI.

  • Buy the paper copy from Amazon
  • Buy the Kindle ebook from Amazon
  • Other popular channels in 2-3 weeks

Power BI Subscriptions

Today Microsoft released a highly anticipated Power BI feature – subscribed report delivery. Similar to SSRS individual subscriptions, users can go to a Power BI report and subscribe to one or more of its pages to receive a snapshot of the page on a scheduled basis. The following scenarios are possible depending on the report data source:

  • Imported datasets – the subscription follows the dataset refresh schedule. You’ll get an email every time the scheduled refresh happens, so long as you haven’t gotten an email in the last 24 hours.
  • DirectQuery datasets – Power BI checks the data source every 15 minutes. You’ll get an email as soon as the next check happens, provided that you haven’t gotten an email in the last 24 hours (if Daily is selected), or in the last seven days (if Weekly is selected).
  • Live connection to SSAS – Power BI checks the data source every 15 minutes and it’s capable of detecting if the data has changed. You’ll get an email only if the data has changed and you haven’t gotten an email in the last 24 hours.
  • Connected Excel reports – Power BI checks the data source every hour. You’ll get an email only if the data has changed and you haven’t gotten an email in the last 24 hours.
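The four rules above are easier to compare when written as one decision function. This Python sketch is only my reading of the rules as listed, not how the Power BI service is implemented, and the source labels (Imported, DirectQuery, LiveSSAS, ConnectedExcel) are names I made up for the example.

```python
from datetime import datetime, timedelta

# How often Power BI looks at the source, per the list above. Imported datasets
# have no polling interval because they follow the dataset refresh schedule.
CHECK_INTERVAL = {
    "DirectQuery": timedelta(minutes=15),
    "LiveSSAS": timedelta(minutes=15),
    "ConnectedExcel": timedelta(hours=1),
}

def should_send(source, now, last_sent, data_changed=True, frequency="Daily"):
    """Decide whether a subscription email goes out at a refresh or check.

    last_sent is the time of the previous email, or None if none was sent yet.
    """
    if source == "DirectQuery" and frequency == "Weekly":
        quiet_period = timedelta(days=7)
    else:
        quiet_period = timedelta(hours=24)
    if last_sent is not None and now - last_sent < quiet_period:
        return False  # an email already went out within the quiet period
    if source in ("LiveSSAS", "ConnectedExcel"):
        return data_changed  # these sources also require a detected data change
    return True

now = datetime(2017, 1, 16, 9, 0)
two_days_ago = now - timedelta(days=2)
daily = should_send("DirectQuery", now, two_days_ago)
weekly = should_send("DirectQuery", now, two_days_ago, frequency="Weekly")
```

Here daily is True and weekly is False, because two days is longer than the 24-hour window but shorter than the seven-day one.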


Power BI subscriptions have these limitations:

  • The only export option is screenshot. You can’t receive the page exported to PowerPoint, for example.
  • Users can create individual subscriptions only. You can’t subscribe other users as you can do with Reporting Services data-driven subscriptions.
  • The Power BI admin can’t see or manage subscriptions across the tenant.

Prologika Newsletter Winter 2016

Designing an Operational Data Store (ODS)


I hope you’re enjoying the holiday season. I wish you all the best in 2017! The subject of this newsletter came from a Planning and Strategy assessment for a large organization. Before I get to it and speaking of planning, don’t forget to use your Microsoft planning days as they will expire at the end of your fiscal year. This is free money that Microsoft gives you to engage Microsoft Gold partners, such as Prologika, to help you plan your SQL Server and BI initiatives. Learn how the process works here.


Just like a data warehouse, Operational Data Store (ODS) can mean different things to different people. Do you remember the time when ODS and DW were conflicting methodologies and each one claimed to be superior to the other? Since then the scholars buried the hatchet and reached a consensus that you need both. I agree.

To me, ODS is nothing more than a staging database on steroids that sits between the source systems and DW in the BI architectural stack.

What’s Operational Data Store?

According to Wikipedia “an operational data store (or “ODS”) is a database designed to integrate data from multiple sources for additional operations on the data…The general purpose of an ODS is to integrate data from disparate source systems in a single structure, using data integration technologies like data virtualization, data federation, or extract, transform, and load. This will allow operational access to the data for operational reporting, master data or reference data management. An ODS is not a replacement or substitute for a data warehouse but in turn could become a source.”

OK, this is a good starting point. See also the “Operational Data Store (ODS) Defined” blog by James Serra. But how do you design an ODS? In general, I’ve seen two implementation patterns, but the design approach you take really depends on how you plan to use the data in the ODS and what downstream systems would need that data.

One to One Pull

ODS is typically implemented as a 1:1 data pull from the source systems, where ETL stages all source tables required for operational reporting and downstream systems, such as loading the data warehouse. ETL typically runs daily but it could run more often to meet low-latency reporting needs. The ETL process is typically just Extract and Load (it doesn’t do any transformations), except for keeping a history of changes (more on this in a moment). This results in a highly normalized schema that’s the same as the original source schema. Then, when data is loaded in DW, it’s denormalized to conform to the star schema. Let’s summarize the pros and cons of the One to One Pull design pattern.

Aspect | Pros | Cons
Table schema | Highly normalized and identical to the source system | The number of tables increases
Operational reporting | Users can query the source data as it’s stored in the original source. This offloads reporting from the source systems | No consolidated reporting if multiple source systems process the same information, e.g. multiple systems to process claims
Changes to source schema | Source schema is preserved | Additional ETL is required to transform to star schema
ETL | Extraction and load from source systems (no transformations) | As source systems change, ETL needs to change

Common Format

This design is preferred when the same business data is sourced from multiple source systems, or when the source systems might change or be replaced over time. For example, an insurance company might have three systems to process claims. Instead of ending up with three sets of tables (one for each source system), the ODS schema is standardized and the feeds from the source systems are loaded into a shared table. For example, a common Claim table stores claim “feeds” from the three systems. As long as the source endpoint (table, view, or stored procedure) returns the data according to an agreed “contract” for the feed, ODS is abstracted from source system changes. This design is much less normalized. In fact, for the most part it should mimic the DW schema so that DW tables can piggyback on the ODS tables with little or no ETL.

Aspect | Pros | Cons
Table schema | Denormalized and suitable for reporting | The original schema is lost
Operational reporting | Relevant information is consolidated and stored in one table | Schema is denormalized and reports might not reflect how the data is stored in the source systems
Schema changes to source systems | As long as the source endpoints adhere to the contract, ODS is abstracted from schema changes | A design contract needs to be prepared and source systems need to provide the data in the agreed format
ETL | Less, or even no ETL to transform data from ODS to DW | ETL needs to follow the contract specification so upfront design effort is required
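To illustrate what a feed “contract” could look like in practice, here is a minimal Python sketch of loading claim feeds from several source systems into one shared Claim table. The column names in the contract and the load_feed helper are hypothetical, and a real implementation would live in your ETL tool rather than in Python lists.

```python
# Hypothetical contract: every claim feed must supply these columns.
CLAIM_CONTRACT = ("ClaimID", "ClaimDate", "ClaimAmount")

def load_feed(source_id, feed_rows, claim_table):
    """Append one source system's feed to the shared Claim table.

    Rows that break the contract are rejected. Extra source columns are
    dropped so the ODS schema stays the same whatever the source sends.
    """
    for row in feed_rows:
        missing = [column for column in CLAIM_CONTRACT if column not in row]
        if missing:
            raise ValueError(f"Source {source_id} feed is missing {missing}")
        common_row = {"SourceID": source_id}
        common_row.update({column: row[column] for column in CLAIM_CONTRACT})
        claim_table.append(common_row)
    return claim_table

claims = []
load_feed(1, [{"ClaimID": "A-1", "ClaimDate": "2016-12-01", "ClaimAmount": 250.0}], claims)
load_feed(2, [{"ClaimID": "77", "ClaimDate": "2016-12-02", "ClaimAmount": 90.0,
               "LegacyCode": "X"}], claims)
```

The SourceID column plays the same role as in the Policy example later in this newsletter: it records which system each row came from, so a replaced source system only needs a new feed that honors the contract.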

Further Recommendations

Regardless of which design pattern you choose, here are some additional recommendations to make the most of your ODS:

  • Store data at its most atomic level – No aggregations and summaries. Your DW would need the data at its lowest level anyway.
  • Keep all the historical data or as long as required by your retention policy – This is great for auditing and allows you to reload the DW from ODS since it’s unlikely that source systems will keep historical data.
  • Apply minimum to no ETL transformations in ODS – You would want the staged data to keep the same parity with the source data so that you can apply data quality and auditing checks.
  • Avoid business calculations in ODS – Business calculations, such as YTD, QTD, variances, etc., have no place in ODS. They should be defined in the semantic layer, e.g. Analysis Services model. If you attempt to do so in ODS, it will surely impact performance, forcing you to pre-aggregate data. The only permissible type of reporting in ODS is operational reporting, such as to produce the same reports as the original systems (without giving users access to the source) or to validate that the DW results match the source systems.
  • Track changes to most columns – I saved the best for last. Treat most columns as Type 2 so that you know when a given row was changed in the source. This is great for auditing.

Here is a hypothetical Policy table that keeps Type 2 changes. In this example, the policy rate changed on 5/8/2010. If you follow this design, you don’t have to maintain Type 2 in your DW (if you follow the Common Format pattern) and you don’t have to pick which columns are Type 2 (all of them are). It might be extreme but it’s good for auditing. Tip: use SQL Server 2016 temporal tables to simplify Type 2 date tracking.

RowStartDate | RowEndDate | SourceID | RowIsCurrent | RowIsDeleted | ETLExecutionID | PolicyKey | PolicyID | PremiumRate
5/2/2010 | 5/8/2010 | 1 | 0 | 0 | 0BB76521-AA63-… | 1 | 496961 | 0.45
5/9/2010 | 12/31/9999 | 1 | 1 | 0 | CD348258-42ED-.. | 2 | 496961 | 0.50
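The bookkeeping behind those two rows can be sketched in a few lines of Python. This is an illustration of the Type 2 pattern and not the ETL I would deploy: the function name is made up, rows are plain dictionaries, SourceID and RowIsDeleted are left out for brevity, and I assume the closed row ends the day before the new row starts, as in the table above.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # open-ended RowEndDate, as in the table above

def apply_type2_change(history, policy_id, new_rate, effective, execution_id):
    """Close the current row for policy_id and append a new current row."""
    current = next(
        (row for row in history
         if row["PolicyID"] == policy_id and row["RowIsCurrent"] == 1),
        None,
    )
    if current is not None and current["PremiumRate"] == new_rate:
        return history  # nothing changed, so no new version is needed
    next_key = max((row["PolicyKey"] for row in history), default=0) + 1
    if current is not None:
        current["RowEndDate"] = effective - timedelta(days=1)
        current["RowIsCurrent"] = 0
    history.append({
        "RowStartDate": effective,
        "RowEndDate": OPEN_END,
        "RowIsCurrent": 1,
        "ETLExecutionID": execution_id,
        "PolicyKey": next_key,
        "PolicyID": policy_id,
        "PremiumRate": new_rate,
    })
    return history

history = []
apply_type2_change(history, 496961, 0.45, date(2010, 5, 2), "run-1")
apply_type2_change(history, 496961, 0.50, date(2010, 5, 9), "run-2")
```

After the second call the first row is closed with RowEndDate 5/8/2010 and RowIsCurrent 0, and the second row carries PolicyKey 2 and the open-ended end date. With SQL Server 2016 temporal tables, the engine maintains the period columns for you instead.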

MS BI Events in Atlanta

As you’d probably agree, the BI landscape is fast-moving and it might be overwhelming. If you need any help with planning and implementing your next-generation BI solution, don’t hesitate to contact me. As a Microsoft Gold Partner and premier BI firm, we can help you plan and implement your data analytics projects, and you can rest assured that you’ll get the best service.

Regards,


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

A Special Cyber Training Offer This Week!

What better gift to give you than increasing your BI IQ? Don’t miss my highly discounted (only $150 per person) Power BI Dashboard in a Day (DIAD) session this Friday! It’s one of the three precon sessions of SQL Server Saturday BI Edition 2016. There are only three days left to register and there are still a few seats available. Then, join me on Saturday at 9 AM at SQL Saturday to learn how to embed reports using Power BI Embedded. Both events are in the Microsoft Office in Alpharetta.

Power BI Dashboard in a Day (DIAD) Precon Session

SQL Saturday Atlanta BI Edition is proud to announce this full-day training. Power BI Dashboard in a Day (DIAD) is designed to accelerate your Power BI experience with a comprehensive training program in a single day. All you have to do is bring your Windows-based laptop and we’ll supply the rest – even lunch! With DIAD you get practical hands-on training prepared by the Microsoft Power BI team. During this precon we’ll build a Power BI dashboard together. Along the way, you’ll learn how to:

  • Apply Power BI for self-service BI and organizational BI
  • Connect to, import & transform data from a variety of sources
  • Build real data models, as well as author and publish insightful interactive reports
  • Customize and share your “creations” for collaboration with other groups securely inside your organization, including mobile device sharing
  • Get your Power BI questions answered

Reserve your seat now and witness the value Power BI can deliver to you and your organization!