Posts

Gartner’s 2017 BI and Data Analytics Magic Quadrant Shows Microsoft Leading

Power BI is enjoying tremendous momentum and unprecedented popularity. Just this month, your humble correspondent has taught Power BI four times in a row. It looks like industry observers are taking notice of this momentum. As Kamal Hathi (General Manager, Microsoft BI) announced, the newly released Gartner Magic Quadrant for Business Intelligence and Data Analytics gave Microsoft a very high score. The image below shows Microsoft’s lift between last year and this year in the Gartner Magic Quadrant.

I’m not surprised by the Qlik drop, given that the company was sold. What’s still surprising to me is that Gartner ranked Tableau and Microsoft almost the same on the ability to execute. Although the report is not out yet, judging by the stub, Gartner used the same 14 criteria as last year but added one more that is unknown at this point (probably real-time, where Microsoft can score very high as well). Here are my comments on where Microsoft stands on these 14 criteria. You might also find my two-part blog about Tableau vs. Microsoft useful if you are tasked with comparing vendors.

| Capability | Teo’s Rank for MS BI | Comments |
|---|---|---|
| **BI Platform Administration** (Infrastructure): Capabilities that enable scaling the platform, optimizing performance and ensuring high availability and disaster recovery | High | On-premises or in the cloud, I think the MS BI platform is second to none. |
| **Cloud BI**: Platform-as-a-service and analytic-application-as-a-service capabilities for building, deploying and managing analytics and analytic applications in the cloud, based on data both in the cloud and on-premises | High | Power BI supports both pure cloud and hybrid architectures. |
| **Security and User Administration**: Capabilities that enable platform security, administering users, and auditing platform access and utilization | Medium | More work is required to support external users in Power BI, Power BI Embedded, and SSRS. |
| **Data Source Connectivity**: Capabilities that allow users to connect to the structured and unstructured data contained within various types of storage platforms, both on-premises and in the cloud | High | As of this time, Power BI supports close to 70 connectors that let you connect to cloud and on-premises data sources. No scripting required. |
| **Governance and Metadata Management** (Data Management): Tools for enabling users to share the same systems-of-record semantic model and metadata. These should provide a robust and centralized way for administrators to search, capture, store, reuse and publish metadata objects, such as dimensions, hierarchies, measures, performance metrics/key performance indicators (KPIs), report layout objects, parameters and so on. Administrators should have the ability to promote a business-user-defined data model to a system-of-record metadata object. | Medium | Power BI has done a good job of providing auditing and admin oversight, but more work is required for proactive monitoring and stronger data governance capabilities. |
| **Self-Contained Extraction, Transformation and Loading (ETL) and Data Storage**: Platform capabilities for accessing, integrating, transforming and loading data into a self-contained storage layer, with the ability to index data and manage data loads and refresh scheduling | Medium | SSIS is the most popular on-premises ETL tool. More work is required to bring similar capabilities to the cloud (I think Azure Data Factory is a step backwards). |
| **Self-Service Data Preparation**: The drag-and-drop, user-driven combination of data from different sources, and the creation of analytic models such as user-defined measures, sets, groups and hierarchies. Advanced capabilities include semantic autodiscovery, intelligent joins, intelligent profiling, hierarchy generation, data lineage and data blending on varied data sources, including multistructured data | High | Power BI Desktop and Excel have a fantastic query editor (which originated as Power Query) that scores big with business users. Tableau doesn’t have such native capabilities. Power BI and Excel have best-in-class self-service modeling capabilities (much better than Tableau’s). Azure Data Catalog can be used for dataset autodiscovery. |
| **Embedded Advanced Analytics** (Analysis and Content Creation): Enables users to easily access advanced analytics capabilities that are self-contained within the platform itself or available through the import and integration of externally developed models | High | I’m not sure what is meant here by “advanced analytics capabilities”. Power BI supports integration with R, Azure Machine Learning, clustering, forecasting and binning, but I might be missing something. |
| **Analytic Dashboards**: The ability to create highly interactive dashboards and content, with visual exploration and embedded advanced and geospatial analytics, to be consumed by others | High | “Highly interactive dashboards and content” is what Power BI is all about. |
| **Interactive Visual Exploration**: Enables the exploration of data via the manipulation of chart images, with the color, brightness, size, shape and motion of visual objects representing aspects of the dataset being analyzed. This includes an array of visualization options that go beyond those of pie, bar and line charts, to include heat and tree maps, geographic maps, scatter plots and other special-purpose visuals. These tools enable users to analyze the data by interacting directly with a visual representation of it | High | According to Gartner’s definition, Power BI should score high, but more work is required on the visualization side, such as the ability to drill through a chart point as we can in SSRS. |
| **Mobile Exploration and Authoring**: Enables organizations to develop and deliver content to mobile devices in a publishing and/or interactive mode, and takes advantage of mobile devices’ native capabilities, such as touchscreen, camera, location awareness and natural-language query | High | Native apps for iOS, Android and Windows surface both Power BI and SSRS reports. |
| **Embedding Analytic Content** (Sharing of Findings): Capabilities including a software developer’s kit with APIs and support for open standards for creating and modifying analytic content, visualizations and applications, embedding them into a business process, and/or an application or portal. These capabilities can reside outside the application (reusing the analytic infrastructure), but must be easily and seamlessly accessible from inside the application without forcing users to switch between systems. The capabilities for integrating BI and analytics with the application architecture will enable users to choose where in the business process the analytics should be embedded. | High | Power BI Embedded, an Azure cloud service, allows you to do this with an appealing, cost-effective licensing model. |
| **Publishing Analytic Content**: Capabilities that allow users to publish, deploy and operationalize analytic content through various output types and distribution methods, with support for content search, storytelling, scheduling and alerts | Medium | Power BI supports subscriptions and data alerts, but we can do better, such as allowing an admin to subscribe other users. “Storytelling” can mean different things, but I think the integration with Narrative Science falls into this category. |
| **Collaboration and Social BI**: Enables users to share and discuss information, analysis, analytic content and decisions via discussion threads, chat and annotations | High | Power BI supports this with workspaces and Office 365 unified groups. |

Of course, there are many competing definitions of what constitutes a BI and Analytics platform. Again, it looks to me that Gartner has predominantly focused on the self-service BI aspect (even there, Microsoft should have scored higher) and ignored the SQL Server BI features and all the cloud BI-related products (Azure SQL Database, SQL Data Warehouse, Azure ML, Data Catalog, HDInsight, StreamInsight). If we take them into consideration, where will that dot be?

Disaster Recovery

Over the last couple of weeks, I learned a thing or two about disaster recovery. My server crashed and it had to be replaced. While we were able to get Prologika.com up and running quickly, the community site was down for a while. I am happy to report that while there are some kinks remaining, for the most part the community site has been restored.

My Wish List for SQL Server.vNext

In an attempt to give customers more transparency, Microsoft recently published a Cloud Platform Roadmap. The product groups are also actively seeking feedback on the on-premises products (also known as boxed products). For example, if you have subscribed to the Power BI.vNext public preview, you can use the BI in SQL vNext forum to provide feedback (not sure why there instead of on connect.microsoft.com).

So, here is my open high-level SQL Server.vNext wish list:

  • SSAS – Merge Multidimensional and Tabular and take the best from both. I like the in-memory engine and flexibility of Tabular, but at the same time I miss the MD pro features. To put this in perspective, I’d like to see the following features added to Tabular:
    • Additional relationship types, such as M2M relationships (it looks like this will happen, judging by the Power BI features), multi-grain relationships (joining to a table at a higher grain than the key), and role-playing dimensions
    • Scope assignments
    • Sets
    • MDX support for business calculations – BI pros have been learning MDX for the past 15 years, so this knowledge shouldn’t be wasted. Besides, some constructs are better expressed in MDX, not to mention that Tabular doesn’t have support for hierarchy navigation.
    • Enterprise scale features – parallel partition processing within a table, ability to define which columns will be hierarchized (measures typically shouldn’t, yet now every column is hierarchized), compression on calculated columns
    • Removal of DirectQuery limitations
    • Custom (non Windows-based) security – This applies to both MD and Tabular in order to allow developers to embed reports in custom applications.
    • Better toolset – No matter how good SSAS gets, it won’t go far if there is no good client support. Excel and Power View need catching up. For example, Excel needs to generate DAX natively, optimize MDX queries (especially for detail-level reports), and add Power View support for MD, plus numerous enhancements to catch up with modern interactive tools.
    • Tabular Designer enhancements – it’s painful to wait for SSDT to refresh after every step.
  • SSRS
    • Report Designer should continue where it left off in 2010, when the focus shifted to Power View. See the top requested features on connect.microsoft.com, especially in the area of report parameters.
    • Power View – conditional formatting, drillthrough, SSAS action support, mapping enhancements (region coloring, point-to-point mapping)
    • Ability to embed Power View reports in custom apps.
    • Data alerts on Power View and SSRS in native mode.
    • Q&A on prem
    • Bring Power BI features to the boxed products. Since Power BI is decoupled from SharePoint, installing it on a local IIS server shouldn’t be an issue.
  • SSIS – The project mode and development enhancements in 2012 were great, but the product group should take a look at competing products, especially those that attempt to automate ETL. I don’t care much about the Data Flow since I typically use the ELT pattern, so I won’t comment on the data flow tasks.
    • Data profiling should be built into the data sources instead of being a separate task
    • Change management – make it easy to see which source mappings have changed. Don’t make me go through input and output connectors to fix these mappings when they change.
    • Scale out ETL across multiple servers
  • SQL Database Engine – Based on what I do, I need better support for data warehousing.
    • Improve query optimizer for large joins
    • Unify the in-memory technologies – in most cases you want both in-memory tables and fast analysis over the same data.
    • Enhance clustered columnstore indexes to support additional regular indexes (see my post for more info, and the sketch after this list).
  • MDS/DQS
    • Ideally, unify MDS and DQS into a single product as there is a significant overlap.
    • MDS Excel add-in is good but usability needs to be improved. No one wants to scroll a long pick list of a domain-based attribute to find something.
    • Ability to reference an entity from another model.
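
On the columnstore wish above, here is a minimal T-SQL sketch (the table and index names are hypothetical) of the restriction as it stands today:

```sql
-- Hypothetical fact table for illustration
CREATE TABLE dbo.FactSales (
    SaleKey BIGINT NOT NULL,
    DateKey INT    NOT NULL,
    Amount  MONEY  NOT NULL
);

-- Compress the entire table into a clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;

-- This fails on SQL Server 2014: a table with a clustered columnstore
-- index cannot have additional regular (B-tree) indexes, which is
-- exactly the restriction I'd like to see lifted
CREATE NONCLUSTERED INDEX IX_FactSales_DateKey ON dbo.FactSales (DateKey);
```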

Microsoft is Serious about Statistical Analysis and Machine Learning

Microsoft announced two company acquisitions related to data analytics.

It announced that it will acquire Revolution Analytics. Revolution Analytics is the leading commercial provider of software and services for R, the world’s most widely used programming language for statistical computing and predictive analytics. This acquisition could help more companies use the power of R and data science to unlock big data insights with advanced analytics.

Previously, Microsoft also announced that it will acquire Equivio. Equivio is a provider of machine learning technologies for eDiscovery and information governance to help customers tackle the legal and compliance challenges inherent in managing large quantities of email and documents.

SQL Server and 20 Cores Limit

Scenario: You execute a SQL Server 2012 task that uses parallelism, such as an index rebuild or a query, on a server with more than 20 cores running SQL Server 2012 Enterprise Edition. In Windows Task Manager, you observe that the task uses only 20 cores. We discovered this scenario during a rebuild of a columnstore index. To confirm it further, you examine the SQL Server log and notice that a message similar to the following is logged when the SQL Server instance starts:

“SQL Server detected 8 sockets with 4 cores per socket and 4 logical processors per socket, 32 total logical processors; using 20 logical processors based on SQL Server licensing. This is an informational message; no user action is required.”
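
A quick way to confirm the cap from inside SQL Server is to count the schedulers that are actually online, using the standard sys.dm_os_schedulers DMV (a minimal sketch):

```sql
-- Count the CPU schedulers SQL Server brought online; on the capped
-- Enterprise SKU this stops at 20 cores' worth of logical processors
SELECT COUNT(*) AS online_schedulers
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
```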

Explanation: More than likely, you have upgraded to SQL Server 2012 from SQL Server 2008 R2 under Software Assurance. Microsoft created a special SKU of Enterprise Edition to support this scenario with the caveat that this SKU limits an instance to using only 20 processor cores (or 40 CPU threads if hyperthreading is enabled). If this level of parallelism is not enough, the only solution is to switch to the Enterprise Edition SKU that is licensed per core and purchase a license that covers as many cores as needed. Once you obtain the new license key, you can upgrade your SQL Server instance:

Setup.exe /q /ACTION=editionupgrade /INSTANCENAME=MSSQLSERVER /PID=<PID key for new edition> /IACCEPTSQLSERVERLICENSETERMS
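
After the upgrade, you can verify what the instance reports via the standard SERVERPROPERTY function (the exact edition string varies by SKU):

```sql
-- Confirm the edition and build after the edition upgrade;
-- the per-core SKU reports itself as core-based licensing
SELECT SERVERPROPERTY('Edition')        AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;
```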

2015 Annual IT Forecast by TEKsystems

TEKsystems has been a wonderful sponsor of the Atlanta MS BI Group. They’ve recently published an interesting 2015 Annual IT Forecast report. According to the report, Business Intelligence/Big Data will be among the most impactful technologies in 2015. More key facts:

  • Seventy-one percent of IT leaders report confidence in their ability to satisfy business demands in 2015, representing an increase from 66 percent and 54 percent in forecasts for 2014 and 2013, respectively.
  • The top five areas where most IT leaders expect to increase spending in 2015 include security (65 percent), mobility (54 percent), cloud (53 percent), BI/Big Data (49 percent) and storage (46 percent). Twenty-nine percent of IT leaders also expect to increase spending on ERP.
  • Seventy-three percent of IT leaders indicate that operational objectives such as reducing costs, improving efficiency, consolidating, standardizing and streamlining present the biggest organizational challenges.
  • Salary increases are most likely to be average, with 68 percent of IT leaders saying that they expect overall staff salaries to increase by up to 5 percent. Only 8 percent expect increases of 6 percent or more and 21 percent expect salaries to remain the same.
  • Hiring expectations have also slowed. Entering 2014, 47 percent of IT leaders expected an increase in full-time IT staff hiring. Entering 2015, just 40 percent expect an increase, and 50 percent expect it to be the same as 2014.

Finalist for BI and Analytics Innovation Awards

Out of 26 applications, our BI solution for Recall Corporation was nominated as one of the six finalists for the BI and Analytics Innovation Awards presented tonight by the Atlanta TAG B/I society! Although we didn’t win the award, we’re honored to be shortlisted. Eric Falthzik, Co-chairman of the Technology Association of Georgia – Business Intelligence and Analytics Society and Director of Customer Intelligence at Cox Communications, said that this is the most interesting and sophisticated Microsoft-based solution he’s seen.

I’d like to thank Recall for giving us the opportunity to implement an innovative BI solution that changes the way they analyze their business. I’d like to also thank the TAG B/I panelists for appreciating its value.


Semantic Layer Flavors

Semantics relates to discovering the meaning of the message behind the words. In the context of data and BI, semantics represents the user’s perspective of data: how the end user views the data to derive knowledge from it. As a modeler, your job is to translate the machine-friendly database structures and terminology into a user-friendly semantic layer that describes the business problems to be solved and centralizes business calculations. Different vendors have different ideologies and implementations of semantic layers. Some, such as Oracle and MicroStrategy, implement the semantic layer as a thin, pass-through layer whose main goal is to centralize access to data sources, metadata and business calculations in a single place.

A Fortune 100 company with a major investment in Oracle engaged us to recommend a solution for complementing their OBIEE semantic layer with a user-friendly reporting tool that would allow end users to create ad hoc transactional reports from their Oracle Exadata-based enterprise data warehouse. In the OBIEE world, the role of the semantic layer is fulfilled by the Common Enterprise Information Model, which encompasses data sources, metrics, calculations, definitions, and hierarchies. When users submit report queries, the Oracle BI Server compiles the incoming requests and auto-generates native queries against the data sources plugged into the Common Enterprise Information Model.

In theory, this “thin” implementation should work pretty well. In reality, several issues surface:

  1. Performance – Any tool, whether a reporting tool or a semantic layer, that auto-generates queries should be treated as a performance suspect. First, no matter how well the tool is designed, it’s unlikely to generate efficient queries in all cases. Second, because the tool sends the query to the underlying data source, the overall report performance is determined by how fast the data source crunches data. In this case, the organization was facing performance issues with data aggregation. To circumvent them, their BI developers implemented summarized fact tables (along the lines of the sketch after this list). Once the user reaches the lowest level of the summary fact table, OBIEE allows the user to drill down to a chained table, such as the actual fact table. Needless to say, business users were complaining that analysis was limited to the dimensionality of the summarized fact table. Further, to avoid severe performance issues with historical reports, the BI developers took a “no big aggregations, no problem” approach by forcing a time filter on each request that touches the large fact tables. This effectively precluded historical analysis by year, quarter, and so on.
  2. Scalability – When a query involves multiple data sources, OBIEE sends a query to each data source, loads the data in memory, and then joins the datasets together to find matching data. This also presents performance and scalability challenges. To solve performance and scalability issues with “thin” semantic layers and biggish data, organizations ultimately resort to expensive back-end MPP systems, such as Teradata or Netezza, which can parallelize data crunching and compensate for inefficient auto-generated queries.
  3. Common Denominator – You’re limited to a subset of the features supported by the underlying data source.
  4. Vendor Dependency – Such an architecture ultimately locks you into the data sources supported by the vendor. What happens when there is a new version of the data source with some cool new features? Well, you need to wait until the vendor officially supports the new version and its extensions.
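
For context, the summarized fact tables mentioned in the first point are pre-aggregations along these lines (a minimal sketch; all table and column names are hypothetical):

```sql
-- Hypothetical pre-aggregation: collapse the detail fact table to the
-- month/product grain so auto-generated queries scan fewer rows;
-- analysis below this grain then requires drilling to the detail table
SELECT d.CalendarYear,
       d.CalendarMonth,
       f.ProductKey,
       SUM(f.SalesAmount) AS SalesAmount
INTO dbo.FactSalesMonthly
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON f.DateKey = d.DateKey
GROUP BY d.CalendarYear, d.CalendarMonth, f.ProductKey;
```
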

In the Microsoft BI world, the role of the semantic layer is fulfilled by the Business Intelligence Semantic Model (BISM). BISM is an umbrella name that encompasses two technologies: Multidimensional for implementing OLAP cubes and Tabular for designing relational-like models. The big difference is that by default both Multidimensional and Tabular store data on the server. In order words, from a data perspective, you can view BISM and its default storage mode (MOLAP for Multidimensional and xVelocity for Tabular) as a disk-based (MOLAP) or in-memory (xVelocity) cache of the data in your data warehouse. This proprietary storage is designed for fast data crunching even with billions of rows. You pay a price for loading and refreshing the data but you enjoy great query performance with historical and trend reports. And, you get a great ROI because you may not need an expensive MPP appliance or excessive data maintenance. When sent to BISM, queries are served by BISM, and not by the RDBMS. No need for summarized tables and time filters.