Prologika Newsletter Winter 2020

I hope you’re enjoying the holidays. In this newsletter, I’ll discuss a very important enhancement to Power BI that lets business users extend semantic models. But before I get to it, a quick announcement. I’m putting the finishing touches on the sixth edition of my “Applied Microsoft Power BI” book! It should be available on Amazon in the first days of 2021. I’ve been updating this book thoroughly every year since 2015 to keep it up to date with this fast-changing technology.

I’ve written extensively on the important role that EDW and organizational semantic models play in delivering the “discipline at the core and flexibility at the edge” tenet of effective data analytics. Analysis Services Tabular is available in three SKUs: Power BI, Azure Analysis Services, and SSAS, and it’s the workhorse of Power BI Service. When you publish a Power BI Desktop file, it becomes a database hosted in an Analysis Services Tabular server managed by Microsoft.

How live connections work

As the diagram below shows, Power BI uses a special live connectivity option when you connect live to Analysis Services in all its flavors (Multidimensional, Tabular, and published Power BI datasets) and SAP (SAP HANA and SAP Business Warehouse). In this case, the xVelocity engine isn’t used at all and there is no local model. Instead, Power BI connects directly to the data source and sends native queries. For example, Power BI generates DAX queries when connected to Analysis Services.

There is no Power Query in between Power BI Desktop and the data source, and data transformations and relationships are not available. In other words, Power BI becomes a presentation layer that is connected directly to the source, and the Fields pane shows the metadata from the model. This is conceptually very similar to connecting Excel to Analysis Services.
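To make this concrete, here is a hedged sketch (mine, not from the article) of the kind of DAX query Power BI sends. It posts a hypothetical query to a published dataset through the Power BI executeQueries REST API; the dataset ID, access token, and table/measure names are all placeholders.

```python
# A minimal sketch: sending the kind of native DAX query Power BI generates
# to a published dataset via the executeQueries REST API. The dataset ID,
# Azure AD access token, and table/measure names below are placeholders.
import requests

DATASET_ID = "<dataset-guid>"
TOKEN = "<azure-ad-access-token>"

dax = """
EVALUATE
SUMMARIZECOLUMNS('Date'[CalendarYear], "Sales", [SalesAmount])
"""

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET_ID}/executeQueries",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"queries": [{"query": dax}], "serializerSettings": {"includeNulls": True}},
)
print(resp.json())
```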

Unfortunately, once you connected Power BI Desktop live to one of these data sources, that remote model was the only data source available to you.

Understanding the change

Power BI Desktop (December 2020 release) removes this long-standing limitation for live connections to Tabular. In the special case of connecting to a dataset published to Power BI Service and Azure Analysis Services (on-prem SSAS is not supported), you can switch from live connectivity to DirectQuery and add external data to build a composite model. This feature is very important because it allows business users to extend semantic models that could be sanctioned by someone else in the organization!

If the first connection you make is to the remote model, the connection uses Live Connect. The Power BI Desktop file doesn’t store any metadata or data, except for the connection string. The moment you use “Get Data” to connect to another source and accept the prompt, Power BI Desktop permanently replaces the live connection with a local DirectQuery layer and imports the metadata of the remote model. Even if you remove all external tables, you can’t “undo” the change and switch the file back to Live Connect. In the diagram below, the FactResellerSales, DimDate, and Employees tables are hosted in the remote model, while FactSalesQuota is an external table that is imported (it could also be in DirectQuery mode).

What happens behind the scenes

In a nutshell, DirectQuery to Analysis Services Tabular is like other DirectQuery sources in that DAX queries generated by Power BI are translated to native queries. However, in this case Power BI either sends the DAX queries directly to the remote model when possible, or breaks them down into lower-level DAX queries that are executed on the remote model, with the results combined in Power BI to answer the original query. So, depending on the size of the tables involved in the join, this intermediate layer may negatively impact the performance of visuals that mix fields from different data sources.

Applying your knowledge about composite models, you might attempt to configure the dimensions in Dual storage mode, but you’ll find that this is not supported. Behind the scenes, Power BI handles the join automatically, so you don’t need to set the storage mode to Dual; it’s interpreted as Dual internally. You can make metadata changes on top of the remote model. For example, you can format fields, create custom groups, implement your own measures, and even add calculated columns (calculated columns are evaluated at runtime rather than materialized). The changes you make never affect the remote model; they are saved locally in the DirectQuery model.

Currently, row-level security (RLS) doesn’t propagate from the remote model to the other tables. For example, the remote model might allow salespersons to see only their sales data by applying RLS to the Employees table. However, the user will be allowed to see all the data in the FactSalesQuota table because it’s external to the remote model and RLS doesn’t affect it.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics


Prologika Newsletter Fall 2020

Cloud deployments are the norm nowadays for new software projects, including BI, and Azure Synapse Analytics shows great potential as a modern cloud-based data analytics platform. Here are some high-level pros and cons for implementing Azure Synapse-centered solutions that I harvested from my real-life projects and workshops.

 

What’s to Like

There is plenty to like in Azure Synapse, which is the evolution of Azure SQL DW. If you’re tasked with implementing a cloud-based data warehouse, you might be evaluating three Azure SQL Server-based PaaS offerings: Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse. However, Azure SQL Database and Azure SQL MI are optimized for OLTP workloads. For example, they have full logging enabled and replicate each transaction across replicas. Full logging is usually a no-no for decent-size DW workloads because of the massive ETL changes involved.

In addition, to achieve good performance, you’ll find yourself moving up the performance tiers and toward the price point of the lower Azure Synapse SKUs. Further, unlike Azure SQL Database, Azure Synapse can be paused, such as when reports hit a semantic layer instead of the DW, and this may offer additional cost-cutting options.

To get started with Synapse, you need to create a workspace. A workspace can host two analytics runtimes, SQL and Spark, which are provisioned as pools. For example, you might have one SQL pool for your enterprise data warehouse and another SQL pool for a departmental data mart, all provisioned within the same workspace.

A SQL pool (previously known as Azure SQL DW) hosts your SQL relational database. As an MPP system, it can scale to petabytes of data with proper sizing and good design. The obvious benefit is that, for the most part (see the “What’s Not to Like” section for exclusions), you can carry your SQL Server skills over to Azure Synapse.

As a general rule of thumb, Azure Synapse should be at the forefront of your modern cloud DW architecture for data loads starting at 500 GB.

I like the integration with Azure Data Lake Storage (ADLS). Previously, you had to set up external tables to access files in ADLS via PolyBase. This might be redundant now thanks to the serverless (on-demand) endpoint, which also replaces the Gen1 U-SQL with T-SQL. All you have to do is link your ADLS storage accounts and use the T-SQL OPENROWSET construct to query your ADLS files and expose them as SQL datasets, as the sketch below illustrates. Not only does this open new opportunities for ad hoc reporting directly on top of the data lake, but it can also save you tons of ETL effort by avoiding staging the ADLS data while loading your DW.
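As a rough illustration, here is a minimal sketch of querying the serverless endpoint from Python with pyodbc. The workspace name, credentials, and ADLS path are placeholders, and the sketch assumes ODBC Driver 17 is installed.

```python
# A minimal sketch: querying ADLS files directly through the Synapse
# serverless (on-demand) endpoint with OPENROWSET. Server, credentials,
# and the storage path are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"  # serverless endpoint
    "Database=master;UID=sqladmin;PWD=<password>;"
)

sql = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/sales/2020/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;
"""
for row in conn.execute(sql):
    print(row)
```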

As the diagram shows, you can also provision a Spark runtime under a Synapse workspace. This opens the possibility of using Spark (think Azure Databricks) for data transformation and data science. For example, a data scientist can create notebooks in Synapse Studio that use a programming language of their choice (SQL, Python, .NET, Java, Scala, and R are currently supported).
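For a taste of the notebook experience, here is a hypothetical PySpark cell. The spark session is pre-created by the Synapse notebook, and the abfss path points to a made-up ADLS container.

```python
# A hypothetical Synapse notebook cell: the spark session is provided by
# the notebook runtime, and the abfss path below is a placeholder.
df = spark.read.load(
    "abfss://sales@mydatalake.dfs.core.windows.net/2020/",
    format="parquet",
)
df.groupBy("Region").count().show()
```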

What Needs Improvement

You have two coding options to implement artifacts for Azure Synapse, such as ETL and reports:

  1. You can use the respective development environment, such as Azure Data Factory Studio for ADF development and Power BI Desktop for Power BI report development.
  2. You can use Synapse Studio which attempts to centralize coding in one place.

Besides the ADLS integration and notebooks, I find no advantages in the Synapse Studio “unified experience”, which is currently in preview. In fact, I find it confusing and limiting. The idea is to offer an online development environment for coding and accessing in one place all the artifacts surrounding your DW project. Currently, these are limited to T-SQL scripts, notebooks, Azure Data Factory pipelines, and Power BI datasets and reports. However, I see no good reason to abandon ADF Studio and do ETL coding in Synapse Studio. In fact, I can only see reasons against doing so. For example, Synapse Studio lags behind ADF in features, such as source code integration and wrangling workflows. For some obscure reason, Synapse Studio separates data flows from ADF pipelines into their own Data Flows section in the Development Hub. This is like putting the SSIS data flow outside the package. Not to mention that inline data flows should be deemphasized, since ELT (not ETL) is the preferred pattern for loading Azure Synapse. To make things worse, once you decide to embrace Synapse Studio for developing ADF artifacts, you can use only Synapse Studio, because the pipelines are not available outside Synapse.

On the Power BI side of things, you can create a Power BI dataset in Synapse Studio. All this saves is the effort of going through two Get Data steps in Power BI Desktop to connect to your Azure Synapse endpoint. You can also create Power BI reports using Power BI Embedded in Azure Synapse. Why I would do this outside Power BI Desktop is beyond me. Perhaps I’m missing the big picture, and more features might be coming as Synapse evolves. Speaking of evolution, as of this time only SQL pools are generally available; everything else is in preview.

What’s Not to Like

Developers moving from on-prem BI implementations to Azure Synapse would naturally expect all SQL Server features to be supported, just like moving from on-prem SSAS to Azure AS. However, they will quickly find that Azure Synapse is a “one of a kind” SQL Server. The nice user interface in SSMS for carrying out admin tasks, such as creating logins and assigning them to roles, doesn’t work with Azure Synapse (all changes require T-SQL, as the sketch below shows). SQL Profiler doesn’t work either (tip: use the Profiler extension in Azure Data Studio for tracing queries in real time).
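To give you an idea of what scripting these tasks looks like, here is a hedged sketch that runs the security T-SQL from Python via pyodbc. The server, database, login name, and password are all placeholders.

```python
# Hedged sketch: the security tasks you'd click through in SSMS must be
# scripted in T-SQL for Synapse. Server, databases, and credentials are
# placeholders.
import pyodbc

def run(database, sql):
    cn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=myworkspace.sql.azuresynapse.net;"
        f"Database={database};UID=sqladmin;PWD=<password>;",
        autocommit=True,
    )
    cn.execute(sql)
    cn.close()

# Logins are created in master; the user and role membership in the user database.
run("master", "CREATE LOGIN report_reader WITH PASSWORD = 'Str0ng!Passw0rd1';")
run("dw", "CREATE USER report_reader FOR LOGIN report_reader;")
run("dw", "EXEC sp_addrolemember 'db_datareader', 'report_reader';")
```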

Because Azure Synapse uses Azure SQL Database, it’s limited to one physical database, so you must rely on schemas to separate your staging and DW tables. There are also T-SQL limitations. For example, my favorite MERGE statement for ELT-based UPSERTs is not yet supported (it’s currently in preview for hash- and round-robin-distributed tables, but not for replicated tables, such as dimension tables); a common workaround follows below. So, don’t assume that you can seamlessly migrate your on-prem DW databases to Synapse. Also, don’t assume that your SQL Server knowledge will suffice. Prepare to seriously study Azure Synapse!
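Until MERGE arrives, a common workaround is the classic two-step UPSERT: update the matching rows, then insert the missing ones. Here is a sketch with hypothetical staging and dimension tables (note the implicit join in the UPDATE, since dedicated pools don’t accept ANSI joins in the FROM clause of UPDATE).

```python
# A MERGE-free UPSERT pattern commonly used with Synapse. Table and column
# names are hypothetical; execute the batch with pyodbc as shown earlier.
UPSERT_SQL = """
-- Step 1: update rows that already exist (implicit join; ANSI joins are
-- not supported in UPDATE ... FROM on dedicated SQL pools)
UPDATE dbo.DimProduct
SET    ListPrice = s.ListPrice
FROM   stg.Product s
WHERE  dbo.DimProduct.ProductKey = s.ProductKey;

-- Step 2: insert rows that don't exist yet
INSERT INTO dbo.DimProduct (ProductKey, ProductName, ListPrice)
SELECT s.ProductKey, s.ProductName, s.ListPrice
FROM   stg.Product s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.DimProduct d
                   WHERE d.ProductKey = s.ProductKey);
"""
```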


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics


Prologika Newsletter Summer 2020

More and more organizations are switching to Microsoft Teams, mostly for online meetings. However, Microsoft Teams can deliver much more than that, and it would be unjust to compare it with other popular meeting apps. In this newsletter you’ll learn how Prologika helped a public school district gain important insights from Microsoft Teams. Learn more and try an interactive Power BI report in the related case study.

Teams for Education

What Microsoft has done with Teams is amazing! Now that I’ve gotten to study and program Microsoft Teams, I have found it a powerful and extensible platform. For example, Microsoft has provided add-ons for common verticals, such as Teams for Education. Microsoft Teams can be further enriched with apps and can be used to centralize all those Excel spreadsheets that are floating around. And best of all: most of the Microsoft Teams data is exposed via a single data source, the Microsoft Graph API, with the caveat that it’s not that easy to get the data out.

During the COVID-19 crisis, Fulton County Schools (FCS) transitioned from traditional classroom content delivery to online digital teaching and learning. FCS adopted Microsoft Teams for Education as an online platform to foster collaboration between teachers and students. However, FCS struggled for a long time trying to obtain the necessary data from Microsoft Teams to analyze how effectively teachers and students utilize it in these challenging times. The Microsoft-provided Office 365 Usage Analytics for Power BI didn’t provide the depth of information FCS needed, such as counts of assignments that teachers give to students and submissions that students prepare for an assignment. FCS also wanted to understand how often teachers and students meet and for how long.

Solution

Prologika developed a custom app to extract the required data by calling the Microsoft Graph APIs. The app supported a configurable degree of parallelism so that data could be processed in parallel, which reduced the duration of the full load to less than 7 hours; incremental daily ETL decreased the extraction time even further. The data was saved in a relational database consisting of several entities, such as Schools, Classes, Assignments, Submissions, and Events. With the data in place, Power BI reports were implemented to deliver important insights, such as the report below.
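The sketch below illustrates the parallel extraction pattern with hypothetical class IDs and a placeholder token; a production app also needs Azure AD authentication, paging via @odata.nextLink, and retry logic for throttling.

```python
# A hedged sketch of the parallel Graph extraction pattern described above.
# The access token and class IDs are placeholders.
from concurrent.futures import ThreadPoolExecutor
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
HEADERS = {"Authorization": "Bearer <access-token>"}

def get_assignments(class_id):
    # Education assignments endpoint; class IDs come from /education/classes
    url = f"{GRAPH}/education/classes/{class_id}/assignments"
    return requests.get(url, headers=HEADERS).json().get("value", [])

class_ids = ["<class-id-1>", "<class-id-2>"]  # discovered from /education/classes

# A configurable degree of parallelism, as in the solution above
with ThreadPoolExecutor(max_workers=8) as pool:
    assignments = list(pool.map(get_assignments, class_ids))
```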

Our BI solutions help clients implement a digital feedback loop and act upon the data. The process starts with acquiring the necessary data and designing a comprehensive data model that can answer important business questions. Then, the client applies the insights to improve their processes. This creates a powerful feedback mechanism where outputs are routed back as inputs to the business.

Prologika Newsletter Spring 2020

Did you know that according to Gartner, at least five of the top 10 technology trends for 2020 will involve predictive analytics? And the third on the list is “democratization”, to deliver it to non-specialists. With the growing demand for predictive analytics, Automated Machine Learning (AutoML) aims to simplify and democratize predictive analytics so business users can create their own predictive models. The promise of AutoML is to bring predictive analytics to business users, just like Power BI democratizes data analytics, Power Apps democratizes app dev, and Power Query democratizes data shaping and transformation.

 

Comparing Options

For a business user, the two most popular options for applying automated machine learning to predictive analytics are Power BI and AzureML. Behind the scenes, Power BI AutoML uses the automated machine learning feature of AzureML, but there are differences, and the table below summarizes the most important ones.

Criterion                Power BI AutoML                                 Azure AutoML
Licensing                Power BI Premium                                Azure ML (Enterprise Edition recommended)
Container                Dataflow                                        Experiment
Power Query              Available                                       Not available
Supported data sources   Many                                            A few (local files, Azure SQL DB, ADLS, and a few more)
Model                    Not accessible (Power BI handles everything)    Accessible
Web service endpoint     Not available outside Power BI                  Available for app integration
Scoring                  Apply the model to an entity                    Various options (notebooks, SDK, custom integration)
To me, the best solution would have been the combination of both technologies. I like Power Query for sourcing, shaping, and transforming the data, but I also like the flexibility that AzureML brings. Unfortunately, you can’t mix and match. It appears that the AzureML team has decided to roll out its own data connectivity mechanism, and as a result it supports a limited number of data sources (for example, on-prem data sources are not accessible). I expect this to change as the product evolves.

Azure ML Studio

I’ve recently done some work with the new version of Azure ML Studio (https://ml.azure.com/), and I’m impressed. Microsoft has learned important lessons from the previous AzureML (now called “classic”) and greatly enhanced the product. If you’re looking for a SaaS ML toolset that targets both business users and data scientists, AzureML should be at the top of your list. Speaking of its AutoML feature, the main advantages it brings to predictive analytics are:

  • Determining the model type – classification, regression, and time series forecasting (the last one is not available yet in Power BI)
  • Automatic featurization
  • Selecting the best algorithm – For example, in one of my experiments AzureML tested various algorithms and determined that VotingEnsemble performed the best.

Even if you’re a data scientist, the algorithm selection feature alone justifies giving AutoML a try, if for nothing else than to narrow down the best algorithm so that you don’t have to spend enormous time testing different algorithms yourself.
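As a rough illustration of these advantages, here is a hedged sketch using the azureml-sdk Python package; the workspace configuration, registered dataset, and label column are placeholders.

```python
# Hedged sketch with the azureml-sdk package; the workspace config,
# registered dataset, and label column are placeholders.
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()                  # reads a downloaded config.json
train = Dataset.get_by_name(ws, "customers")  # hypothetical registered dataset

config = AutoMLConfig(
    task="classification",                    # or "regression", "forecasting"
    training_data=train,
    label_column_name="Buyer",                # hypothetical label column
    primary_metric="AUC_weighted",
    enable_early_stopping=True,
)

# AutoML tests many algorithms and surfaces the best one (e.g., VotingEnsemble)
run = Experiment(ws, "automl-buyer").submit(config, show_output=True)
```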

Prologika Newsletter Winter 2019

Happy Holidays! I hope you’re enjoying this special time of the year. A few months ago, I did an assessment for a large company that had been advised by an undisclosed source to use their Dynamics Financials and Operations (F&O) system as a data warehouse. Recently, I came across a similar wish, but this time to use SAP as an EDW. Can we do this? Before I give you my opinion, I’m excited to announce the availability of the fifth edition of my “Applied Microsoft Power BI” book – the only Power BI book that is updated every year to keep up with the ever-changing world of Power BI and the Microsoft Data Platform! The book is slowly making its way to the retailers and should be available on Amazon in the first days of 2020.

Operational Reporting

I understand that everyone wants to do more with less, and shortcuts are tempting. But ERP systems are systems of record, just like any other data source. True, they may own most of the core data you need for analytics. But that data is normalized and stored in a format that’s not conducive to analytics. To make things worse, Dynamics doesn’t even give you direct access to the SQL Server database of your production instance. You must go through REST APIs or export data to gain access to it. And to add new tables, you must create entities in Visual Studio! Still want to build a data warehouse in Dynamics?

ERP systems typically have some reporting features, but these typically deliver only operational reporting. Operational reporting has a narrow view concerned with “now”, such as a report that shows customers with outstanding balances as of today. For example, Dynamics comes with standard SSRS reports. You can also enable analytical workspaces that deliver reports via Power BI Embedded. These reports, however, are operational reports. By contrast, BI is mostly concerned with historical and trend analysis.

BI Axioms

In math, axioms are statements that are assumed to be correct without a proof. We need BI axioms and the list can start like this:

  • Every mid-size to large company shall have a centralized data repository for consolidating trusted data that is optimized for reporting. The necessity for such a repository is in direct correlation with the number of data sources that must be integrated (that number will increase over time) and the complexity of the data transformations. The centralized data repository is commonly referred to as a data warehouse.
  • Dimensional modeling shall be the methodology to design the data warehouse schema. I know it’s tempting to declare your ODS as a data warehouse, but highly normalized schemas are not suitable for reporting.
  • If you’re after a single version of the truth, you shall have an organizational semantic layer. Find why you need it in my “WHY SEMANTIC LAYER?” newsletter.
  • ERP systems are not a replacement for a data warehouse. Neither are data lakes and Big Data.
  • You shall have both organizational and self-service BI, and they shall complement each other. If you lean too much toward organizational BI, you’ll get a backlog of requirements. If you lean too much toward self-service BI, you’ll end up with fragmented “spreadmarts”, which is probably where you started.
  • Most of the BI effort shall go toward organizational BI to integrate data, improve data quality, and centralize business calculations. Tools come and go but this effort shall endure.
  • Agile and managed self-service BI shall fill in the gaps. It should provide a feedback loop to extend organizational BI with data that the entire organization can benefit from.

 


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Gold Partner | Data Analytics


Prologika Newsletter Fall 2019

We all need to share. But until a couple of months ago, a Power BI training or assessment wouldn’t be complete unless I got hammered on the Power BI sharing limitations. Fortunately, Microsoft has addressed most of these, and I now have a much better story to tell. And this is the subject of this newsletter.

Quo vadis, Power BI Sharing?

The Power BI sharing road has been a long and winding one. I covered the gist in my blogs “Power BI Sharing is Getting Better“, “Power BI Sharing is Getting Better 2“, and “The Power BI Viewer Role“. The following table summarizes prior and standing sharing limitations. The Solution column lists the “fix” by Microsoft.

Limitation                                                      Solution
Workspace dependency on Office 365 groups                       V2 workspaces don’t depend on O365 groups
No IT oversight on creating workspaces                          You can now control who can create workspaces in the Power BI Admin center
Group membership limitations                                    V2 workspaces support all O365 group types
Coarse content access level                                     Contributor and Viewer roles
No cross-workspace sharing                                      A dataset can be shared across workspaces
No data governance                                              A dataset can be promoted and certified
Premium sharing with viewers requires report sharing or apps    The Viewer role supports sharing with viewers
Only one app supported per workspace                            The Viewer role deemphasizes apps
No nesting support (subfolders)                                 Still outstanding

Sharing Best Practices

Given the current state of Power BI, I’d like to share some best practices for organizing and sharing content:

  1. Don’t use report and dashboard sharing as it can quickly turn into a maintenance nightmare.
  2. Create workspaces to reflect your organizational functional areas, e.g. Sales, HR, Finance. Unfortunately, workspaces still don’t support subfolders (Microsoft hinted that they are working on such a feature), so for now you must resort to a flattened list.
  3. Instead of individual user assignments, add users to security groups and then add these groups as members of the workspace (see the sketch after this list). This way, when a user leaves the company or moves to another department, you only need to change the user’s group membership without touching the workspace security policy.
  4. Come up with a data governance policy. The Power BI data governance story is a work in progress, but you can set up some ground rules. For example, once a contributor believes that a dataset is ready for broader consumption, they can promote the dataset. Then, IT can verify the dataset and certify it. Currently, Power BI supports certifying only datasets (reports and dashboards aren’t certifiable yet).
  5. This one is hard. Teach data analysts best data modeling practices. Instead of creating a dataset per report, they should create a data model that correctly represents their subject area. A data analyst should create the model once so it can support multiple reports. Once the dataset is published and certified, users can create their own reports.
  6. Although Microsoft is pushing apps very hard and adding some nice features, such as navigation, a standing limitation is that you can create only one app per workspace. So, you can’t publish multiple apps from a workspace, for example to share some reports with one group of users and another set with a different group. Instead, grant access directly to workspaces by using the Contributor and Viewer roles. The Viewer role lifts a significant limitation that previously forced you to use report/dashboard sharing or apps to share content out of a premium workspace with viewers. If you want to share the entire content of a workspace, you don’t have to use apps; you can simply add users or groups as viewers to the workspace.
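To illustrate item 3, here is a hedged sketch of adding a security group to a workspace as a Viewer through the Power BI REST API; the workspace ID, group object ID, and access token are placeholders.

```python
# Hedged sketch: add an Azure AD security group as a Viewer on a workspace
# via the Power BI REST API (Add Group User). IDs and token are placeholders.
import requests

WORKSPACE_ID = "<workspace-guid>"
GROUP_OBJECT_ID = "<security-group-object-id>"
TOKEN = "<azure-ad-access-token>"

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}/users",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "identifier": GROUP_OBJECT_ID,
        "principalType": "Group",
        "groupUserAccessRight": "Viewer",
    },
)
resp.raise_for_status()
```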

Despite some long-standing limitations, Power BI sharing is coming of age. Follow the above practices and you’ll have a much better way to organize and share content.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics


Prologika Newsletter Summer 2019

After learning how to model the data properly (the most important skill), DAX is the next hurdle in your self-service or organizational BI journey. You won’t get far in Microsoft BI without DAX. This letter shares a tip on how to identify DAX performance bottlenecks by using a recent Power BI feature called Performance Analyzer.

Learning DAX

Before I get to Performance Analyzer, I’m excited to announce my latest book: “Applied DAX with Power BI: From zero to hero with 15-minute lessons”. This book was born out of necessity. In my consulting practice, I had been teaching and implementing Power BI and Analysis Services Tabular, and people were constantly asking for DAX book recommendations. Indeed, DAX is not an easy topic and has its ways of humbling even experienced practitioners. There are a few good reference books out there, but they can be somewhat overwhelming for novice users. So, I turned my classroom and consulting experience into this book and designed it as a self-paced guide to help you learn DAX one lesson at a time.

This was a different book to write. Unlike my other books, which tend to be heavy, this one is smaller (220 pages). Regardless, it took me a while to write. The book adopts a lesson-based approach, so the reader can learn and practice a specific DAX concept quickly. There are usually three sections in each lesson: Practice (with step-by-step guidance), Outcome (a visual that shows the result), and Analysis (explaining why DAX works this way). Most lessons are five to six pages long, and it should take no more than 15 minutes to complete a lesson’s exercises (you can download and view a sample lesson from the book page).

And if you do one lesson per day, you’ll be a DAX expert in a month!

Analyzing DAX Queries

“This report is slow!” I hope you never hear this, but chances are that you will sooner or later. No one likes watching a spinning progress indicator while waiting for a report to show up. Tucked in the View ribbon of recent Power BI Desktop builds is a handy tool called Performance Analyzer that can help you identify slow queries. It’s easy to use:

  1. In Power BI Desktop, click the View ribbon and check the Performance Analyzer setting. This will open the Performance Analyzer pane.
  2. Click Start Recording in the Performance Analyzer pane. Once you start recording, any action that requires refreshing a visual, such as filtering or cross-highlighting, will populate the Performance Analyzer pane. You’ll see the statistics of each visual logged in the load order with its corresponding load duration.
  3. You can click the “Refresh visuals” link in Performance Analyzer to refresh all visuals on the page and capture all queries. In addition, once you are in recording mode, every visual gets a new icon that lets you refresh only that visual.

[Screenshot: the Performance Analyzer pane with per-visual durations]

Performance Analyzer captures the following information:

  • DAX query – The length of time it takes to execute the query.
  • Visual display – How long it takes the visual to render on the screen after the query is executed.
  • Other – The time the visual spends on other tasks, such as preparing queries, waiting for other visuals to complete, or doing other background processing.

Although not as comprehensive as DAX Studio, Performance Analyzer is a quick and easy way to get an overall idea of what impacts your report page performance. Note that Performance Analyzer is currently available only in Power BI Desktop, not in Power BI Service.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics


Prologika Newsletter Spring 2019

At least five of the top 10 IT trends for 2019 identified by Gartner involve machine learning in one form or another. The list includes autonomous things, augmented analytics, AI-driven development, digital twins, and immersive experiences. Gartner also predicts that by 2022, at least 40 percent of new application development projects will involve machine learning (ML). Therefore, it’s not surprising to see vendors making huge investments in predictive analytics. This newsletter focuses on two important Power BI enhancements for ML that will be available for public preview soon: AutoML and integration with the Azure predictive analytics services (Azure ML and Cognitive Services).

AutoML

The problem with machine learning is that it’s hard, and there aren’t that many data scientists out there (or at least ones that you can afford). The promise of AutoML is to bring predictive analytics to data analysts, just like self-service BI brought descriptive analytics to business users. Power BI AutoML is based on another recent Microsoft technology: ML.NET. ML.NET is for .NET developers who want to incorporate machine learning features in .NET apps without external dependencies or tools.

As it stands, Power BI AutoML requires a premium capacity (either via Power BI Premium or Power BI Embedded). The process starts with the data analyst creating an entity containing the data to be predicted. For example, the entity might return a list of customers who have purchased a product in the past, with some fields that the data analyst believes are significant for “converting” a customer into a buyer, such as demographics. Then, the data analyst can use AutoML to create a predictive model that estimates the likelihood that future customers will purchase a product. This could be useful for a targeted campaign or to implement “customers who purchased this product also purchased that product”, but many other ML scenarios are possible. The process for implementing an AutoML model is simple and starts with clicking the AI Insights button on the entity page.

[Screenshot: the AI Insights button on the entity page]

This starts a wizard that walks the data analyst through a few steps that I described in more detail in my “First Look at Power BI AutoML” blog. In the same blog, I shared some ideas to simplify the experience even further that I hope Microsoft will implement.

Integration with Azure ML and Cognitive Services

While AutoML is a “quick and dirty” way for data analysts to implement predictive models, data scientists need more control. They have many tools in the Microsoft ecosystem, ranging from R and Python in SQL Server to cloud-based PaaS services, such as Azure ML. One of the easiest tools is Azure ML. However, currently Power BI doesn’t integrate natively with Azure ML experiments, so you must resort to using the Web connector to call the experiment’s web service. I demonstrate how you can do this in a singleton way (one row at a time) in chapter 11 of my latest Power BI book (you can download the code from the book page). This method is still available, but it’s cumbersome to implement and it’s slow.
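For reference, the singleton pattern boils down to a call like the following hedged sketch against a classic Azure ML request-response endpoint; the URL, API key, and input columns are placeholders.

```python
# Hedged sketch of the singleton (one row at a time) scoring call against a
# classic Azure ML request-response endpoint. URL, key, and columns are
# placeholders.
import requests

URL = ("https://ussouthcentral.services.azureml.net/workspaces/<ws-id>"
       "/services/<service-id>/execute?api-version=2.0")
BODY = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["Age", "YearlyIncome"],
            "Values": [["35", "70000"]],  # a single row per request
        }
    },
    "GlobalParameters": {},
}

resp = requests.post(URL, json=BODY,
                     headers={"Authorization": "Bearer <api-key>"})
print(resp.json())
```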

I describe the new Power BI integration in my blog “First Look at the New Power BI Integration with Azure ML”. It offers two important benefits:

  • It simplifies the integration with Azure ML and Cognitive Services – In the case of Azure ML, all you have to do is select the predictive web service and map the source columns as input to the model. Power BI invokes the web service and adds new predictive columns (in this case, predicting whether the customer will be a buyer and the probability of this happening).
    [Screenshot: mapping source columns to the Azure ML model inputs]
  • It’s much faster – Power BI calls the service in batch mode, which is significantly faster than predicting one row at a time.

The Cognitive Services integration works in a similar way. It allows you to augment your Power BI data with cognitive features, such as sentiment analysis and image recognition. For more information, watch the related announcements: the integration with Cognitive Services and Azure ML, and the new AutoML features (skip to 1:08:30).

The forthcoming AutoML and integration with Azure ML and Cognitive Services will be major enhancements to Power BI’s predictive analytics capabilities. Both features will be available in public preview soon.

 


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics


Prologika Newsletter Winter 2018

Happy Holidays! I hope you’re enjoying this special time of the year. According to Gartner, “poor data quality is hitting organizations where it hurts – to the tune of $15 million as the average annual financial cost in 2017”. Data integration is another huge challenge; you know it and I know it. Microsoft hopes that Power BI dataflows will help data analysts tackle these issues, and dataflows are the subject of this newsletter. But before I give you my opinion about this newcomer, I’m excited to announce the availability of the fourth edition of my “Applied Microsoft Power BI” book – the only Power BI book that is updated every year to keep up with the ever-changing world of Power BI and the Microsoft Data Platform! The book is slowly making its way to the retailers and should be available on Amazon in the first days of 2019.

What’s a dataflow?

I’m sure that by now you’ve heard about Power Query – a tool integrated with Power BI Desktop that helps the data analyst clean and shape the data. The issue with Power Query, though, is that it’s tightly coupled with Power BI Desktop, while data integration challenges sometimes need to be solved at a larger scale. Think of dataflows as “Power Query in the cloud”. Dataflows are to self-service BI what ETL is to BI pros.

A dataflow is a collection of Power Query queries, also known as entities, and it lives in a Power BI organizational workspace.

For example, if I want to stage data from Salesforce, I can create a dataflow with as many entities as the Salesforce tables I want to stage. Unlike Power Query, which can only output to a table in a data model, a dataflow saves its output as a pair of files – a CSV file (data) and a JSON file (metadata) – in a Microsoft-provided data lake store.

[Diagram: a dataflow saving entities as CSV and JSON files in the data lake]

So, for BI pros familiar with SSIS projects for ETL, think of a dataflow as a project and its queries as SSIS packages. Just like you can deploy an SSIS project and schedule it to run at a specific time, you can schedule a dataflow, which will execute all its queries.

The Good

There is a lot to like about dataflows. I can think of two primary self-service scenarios that can benefit from dataflows:

  • Data staging – Many organizations implement operational data stores (ODS) and staging databases before the data is processed and loaded into a data warehouse. As a business user, you can use dataflows for a similar purpose. For example, one of our clients is a large insurance company that uses Microsoft Dynamics 365 for customer relationship management. Various data analysts create data models from the same CRM data, but they find that refreshing the CRM data is time consuming. Instead, they can create a dataflow to stage some CRM entities before importing them in Power BI Desktop. Even better, they could import the staged CRM data into a single dataset or an organizational semantic model to avoid multiple data copies and duplicated business logic.
  • Certified datasets – One way to improve data quality and promote better self-service BI is to prepare a set of certified common entities, such as Organization, Product, and Vendor. A data steward can be responsible for designing and managing these entities. Once in place, data analysts can import the certified entities in their data models.

There is a solid architecture, vision, and investment behind dataflows. You create dataflows using a tool you’re already familiar with: Power Query. If you know Power Query, you know dataflows. Microsoft provides the data lake storage, and pricing is included in Power BI Pro/Power BI Premium, so you have everything you need to get started with dataflows today. Ingesting the dataflow output is very fast too, because you import CSV files.

The Bad

My blog on the same subject started with positioning, as I was left with the impression from watching the Microsoft presentations that Microsoft encourages business users to consider dataflows for any data integration task, from data staging to loading data warehouses and even replacing data warehouses and ETL (heard that vibe before?). But Matthew Roche clarified that “dataflows don’t replace data warehouses and ETL any more than Power BI desktop replaces a full set of BI pro tools. Dataflows complement these pro tools, and enable users who are not BI pros to fill in more gaps in a BI solution, similar to other Power BI capabilities. Although dataflows mean that self-service users can do more without help, the same patterns still apply”. So, keep this in mind if you hear different propaganda, and use the right tool for the task at hand.

You need Power BI Premium to realize the full potential of dataflows. For some obscure reason, features like linked and computed entities are not available in Power BI Pro. I wonder how many customers will feel pushed to go Power BI Premium as more and more features become available only there. I don’t mind scalability- and performance-related features being Premium-only, but incremental refresh and linked entities?

Lastly, the CSV output is both a blessing and a curse. True, ingestion is fast, but text files are not a relational database. Business users can’t make direct changes to the staged data. Further, the only option is to import the dataflow output in Power BI, since DirectQuery is not available for text files.

The Ugly

As dataflows stand today (in preview), you (almost) can’t directly access the output generated by dataflows. This precludes the continuum between the self-service and organizational worlds. For example, IT might need to import a certified dataset into a data warehouse, but you can only connect to a dataflow from Power BI Desktop because the CSV files are not directly accessible. True, Power BI Premium lets you bring your own data lake storage (currently a preview feature), but a better option in my opinion would have been to provide access to the raw data in both Power BI Pro and Premium, given that this is your data lake and your data.
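If you do bring your own storage, the dataflow output lands as a CDM folder: a model.json manifest plus headerless CSV partitions. Here is a hedged sketch of reading one; the base URL and SAS token are placeholders.

```python
# Hedged sketch: read a dataflow's CDM folder from your own ADLS Gen2 account
# (bring-your-own-storage preview). The base URL and SAS token are placeholders;
# dataflow CSV partitions have no header row, hence header=None.
import json
import urllib.request
import pandas as pd

BASE = "https://mylake.dfs.core.windows.net/powerbi/Sales/SalesStaging"
SAS = "?<sas-token>"

model = json.load(urllib.request.urlopen(BASE + "/model.json" + SAS))
for entity in model["entities"]:
    for partition in entity.get("partitions", []):
        df = pd.read_csv(partition["location"] + SAS, header=None)
        print(entity["name"], df.shape)
```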

Prologika Newsletter Fall 2018

Semantics relates to discovering the meaning of the message behind the words. In the context of data and BI, semantics represents the user’s perspective of data: how the end user views the data to derive knowledge from it. To address this need, a modeler translates the machine-friendly database structures and terminology into a user-friendly semantic model that describes the business problems to be solved. In my “Why Semantic Layer?” newsletter I explained the advantages of an organizational semantic model. In this newsletter, I’ll discuss how Power BI extends semantic modeling and converges it on a single platform. But before I go into details, and speaking of semantic models, I’m excited to announce the availability of my new “Applied DAX with Power BI” workshop and its first in-person, public enrollment class on Oct 15-16 in Atlanta! Space is limited, so RSVP today.

Semantic Model Flavors

In Microsoft BI, you can implement a semantic model using Power BI Desktop, Excel (Power Pivot), or Analysis Services (Multidimensional and Tabular). The first two are typically used by data analysts, while Analysis Services is considered a professional technology. Thus, we can classify semantic models into two broad categories: personal (self-service) and organizational. Behind the scenes, Power BI Desktop, Power Pivot, and Analysis Services Tabular share the same foundation and storage engine. Nevertheless, up to now there have been feature differences and a strict division between these two types.

                        Personal                                     Organizational
Author                  Data analyst                                 BI pro
Tool                    Power BI Desktop, Excel (Power Pivot)        SSDT and Analysis Services
Scope                   Narrow (usually solves a specific need)      Wide (multiple subject areas)
Implementation effort   Short                                        Longer (data warehouse, ETL, model, testing)
Data capacity           Limited (up to a few million rows)           Larger (millions or billions of rows)
Data quality            Trust the author                             Trust the modeler and testers
Data centralization     May lead to data duplication                 Single version of the truth
Deployment              Power BI Service, Power BI Report Server     SSAS (on premises), Azure Analysis Services (cloud)
Consumers               Department, project                          Potentially the entire organization

How Power BI Changes Semantic Modeling

Power BI will blur the boundary between the personal and organizational aspects of semantic modeling. We’ve already seen Microsoft introduce the following “pro” features in Power BI that don’t even exist in, or are more difficult to implement with, Analysis Services:

  • Incremental refresh – Delivers the ability to refresh portions of a larger dataset, such as the last 7 days. Currently, the largest dataset size supported by Power BI Premium is 10 GB, but Microsoft has already announced that Power BI will soon support larger datasets. What this means for you is that you’ll be able to deploy organizational semantic models to Power BI Premium and schedule them for incremental refresh. My blog “Notes on Power BI Incremental Refresh” provides the details.
  • Composite semantic models – A composite model has heterogenous storage, such as some tables are imported and some are DirectQuery, as I discussed in my blog “Power BI Composite Models: The Good, The Bad, The Ugly“. This brings a lot of flexibility to how you connect to data.
  • Aggregations – Aggregations are predefined data summaries to speed up queries with very large models. My blog “A First Look at Power BI Aggregations” covers Power BI aggregations.

On the tooling side of things, Power BI Desktop will also pick up “pro” features, such as perspectives and display folders. Microsoft hopes that in time Power BI Desktop will serve the needs of both data analysts and BI pros. However, the lack of extensibility and source control, as well as the performance issues caused by committing every model change to the background Analysis Services instance, makes me skeptical that pros will embrace Power BI Desktop. But because Microsoft announced plans to open up the Power BI Tabular management endpoint (recall that published Power BI datasets are hosted in a “hidden” Tabular server), pros can still use SSDT and community tools, such as Tabular Editor, to design and deploy their models to Power BI Premium.

In time, Power BI Premium will become a single cloud platform for hosting Microsoft BI artifacts (semantic models and reports) and facilitating the continuum from personal to organizational BI. This is great news for BI practitioners frustrated by tooling and deployment differences. In the end, the personal and organizational paths will converge without feature discrepancies. The only difference will be the scope of your model and how far you want to take it toward “organizational”.

Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics
