Prologika Newsletter Fall 2021

LabCorp operates one of the largest clinical laboratory networks in the world. It also has an Interactive Response Technology system that healthcare vendors can use to conduct case studies.  Thanks to the cloud data analytics solution implemented by Prologika, LabCorp and its vendors can now analyze data across case studies. Read this newsletter to learn more about the solution architecture and business value.

 

Business Needs

The data for each study was saved into a separate on-prem SQL database. The total number of databases was more than 1,000. After the initial assessment, Prologika realized that one of the main gaps was that vendors couldn’t report across their studies or gain performance insights from studies conducted by other vendors. Further, as the IRT system evolves over time and to accommodate special requests, there were scheme differences between different versions.

LabCorp underscored the importance of consolidating the data from multiple studies into a single repository. They envisioned a cloud-based PaaS BI solution that would extract data from all the on-prem databases without impacting the system performance and centralize it into an enterprise data warehouse. Vendors would log an external portal that will deliver embedded reports. The first iteration was focused on analyzing audit and log data to gain strategic insights, such as how many users are using the system.

Solution

After assessing the current state and objectives, Prologika recommended and implemented the following architecture:

Prologika implemented an Azure Data Factory (ADF) configurable framework to extract data from the on-prem databases hosted on two production SQL Servers in parallel. The framework would stage the data into partitioned parquet files in Azure Data Lake Storage (ADLS).

Then Prologika created views in Synapse Serverless to consolidate the file extracts. For example, an Audit table could exist in several on-prem databases on both production servers. While each table would be staged in a separate file (or multiple files if the table supports incremental extraction), a Synapse Serverless would present a consolidated view across all files. We were impressed by the capabilities and performance of Synapse Serverless.

Another ADF process would extract the data from the Synapse Serverless views and load the data into a data warehouse hosted in a Synapse SQL pool. Although we considered other methods, such as using a linked server in a Azure SQL Managed Instance to Synapse Serverless to avoid data staging, we settled on Synapse SQL mainly for its scalability. Finally, a Power BI semantic model was created and Power BI Embedded used to deliver reports.

Benefits

The solution delivered the following benefits to LabCorp:

  • Data consolidation – Data was extracted with minimum impact to the operational systems and consolidated in ADLS.
  • Data virtualization – Thanks to the Synapse Serverless flexible support of schema differences, virtual views were created.
  • Scalability – Synapse SQL pool can scale almost indefinitely.
  • Secure and fast insights – Each vendor can access only their data.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo

Prologika Newsletter Summer 2021

Business Performance Management (BPM) is a methodology to help the company predict its performance. An integral part of a BPM strategy is creating and monitoring a scorecard with Key Performance Indicators (KPIs). In this newsletter, I’ll discuss how the newly released Power BI Goals can help you augmenet your BPM strategy. But before that, I’d like to share my excitement that Microsoft have recently awarded me FastTrack Recognized Solution Architect – Power BI! I’m one of the 33 individuals worldwide who got nominated by the Power Platform product engineering team for consistently exhibiting deep architecture expertise and creating high quality solutions for customers during project engagements.

Introducing Power BI Goals

A vital BI practice for every organization, performance management ensures that important metrics, such as Key Performance Indicators (KPIs), meet established goals. The typical artifact to do so is implementing a scorecard: a report that compares the current state with the desired state of these metrics. You might have also heard the term “balanced scorecard” which is an organization-wide scorecard that tracks several subject areas, such as Finance, Customer, and Operations. In the past, organizations would use different tools, such as the now deprecated PerformancePoint (included in SharePoint Server) to implement balanced scorecards. Realizing the importance of scorecards, Power BI introduced Goals that aim to simplify the process of implementing departmental and organizational scorecards. For more information on how Goals works, watch the “Goals in Power BI” presentation from the Microsoft Business Application Summit

What’s to like
As with anything Power BI, Microsoft has democratized scorecards so business users with no reporting experience can quickly assemble them from existing reports. Think of a goal as a line (or KPIs) in the scorecard. Here is scorecard with two goals:

Currently, Power BI supports two goal types:

  • Static – The goal creator manually enters and track the goal properties, such as current value, target value, and status. This could be useful for quick and dirty KPIs that are not backed by a data source, such as launching a new promotion campaign. In the scorecard above, I created the Revenue goal by entering 5M as the current value and 5.5M as the goal.
  • Data-driven – The goal current value and/or target value can be data-driven and bound to metrics from existing report(s). Coming from Analysis Services, I was initially surprised that Power BI doesn’t require implementing KPI measures, but I get it: Microsoft decided to source the metrics from reports so business users can easily apply filters. If the goal owner chooses a metric from a visual that has a Date field, such as a time series chart, Power BI automatically shows a sparkline for the goal progress over time. And, of course, when the report dataset is refreshed, the goal values are updated.

So, no modeling or Power BI Desktop required assuming that someone else, such as a data analyst, has delivered functional reports with the metrics. Even better, the goal current and target values can come from different reports (even a report in a different workspace if you have permissions), e.g. a report with actuals and another report with targets. So, there is plenty of flexibility here. To mimic a balanced scorecard that spans multiple subject areas, the owner can create subgoals. For example, the main goal could be Finance with subgoals Revenue, Margin, etc. Because like dashboards, goals are “pinned” from reports, the end user can navigate to the underlying report to examine the data in more detail. Users can also add notes to explain the goal behavior to the teammates.

A scorecard is a first-class Power BI citizen, and as such, it can be secured, endorsed, secured with sensitivity labels, annotated, and shared, such as sharing the scorecard to a Microsoft Teams channel. The scorecard data is saved in a Power BI dataset that users can connect to build custom reports. Moreover, Power BI automatically adds daily snapshots to the dataset allowing users to build up a history of the goals. For example, if the underlying report is refreshed daily, the updated goal values will be appended to the dataset. Developers can use the Power BI REST APIs to implement programmatic scorecard management solutions.

What’s not to like

Besides navigating to the underlying report, a goal is a one-liner in the scorecard. I can’t define a goal that shows me a metric sliced by dimension members, such as business unit. Further, subgoals are not currently aggregable, such as to sum or average values when rolling up to the main goal. Like limitations with dashboards, there is no way to apply a global filter to the scorecard, e.g. to filter all goals for the prior month.

Besides current and target values, no other goal properties can be data driven. For example, unlike Analysis Services KPIs, the goal status can’t be currently bound to a DAX measure. Changing the status requires proactive manual “check ins” although Microsoft mentions a forthcoming feature that will let users define rules to change the status, like how you can define rules for dashboard tile alerts. Speaking of data-driven properties, I don’t understand why you must use a date field to get the progress as opposed to any other field, such as Month, in your Date table.

Finally and unfortunately, Goals require Power BI Premium. If we really want to democratize features, shouldn’t we make them available in Pro?

Conclusion

Goals are a Power BI Premium feature aimed at making it easier to create scorecards and monitoring metrics from existing reports. They promote a “bottom-up” culture, where business users can create departmental scorecards without reliance on IT. Microsoft plans more features by the end of the year to make Goals more appealing, such as integration with Power Automate to trigger actions, rolling up subgoals, changing the goal tracking cycle (DoD, MoM, YoY), custom goal formatting, Power BI Mobile experience optimized for phones, providing a scorecard visual, and cascaded goals (hierarchy of goals).

If you find Power BI Goals somewhat inflexible or you don’t have budget to upgrade to Premium, you don’t have to use the Goals feature to implement scorecards. You can define KPIs and create dashboard-looking reports where you have complete control over the scorecard presentation.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo

Prologika Newsletter Spring 2021


Amidst the COVID pandemic, the Houston Health Department (HHD) had another predicament to tackle. With lab results accumulating rapidly at one million cases per month, the vendor system they used for capturing and analyzing COVID data couldn’t keep up. In this newsletter, you’ll learn how Prologika implemented a BI solution powered by SQL Server and Power BI to solve technology challenges, and deliver fast and reliable insights.

Business Challenges

The vendor SQL Server database had large tables with normalized name-value pairs for each question and answer received from the patient, and for each investigation result. To facilitate reporting, the vendor system had scheduled ETL processes to extract data from the OLTP tables into denormalized tables. However, locking conflicts and large data volumes would crash the system and fail the ETL processes.

As a result, business analysts were not able to get timely data for reporting. HHD and the vendor tried to isolate the reporting loads by replicating the data to a reporting database but the issue with populating the denormalized tables remained

Solution

A good solution starts with a solid foundation. After assessing the current state and objectives, Prologika recommended and implemented the following architecture:

The stand-alone SQL database was replaced with an AlwaysOn availability group. Not only did this provide high availability, but it also isolated operational from reporting workloads.

In the client’s own words “we have compared the cluster server to the report server and cluster is vastly superior with regard to performance for regular queries. One simple run was 4x faster on cluster than the current report server. A much more complex run took four minutes on cluster and I stopped the same run on the report server after 87 minutes.”

Previously, data analysts would produce reports using different tools, ranging from SQL, Python, to Power BI reports. With scarce resources, HHD found it difficult to continue that path. Instead, Prologika implemented a semantic model that was hosted in Power BI.

Benefits

The new architecture and semantic model delivered the following benefits to HHD:

  • A single version of truth – Strategic data and business calculations were centralized on one place.
  • Fast insights – The Power BI reports crunching millions of rows were instantaneous.
  • Isolation – Report loads were isolated from the operational loads because the data was cached in the semantic model.
  • Standardization and centralization – Power BI became the reporting platform for delivering insights across the organization.

 


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo

Prologika Newsletter Winter 2020

I hope you’re enjoying the holidays. In this newsletter, I’ll discuss a very important enhancement to Power BI that lets business users extend semantic models. But before I get to it, a quick announcement. I’m putting the finishing touches of the sixth edition of my “Applied Microsoft Power BI”! It should be available on Amazon in the first days of 2021. I’ve been updating this book thoroughly every year since 2015 to keep it up to the date with this fast-changing technology.

I’ve written extensively on the important role that EDW and organizational semantic models have for delivering the “Discipline at the core and flexibility at the edge” tenant for effective data analytics. Analysis Services Tabular is available in three SKUs: Power BI, Azure Analysis Services, and SSAS, and it’s the workhorse of Power BI Service. When you publish a Power BI desktop file, it becomes a database hosted in some Analysis Services Tabular server managed by Microsoft.

How live connections work

As the diagram below shows, Power BI uses a special live connectivity option when you connect live to Analysis Services in all its flavors (Multidimensional, Tabular, and Power BI published datasets) and SAP (SAP Hana and SAP Data Warehouse). In this case, the xVelocity engine isn’t used at all and the model is absent. Instead, Power BI connects directly to the data source and sends native queries. For example, Power BI generates DAX queries when connected to Analysis Services .

There is no Power Query in between Power BI Desktop and the data source, and data transformations and relationships are not available. In other words, Power BI becomes a presentation layer that is connected directly to the source, and the Fields pane shows the metadata from the model. This is conceptually very similar to connecting Excel to Analysis Services.

Unfortunately, once you connected Power BI Desktop to a multidimensional data source, that remote model was the only data source available for you.

Understanding the change

Power BI Desktop (December 2020 release) removes this long-standing limitation for live connections to Tabular. In the special case of connecting to a dataset published to Power BI Service and Azure Analysis Services (on-prem SSAS is not supported), you can switch from live connectivity to DirectQuery and add external data to build a composite model. This feature is very important because it allows business users to extend semantic models that could be sanctioned by someone else in the organization!

If the first connection you make is to the remote model then the connection will use Live Connect. The Power BI Desktop file will not store any metadata or data, expect for the connection string. The moment you use “Get Data” to connect to another source and accept the prompt, Power BI Desktop replaces permanently the live connection with a local DirectQuery layer and imports the metadata of the remote model. Even if you remove all external tables, you won’t be able to “undo” the change and switch back the file to Live Connect. In the diagram below, FactResellerSales, DimDate, and Employees tables are hosted in the remote model while FactSalesQuota is an external table that is imported (could be in DirectQuery mode).

What happens behind the scenes

In a nutshell, DirectQuery to Analysis Services Tabular is like other DirectQuery sources where DAX queries generated by Power BI are translated to native queries. However, in this case Power BI either sends the DAX queries directly to the remote model when possible or breaks them down into lower-level DAX queries. In the latter case, the DAX queries are executed on the remote model and then the results are combined in Power BI to return the result for the original DAX query. So, depending on the size of the tables involved in the join, this intermediate layer may negatively impact performance of visuals that mix fields from different data sources.

Applying your knowledge about composite models, you might attempt to configure the dimensions in dual storage, but you’ll find that this is not supported. Behind the scenes, Power BI handles the join automatically, so you do not need to set the storage mode to Dual. It’s interpreted as Dual internally. You can make metadata changes on top of the remote model. For example, you can format fields, create custom groups, implement your own measures, and even calculated columns (calculated columns are now evaluated at runtime and not materialized). The changes you make never affect the remote model. They are saved locally in the DirectQuery model.

Currently, row-level security (RLS) doesn’t propagate from the remote model to the other tables. For example, the remote model might allow salespersons to see only their sales data by applying RLS to the Employees table. However, the user will be allowed to see all the data in the FactSalesQuota table because it’s external to the remote model and RLS doesn’t affect it.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo

Prologika Newsletter Fall 2020

Cloud deployments are the norm nowadays for new software projects, including BI. And Azure Synapse Analytics shows a great potential for modern cloud-based data analytics platform. Here are some high-level pros and cons to keep in mind for implementing Azure Synapse-centered solutions that I harvested from my real-life projects and workshops.

 

What’s to Like

There is plenty to like in Azure Synapse which is the evaluation of Azure SQL DW. If you’re tasked to implement a cloud-based data warehouse, you might be evaluating three Azure SQL Server-based PaaS offerings: Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse. However, Azure SQL Database and Azure SQL MI are optimized for OLTP workloads. For example, they have full logging enabled and replicate each transaction across replicas. Full logging is usually a no-no for decent size DW workloads because of the massive ETL changes involved.

In addition, to achieve good performance, you’ll find yourself moving up the performance tiers and toward the price point of the lower Azure Synapse SKUs. Further, unlike Azure SQL Database, Azure Synapse can be paused, such as when reports hit a semantic layer instead of DW, and this may offer additional cost cutting options.

To get started with Synapse, you need to create a workspace. A workspace can host two analytics runtimes, SQL and Spark. These runtimes are hosted in pools. For example, you might have one pool Pool for your enterprise data warehouse, and other SQL pool for a departmental data mart, all provisioned within the same workspace.

A SQL Pool (previously known as Azure SQL DW) hosts your SQL relational database. As an MPP system, it can scale to petabytes of data with proper sizing and good design. The obvious benefit is that for the most part (see the Ugly section discussing exclusions) you can carry your SQL Server skills to Azure Synapse.

As a general rule of thumb, Azure Synapse should be at the forefront of your modern cloud DW with data loads starting with 500 GB.

I like the integration with Azure Data Lake Storage (ADLS). Previously, you had to set up external tables to access files in ADLS via PolyBase. This is might redundant now thanks to the serverless (on-demand) endpoint which also replaces the Gen1 U-SQL with T-SQL. All you have to do is link your ADLS storage accounts and use the T-SQL OPENROWSET construct to query your ADLS files and exposed them as SQL datasets. Not only does this opens new opportunities for ad hoc reporting directly on top of the data lake, but it also can help you save tons of ETL effort to avoid staging the ADLS data while loading your DW.

As the diagram shows, you can also provision an Azure Spark runtime under a Synapse workspace. This opens the possibility for using Azure Spark/Databricks for data transformation and data science. For example, a data scientist can create notebooks in Synapse Studio that use a programming language of their choice (SQL, Python, .NET, Java, Scala, and R are currently supported).

What Needs Improvement

You have two coding options to implement artifacts for Azure Synapse, such as ETL and reports:

  1. You can use the respective development environment, such as Azure Data Factory Studio for ADF development and Power BI Desktop for Power BI report development.
  2. You can use Synapse Studio which attempts to centralize coding in one place.

Besides the ADLS integration and notebooks, I find no advantages in the Synapse Studio “unified experience”, which is currently in preview. In fact, I found it confusing and limiting. The idea here is to offer an online development environment for coding and accessing in one place all artifacts surrounding your DW project. Currently, these are limited to T-SQL scripts, notebooks, Azure Data Factory pipelines, and Power BI datasets and reports. However, I see no good reason to abandon the ADF Studio and do ETL coding in Synapse Studio. In fact, I can only see reasons against doing so. For example, Synapse Studio lags in ADF features, such as source code integration and wrangling workflows. For some obscure reason, Synapse Studio separates data flows from ADF pipelines in its own Data Flows section in the Development Hub. This is like putting the SSIS data flow outside the package. Not to mention that inline data flows should be deemphasized since ELT (not ETL) is the preferred pattern for loading Azure Synapse. To make things worse, once you decide to embrace Synapse Studio for developing ADF artifacts, you can use only Synapse Studio because the ADF pipelines are not available outside Synapse.

On the Power BI side of things, you can create a Power BI dataset in Synapse Studio. All this saves is the effort to go through two Get Data steps in Power BI Desktop to connect to your Azure Synapse endpoint. You can also create Power BI reports using Power BI Embedded in Azure Synapse. Why would I do this outside Power BI Desktop is beyond me. Perhaps, I’m missing the big picture and more features might be coming as Synapse evolves. Speaking of evolution, as of this time, only SQL pools are generally available. Everything else is in preview.

What’s Not to Like

Developers moving from on-prem BI implementations to Azure Synapse, would naturally expect all SQL Server features to be supported, just like moving from on-prem SSAS to Azure AS. However, they will quickly find that Azure Synapse is “one of a kind” SQL Server. The nice user interface in SSMS for carrying out admin tasks, such as creating logins and assigning logins to roles, doesn’t work with Azure Synapse (all changes require T-SQL). SQL Profiler doesn’t work (tip: use the Profiler extension in Azure Data Studio for tracing queries in real time).

Because Azure Synapse uses Azure SQL Database, it’s limited to one physical database, so you must rely on using schemas to separate your staging and DW tables. There are also T-SQL limitations, such as my favorite MERGE for ELT-based UPSERTs is not yet supported (currently in preview for hash and round-robin distributed tables but not for replicated tables, such as dimension tables). So, don’t assume that you can seamlessly migrate your on-prem DW databases to Synapse. Also, don’t assume that your SQL Server knowledge will suffice. Prepare to seriously study Azure Synapse!


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo

Prologika Newsletter Summer 2020

More and more organizations are switching to Microsoft Teams mostly for online meetings. However, Microsoft Teams can deliver much more than that and it will be unjust to compare it with other popular meetings apps. In this newsletter you’ll learn how Prologika helped a public school district gain important insights from Microsoft Teams. Learn more and try an interactive Power BI report in the related case study.

Teams for Education

What Microsoft has done with Teams is amazing! Now that I got to study and program Microsoft Teams, I have found it a powerful and extensible platform. For example, Microsoft has provided add-ons for common verticals, such as Teams for Education. Microsoft Teams can be further enriched with apps and can be used to centralize all these Excel spreadsheets that are floating around. And the best of it: most of the Microsoft Teams data is exposed via a single data source: Microsoft Graph API, with the caveat that it is not that easy to get it out.

During the COVID-19 crisis, Fulton County Schools (FCS) transitioned from traditional classroom content delivery to online digital teaching and learning. FCS adopted Microsoft Teams for Education as an online platform to foster collaboration between teachers and students. However, FCS struggled for a long time trying to obtain the necessary data from Microsoft Teams to analyze how effectively teachers and students utilize Microsoft Teams in these challenging times. The Microsoft-provided Office 365 Usage Analytics for Power BI didn’t provide the depth of information FCS needed, such as to analyze counts of assignments that teachers give to students and submissions that students prepare for an assignment. FCS also wanted to understand about how often teachers and students meet and for how long.

Solution

Prologika developed a custom app to extract the required data by calling the Microsoft Graph APIs. The app supported a configurable degree of parallelism to process data in parallel. This reduced the duration for the full load to less than 7 hours. In addition, incremental daily ETL decreased the extraction time even further. The data was saved in a relational database consisting of several entities, such as Schools, Classes, Assignments, Submissions, and Events. Data availability led to data analytics. Power BI reports were implemented to deliver important insights, such as the report below.

Our BI solutions help clients implement a digital feedback loop and act upon the data. The process starts with acquiring the necessary data and designing a comprehensive data model that can answer important business questions. Then, the client applies the insights to improve their processes. This leads to a powerful feedback mechanism where outputs are routed back as inputs to business.

Prologika Newsletter Spring 2020

Do you know that according to Gartner, at least five of the top 10 technology trends for 2020 will involve predictive analytics? And the third on the list is “democratization” to deliver it to non-specialists. With the growing demand for predictive analytics, Automated Machine Learning (AutoML) aims to simplify and democratize predictive analytics so business users can create their own predictive models. The promise of AutoML is to bring predictive analytics to business users, just like Power BI democratizes data analytics, Power Apps democratizes app dev, and Power Query democratizes data shaping and transformation.

 

Comparing Options

As a business user, the two most popular options for applying Automated Machine Learning for predictive analytics are Power BI and AzureML. Behind the scenes, Power BI AutoML uses the automated machine learning feature of AzureML but there are differences and I summarize below the most important ones.

  Power BI AutoML Azure AutoML
Licensing Power BI Premium Azure ML (Enterprise Edition recommended)
Container Dataflow Experiment
Power Query Available Not available
Supported data sources Many A few (local files, Azure SQL DB, ADLS, and a few more)
Model Not Accessible (Power BI handles everything) Accessible
Web service endpoint Not available outside Power BI Available for app integration
Scoring Apply the model to entity Various options (Notebooks, SDK, custom integration)

To me, the best solution would have been the combination of both technologies. I like Power Query for sourcing, shaping and transforming the data, but I also like the flexibility that AzureML brings. Unfortunately, you can’t mix and match. It appears that AzureML has decided to roll out their own data connectivity mechanism and as a result, it supports a limited number of data sources (for example, on-prem data sources are not accessible). I expect this to change as the product evolves.

Azure ML Studio

I’ve done recently some work with the new version of Azure ML Studio (https://ml.azure.com/), and I’m impressed. Microsoft has learned important lessons from the previous AzureML (now called “classic”) and greatly enhanced the product. If you’re looking for a SaaS ML toolset that targets both business users and data scientists, AzureML should be on the top of your list. Speaking of its AutoML feature, the main advantages that it brings for predictive analytics are:

  • Determining the model type – classification, regression, and time series forecasting (the last one is not available yet in Power BI)
  • Automatic featurization
  • Selecting the best algorithm – For example, the screenshot below shows how AzureML has tested various algorithms and determined that VotingEnsemble performs the best.

Even if you’re a data scientist, the best algorithm selection feature alone justifies giving AutoML a try – if not for anything else but to select the best algorithm so that you don’t have to spend enormous time testing different algorithms.

Prologika Newsletter Winter 2019

Happy Holidays! I hope you’re enjoying this special time of the year. A few months ago, I did an assessment for a large company that was advised by an undisclosed source that they should use their Dynamics Financials and Operations (F&O) system as a data warehouse. Recently, I came across a similar wish but this time to use SAP as EDW. Can we do this? But before I give you my opinion, I’m excited to announce the availability of the fifth edition of my “Applied Microsoft Power BI” book – the only Power BI book that it’s updated every year to keep it up with the ever-changing world of Power BI and the Microsoft Data Platform! The book is making slowly its way to the retailers and it should be available on Amazon in the first days of 2020.

Operational Reporting

I understand that everyone wants to do more with less and shortcuts are tempting. But ERP systems are systems of record, just like any other data source. True, they could own most of the core data that you need for analytics. But that data is normalized and stored in a format that’s not conducive for analytics. To make things worse, Dynamics doesn’t even give you direct access to its SQL Server database on your production instance. You must go through REST APIs or export data to gain access to it. And to add new tables, you must create entities in Visual Studio! Still want to build a data warehouse in Dynamics?

ERP systems typically have some reporting features, but these features typically deliver only operational reporting. Operational reporting has a narrow view concerned with “now”, such as a report that shows customers with outstanding balances as of today. For example, Dynamics comes with standard SSRS reports. You could also enable analytical workspaces that deliver reports via Power BI Embedded. These reports, however, are operational reports. By contrast, BI is mostly concerned with historical and trend analysis.

BI Axioms

In math, axioms are statements that are assumed to be correct without a proof. We need BI axioms and the list can start like this:

  • Every mid to large company shall have a centralized data repository for consolidating trusted data that is optimized for reporting. The necessity for such a repository is in a direct correlation with the number of the data sources that must be integrated (that number will increase over time) and the complexity of the data transformation. The centralized data repository is commonly referred to as a data warehouse.
  • Dimensional modeling shall be the methodology to design the data warehouse schema. I know it’s tempting to declare your ODS as a data warehouse, but highly normalized schemas are not suitable for reporting.
  • If you’re after a single version of the truth, you shall have an organizational semantic layer. Find why you need it in my “WHY SEMANTIC LAYER?” newsletter.
  • ERP systems are not a replacement for a data warehouse. Neither are data lakes and Big Data.
  • You shall have both organizational and self-service BI, and they should complement each other. If you lean too much toward organization BI, you’ll get a backlog of requirements. If you lean too much toward self-service BI, you’ll end up with fragmented “spreadmarts”, which is where you probably started.
  • Most of the BI effort shall go toward organizational BI to integrate data, improve data quality, and centralize business calculations. Tools come and go but this effort shall endure.
  • Agile and managed self-service BI shall fill in the gaps. It should provide a feedback loop to extend organizational BI with data that the entire organization can benefit from.

 


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Gold Partner | Data Analytics

logo

Prologika Newsletter Fall 2019

090918_1951_DAXVariable1.pngWe all need to share. But until a couple of months ago, a Power BI training or assessment wouldn’t be complete unless I got hammered on the Power BI sharing limitations. Fortunately, Microsoft has addressed most of these and I have now a much better story to tell. And this is the subject of this newsletter.

Quo vadis, Power BI Sharing?

Power BI sharing road has been a long and winding one. I covered the gist in my blogs “Power BI Sharing is Getting Better“, “Power BI Sharing is Getting Better 2“, and “The Power BI Viewer Role“. The following table summarizes prior and standing sharing limitations. The Solution column lists the “fix” by Microsoft.

LimitationSolution
Workspace dependency on Office 365 groupsV2 workspaces don’t depend on O365 groups
No IT oversight on creating workspacesYou can now control who can create workspaces in the Power BI Admin center
Group membership limitationsV2 workspaces support all O365 group types
Coarse content access levelContributor and Viewer roles
No cross workspace sharingA dataset can be shared across workspaces
No data governanceA dataset can be promoted and certified
Power BI Premium sharing with viewers require report sharing or appsThe Viewer role supports sharing with viewers
Only one app supported per workspaceThe Viewer role deemphasizes apps
No nesting support (subfolders)

Sharing Best Practices

Given the current state of Power BI, I’d like to share some best practices for organizing and sharing content:

  1. Don’t use report and dashboard sharing as it can quickly turn into a maintenance nightmare.
  2. Create workspaces to reflect your organizational functional areas, e.g. Sales, HR, Finance. Unfortunately, workspaces still don’t support subfolders (Microsoft hinted that they are working on such a feature), so for now you must resort to a flattened list.
  3. Instead of individual user assignments, add users to security groups and then add these groups as members to the workspace. This way, when the user leaves and company or moves to another department, you only need to change the user’s group membership without making changes to the workspace security policy.
  4. Come up with data governance policy. The Power BI data governance story is work in progress, but you can set up some ground rules. For example, once a contributor believes that a dataset is ready for a broader consumption, he can promote the dataset. Then, IT can verify the dataset and certify it. Currently, Power BI supports certifying only datasets (reports and dashboards aren’t certifiable yet).
  5. This one is hard. Teach data analysts best data modeling practices. Instead of creating a dataset per report, they should create a data model that correctly represents their subject area. A data analyst should create the model once so it can support multiple reports. Once the dataset is published and certified, users can create their own reports.
  6. Although Microsoft is pushing apps very hard and adding some nice features, such as navigation, a standing limitation is that you can create only one app per workspace. So, you can’t publish multiple apps from a workspace, such as to share some reports with one group of users and another set with a different group. Instead, grant access directly to workspaces by using the Contributor and Viewer roles. The Viewer role lifts a significant limitation that forced you to use report/dashboard sharing or apps to share content out of a premium workspace with viewers. If you want to share the entire content of the workspace, you don’t have to use apps. Instead, you can simply add users or groups as viewers to the workspace.

Despite some long standing limitations, Power BI sharing is coming out of age. Follow the above practices and you’ll have now a much better way to organize and share content.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo

Prologika Newsletter Summer 2019

090918_1951_DAXVariable1.pngAfter learning how to model the data properly (the most important skill), DAX would be the next hurdle in your self-service or organizational BI journey. You won’t get far in Microsoft BI without DAX. This letter shares a tip on how to identify DAX performance bottlenecks by using a recent feature in Power BI called Performance Analyzer.

Learning DAX

DAXBefore I get to Performance Analyzer, I’m excited to announce my latest book: Applied DAX with Power BI: From zero to hero with 15-minute lessons. This book was born out of necessity. In my consulting practice, I had been teaching and implementing Power BI and Analysis Services Tabular, and people were constantly asking for DAX book recommendations. Indeed, DAX is not an easy topic and has its ways to humble even experienced practitioners. There are a few good reference books out there, but they could be somewhat overwhelming for novice users. So, I turned my classroom and consulting experience into this book and designed it as a self-paced guide to help you learn DAX one lesson at a time.

This one was a different book to write. Unlike my other books, which tend to be heavy, this book is smaller (220 pages). Regardless, it took me a while to write it. The book also adopts a lesson-based approach, so the reader can learn and practice a specific DAX concept quickly. There are usually three sections in each lesson: Practice (with step-by-step guidance), Outcome (a visual that shows the result), and Analysis (explaining why DAX works this way). Most lessons are five to six pages long, and it should take no more than 15 minutes to complete the lesson’s exercises (you can download and view a sample lesson from the book page).

And if you do one lesson per day, you’ll be a DAX expert in a month!

Analyzing DAX Queries

“This report is slow!” I hope you never hear this, but the chances are that you will sooner or later. No one likes watching a spinning progress indicator and waiting for the report to show up. Tucked in the View ribbon of the Power BI Desktop recent builds is a handy tool called Performance Analyzer that can help you identify slow queries. It’s easy to use:

  1. In Power BI Desktop, click the View ribbon and check the Performance Analyzer setting. This will open the Performance Analyzer pane.
  2. Click Start Recording in the Performance Analyzer pane. Once you start recording, any action that requires refreshing a visual, such as filtering or cross-highlighting, will populate the Performance Analyzer pane. You’ll see the statistics of each visual logged in the load order with its corresponding load duration.
  3. You can click the “Refresh visuals” link in Performance Analyzer to refresh all visuals on the page and capture all queries. However, once you are in a recording mode, every visual adds a new icon to help you refresh only that visual.

PerformanceAnalyzer

Performance Analyzer captures the following information:

  • DAX query – The length of time to execute the query.
  • Visual display – How long it took for the visual to render on the screen after the query is executed.
  • Other – This is the time that the visual spent in other tasks, such as preparing queries, waiting for other visuals to complete, or doing some other background processing.

Although not as comprehensive as DAX Studio, Performance Analyzer is a quick and easy way to get an overall idea of what impacts your report page performance. Performance Analyzer is currently only available in Power BI Desktop and it’s not available in Power BI Service.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo