Interactive Excel Pivots in Power BI

A long-standing limitation of Power BI has been that Excel pivot reports connected to external Analysis Services models can’t be interacted with when uploaded to Power BI Service. You get the cached pivot report but any attempt to interact with the report online (that is in Excel Online), such as to change a slicer or filter, would result in an error complaining that the connection can’t be refreshed and there hasn’t been a workaround. Today, Microsoft partially lifted this limitation by supporting interactive pivots connected to Power BI datasets. Unfortunately, pivots connected to external Analysis Services models (both on-prem and cloud) still don’t support interactivity.

For some reason, this feature is called “connected PivotTable refresh” although a better name would be “interactive Excel pivots finally”. It works in Power BI Pro and Premium. Unlike the initial announcement that speculated that this feature would be available by simply uploading your Excel file to Power BI Service, it appears that it requires the Excel file to be uploaded to OneDrive or SharePoint Online. Here are the high-level steps that worked for me:

  1. Create your Excel pivot using the Analyze in Excel feature or directly from Excel. Assuming you have a recent Office 365 build, the latter option allows you to open Excel on the desktop and click Insert->Pivot Table->From Power BI to connect to a Power BI dataset from within Excel Desktop.
  2. Click Save As and save your workbook to OneDrive or SharePoint Online. Recall that you can specify a SharePoint site for storing workspace files in the Power BI workspace settings if you don’t want to upload the file to your personal folder.
  3. Use the Power BI Service “Get data” feature to upload (not import) the Excel workbook in a Power BI workspace. Again, the Excel workbook must be located either in OneDrive or SharePoint online.
  4. Interact with the report in Power BI Service, such by drilling down, drilling through, or even rebinding the report to different fields! As a bonus, when the underlying dataset refreshes, your pivot report will show the latest data as it connects live to the dataset.

The “connected PivotTable refresh” is a much-needed enhancement that brings the same interactive features to Excel pivot reports that users previously had in SharePoint Server. Now users who favor Excel pivots for interactive exploration can share their reports and the interactive features would be preserved when the report is rendered online. The feature requires storing the Excel workbook in OneDrive or SharePoint Online and using an Analyze in Excel connection. Unfortunately, it doesn’t work with Analysis Services models that are external to Power BI (not deployed to Power BI Service) which is another incentive to move your semantic models to Power BI as I’ve been advocating for some time.

Cloud Inhibitors

In a previous blog, I said I consider it a travesty when a vendor’s cloud offering prohibits direct access to the data in its native storage, which is typically a relational database, and therefore a perfect store for data analytics and integration, such as to run real-time analytics or export the data to load a data mart. There is nothing simpler than connecting to a relational database, which is designed to handle efficiently select and filter operations. Unfortunately, it’s becoming a norm where the move to the cloud requires going through all sorts of hoops to access the data. Some enlightened providers offer direct access to the native store by charging a premium fee. However, this is not usually an option and most vendors require you call their REST APIs.

Since, the move to the cloud shouldn’t make things more difficult, I call these vendors “cloud inhibitors”. Such vendors usually cite concerns, such as security and performance, as reasons to prevent access. But the irony is that such issues could be easier addressed by the vendor, such as by hosting the customer on a dedicated VM (think of Power BI Premium), instead of transferring the data integration burden to the customer.

The previous “Cloud Inhibitor” award was held by Microsoft Dynamics Online that has decided to prevent access to the Dynamics SQL database and recommends the customer to consume the staged data from Azure Data Lake CDM folders. One popular ERP vendor has gone even further when embracing AWS as their platform of choice. Not only did they disallow access to their SQL Server database, but they also disallow access to the S3 data lake where they stage the data. Instead, when moving their on-prem ERP solution to the cloud, a client has found that they have two choices for data integration: call the REST APIs or use a JDBC driver, with the latter option out of reach for Microsoft apps.

I’m still researching the best option to enable this client access their data with minimum integration effort although the recommended solution by the vendor was to write a client app to call REST APIs or use Java-based ETL. How ridiculous is this given that that client could simply connect to their ERP database in the past? So much about real-time BI and no-code data integration!

Synapse Serverless: The Good, The Bad, and The Ugly

When you a provision a Synapse workspace, you get a serverless endpoint for free (or almost free). This endpoint represents Synapse Serverless: a query service for ad-hoc exploration of data in CVS, Parquet, and JSON files stored in Azure Data Lake.

The Good

Being able to query files using SQL is great. Some of you might remember the U-SQL language that was introduced a few years ago alongside Azure Data Lake Storage (ADLS) Gen 1 which is now deprecated. It never caught up because it looked like SQL but it wasn’t (it was actually closer to C#). Now we’re talking about real SQL. To query files! This opens the possibility to implement a logical warehouse (the emphasis is on logical as everyone to my knowledge who tried to replace a data warehouse with a data lake has failed). Or, you can connect Power BI to the serverless endpoint and start querying all these files in DirectQuery mode. So, this enables real-time BI on top of file extracts.

Behind the scenes, Synapse Serverless uses a distributed query engine to conquer and process files in parallel. For more information, read the “POLARIS: The Distributed SQL Engine in Azure Synapse” whitepaper. I ran some limited performance testing and I’m impressed. I’m yet to test joins between large tables (I’d love to hear from someone on this subject).

Pricing is based on the data processed but it’s only $5 per TB. So, export your data in *.parquet files which are smaller and faster (TIP: you can use Azure Data Factory to output Parquet files). Furthermore, Serverless will automatically update statistics for Parquet files (CVS files will require you to update statistics manually) which together with proper partitioning could improve performance for joins and filters so that Serverless doesn’t have to read all the files.

The Serverless feature that I like most though is the flexible schema.

Consider the following query, which defines the expected schema using the WITH clause. Will the query break if a schema drift is detected, e.g. some of the files have additional columns or the column names don’t match? Nope. The query still executes successfully, and if not specified in the WITH clause, the schema drift columns will come back empty. This opens the opportunity to do analytics on top of different schemas which is something you can’t do with your trusted SQL Server!

SELECT

     COUNT([Month]), COUNT([Month1])

FROM

    OPENROWSET(

        BULK‘https://dvdls01.dfs.core.windows.net/dvdlsfs01/google/*.csv’,

        FORMAT = ‘CSV’,

        PARSER_VERSION=‘2.0’,

        HEADER_ROW = TRUE

    )

    WITH (

        [Year] int,

        [Month] NVARCHAR(200),

        [Month1] NVARCHAR(200),

        [Day] int,

        [Ecommerce Conversion Rate] NVARCHAR(50),

        [New Users] int

        )

        AS [result]

The Bad

Currently, Serverless is limited to accessing data stored in ADLS only. I wish Microsoft extends Serverless to other vendors, such as Amazon (S3 and Athena) or Google. As a PaaS offering, there is no option to plug in additional drivers. The same limitation applies to the Synapse SQL Pool by the way. Adding the ORC files to the list of supported file formats will be a good addition, as well as the AVRO format which is generated by the Event Hub Capture.

You can find more crowdsourced improvement ideas at the Synapse Forum.

The Ugly

Now that you have a logical data warehouse, a common requirement is to get that data and load it into a physical data warehouse hosted in the Synapse SQL Pool. Unfortunately, the current architecture keeps the two pools separate (think of them as two different SQL servers). Views saved in a Serverless database are not accessible by the SQL Pool. The SQL Pool doesn’t support OPENROWSET either, so the approach outlined here won’t work. You must use Azure Data Factory or another ETL tool to stage the data in the SQL Pool if you follow the ELT pattern. I’d love to see the SQL Pool extended to support OPENROWSET as Azure SQL Managed Instance does to avoid yet another data movement and data staging.

Atlanta MS BI and Power BI Group Meeting on July 5th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, July 5th, at 6:30 PM.  Reid Havens (Consultant and MVP) will show us how to get the most out of Power BI bookmarks and buttons. And I’ll cover the Power BI latest enhancements. For more details and sign up, visit our group page.

Presentation:Bedazzling Your Bookmarks & Buttons
Date:July 5th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:Bookmarks and Interactive Buttons are two of the best ways to add web-like user experiences into Power BI reports. Bookmarks help create a more seamless filter and navigation experience. Buttons can be used with bookmarks, but can also be leveraged for a lot of other report functionality and navigation.
Speaker:Reid Havens is the founder of Havens Consulting Inc. and is a Microsoft Most Valued Professional (MVP). He has an extensive background in technology and organizational management, and has obtained a Master’s Degree in Organizational Development and Business Analytics. He has experience as a consultant working with many Fortune 50, 100, and 500 companies. Additionally, he teaches Business Intelligence, reporting, and data visualization courses at the University of Washington and other universities.
Prototypes without pizza:Power BI Latest

PowerBILogo

Top 5 Reasons to Switch to Power BI Gen2

You should definitely switch your Power BI Premium capacities to Gen2 although you might wait until it goes GA because Gen2 is currently in preview if you’re risk-averse. I switched an enterprise client P2 node to Gen 2 a few months ago and here are the top benefits we observed:

    1. Lower CPU utilization
      The Gen1 P2 node was under CPU pressure so the client was considering upgrading to P3. This went away after switching to Gen2 as the graph below shows. That’s because each operation essentially has access to all the cores on the node (which is essentially a P3 node). This can give you a nice boost to performance as well. This doesn’t mean that you get more cores for free. If Power BI detects that the CPU used by the capacity (across all its datasets/dataflows/etc.) is exceeding the CPU that you have purchased, then subsequent operations would be throttled (delayed).
    2. More memory
      Imported models are memory-resident so memory is usually the most constraining factor. With Gen2, the capacity maximum memory applies to the resource itself and not collectively across all resources in the capacity. Let’s say you are on a P1 plan which has a maximum memory capacity of 25GB. With Gen 1, you won’t be able to have two datasets, let’s say 20GB and 10GB, loaded at the same time. However, Gen2 will apply the 25GB limit to each dataset. So, each resource (dataset, report, dataflow) will be boxed within 25 GB. This feat is possible because Gen2 uses a SaaS approach, which means datasets are scattered across multiple cluster nodes instead of being associated with a dedicated capacity. A potential downside, however, could be “noisy neighbor” because a P3 cluster node may co-host datasets from different customers.
    3. Less “out of memory” refreshes
      Related to item 2, dataset refreshes now have more room. Assuming P1, a 10GB dataset is likely to refresh successfully regardless of other datasets loaded in the same capacity. As a rule of thumb, a dataset will require at least twice the memory to fully refresh, so a full refresh of the 20GB dataset is likely to run out of memory. However, you should be able to utilize less memory if you process specific partitions in large tables or configure them for incremental refresh.
    4. Less management overhead
      All activities running on the cluster are metered so it’s easy to understand when the capacity is overutilized by using the Gen2 utilization app. This removes the need to monitor Gen1 capacities proactively for signs of overutilization. Further, capacity admins can subscribe for notifications.
    5. No additional cost
      The best for last. You get all the above without paying more!

What I like to see improved in future:

    1. Customized capacity limits, such as when more memory but less cores are needed.
    2. More granular auto-scale. The current auto-scaling mechanism keeps the “ad hoc” cores for 24 hours. Ideally, I’d like to see the same auto-scaling mechanism as Azure SQL Database Serverless where the system auto-scales within minutes and can hydrate if database is not in use. Of course, this should apply to also removing provisioned cores, such as a P1 plan downgrading to less than 8 cores.
    3. Better utilization monitoring app. The current app has left vast areas for improvement. For example, it doesn’t currently report the memory utilization at all. The app should report memory utilization per dataset and refresh so that you can answer the question “why my refresh ran out of memory”.

Power BI Aggregations: Limitations and Workarounds

Power BI aggregations are meant to speed up queries to large DirectQuery tables, as a DBA would create summarized tables to speed up queries to large tables. The most appealing aspect of telling Power BI about these aggregations is that Power BI will automatically redirect the query to the aggregation cache if it determines that its dimensionality matches the dimensionality of the aggregated table, as explained in the documentation. However, there are a couple of limitations worth emphasizing that will prevent this from happening:

  1. Power BI requires regular relationships with 1:M cardinality and uni-directional filter between the dimension table and aggregation table. Many-to-many cardinality (aka “limited” relationships in the documentation) won’t work. For example, you might have a Customer table related with 1:M to a CustomerFilter table. Queries involving the CustomerFilter table won’t hit the aggregation cache.
  2. Dynamic relationship won’t hit the cache either. For example, as a workaround for the first limitation, you might attempt creating measures, such as Measure1 = CALCULATE([SomeMeasure], TREATAS(VALUES(CustomerFilter[Selection], Customer[CustomerName])), but this won’t work either.

Aggregation hits require active relationships or mappings based on replicated dimension values.

These limitations were showstoppers for using Power BI aggregations in a recent project. Instead, we rolled up a custom aggregation approach along the following lines:

  1. We introduced an aggregation table just like with Power BI aggs and joined to the related dimensions.
  2. We created two sets of measures:
    1. On top of the aggregation table, e.g. Sales (Agg) = SUM(FactSalesAgg[SalesAmount])
    2. On top of the DQ table, e.g. Sales (DQ) = SUM(FactSales[SalesAmount])
  3. For reporting, when it was clear which report pages, such as dashboards, would hit only the aggregation table, we used the (Agg) measures. Detail pages, such as drill-through pages, use the DQ measures. To simplify the measure choice for end users interesting in their own reporting, you can introduce measure wrappers, such as:
    Sales = IF(ISCROSSFILTERED(FactSales[PrimaryKey or other low granularity column]), [Sales (DQ)], [Sales (Agg)])

Atlanta MS BI and Power BI Group Meeting on June 7th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, June 7th, at 6:30 PM.  Stacey Jones (Principal Data Solutions Architect with Microsoft) will introduce us to Azure Purview. And I’ll cover the Power BI latest enhancements. For more details and sign up, visit our group page.

Presentation:Data Governance with Azure Purview
Date:June 7th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:Have auditors asked for a report detailing sensitive data in your Company? Are you having trouble finding out where all your data resides? Are Data Scientists and analysts wasting time finding the right data? What will be impacted if I change this column in a table? Where is personnel data stored? Are we in compliance with information security standards that apply to us? If you answered yes to any of these or related questions, you need an Enterprise Data Discovery and Cataloging tool. Come find out how Microsoft’s new Azure Purview tool can create this valuable perspective on your data, making it much more useful and valuable!
Speaker:Stacey Jones specializes in mentoring and guiding firms in their efforts to build a modern Data, AI & BI governance programs that empower their business with Self-Service BI and Data Science capabilities. He currently serves as the Principal Data Solutions Architect with Microsoft at the Atlanta Microsoft Technology Center (MTC).
Prototypes without pizza:Power BI Latest by Teo Lachev

PowerBILogo

Tracing Anonymous Logons

In a recent project, I had to execute an SSIS package deployed to the catalog from a stored procedure. The package uses Windows authentication to connect to the data sources. When executing the package from the catalog or from the stored procedure, the package immediately fails and the following two errors are logged:

DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager “Source” failed with error code 0xC0202009. There may be error messages posted before this with more information on why the AcquireConnection method call failed.

OLE DB error has occurred. Error code: 0x80040E4D.
An OLE DB record is available. Source: “Microsoft SQL Server Native Client 11.0” Hresult: 0x80040E4D Description: “Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’.”.

The culprit of course is that this scenario requires Kerberos delegation because of the two-hop authentication required:

  1. From SSMS on my development machine to the server where the SSIS package is executed
  2. From the SSIS server to the data source.

By default, the SSIS server doesn’t have the calling user’s password and the request to the data source goes under the anonymous logon. The easiest way to confirm this is to remote in to the SSIS server and execute the package there by right-clicking the package in the catalog and clicking Execute. If it works then configuring Kerberos delegation is in order. The process is simple but requires some assistance from your helpful system services department and specifically from your AD administrator because it requires a small change to the calling Windows account (yours and other accounts that will be used to run the stored procedure).

Other possible workarounds:

  • Change all data sources to use standard security (SQL login and password)
  • Executing the package under the SQL Agent where you can specify a Windows proxy account. This works because there is a single hop involved in this case.

A First Look at Power BI Goals

A vital BI practice for every organization, performance management ensures that important metrics, such as Key Performance Indicators (KPIs), meet established goals. The typical artifact to do so is implementing a scorecard: a report that compares the current state with the desired state of these metrics. You might have also heard the term “balanced scorecard” which is an organization-wide scorecard that tracks several subject areas, such as Finance, Customer, and Operations. In the past you have probably used different tools, such as the now deprecated PerformancePoint (included in SharePoint Server) to implement scorecards.

Realizing the importance of scorecards, Power BI introduced Goals that aim to simplify the process of implementing departmental and organizational scorecards. For more information on how Goals works, watch the “Goals in Power BI” presentation from the Microsoft Business Application Summit.

The Good

As with anything Power BI, Microsoft has democratized scorecards so business users with no reporting experience can quickly assemble them from existing reports. Think of a goal as a line (or KPIs) in the scorecard. Here is scorecard with two goals:

Currently, Power BI supports two goal types:

  • Static – The goal creator manually enters and track the goal properties, such as current value, target value, and status. This could be useful for quick and dirty KPIs that are not backed by a data source, such as launching a new promotion campaign. In the scorecard above, I created the Revenue goal by entering 5M as the current value and 5.5M as the goal.
  • Data-driven – The goal current value and/or target value can be data-driven and bound to metrics from existing report(s). Coming from Analysis Services, I was initially surprised that Power BI doesn’t require implementing KPI measures, but I get it: Microsoft decided to source the metrics from reports so business users can easily apply filters. If the goal owner chooses a metric from a visual that has a Date field, such as a time series chart, Power BI automatically shows a sparkline for the goal progress over time. An, of course, when the report dataset is refreshed, the goal values are updated.

So, no modeling or Power BI Desktop required assuming that someone else, such as a data analyst, has delivered functional and vetted reports with the metrics. Even better, the goal current and target values can come from different reports (even a report in a different workspace if you have permissions), e.g. a report with actuals and another report with targets. So, there is plenty of flexibility here. To mimic a balanced scorecard that spans multiple subject areas, the owner can create subgoals. For example, the main goal could be Finance with subgoals Revenue, Margin, etc.

Because like dashboards, goals are “pinned” from reports, the end user can navigate to the underlying report to examine the data in more detail. Users can also add notes to explain the goal behavior to the teammates.

A scorecard is a first-class Power BI citizen, and as such, it can be secured, endorsed, secured with sensitivity labels, annotated, and shared, such as sharing the scorecard to a Microsoft Teams channel. The scorecard data is saved in a Power BI dataset that users can connect to build custom reports. Moreover, Power BI automatically adds daily snapshots to the dataset allowing users to build up a history of the goals. For example, if the underlying report is refreshed daily, the updated goal values will be appended to the dataset. Developers can use the Power BI REST APIs to implement programmatic scorecard management solutions.

The Bad

Besides navigating to the underlying report, a goal is a one-liner in the scorecard. I can’t define a goal that shows me a metric sliced by dimension members, such as business unit. Further, subgoals are not currently aggregable, such as to sum or average values when rolling up to the main goal. Like limitations with dashboards, there is no way to apply a global filter to the scorecard, e.g. to filter all goals for the prior month.

Besides current and target values, no other goal properties can be data driven. For example, unlike Analysis Services KPIs, the goal status can’t be currently bound to a DAX measure. Changing the status requires proactive manual “check ins” although Microsoft mentions a forthcoming feature that will let users define rules to change the status, like how you can define rules for dashboard tile alerts. Speaking of data-driven properties, I don’t understand why you must use a date field to get the progress as opposed to any other field, such as Month, in your Date table.

The Ugly

Another premium teaser… If we really want to democratize features, shouldn’t we make them available in Pro?

Goals are a Power BI premium feature aimed at making it easier to create scorecards and monitoring metrics from existing reports. They promote a “bottom-up” culture, where business users can create departmental scorecards without reliance on IT. Microsoft plans more features by the end of the year to make Goals more appealing, such as integration with Power Automate to trigger actions, rolling up subgoals, changing the goal tracking cycle (DoD, MoM, YoY), custom goal formatting, Power BI Mobile experience optimized for phones, providing a scorecard visual, and cascaded goals (hierarchy of goals).

If you find Power BI Goals somewhat inflexible or you don’t have budget to upgrade to Premium, you don’t have to use the Goals feature to implement scorecards. You can define KPIs and create dashboard-looking reports where you have complete control over the dataset, filters, and scorecard presentation although this approach would require more advanced Power BI skills.

Atlanta MS BI and Power BI Group Meeting on May 3rd

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, May 3rd, at 6:30 PM.  James Serra will overview Azure Synapse for implementing a data lakehouse. And I’ll cover the Power BI latest enhancements. For more details and sign up, visit our group page.

Presentation:Azure Synapse Analytics Overview: A Data Lakehouse
Date:May 3rd
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. In this presentation, James will talk about the new products and features that make up Azure Synapse Analytics and how it fits in a modern data warehouse, as well as provide demonstrations.
Speaker:James Serra is a Data Platform Architecture Lead at EY.  He is a thought leader in the use and application of Big Data and advanced analytics. Previously, James was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 35 years of IT experience. James is a popular blogger (JamesSerra.com) and speaker. He is the author of the book “Reporting with Microsoft SQL Server 2012”.
Prototypes without pizza:Power BI Latest

PowerBILogo