Batch Renaming Columns in Power BI Tables

Scenario: You need to rename many columns in a Power BI table. Usually, I rename measure-related columns with “Base” suffix and hide them, and then I create explicit measures. Since renaming one column at the time is no fun, you’re looking for an automated solution. Having heard about the Tabular Editor scripting feature, you’re tempted to whip out a script like this:

You select all columns that must be renamed, run the script, and it appears to work (make sure to enable the Power BI experimental feature in the Tabular Editor Preferences before you run the script since otherwise nothing will happen)! You save the changes, the columns show up with the new names in Power BI Desktop, and you’re about to call it a successful day. But then when you refresh the table, you get greeted with an error that the column already exists and can’t be added, and the refresh operation fails.

Solution:  In Power BI Desktop, tables always have a Power Query behind them. The rename operation requires a step in that query for every column that’s renamed.

Therefore, instead of using a Tabular Editor script to rename table columns, add a Renamed Column step to the table query, such as the code below. Unlike columns, you can use a Tabular Editor script to batch-rename measures because they are not physical columns and don’t require a RenameColumns step in Power Query.

= Table.RenameColumns(dbo_vFactAssets,{ {"Acquisition_Price","Acquisition_Price_Base"}
,{"Actual_FCL_Legal_Spend_to_Date","Actual_FCL_Legal_Spend_to_Date_Base"}
,…)

Tip: You can use replace and vertical selection features in Microsoft Word to “automate” the code in that step.

Refreshing Power BI Datasets from SSIS

Scenario: You use SSIS to load data for on-prem BI solution. As a last step of the ETL pipeline, you want to refresh a Power BI dataset. There’s quite a bit of misinformation on the Internet about how to do this, hence this blog.

Solution: If the dataset is hosted in Power BI Pro workspace, the only way is to use the Power BI REST APIs and there are some good examples out there using PowerShell or Microsoft Automate flows. However, if the dataset is hosted in a Premium-per-user (PPU) or Premium workspace, you can use the SSIS built-in Analysis Services Processing Task. And, no, you don’t need third-party components.

An unattended process can authenticate against Power BI using either a Power BI-licensed account or service principal. I couldn’t get service principals to work with Power BI datasets. More than likely, this is because the service principal needs to be added as an Analysis Services administrator, but we can’t do with Power BI (we can with Azure Analysis Services though). And so, the only option is to use a Power BI regular account (email and password). However, your organization probably uses multi-factor authentication (MFA) to secure access to cloud services. Because the SSIS process will run unattended, no one will be on a lookout to plug in the authentication code. Therefore, you must harass your helpful system administrator to provide you with a user account that is not enabled for MFA.

And so, the steps are:

  1. Provision a dedicated account in Azure Active Directory. Ideally, the account should have a password that doesn’t expire.
  2. Assign the account (or the AAD Security group it belongs to as a best practice) the Contributor (or higher Power BI role) to the workspace where the dataset resides.
  3. Enable the Power BI XMLA feature as ReadWrite.
  4. On your dev machine, install both the 32-bit and 64-bit versions of the latest Analysis Services MSOLAP providers. You need the 32-bit provider to test in Visual Studio because VS refuses to go 64-bit. However, if you run the package under SQL Agent, you’d need the 64-bit provider.
  5. In Visual Studio, add a connector to the SSIS project that uses SSIS MSOLAP100 Analysis Services provider, which is just a wrapper on top of the SSAS native MSOLAP provider. Configure the connector to connect to the XMLA dataset endpoint (you can copy it from the dataset settings in Power BI Service).
  6. Configure the SSIS connector to use the “Use a specific user name and password” and plug in the credentials of the dedicated account you configured in step 1.

Gotchas: For some obscure reason, the “Test Connection” button might generate an error, but the processing task should work. Unlike what you might believe, the “Allow saving password” option doesn’t persist the password for security reasons. So, you either need to use an SSIS configuration or retype the password if you close and reopen Visual Studio. Once you deploy the SSIS project to the Integration Services catalog and schedule it with the SQL Agent, make sure to update the connector’s connection string to store the password inside the SQL Agent task.

Atlanta MS BI and Power BI Group Meeting on September 12th (Incremental Refresh and Hybrid Tables in Power BI)

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, September 12th, at 6:30 PM ET.  Shabnam Watson (Microsoft MVP) will discuss how to reduce the dataset refresh time with incremental refresh policies and how to implement tables with hybrid storage. For more details and sign up, visit our group page.

Presentation:Incremental Refresh and Hybrid Tables in Power BI
Date:September 12th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:Incremental Refresh makes it possible for Power BI to handle large datasets by partitioning the data into segments and making refreshes faster, more reliable, and less resource intensive.

 

Hybrid tables, a new addition, can be easily configured for a table with Incremental Refresh to enable different storage modes for its different partitions. This allows historical data to load into Power BI’s memory (import) for super-fast query performance and to leave the most recent data in the backend (DirectQuery) for near real time results.

Speaker:Shabnam Watson is a Microsoft Data Platform MVP and Business Intelligence consultant with 20 years of experience developing Data Warehouse and Business Intelligence solutions. She has worked across several industries including Supply Chain, Finance, Retail, Insurance, and Health Care. Her areas of interest include Power BI, Analysis Services, Performance Tuning, PowerShell, DevOps, Azure, Natural Language Processing, and AI. Her work focus within the Microsoft BI Stack has been on Analysis Services and Power BI. She is a regular speaker and volunteer at national and local user groups and conferences. She holds a bachelor’s degree in Computer Engineering, a master’s degree in Computer Science, and a Certified Business Intelligence Professional (CBIP) certification by The Data Warehouse Institute (TDWI).
Prototypes without PizzaPower BI Latest

PowerBILogo

Not Your Kimball’s DW

BI practitioners, myself included, have been following the Kimbal tenets for dimensional schema design for years. In fact, it’s a second nature to dimensionalize every BI project the “proper” way. You identify fact tables, grain, dimensions, etc. Dimensions of course would have surrogate keys – your insurance against weird things happening down the road, such as tracking Type 2 changes, unreliable business keys, acquisitions, and what not. Of course, there are downsides. A lot of time is spent on the dimension design – regular dimensions, junk dimensions, mini dimensions… Your ETL process must load these dimensions. And there are dependencies. You can’t just truncate and reload a dimension because the surrogate keys in the fact tables will be invalidated.

But here comes a project that throws everything off. Don’t you love projects like these? I’m designing a data warehouse solution for small financial company, where I model financial loans. The wrinkle is that each loan has hundreds of attributes (the original source table actually has 4,500 fields!), including many dimension-like two-pair attributes, such as Loan Type and Loan Type Description. The number of loans is not big and can fit into an Excel spreadsheet. Where do you draw the line for dimensions here? You could identify “important” dimensions, such as these that require historical change tracking, such as Loan Status, Loan Type, etc. and leave the rest of the descriptors in the fact table. Or you could take the approach that I’m considering of leaving all attributes in the fact table, so that the Loan subject area consists of a LoanSnapshot accumulating snapshot table and Date dimension. Ta da – no dimensions besides Date!

What about conformed dimensions should other fact tables, such as LoanBudget, pop up later, like mushrooms on a rainy day? Well, then we can decouple the affected attributes and manufacture a dimension on the fly by simply adding a SQL view with SQL DISTINCT. And, yes, we will be relying on the business keys (no surrogate keys) for the joins. Unless there is a good reason to use surrogate keys, in which case we’d have to spin off more ETL.

Don’t get me wrong. Most projects will benefit from “properly” dimensionalizing the schema and often the dimension candidates, such as Product, Customer, Organization, Geography, are easy to spot. But sometimes, such as in the case of smaller and simpler businesses, this might be overkill and an ad-hoc and agile perspective might be preferable. It will save you tremendous effort and it’s not the end of the road should things get more complicated. Just don’t be a methodology stickler!

Atlanta MS BI and Power BI Group Meeting on August 1st (Power BI Automation)

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, August 1st, at 6:30 PM ET.  We’ll have two presentations by 3Cloud. For more details and sign up, visit our group page.

Presentation:1.   “Power BI Meets Programmability – TOM, XMLA, and C#” by Kristyna Hughes

2.  “Automation Using Tabular Editor Advanced Scripting” by Tim Keeler

Date:August 1st
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:Power BI Meets Programmability – TOM, XMLA, and C#

Tune in to learn how to programmatically add columns and measures to Power BI data models using TOM, XMLA, and C#! XMLA is a powerful tool available in the online Power BI service that allows report developers to connect to their data model and adjust a variety of entities outside the Power BI Desktop application. Combined with a .NET application, this can be a powerful tool in deploying changes to your Power BI data models programmatically.

Automation Using Tabular Editor Advanced Scripting

See examples of how advanced scripting in Tabular Editor can be used to automate the creation of DAX measures, calculation groups, and provide insights into your model while reducing development time and manual effort.

Speaker:Kristyna Hughes’s experience includes implementing and managing enterprise-level Power BI instance, training teams on reporting best practices, and building templates for scalable analytics. Currently, Kristina is a data & analytics consultant at 3Cloud and enjoy answering qualitative questions with quantitative answers. Check out my blog at https://dataonwheels.wordpress.com/ and connect on LinkedIn https://www.linkedin.com/in/kristyna-hughes-dataonwheels/

Tim Keeler is a data analytics professional with over 17 years of experience developing cost-to-serve models, business intelligence solutions, and managing teams of other data professionals to help organizations achieve their strategic objectives. Check my blog at https://www.linkedin.com/in/tim-keeler-32631912/

Prototypes without PizzaPower BI Latest

PowerBILogo

Datasets vs Datamarts

I sent a proposal for implementing a classic BI solution: Azure SQL-based datamart (not Power BI datamart please), ETL, semantic model, and reports. The client had a sticker shock. Return to sender … as other BI companies that quoted can do it for half! Upon digging, it turned out the other companies would build the semantic model (aka Power BI dataset) directly on top of the data source. On a Time&Materials (T&M) basis and charge by the hour, of course, what else? By contrast, I give fixed-price milestone-driven proposals and I don’t get paid unless I deliver and meet written and agreed upon success criteria, but that’s a different story.

So, let me count the ways as the poet would say. It’s certainly technically possible to slap a dataset on top of the data source(s). That’s what self-service BI is all about right … until it doesn’t serve anymore. Check the Microsoft’s “discipline at the core” story about that journey ended. But BI pros can do it better and more efficiently and still bypass building the datamart, right? Here is why the “shortcut” will probably not work so well:

  1. Architecture – If you don’t have a datamart, you’re betting it all on Power BI. But tools come and go and by no means I’d put all my eggs in a single basket regardless the respect I have to Power BI (far from ideal of course). By contrast, if one day you decide to switch to another tool and you have invested in a datamart, you have to replace the semantic model and reports only. Data staging, transformations, and improved data quality will stay on.
  2. ETL – We all agree by now that the star schema is our best friend. It’s certainly possible to use Power Query to shape the data anyway you want it. But Power Query can be notoriously slow and difficult to troubleshoot. I’ve seen companies getting in a lot of trouble when tilting too much toward Power Query. Also, what ETL assumptions are you making and what limitations betting against when using Power Query? No advanced transforms and no advanced requirements, such as Type 2 changes? No incremental data loads? No restartability? No decent monitoring and troubleshooting?
  3. Data integration – Important data should be consolidated and centralized into a repository. And that repository should be a relational database. Also, what about making the data available to other tools? Should we lock them to using the Power BI XMLA endpoint?
  4. Semantic model – What if you have to support data refreshes at different granularity, such as hourly vs daily, or import vs direct query? What if one Power Query fails to refresh? Should we fail the entire refresh?
  5. Data volumes – Although your initial dataset might be less than a million rows, what if that changes or customers decides to use larger external data? Can Power Query handle this?

Shortcuts are tempting and disguise themselves as cost-effective. If you’re a data analyst that doesn’t know or can’t afford any better, surely take the data source->dataset approach. At least, if you’re sourcing data from a relational database, insist on SQL views so you can offload some transformations upstream. If you’re a BI pro and you’re building a pilot, go ahead. But if you’re to build an organizational BI solution that must adapt, evolve, and endure, at least let your sponsor know about what assumptions and tradeoffs you’re making along the way. Let’s be honest.

Atlanta MS BI and Power BI Group Meeting on July 11th (Pushing the Query Folding limits with Power Query)

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, July 11th, at 6:30 PM ET.  For more details and sign up, visit our group page.

Presentation:Pushing the Query Folding limits with Power Query
Date:July 11th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:One of Power Query’s most powerful features is its ability to translate the Power Query formula language (M) back to a source systems native language and in this session, we’ll push the limits and possibilities to avoid “breaking the fold” and explore some potential dark magic with List functions. A base understanding of T-SQL is helpful though not required for this session.
Speaker:From financial services to felines, the World Wide Web to professional wrestling – Alex Powers has an affinity for the conventional and unconventional when it comes to information. A self-proclaimed Excel and Power BI Enthusiast Alex Powers enjoys contributing to online forums and sharing his passion for empowering others using Microsoft technologies.
Prototypes without PizzaPower BI Latest

PowerBILogo

Power BI Field Parameters

Coming back from a long vacation and I almost missed this new Power BI killer feature: Field Parameters! Not to be confused with Dynamic M Query Parameters that I ranted about here, field parameters solve a long-standing limitation of Power BI that prevents binding dynamically fields to a visual. Dynamic binding isn’t an issue with measures because they are dynamic and can evaluate runtime conditions, such as slicer selection, but dimensions are a different story. Once they are bound to a category bucket in a visual, you couldn’t change them on the fly.

Yet, one common scenario was to let the user control which fields will be used for slicing the measure(s) in a visual. I’ve seen rather convoluted implementations to get around this limitation. Field parameters to the rescue. Now once you create a field parameter and bind it to the visual, the user can simply select which field will be used for slicing.

Field parameters open the opportunity for packing more visuals on a single page and letting the user specify what they want to see in these visuals! Moreover, the fields can come from different tables. On the downside, one significant limitation not mentioned in the documentation, is that currently visuals can’t sort on the field parameter and no workaround exists (see this GitHub issue for details).

BTW, you can use this DAX measure to get the user-friendly selected value assuming you accepted the default name for the field parameter: ParameterTitle = MAX(Parameter[Parameter]). Or, because the parameter uses a groupby set, you can use this expression (thanks Alberto Ferrari):

ParameterTitle =
VAR _a = SUMMARIZE ( Parameter, Parameter[Parameter], Parameter[Parameter Fields] )
VAR _b = SELECTCOLUMNS ( _a, "Parameter", Parameter[Parameter] )
VAR _result = IF ( COUNTROWS ( _b ) = 1, _b )
RETURN
    _result

Power BI Datamarts: the Good, the Bad, and the Ugly

As Microsoft announced here, Power BI datamarts are upon us. I can almost see an important enterprise client demanding “self-service datamarts me now or else… “, thus inspiring an opportunity for another premium feature, spearheaded with great vision and effort, but questionable practical value. In a nutshell, a Power BI datamart is a combo of Power BI Premium and a Microsoft-hosted Azure SQL Database aiming to simplify the implementation of a departmental datamart.

The Good

Unlike other vendors, such as Domo and their proprietary and overly expensive stack, Microsoft has decided to go with somewhat open solution consisting of tools that Power BI users already know: Power Query, Power BI Desktop (for the first time some of its modeling features, such as relationships and DAX measures, made it to the cloud), and SQL Server. Microsoft provisions the database for you although surrounds it with some red tape (more on this in a moment). Thus, a business users aiming for “no code, low code” experience will whip out some dataflows that populate the database and then build a model (dataset) directly in Power BI Service. Obviously, the main goal is to simplify the experience as much as possible where all the action happens online.

It’s nice that Microsoft chose hosting the data in a SQL database instead of a “lakehouse”. Apparently, they learned some painful lessons from Power Query CDM folders. The database size is up to 100 GB which is not bad at all.

The Bad

From the announcement, “Best of all, IT doesn’t have to worry about getting all data into centrally governed data sources, thus providing discipline at the core and flexibility at the edge.” I failed to see how this will provide “discipline at the core” – a tenant that Microsoft learned from their own pain points after tilting too much toward self-service BI. I’ve seen also statements online that business users don’t have to “consult with IT anymore” when implementing datamarts. Really? What happened to managed self-service BI? I’m sure IT will be thrilled having corporate data in Microsoft-owned databases that they can’t manage and queries running amuck and consuming precious premium resources. Luckily, the admin portal has a switch to control who can create these datamarts. I hope at least we have a BYO database feature at some point.

The elastic Azure SQL database that Microsoft provisions is read-only, meaning that you can’t create objects. I’m a big fan of pushing calculations as much as possible to SQL Server, such as by implementing SQL views, but we can’t do that. Instead, we would use Power Query (what else of course) for all the transforms. But I have serious reservations against Power Query – a tool that is known to cause performance issues without providing any troubleshooting and maintenance insights.

The Ugly

Do we really need this feature? I would argue that what was really needed was extending Power Query with “destinations” where the user can specify where the data would land. If that was implemented, IT could selectively let business users augment the infrastructure set up by IT with self-service ETL (more than likely temporary) that sinks the data into an IT-sanctioned database. Further, it would have gotten us out of another proprietary mess that forces dataflows to save their output into CDM folders that make sense only to Microsoft (see my “Power BI Dataflows vs ADF Mapping Data Flows” blog for the gory details). Want to save dataflow data somewhere else? You got to use Power BI datamarts because this is the only way you can have your data in a (Microsoft) relational database and nowhere else.

Recently, an enterprise client has decided to migrate all self-service Alteryx flows to IT-governed ADF pipelines. More than likely, Power BI datamarts are heading in that direction. Be very careful about any pure self-service features, as you might find yourself in a bigger mess that you tried to solve.

A Case for Azure Analysis Services

Microsoft BI practitioners have three options for hosting semantic models: SSAS (on prem), Azure Analysis Services (cloud), and Power BI (cloud). AAS is somewhat caught between a rock and a hard place. Given that Power BI gets the most attention for cloud deployment, why would you consider AAS at all? There are two main reasons:

  1. Cost – Organizational semantic models might require a lot of memory and crunching power. Hosting them on AAS might be more cost effective. For example, AAS S4 runs at around $5,000 which at the same price point as Power BI Premium P1. However, it gives you 100 GB of RAM and 20 cores, whereas P1 has only 25 GB and 8 cores.
  2. Scaling out – A feature unique to AAS is ability to scale out to multiple query replicas. This is not an option with Power BI Premium, and it requires quite a bit of setup with SSAS. However, AAS makes scaling out easy by just changing a slider. And once you’re done, you can pause the instance, so it doesn’t incur cost!

Scaling out proved to be a useful feature lately when a client wanted to process massive queries in parallel. We cloned the model to AAS and wrote an ETL job to parallelize the query execution.

Note that the number of replicas depends on the data region and pricing level. For example, only East US 2 and West US support up to 7 query replicas up to S4. Another thing to watch for is that it’s not enough to just process the model on a scale-out farm. You’d need also to synchronize it across the query replicas. This could be done manually in the Azure Portal or automated, such by using the PowerShell script below that you can plug in a SQL Agent job. The script uses a regular AAD account which has admin rights to the server. You can also use a service principal, but I opted for a regular account because Microsoft removed the option for no expiration date for the client secret (the maximum lifetime of a client secret now is two years).

Import-Module Az.AnalysisServices
$password = "<account password>" | ConvertTo-SecureString -asPlainText -Force
$username = "craas@<domain>.com"
$aasendpoint = "asazure://aspaaseastus2.asazure.windows.net/crliveaas1"
$aasendpointmgmt = "asazure://aspaaseastus2.asazure.windows.net/crliveaas1:rw"
$TenantId = "<tenant id>"
$credential = New-Object System.Management.Automation.PSCredential($username,$password)
$defaultProfile = Connect-AzAccount -Credential $credential -Tenant $TenantId

Set-AzContext -Tenant $TenantId -DefaultProfile $defaultProfile
$server = Get-AzAnalysisServicesServer -ResourceGroupName "crliveaas_rg" -Name "crliveaas1" -DefaultProfile $defaultProfile
if ($server.State -eq "Paused")
{
    Resume-AzAnalysisServicesServer -Name "crliveaas1" -ResourceGroupName "crliveaas_rg"  
    #process database, first clear the data so processing doesn't go over memory limit
    Invoke-ProcessASDatabase -Server $aasendpointmgmt -DatabaseName "<databasename>" -RefreshType "ClearValues" -Credential $credential
    Invoke-ProcessASDatabase -Server $aasendpointmgmt -DatabaseName "<databasename>" -RefreshType "Full" -Credential $credential

    # sync database
   Add-AzAnalysisServicesAccount  -Credential:$credential -RolloutEnvironment:"aspaaseastus2.asazure.windows.net"
   Sync-AzAnalysisServicesInstance -Instance $aasendpointmgmt -Database "<databasename>" -PassThru
}