Datasets vs Datamarts

I sent a proposal for implementing a classic BI solution: Azure SQL-based datamart (not Power BI datamart please), ETL, semantic model, and reports. The client had a sticker shock. Return to sender … as other BI companies that quoted can do it for half! Upon digging, it turned out the other companies would build the semantic model (aka Power BI dataset) directly on top of the data source. On a Time&Materials (T&M) basis and charge by the hour, of course, what else? By contrast, I give fixed-price milestone-driven proposals and I don’t get paid unless I deliver and meet written and agreed upon success criteria, but that’s a different story.

So, let me count the ways as the poet would say. It’s certainly technically possible to slap a dataset on top of the data source(s). That’s what self-service BI is all about right … until it doesn’t serve anymore. Check the Microsoft’s “discipline at the core” story about that journey ended. But BI pros can do it better and more efficiently and still bypass building the datamart, right? Here is why the “shortcut” will probably not work so well:

  1. Architecture – If you don’t have a datamart, you’re betting it all on Power BI. But tools come and go and by no means I’d put all my eggs in a single basket regardless the respect I have to Power BI (far from ideal of course). By contrast, if one day you decide to switch to another tool and you have invested in a datamart, you have to replace the semantic model and reports only. Data staging, transformations, and improved data quality will stay on.
  2. ETL – We all agree by now that the star schema is our best friend. It’s certainly possible to use Power Query to shape the data anyway you want it. But Power Query can be notoriously slow and difficult to troubleshoot. I’ve seen companies getting in a lot of trouble when tilting too much toward Power Query. Also, what ETL assumptions are you making and what limitations betting against when using Power Query? No advanced transforms and no advanced requirements, such as Type 2 changes? No incremental data loads? No restartability? No decent monitoring and troubleshooting?
  3. Data integration – Important data should be consolidated and centralized into a repository. And that repository should be a relational database. Also, what about making the data available to other tools? Should we lock them to using the Power BI XMLA endpoint?
  4. Semantic model – What if you have to support data refreshes at different granularity, such as hourly vs daily, or import vs direct query? What if one Power Query fails to refresh? Should we fail the entire refresh?
  5. Data volumes – Although your initial dataset might be less than a million rows, what if that changes or customers decides to use larger external data? Can Power Query handle this?

Shortcuts are tempting and disguise themselves as cost-effective. If you’re a data analyst that doesn’t know or can’t afford any better, surely take the data source->dataset approach. At least, if you’re sourcing data from a relational database, insist on SQL views so you can offload some transformations upstream. If you’re a BI pro and you’re building a pilot, go ahead. But if you’re to build an organizational BI solution that must adapt, evolve, and endure, at least let your sponsor know about what assumptions and tradeoffs you’re making along the way. Let’s be honest.

Atlanta MS BI and Power BI Group Meeting on July 11th (Pushing the Query Folding limits with Power Query)

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, July 11th, at 6:30 PM ET.  For more details and sign up, visit our group page.

Presentation:Pushing the Query Folding limits with Power Query
Date:July 11th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:One of Power Query’s most powerful features is its ability to translate the Power Query formula language (M) back to a source systems native language and in this session, we’ll push the limits and possibilities to avoid “breaking the fold” and explore some potential dark magic with List functions. A base understanding of T-SQL is helpful though not required for this session.
Speaker:From financial services to felines, the World Wide Web to professional wrestling – Alex Powers has an affinity for the conventional and unconventional when it comes to information. A self-proclaimed Excel and Power BI Enthusiast Alex Powers enjoys contributing to online forums and sharing his passion for empowering others using Microsoft technologies.
Prototypes without PizzaPower BI Latest

PowerBILogo

Prologika Newsletter Summer 2022

The workhorse of any modern BI solution is the semantic model that provides unparallel performance and contains business logic and security roles. Microsoft BI gives us three options to host semantic model: SQL Server Analysis Services (SSAS), Azure Analysis Services (AAS), and Power BI.  This newsletter explains why it might be beneficial to consider AAS and includes a script for automating resuming, processing, and synchronizing a model hosted on AAS.

Why Azure Analysis Services?

Microsoft BI practitioners have three options for hosting semantic models: SSAS (on prem), Azure Analysis Services (cloud), and Power BI (cloud). AAS is somewhat caught between a rock and a hard place. Given that Power BI gets the most attention for cloud deployment, why would you consider AAS at all? There are two main reasons:

  1. Cost – Organizational semantic models might require a lot of memory and crunching power. Hosting them on AAS might be more cost effective. For example, AAS S4 runs at around $5,000 which at the same price point as Power BI Premium P1. However, it gives you 100 GB of RAM and 20 cores, whereas P1 has only 25 GB and 8 cores.
  2. Scaling out – A feature unique to AAS is ability to scale out to multiple query replicas. This is not an option with Power BI Premium, and it requires quite a bit of setup with SSAS. However, AAS makes scaling out easy by just changing a slider. And once you’re done, you can pause it, so it doesn’t incur cost!

Automating the solution

Scaling out proved to be a useful feature lately when a client wanted to process massive queries in parallel. We cloned the model to AAS and wrote an ETL job to parallelize the query execution.

Note that the number of replicas depends on the data region and pricing level. For example, only East US 2 and West US support up to 7 query replicas up to S4. Another thing to watch for is that it’s not enough to just process the model on a scale-out farm. You’d need also to synchronize it across the query replicas. This could be done manually in the Azure Portal or automated, such by using the PowerShell script below that you can plug in a SQL Agent job. The script uses a regular AAD account which has admin rights to the server. You can also use a service principal, but I opted for a regular account because Microsoft removed the option for no expiration date for the client secret (the maximum lifetime of a client secret now is two years).

Import-Module Az.AnalysisServices
$password = "<account password>" | ConvertTo-SecureString -asPlainText -Force
$username = "craas@<domain>.com"
$aasendpoint = "asazure://aspaaseastus2.asazure.windows.net/crliveaas1"
$aasendpointmgmt = "asazure://aspaaseastus2.asazure.windows.net/crliveaas1:rw"
$TenantId = "<tenant id>"
$credential = New-Object System.Management.Automation.PSCredential($username,$password)
$defaultProfile = Connect-AzAccount -Credential $credential -Tenant $TenantId

Set-AzContext -Tenant $TenantId -DefaultProfile $defaultProfile
$server = Get-AzAnalysisServicesServer -ResourceGroupName "crliveaas_rg" -Name "crliveaas1" -DefaultProfile $defaultProfile
if ($server.State -eq "Paused")
{
    Resume-AzAnalysisServicesServer -Name "crliveaas1" -ResourceGroupName "crliveaas_rg"  
    #process database; ClearValues removes the data to reduce the memory footprint
    Invoke-ProcessASDatabase -Server $aasendpointmgmt -DatabaseName "<databasename>" -RefreshType "ClearValues" -Credential $credential
    Invoke-ProcessASDatabase -Server $aasendpointmgmt -DatabaseName "<databasename>" -RefreshType "Full" -Credential $credential

    # sync database
    Add-AzAnalysisServicesAccount  -Credential:$credential -RolloutEnvironment:"aspaaseastus2.asazure.windows.net"
    Sync-AzAnalysisServicesInstance -Instance $aasendpointmgmt -Database "<databasename>" -PassThru
}

Conclusion

When it comes to cloud deployment of Analysis Services semantic models, Power BI is preferable because you’re always on the latest features. However, Azure Analysis Services can help you reduce cost and scale out query execution – feature that Power BI doesn’t support.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo

Power BI Field Parameters

Coming back from a long vacation and I almost missed this new Power BI killer feature: Field Parameters! Not to be confused with Dynamic M Query Parameters that I ranted about here, field parameters solve a long-standing limitation of Power BI that prevents binding dynamically fields to a visual. Dynamic binding isn’t an issue with measures because they are dynamic and can evaluate runtime conditions, such as slicer selection, but dimensions are a different story. Once they are bound to a category bucket in a visual, you couldn’t change them on the fly.

Yet, one common scenario was to let the user control which fields will be used for slicing the measure(s) in a visual. I’ve seen rather convoluted implementations to get around this limitation. Field parameters to the rescue. Now once you create a field parameter and bind it to the visual, the user can simply select which field will be used for slicing.

Field parameters open the opportunity for packing more visuals on a single page and letting the user specify what they want to see in these visuals! Moreover, the fields can come from different tables. On the downside, one significant limitation not mentioned in the documentation, is that currently visuals can’t sort on the field parameter and no workaround exists (see this GitHub issue for details).

BTW, you can use this DAX measure to get the user-friendly selected value assuming you accepted the default name for the field parameter: ParameterTitle = MAX(Parameter[Parameter]). Or, because the parameter uses a groupby set, you can use this expression (thanks Alberto Ferrari):

ParameterTitle =
VAR _a = SUMMARIZE ( Parameter, Parameter[Parameter], Parameter[Parameter Fields] )
VAR _b = SELECTCOLUMNS ( _a, "Parameter", Parameter[Parameter] )
VAR _result = IF ( COUNTROWS ( _b ) = 1, _b )
RETURN
    _result

Power BI Datamarts: the Good, the Bad, and the Ugly

As Microsoft announced here, Power BI datamarts are upon us. I can almost see an important enterprise client demanding “self-service datamarts me now or else… “, thus inspiring an opportunity for another premium feature, spearheaded with great vision and effort, but questionable practical value. In a nutshell, a Power BI datamart is a combo of Power BI Premium and a Microsoft-hosted Azure SQL Database aiming to simplify the implementation of a departmental datamart.

The Good

Unlike other vendors, such as Domo and their proprietary and overly expensive stack, Microsoft has decided to go with somewhat open solution consisting of tools that Power BI users already know: Power Query, Power BI Desktop (for the first time some of its modeling features, such as relationships and DAX measures, made it to the cloud), and SQL Server. Microsoft provisions the database for you although surrounds it with some red tape (more on this in a moment). Thus, a business users aiming for “no code, low code” experience will whip out some dataflows that populate the database and then build a model (dataset) directly in Power BI Service. Obviously, the main goal is to simplify the experience as much as possible where all the action happens online.

It’s nice that Microsoft chose hosting the data in a SQL database instead of a “lakehouse”. Apparently, they learned some painful lessons from Power Query CDM folders. The database size is up to 100 GB which is not bad at all.

The Bad

From the announcement, “Best of all, IT doesn’t have to worry about getting all data into centrally governed data sources, thus providing discipline at the core and flexibility at the edge.” I failed to see how this will provide “discipline at the core” – a tenant that Microsoft learned from their own pain points after tilting too much toward self-service BI. I’ve seen also statements online that business users don’t have to “consult with IT anymore” when implementing datamarts. Really? What happened to managed self-service BI? I’m sure IT will be thrilled having corporate data in Microsoft-owned databases that they can’t manage and queries running amuck and consuming precious premium resources. Luckily, the admin portal has a switch to control who can create these datamarts. I hope at least we have a BYO database feature at some point.

The elastic Azure SQL database that Microsoft provisions is read-only, meaning that you can’t create objects. I’m a big fan of pushing calculations as much as possible to SQL Server, such as by implementing SQL views, but we can’t do that. Instead, we would use Power Query (what else of course) for all the transforms. But I have serious reservations against Power Query – a tool that is known to cause performance issues without providing any troubleshooting and maintenance insights.

The Ugly

Do we really need this feature? I would argue that what was really needed was extending Power Query with “destinations” where the user can specify where the data would land. If that was implemented, IT could selectively let business users augment the infrastructure set up by IT with self-service ETL (more than likely temporary) that sinks the data into an IT-sanctioned database. Further, it would have gotten us out of another proprietary mess that forces dataflows to save their output into CDM folders that make sense only to Microsoft (see my “Power BI Dataflows vs ADF Mapping Data Flows” blog for the gory details). Want to save dataflow data somewhere else? You got to use Power BI datamarts because this is the only way you can have your data in a (Microsoft) relational database and nowhere else.

Recently, an enterprise client has decided to migrate all self-service Alteryx flows to IT-governed ADF pipelines. More than likely, Power BI datamarts are heading in that direction. Be very careful about any pure self-service features, as you might find yourself in a bigger mess that you tried to solve.

A Case for Azure Analysis Services

Microsoft BI practitioners have three options for hosting semantic models: SSAS (on prem), Azure Analysis Services (cloud), and Power BI (cloud). AAS is somewhat caught between a rock and a hard place. Given that Power BI gets the most attention for cloud deployment, why would you consider AAS at all? There are two main reasons:

  1. Cost – Organizational semantic models might require a lot of memory and crunching power. Hosting them on AAS might be more cost effective. For example, AAS S4 runs at around $5,000 which at the same price point as Power BI Premium P1. However, it gives you 100 GB of RAM and 20 cores, whereas P1 has only 25 GB and 8 cores.
  2. Scaling out – A feature unique to AAS is ability to scale out to multiple query replicas. This is not an option with Power BI Premium, and it requires quite a bit of setup with SSAS. However, AAS makes scaling out easy by just changing a slider. And once you’re done, you can pause the instance, so it doesn’t incur cost!

Scaling out proved to be a useful feature lately when a client wanted to process massive queries in parallel. We cloned the model to AAS and wrote an ETL job to parallelize the query execution.

Note that the number of replicas depends on the data region and pricing level. For example, only East US 2 and West US support up to 7 query replicas up to S4. Another thing to watch for is that it’s not enough to just process the model on a scale-out farm. You’d need also to synchronize it across the query replicas. This could be done manually in the Azure Portal or automated, such by using the PowerShell script below that you can plug in a SQL Agent job. The script uses a regular AAD account which has admin rights to the server. You can also use a service principal, but I opted for a regular account because Microsoft removed the option for no expiration date for the client secret (the maximum lifetime of a client secret now is two years).

Import-Module Az.AnalysisServices
$password = "<account password>" | ConvertTo-SecureString -asPlainText -Force
$username = "craas@<domain>.com"
$aasendpoint = "asazure://aspaaseastus2.asazure.windows.net/crliveaas1"
$aasendpointmgmt = "asazure://aspaaseastus2.asazure.windows.net/crliveaas1:rw"
$TenantId = "<tenant id>"
$credential = New-Object System.Management.Automation.PSCredential($username,$password)
$defaultProfile = Connect-AzAccount -Credential $credential -Tenant $TenantId

Set-AzContext -Tenant $TenantId -DefaultProfile $defaultProfile
$server = Get-AzAnalysisServicesServer -ResourceGroupName "crliveaas_rg" -Name "crliveaas1" -DefaultProfile $defaultProfile
if ($server.State -eq "Paused")
{
    Resume-AzAnalysisServicesServer -Name "crliveaas1" -ResourceGroupName "crliveaas_rg"  
    #process database, first clear the data so processing doesn't go over memory limit
    Invoke-ProcessASDatabase -Server $aasendpointmgmt -DatabaseName "<databasename>" -RefreshType "ClearValues" -Credential $credential
    Invoke-ProcessASDatabase -Server $aasendpointmgmt -DatabaseName "<databasename>" -RefreshType "Full" -Credential $credential

    # sync database
   Add-AzAnalysisServicesAccount  -Credential:$credential -RolloutEnvironment:"aspaaseastus2.asazure.windows.net"
   Sync-AzAnalysisServicesInstance -Instance $aasendpointmgmt -Database "<databasename>" -PassThru
}

Atlanta MS BI and Power BI Group Meeting on June 6th (How Power Query Thinks)

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, June 6th, at 6:30 PM ET.  For more details and sign up, visit our group page.

Presentation:How Power Query Thinks: Taking the Mystery Out of Streaming and Query Folding
Date:June 6th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:How does Power Query produce the table data your expressions ask it to output? Query folding is one key concept—where behind the scenes, part or all of your M code may be rewritten into the data source’s native query language then offloaded to the source for execution. Streaming, though perhaps a less familiar term, is even more fundamental, as it describes how table data (as well as list and binary data) flows between functions in M.

 

Understand these concepts, and you’ll be better positioned to write more efficient mashups, debug problems and avoid unexpected variability in results. Join this session to learn about these key concepts—in a nutshell, to learn about How Power Query Thinks when you ask it to produce table data!

 

Speaker:Author of the Power Query M Primer Series, an in-depth dive into the Power Query language, Ben Gribaudo is a seasoned architect, developer and data engineer.
Prototypes without PizzaPower BI Latest

PowerBILogo

Power BI Dataflows vs ADF Mapping Data Flows

An enterprise client wants to migrate many Alteryx workflows created over years by smart business users to the Microsoft ecosystem. During the initial intake, we discussed Power BI dataflows vs Azure Data Factory mapping data flows. Yep, Microsoft loves to confuse us, but these technologies have nothing to do with each other.

Power BI DataflowsADF mapping data flows
Target audienceBusiness usersProfessionals
AvailabilityPower BI ServiceAzure Data Factory and Synapse Studio
Underlying technologyPower QuerySpark
Expression languageM LanguageExpression functions evaluated to Spark data types
Computation EnginePower BI Compute EngineSpark clusters
OutputAzure Data Lake Storage (CSV files in CDM folders)Many sinks
DebuggingN/ADebug mode
MonitoringVery basic (refresh failures)Detail output

In a nutshell, Power BI dataflows are meant for self-service ETL, where the business users would be responsible for creating and managing the ETL flows. By contrast, ADF mapping data flows target BI Pros. If you have been following my blog, you know that I’m big proponent for the ELT pattern for various reasons, mainly better performance and avoiding tool dependencies. Despite how much Microsoft promotes ADF mapping data flows, my typical data integration projects don’t use them. The last time I looked at them, they didn’t support ADF self-hosted runtimes, and they are a pain to debug. But if you must do transformations on the fly, e.g. when dumping data into files, then you obviously don’t have a choice.

So, what did the client decide to do? IT decided to take over the Alteryx workflows and convert them to Azure Data Factory. So much about self-service ETL.

FastTrack Recognized Solution Architect 2022

Microsoft re-awarded me as a FastTrack Recognized Solution Architect – Power BI for 2022! This prestigious recognition is conferred by the Power Platform product engineering team for consistently exhibiting deep architecture expertise and creating high quality solutions for customers during project engagements. I’m one of the 36 individuals worldwide who met the following criteria:

  • Must have a minimum of 2 years of experience with Power BI and a minimum of 5 years of experience with Enterprise BI solutions
  • Must have a minimum of 2 years of experience as an Enterprise BI architect
  • Must be working for a partner with Gold certification in Data Analytics MPN competency
  • Must have been lead architect for at least 2 Power BI in-production implementations with at least 200 active users (Preferably for CAT managed customers)

Atlanta MS BI and Power BI Group Meeting on April 4th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, April 4th, at 6:30 PM ET.  Michael Carlo (Power BI MVP) will show how to speed up your data modeling experience by using DAX templates. For more details and sign up, visit our group page.

Presentation:World’s Fastest DAX – Using Quick DAX Templates
Date:April 4th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:DAX is easy to Learn but hard to master.

This session is all about using the Growing Community Driven Library of Quick DAX Templates. Learn how to find the templates and use them in your daily workflow to drastically speed up your Data Modeling Experience.

Session Covers the following topics:

1. Introduces the concept of Templatized DAX

2. Consuming a Quick DAX Template using Power BI External Tools

3. Discusses how to create your own templates and use them

Multiple demos of working directly inside the tooling for consuming and creating Quick DAX templates.

Speaker:Michael Carlo is very passionate about data and analytics. He spent a massive amount of time learning Power BI. Through this learning process he felt that others may also be interested. Thus, he created a website, a knowledge repository for all things PowerBI (powerbi.tips). This site is uniquely set up for people to read and learn about data modeling and visualizations within Power BI.
Prototypes without PizzaPower BI Latest

PowerBILogo