Atlanta Microsoft BI Group Meeting on November 6th (Exploring Bravo for Power BI)

Atlanta BI fans, please join us for the next meeting on Monday, November 6th, at 6:30 PM ET. Marco Russo will introduce us to the community tool Bravo for Power BI. Your humble correspondent will help you catch up on the latest in Microsoft BI. I’ll sponsor the event. For more details and to sign up, visit our group page.

PLEASE NOTE A CHANGE TO OUR MEETING POLICY. WE HAVE DISCONTINUED ONLINE MEETINGS VIA TEAMS. THIS GROUP MEETS ONLY IN PERSON. WE WON’T RECORD MEETINGS ANYMORE. THEREFORE, AS IN PRE-PANDEMIC TIMES, PLEASE RSVP AND ATTEND IN PERSON IF YOU ARE INTERESTED IN THIS MEETING.

Presentation: Exploring Bravo for Power BI

Delivery: In-person

Date: November 6

Time: 18:30 – 20:30 ET

Level: Beginner

Food: Pizza and drinks will be provided

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

Venue

Improving Office

11675 Rainwater Dr

Suite #100

Alpharetta, GA 30009

Overview: Bravo for Power BI is a free external tool to analyze a model, format DAX, export data, and manage date tables and time intelligence calculations. We will explore the tool’s capabilities, with additional insights into the time intelligence feature, which can be customized by creating new templates. Some of these features are also available in more advanced tools like DAX Studio, but the goal of Bravo is to make them available to all Power BI users by providing a simpler user interface that covers the most common use cases.

Speaker: Marco Russo is a business intelligence consultant and mentor. He first started working on Analysis Services in 1998, back when Analysis Services was known as OLAP Services. Marco has written several books about Power BI, Analysis Services, and Power Pivot with Alberto Ferrari. They also regularly write articles and white papers that are available on sqlbi.com.

Sponsor: Prologika


ADF Cross Environment Schema Compare Tool

Almost every BI project requires at least two environments, such as DEV and PROD. Unless I’ve missed something, one Azure Data Factory caveat, irrespective of the code promotion path (manual or automated with Azure DevOps or Git CI/CD), is that deployment relies on an ARM template that contains all the artifacts. The equivalent for database schema deployment would be scripting the entire database schema, which makes me nervous. Not to mention that ARM templates have their own gotchas, such as the one I discussed here when shared self-hosted integration runtimes are used.

So, I went on a quest to find an ADF schema comparison tool for a quick and dirty way to identify ADF code discrepancies between two environments and promote changes manually. Ideally, the tool would work something like SQL Server Database projects in Visual Studio.

However, I wasn’t able to find such a schema comparison tool for ADF. The closest match was a PowerShell script by Riccardo Perico that extracts the ADF artifacts as files. My humble contribution was a few minor changes to the output folders, so that the artifacts from the ADF DEV service are saved in subfolders under a “dev” folder, while the production artifacts are saved under a “prod” folder. I made this change to take advantage of a WinMerge feature that can scan subfolders. I also added a line to delete all files in the target folders.

Here is the script with my changes (the output paths and the cleanup step):

# Install-Module -Name Az.DataFactory
# Credits: Riccardo Perico https://medium.com/riccardo-perico/comparing-azure-data-factories-exporting-all-the-components-929861ad2e00
Login-AzAccount
Get-AzSubscription -SubscriptionName 'MTX_Int_BI' | Set-AzContext -Name 'MyContextName'
Set-AzContext -Context $(Get-AzContext -Name "MyContextName")
$base_output_url = "C:\temp\adf\"
$pipeline_output = "activities"
$dataset_output = "datasets"
$linked_services_output = "linkedservices"
$triggers_output =  "triggers"

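# Resource groups and data factory names for the DEV (tc1) and PROD (tc2) environments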
$adfv2_rsg_name_tc1 = 'mtx-rsg-adf-dev'
$adfv2_rsg_name_tc2 = 'mtx-rsg-adf-prod'
$adfv2_name_tc1 = 'mtx-adf-dev'
$adfv2_name_tc2 = 'mtx-adf-prod'

#Delete all files
Get-ChildItem -Path $base_output_url -Include *.* -File -Recurse | foreach { $_.Delete()}
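# Get the names of all pipelines, datasets, linked services, and triggers from both factories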
$js = (Get-AzDataFactoryV2Pipeline -ResourceGroupName $adfv2_rsg_name_tc1 -DataFactoryName $adfv2_name_tc1) | Select-Object "Name"
$js2 = (Get-AzDataFactoryV2Pipeline -ResourceGroupName $adfv2_rsg_name_tc2 -DataFactoryName $adfv2_name_tc2) | Select-Object "Name"
$js_dst = (Get-AzDataFactoryV2Dataset -ResourceGroupName $adfv2_rsg_name_tc1 -DataFactoryName $adfv2_name_tc1) | Select-Object "Name"
$js_dst2 = (Get-AzDataFactoryV2Dataset -ResourceGroupName $adfv2_rsg_name_tc2 -DataFactoryName $adfv2_name_tc2) | Select-Object "Name"
$js_lks = (Get-AzDataFactoryV2LinkedService -ResourceGroupName $adfv2_rsg_name_tc1 -DataFactoryName $adfv2_name_tc1) | Select-Object "Name"
$js_lks2 = (Get-AzDataFactoryV2LinkedService -ResourceGroupName $adfv2_rsg_name_tc2 -DataFactoryName $adfv2_name_tc2) | Select-Object "Name"
$js_trg = (Get-AzDataFactoryV2Trigger -ResourceGroupName $adfv2_rsg_name_tc1 -DataFactoryName $adfv2_name_tc1) | Select-Object "Name"
$js_trg2 = (Get-AzDataFactoryV2Trigger -ResourceGroupName $adfv2_rsg_name_tc2 -DataFactoryName $adfv2_name_tc2) | Select-Object "Name"

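# Export each artifact definition to a text file under the dev or prod subfolder (the subfolders must already exist)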
foreach ($pl in $js)
{
    (Get-AzDataFactoryV2Pipeline -ResourceGroupName $adfv2_rsg_name_tc1 -DataFactoryName $adfv2_name_tc1 -Name $pl.Name).Activities | Sort-Object "Name" | Out-File $base_output_url"dev\"$pipeline_output"\$($pl.Name).txt"
}
foreach ($pl2 in $js2)
{
    (Get-AzDataFactoryV2Pipeline -ResourceGroupName $adfv2_rsg_name_tc2 -DataFactoryName $adfv2_name_tc2 -Name $pl2.Name).Activities | Sort-Object "Name" | Out-File $base_output_url"prod\"$pipeline_output"\$($pl2.Name).txt"
}
foreach ($d in $js_dst)
{
    (Get-AzDataFactoryV2Dataset -ResourceGroupName $adfv2_rsg_name_tc1 -DataFactoryName $adfv2_name_tc1 -Name $d.Name).Properties | Sort-Object "Name" | Out-File $base_output_url"dev\"$dataset_output"\$($d.Name).txt"
}
foreach ($d2 in $js_dst2)
{
    (Get-AzDataFactoryV2Dataset -ResourceGroupName $adfv2_rsg_name_tc2 -DataFactoryName $adfv2_name_tc2 -Name $d2.Name).Properties | Sort-Object "Name" | Out-File $base_output_url"prod\"$dataset_output"\$($d2.Name).txt"
}
foreach ($l in $js_lks)
{
    (Get-AzDataFactoryV2LinkedService -ResourceGroupName $adfv2_rsg_name_tc1 -DataFactoryName $adfv2_name_tc1 -Name $l.Name).Properties | Sort-Object "Name" | Out-File $base_output_url"dev\"$linked_services_output"\$($l.Name).txt"
}
foreach ($l2 in $js_lks2)
{
    (Get-AzDataFactoryV2LinkedService -ResourceGroupName $adfv2_rsg_name_tc2 -DataFactoryName $adfv2_name_tc2 -Name $l2.Name).Properties | Sort-Object "Name" | Out-File $base_output_url"prod\"$linked_services_output"\$($l2.Name).txt"
}
foreach ($t in $js_trg)
{
    (Get-AzDataFactoryV2Trigger -ResourceGroupName $adfv2_rsg_name_tc1 -DataFactoryName $adfv2_name_tc1 -Name $t.Name).Properties | Sort-Object "Name" | Out-File $base_output_url"dev\"$triggers_output"\$($t.Name).txt"
}
foreach ($t2 in $js_trg2)
{
    (Get-AzDataFactoryV2Trigger -ResourceGroupName $adfv2_rsg_name_tc2 -DataFactoryName $adfv2_name_tc2 -Name $t2.Name).Properties | Sort-Object "Name" | Out-File $base_output_url"prod\"$triggers_output"\$($t2.Name).txt"
}

Once the files are saved, the next step is to use WinMerge, configured to scan subfolders, to identify the differences between the two environments.
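
If you prefer to launch the comparison from the command line, something like the following should work (a minimal sketch that assumes WinMerge’s default install path; /r requests a recursive folder compare):

# Compare the dev and prod folder trees recursively (adjust the WinMerge path to match your installation)
& "C:\Program Files\WinMerge\WinMergeU.exe" /r "C:\temp\adf\dev" "C:\temp\adf\prod"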

Then, you can manually copy the JSON payload of the changed artifact and replace its equivalent in the other environment. Long live ADF continuous integration and delivery!

Deactivating ADF Activities

Lo and behold, Azure Data Factory now supports deactivating activities as a preview feature. It took only seven years…

You can deactivate or activate an activity by right-clicking it on the canvas or from its General settings. You can also configure how subsequent activities should treat a deactivated activity, with Succeeded being the most common outcome.
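
Under the covers, the deactivation state ends up in the pipeline JSON. The fragment below is a rough sketch of what a deactivated activity looks like (the activity name is made up, and the property names reflect my reading of the preview, so they may change):

{
    "name": "Copy to staging",
    "type": "Copy",
    "state": "Inactive",
    "onInactiveMarkAs": "Succeeded",
    "typeProperties": { }
}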

When All You Have is a Hammer… (Dbt Tool for ETL)

Someone asked the other day for my opinion about the open-source dbt tool for ETL. I hadn’t heard about it. The next thing I noticed was that Fabric Warehouse added support for it, so I got inspired to take a first look. It seems to be an ELT-oriented tool. Good; I’m a big fan of the ELT pattern, whose virtues I’ve extolled many times here. But a Python-based tool that requires writing templates in a dev environment, such as Visual Studio Code? Yuck!

So, what’s my first impression? The same thoughts I have when I see developers use another general-purpose programming language, such as C#, for ETL. You can do it, but why?

For years, I’ve been trying to move developers away from custom-coded ETL and toward low-code, specialized ETL tools, such as SSIS and ADF.

Just because you studied Python in college, should you use Python for everything? I guess the gravitational pull of open-source custom code is too strong. Or there are plenty of masochists out there. One of their case studies hailed dbt for democratizing ETL because everyone knows SQL and can contribute. That’s true, but what goes around SQL is also important, as are the productivity and maintainability of the overall solution.

On the Azure platform, my preferred ETL architecture remains ADF and ELT with SQL in stored procedures. I don’t see any dbt advantages. Dbt might make sense to you if you want to stay vendor-neutral, but I’d argue that if you follow the ELT pattern, migrating your ETL processes to another vendor would be trivial.

What am I missing?

UPDATE 10/19/2023

I’m humbled by the interest and comments this blog inspired on LinkedIn. It might well become one of the most popular posts I’ve ever written! This is what I’ve established based on the feedback from people who have used dbt:

  1. Although designed for ELT, it does only the “T” (transformation) part of the ELT process. You’d still need other tools, such as ADF, to extract and load the data. So, in a typical DW project, you could use ADF to extract data from the data sources, stage it, and then execute your dbt process to load the DW tables.
  2. It’s a template-based tool, where you use Python-like syntax to define “models”, such as a model corresponding to a fact table, together with the SQL statement to load it. Therefore, the SQL statement is embedded inside the template.
  3. It supports features such as tests, macros, lineage, and documentation.

I wonder how many ETL projects would need these features, though. If yours does, then dbt might be of interest. For me, the tool is still a hard pass that shares the same dark corner as ETL automation tools and custom-coded ETL. Everybody is trying to do more with less nowadays, so maintainability and productivity are more important than ever. If you have the resources and time, my recommendation would be to invest in a home-grown SQL generator that auto-generates the “T” part, such as the MERGE statement.
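
To make that last point concrete, here is a minimal sketch of such a generator in PowerShell (the function name, parameters, and simplistic column handling are mine and purely illustrative; a real implementation would read the column lists from metadata):

function New-MergeStatement {
    param (
        [string]$TargetTable,    # e.g., dbo.FactSales
        [string]$SourceTable,    # e.g., stage.FactSales
        [string[]]$KeyColumns,   # business key columns used to match rows
        [string[]]$Columns       # non-key columns to update or insert
    )
    $on   = ($KeyColumns | ForEach-Object { "t.[$_] = s.[$_]" }) -join " AND "
    $set  = ($Columns | ForEach-Object { "t.[$_] = s.[$_]" }) -join ", "
    $cols = (($KeyColumns + $Columns) | ForEach-Object { "[$_]" }) -join ", "
    $vals = (($KeyColumns + $Columns) | ForEach-Object { "s.[$_]" }) -join ", "
@"
MERGE $TargetTable AS t
USING $SourceTable AS s
    ON $on
WHEN MATCHED THEN UPDATE SET $set
WHEN NOT MATCHED BY TARGET THEN INSERT ($cols) VALUES ($vals);
"@
}

# Example: generate the MERGE statement that loads a fact table from its staging table
New-MergeStatement -TargetTable 'dbo.FactSales' -SourceTable 'stage.FactSales' `
    -KeyColumns 'SalesOrderNumber','SalesOrderLineNumber' -Columns 'OrderQuantity','SalesAmount'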