Atlanta MS BI and Power BI Group Meeting on October 5th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, October 5th, at 6:30 PM. Paco Gonzalez (CEO of SolidQ) will present and demonstrate Power BI’s AI and ML capabilities. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you plan to attend).

Presentation: AI, ML and Power BI
Date: October 5th, 2020
Time: 6:30 – 8:30 PM ET
Place: Join Microsoft Teams Meeting


Overview: Perhaps some of the most powerful features of Power BI involve the Artificial Intelligence and Machine Learning functionality built into the platform. In this session, we will review the AI and ML capabilities built into Power BI Desktop and the Power BI Service. From data visualizations to data preparation and data modeling, we will discuss the various ways that AI and ML can be easily implemented in your reporting.
Speaker: Paco Gonzalez is the CEO of SolidQ North America and a Microsoft Data Platform MVP. Focused on Business Analytics and Artificial Intelligence, he specializes in helping organizations become data-driven from a strategic and technical perspective. Paco is a speaker at small and large conferences such as PASS Summit, Ignite, and Business Applications Summit, and he has published several books and whitepapers. He is based in Atlanta, GA.
Prototypes without pizza: the latest Power BI features


Prologika Newsletter Fall 2020

Cloud deployments are the norm nowadays for new software projects, including BI, and Azure Synapse Analytics shows great potential as a modern cloud-based data analytics platform. Here are some high-level pros and cons to keep in mind when implementing Azure Synapse-centered solutions, harvested from my real-life projects and workshops.

What’s to Like

There is plenty to like in Azure Synapse, which is the evolution of Azure SQL DW. If you’re tasked with implementing a cloud-based data warehouse, you might be evaluating three Azure SQL Server-based PaaS offerings: Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse. However, Azure SQL Database and Azure SQL MI are optimized for OLTP workloads. For example, they have full logging enabled and replicate each transaction across replicas. Full logging is usually a no-no for decent-size DW workloads because of the massive ETL changes involved.

In addition, to achieve good performance, you’ll find yourself moving up the performance tiers and toward the price point of the lower Azure Synapse SKUs. Further, unlike Azure SQL Database, Azure Synapse can be paused, such as when reports hit a semantic layer instead of the DW, which may offer additional cost-cutting options.

To get started with Synapse, you need to create a workspace. A workspace can host two analytics runtimes, SQL and Spark, which are hosted in pools. For example, you might have one SQL pool for your enterprise data warehouse and another SQL pool for a departmental data mart, all provisioned within the same workspace.

A SQL pool (previously known as Azure SQL DW) hosts your relational database. As an MPP system, it can scale to petabytes of data with proper sizing and good design. The obvious benefit is that for the most part (see the “What’s Not to Like” section for exclusions) you can carry your SQL Server skills over to Azure Synapse.
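Table distribution is the single most important part of that “good design”. Below is a minimal DDL sketch (the table and column names are hypothetical): large fact tables are typically hash-distributed across the pool’s 60 distributions, while small dimension tables are replicated to avoid data movement during joins.

-- Hypothetical fact table: hash-distributed with the default clustered
-- columnstore index, which suits large scan-oriented DW workloads
CREATE TABLE dbo.FactSales
(
    SalesKey     INT            NOT NULL,
    DateKey      INT            NOT NULL,
    CustomerKey  INT            NOT NULL,
    SalesAmount  DECIMAL(19,4)  NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),  -- co-locate rows that join on CustomerKey
    CLUSTERED COLUMNSTORE INDEX
);

-- Small dimension tables are usually replicated to each compute node
-- so joins to the fact table don't require data movement
CREATE TABLE dbo.DimDate
(
    DateKey  INT   NOT NULL,
    [Date]   DATE  NOT NULL
)
WITH (DISTRIBUTION = REPLICATE);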

As a general rule of thumb, Azure Synapse should be at the forefront of your modern cloud DW architecture for data volumes starting at about 500 GB.

I like the integration with Azure Data Lake Storage (ADLS). Previously, you had to set up external tables to access files in ADLS via PolyBase. This might be redundant now thanks to the serverless (on-demand) endpoint, which also replaces the Gen1 U-SQL with T-SQL. All you have to do is link your ADLS storage accounts and use the T-SQL OPENROWSET construct to query your ADLS files and expose them as SQL datasets. Not only does this open new opportunities for ad hoc reporting directly on top of the data lake, but it can also save you tons of ETL effort by avoiding staging the ADLS data while loading your DW.
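For example, here is a minimal sketch of an ad hoc query against the serverless endpoint (the storage account, container, and folder names are hypothetical):

-- Query Parquet files in ADLS directly from the serverless (on-demand) endpoint;
-- no external table or staging required
SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://mylakeaccount.dfs.core.windows.net/datalake/sales/*.parquet',
        FORMAT = 'PARQUET'
     ) AS sales;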

You can also provision a Spark runtime under a Synapse workspace. This opens the possibility of using Spark (or Databricks) for data transformation and data science. For example, a data scientist can create notebooks in Synapse Studio that use a programming language of their choice (SQL, Python, .NET, Java, Scala, and R are currently supported).

What Needs Improvement

You have two coding options to implement artifacts for Azure Synapse, such as ETL and reports:

  1. You can use the respective development environment, such as Azure Data Factory Studio for ADF development and Power BI Desktop for Power BI report development.
  2. You can use Synapse Studio which attempts to centralize coding in one place.

Besides the ADLS integration and notebooks, I find no advantages in the Synapse Studio “unified experience”, which is currently in preview. In fact, I found it confusing and limiting. The idea is to offer an online development environment for coding and accessing in one place all the artifacts surrounding your DW project. Currently, these are limited to T-SQL scripts, notebooks, Azure Data Factory pipelines, and Power BI datasets and reports. However, I see no good reason to abandon ADF Studio and do ETL coding in Synapse Studio. In fact, I can only see reasons against doing so. For example, Synapse Studio lags behind ADF in features, such as source code integration and wrangling data flows. For some obscure reason, Synapse Studio separates data flows from ADF pipelines into its own Data Flows section in the Development Hub. This is like putting the SSIS data flow outside the package. Not to mention that inline data flows should be deemphasized, since ELT (not ETL) is the preferred pattern for loading Azure Synapse. To make things worse, once you decide to embrace Synapse Studio for developing ADF artifacts, you can use only Synapse Studio, because those pipelines are not accessible outside Synapse.

On the Power BI side of things, you can create a Power BI dataset in Synapse Studio. All this saves is the effort to go through two Get Data steps in Power BI Desktop to connect to your Azure Synapse endpoint. You can also create Power BI reports using Power BI Embedded in Synapse Studio, but why I would do this outside Power BI Desktop is beyond me. Perhaps I’m missing the big picture, and more features might be coming as Synapse evolves. Speaking of evolution, as of this time only SQL pools are generally available; everything else is in preview.

What’s Not to Like

Developers moving from on-prem BI implementations to Azure Synapse would naturally expect all SQL Server features to be supported, just like moving from on-prem SSAS to Azure AS. However, they will quickly find that Azure Synapse is a “one of a kind” SQL Server. The nice SSMS user interface for carrying out admin tasks, such as creating logins and assigning them to roles, doesn’t work with Azure Synapse (all changes require T-SQL). SQL Profiler doesn’t work either (tip: use the Profiler extension in Azure Data Studio to trace queries in real time).
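For example, here is roughly what the scripted equivalent of the SSMS UI looks like (the login name and password are made up):

-- In the master database: create a server-level login
CREATE LOGIN report_reader WITH PASSWORD = 'Str0ng&Secret!';

-- In the SQL pool database: map the login to a database user and grant read access
CREATE USER report_reader FOR LOGIN report_reader;
EXEC sp_addrolemember 'db_datareader', 'report_reader';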

Because Azure Synapse uses Azure SQL Database, it’s limited to one physical database, so you must rely on schemas to separate your staging and DW tables. There are also T-SQL limitations. For example, my favorite MERGE statement for ELT-based UPSERTs is not yet supported (it’s currently in preview for hash- and round-robin-distributed tables, but not for replicated tables, such as dimension tables). So, don’t assume that you can seamlessly migrate your on-prem DW databases to Synapse, and don’t assume that your SQL Server knowledge will suffice. Prepare to seriously study Azure Synapse!
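Until MERGE arrives, a common workaround is to split the UPSERT into an UPDATE followed by an INSERT. Here is a sketch, assuming a hypothetical staging table stg.FactSales loaded by your ELT process (the UPDATE uses an implicit join, the more portable form in dedicated SQL pools):

-- Step 1: update rows that already exist in the target
UPDATE dbo.FactSales
SET    SalesAmount = s.SalesAmount
FROM   stg.FactSales AS s
WHERE  dbo.FactSales.SalesKey = s.SalesKey;

-- Step 2: insert rows that are new to the target
INSERT INTO dbo.FactSales (SalesKey, DateKey, CustomerKey, SalesAmount)
SELECT s.SalesKey, s.DateKey, s.CustomerKey, s.SalesAmount
FROM   stg.FactSales AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.FactSales AS t
                   WHERE t.SalesKey = s.SalesKey);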


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics


Organizational Semantic Model

I am delivering a data governance assessment for an enterprise client. As part of an effort to migrate reporting from MicroStrategy to Power BI, the client wants to improve its data analytics. The gap analysis interviews with business leaders revealed common pitfalls: no single version of truth, data that is hard to come by, business users who don’t know what data sources exist and spend more time on data wrangling than on analytics, bad data quality, IT overwhelmed with report requests, report proliferation and duplication, and so on…

Sound familiar? As I’ve mentioned many times on this blog, an enterprise data warehouse (EDW) plays a critical role in overcoming the above challenges, but it’s not enough. A semantic model is also needed, and I extolled its virtues in my “Why Semantic Layer?” newsletter. In the Microsoft BI world, Analysis Services Tabular is commonly used to implement such models, which are typically layered on top of the EDW. In general, there are two ways to approach the model implementation:

  1. (Self-service BI path) Business users create self-service semantic models using Power BI Desktop. Behind the scenes, Power BI creates databases hosted in the Analysis Services Tabular server from the *.pbix files.
  2. (Organizational BI path) BI developers implement organizational semantic models.

Since both implementation paths lead to the same technology, it boils down to ownership, vision, and purpose.

Because IT is overwhelmed, the temptation is to shift semantic model development to business users. The issue with this approach is that business users seldom have the skills, time, and vision to do it right. In the end, mini “semantic models” (“spreadmarts”) are produced and the same problems are perpetuated.

In most cases, my recommendation is for IT to own the semantic model, because IT owns the data warehouse and the “Discipline at the Core” vision. And yes, unless operational or security requirements dictate otherwise, IT should strive for a single centralized semantic model that spans all subject areas. If the technology you use for semantic modeling can’t deliver acceptable performance with large models, then it’s time to change it.

Of course, not all data exists or will exist in EDW. This is where self-service “Flexibility at the Edge” comes in. I have high expectations for the forthcoming “Composite models over Power BI datasets and Azure Analysis Services” mega feature (public preview expected in November 2020). This will enable the following scenario that Power BI cannot deliver today:

  1. A business user starts by connecting live to corporate data in the organizational semantic model. Every new report requirement should start with evaluating whether all or some of the required data is in the semantic model, and if so, the user should connect to the semantic model to avoid data modeling and data duplication.
  2. The business user then mashes up this data with data that is not in the EDW, retaining the live connection to the organizational semantic model while importing (or connecting live with DirectQuery to) other datasets.

This will give us the best of both worlds and help us achieve “Discipline at the Core, Flexibility at the Edge”.

Section Hiked A.T. in Georgia

My wife and I started section hiking the Appalachian Trail on weekends to escape the summer heat and the virus. The A.T. runs for 2,200 miles from Georgia to Maine, with 78.6 miles in Georgia. Today we finished the Georgia part and entered North Carolina. We actually covered twice the distance (averaging 10-12 miles per section) because each time we had to hike back to where we parked. We started hiking with the great Atlanta Outdoor Club back in February 2020, but when the virus hit, group hikes were put on hold, so we were left to our own devices.

Hiking somehow grew on me. Perhaps because it is a metaphor for life. There are ups and downs. Some sections are hard and require a great deal of effort and perspiration, while others are easy. There are exhilarating views, but there are also areas with overgrown vegetation. Perceived risks, such as wild animals (note the bear spray on my belt), lurk in the distance. Some people tell you that you’re crazy. The weather is unpredictable, and planning is sometimes futile (we once got caught in a really bad thunderstorm and downpour when the forecast said the weather would be OK). But for the most part, it is just putting one foot in front of the other. It’s about the journey, not the destination. And the more you put in, the more you’ll get out.

Kudos to the wonderful people who maintain the A.T. and to those who supported us with advice! Kudos to the brave thru-hikers who cover the whole distance. I’m really jealous… one day, maybe.