Prologika Newsletter Winter 2020

I hope you’re enjoying the holidays. In this newsletter, I’ll discuss a very important enhancement to Power BI that lets business users extend semantic models. But before I get to it, a quick announcement: I’m putting the finishing touches on the sixth edition of my “Applied Microsoft Power BI” book! It should be available on Amazon in the first days of 2021. I’ve been updating this book thoroughly every year since 2015 to keep it up to date with this fast-changing technology.

I’ve written extensively on the important role that EDW and organizational semantic models play in delivering the “discipline at the core and flexibility at the edge” tenet of effective data analytics. Analysis Services Tabular is available in three SKUs: Power BI, Azure Analysis Services, and SSAS, and it’s the workhorse of Power BI Service. When you publish a Power BI Desktop file, it becomes a database hosted on an Analysis Services Tabular server managed by Microsoft.

How live connections work

As the diagram below shows, Power BI uses a special live connectivity option when you connect live to Analysis Services in all its flavors (Multidimensional, Tabular, and published Power BI datasets) and SAP (SAP HANA and SAP Business Warehouse). In this case, the xVelocity engine isn’t used at all and there is no local model. Instead, Power BI connects directly to the data source and sends native queries. For example, Power BI generates DAX queries when connected to Analysis Services.

There is no Power Query between Power BI Desktop and the data source, so data transformations and relationships are not available. In other words, Power BI becomes a presentation layer connected directly to the source, and the Fields pane shows the metadata of the remote model. This is conceptually very similar to connecting Excel to Analysis Services.

Unfortunately, once you connected Power BI Desktop live to one of these sources, that remote model was the only data source available to you.

Understanding the change

Power BI Desktop (December 2020 release) removes this long-standing limitation for live connections to Tabular. In the special case of connecting to a dataset published to Power BI Service or to an Azure Analysis Services model (on-prem SSAS is not supported), you can switch from live connectivity to DirectQuery and add external data to build a composite model. This feature is very important because it allows business users to extend semantic models sanctioned by someone else in the organization!

If the first connection you make is to the remote model, the connection uses Live Connect, and the Power BI Desktop file stores no metadata or data except for the connection string. The moment you use “Get Data” to connect to another source and accept the prompt, Power BI Desktop permanently replaces the live connection with a local DirectQuery layer and imports the metadata of the remote model. Even if you remove all external tables, you won’t be able to “undo” the change and switch the file back to Live Connect. In the diagram below, the FactResellerSales, DimDate, and Employees tables are hosted in the remote model, while FactSalesQuota is an external table that is imported (it could also be in DirectQuery mode).

What happens behind the scenes

In a nutshell, DirectQuery to Analysis Services Tabular is like other DirectQuery sources, where DAX queries generated by Power BI are translated to native queries. In this case, however, Power BI either sends the DAX queries directly to the remote model when possible or breaks them down into lower-level DAX queries. In the latter case, the lower-level queries are executed on the remote model and their results are combined in Power BI to return the result of the original DAX query. So, depending on the size of the tables involved in the join, this intermediate layer may negatively impact the performance of visuals that mix fields from different data sources.

Applying your knowledge about composite models, you might attempt to configure the dimensions in dual storage mode, but you’ll find that this is not supported. Behind the scenes, Power BI handles the join automatically, so you don’t need to set the storage mode to Dual; it’s interpreted as Dual internally. You can make metadata changes on top of the remote model. For example, you can format fields, create custom groups, implement your own measures, and even add calculated columns (which are now evaluated at runtime instead of being materialized). The changes you make never affect the remote model; they are saved locally in the DirectQuery model.

Currently, row-level security (RLS) doesn’t propagate from the remote model to the other tables. For example, the remote model might allow salespersons to see only their sales data by applying RLS to the Employees table. However, the user will be allowed to see all the data in the FactSalesQuota table because it’s external to the remote model and RLS doesn’t affect it.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics


The Science of Counting

I’m watching the witness testimonies about election irregularities in Georgia (the state where I live). I’m shocked at how this election became such a mess and an international embarrassment. The United States spent $10 billion on the 2020 election. Georgia alone spent more than $100 million on machines that security experts said can be hacked in minutes. If we add the countless man-hours, investigations, and litigations, these numbers will probably double by the time the dust settles.

What did we get back? Based on what I’ve heard, 50% of Americans believe this election was rigged, just like 50% believed so in 2016. Of course, the 2020 election added more opportunities for abuse because of the large number of mail-in ballots. It’s astonishing how manual and complicated the whole process is, not to mention that each state does things differently. But the more human involvement and moving parts, the larger the attack surface and the higher the probability of intentional or unintentional mishandling due to “human nature”, improper training, or total disregard of rules. I cast an absentee ballot without knowing how it was applied.

So, your humble correspondent thinks it’s about time to computerize voting. Where humans fall short, machines take over. Unless hacked, algorithms don’t make “mistakes”. How about a modern federal Internet voting system that standardizes voting in all states? If the government can put together a system for our obligation to pay taxes, it should be able to do it for the right to vote. If the most advanced country can get a vaccine done in six months, we should be able to figure out how to count votes. Just like in data analytics, elections would benefit from a single version of truth. Despite the security concerns surrounding a web app, I believe it would be far more secure than the charade that’s going on right now. In this world where no one trusts anyone, we apparently can’t trust bureaucrats to do things right.

Other advantages:

  • Anybody can vote from any device, so no vote “suppression”
  • Better authentication (face recognition and capture, cross-check with other systems, ML, etc.)
  • Vote confirmation
  • Ability to centralize security surveillance and monitoring by an independent committee
  • Report results in minutes
  • Additional options for analytics on post-election results
  • Save an enormous amount of money and energy!

I’m just saying …

Atlanta MS BI and Power BI Group Meeting on December 7th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, December 7th, at 6:30 PM. Patrick LeBlanc (Guy in a Cube) will share techniques to optimize your Power BI data models. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation: Optimizing the size of your model
Date: December 7th, 2020
Time: 6:30 – 8:30 PM ET
Place: Click here to join the meeting

Learn More | Meeting options

Overview: When working with your Power BI data model/dataset, there are certain things you can do to optimize the size of the model. At the same time, there are things that can wreak havoc on your data model. In this session, we will walk you through several steps you can take to ensure that your data model is optimized for the best performance. We will discuss and demonstrate how items such as data types, model properties, and DAX calculations can adversely affect the size of the model. That’s just a small list of items; join the meeting to learn all the tips and tricks.
Speaker: Patrick LeBlanc is currently a Principal Program Manager at Microsoft and a contributing partner to Guy in a Cube. Along with his 15+ years of experience in IT, he holds a Master of Science degree from Louisiana State University. He is the author and co-author of five SQL Server books. Prior to joining Microsoft, he was awarded the Microsoft MVP award for his contributions to the community. Patrick is a regular speaker at many SQL Server conferences and community events.
Prototypes without pizza: “Power BI Latest” by Teo Lachev


Understanding Power BI Endpoints

In its early days, Power BI introduced an endpoint to support Analyze in Excel (AIXL). Later, the “Power BI datasets” connector relied on this endpoint to support connecting to published datasets. The AIXL endpoint was never intended to support other clients. It has a few limitations: it doesn’t support long-running requests or write operations, and it doesn’t support importing data, as you’ve probably found when connecting to published datasets.

Later, Power BI Premium added the XMLA endpoint to support external clients connecting to datasets in a premium capacity. If you’re on Premium, you should connect external clients through the XMLA endpoint by using the Azure Analysis Services connector. Unfortunately, if you do so in Power BI Desktop, “Get Data” won’t give you the nice UX that shows you which datasets are certified.
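To make this more concrete, here is a minimal M sketch of such a connection, assuming a hypothetical Premium workspace named Sales and a dataset named Adventure Works; the optional DAX query just imports one table:

    // Connect through the XMLA endpoint with the Azure Analysis Services connector.
    // The powerbi:// URL is the XMLA endpoint of the workspace.
    let
        Source = AnalysisServices.Database(
            "powerbi://api.powerbi.com/v1.0/myorg/Sales",   // hypothetical workspace
            "Adventure Works",                              // hypothetical dataset (database) name
            [Query = "EVALUATE DimDate"]                    // optional DAX query to import data
        )
    in
        Source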

It would be nice if Microsoft added a mechanism where the client libraries automatically take the new style of connection string (workspace + dataset name) and redirect to the AIXL endpoint if the dataset is hosted in a shared capacity, or to the XMLA endpoint if the dataset is in Premium. Or, even better, support the XMLA endpoint for both shared and premium capacities.

In summary:

  • Analyze in Excel and Power BI Desktop use AIXL connections.
  • You should never use AIXL connection strings directly.
  • Business users should continue using the “Power BI datasets” connector.
  • Other external clients should use the XMLA endpoint to connect to a dataset in a premium capacity.

Updated Export Settings

A Happy Thanksgiving to all of you! I’m thankful for your interest in and support of my work.

I’m glad that Microsoft has recently updated the Power BI tenant export settings and thus mitigated the complaints I had in my “A False Sense of Data Security” blog. Disabling the “Export to …” settings, which now also applies to paginated reports, no longer disables live connections to published datasets and the features that depend on them, such as connecting to published datasets in Power BI Desktop, accessing the Power BI Premium XMLA endpoint, and Analyze in Excel. Instead, there is a clear distinction, and now there are two separate settings that affect external connectivity (for XMLA connectivity, the XMLA Endpoint capacity setting must be enabled in Read-Only or Read-Write mode):

  • Allow live connections – This is a catch-all setting for allowing the live connectivity features.
  • Allow XMLA endpoints and Analyze in Excel with on-premises datasets — Microsoft felt that there should be a separate setting (besides “Allow live connections”) for connecting to on-prem datasets. However, just like disabling “Allow live connections”, disabling this setting also effectively disables the Power BI Premium XMLA endpoint, preventing other tools, such as Visual Studio or SSMS, from connecting to published datasets.

Power BI Dynamic M Query Parameters – Another Opportunity Missed for DirectQuery Users

UPDATE 17-Feb-2022: Dynamic query parameters now support popular relational data sources, and I posted an updated blog.

I recently had a Power BI advisory engagement for a large ISV. They had many customers, and each customer’s data was hosted in its own database for security reasons, with all databases having an identical schema. Power BI reports had to connect using DirectQuery to avoid refreshing data (real-time BI). Naturally, the customer envisioned a single set of reports with the ability to switch the dataset connection to the respective database depending on the user identity and company association. In addition, they wanted to piggyback on the existing database security policies by having each visual call a stored procedure and pass the user identity as a dynamic parameter.

Can Power BI do this? Unfortunately, no. The main architectural issue is that although the model (VertiPaq) and query (Power Query) layers are bundled in Power BI Desktop (and in a single dataset when published), they are like two separate universes. Naturally, this is not an issue when data is imported, but it becomes important with DirectQuery. As it stands, the only communication between these two layers happens when Power Query passes data back to the model. You can define query parameters, but they are static, meaning that they must be specified on the dataset settings page in Power BI Service. In other words, query parameters can’t reference DAX constructs, such as USERPRINCIPALNAME(), and therefore can’t reference runtime conditions.
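To illustrate what “static” means, here is a minimal M sketch with a hypothetical CustomerDb query parameter (the server and table names are made up as well):

    // CustomerDb is a Power Query text parameter. Its value is fixed per dataset and
    // can be changed only on the dataset settings page in Power BI Service; it can't
    // pick up runtime context such as USERPRINCIPALNAME().
    let
        Source = Sql.Database("myserver.database.windows.net", CustomerDb),
        Sales  = Source{[Schema = "dbo", Item = "FactSales"]}[Data]
    in
        Sales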

That’s why I initially rejoiced when Microsoft announced Dynamic M Query Parameters, a feature that just went into public preview in the Power BI Desktop October 2020 release. Unfortunately, a better name for this feature would have been Kusto Dynamic Parameters, since the primary design intent was to support Azure Data Explorer (internally known as Kusto). It doesn’t apply to DirectQuery connections to most relational databases.

Although there is no indication of this in the UI, Power BI has two types of connectors: native and M-based. Native connectors target the most popular relational data sources: T-SQL (Azure SQL Database, SQL Server, Synapse), PL/SQL (Oracle), Teradata, and relational SAP HANA. Dynamic query parameters won’t work with native connectors. The rest (Microsoft-provided and custom) are M-based. If an M connector supports DirectQuery, it should support dynamic query parameters too. For example, besides Azure Data Explorer, other M-based data sources that support DirectQuery, and therefore dynamic parameters, are Amazon Redshift, Snowflake, and Google BigQuery.
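For reference, here is a minimal sketch of the intended usage against Azure Data Explorer; StateParameter is a hypothetical M parameter that you would bind to a model column (the column’s Advanced properties in the Modeling view), so that slicer selections flow into the query:

    // When the user selects a value from a slicer on the bound column, Power BI
    // re-evaluates the query with the selection. The cluster, database, and table
    // below are the public Kusto samples.
    let
        Source   = AzureDataExplorer.Contents("help", "Samples", "StormEvents"),
        Filtered = Table.SelectRows(Source, each [State] = StateParameter)
    in
        Filtered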

I hope Microsoft will revamp this feature to support the popular relational databases and DAX functions, so we can address more involved DirectQuery scenarios and make parameters truly dynamic.

Stop Using the Power BI Folder Connector

I know that it’s tempting to use the Folder connector to combine multiple files with the same schema and load them in Power BI Desktop with a few clicks – an idea promoted by Microsoft’s own Power BI Dashboard in a Day (DIAD) training.
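For reference, here is a minimal M sketch of that pattern, assuming a hypothetical local folder of same-schema CSV files:

    // The Folder connector pattern: enumerate the files, parse each one, and combine.
    // This works fine on your machine, but note the refresh caveat below.
    let
        Source   = Folder.Files("C:\Data\Sales"),
        CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv"),
        Parsed   = Table.AddColumn(CsvFiles, "Data",
                       each Table.PromoteHeaders(Csv.Document([Content]))),
        Combined = Table.Combine(Parsed[Data])
    in
        Combined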

But what happens if you want to automate the dataset refresh after publishing your model to Power BI Service? Suddenly, things don’t work so well anymore. The only option is to install a personal gateway on your laptop, with all the issues surrounding it.

Instead, I recommend you upload files to Azure Data Lake Storage and use the Power BI Azure Data Lake Storage Gen2 connector. Setting this up will require some assistance from your friendly IT department to set up ADLS and grant you access, but they shouldn’t complain if they are serious about data governance.

Like the Folder connector, the ADLS connector lets you combine multiple files from a folder (see the sketch after this list). Moreover, this deployment option offers several benefits over keeping files locally:

  1. Eliminates gateways.
  2. Allows files to be centralized and organized nicely in folders.
  3. Enables integration scenarios, such as a third-party system pushing files on a schedule.
  4. Provides secured access and supervision by IT.
  5. Supports CDM folders in case you ever want to go there, such as to load files from Power BI dataflows or Dynamics CRM.
  6. It’s highly available, performant, and scalable.
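And here is the sketch referenced above: the same combining pattern over ADLS Gen2, assuming hypothetical storage account, file system, and folder names:

    // Same combining pattern as the Folder connector, but sourced from ADLS Gen2,
    // so Power BI Service can refresh the dataset without a gateway.
    let
        Source   = AzureStorage.DataLake("https://myaccount.dfs.core.windows.net/data/sales"),
        CsvFiles = Table.SelectRows(Source, each Text.EndsWith([Name], ".csv")),
        Parsed   = Table.AddColumn(CsvFiles, "Data",
                       each Table.PromoteHeaders(Csv.Document([Content]))),
        Combined = Table.Combine(Parsed[Data])
    in
        Combined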

Power Query Merging on GUID Columns

Scenario: You are merging two tables in Power Query on GUID columns with the Text data type. In my case, the first table (UserDim) came from an Azure SQL database and the second (systemusers) from Dynamics CRM, but I believe the source is irrelevant. You expand the second table in the step after the merge, select the columns you need, and see the expected results in the preview pane. However, when you apply the changes, the related columns come up empty in the Power BI data model. In other words, the join fails to find matches.

Solution: I think there is some sort of bug in Power Query. True, Power Query is case-sensitive, so joining on text that differs in casing between the two columns would fail to match. In this case, however, the GUIDs were all lowercase (or at least that’s how they appeared in the preview). The only way I can explain the issue is that the refresh somehow converts the GUID values to different casing before the join is performed. Anyhow, the problem was solved by adding an explicit step to convert both GUID columns to lowercase before the merge step.
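Here is a minimal sketch of the workaround; the key column names are hypothetical, and UserDim and systemusers reference the two source queries:

    // Normalize the casing of both GUID key columns, then merge. Power Query
    // comparisons are case-sensitive, so this guarantees the join keys match.
    let
        UsersLower   = Table.TransformColumns(UserDim, {{"UserGuid", Text.Lower, type text}}),
        SystemsLower = Table.TransformColumns(systemusers, {{"systemuserid", Text.Lower, type text}}),
        Merged       = Table.NestedJoin(UsersLower, {"UserGuid"}, SystemsLower, {"systemuserid"},
                           "SystemUsers", JoinKind.LeftOuter),
        Expanded     = Table.ExpandTableColumn(Merged, "SystemUsers", {"fullname"})  // hypothetical column
    in
        Expanded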

Solving Configuration Errors with ADF Self-hosted IR

You’ve set up the Azure Data Factory self-hosted integration runtime to access on-prem data sources. You create a linked service, click Test Connection, and then get greeted with an error saying the security context can’t be passed. On the on-prem VM, you use the Integration Runtime Configuration Manager and get a similar error, or something to the effect that JSON can’t be parsed. You spend a few hours trying everything that comes to mind, such as checking firewalls and testing connectivity from SSMS, but nothing helps.

How do we fix this horrible problem? Double the backslashes in the server name (if you use a named instance) and in the user name (after the domain) in the linked service properties, for example MYSERVER\\MYINSTANCE and DOMAIN\\user. Apparently, Spark/Databricks has an issue with single backslashes.

Atlanta MS BI and Power BI Group Meeting on October 5th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, October 5th, at 6:30 PM. Paco Gonzalez (CEO of SolidQ) will present and demonstrate the Power BI AI and ML capabilities. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation: AI, ML and Power BI
Date: October 5th, 2020
Time: 6:30 – 8:30 PM ET
Place: Join Microsoft Teams Meeting

Learn more about Teams | Meeting options

Overview: Perhaps some of the most powerful features of Power BI involve the Artificial Intelligence and Machine Learning functionality built into the platform. In this session, we will review the AI and ML capabilities built into Power BI Desktop and the Power BI Service. From Data Visualizations to Data Preparation and Data Modeling, we will discuss the various ways that AI and ML can be easily implemented in your reporting.
Speaker: Paco Gonzalez is the CEO of SolidQ North America and a Microsoft Data Platform MVP. Focused on Business Analytics and Artificial Intelligence, he specializes in helping organizations become data-driven from a strategic and technical perspective. Paco is a speaker at small and large conferences such as PASS Summit, Ignite, and Business Applications Summit, and he has published several books and whitepapers. He is based in Atlanta, GA.
Prototypes without pizza: Power BI latest features
