Posts

T-SQL, Power Query, or DAX?

I taught my Applied Power BI class last week to a group of smart data analysts. All of them were knowledgeable of T-SQL, which they have been using extensively for years to shape and transform the data and produce SSRS, Qlik and now Power BI reports. They haven’t previously used Power Query, but they liked my overview of its features. However, they were rightfully confused when to use T-SQL, Power Query, or DAX. Starting with the data source and moving up the chain, Power BI lets you use expressions in:

  1. Data source (custom SQL Query, SQL view)
  2. Power Query
  3. DAX calculated columns
  4. DAX measures (the uppermost level)

My advice is to invest time and learn all these capabilities. When you have diverse skills, you can use the best tool for the task at hand.

Although is hard to generalize as every data transformation is different, I recommend you shape the data as upstream (closer to the data source) as possible. So, if you are familiar with T-SQL and that’s your primary data source, then use T-SQL. As one of the Microsoft best products, SQL Server has been enjoying 30 years of continuous improvements. Not only can you apply the skills you already have, but you will gain in performance when you use T-SQL and delegate data crunching to SQL Server which is what it’s designed to do. You can also benefit from the rich data manipulation features of T-SQL. These are the same reasons why I favor the ELT (Extract, Load and Transform) pattern in organizational BI solutions instead of SSIS data flow transforms. And if you find T-SQL lacking in features, Power Query can supplement it nicely, such as to fill down missing values, quickly unpivot data, or apply fuzzy lookup.

When you connect to data sources that don’t support SQL, such as flat files or Excel, the next natural place is Power Query to shape and transform the data. The choice between Power Query and DAX calculated columns is a tricky one and typically involves a compromise between performance and skill set. In many cases, you’ll find that a custom column can be implemented both in Power Query and DAX calculated columns. In general, if performance is OK, use Power Query, which is further upstream than DAX calculated columns. As a bonus, your data model will see the custom columns as regular columns and compress them equally well. Consider DAX calculated columns (one level up Power Query) when:

  • You must use DAX features that Power Query lacks, such as ranking.
  • When Power Query transforms lead to long data refresh times, such as a lookup between two large tables.

Finally, while custom columns can be often implemented in any of the first three layers, DAX measures are unique, and they typically can be implemented in DAX only. For example, if you need the expression to reflect the end user filter selection, you must use a DAX measure because only DAX measures are evaluated at run time and can access runtime conditions, such as values from filters and slicers.

Atlanta MS BI and Power BI Group Meeting on January 7th

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on January 7th, Monday at 6:30 PM at the Microsoft office in Alpharetta. The main presentation will be “Predictive Analytics with Power BI”. I’ll walk you through the existing Power BI features for predictive analytics. Mark Tabladillo will show you exiting AI features that are coming up soon. CDATA will sponsor the event and do us a 15-min demo of their Power BI connector. For more details, visit our group page and don’t forget to RSVP (use the RSVP survey on the main page) if you’re planning to attend.

Presentation:Predictive Analytics with Power BI
Date:January 7, 2019, Monday
Time6:30 – 8:30 PM ET
Place:Microsoft Office (Alpharetta)

8000 Avalon Boulevard Suite 900
Alpharetta, GA 30009

Overview:Predictive analytics, also known as data mining, machine learning, and artificial intelligence (AI), is an increasingly popular requirement. Fortunately, you don’t have to be a data scientist to benefit from machine learning in Power BI. This session is organized in two parts:

  • Part 1 – We will revisit the existing machine learning features in Power BI, including Quick Insights, Explain Increase/Decrease, linear forecasting, integration with R and Python.
  • Part 2 – We will look at exciting new features that are coming up in near future, such as integration with Azure ML and Cognitive Services, and automated machine learning.
Speaker:Mark Tabladillo Ph.D. is a data scientist at Microsoft. His career has focused on industry application of advanced analytics, using a variety of analytics tools including SAS, SQL Server Analysis Services, Cortana Intelligence (including Microsoft R Server and Microsoft Machine Learning Services), R, and Python. He was a founding member of the Atlanta Microsoft BI User’s Group in 2010.

Through his Atlanta-based company Prologika (https://prologika.com), a Microsoft Gold Partner in Data Analytics, Teo Lachev helps organizations make sense of their most valuable asset: their data. Teo has authored and co-authored several bestselling books on organizational and self-service data analytics, and he has been leading the Atlanta Microsoft BI and Power BI group since he founded it in 2010. Teo has been a Microsoft Most Valued Professional (MVP) Data Platform since 2010.

Sponsor:CData Software is a leading provider of data access and connectivity solutions. We specialize in the development of Drivers and data access technologies for real-time access to on-line or on-premise applications, databases, and Web APIs. Our drivers are universally accessible, providing access to data through established data standards and application platforms such as ODBC, JDBC, ADO.NET, OData, SSIS, BizTalk, Excel, etc.

https://cdata.com

Prototypes with Pizza“TBD”


092417_1708_AtlantaMSBI1.png

Power BI External Users and Data Security

Power BI lets you share content directly with external users for B2B and B2C scenarios. When the benefits of this sharing option outweigh its limitations (read-only reports, requiring Azure AD federated access, per-user licensing, rendering the report inside Power BI), this is the easiest way to share Power BI content with an external party. However, the documentation, which is otherwise excellent, doesn’t explain the steps required to allow the external user to see only a subset of data when you have a dataset configured for data (RLS) security.

Granting access to an external user to a dataset configured for data security is like the proverbial chicken and egg problem. To grant access to the report, you need to share the report with the user, but you can’t add the user to the security role because the user is not provisioned yet. Hence, you’ll first share a non-sensitive report with the user before you share the actual report.

  1. Create an organizational workspace. I recommend you create a v2 workspace for the benefits I outlined in my “Power BI Sharing Is Getting Better” blog.
  2. In Power BI Desktop, import some sample data that is OK for the end user to see. Create a simple report for testing. A dataset is required because for some obscure reason, Power BI will prevent you from sharing an empty report that doesn’t have a dataset. Publish the Power BI Desktop file to powerbi.com.
  3. Share the non-sensitive report with the external user. At this point, if you examine the user account in Azure Active Directory in the Azure Portal (portal.azure.com), it will show that the user is invited.
  4. When the user gets the email, clicks the link, and navigates the prompts, the user will be eventually added to your tenant’s Active Directory and will show as Guest in the External Azure Active Directory source.
  5. Now you can assign a Power BI Pro license to this user in the Azure Portal. Remember that unless you are on Power BI Premium, any form of sharing requires a Power BI Pro license. One gotcha here is that the Azure Portal might refuse assigning a Power BI Pro license with the following error “License cannot be assigned to a user without a usage location specified.” To fix this horrible issue, go to Azure Active Directory (Users tab), click the external user, click Edit in the Settings section in the user profile, and then use the “Usage location” drop-down to select the country where the user is located.
  6. At this point, the external user is added to your tenant’s active directory and covered by Power BI Pro license. Next, publish the actual Power BI Desktop file with the sensitive report you want to share. This file should have a role that applies a row filter to one or more tables to enforce data security in a multi-tenant dataset. Once the file is published to powerbi.com, go to the dataset Security setting and add the external user to the role by typing the user’s email.
  7. The last step is to share the report with the external user.

    TIP: If you have a lot of external users, consider assigning them to a security group so you can grant access to the group instead of individual emails. Also, instead of sharing individual reports and dashboards, use Apps to grant access to the security groups. Note that as I explained in the “Power BI Sharing Is Getting Better” blog, currently you can’t assign viewers as members of the workspace. You must use either dashboard/report sharing or apps.

Expression-based Formatting in Power BI

If you have used SSRS, you know that paginated reports are very customizable, and you can make almost any property expression-based. Power BI is yet to deliver expression-based properties to change settings based on runtime condition, such as to change the font style based on the actual value or user selection. Currently, there are two places where you can use a DAX measure for expression-based formatting of colors:

Conditional formatting

You can use a DAX measure to change the fore color or background color when you apply conditional formatting to cells in Table and Matrix reports. To do so, when you configure the conditional formatting settings, choose the Field Value option. In the following report, I used a DAX measure to format the TaxAmount field with the following formula:

TaxColor = SWITCH(TRUE(), SUM(ResellerSales[TaxAmt])> 100000, “Red”, SUM(ResellerSales[TaxAmt])>50000,”Yellow”, SUM(ResellerSales[TaxAmt])>10000,”Green”, BLANK())

121518_2030_Expressionb1.png

Although the measure defines three bands of colors, which you can do by just using rules, it can also check additional runtime conditions, such as what field is used on the report or what value the user has selected in a slicer. You can also specify custom colors by using the hex triplet notation, such as #00AA09.

Chart data colors

Some charts with single data series, such as column and bar charts, support “Advanced controls” in the “Data colors” section of the Format tab in the Visualizations pane. This link brings you to the same window that you use to configure conditional formatting and you can use the “Field value” option to choose a DAX measure.

121518_2030_Expressionb2.png

Atlanta MS BI and Power BI Group Meeting on December 3th

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on December 3th, Tuesday at 6:30 PM at the Microsoft office in Alpharetta. I will discuss Power BI dataflows. A3 will sponsor the event. For more details, visit our group page and don’t forget to RSVP (use the RSVP survey on the main page) if you’re planning to attend.

Presentation:Understanding Power BI Dataflows
Date:December 3, 2018, Monday
Time6:30 – 8:30 PM ET
Place:Microsoft Office (Alpharetta)

8000 Avalon Boulevard Suite 900
Alpharetta, GA 30009

Overview:If you’re a business user, you might know about Microsoft Flow. If you’re a BI pro, you might have used the SSIS Control Flow and Data Flow. But do you know about the Power BI dataflow? Data preparation is often the most labor-intensive component of an analytics project and getting this right is vital if the results are to be accurate. While the growth of self-service BI has empowered users to answer important questions, ensuring well managed data is available to all employees remains one of businesses biggest challenges. Requiring no additional cost, dataflows are a building block of the Power BI ecosystem, providing a no-code/low-code approach using Power Query to create curated datasets that can be easily consumed by business analysts. Attend this presentation and you’ll learn:

  • How to stage data with dataflows
  • How dataflows can be used to create curated datasets
  • How dataflows can be created with the click of a button, ingesting data from many well knows SAAS solution, such as Dynamics and Salesforce
  • More advanced integration scenarios with Power BI Premium
Speaker:Through his Atlanta-based company Prologika (https://prologika.com), a Microsoft Gold Partner in Data Analytics, Teo Lachev helps organizations make sense of their most valuable asset: their data. His strategy formulation, trusted advisory and mentoring, design and implementation services empower clients to apply effectively data analytics in order to understand, improve, and transform their business processes. Teo has authored and co-authored several bestselling books on organizational and self-service data analytics, and he has been leading the Atlanta Microsoft BI and Power BI group since he founded it in 2010. Teo has been a Microsoft Most Valued Professional (MVP) Data Platform since 2004.
Sponsor:A3 Solutions is the developer of A3 Modeling. Empower your finance team with the only tool that supercharges end-user Excel into Enterprise Excel. Why incur the risk and cost to replace your Excel models? Just add A3! The San Francisco-based company’s product is in use at some of the world’s largest manufacturers, financial firms and retailers including Honda Manufacturing, Williams Sonoma, T.Rowe Price, and Fox TV, in addition to many mid-sized organizations. http://www.a3solutions.com/
Prototypes with Pizza“New relationship view, paginated reports, and filter pane” by Teo Lachev


092417_1708_AtlantaMSBI1.png

Dataflows: The Good, the Bad, the Ugly

This blog is an update of my previous blog “Common Data Service for Analytics: The Good, the Bad, the Ugly“. Things have changed in the past 6 months. The Common Data Service for Analytics is no more. It got superseded by Power BI dataflows, which Microsoft officially introduced at the PASS Summit last week.

Think of dataflows as “Power Query in the Cloud”. Dataflows is to self-service BI what ETL is to Power BI pros.

A dataflow is a collection of Power Query queries, also known as entities, and it lives in a Power BI organizational workspace. For example, if I want to stage data from Salesforce, I can create a dataflow with as many entities as Salesforce tables I want to stage. Unlike Power Query, which can only output to a table in a data model, a dataflow saves its output as a pair of a CSV file (data) and JSON file (metadata) in Azure Data Lake Store (ADLS).

111718_0237_DataflowsTh1.png

The Good

There is a lot to like about dataflows. I can think of two primary self-service scenarios that can benefit from dataflows:

  • Data staging – Many organizations implement operational data stores (ODS) and staging databases before the data is processed and loaded in a data warehouse. As a business user, you can use data-flows for a similar purpose. For example, one of our clients is a large insurance company that uses Microsoft Dynamics 365 for customer relationship management. Various data analysts create data models from the same CRM data, but they find that refreshing the CRM data is time consuming. Instead, they can create a dataflow to stage some CRM entities before importing them in Power BI Desktop. Even better, you could import the staged CRM data into a single dataset or in an organizational semantic model to multiple data copies and duplicating business logic.
  • Certified datasets – One way to improve data quality and promote better self-service BI is to prepare a set of certified common entities, such as Organization, Product, and Vendor. A data steward can be responsible for designing and managing these entities. Once in place, data analysts can import the certified entities in their data models.

There is a solid architecture, vision, and investments behind dataflows. You create dataflows using a tool that you’re already familiar with: Power Query. If you know Power Query, you know dataflows. Microsoft has provided the data lake storage and pricing is included in Power BI Pro/Power BI Premium. So, you have everything you need to get started with dataflows today. Ingesting the dataflow output is very fast too because you import CSV files.

The Bad

Let’s start with positioning. I’ve heard Microsoft position dataflows for any data integration task, from data staging to loading data warehouses and even replacing data warehouses and ETL (heard that vibe before?) I’ve seen business users doing impressive things with Power BI. But I’ve seen them also attempting to implement organizational solutions that collapse from their own weight. Dataflows are not an exception and I don’t think it’s a business user’s job to tackle ambitious data integration tasks.

You need Power BI Premium to realize the full potential of dataflows. For some obscure reason, features like linked and computed entities are not available in Power BI Pro. I wonder how many customers will feel pushed to go Power BI Premium as more and more features are only available there. I don’t mind scalability and performance related features but incremental refresh and linked entities?

The CSV output is both a blessing and a curse. True, ingestion is fast but text files are not a relational database. For example, the only option is to import the dataflow output in Power BI as DirectQuery is not available for text files.

The Ugly

You (almost) can’t access directly the output generated by dataflows. This precludes the continuum between self-service and organizational worlds. For example, IT might need to import a certified dataset into a data warehouse. But you can only connect to a dataflow in Power BI Desktop because the CSV files are not accessible. Power BI Premium will let you bring your own data lake storage in near future but a better option in my opinion was to provide access to the raw data in both Power BI Pro and Premium given that this is your data lake and your data.

Atlanta MS BI and Power BI Group Meeting on October 30

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on October 30th, Tuesday at 6:30 PM at the Microsoft office in Alpharetta. Neal Waterstreet will tackle the important and pervasive issue of bad data quality. ZAP BI will sponsor the event. And your humble correspondent will introduce you to Power BI Home and data profiling. For more details, visit our group page and don’t forget to RSVP (use the RSVP survey on the main page) if you’re planning to attend.

Presentation:Data Quality – Plain and Simple
Date:October 30, 2018, Tuesday
Time6:30 – 8:30 PM ET
Place:Microsoft Office (Alpharetta)

8000 Avalon Boulevard Suite 900
Alpharetta, GA 30009

 

Overview:Data quality is a subject that comes up repeatedly in many organizations. Most executives are concerned about the quality of the information used in their decisions. We talk about “good data” and “bad data” but what do those terms mean? In this presentation, we will first define what data quality is and look at how to measure it with data quality dimensions. Then, we’ll explore common causes for data quality issues and how to perform a data quality assessment. Finally, we’ll review the results and discuss some strategies and tools that can help improve the quality of the data in your organization.
Speaker:Neal Waterstreet is a BI Architect/Consultant with Prologika (https://prologika.com) ad he has more than 20 years of industry experience. Neal is skilled in the entire BI spectrum, including dimensional modeling, ETL design and development using Integration Services (SSIS), designing and developing multidimensional cubes and Tabular models using Analysis Services (SSAS) and Master Data Management using Microsoft Data Services (MDS). He’s also involved with the database community with the Atlanta BI and Power BI User Group and the Atlanta Modern Excel User Group.
Sponsor:ZAP’s mission is clear: to connect your business with data. Game-changing insight that impacts business performance only happens when you analyze data by business process or team objective, as opposed to by file type or IT system. This is what ZAP enables. https://www.zapbi.com/
Prototypes with Pizza“Power BI Home and data profiling” by Teo Lachev


092417_1708_AtlantaMSBI1.png

Power BI Release Notes

Want to know what Power BI features are in the works and when they will be released? My “Power BI Features Report” showed you how to find what features were released over time so it’s retrospective. On the other hand, Business Applications Release Notes are forward looking. For example, the October release notes for BI go all the way to March 2019. The release notes are for all business apps (not just Power BI): Dynamics, BI, PowerApps, Flow, AI, and others. There is also a change log.

Power BI Data Profiling

You know it and I know it. Data quality is a BIG problem that reduces the business value of BI. ETL practitioners will probably recall that SSIS includes a comprehensive Data Profiling Task but it is somewhat difficult to set up, especially if you wanted to profile multiple tables. It saves the results in an xml file and then you could use the Data Profile Viewer to visualize the results.

101618_0209_PowerBIData1.gif

Can we do something like this in Power BI? Starting with Power BI Desktop (October 2018) release you can. Well, sort of. Once you enable the column profiling preview feature, open the query behind any table and enable these options in the View ribbon.

101618_0209_PowerBIData2.png

You’ll get basic statistics showing the percentages of Valid, Error, and Empty values out of the sample size (the first 1,000 values). Here are definitions of these categories:

  • Valid – Non-Error and Non-Empty values out of the sample size
  • Error – Values with errors
  • Empty – Empty values

Below you get a column chart (not shown in the screenshot) that shows the distribution of the sample size, but unfortunately it doesn’t show the actual values. That’s all data profiling you get for now. Here is what it will take to make Power BI data profiling a killer feature:

  1. Allow data profiling over all the values (understandably there will be performance impact).
  2. Add more aggregates, such as Min/Max/Std/Median.
  3. The ability to dynamically filter the preview data for the selected bar in the profile.

Power BI Features Report

Want to know what Power BI features were released in a certain time period? Check out the Power BI Features report. After some delay, you should see the report embedded on the page but please be patient. If no patience, you can also download the pbix file from the same page. Then, use the slicer on the first page to filter your date range. In the “Count of Category” bar chart, right-click the category and then click See Records to see to the actual features. Once you drill through the category, there is a link next to each feature that redirects you to the corresponding blog to learn more.

pbifeatures