Atlanta MS BI and Power BI Group Meeting on February 3rd

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on February 3rd, Mondayat 6:30 PM at the Microsoft office in Alpharetta. Shabnan Watson will discuss how to apply aggregations to Power BI DirectQuery datasets to improve report performance.  Melissa will sponsor the meeting. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation:Aggregations in Power BI
Date:February 3rd, 2020
Time6:30 – 8:30 PM ET
Place:Microsoft Office (Alpharetta)
8000 Avalon Boulevard Suite 900
Alpharetta, GA 30009
Overview:Aggregations are one of the most important optimization methods for managing big datasets in Power BI. Combined with Direct Query storage mode, they allow big datasets to be analyzed efficiently by answering high level analytical queries quickly from memory while sending more detailed queries back to the source database. In this session, you will learn about the concept of aggregations, different table storage modes in Power BI, different kinds of aggregation tables, how to configure aggregation tables so that they can answer high level user queries, and finally how to use tools such as DAX Studio or Extended Events to determine if the aggregations are actually being used.
Speaker:Shabnam Watson is a Business Intelligence consultant with 18 years of experience developing data warehouse and BI solutions. Her work focus within the Microsoft BI Stack has been on Analysis Services and Power BI. She is an active member of PASS community and has spoken at PASS Summit, PASS SQL Saturdays, PASS Women In Technology Virtual Chapter, and other Local user groups. She is one of the organizers of SQL Saturday Atlanta and SQL Saturday Atlanta BI Edition. She holds a master’s degree in computer science, a bachelor’s degree in Computer Engineering, and a Certified Business Intelligence Professional (CBIP) certification by The Data Warehouse Institute (TDWI).
Sponsor:Bad data is bad business. Melissa helps organizations profile, cleanse and verify, dedupe and enrich all their people data (name, address, email and phone number) and more.  With clean, accurate and up-to-date customer information, organizations can monetize Big Data, improve sales and marketing, reduce costs and drive business insight. https://www.melissa.com/
Prototypes with PizzaTBD

PowerBILogo

Embracing Automated Machine Learning (AutoML)

With the growing demand for predictive analytics, Automated Machine Learning (AutoML) aims to simplify and democratize predictive analytics so business users can create their own predictive models. The promise of AutoML is to bring predictive analytics to business users, just like Power BI democratizes data analytics, Power Apps democratizes app dev, and Power Query democratizes data shaping and transformation.

As a business user, the two most popular options for applying Automated Machine Learning for predictive analytics are Power BI and AzureML. Behind the scenes, Power BI AutoML uses the automated machine learning feature of AzureML but there are differences and I summarize below the most important ones.

Power BI AutoMLAzureML AutoML
LicensingPower BI PremiumAzure ML (Enterprise Edition recommended)
ContainerDataflowExperiment
Power QueryAvailableNot available
Supported data sourcesManyA few (local files, Azure SQL DB, ADLS, and a few more)
ModelNot Accessible (Power BI handles everything)Accessible
Web service endpointNot available outside Power BIAvailable for app integration
ScoringApply the model to entityVarious options (Notebooks, SDK, custom integration)

To me, the best solution would have been the combination of both. I like Power Query for sourcing, shaping and transforming the data, but I also like the flexibility that AzureML brings. Unfortunately, you can’t mix and match. It appears that AzureML has decided to roll out their own data connectivity mechanism and as a result, it supports a limited number of data sources (for example, on-prem data sources are not accessible). Of course, this will probably change soon as the product evolves.

I’ve done recently some work with Azure ML Studio (https://ml.azure.com/), and I’m impressed. Microsoft has learned important lessons from the previous AzureML (now called “classic”) and greatly enhanced the product. If you’re looking for a SaaS ML toolset that targets both business users and data scientists, AzureML should be on the top of your list. Speaking of its AutoML feature, the main advantages that it brings for predictive analytics are:

  • Determining the model type – classification, regression, and time series forecasting (the last one is not available yet in Power BI)
  • Automatic featurization
  • Selecting the best algorithm – For example, the screenshot below shows how AzureML has tested various algorithms and determined that VotingEnsemble performs the best.

Even if you’re a data scientist, the best algorithm selection feature alone justifies giving AutoML a try – if not for anything else but to select the best algorithm so that you don’t have spend enormous time testing different algorithms.

Get Power BI Training at Power Platform World Tour

Register for my full-day academy training at #PowerPlatformWT in Atlanta on Feb 10th for only $599 and learn how Power BI can bring your data to life!

011520_2018_GetPowerBIT1.png

Stop Using JavaScript for Data Security

As the poet said, “let me count the ways”. I’ve been amazed by the extent some organizations would go into inconveniencing their employees in the pursuit of better security while they do such dumb things. In one case, an organization would automatically log out login sessions after a certain period of inactivity, allow employees to access Power BI only on company’s approved devices, and prevent developers from accessing servers. At the same time, the same organization would configure an Internet proxy server for Basic authentication where user passwords are transmitted in plain text (not even https)! All the hacker must do is to plug into the corporate network and intercept the login passwords.

Yet, another organization would require developers to tunnel into a VDI environment, from where they will tunnel one more time into an Azure VM, before they can access Azure resources, such as an Azure SQL DB. Wasn’t the Cloud supposed to make things easier? Yet, developers within the same organization would use JavaScript to apply data security. Since this is the second time I see this, please repeat after me.

Thou shall not use the Power BI Embedded JavaScript APIs to apply data security.

In this case, the developer has decided to pass filters to the Power BI Embedded configuration object (notice the filters property) to restrict the data. The problem is that JavaScript code is not secure. This code will run on the client side and in the user’s browser session. All it takes is to put the browser in a debug mode and nuke the filters.

The Power BI Embedded JavaScript APIs were never meant as a replacement of data security (RLS). Instead, the scenario here is to let the developer provide an alternative filtering UI that could replace the default Power BI filter pane. Defaulting a filter, such as a Date filter, is OK since overwriting the filter is not a big deal. However, setting up a hidden filter with the customer identifier to limit data to that customer is not. This code must execute on the server.

Power BI Pro Storage Quota

Although Power BI has been evolving for almost five years now, basic concepts are sometimes worth revisiting. Recently, I had a discussion regarding the Power BI Pro storage quota on the Power BI MVP list and I want to share the conclusions confirmed by Microsoft.

For workspaces in shared capacity licensed with Power BI Pro (not a workspace in a Premium capacity):

  1. There is a per-workspace storage limit of 10 GB. So, My Workspace gets 10 GB and so does any org workspace.
  2. There is also an unofficial cross-workspace aggregate quota of 10 GB * the number of Pro User Licenses intended as a backstop to prevent abuse so that a Pro user doesn’t keep on indefinitely creating workspaces to get new chunks of 10 GB. So, if you have 50 Power BI Pro users, the aggregate cross-workspace storage quota would be 500 GB irrespective if only one or multiple Pro users contribute. You won’t see the cross-workspace quota in the Power BI Service UI and it’s not exposed through service code.

BTW, you should ignore the “Manage data storage in Power BI workspaces” document until it’s been updated (the current timestamp is 12/20/2018). As it stands, this document contains wrong and incomplete information. For example, sharing datasets, reports, dashboards should have no effect on the workspace storage quota for consumers.

Atlanta MS BI and Power BI Group Meeting on January 6th

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on January 6th, Mondayat 6:30 PM at the Microsoft office in Alpharetta. I’ll introduce to Power BI Premium Automated Machine Learning (AutoML). Prologika will sponsor the meeting. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation:Power BI Automated Machine Learning (AutoML)
Date:January 6th, 2020
Time6:30 – 8:30 PM ET
Place:Microsoft Office (Alpharetta)
8000 Avalon Boulevard Suite 900
Alpharetta, GA 30009
Overview:With the growing demand for predictive analytics, Automated Machine Learning (AutoML) aims to simplify this process and democratize Machine Learning so business users can create their own basic predictive models. Join this presentation to learn how to apply AutoML in Power BI Premium to predict the customer probability to purchase a product. I’ll show you the end-to-end AutoML process, including:

·       Create a dataflow

·       Choose a field to predict

·       Choose a model type

·       Select input variables (features)

·       Train the model

·       Apply the model to new data

·       Bonus: Integrate Power BI with AzureML

Speaker:Through his Atlanta-based company Prologika (https://prologika.com), a Microsoft Gold Partner in Data Analytics, Teo Lachev helps organizations make sense of their most valuable asset: their data. His strategy formulation, trusted advisory and mentoring, design and implementation services empower clients to apply effectively data analytics in order to understand, improve, and transform their business processes. Teo has authored and co-authored several books on organizational and self-service data analytics, and he has been leading the Atlanta Microsoft BI and Power BI group since he founded it in 2010. Teo has been a Microsoft Most Valued Professional (MVP) Data Platform since 2004.
Sponsor:Prologika is one of the most trusted names in Data Analytics. Our clients, from small businesses to Fortune 100 enterprises, derive tremendous value from our services. Our mission is to help organizations make sense of data by applying the latest technologies for descriptive and predictive analytics and get actionable insights. Your organization will spend less time mining for information and be better equipped to make sound business decisions. https://prologika.com

PowerBILogo

Applied Power BI Book (5th Edition)

I’m excited to announce the fifth edition of my Applied Microsoft Power BI book! When the first edition was published  in January 2016, it was the first Power BI book at that time and it had less than 300 pages.  Since then, I helped many companies adopt or transition to Power BI and taught hundreds of students. It’s been a great experience to witness the momentum surrounding Power BI and how the tool has matured over time. As a result, the book also got thicker and it now stands at 528 pages. However, I believe what’s more important is that this book provides systematic, yet dependent, view by showing what Power BI can do for four types of users (business users, analysts, pros, and developers).

To my understanding, this is the only Power BI book that gets annual revisions to keep it up to date with this ever changing technology! Because I had to draw a line somewhere, Applied Microsoft Power BI (5th edition) covers all features that are in preview or released by December 2019. As with my previous books, I’m committed to help my readers with book-related questions and welcome all feedback on the book discussion forum on the book page. While you are there, feel free to check out the book resources (sample chapter, front matter, and more). I also encourage you to follow my blog at https://prologika.com/blog and subscribing to my newsletter at https://prologika.com to stay on the Power BI latest.

Bring your data to life for the forth time! Keep on reading, learning, and Happy New Year!

apbi5

Prologika Newsletter Winter 2019

Happy Holidays! I hope you’re enjoying this special time of the year. A few months ago, I did an assessment for a large company that was advised by an undisclosed source that they should use their Dynamics Financials and Operations (F&O) system as a data warehouse. Recently, I came across a similar wish but this time to use SAP as EDW. Can we do this? But before I give you my opinion, I’m excited to announce the availability of the fifth edition of my “Applied Microsoft Power BI” book – the only Power BI book that it’s updated every year to keep it up with the ever-changing world of Power BI and the Microsoft Data Platform! The book is making slowly its way to the retailers and it should be available on Amazon in the first days of 2020.

Operational Reporting

I understand that everyone wants to do more with less and shortcuts are tempting. But ERP systems are systems of record, just like any other data source. True, they could own most of the core data that you need for analytics. But that data is normalized and stored in a format that’s not conducive for analytics. To make things worse, Dynamics doesn’t even give you direct access to its SQL Server database on your production instance. You must go through REST APIs or export data to gain access to it. And to add new tables, you must create entities in Visual Studio! Still want to build a data warehouse in Dynamics?

ERP systems typically have some reporting features, but these features typically deliver only operational reporting. Operational reporting has a narrow view concerned with “now”, such as a report that shows customers with outstanding balances as of today. For example, Dynamics comes with standard SSRS reports. You could also enable analytical workspaces that deliver reports via Power BI Embedded. These reports, however, are operational reports. By contrast, BI is mostly concerned with historical and trend analysis.

BI Axioms

In math, axioms are statements that are assumed to be correct without a proof. We need BI axioms and the list can start like this:

  • Every mid to large company shall have a centralized data repository for consolidating trusted data that is optimized for reporting. The necessity for such a repository is in a direct correlation with the number of the data sources that must be integrated (that number will increase over time) and the complexity of the data transformation. The centralized data repository is commonly referred to as a data warehouse.
  • Dimensional modeling shall be the methodology to design the data warehouse schema. I know it’s tempting to declare your ODS as a data warehouse, but highly normalized schemas are not suitable for reporting.
  • If you’re after a single version of the truth, you shall have an organizational semantic layer. Find why you need it in my “WHY SEMANTIC LAYER?” newsletter.
  • ERP systems are not a replacement for a data warehouse. Neither are data lakes and Big Data.
  • You shall have both organizational and self-service BI, and they should complement each other. If you lean too much toward organization BI, you’ll get a backlog of requirements. If you lean too much toward self-service BI, you’ll end up with fragmented “spreadmarts”, which is where you probably started.
  • Most of the BI effort shall go toward organizational BI to integrate data, improve data quality, and centralize business calculations. Tools come and go but this effort shall endure.
  • Agile and managed self-service BI shall fill in the gaps. It should provide a feedback loop to extend organizational BI with data that the entire organization can benefit from.

 


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Gold Partner | Data Analytics

logo

Power Platform World Tour

Organized by Microsoft and Dynamic Communities, the Power Platform World Tour will take a place in Atlanta from 2/10-2/12, 2020. I’m teaching Power BI Dashboard in a Day (DIAD) on Feb 10 for a full day. Although this is a paid event ($599), you should get a great business value as the audience will probably be smaller and I’ll be able to provide more personal attention. Then, I’ll present “Bridge Analytics and Developer Worlds with Power Platform” on Feb 12 and show how Power BI can integrate with Power Apps to allow you to change the data behind a report.

Power BI Incremental Refresh

Power BI incremental refresh (a Power BI Premium feature) refreshes a subset of a table with imported data. The main goal is to reduce the refresh time so that new data becomes available online faster. Patrick LeBlanc has a great video about how to make the incremental refresh even more incremental by using the “Detect data changes” feature and he explains in detail how it works.

What if you want to fully refresh the dataset set up for incremental refresh? For example, you configure a table for incremental refresh periodically, but you want to fully process the dataset nightly, such as to pick the latest changes to dimensions. Currently, the only option to fully refresh the dataset with an incremental refresh policy is to republish the dataset and refresh it (this works because the first refresh is always full). When the XMLA endpoint becomes writeable, you’ll have the option to do so in an XMLA script. For example, the following script fully refreshes the InternetSales table without applying the refresh policy settings. Notice also the effectiveDate setting that allows you to overwrite the current date for testing purposes.

{

“refresh”: {

“type”: “full“,

applyRefreshPolicy“: false,

“effectiveDate”: “10/24/2019”,

“objects”: [

{

“database”: “AdventureWorks”,

“table”: “InternetSales”

} ] } }