Power BI Large Datasets: The Good, the Bad, and the Ugly

At Ignite 2019 Microsoft announced the public preview of large datasets in Power BI Premium. This is a significant milestone as now datasets can grow up to the capacity’s maximum memory (previously, the max size was 10 GB with P3 plan), thus opening the possibility of deploying organizational semantic models to Power BI. I consider this feature mostly suitable for organizational BI as I don’t imagine business users dealing with such large data volumes. I tested large datasets during its private preview, and I’d like to share some notes.

The Good

Today, BI developers can deploy organizational semantic models to three Analysis Services Tabular SKUs: SQL Server Analysis Services, Azure Analysis Services, and now Power BI Premium. SQL Server Analysis Services is the Microsoft on-prem offering and it aligns with the SQL Server release schedule. Traditionally, Azure Analysis Services has been the choice for cloud (PaaS) deployments. However, caught in the middle between SQL Server and Power BI, the AAS future is now uncertain given that Microsoft wants to make Power BI as your one-stop destination for all your BI needs. From a strategic perspective, it makes sense to consider Power BI Premium for deploying organizational semantic models because of the following main benefits:

  • Always on the latest – Both AAS and SQL Server lag in features compared to Power BI Premium. For example, composite models and aggregations are not in SQL Server 2019 and Azure Analysis Services. By deploying to Power BI, which is also powered by Analysis Services Tabular, your models will always be on the latest and greatest.
  • Feature parity – As I explain in my “Power BI Feature Discrepancies for Data Acquisition” blog, some Power BI features, such as Quick Insights, Explain Increase/Decrease, Power Query, are not supported with live connections to Analysis Services. By hosting your models in Power BI Premiums, these features are now supported because Power BI owns the data, just like you import data in Power BI Desktop and then publish the model.

The Bad

As a Power BI Premium feature, large datasets will require planning and purchasing a premium capacity. Given that you need at least twice the memory to fully process a model (less memory should be required if you process incrementally), you must size accordingly. For example, a 15 GB model would require at least 30 GB of memory to fully process, bringing you into the P2 plan territory. Memory is the most important constraint for Tabular. Unlike SQL Server, which doesn’t license by memory (you can add as much memory you like without paying a dime more in licensing fees), Power BI Power BI Premium plans cap the capacity memory. So, you’ll end up having a dedicated P1 or P2 plan for hosting your organizational semantic model, and another P plan(s) for self-service BI.

I’d like to see elastic scaling happening to Power BI Premium at some point in future. Instead of boxing me into a specific plan, which more than likely will be underutilized, I’d like to see Power BI Premium scaling up and down on demand. This should help lowering the cost.

The Ugly

The lack of DevOps in Power BI Premium will put another hole into your budget. Unlike SQL Server, where you pay only for production use, no special DEV or QA environments and licensing options exist in Power BI Premium. So, you must plan for additional premium capacities, such as for three separate capacities: PROD, DEV, and QA (I know of organizations that need many more DevOps environments). At this price point, even large organizations will reconsider the licensing cost of hosting their models in Power BI. How about leaving QA and DEV on prem? This would require coding for the least common denominator which defeats the benefit of deploying to Power BI Premium. You can get innovative and attempt to reduce licensing cost by purchasing Azure A plans for DEV and QA and stopping the A capacities when they are not in use, but I wonder how many organizations will be willing to go through the pain of doing this. The Cloud should make things easier, right?

Large datasets will open another deployment option for hosting organizational semantic models. This might be an attractive option for some organizations and ISVs. Others will find that staying on-prem could lower their licensing cost. Once the Power BI Premium XMLA endpoint supports write operations (promised for December 2019 in the roadmap), BI developers can use a tool of their choice, such as Tabular Editor or Visual Studio (I personally find Power BI Desktop not suitable for organizational model development, mainly because of its slow performance, lack of source control and extensibility) to develop and deploy semantic models that are always on the latest features and unifying BI on a single platform: Power BI.

SQL Server 2019 Installation Woes

Now that SQL Server 2019 is officially here, I was eager to try it out. My upgrade/reinstall experience ran into several issues that I thought might be worth sharing:

  1. Upgrade/reinstall fails with “An error occurred for a dependency of the feature causing the setup process for the feature to fail.” With no indication what dependency failed. I solved it by extracting the iso file into a folder instead of mounting it.
  2. Analysis Services Tabular fails to start with “An error occurred when loading the ‘ASSP’, from the file, ‘\\?\D:\MSSQLSERVER\SSAS\Data\ASSP.0.asm.xml’.” This error was caused by changing the Analysis Services Tabular default folders to another drive. It could be related to my setup, such as permissions granted to that drive. I fixed it by leaving the SQL Server and Analysis Services default folders.
  3. The Polybase services can’t start and show perpetually “Starting” in the Windows Services applet. Consequently, the SQL Server Database Engine can’t be stopped. This was solved by enabling the TCP/IP protocol in the SQL Server Configuration Manager.
  4. Still an open issue although not a blocker – Any right-click action in SSMS connected to Analysis Services Tabular results in a significant delay.

Predict This!

My wife bought a pack of replacement water filters from Amazon. It was tagged as “Amazon’s choice”. The product listing showed the manufacturer name and it had a nice product photo advertising genuine filters. Except that there were all fake, which we discovered quickly by the output water pressure. The water coming out of a filter should have lower pressure while there was no difference with the “genuine” filter as though there was no filtering going on at all. And the water had a bad aftertaste. So, we call the manufacturer. They compared the batch number from the package (manufactured in China) and found it fake. That filter could have had some Chinese poison in it and Amazon would have sold it under “Amazon’s choice”. BTW, when we reported this to Amazon, the product was listed under a different seller.

There is a lot of noise (mostly vendor-induced propaganda as previously with Big Data) around AI, ML, and other catchy flavors of predictive analytics. I have a lot of respect for Amazon and I’m sure they know a lot about ML. Yet, fake products sneak in undetected and bad people find ways to cheat the system. In fact, I don’t think that advanced analytics is needed to solve this problem. All Amazon has to do is let big name manufactures register their approved resellers on the Amazon website. If the seller is not on the list, Amazon could flash a big juicy warning for the buyer. This would be a win-win for Amazon by improving their credibility as a retailer, manufacturers, and buyers. BTW, many manufacturers don’t sell on Amazon because of the exact same reason: counterfeited products that harm the brand.

But Amazon, whose platform’s main goal appears to be making as much money as possible by “democratizing” the retail industry, doesn’t do this. I might not know much about business, but I know that trust is paramount. When corporate greed takes over and their platform tags fake products as “Amazon’s choice”, where is that trust? BTW, I was told by the Amazon’s rep is that I could have a better confidence that a product is genuine if it says that it’s “sold and distributed by Amazon”. Which makes me skeptical because it means that their warehouses must be divided in two areas: one for sold and distributed by Amazon that stores genuine products supposedly procured by Amazon and another for sold by someone else but distributed by Amazon. Somehow, I doubt they do this.

So, Amazon, apply ML to predict the probability of buying a genuine product and show me the confidence score. Perhaps, I’ll buy more…

Atlanta MS BI and Power BI Group Meeting on November 4th

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on November 4, Mondayat 6:30 PM at the Microsoft office in Alpharetta. Andy Lawrence will share best practices for impactful Power BI Dashboards. CCG Analytics will sponsor the meeting. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation:Best Practices for Impactful Power BI Dashboards
Date:November 4, 2019
Time6:30 – 8:30 PM ET
Place:Microsoft Office (Alpharetta)
8000 Avalon Boulevard Suite 900
Alpharetta, GA 30009
Overview:Power BI is gaining momentum as a preferred tool for dashboards and interactive reports. Let’s revisit some best practices for dashboard development, such as:

·       The importance of form and function

·       Facilitating user adoption

·       Mistakes that everyone makes

·       Fast shortcuts for clean reports

·       Hidden settings that are lifesavers

·       Best Power BI updates of 2019

·       Live demo of a fast dashboard build

·       Questions

Speaker:Andy Lawrence is a senior Power BI consultant at CCG Analytics and the leader of the Tampa Power BI user group. He’s a Florida native and a proud UF Gator (MBA) and USF Bull (MIS). At CCG he provides guidance on data modeling, tabular environments, azure administration, DAX writing, T-SQL and data visualization best practices. All of which he can expand upon if you have questions during his presentation.
Sponsor:CCG specializes in deploying solutions that not only provide value to the business but are adopted by users ensuring accountability of the IT driven system.  Moving beyond reporting, our Business Intelligence solutions support data governance, quality and standardization across the organization and enable stakeholders with tools like predictive analytics, user-defined alerts, data mining, what-if analysis and visually appealing dashboards.
Prototypes with Pizza“Lineage view” by Teo Lachev

PowerBILogo

Securing Power BI Report Pages

Every now and then I get a question about how to secure Power BI report pages. A business analyst has created a multi-page report that requires limited access to some sensitive information in one or more report pages. I always cringe at this idea and recommend securing the data instead. That’s because report-level or dashboard-level security can be easily compromised. For example, if the user has permissions to use Analyze in Excel, they will surely bypass report security and get access to all the data. The same will happen if they use Report Builder or connect a third-party tool to the published Power BI dataset. Instead, you should apply row-level security (RLS) to protect the data. Currently, Power BI doesn’t let you secure report pages but if you must apply report or visual security, you have options:

  1. Break the report into two set of reports: one with open access and another with “secure” pages. Share the report with the secure pages with the people who can access it or publish it to a separate workspace.
  2. Add a hidden page-level or report-level slicer that filters on a DAX measure. The DAX measure can check the user identity using the USERPRINCIPALNAME() function, so the slicer doesn’t return any values if the interactive user is not authorized. Granted, a better (and preferred) way to achieve this could be to apply RLS but RLS will restrict data to all visuals where as a slicer can filter only selected visuals.

Filtering Power BI Visuals by Measure

Power BI lets you filter the visual data by a measure, such as by a measure that dynamically calculates the customer’s rank. This is convenient but there are performance implications related to the way Power BI autogenerates the DAX query. If a visual has a visual-level filter on a measure, Power BI Desktop defines a variable table filter that includes all measures in the visual. In the following example, the visual filters on the IncludedInRank measure, but the filter table includes all measures in the visual:

VAR __ValueFilterDM1 =
FILTER (
KEEPFILTERS (
SUMMARIZECOLUMNS (
‘Customer'[PARENT_NAME],
‘Customer'[Label_DM],
‘Customer'[VP_NAME],
__DS0FilterTable,
__DS0FilterTable2,
__DS0FilterTable3,
__DS0FilterTable4,
“Vol_Gr_WKND”, ‘NRT_PRR_Parent'[Vol_Gr_WKND],
“Vol_Gr_WE_Mon”, ‘NRT_PRR_Parent'[Vol_Gr_WE-Mon],
“Vol_Gr_WE_Tue”, ‘NRT_PRR_Parent'[Vol_Gr_WE-Tue],
“Vol_Gr_WE_Wed”, ‘NRT_PRR_Parent'[Vol_Gr_WE-Wed],
“Vol_Gr_WE_Thu”, ‘NRT_PRR_Parent'[Vol_Gr_WE-Thu],
“Vol_Gr_WE_Fri”, ‘NRT_PRR_Parent'[Vol_Gr_WE-Fri],
“WTD_Gr_ptc”, ‘NRT_PRR_Parent'[WTD_Gr_ptc],
“IncludedInRank”, IGNORE ( ‘NRT_PRR_Parent'[IncludedInRank] )
)
),
[IncludedInRank] = 1
)

As a result, these measures are evaluated twice: once in the filter table and twice in the main query (EVALUATE clause). This could definitely impact performance if the measures are slow to begin with. While waiting on Microsoft to provide more control over the autogenerated query, one approach to mitigate the performance impact is to use the Power BI-provided TOPN filter. In this example, instead of sorting and filtering on IncludedInRank to find the top N customers, you would apply a TOPN filter on Customer[Parent Name] based on the IncludedInRank measure. When TOPN is used, the autogenerated filter table will filter only on the measure used for TOPN, thus avoiding evaluating the rest of the measures.

Staying Cool

A discussion overheard between two attendees at SQLSaturday yesterday:

  • What is this Tabular thing?
  • Like a cube but better.
  • How’s it better?
  • It’s in memory.
  • I wonder if it can improve the performance of my report…
  • It should, it’s also in the Cloud.

Note to myself. Use “in memory” and “cloud” more often.

Presenting at SQL Saturday Atlanta (BI Edition)

SQL Saturday Atlanta (BI Edition) will be held on October 19th. I’ll present a “Power BI Pro, or Premium: that is the question” strategy-oriented session at 12:30.

One question that always pops us during my Power BI assessments for companies planning to adopt Power BI is which SKU to buy. The choice might seem obvious to your organization, but there is much more below the surface. Join this session and learn important considerations that might influence your decision, including:

•    Licensing cost

•    Hosting organizational semantic layers

•    External reporting

•    Features

•    On-premises reporting

 

Atlanta MS BI and Power BI Group Meeting on October 7th

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on October 7, Mondayat 6:30 PM at the Microsoft office in Alpharetta. Qubole will show us how Presto, Azure Data Lake, and Power BI can be used to analyze Big Data. Qubole will sponsor the meeting. For more details, visit our group page and don’t forget to RSVP (fill in the RSVP survey if you’re planning to attend).

Presentation:Leveraging Power BI on Presto for the Azure Data Lake
Date:October 7, 2019
Time6:30 – 8:30 PM ET
Place:Microsoft Office (Alpharetta)
8000 Avalon Boulevard Suite 900
Alpharetta, GA 30009
Overview:Presto is a distributed ANSI SQL engine designed for running interactive analytics queries. Presto outshines other data processing engines when used for business intelligence (BI) or data discovery because of its ability to join terabytes of unstructured and structured data in seconds, or cache queries intermittently for a rapid response upon later runs. Presto can also be used in place of other well-known interactive open-source query engine such as Impala, Hive or traditional SQL data warehouses. Attend this event to learn:

·        Why Presto is better suited for ad-hoc queries than other engines like Apache Spark

·        How to jumpstart analysts across your organization to harness the power of your big data

·        How to generate interactive or ad hoc queries or scheduled reports using Presto

·        Real-world examples of companies using Presto

Speaker:Man Zhang is a Solutions Architect at Qubole. He has 20+ years in software systems architecture, development, and integration and 4+ years in Big Data architecture. https://www.linkedin.com/in/man-zhang-34a887/
Sponsor:Qubole delivers a Self-Service Platform for Big Data Analytics built on Amazon Web Services, Microsoft and Google Clouds. We were started by the team that built and ran Facebook’s Data Service when they founded and authored Apache Hive.  With Qubole, a data scientist can now spin up hundreds of clusters on their public cloud of choice and begin creating ad hoc and/or batch queries in under five minutes and have the system autoscale to the optimal compute levels as needed.  Please feel free to test Qubole Data Services for yourself by clicking “Free Trial”​ on the website.
Prototypes with PizzaTBD

DCI_PowerBI

Prologika Newsletter Fall 2019

090918_1951_DAXVariable1.pngWe all need to share. But until a couple of months ago, a Power BI training or assessment wouldn’t be complete unless I got hammered on the Power BI sharing limitations. Fortunately, Microsoft has addressed most of these and I have now a much better story to tell. And this is the subject of this newsletter.

Quo vadis, Power BI Sharing?

Power BI sharing road has been a long and winding one. I covered the gist in my blogs “Power BI Sharing is Getting Better“, “Power BI Sharing is Getting Better 2“, and “The Power BI Viewer Role“. The following table summarizes prior and standing sharing limitations. The Solution column lists the “fix” by Microsoft.

LimitationSolution
Workspace dependency on Office 365 groupsV2 workspaces don’t depend on O365 groups
No IT oversight on creating workspacesYou can now control who can create workspaces in the Power BI Admin center
Group membership limitationsV2 workspaces support all O365 group types
Coarse content access levelContributor and Viewer roles
No cross workspace sharingA dataset can be shared across workspaces
No data governanceA dataset can be promoted and certified
Power BI Premium sharing with viewers require report sharing or appsThe Viewer role supports sharing with viewers
Only one app supported per workspaceThe Viewer role deemphasizes apps
No nesting support (subfolders)

Sharing Best Practices

Given the current state of Power BI, I’d like to share some best practices for organizing and sharing content:

  1. Don’t use report and dashboard sharing as it can quickly turn into a maintenance nightmare.
  2. Create workspaces to reflect your organizational functional areas, e.g. Sales, HR, Finance. Unfortunately, workspaces still don’t support subfolders (Microsoft hinted that they are working on such a feature), so for now you must resort to a flattened list.
  3. Instead of individual user assignments, add users to security groups and then add these groups as members to the workspace. This way, when the user leaves and company or moves to another department, you only need to change the user’s group membership without making changes to the workspace security policy.
  4. Come up with data governance policy. The Power BI data governance story is work in progress, but you can set up some ground rules. For example, once a contributor believes that a dataset is ready for a broader consumption, he can promote the dataset. Then, IT can verify the dataset and certify it. Currently, Power BI supports certifying only datasets (reports and dashboards aren’t certifiable yet).
  5. This one is hard. Teach data analysts best data modeling practices. Instead of creating a dataset per report, they should create a data model that correctly represents their subject area. A data analyst should create the model once so it can support multiple reports. Once the dataset is published and certified, users can create their own reports.
  6. Although Microsoft is pushing apps very hard and adding some nice features, such as navigation, a standing limitation is that you can create only one app per workspace. So, you can’t publish multiple apps from a workspace, such as to share some reports with one group of users and another set with a different group. Instead, grant access directly to workspaces by using the Contributor and Viewer roles. The Viewer role lifts a significant limitation that forced you to use report/dashboard sharing or apps to share content out of a premium workspace with viewers. If you want to share the entire content of the workspace, you don’t have to use apps. Instead, you can simply add users or groups as viewers to the workspace.

Despite some long standing limitations, Power BI sharing is coming out of age. Follow the above practices and you’ll have now a much better way to organize and share content.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo