Load Testing Tabular

A while back I did a TechEd presentation, “Can Your BI Solution Scale?”, where I discussed a methodology for load testing SSAS and SSRS. A customer wanted to ensure that its Tabular model can scale to thousands of deployed users when it goes live.

You can still use the excellent Microsoft-originated AS Load Sim framework that I demonstrated in the presentation to load test Tabular, and you can use it to send both MDX and DAX queries.

One aspect that deserves more attention is how to tweak the framework to parameterize DAX queries. The framework was designed to parameterize MDX queries with tuples. For example, if you want to parameterize an MDX query by month, you can specify the set NonEmpty( [Date].[Calendar].[Month].Members, [Measures].[Internet Sales Amount] ). The framework then executes the set and assigns tuples from the set at random so you don’t just get cached results from the same query.

However, you need to make a small change to the framework to parameterize DAX queries. Because DAX queries don’t support the MDX UniqueName syntax for filtering, you can’t parse the UniqueName of the tuple member to extract only the name. Instead, you can use the DAX MID function for this purpose. For example, if you want to filter the Customer[Customer Name] column on the actual name, e.g. Acme, you can use the following expression:

Customer[Customer Name] = MID("([Customer].[Customer Name].&[Acme])", SEARCH("&[", "([Customer].[Customer Name].&[Acme])") + 2, SEARCH("])", "([Customer].[Customer Name].&[Acme])") - SEARCH("&[", "([Customer].[Customer Name].&[Acme])") - 2)

Basically, this expression extracts the string "Acme" from ([Customer].[Customer Name].&[Acme]). Since the customer names will vary, it’s a generic, if rather convoluted, expression to extract a string surrounded by "&[" and "])".
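To put it all together, below is a minimal sketch of what a parameterized DAX query might look like in the framework’s query file once the expression is plugged in. The framework substitutes a random tuple wherever the UniqueName literal appears; the Adventure Works names (FactInternetSales, Customer[Customer Name]) are assumptions for illustration:

EVALUATE
CALCULATETABLE (
    ROW ( "Internet Sales", SUM ( FactInternetSales[SalesAmount] ) ),
    FILTER (
        VALUES ( Customer[Customer Name] ),
        -- extract "Acme" from the MDX UniqueName the framework injects
        Customer[Customer Name]
            = MID (
                "([Customer].[Customer Name].&[Acme])",
                SEARCH ( "&[", "([Customer].[Customer Name].&[Acme])" ) + 2,
                SEARCH ( "])", "([Customer].[Customer Name].&[Acme])" )
                    - SEARCH ( "&[", "([Customer].[Customer Name].&[Acme])" ) - 2
            )
    )
)

Because each load-test thread gets a different tuple, each query filters on a different customer, so you exercise the formula and storage engines instead of just hitting the query cache.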


SSRS UX Changes in SQL Server 2016

SQL Server 2016 RC3 (the last and feature-complete RC) just came out for public review. It includes a couple of interesting UX enhancements. The first one is more of a teaser, but it shows that Microsoft is committed to fulfilling, and going beyond, its reporting roadmap. SSRS in native mode plays a central role in this roadmap as the on-premises BI reporting platform.

The new portal (the old Report Manager portal is gone, BTW) now includes sections for Power BI Desktop files and even Excel workbooks that you upload! In the SQL Server 2016 RTM timeframe, clicking a file of these two types simply opens it on the client with the corresponding application (Power BI Desktop for PBIX files and Excel for Excel workbooks). So, no embedded web rendering yet, but I doubt these features would be there if Microsoft weren’t prepared to travel the full distance after RTM.


Second, we now have branding of the portal and mobile reports, as Chris Finlan explains in his “How to create a custom brand package for Reporting Services with SQL Server 2016” blog.


Why Business Like Yours Choose Power BI Over Sisense

As Power BI gains momentum, expect attacks from vendors to intensify. Did you know that there are thousands of vendors offering BI tools? Not a month passes by without someone asking me about some cool vendor. I usually don’t criticize other vendors, but sometimes I get provoked by their audacity and I need to keep ’em honest. Recently, a customer shared a Sisense whitepaper, “Why Business Like Yours Choose Sisense over Power BI”, and asked for my thoughts. The whitepaper is not published yet, but I guess it will be soon, as Sisense has deployed another battle card, “Why Business Like Yours Choose Sisense over QlikView”, that’s already in the open. Overall, Sisense appears to be just another pure self-service BI player that is trying aggressively to get noticed and refuses to see further than its nose. Judging by their mantra on YouTube and elsewhere, data warehousing is dead, OLAP is dead, star schema is dead, as is pretty much everything else except Sisense. In their own words:

“DO I NEED TO BUILD A DATA WAREHOUSE?”
Absolutely not! Data warehouses are one of the most notorious projects associated with BI tools. That’s exactly what we have vowed to eliminate. We use an in-memory columnar database that automatically connects to your data and builds everything for you. You do not need to worry about a complex data modeling or performance. You just say which data you want to add, and Sisense does the rest.

Dream come true? Actually, there is nothing new here despite the Don Quixote rhetoric. If your BI solution can be done just by joining a bunch of tables, you can do it with any self-service BI tool, as folks have done for many years using Excel, Access, and, for the past decade, other self-service BI tools. A tool that lets you just import more data doesn’t solve the inherent problems of self-service BI. Complex data transformation, automation, and consolidation require a centralized repository. When done right, and if you need it, implementing a DW shouldn’t be risky, and it should yield a nice ROI together with a true single version of truth. As I’ve said many times, any vendor or consulting firm that forces you into a particular methodology (pure self-service BI in this case) is just trying to score points and shouldn’t be taken seriously. There isn’t a one-size-fits-all tool or methodology when it comes to data analytics.

Let’s take a look at some of the statements that Sisense makes about the Power BI and Sisense offerings. I actually installed their 15-day trial and did some limited testing. Mind you, their whitepaper is limited to comparing the self-service aspect of Power BI with Sisense because they don’t have an organizational BI solution (which, of course, they hold in disdain). So, we’re comparing just Power BI Desktop self-service models and the Power BI service with Sisense ElastiCube and dashboards. Let’s review some of the claims Sisense makes:

  1. “Power BI is a good out the box tool for simple data analysis, but if you need to analyze larger and more complex data you will likely need to invest in a costly high performance data store. Why? Because of the MS data and technology limitations to query larger or complex data sets, Power BI requires a direct query to the data. If you want the query to run fast or to scale, their lack of a high performance data engine creates 3 issues:
    1. You need expensive technical skills to get the data properly prepared for analysis in the data store.
    2. If you do not have a powerful data store, you need to invest in it.
    3. Business will be more reliant on IT to prepare data, which will create more, not less overhead and longer time to value.”

    Sisense is exalting the virtues of ElastiCube as superior to xVelocity (the in-memory engine behind Power BI Desktop/Power Pivot, Tabular, and SQL Server columnstore indexes). Sisense ElastiCube is a proprietary multidimensional store that is only accessible by Sisense dashboards. Think of ElastiCube as SSAS Tabular but simplified to target business users. Of course, simplification comes at the expense of features. Its storage is disk first, memory second, and it brings data into memory on demand (so think of it as a hybrid between Multidimensional and Tabular). Sisense claims that this architecture is highly scalable and superior to both OLAP and in-memory columnar databases. However, similar to the SSAS default cached storage, ElastiCube requires that all the data be imported first. I don’t see a pass-through configuration where ElastiCube can delegate queries to a fast database, yet DirectQuery is mentioned as a limitation of Power BI when it should be the other way around. As for scaling without a backend data store sanctioned by IT, I kept asking myself, “How many business users out there have access to billions of rows?” (that need to be imported, mind you!). To Sisense’s point, it’s true that PowerBI.com currently limits the datasets you upload to 250 MB (expect this limitation to be lifted), but 250 MB still allows you to pack in a few million rows because the data is highly compressed. Anyway, if your business users need to import billions of rows without IT getting involved and the data doesn’t require extensive transformations, then Sisense might be worth a try, but you probably shouldn’t be on the self-service bandwagon with these data volumes to start with. As for complex data, I found Sisense to be no match for xVelocity and DAX, as you’ll quickly find out when you start modeling with Sisense ElastiCube Manager and look at the limited calculation and relationship options the tool supports.
    UPDATE 5/16/2016 – Microsoft increased the max file size to 1 GB. UPDATE 7/15/2017 – Power BI Premium has further increased the dataset size to 10 GB.

  2. Dashboard filtering in Power BI – “Limited access pre-defined by dashboard creator using ‘slicer’ widgets”. Power BI supports visual-level, page-level, and report-level filters, which Sisense obviously missed. Power BI supports basic and advanced filtering modes. Interestingly, when I played with Sisense, their filtering options filter data before it’s aggregated.
  3. Widget drilldown in Power BI – “Only supported in Power BI Desktop, pre-defined by dashboard creator”. Another wrong statement. Users can create ad-hoc reports and they have the same reporting capabilities as in Power BI Desktop.
  4. “To avoid direct queries against data, Power BI Desktop uses a Memory intensive data engine with some data compression – this has all the disadvantages of the in-memory approach relating to performance limitations (you can’t get large data sets into memory) and cost to scale (memory is expensive).” – This goes back to the first point. I’m yet to see business users who are given the rights and authority to analyze billions of rows, not to mention the performance implications of importing such an enormous dataset (the only option ElastiCube supports). And Power BI compresses everything, so the statement “some data compression” is technically inaccurate.
  5. “Applying changes to an existing data model, for example adding or editing a column is like starting from scratch as the model will have to do all the data import and transformations again – very time consuming” – Nope. The Power BI engine is smart enough to apply the minimum processing. If you add a column, it will reload only that table. I’m not sure what Sisense means by “editing”, since the data is read-only, but renaming a column is only a metadata operation and is very quick.
  6. “The Query Editor has a wizard feel, but it somewhat complex and clunky” – What’s the Sisense alternative that is not complex and clunky for data transformations by business users? Power Query resonates very well with business users and I don’t agree with this statement, but beauty is in the eye of the beholder.
  7. “In practice, if analytics are to be done on a larger, more complex data set, much care must be taken to pre-aggregate and clean the data to fit into the data size limitations.” – I don’t see how Sisense would address more complex data. More complex data would probably require an organizational BI solution which they don’t support.
  8. “However as data complexity and requirements generally grow as users’ appetites for more analytics and intelligent dashboards, problems will quickly arise due to the strict data limitations of Power BI.” Would you want your business users to manage complex solutions? Are they actually capable of doing so? If they are, Microsoft gives you a nice continuum where you can move the solution to a dedicated server. True, the modeler needs to learn a few more tricks and move from Power BI Desktop/Excel to Visual Studio, but gains many more features than Sisense provides.
  9. “Sisense can deploy to a cloud hosted service, or to an on premise server, while Power BI currently only offers a cloud hosted solution for sharing.” From here, we learn that Sisense Cloud is actually VM-hosted. So, no PaaS. You still have to configure it, manage it, license it, etc.
  10. “You will lose the ability to perform analytics on larger data sets, and will need to make decisions to pre-aggregate data in a data warehouse, or drop portions of the dataset in order to adhere to the data size limits.” We typically don’t pre-aggregate data as we have efficient backend technologies to aggregate it for us.
  11. “In Sisense, you will have a much more flexible, scalable solution that can be maintained much more easily by less technical resources.” – This goes back to the self-service mantra. Large, complex, and scalable solutions are often needed, and it’s too much to ask business users to tackle them.

Of course, Sisense leaves out many other points and features in which Power BI excels. Q&A? Quick Insights? Excel integration? Security? Governance? Sharing? Speed of development? Real-time BI? Machine learning? Pricing – for that, you need to call Sisense.

In summary, why do businesses like yours choose Power BI over Sisense? Because Sisense is two quadrants behind (it made the Magic Quadrant this year). Read the latest Gartner report to find out what Gartner thinks about Sisense (tip: you can download the report from the Sisense site).

Granting Publishing Rights to Power BI Enterprise Gateway

As I explained in a previous blog, the Power BI Enterprise Gateway allows Power BI reports to connect to on-premises data sources. A gateway can support multiple data sources. When setting up a data source, you’ll see the Users tab on the data source properties. The users you add here will have the rights to publish reports that connect to a data source serviced by the gateway. This is another security check that Microsoft implemented to limit the number of people who can expose data via the gateway.

If a user is not on the Users tab, the user can still publish a Power BI Desktop file to My Workspace (or to another Power BI workspace if they have rights), but they won’t be able to view reports that connect to a gateway data source. When the user attempts to do so, Power BI will show an error that it can’t connect to the gateway but, unfortunately, it won’t tell you why. Another gotcha is that even though you might be an admin on the gateway, you won’t be able to view reports you publish unless you are added to the Users tab. I think Power BI just adds the original gateway admin but doesn’t add users that are subsequently added as gateway administrators.

So, if you face obscure gateway connectivity errors, check the Users tab. In the screenshot below, only I have rights to publish reports that will connect to the gateway.

Question: How do I specify which gateway to use when I create a Power BI Desktop model?

Answer: You never specify the gateway that a report will use. You connect to the data source as you normally would, e.g. you specify the SSAS instance name as you would when connecting to it in Excel. Once you publish the Power BI Desktop file to powerbi.com and a user views the report, Power BI figures out which gateway to use. So, gateways are transparent to both the report author and the viewer. This is good because gateways can be removed, or multiple gateways can service the same data source to make it highly available.

040816_1904_GrantingPub1.png

Power BI Embedded

Embedding reports is an extremely popular scenario for ISVs and developers coding external (customer-facing) applications. As I wrote a while back in my “Power BI Embedded Dashboards Without Authentication UI” blog, Power BI supports REST APIs that allow developers to embed dashboards and reports. However, these APIs don’t support custom security, so you have to provision users with Power BI. Furthermore, a hybrid architecture (report definitions in the cloud and data on premises) requires a Power BI Pro license for each user. This pricing model could quickly become overly expensive if you have to onboard hundreds of users.

Power BI Embedded, available for preview since April 1st, aims to remove these obstacles. Designed as an Azure service, it doesn’t require changes to the application security. For example, if your application uses Forms Authentication, users can continue logging in with a user name and password. The application then calls the Azure APIs to obtain an authorization token that is passed on to Power BI. Once the user is authenticated, the app uses the Power BI REST APIs to embed Power BI content. Another benefit of the Azure integration is that the application developer no longer has to work with the OAuth API to handle security, as explained in more detail here. Power BI Embedded also introduces a new licensing model, where you’re priced by the number of dashboard and report views that your users render instead of by user. Notice that the licensing terms state that “you may use the Power BI Embedded service within an application you develop only if your application (1) adds primary and significant functionality to our [Power BI] service and is not primarily a substitute for any Power BI service, and (2) is provided solely for external users. You may not use the Power BI Embedded service within internal business applications”.

On the downside, the preview doesn’t support refreshing imported Power BI Desktop models. As far as direct connectivity goes, the preview is currently limited to Microsoft Azure data sources that support basic security (Azure SQL Database, Azure SQL DW, and HDInsight Spark). So, no support for SSAS yet, as SSAS is not (yet) available as PaaS. This limitation also prevents implementing multi-tenant solutions (a must for most ISVs), where the user is authorized to see only a subset of the data. Microsoft has provided a sample ASP.NET MVC app and excellent step-by-step documentation to help you get started. Below is a snapshot of the app, which I customized to display embedded custom reports that are demonstrated in the Prologika Power BI showcase.

Power BI Embedded is the missing piece that many ISVs need to integrate interactive Power BI reports and dashboards in their offerings. Although still lacking in features, Power BI Embedded has a bright future.

040716_1237_PowerBIEmbe1.png

 

Power BI SandDance Visual

One of the announcements at the Data Insights Summit was the SandDance custom visual. Originating from Microsoft Research and coded in SVG, it not only allows you to explore data visually in versatile ways but also demonstrates how far your custom visuals can go. This is a super visual that combines multiple visualizations, including column chart, grid, scatter chart, density chart, stack chart, and squarify chart (similar to a treemap). It also demonstrates animation and storytelling with data. You can test the visual outside Power BI with some predefined datasets by going to https://sanddance.azurewebsites.net. Or, you can download it from the Power BI Gallery and try it with your data in Power BI Desktop and the Power BI service.

When you compare visualization tools, pay attention to how open their capabilities are. Power BI provides several extensibility features. Custom visuals let any web developer extend the Power BI visualization features with “widgets” that can leverage popular visualization frameworks, such as D3.js and SVG. Do other vendors let you do this?

Power BI Measure Dimensions

I had an inquiry about how to implement in Power BI/Power Pivot/Tabular something similar to the new Level of Detail (LOD) Expressions feature in Tableau 9.0. A more generic way to frame the question is how to turn a measure into a dimension so that you can analyze your data by the distinct values of the measure. Now, in my opinion, most real-life requirements go beyond just using the distinct values. For example, the first sample Tableau report demonstrates a banding example. But what if you want to group measure values into “buckets”, e.g. 1, 2, 3, 4, 5-10, 10-20, etc.?

Fortunately, DAX has supported powerful expressions since its very beginning. To make the code more compact, the example below uses DAX variables, which were introduced in Power BI Desktop and Excel 2016. However, you don’t have to use variables if you’re on a lower version of Excel. The sample model (attached) has three tables: FactInternetSales, DimCustomer, and DimDate. The scenario is to analyze data by a NumberOrders band that discretizes the number of orders the customers have placed over time. The trick is to add a calculated column to DimCustomer with the following DAX formula:

NumberOrders =
VAR SumOrders =
    CALCULATE ( DISTINCTCOUNT ( FactInternetSales[SalesOrderNumber] ) )
RETURN
    SWITCH (
        TRUE (),
        SumOrders <= 4, FORMAT ( SumOrders, "General Number" ),
        SumOrders >= 5 && SumOrders <= 10, "5-10",
        SumOrders > 10, "More than 10"
    )

This expression defines a SumOrders variable whose expression calculates the number of orders each customer has in FactInternetSales. Because the grain of FactInternetSales is a sales order line item, we need to use DISTINCTCOUNT. Then, for a more compact syntax, instead of a bunch of nested IF functions, we use the SWITCH function to evaluate the value of the SumOrders variable and create the bands. This is where the variable comes in handy. If you can’t use a variable, you need to create a separate measure for CALCULATE ( DISTINCTCOUNT ( FactInternetSales[SalesOrderNumber] ) ) and use that measure so you don’t repeat the same formula in the SWITCH statement, as the sketch below shows. Variables also have the advantage of evaluating the expression once, so expressions that reference them should perform better.
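For reference, here is a minimal sketch of that no-variables alternative (same sample model, with a hypothetical OrderCount measure standing in for the variable, shown in the Tabular measure-definition syntax). Referencing the measure from the calculated column triggers context transition, so it’s evaluated per customer row:

-- a measure holds the expression so it isn't repeated in SWITCH
OrderCount := CALCULATE ( DISTINCTCOUNT ( FactInternetSales[SalesOrderNumber] ) )

-- calculated column on DimCustomer; the measure reference
-- transitions the row context to a filter context per customer
NumberOrders =
SWITCH (
    TRUE (),
    [OrderCount] <= 4, FORMAT ( [OrderCount], "General Number" ),
    [OrderCount] >= 5 && [OrderCount] <= 10, "5-10",
    [OrderCount] > 10, "More than 10"
)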

Now we can have a report that uses the measure dimension, such as one that shows sales by the number of orders.


Atlanta MS BI Group Meeting on Mar 28th

Atlanta MS BI fans, join me for the next Atlanta MS BI Group meeting on Mar 28th at 6:30 PM. Luis Figueroa will show us how to do cloud ETL with Azure Data Factory. Allegient will sponsor the event. Don’t forget to register at atlantabi.sqlpass.org.

Rate this meeting: http://aka.ms/PUGSurvey, PUG ID: 104
Presentation: A Closer Look at Azure Data Factory
Level: Intermediate
Date: Monday, March 28th, 2016
Time: 6:30 – 8:30 PM ET
Place: South Terraces Building (Auditorium Room)

115 Perimeter Center Place

Atlanta, GA 30346

Overview: Azure Data Factory is a modern data orchestration platform designed to help you manage any data of any size from any source. It also makes a number of technologies available to you, such as Azure Machine Learning, so that you can integrate predictive machine learning models into your data pipelines. The platform enables you to create intelligent applications that not only provide insights but can also recommend actions upon specific events. Join me in this session as we take a closer look at many of the capabilities built into the platform.
Speaker: Luis Figueroa serves as the Practice Lead for the Data Platform and Analytics division at Allegient, LLC. Luis is focused on delivering business solutions that leverage the Microsoft ecosystem of products. His areas of expertise are Data Warehousing, Business Intelligence, Big Data and Advanced Analytics. With 10+ years of experience in the field, Luis has delivered data management solutions for companies across many industry verticals. Luis is also a Microsoft P-TSP and chapter leader of the PASS Data Science Virtual Chapter.
Sponsor: Allegient is a leading IT consulting firm that specializes in translating business objectives into technology solutions. We leverage proven technologies and methodologies that best align with your organization’s needs, resources and timeline. Focus areas include Business Process Management, Business Intelligence, Collaboration, Relationship Management (CRM/xRM), Custom Application Development & Sustainment, and IT Service/Project Management.

Power BI Hybrid Architecture

A unique and tremendously useful Power BI feature allows you to implement hybrid solutions where your data remains on premises (in SSAS, SQL Server, Oracle, and other data structures) and only the report and dashboard definitions are deployed to powerbi.com. This hybrid architecture gives you the best of both worlds: cutting-edge cloud-based visualizations with on-premises security, scale, and a single version of truth. Since no cloud application can connect directly to your network, connectivity software, called the Power BI Enterprise Gateway, acts as a bridge between powerbi.com and your data sources.

I’m wrapping up a Power BI Hybrid Architecture proof of concept (POC) for a large organization and I want to recap some important lessons learned:

  1. I’m really astonished by the roundtrip speed! It’s remarkable that the round trip completes so fast given that powerbi.com (a data center in Virginia for the East Coast, I believe) connects to a data center in Atlanta. Granted, not much goes across the wire: Power BI sends a DAX query and SSAS sends back summarized results. Still, latency is not an issue. As another customer put it, “the Power BI reports have better performance than our on-premises SharePoint reports”.
  2. The Power BI Enterprise Gateway uses an Azure Service Bus relay connection. What this means for you is that you don’t have to open any inbound ports on your company’s firewall. When the gateway establishes an outbound connection, it scans the available outbound ports. If the only port open is 443 (HTTPS), it uses this port. Because port 443 might be heavily used, you can open outbound ports 5671-5672 and 9350-9354 for better performance. However, I didn’t see any performance degradation with port 443, which is another remarkable observation given that this organization has thousands of computers that use this port to connect to the Internet.
  3. The only setup snag I encountered was that this organization uses a proxy configured for Basic Authentication only, and the gateway failed to connect on the first try. I had to create a .NET assembly that returns the credentials of an account with permissions to the proxy, registered via the module configuration element. If your organization doesn’t use a proxy, or it does but the proxy supports NTLM security, you won’t have this issue.
  4. The gateway can be installed on the same machine as the data source or on another machine. I recommend installing it on a different machine for an additional level of security. Since one gateway can access multiple data sources, a dedicated gateway server makes more sense. Currently, the gateway service is not fault tolerant, but you can install another gateway instance on another machine to service the data sources that need to be highly available.
  5. By default, the gateway runs under an NT SERVICE\PBIEgwService account that the gateway setup creates. You can use another Windows account, of course. It needs to have the “Log on as a service” right and it needs to be a member of the Performance Log Users local group. The gateway should have no issue running as another low-privileged account, such as LocalService.
  6. Besides the setup requirements and limitations for connecting to SSAS, creating a data source requires hardcoding the credentials of a Windows account that has admin rights to SSAS, so that the gateway can delegate the user identity via EffectiveUserName and SSAS data security works (see the illustrative connection string after this list). This might present an issue if your organization has a password expiration policy for service accounts (typically this is not the case, but this organization has to comply with regulations and customer service contracts). I suggested to Microsoft another option that would allow connecting to SSAS using the gateway service account via NTLM security. This would allow you to use a managed service account for the gateway service, whose password expiration is managed by the domain controller.
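To illustrate the delegation mechanism, this is roughly what an SSAS connection string with EffectiveUserName looks like (the server name and user are made up for illustration): the gateway connects as the privileged Windows account and asks SSAS to impersonate the report viewer:

Provider=MSOLAP;Data Source=SSASPROD01;Initial Catalog=AdventureWorksDW;EffectiveUserName=jdoe@contoso.com

Because the viewer’s identity is passed in EffectiveUserName, SSAS evaluates role-based security as if that user had connected directly.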


Tabular Display Folders

A welcome usability enhancement in SQL Server 2016 Tabular is the ability to configure display folders to organize table fields and measures in logical folders. Previously, you had to use BIDS Helper or DAX Editor (for measures) to implement display folders. Starting with SQL Server 2016, you can just set the Display Folder property of a measure or a field.
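If you script the model instead of using the designer, the same setting surfaces as a property in the 1200-compatibility-level JSON metadata. Here is a minimal sketch (the measure and folder names are made up):

{
  "name": "Internet Sales Amount",
  "expression": "SUM ( FactInternetSales[SalesAmount] )",
  "displayFolder": "Sales Measures"
}

Columns support the same displayFolder property, and you can nest folders with a backslash, e.g. “Sales Measures\Internet”.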

This adds the display folder to the Fields List in Excel and Power BI Desktop.