Power BI Report Embedding on the Horizon

For years I’ve been harassing Microsoft to support embedding interactive reports connected to SSAS in custom apps. This feature has been sorely missing from the Microsoft BI stack, yet it’s a must-have that enables a wide range of integration scenarios, such as customer-facing reporting. Imagine your customers being able to explore the data by asking natural questions. This is a killer feature!

It looks like Power BI might finally bring some relief. A while back, I posted an Embedding Reporting feature wish on the Power BI discussion list. It now has 643 votes and is one of the most voted-for features. Recently, when asked to provide feedback, I communicated to Microsoft how I envision report embedding working:

  1. Developers must have the ability to embed reports/dashboards in applications. This should be conceptually similar to how they can do this now with ReportViewer. However, instead of an ASP.NET control, I’d imagine REST APIs with jQuery.
  2. The tricky part is security. It is paramount to be able to pass the user identity to on-premises SSAS models. Ideally, we don’t want to ask external users to register in Power BI. Instead, some kind of bulk licensing needs to be supported so that requests to Power BI go under a single trusted account while still passing the user identity to SSAS.
  3. In a nutshell, the external user goes to the customer website and is authenticated using whatever authentication mechanism the app uses. Then, a call is made to Power BI to authenticate using a trusted account. When the user requests a report, the call must somehow allow passing the user identity, and this identity then needs to flow to SSAS. Previously, to achieve this scenario and circumvent the SSAS limitation of supporting Windows security only, developers would add a CustomData setting to the SSAS connection string so that SSAS can use it for row-level security (see the sketch below).
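
For reference, the CustomData approach boils down to a connection string along these lines (the server, database, and identity values are placeholders):

Provider=MSOLAP;Data Source=ssasserver;Initial Catalog=Adventure Works;CustomData=user@customer.com

A dynamic row filter in the model can then read that value with the CUSTOMDATA() DAX function (or CustomData() in MDX) and return only the rows the user is allowed to see.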

Recently, Lukasz Pawlowski (a Senior Product Manager formerly on the SSRS team and now heading the Power BI APIs) responded:

“Quick update here. While it’s not report embedding, we just announced a preview of an API to integrate dashboard tiles into applications. Read about it on the Power BI Developer blog”

So, it looks like the first wave of the embedding APIs has arrived. While it’s not what we need, I think we can expect full-featured embedding to arrive soon.

Azure D-Series VM Prices Go Down

I’m optimizing a mirrored SQL Server OLTP database hosted on an Azure VM. After load testing, it was clear that CPU was the primary resource constraint, so I suggested an upgrade to a higher A-series machine alongside other reconfiguration and storage changes. Meanwhile, I’ve noticed that D-series pricing goes down starting in October. Moreover, D-series VMs include an SSD temporary drive that can host the temp database and the buffer pool extension. And, according to Microsoft, the D-series has “60% faster processors than the A-series”. In this case, to save the customer money, I recommended a D-series VM with fewer cores than the A-series upgrade I had in mind.

Load Testing Azure Applications

Scenario: You’re conducting a capacity-planning effort and load testing an Azure website (Azure App Service) that saves data to SQL Server running on an Azure VM. You’ve created a load test, e.g. using Visual Studio, and you’ve noticed that the throughput caps out for no obvious reason. SQL Server and Azure performance counters indicate that neither layer is under pressure.

Solution: Although Microsoft provides no specifics, it appears that Azure throttles user loads to prevent denial-of-service attacks. Based on my observations, Azure detects that the requests come from the same IP address and caps them. As a workaround, consider conducting the load test with Visual Studio Online, which allows you to distribute the test across multiple agents that can use different IP addresses (configurable as a test property). In our case, this allowed us to run a stepped load test successfully until we reached a performance bottleneck. Interestingly, the bottleneck we hit first was CPU utilization on the front end, even though we scaled it to three large Azure web instances. SQL Server was churning along just fine with a single data disk. Your results might differ, of course.

Atlanta MS BI Group 5th Anniversary

Our next Atlanta MS BI Group meeting will be on August 31st. This happens to be our 5th anniversary! Thanks to everybody who contributed to the group’s success! We’ve come a long way from our first meeting of a dozen people to the 60-70+ attendance we enjoy today. Come and join the festivities and learn about sentiment analysis with Big Data & Machine Learning.


Presentation:

Tone/Sentiment analysis with Big Data & Machine Learning


Level: Intermediate

Date:

Monday, August 31st, 2015

Time:

6:30 – 8:30 PM ET

Place:

South Terraces Building (Auditorium Room)

115 Perimeter Center Place

Atlanta, GA 30346

Overview:

People’s sentiments and opinions are expressed on social networks: tweets, Facebook posts, book reviews, forum discussions, and more. These attitudes and feelings are communicated in text, with the format depending on the social network. Twitter messages are limited to 140 characters and use hash-tags, while Facebook posts can be longer.


This session reviews the different Natural Language Processing, text mining, and data mining techniques you can use for sentiment and tone analysis. Organizations can use the extracted knowledge for brand reputation, market predictions, and automatic learning. We’ll look at Hadoop, data mining, HDInsight (Microsoft’s Hadoop-based Big Data distribution), and Azure Machine Learning.

Speaker:

Paco Gonzalez, a SolidQ Mentor, is the PASS Big Data Virtual Chapter Leader, the SQL Saturday Barcelona organizer, and an assistant at the PASS Atlanta BI User Group. He is finishing his PhD thesis, “Analyzing Social Data with Machine Learning.” He is an MCT and MCP on BizTalk Server and SQL Server, and a coauthor of Microsoft Training Kit 441. Paco is a frequent speaker at large and small conferences, including SQL Saturdays, TechEd, PASS BA, PASS Summit, DevWeek London, and PAW Chicago and London.

Sponsor:

TEKSYSTEMS is a subsidiary of Allegis Group, the largest private talent management firm in the world. Our longstanding history and industry-leading position speak to our success in providing the IT staffing solutions, IT services and talent management insight required for you to actualize ROI and sustain a truly competitive advantage in a fast-changing market.


Offsetting Week Start Day

Scenario: You have a date dimension table with typical date-related columns, such as DayOfWeek, WeekEndDate, etc. In the US, weeks start on Sunday and end on Saturday (ISO weeks, on the other hand, start on Monday and end on Sunday). You have a requirement to override the week start and end days for a Power Pivot, Tabular, or Multidimensional model. For example, while the Date table defines the week start day as Sunday, you might need to override it to start on Monday.

Solution: I typically derive as many of the date calculations as possible in a SQL view on top of the Date table instead of saving the results in the table itself. For example, the WeekOfYear calculation might look like this:

DATEPART(WEEK, [Date]) [WeekOfYear]

If you have date calculations like this, instead of changing all week-related calculations to reflect the desired week start day, the easiest way to offset it is to use the T-SQL SET DATEFIRST statement preceding the actual SELECT statement in the Power Pivot/Tabular table query. The following sketch (with dbo.DimDate standing in for your date view or table) shows the idea, overriding the first day of the week to Monday:
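
SET DATEFIRST 1;  -- 1 = Monday; the default for us_english is 7 (Sunday)

SELECT
    [Date]
   ,DATEPART(WEEKDAY, [Date]) AS [DayOfWeek]  -- now returns 1 (Monday) through 7 (Sunday)
   ,DATEPART(WEEK, [Date]) AS [WeekOfYear]    -- week numbers also honor the new first day
FROM dbo.DimDate;  -- placeholder for your date view or table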

Applied Power BI Service Class Available

I’m happy to announce the availability of my latest training course – Applied Power BI Service. I believe that the new version of Power BI, known as Power BI Service or Power BI 2.0, is going to be very successful and it should warrant your serious interest. This one- to two-day class is designed to help you become proficient with the Power BI Service and acquire the necessary skills to work with online and on-prem data, implement data models on a par with professional models created by BI pros, unlock the power of data by creating interactive reports and dashboards, and share insights with other users. No prior data modeling or reporting knowledge is assumed. Students are welcome to bring their own data to the second day of the class.

Where Is My DQS KB Gone?

Scenario: You use the DQS Cleansing Transformation in an SSIS package to integrate data cleansing with DQS. Once you are done developing, you export the DQS knowledge base (KB) and then create a new DQS KB on another server by importing and publishing it. Or, you might have deleted and recreated the KB with the same name. When you open the DQS Cleansing Transformation Editor, although the KB name is the same, you find that the reference to your DQS KB is gone and you get the warning message “The data quality Knowledge Base property is empty. Specify a valid Knowledge Base”. Naturally, you don’t want to remap the KB and recreate all mappings, as this is a tedious process involving many steps and many columns.

Solution: I’m not sure what design pattern was followed here, but SSIS identifies the KB by its identifier instead of by name. To fix the issue without remapping, you need to find the new KB ID and update the package source by following these steps:

  1. On the target SQL Server that is integrated with DQS, connect to the DQS_MAIN database and issue the following query by replacing the token with the name of your KB:
    SELECT * FROM [DQS_MAIN].[dbo].[A_KNOWLEDGEBASE] WHERE NAME = '<YOUR KB NAME>'
  2. In the query results, locate the ID column and copy the identifier.
  3. In SSDT, right-click on the package and then click View Code to open the package XML source.
  4. Find the KnowledgebaseName property and overwrite its value with the new identifier.

<property dataType="System.Int64" name="KnowledgebaseName" typeConverter="NOTBROWSABLE">1020033</property>

  5. Save the source and open the package in the designer.

Note that to resolve the KB identifier successfully, you not only need to import the KB on the target server but also publish it.

Power BI Desktop Supports Direct Connectivity to Tabular

As you’ve probably heard, Power BI went live yesterday. Among the many new features, Power BI Desktop (previously known as Power BI Designer) now supports live connections to Tabular. This is conceptually similar to connecting Excel or Power View to Tabular to create ad hoc reports. However, Power BI Desktop is a standalone, freely available tool that is independent of Office and SharePoint.

When connected directly to Tabular, Power BI Desktop generates DAX queries as you make report changes, sends them to Tabular, and displays the results. In addition, we get new visualizations that don’t exist in Excel and on-prem Power View, such as tree maps, region maps, etc.

Upon connecting to Tabular, Power BI Desktop asks you how you want to connect.

Once you create the report, you can publish the dataset and report to Power BI. As a prerequisite, you need to install and configure the Analysis Services Connector so that the report can connect to Tabular on premises. If the connector is not installed, you will get an error when publishing. Speaking of the connector, at this time it cannot be installed side by side with the Personal Gateway, which is used for refreshing data from on-prem data sources; you need to install them on separate machines.

Tabular is the first data source that supports direct connectivity in Power BI Desktop, and I’d imagine that live connections to other data sources will follow soon. Direct connectivity is useful for fast databases where there is no need to import and cache the data. In this case, you’ll need to weigh the pros and cons of direct connectivity vs. data import.

Atlanta BI Group Meeting on July 27th 2015

Come and join us for our next information-packed meeting on Monday, July 27th. Stacey Jones from UPS will present “Big Data as part of your Data Strategy” and Patrick LeBlanc from Microsoft will do a quick demo of Power BI Desktop now that Power BI has gone officially live. Our sponsor will be Prestige Staffing.

Presentation:

Big Data as part of your Data Strategy


Level: Intermediate

Date:

Monday, July 27th, 2015

Time:

6:30 – 8:30 PM ET

Place:

South Terraces Building (Auditorium Room)

115 Perimeter Center Place

Atlanta, GA 30346

Overview:

New data technologies are evolving at an unprecedented pace. The traditional RDBMS is no longer the only choice for storing your data. Big data technologies such as Hadoop and other data technologies such as Data Virtualization, NoSQL, In-memory databases, Columnar Databases, and Graph databases offer new capabilities, but hold the potential to simply distract or disrupt business. Do relational databases still play a role? Using the wrong technology for the problem at hand can be expensive, time consuming, and counter-productive. I will discuss the importance of developing a data strategy for your company and how to determine the correct technologies. Takeaways:


•    Understand your data consumption needs

•    Document your data ecosystem

•    Develop your data strategy to apply the best data technology

Speaker:

Stacey Jones is a Data Architect filling the role of the UPS Chief Business Intelligence Evangelist. Stacey spends his time researching and promoting the use of modern BI-related technologies such as Big Data, Self-Service, Data Virtualization, Polyglot Persistence, In-Memory Databases and Predictive/Prescriptive Analytics. Stacey recently transitioned UPS from reporting to Self-Service Business Intelligence and architected the largest cube ever built at UPS. He is passionate about the art of writing and optimizing the performance of SQL queries, database design, database code, and all things BI. You can reach him at staceyjones@ups.com or on LinkedIn at https://www.linkedin.com/pub/stacey-jones/24/891/59

Prototypes with Pizza

Power BI Desktop with Patrick LeBlanc, Microsoft

Sponsor:

Prestige Staffing is an award-winning, nationally recognized leader in staff augmentation. We specialize in providing elite candidates to fulfill job openings in the IT, Telecommunications, Healthcare and Accounting/Finance industries. Our clients include both Fortune 500 companies and small, local businesses. Our mission is to provide employers with only the most skilled applicants, while simultaneously helping qualified professionals improve their careers by placing them in the most ideal work situations.

Using DQS for Externalizing Data Validation Rules

Developers spend plenty of time coding for exceptions. The customer SSN must be nine characters. The organization code must be four characters and constrained to specific values. The happy path is easy; coding for exceptions is tedious. Some exceptions are easy to catch, while others might require more complicated logic than the happy path itself. And it gets more difficult if the rules change often or your BI system needs to support adding new rules.

This is where Data Quality Services (DQS) can help. To me, the biggest strength of DQS is externalizing rules. Instead of hardcoding rules in SQL or ETL, you can externalize them and even offload rule management to business users. Using the Data Quality Client tool, a business user can define simple rules and corrections, such as requiring the e-mail address to match a specific regular expression (see the sample pattern below). Then, the ETL developer can integrate the data flow with DQS to separate valid and invalid rows. Moreover, DQS will tell you exactly which rule is broken and give you a confidence factor for matching. This helps you avoid tedious coding and saves your company a lot of effort.
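
For illustration, the e-mail rule could use a generic pattern along these lines (a hypothetical example, not taken from any particular KB):

^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$

Rows that fail the rule come back from the DQS Cleansing transformation with an invalid record status, which the ETL can use to route them to an error output or a quarantine table.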

In its first release, DQS has left substantial room for improvement, which I hope SQL Server 2016 will address, including:

  1. I’m sure that there is a solid academic theory behind the “domains” concept, but requiring the user to create composite domains for simple cross-field validations might be too much to ask. Instead, a better approach might be to have a hidden composite domain consisting of all fields.
  2. The lookup lists, such as “Value Is In”, cannot be data-driven, requiring you to hardcode lookup values.
  3. More advanced rules are needed, such as IF-THEN, without requiring composite domains.
  4. Very slow performance with the SSIS DQS Cleansing transformation because it uses a single thread.
  5. Currently, DQS has no API, so automation is not possible.