Scaling out SSIS in SQL Server 2017

SQL Server 2017 introduces the ability to scale out SSIS. The primary scenario is to let customers scale out their SSIS execution at the package level. Imagine a retail business that runs about 300 packages at the end of the day to push transactional and finance data from multiple stores to the centralized data warehouse for reporting. All 300 packages need to run at night, and they need to finish by 6 AM, before the new business day starts. As the business grows, data volumes grow, the packages take longer to run, and it becomes harder to finish the ETL processing window before 6 AM. With the SSIS scale-out feature, you can set up a scale-out cluster with one master and N workers. The master automatically distributes the packages to the workers based on worker availability, which parallelizes execution at the package level and shortens completion time. You don’t have to worry about which worker executes which package because the master handles it. All you need to do is deploy the 300 packages to the catalog and then trigger the ETL run.

Sounds like an easy fix to your long ETL processing times? As with any technology, my advice is to focus on improving the performance of your ETL first, then scale up, then scale out. Here are the top 5 performance issues that I see over and over in my consulting practice:

  1. No parallelism – Packages run sequentially, e.g. dimensions are populated one by one and then fact tables one by one. Meanwhile, despite all the CPU power and cores you fought hard to secure, the server isn’t doing much. Instead, you should run things in parallel as much as you can, with the goal of saturating the server’s CPU bandwidth. Ideally, you should invest in a framework that can automatically distribute work across packages with a configurable degree of parallelism, such as the one that we use and that is mentioned in my newsletter “Is ETL (E)ating (T)hou (L)ive?”.
  2. No incremental extraction – The less data you process, the faster your ETL will be. You should strive to extract only the rows that have changed if possible, using techniques such as LastUpdated timestamp, Temporal tables, or CDC.
  3. ETL instead of ELT pattern – Influenced by Microsoft and expert advice, you’ve decided to use the data flow transforms. For example, in a recent ETL review, I found that a Lookup transformation caches all 50 million rows from a fact table in memory to determine whether a row should be inserted or updated! What happens when (not if) one day the fact table swells to 500 million rows? Instead, the ELT pattern that relies on stored procedures and T-SQL MERGE would almost always be a better choice from a performance standpoint. 30 years went into improving and evolving Microsoft SQL Server. It would be naïve not to take advantage of this evolution and its set-based processing. And as a bonus, if one day you decide to migrate your DW to the cloud, such as by using Azure SQL Data Warehouse, you’ll find that ELT is recommended (unfortunately, Azure SQL DW still doesn’t support MERGE but one day I hope it will).
  4. Query optimization – Needless to say, your queries should be optimized. In the same recent review, I found that one query that extracts data from the ODS takes four hours to execute!
  5. Excessive data movement – This goes back to incremental extraction, but I often see ETL that does a full load or partial load (e.g. the last six months) and copies millions of rows from the source to staging table 1, staging table 2, …, and finally the data warehouse.
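To make the incremental extraction idea from item 2 more concrete, here is a minimal Python sketch of a watermark-based extract built on a LastUpdated-style column. It is only an illustration under assumptions: the sales table, its columns, and the extract_changed_rows function are hypothetical, and an in-memory SQLite database stands in for the real source system.

```python
import sqlite3


def extract_changed_rows(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    Timestamps are ISO-8601 strings, so string comparison matches
    chronological order.
    """
    rows = conn.execute(
        "SELECT id, amount, last_updated FROM sales "
        "WHERE last_updated > ? ORDER BY last_updated",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only if something was extracted.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical source table with a last_updated column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 10.0, "2017-05-01T08:00:00"),
        (2, 20.0, "2017-05-02T09:30:00"),
        (3, 30.0, "2017-05-03T11:15:00"),
    ],
)

# First run: only rows changed after the stored watermark are pulled.
first_rows, watermark = extract_changed_rows(conn, "2017-05-01T23:59:59")

# Second run with the advanced watermark: nothing new to extract.
second_rows, watermark_after = extract_changed_rows(conn, watermark)
```

In a real ETL process the watermark would be persisted between runs (for example, in a control table), and the same idea applies whether the change marker comes from a timestamp column, temporal tables, or CDC.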

If, after you follow the above best practices, you still find that you’re exceeding your ETL processing windows, you should consider scaling out SSIS. Before doing so, consider the following limitations in the SSIS scale-out feature in SQL Server 2017:

  • A best practice is to partition your ETL process into child packages that are orchestrated by a master package using the Execute Package Task (EPT). SSIS scale-out can run packages with EPT, but it does not scale out the execution of the EPT child packages to another machine. If you have a main package that executes various sub-packages with EPT, the main package and all of its EPT sub-packages will run on the same worker machine, while other workers in your scale-out cluster can handle other, independent packages. If some ordering or dependency is needed, you’ll have to do extra work to design a master package for that purpose (i.e. one master package that controls the overall flow, triggers executions in scale-out, and waits for the results accordingly).
  • If you opt for the ELT pattern as I suggested, scaling out might not help all that much. Your gain will depend on where the performance bottleneck is. If the bottleneck is already in the database, scaling out ETL across multiple nodes will be a futile effort. But if the bottleneck is not the database, then the scale-out can still help to take advantage of multiple machines.
  • When a package is scheduled in scale-out, by default it can be assigned to any worker node for execution. But if needed, you can also specify which worker node(s) you want the package execution to be assigned to when triggering the execution. The master node doesn’t monitor worker utilization. Instead, each worker node monitors its own CPU and memory usage and tells the master whether it can take more packages.
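To get a feel for how package-level distribution shortens the processing window, here is a rough Python simulation. It is a simplified sketch and not how the SSIS scale-out master actually schedules work: it assumes independent packages, identical workers that each run one package at a time, and hypothetical package durations. The simulate_makespan function is made up for this illustration.

```python
import heapq


def simulate_makespan(durations, worker_count):
    """Greedy simulation: each package goes to the worker that frees up first.

    Returns the time at which the last worker finishes.
    """
    free_at = [0.0] * worker_count  # time at which each worker becomes free
    heapq.heapify(free_at)
    for duration in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


# Hypothetical workload: 300 independent packages, 2 minutes each.
durations = [2.0] * 300

one_worker = simulate_makespan(durations, 1)    # 600.0 minutes
four_workers = simulate_makespan(durations, 4)  # 150.0 minutes
```

The gain is close to linear here only because the simulated packages are independent and nothing else is the bottleneck. As noted in the list, packages chained with EPT stay on one worker, and a database bottleneck would cap the benefit.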

Power BI Excel Publisher Doesn’t Load

Issue: You want to use the Power BI Excel Publisher but it doesn’t load. You look at the Excel add-ins and you see that it’s deactivated. You try to activate it and then you get an error.

Solution: Copy the following Office PIA files to the add-in folder (C:\Program Files\Microsoft Power BI Publisher for Excel\bin).

  1. Microsoft.Office.Interop.SQLIS.12.0.nupkg
  2. OFFICE.dll

Unfortunately, I don’t know where these files are located. They were given to me by Microsoft Support and I can’t redistribute them. If you can’t find them, you can call Microsoft Support and reference support case 117041015581533.

Now when you open Excel, you should see the Power BI ribbon tab.

“Understanding Power BI Premium” Events

I’ll present “Understanding Power BI Premium” at three public events in June.

  • June 8th, Carolinas Power BI Group in Greensville, SC at 6 PM
  • June 24th, SQL Saturday Chattanooga, TN at 1:15 PM
  • June 26th, Atlanta MS BI and Power BI Group, Atlanta, GA at 6:30 PM

My calendar has the details.

Microsoft Power BI has enjoyed a lot of attention and success since it became generally available in July 2015.  But its licensing model and cloud-hosted limitations barred wide adoption, especially with larger organizations. The recently introduced Power BI Premium will change all of this. Join this session to learn how Power BI Premium will allow your organization to achieve:
• Flexibility to license by capacity
• Greater scale and performance
• Extending on-premises reporting with Power BI Report Server
• Embedded analytics

Understanding Writeback Target Allocation

I’m working on architecting a financial planning solution powered by Analysis Services Multidimensional. One thing that might not be obvious is how Multidimensional selects the target of writeback allocation. In this case, planning will be done at Customer and Product level. With the default equal allocation when writing at the customer level, it might appear that writeback doesn’t work correctly. You’d expect that only the cells that contribute to the aggregated value (10 in the screenshot below) will be affected by writeback. However, if the Customer and Product entities are in different dimensions, writeback will affect all products!

The reason behind this becomes obvious if you right-click the pivot table and, from its options, enable “Shows rows with no data”. Then you’ll see all products appearing with each customer (customers are cross-joined with products). Recall that by default, the pivot table uses NON EMPTY in the MDX query to exclude combinations that don’t exist in the cube. But writeback makes no such assumption. The reason is that if the writeback cell is empty and empty combinations were excluded, there would be nowhere to allocate the writeback value. If the Customer and Product entities are in the same dimension, then the default equal allocation will write to all children of the affected parent, irrespective of whether their values contribute to its aggregated cell.

So, writeback is not the same as drilling through a cell. Now that you know how it works you can use different allocation settings to achieve the behavior you want. For example, you can choose a weighted allocation with the following expression to avoid writing back to empty values:

iif(Measures.CurrentMember = 0, null, Measures.CurrentMember)
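To illustrate the difference in plain arithmetic, here is a small Python sketch that contrasts equal allocation with a weighted allocation that skips empty cells. It is a simplified model, not the Analysis Services allocation engine: the product names and values are hypothetical, and the weighting (proportional to the existing cell values) is an assumption made for the example.

```python
def equal_allocation(new_total, leaf_cells):
    """Split new_total evenly across every leaf cell, empty or not."""
    share = new_total / len(leaf_cells)
    return {name: share for name in leaf_cells}


def weighted_allocation(new_total, leaf_cells):
    """Split new_total in proportion to existing values; empty cells stay empty."""
    weights = {name: value for name, value in leaf_cells.items() if value}
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("no non-empty leaf cells to allocate to")
    return {
        name: new_total * weights[name] / total_weight if name in weights else None
        for name in leaf_cells
    }


# Hypothetical customer whose aggregated value of 10 comes from two products;
# the other two products have no data for this customer.
leaf_cells = {"Bikes": 6.0, "Helmets": 4.0, "Gloves": None, "Tires": None}

equal_result = equal_allocation(20.0, leaf_cells)
weighted_result = weighted_allocation(20.0, leaf_cells)
```

With equal allocation, writing 20 puts 5 into every product, including the two that were empty. With the weighted variant, only Bikes and Helmets receive values (12 and 8), which is the behavior most planners expect.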

Atlanta MS BI Group Meeting on May 22nd

MS BI fans, join me for the next must-attend Atlanta MS BI and Power BI Group meeting on May 22nd at 6:30 PM. My esteemed friend, Stacey Jones from Microsoft, will explain the predictive analytics capability of Cortana Intelligence Suite. He’ll demo a predictive analytics solution that was developed in a single day! It leverages Azure Machine Learning, Azure Data Factory, HDInsight, Spark, Power BI, and Intelligent Apps to deliver a solution that predicts the likelihood of flight delays for a customer. I will do a quick demo of the latest cool Power BI features. Our premium sponsor, TEKsystems, will sponsor the meeting.

Presentation: Cortana Intelligence Suite End-to-End
Level: Intermediate
Date: May 22nd, 2017
Time: 6:30 – 8:30 PM ET
Place: South Terraces Building (Auditorium Room)
115 Perimeter Center Place
Atlanta, GA 30346
Overview: The Cortana Intelligence Suite provides tools to cover all types of business intelligence needs, from compelling dashboards to cutting-edge predictive analytics! In this session we will look at a predictive analytics solution that was developed in a single day. It leverages Azure Machine Learning, Azure Data Factory, HDInsight, Spark, Power BI, and Intelligent Apps to deliver a solution that predicts the likelihood of flight delays for a customer.
Speaker: Stacey Jones has enjoyed a 27-year career specializing in Business Intelligence and all things data. He currently serves as the Data Solutions Architect for Microsoft at the Atlanta Microsoft Technology Center (MTC).
Sponsor: People are at the heart of every successful business initiative. At TEKsystems, a leading provider of IT staffing, IT talent management and IT services, we understand people. Every year we deploy over 80,000 IT professionals at 6,000 client sites across North America, Europe and Asia. Our deep insights into the IT labor market enable us to help clients achieve their business goals while optimizing their IT workforce strategies.

What Does Power BI Premium Mean for You?

I’m sure you’ve heard the announcements today about Power BI Premium. In fact, Power BI Premium is so important that Microsoft has positioned it as a new product under the Power BI marketing umbrella name instead of a new licensing model. Microsoft and industry experts covered the announcements well, so I won’t reiterate the obvious. You may wonder what these changes mean for you. Let’s summarize.

Power BI Portfolio

In a nutshell, Power BI Premium targets larger organizations which have faced two issues with the current Power BI licensing model:

  • No “reader” license. If a report used Power BI Pro features, all users accessing the report needed a Power BI Pro license. So, if you had a report that used Power BI Pro features, such as gateways or live connections, and you wanted this report to be available to 1,000 users, you had to foot a $10,000/month bill because everyone required Power BI Pro.
  • Per-user license. A case in point – one year after a successful Power BI hybrid pilot, a Fortune 100 organization had purchased a whopping five Power BI Pro licenses. There are several reasons for the slow adoption by large companies, but one of them is the per-user license.

Large organizations who are seeking a mass Power BI deployment to potentially thousands of users could save big with Power BI Premium (use the nice calculator to find how much). On the downside, I’m not happy about Microsoft requiring Power BI Pro licenses for contributors on top of Power BI Premium.

I don’t see smaller organizations being very interested in Power BI Premium. For them, a welcome change would be that Power BI Free adds Power BI Pro features. On the downside, Power BI Free loses simple dashboard sharing. This reflects the Microsoft vision for Power BI Free: it is for individual users who are evaluating Power BI. To mitigate the impact of the Power BI Free changes, Microsoft offers a one-year Power BI Pro trial to all Power BI Free users as of May 2nd.

Today vs. starting in June 2017:

  • Power BI Desktop
    Today: Connect to 70+ data sources; data transformations; report creation and exploration
    Starting in June 2017: No changes
  • Power BI Free
    Today: No live connections, no gateway connectivity; smaller capacity limits and data refresh rates; only simple dashboard sharing
    Starting in June 2017: Access to all data sources; performance equivalent to Power BI Pro; no sharing (not even simple dashboard sharing)
  • Power BI Pro
    Today: Access to all data sources; larger capacity limits and data refresh rates; all sharing options (simple, workspaces, org content packs)
    Starting in June 2017: No changes
  • Power BI Premium
    Starting in June 2017: Increased capacity limits; dedicated environment; content distribution (reader license); Power BI Report Server; more features in the future, such as in-memory caching and incremental refresh (read the whitepaper)

Personally, I’d like to see more Power BI pricing tiers added, e.g. Standard tiers. Currently, the lowest Power BI Premium tier (P1) is $5,000 per month, which would probably be out of reach for smaller organizations. But fear not, you can stay within the old Power BI Pro licensing model.
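As a back-of-the-envelope comparison, the Python sketch below contrasts the two licensing models using only the figures quoted in this post: $5,000 per month for P1 and roughly $10 per user per month for Power BI Pro (implied by the $10,000/month bill for 1,000 users). The function names are made up for the illustration, and the sketch ignores whether a single P1 capacity can actually serve a given number of users.

```python
PRO_MONTHLY_PER_USER = 10  # implied by $10,000/month for 1,000 users
P1_MONTHLY = 5000          # lowest Power BI Premium tier quoted in this post


def pro_only_cost(total_users):
    """Everyone, readers included, needs a Power BI Pro license."""
    return total_users * PRO_MONTHLY_PER_USER


def premium_cost(contributors, capacity_monthly=P1_MONTHLY):
    """Capacity fee plus Power BI Pro licenses for contributors only."""
    return capacity_monthly + contributors * PRO_MONTHLY_PER_USER


def break_even_readers(capacity_monthly=P1_MONTHLY):
    """Number of read-only users at which both models cost the same."""
    return capacity_monthly / PRO_MONTHLY_PER_USER


# Hypothetical organization: 1,000 users, 50 of whom author content.
pro_bill = pro_only_cost(1000)   # 10000
premium_bill = premium_cost(50)  # 5500
readers_needed = break_even_readers()  # 500.0
```

Under these assumptions, Premium starts to pay off once you have more than about 500 read-only users, regardless of the number of contributors, because contributors need Power BI Pro in both models.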

Power BI Report Server

Microsoft has decoupled SSRS from SQL Server so it gets more frequent updates. SSRS actually becomes two products:

  • SSRS – This is SSRS as we know it, but with no Power BI integration. It will get new RDL features but no Power BI features. See the Microsoft blogs here and here for more details.
  • Power BI Report Server – Distributed as a standalone installer, Power BI Report Server is a superset of SSRS, as it supports both the existing report types and Power BI reports. As for the reason for the name change, the Power BI name is a strong brand, while SSRS has been associated with the old-style paginated reports.

You can get Power BI Report Server in two ways:

  • As a part of the Power BI Premium bundle. You get the same number of licensed EE cores as the number of v-cores you purchased with Power BI Premium.
  • Standalone and covered by a SQL Server Enterprise Edition with Software Assurance license, plus Power BI Pro licenses for report authors (as with Power BI Premium).

So, although Power BI Report Server has divorced SQL Server, it’s still covered by its license (kind of like when you send your kid to college but she still lives with you). Currently, SQL Server doesn’t check for Software Assurance in any way (there isn’t such a SKU). So, it looks like Power BI Report Server licensing would be an honor system for customers who want to get it standalone, covered by a SQL Server Enterprise Edition license.

Power BI Embedded

Power BI Embedded has been gaining a lot of traction, but the problem was that it was separate from Power BI Service. Consequently, it had to catch up with Power BI Service. For example, it still doesn’t have connectivity to on-premises data sources. The good news is that Power BI Embedded marries Power BI Service, so there will be feature parity and a common set of APIs. The part that I’m not excited about is that its new licensing model requires Power BI Premium (goodbye, per-session licensing). This might be a showstopper for small ISVs. I hope that Microsoft introduces less expensive pricing tiers to better cater to the needs of smaller companies. [Update 6/15/2017: Microsoft announced low-cost EM* plans for Power BI Embedded starting at $625/mo]


Power BI Report Measures Over Tabular Models

The May release of Power BI Desktop adds the ability to define DAX calculated measures when Power BI Desktop is connected live to a Tabular model or to Power BI datasets. This is conceptually similar to defining MDX calculated members in Excel connected to a cube. The measure definitions are local to the Power BI Desktop model (the Tabular model is not modified). You can perform the same measure-related tasks as when you define measures in the data model, such as changing the data type, formatting the measure, or changing the home table. In the screenshot below, I’ve defined a YTD report measure over the Adventure Works Tabular model.

[Screenshot: a YTD report measure defined over the Adventure Works Tabular model]

Behind the scenes, the DAX query generated by Power BI Desktop adds the measures as query-scoped measures in the /* USER DAX BEGIN/END */ section:

DEFINE MEASURE 'Reseller Sales'[Reseller Sales YTD] =
(/* USER DAX BEGIN */
TOTALYTD(SUM('Reseller Sales'[Sales Amount]), 'Date'[Date])
/* USER DAX END */)
EVALUATE
ROW(
"Reseller_Total_Sales", 'Reseller Sales'[Reseller Total Sales],
"Reseller Sales YTD", 'Reseller Sales'[Reseller Sales YTD]
)

Report-level measures are a welcome enhancement. Taking this further, I’d like to see the ability to define report-level measures using the Quick Measure feature. Another feature that I’m waiting for is the ability to use custom measures (both defined in the model and report-level) in the new numeric range slicer (currently in preview).

Important Power BI Announcements on May 3rd

Please make sure you register for free and join the Microsoft Business Forward online event (https://www.microsoft.com/en-us/dynamics365/business-forward) on May 3rd at 10 AM ET.

Join Satya Nadella (CEO, Microsoft), James Phillips (Corporate Vice President, Business Applications, Platform and Intelligence), and Judson Althoff (Executive Vice President, Worldwide Commercial Business) for major announcements and details on the “new generation” of Power BI, Dynamics 365 applications, LinkedIn, and the Microsoft Cloud.

This should be an important event not to miss!

Atlanta MS BI Group Meeting on April 24th

MS BI fans, join me for the next must-attend Atlanta MS BI and Power BI Group meeting on April 24th at 6:30 PM. Mike Bruce and Alex Higgins from Acuity Brands will share how they use Power BI to improve their development process. Acuity Brands will sponsor the event. And I’ll demo the new Quick Measures feature in Power BI.

Presentation: Using Power BI to Track Software Development Performance
Level: Intermediate
Date: April 24, 2017
Time: 6:30 – 8:30 PM ET
Place: South Terraces Building (Auditorium Room)
115 Perimeter Center Place
Atlanta, GA 30346
Overview: By using Power BI, Acuity Brands can monitor development teams’ progress with rich, interactive dashboards. Using data from Visual Studio Team Services OData feeds and APIs, as well as data pulled from DocumentDB, teams can drill into their development performance and see where they may be having performance issues. Data is accessed via embedded Power BI reports running on dedicated hardware throughout development team spaces, and is also accessible via the web using SSO!
Speaker: Mike Bruce has been developing software for the past 22 years, focusing on Microsoft products. He currently runs the DevOps and QA team for Platform Architecture at Acuity Brands.
Alex Higgins recently joined Acuity Brands via the Leadership Program. He has led the effort to build custom visuals in Power BI as well as creating embedded reports.
Sponsor: Acuity Brands is the North American market leader and one of the world’s leading providers of lighting solutions for both indoor and outdoor applications. We provide customer-driven smart and simple lighting solutions that offer quality lighting and value-added benefits by empowering world-class talent to create and leverage our industry-leading portfolio of products, technology, and services; drive world-class cost efficiency; and leverage a culture of continuous improvement.

Tabular DAX Editor

The Tabular toolset is getting better. One thing that I miss from Multidimensional is the cube script that lets you view all custom calculations in one place so that you can organize them any way you want, add comments, etc. This is why I contributed to the DAX Editor tool. Microsoft has taken notice and introduced a tool (also called DAX Editor) in the latest SSDT release. Read Kay Unkroth’s announcement here.

The Microsoft DAX Editor supports the old XML-based schema and the new JSON schema. On the upside, it gives you a break from the Measure Grid and the formula bar. On the downside, you can work on only one measure at a time. So, let’s leave it to marinate a few more months with the hope that we can finally have a Tabular script. As Kay commented at the end of his blog post, there is hope:

Yes, we are hearing this a lot from you guys! Having all expressions in a single document makes it easy to find and replace, search, etc. It’s on the backlog, but not yet on the top of the priorities. Looking at the higher prio work we still need to get done , it’s more mid-termish. But we know how we want to achieve this and we are laying down the foundation with this DAX Editor. Btw. it is much, much more than just an editor window. That is really just the tip of the iceberg. Same with the DAX query window in SSSM. The real beauty (and complexity) is in the DAX parser behind these windows, and a few other features like IntelliSense. It’s coming together. Brick by brick!

Meanwhile, use the community DAX Editor.