Data Integration Options with Cloud Providers

It’s hard to envision a data analytics solution that doesn’t ingest data from some cloud vendor. I previously ranted here about cloud prohibitors – cloud providers that negate the benefits of moving to the cloud by making you jump through all sorts of hoops to get to your data. This blog summarizes a few data integration options with cloud vendors, from best to worst.

  1. Direct access to the database – This is by far your best option. If the provider allows you to access the data by directly connecting to its native storage, you should take it even if it involves higher fees, such as upgrading to a “premium” tier (a travesty considering that you didn’t have to pay higher fees with their on-prem installations). There are no insurmountable engineering barriers preventing the vendor from providing direct access to the data in its native storage, but very few cloud vendors do it, citing “issues” such as security, impact on operational processes, etc. (see the sketch after this list for what this option buys you). Here is a free piece of advice to cloud vendors: provide direct access to the underlying relational database and use this as a big differentiator against the competition. Security and performance should be on you and not the customer.
  2. Data staging – If direct access is not an option, the second best would be for the vendor to stage the data out, ideally to a relational database you provision, such as by using the Dynamics Data Export add-in to export data from Dynamics Online (now regretfully deprecated by Microsoft in favor of staging to a data lake). Unfortunately, it looks like the norm nowadays is to export to a data lake as flat files, so the extraction path becomes relational database -> flat files -> relational database. I hope that makes sense to you because it doesn’t to me. Things to watch out for here are where the data lake is located (Azure, AWS, others) and what integration options are provided. For example, one major ERP vendor supports only S3 as a data lake and provides only a JDBC driver to connect to it (JDBC is a Java-based connector that can’t be used by any Microsoft-based integration tool).
  3. Data push as flat files – Smaller and more flexible cloud providers might be willing to push the data to storage that you provide. In such cases, you should strongly consider an Azure data lake instead of FTP. For example, one cloud provider I integrated with used Ruby to call the Azure API and post data extracts to the client’s data lake storage that we set up.
  4. REST APIs – Most cloud integration scenarios will fall into this bucket so that the vendor ensures “secure” and “controlled” access to data. The reality is that many of these REST APIs are horrible in both implementation and throughput. For example, the REST APIs of another ERP vendor couldn’t handle a batch export of 500,000 rows (a dataset that can fit into an Excel spreadsheet and be emailed around, mind you). Their server would time out because of the “massive” data, and the client was asked to use callback APIs to chunk the export, which of course didn’t work either. And the complexity only grows from here, as some implementations require result paging, error handling, etc., forcing you to write custom code (see the paging sketch after the summary below).
  5. Manual export – If the above options are not possible, your last resort might be to use the vendor app and export the data manually. So much for automating the data integration…
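
To make option 1 concrete, here is a minimal Power Query (M) sketch of a direct database connection. The server, database, and table names are hypothetical stand-ins for whatever endpoint the vendor exposes; the point is that downstream steps fold back to the server as SQL:

 let
     // Hypothetical vendor endpoint and database – replace with whatever the provider gives you
     Source = Sql.Database("vendor.database.windows.net", "VendorERP"),
     // Referencing the table (instead of passing a native query) keeps query folding enabled
     Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
     // This filter folds into a WHERE clause and executes on the server, not on your machine
     Recent = Table.SelectRows(Sales, each [OrderDate] >= #date(2022, 1, 1))
 in
     Recent

Because the query folds, you can also layer incremental extraction on top of it instead of re-pulling the full table on every refresh.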

In summary, when evaluating a cloud vendor, don’t forget to ask how you can access your data. Extracting data by connecting to its native storage (typically a relational database) is the best option for integrating with cloud data sources: the data is in its native format, you can fold operations such as filtering and sorting, you can extract data incrementally, and you can even build SQL views if the vendor allows it. Unfortunately, the norm nowadays is to force you to call the vendor’s API, so the extraction effort in your ETL is on you and not the vendor.
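
To illustrate how much plumbing a “simple” REST API extraction requires (option 4 above), here is a hedged M sketch of the typical paging pattern. The endpoint, paging parameters, and payload shape are all assumptions about a hypothetical vendor API; a real implementation would also need authentication, throttling, and error handling on top of this:

 let
     // Hypothetical API – endpoint, paging parameters, and payload shape are assumptions
     BaseUrl = "https://api.vendor.com/v1/orders",
     PageSize = 1000,
     // Requests one page of rows; assumes the response wraps rows in an "items" array
     GetPage = (pageIndex as number) as list =>
         let
             Response = Web.Contents(BaseUrl,
                 [Query = [offset = Text.From(pageIndex * PageSize), limit = Text.From(PageSize)]]),
             Json = Json.Document(Response)
         in
             Json[items],
     // Keep requesting pages until the API returns an empty one
     AllPages = List.Generate(
         () => [Page = 0, Rows = GetPage(0)],
         (state) => List.Count(state[Rows]) > 0,
         (state) => [Page = state[Page] + 1, Rows = GetPage(state[Page] + 1)],
         (state) => state[Rows]),
     Result = Table.FromRecords(List.Combine(AllPages))
 in
     Result

All of this custom code replaces what a single SELECT statement would accomplish with direct database access.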

Atlanta MS BI and Power BI Group Meeting on March 7th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, March 7th, at 6:30 PM ET.  Elayne Jones (Consultant with BlueGranite) will cover the pros and cons of migrating semantic models from Azure Analysis Services to Power BI. For more details and sign up, visit our group page.

Presentation: Migrating Azure Analysis Services Models to PBI Premium
Date: March 7th
Time: 6:30 – 8:30 PM ET
Place: Click here to join the meeting
Overview: Migrating Azure Analysis Services models to Power BI Premium simplifies resource maintenance and drives efficient business intelligence environments. Here, Elayne Jones covers the benefits, drawbacks, prerequisites, and available deployment options.
Speaker: Elayne Jones is a Staff Consultant at BlueGranite with experience in data analysis, visualization, and troubleshooting in the Transportation and Shipping industries. She graduated Summa Cum Laude from The University of North Georgia and holds multiple Microsoft certifications. Elayne is passionate about harnessing the transformative power of data to drive insights for businesses.
Prototypes without Pizza: Power BI Latest by Teo Lachev


How Power BI Broke Amazon

Amazon started in 1994 as an online marketplace for books. You’d think that after being in this business for almost 30 years, they’ve figured out book ecommerce inside out. Unfortunately, rapid revisions (I revise my Power BI book annually) break their model. Every new revision wreaks havoc on the customer experience of purchasing the book and punishes the publisher. Here is what I’ve learned after having my books sold on Amazon for many years.

  1. Amazon has a proprietary algorithm that decides what to show on the book detail page. My books are available in two formats: paperback and ebook (Kindle). When a customer lands on the book page, the natural expectation is that the Kindle tab would redirect to the ebook format and the Paperback tab would redirect to the paperback format of the same edition. Unfortunately, the publisher has no control over what the Paperback tab eventually links to. For example, in the screenshot below, today the algorithm has decided that the customer would be more interested in purchasing the 5th edition paperback (the 7th edition is currently the latest) for the modest price of $79.82 because it’s a vintage. Not to mention that the 5th edition has long been unpublished and is only available from third-party sellers, which should’ve been listed on the “Other Sellers” tab.
  2. Every time a new revision is published, the book fights with itself. Despite the 7-year overall history and 75 reviews of my Power BI book, a new revision goes all the way to the back of the line, and its sales ranking starts from zero. Amazon doesn’t give any credit to prior revisions or reviews. And by the time the book has enough history to bubble up in the search results, it’s time for a new revision, and the cycle repeats itself.

I wrote this post mostly as an apology to my customers since there is nothing that can be done to improve their Amazon book purchasing experience. The book is at the mercy of Amazon and its obsolete algorithms.

Power BI Dynamic M Query Parameters Reloaded

In a previous blog from October 2020, I ranted about the narrow list of data sources supported by Power BI Dynamic M query parameters. The February 2022 update of Power BI Desktop expanded the feature’s reach to relational data sources, such as SQL Server (all flavors), Oracle, Teradata, and SAP. This warranted a second look to see where we are now and where we’re going.

Dynamic queries are now possible against popular relational data sources

The extended support makes it possible to execute SQL queries or stored procedures by passing parameters based on the user selection. I’ve attached a sample pbix file that demonstrates this scenario. It calls a stored procedure uspGetManagerEmployeesTL in the AdventureWorks2012 database. The stored procedure is a slightly enhanced version of the original uspGetManagerEmployees stored procedure included with AdventureWorks, because I wanted to experiment with different scenarios around data security. The idea is to execute the stored procedure after each EmailAddress slicer change to see the employees reporting to the selected manager.

The following needs to happen to get this feature to work:

  1. You must configure the stored procedure to execute in DirectQuery mode when setting up the connection (the Employee table that I use for the slicer could be imported).
  2. In the case of calling a parameterized stored procedure in DirectQuery, you must wrap the call with OPENQUERY (OPENROWSET won’t work for Azure SQL Database) to avoid Power BI choking when running the report (calling the stored procedure with the parameter directly works in the Power Query Editor but fails when the report renders), as explained in more detail in my blog “Power BI DirectQuery with Parameterized Stored Procedure“. A sketch of the OPENQUERY form follows the query below.
  3. You must create a query parameter (Login parameter in my case).
  4. In the Model view, you must map the filtered field (in this case EmailAddress in the Employee table) to the Login parameter in the Advanced Properties.
  5. In Power Query, modify the stored procedure call to use the Login parameter.
 Source = Sql.Database("xps", "AdventureWorks2012", [Query="SELECT * FROM #(lf)OPENROWSET('SQLNCLI','trusted_connection=yes', 'exec AdventureWorks2012..uspGetManagerEmployeesTL ''" & Login & "''')", CreateNavigationProperties=false])
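
For Azure SQL Database, where OPENROWSET is not an option (see step 2 above), the same call can be wrapped with OPENQUERY instead. The following is a hedged sketch only: it assumes a loopback linked server named xps has been defined on the source server (the referenced blog covers the setup), and note that the slicer value is concatenated into the SQL string, which is why the Login parameter must be wrapped in doubled single quotes:

 // Hypothetical OPENQUERY variant; assumes a linked server named [xps] exists
 Source = Sql.Database("xps", "AdventureWorks2012", [Query="SELECT * FROM OPENQUERY([xps], 'exec AdventureWorks2012..uspGetManagerEmployeesTL ''" & Login & "''')", CreateNavigationProperties=false])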

How about dynamically passing user context or DAX measure output?

If you read the original blog, I was after passing the user identity obtained from USERPRINCIPALNAME() as a parameter to the stored procedure to support the scenario where the client already has database security in place. Unfortunately, we still can’t meet this scenario because of the other limitations of dynamic query parameters listed in the documentation. Specifically:

  1. Currently, there is no way to pass the results of DAX measures to Power Query. You might get innovative and try filtering the slicer with a DAX measure, such as the one below, but it won’t work because it uses an “advanced” filter.
    EmployeeFilter = IF(SELECTEDVALUE(Employee[EmailAddress]) = USERPRINCIPALNAME(),1)
  2. The filtered field must be the only field in the context. For example, you can’t use cross-highlighting from another visual that lists another employee attribute, such as FullName, even if it maps 1:1 to the filtered field. The classic example where this would be useful is showing a user-friendly name in the slicer while passing the system key to the database. Sorry, we’re not that advanced yet, although SSRS could do this 20 years ago…

In summary, you can now execute parameterized dynamic SQL statements or stored procedures against traditional relational data sources in DirectQuery by letting the user select the parameter value mapped to a dynamic query parameter. However, existing limitations prevent more flexible integration scenarios, such as passing DAX measures to the data source. Unfortunately, Power Query and the model continue to act as two tenants in common that don’t care much about each other’s whereabouts.

Azure Data Engineer Badge Acquired

Just passed the Microsoft DP-203 exam and added yet another badge to my certification portfolio. This certification is one of the requirements for partners to maintain Microsoft Gold Data Analytics Competency, so I’m glad to take this off my list.

Atlanta MS BI and Power BI Group Meeting on February 7th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, February 7th, at 6:30 PM ET.  Your humble correspondent will show you different techniques to implement scorecards with Power BI. For more details and sign up, visit our group page. Download the presentation assets here.

Presentation: Implementing Scorecards
Date: February 7th
Time: 6:30 – 8:30 PM ET
Place: Click here to join the meeting
Overview: Join this session to learn how to create scorecards measuring strategic objectives with Power BI. We’ll start by introducing you to balanced scorecards and KPIs. Then, I’ll compare different ways to assemble scorecards, including:

·       Conditional formatting

·       Analysis Services KPI

·       Power BI Goals

Demos will make it all clear. As a bonus, you’ll learn some black-belt modeling techniques, such as using Tabular Editor where Power BI Desktop falls short.

Speaker:Teo Lachev is a consultant, author, and mentor, with a focus on Microsoft BI. Through his Atlanta-based company Prologika (a Microsoft Gold Partner in Data Analytics and Data Platform) he designs and implements innovative solutions that bring tremendous value to his clients. Teo has authored and co-authored several books, and he has been leading the Atlanta Microsoft Business Intelligence group since he founded it in 2010. Microsoft has recognized Teo’s contributions to the community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) Data Platform status for 15 years. In 2021, Microsoft selected Teo as one of only 30 FastTrack Solution Architects for Power BI worldwide. https://prologika.com
Prototypes without Pizza: Power BI Latest


DirectQuery Performance Progression

So much data, so little budget! Caching data in Power BI gives you the best report performance, but budget constraints usually put downward pressure on staying within the lower Power BI Premium plans. So, what to do?

  1. Split large models. Remember that Power BI Premium Gen2 grants each dataset a 25 GB memory quota. I’m actually a big proponent of consolidated organizational semantic models, but this is where best practices meet reality.
  2. Switch large tables to DirectQuery. When report performance sucks, follow this performance optimization progression:
    1. Add a columnstore index to the fact table – Like VertiPaq, a columnstore index organizes data in columns, so aggregate queries should see an immediate performance boost. Detail-level queries, e.g. sales by customer, not so much, as they probably won’t hit the index.
    2. Try hybrid tables when you can get away with a compromise where the latest data is cached but archive data is left in DirectQuery.
    3. Try aggregation tables (first automatic, then manual) when you need to speed up aggregate queries at a higher-level grain, such as dashboard queries, but let lower-level queries pass through.
    4. Pray for mercy to come! Or take the stakeholder out for lunch and insist that the time for some compromise has come…

Azure Learning Path

I’m preparing for the new Azure Data Engineer exam, which is one of the options for MS partners to maintain the Gold Data Analytics Competency that has to be renewed annually. I’m really impressed by the Azure Learning Path self-paced environment that Microsoft put together. The training material is top-notch, and I’m actually learning some practical and useful stuff from the covered topics on ADLS, ADF, Synapse, etc. And you can even practice in an Azure sandbox environment that Microsoft sets up and tears down for you! It’s clear that tremendous effort has gone into setting up this free online learning option. No excuse not to certify anymore…

Check out the online learning option for exam DP-203 at https://docs.microsoft.com/en-us/learn/certifications/azure-data-engineer/.

Applied Power BI Book (7th Edition)

Happy New Year!

I’m excited to announce the availability of the seventh edition of my Applied Microsoft Power BI book! When the first edition was published in January 2016, it was the first Power BI book at that time, and it had fewer than 300 pages. Since then, I have helped many companies adopt or transition to Power BI and taught hundreds of students. It’s been a great experience to witness the momentum surrounding Power BI and how the tool has matured over time. As a result, the book also got thicker and doubled in size. However, I believe what’s more important is that this book provides a systematic view by showing what Power BI can do for four types of users (business users, analysts, pros, and developers).

For the first time, the book features a Kindle PDF (Amazon Print Replica) ebook version, which replaces the native Kindle “reflow” format previously used. Now if you buy the ebook, you will read it on your Kindle device or phone exactly as it was formatted for the paperback! The ebook is also enrolled in Kindle Unlimited – an Amazon subscription program where you can read as many eBooks as you like and keep them as long as you want.

To my knowledge, this is the only Power BI book that is revised annually to keep it up to date with this ever-changing technology! Hint: you can tell how old Power BI is from the book’s edition number.

Because I had to draw a line somewhere, Applied Microsoft Power BI (7th edition) covers all features that are in preview or released by December 2021. As with my previous books, I’m committed to helping my readers with book-related questions and welcome all feedback on the discussion forum on the book page. While you are there, feel free to check out the book resources (sample chapter, front matter, and more). I also encourage you to follow my blog at https://prologika.com/blog and subscribe to my newsletter at https://prologika.com to stay on top of the Power BI latest.

Bring your data to life for the seventh time! Keep on reading and learning!

Atlanta MS BI and Power BI Group Meeting on January 3rd

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, January 3rd, at 6:30 PM ET.  Paul Turley will show us how to integrate Power BI with paginated (SSRS) reports. And your humble correspondent will update you on the Power BI latest.  For more details and sign up, visit our group page.

Presentation: Power BI Paginated Reports: The New Old Operational Reporting Platform
Date: January 3rd
Time: 6:30 – 8:30 PM ET
Place: Click here to join the meeting
Overview: Power BI Paginated Reports (aka SQL Server Reporting Services) was old, but now it’s new again. Available on-premises or in the Power BI service with flexible licensing, you have multiple options to implement operational reports. This session will briefly cover the differences between analytic and operational reports, and help you understand the advantages and trade-offs of using Power BI Paginated Reports, Power BI Report Server, and SQL Server Reporting Services. Material from our forthcoming book: Paginated Report Recipes.
Speaker: Paul is a Principal Consultant for 3Cloud Solutions (formerly Pragmatic Works), a Mentor, and a Microsoft Data Platform MVP. He consults, writes, speaks, teaches, and blogs about business intelligence and reporting solutions. He works with companies around the world to model data and to visualize and deliver critical information for informed business decisions, using the Microsoft data platform and business analytics tools. He is a Director of the Oregon Data Community PASS chapter & user group, and the author and lead author of Professional SQL Server 2016 Reporting Services and 14 other titles from Wrox & Microsoft Press. Paul is a 2021 FastTrack Recognized Solution Architect and holds several certifications, including MCSE for the Data Platform and BI.
Prototypes without Pizza: Power BI Latest
