Posts

Atlanta MS BI and Power BI Group Meeting on March 7th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, March 7th, at 6:30 PM ET.  Elayne Jones (Consultant with BlueGranite) will cover the pros and cons of migrating semantic models from Azure Analysis Services to Power BI. For more details and sign up, visit our group page.

Presentation:Migrating Azure Analysis Services Models to PBI Premium
Date:March 7th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:Migrating Azure Analysis Services models to Power BI Premium simplifies resource maintenance and drives efficient business intelligence environments. Here, Elayne Jones covers the benefits, drawbacks, prerequisites, and available deployment options.
Speaker:Elayne Jones is a Staff Consultant at BlueGranite who brings experience with data analysis, visualization, and troubleshooting in the Transportation and Shipping industries. She graduated Summa Cum Laude from The University of North Georgia and brings her knowledge from multiple Microsoft certifications with her. Elayne is passionate about harnessing the transformative power of data to drive insights for businesses.
Prototypes without PizzaPower BI Latest by Teo Lachev

PowerBILogo

Power BI Dynamic M Query Parameters Reloaded

In a previous blog from October 2020, I ranted about the narrow list of data sources supported by Power BI Dynamic M query parameters. The February 2022 update of Power BI Desktop expanded the feature reach to relational data sources, such as SQL Server (all flavors), Oracle, Teradata, and SAP. This warranted a second look to see where we’re now and where we’re going.

Dynamic queries are now possible to popular relational data sources

The extended support makes it possible to execute SQL queries or stored procedures by passing parameters based on the user selection. I attach a sample pbix file that demonstrates this scenario. It calls a stored procedure uspGetManagerEmployeesTL in the AdventureWorks2021 database. The stored procedure is a slightly enhanced version of the original uspGetManagerEmployees sp included with AdventureWorks because I wanted to experiment with different scenarios around data security. The idea is to execute the stored procedure after each EmailAddress slicer change to see the employees reporting to that manager.

The following needs to happen to get this feature to work:

  1. You must configure the stored procedure to execute in DirectQuery mode when setting up the connection (the Employee table that I use for the slicer could be imported).
  2. In the case of calling a parameterized stored procedure in DirectQuery, you must wrap the call with OPENQUERY (OPENROWSET won’t work for Azure SQL Database) to avoid Power BI choking when running the report (<storedprocedure parameter> only actually works in Power Query but not when the report renders), as explained in more detail in my blog “Power BI DirectQuery with Parameterized Stored Procedure“.
  3. You must create a query parameter (Login parameter in my case).
  4. In the Model view, you must map the filtered field (in this case EmailAddress in the Employee table) to the Login parameter in the Advanced Properties.
  5. In Power Query, modify the stored procedure call to use the Login parameter.
 Source = Sql.Database("xps", "AdventureWorks2012", [Query="SELECT * FROM #(lf)OPENROWSET('SQLNCLI','trusted_connection=yes', 'exec AdventureWorks2012..uspGetManagerEmployeesTL ''" & Login & "''')", CreateNavigationProperties=false])

How about passing dynamically user context or DAX measure output?

If you read the original blog, I was after passing the user identity obtained from USERPRINCIPALNAME() as a parameter to the stored procedure to support the scenario where the client already has a database security in place. Unfortunately, we still can’t meet this scenario because of the other limitations with dynamic query parameters listed in the documentation. Specifically:

  1. Currently, there no way to pass results of DAX measures to Power Query. You might get innovative and try filtering the slicer using a DAX measure, such as the one below but it won’t work because it uses an “advanced” filter.
    EmployeeFilter = IF(SELECTEDVALUE(Employee[EmailAddress]) = USERPRINCIPALNAME(),1)
  2. The filtered field must be the only field in the context. For example, you can’t use cross-highlighting from another visual that lists another employee’s attribute, such as FullName, even if that maps 1:1 to the filtered field. The classic example where this could be useful would be to show the user-friendly name in a slicer but to pass the system key to the database. Sorry, we’re not that advanced yet although SSRS could do this 20 years ago…

In summary, you can now execute parameterized dynamic SQL statements or stored procedure against traditional relational data sources in DirectQuery by allowing the user to select the parameter value mapped to a dynamic query parameter. However, existing limitations prevent more flexible integration scenarios, such as passing DAX measures to the data source. Unfortunately, Power Query and the model continue to act as two tenants in common that don’t care much about each other’s whereabouts.

Atlanta MS BI and Power BI Group Meeting on February 7th

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, February 7th, at 6:30 PM ET.  Your humble correspondent will show you different techniques to implement scorecards with Power BI. For more details and sign up, visit our group page. Download the presentation assets here.

Presentation:Implementing Scorecards
Date:February 7th
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:Join this session to learn how to create scorecards measuring strategic objectives with Power BI. We’ll start by introducing you to balanced scorecards and KPIs. Then, I’ll compare different ways to assemble scorecards, including:

·       Conditional formatting

·       Analysis Services KPI

·       Power BI Goals

Demos will make it all clear. As a bonus, you’ll learn some black-belt modeling techniques, such as using Tabular Editor where Power BI Desktop falls short.

Speaker:Teo Lachev is a consultant, author, and mentor, with a focus on Microsoft BI. Through his Atlanta-based company Prologika (a Microsoft Gold Partner in Data Analytics and Data Platform) he designs and implements innovative solutions that bring tremendous value to his clients. Teo has authored and co-authored several books, and he has been leading the Atlanta Microsoft Business Intelligence group since he founded it in 2010. Microsoft has recognized Teo’s contributions to the community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) Data Platform status for 15 years. In 2021, Microsoft selected Teo as one of only 30 FastTrack Solution Architects for Power BI worldwide. https://prologika.com
Prototypes without PizzaPower BI Latest

PowerBILogo

DirectQuery Performance Progression

So much data, so little budget! Caching data in Power BI gives you the best report performance, but budget constraints usually put downward pressure to stay within lower Power BI Premium premium plans. So, what to do?

  1. Split large models. Remember that Power BI Premium Gen2 grants each dataset a 25 GB memory quota. I’m actually a big proponent for consolidated organizational semantic models, but this is where best practices meet reality.
  2. Switch large tables to DirectQuery. When report performance sucks, follow this performance optimization progression:
    1. Add a columnstore index to the fact table – As VertiPaq, a columnstore index organizes data in columns so aggregate queries should see an immediate performance boost. Detail-level queries, e.g. sales by customer, not so much as they probably won’t hit the index.
    2. Try hybrid tables when you can get away with a compromise where the latest data can be cached, but archive data left in DirectQuery.
    3. Try aggregation tables (first automatic, then manual) when you need to speed up aggregate queries at a higher-level grain, such as dashboard queries but leave lower-level queries pass through.
    4. Pray for mercy to come! Or take the stakeholder out for lunch and insist that the time for some compromise has come…

Atlanta MS BI and Power BI Group Meeting on January 3rd

Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, January 3rd, at 6:30 PM ET.  Paul Turley will show us how to integrate Power BI with paginated (SSRS) reports. And your humble correspondent will update you on the Power BI latest.  For more details and sign up, visit our group page.

Presentation:Power BI Paginated Reports: The New Old Operational Reporting Platform
Date:January 3rd
Time:6:30 – 8:30 PM ET
Place:Click here to join the meeting
Overview:Power BI Paginated Reports (aka SQL Server Reporting Services) was old but now it’s new again. Available on-premises or in the Power BI service with flexible licensing, you have multiple options to implement operational reports. This session will briefly cover the differences between analytic and operational reports; and help you understand the advantages and trade-offs using Power BI Paginated Reports, Power BI Report Server and SQL Server Reporting Services. Material from our forthcoming book: Paginated Report Recipes.
Speaker:Paul is a Principal Consultant for 3Cloud Solutions (formerly Pragmatic Works), a Mentor and Microsoft Data Platform MVP. He consults, writes, speaks, teaches & blogs about business intelligence and reporting solutions. He works with companies around the world to model data, visualize and deliver critical information to make informed business decisions; using the Microsoft data platform and business analytics tools. He is a Director of the Oregon Data Community PASS chapter & user group, the author and lead author of Professional SQL Server 2016 Reporting Services and 14 other titles from Wrox & Microsoft Press.  Paul is a 2021 FastTrack Recognized Solution Architect and holds several certifications including MCSE for the Data Platform and BI.
Prototypes without PizzaPower BI Latest

PowerBILogo

Prologika Newsletter Winter 2021

Happy Holidays! More and more organizations consider data virtualization to abstract the underlying storage and integrate siloed sources. In this letter, I’ll discuss a real-life project that used PolyBase to expose third-party ERP data as SQL tables. Before I get to the subject of this newsletter, I’m excited to announce the seventh edition of my “Applied Microsoft Power BI” book. It should be available on Amazon in the first days of 2022. As far as I know, it’s the only book that is updated annually to keep it up to the date with the fast-changing Power BI. Stay tuned for a future blog with more details about the book.

 

Business Case

Think of data virtualization is a logical data layer that integrates enterprise data across various on-premises and cloud sources. A large, multinational chemical manufacturer decided to migrate their on-premises ERP system to the cloud. As usually happens, the tradeoff for embracing the cloud is losing access to your data in its native storage. Previously, the client could readily integrate the ERP data stored in a SQL Server database. But the ERP vendor didn’t support this option in their cloud offering. The usual explanation cites security and performance issues, although none of them really hold water. The ERP vendor could have supported a premium tier where data is exposed privately without affecting other customers, and report queries could have been redirected to a secondary replica. This is no different that securing and scaling an Azure SQL Database. Alas, as more and more companies find when embracing the cloud, the integration burden gets heavier and is on them and not on the vendor.

To make things even more difficult, the vendor had a replication mechanism to export the data to AWS S3 data lake. Realizing that most clients would struggle calling their REST APIs, the vendor provided an JDBC driver that abstracted the APIs. Great, except that the client wanted to access the data on Microsoft Azure, but no Microsoft tool supports JDBC drivers because no Microsoft BI tool is written in Java.

Integration Options

One integration option could have been to use a JDBC-capable ETL tool, such as Pentaho. But that would have required implementing integration pipelines to pull the data periodically and stage it on Azure. This presented two issues. First, the data integration effort became more difficult as someone had to own and troubleshoot ETL failures. Second, Business wanted as much real-time access to data as possible. In the past, the business users had implemented self-service Power BI models that they would refresh as needed to cache the data from the on-premises database. However, since the ERP vendor required at least 20 minutes for data changes to be applied to the S3 data lake and ETL needed additional time (even with incremental extraction), the data latency became an issue.

The second option was to somehow virtualize the data. Had the vendor supported exporting the data as files on Azure, Synapse Serverless could have been used to expose the data as virtual tables that can be queried with SQL and loaded in Power BI Desktop. But Serverless doesn’t support AWS S3 and even if it did, the vendor didn’t allow direct access to the staged data (REST APIs and JDBC driver were the only supported options).

PolyBase to the Rescue

The solution I proposed was to use PolyBase which is included in SQL Server and Azure SQL Managed Instance. Ideally, the client wanted a full PaaS solution but only PolyBase in SQL Server supports ODBC. So, we had to use an IaaS VM just to virtualize the data. This diagram shows the solution architecture.

An Azure VM was provisioned with SQL Server 2019 Standard Edition. The JDBC driver was installed on the VM and configured to access the ERP data. We used a third-party JDBC-to-ODBC bridge driver to map the JDBC data source as an ODBC data source. Then, PolyBase external tables were set up to virtualize the ERP data as SQL Server tables.

The main drawback of this solution was that no matter how small the source table was, PolyBase would add about 30 seconds in internal processing. Specifically, the PolyBase runtime log has a detailed trail that shows that it takes some time for PolyBase to “warm up” before it gets to the query, and then it needs even more time to process the results. That’s because, as a distributed system (like Synapse), a head node coordinates the query execution with data nodes even if everything is installed on a single VM. More gotchas specific to the fact that the vendor has decided to use Oracle as their relational database can be found at https://prologika.com/polybase-adventures/.

Conclusion

Microsoft has made bold strides in data virtualization. I’m really impressed by Synapse Serverless, which I used for other projects, such as for the project described in this case study. I wish Microsoft extends Serverless to support more storage options. If Synapse Serverless is not an option, your next best bet would be PolyBase. Although PolyBase is supported in SQL MI and Synapse, only the SQL Server box SKU supports ODBC data sources, requiring an IaaS layer to virtualize the data.

Benefits

The solution delivered the following benefits to the client:

  • No ETL effort – Data was left at the original source.
  • Data virtualization – Polybase was used to create external tables that can be queried just like SQL Server regular tables.
  • Reduced data latency – Data changes were available as soon as they are replicated to the data lake.
  • Scalability – PolyBase can be scaled out to other servers if needed.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo

Power BI Hybrid Tables

A while back a client wanted to avoid importing a large snapshot fact table with loan balances because its memory footprint would require them to upgrade to a higher Power BI premium plan. This of course required leaving the table in DirectQuery mode at the expense of query performance. Luckily, most users would be interested in the latest six months of data. To speed up performance, we opted for aggregations. However, to complicate things further, they had M2M relationships between dimensions and the fact table which Power BI aggregations don’t support. So, we had to roll out our own “aggregation hits” by redirecting DAX measures either to the aggregated table if the as-of date was in the last six months or to the DirectQuery table otherwise.

Seasoned BI pros might recall that Multidimensional supports measure groups with a mixed storage by creating MOLAP and ROLAP partitions within the same table. The recently announced Power BI hybrid tables carry this concept to Tabular models hosted in Power BI Premium or PPU. This enables two new scenarios:

  • Implement real-time “hot” partitions – For best report performance, you can continue importing and refreshing the data periodically, but you can also implement a “hot” partition configured for DirectQuery to show the latest changes. This is the scenario supported by the incremental refresh policy discussed in the announcement.
  • Leave infrequently accessed data in DirectQuery – This is the scenario that I believe would inspire more interest to address the above requirement.

Implementing real-time partitions

The easiest way to implement a real-time partition is by defining an incremental refresh policy. All you must do is check the “Get the latest data in real time with DirectQuery” checkbox. This will add a DirectQuery partition to the end of the partition design created by Power BI. When the scheduled refresh runs, Power BI will refresh the historical partitions as it would normally do. However, all queries that request data after the scheduled refresh date (at the day boundary) will be sent to the data source. Consequently, your model will have new data that is inserted into the table. I suggest you also configure the report pages that show the real-time data for automatic page refresh (in Power BI Desktop, select the page and turn on the “Page refresh” slider) so that the visuals poll for data changes at a predefined cadence.

Leave infrequently accessed data in DirectQuery

Think of this scenario as the opposite of the real-time partitions because the frequently requested data is imported while the historical data is DirectQuery. Because Power BI Desktop doesn’t support custom partitions, you must use another tool, such as Tabular Editor or SSDT, to configure the partitions by connecting it to the published dataset via the XMLA endpoint (or working with a local *.bim file). If you prefer Power BI Desktop, another option could be to create the partitions in a published test or production model, use Power BI Desktop for development, and configure deployment pipelines to propagate the changes and preserve the partition design in the non-development environments. You’d probably need only two partitions:

  • Historical partition – Specify a SQL statement that queries the historical data with a WHERE clause that qualifies rows using a relative date, such as six months before the system date. Change the partition mode to DirectQuery.
  • Current partition – Specify a SQL statement that defines the slice for the frequently used data. Change the partition mode to Import.

Here are the high-level steps to implement the custom partition design with Tabular Editor assuming you would create the partitions in a published dataset. Note that the first step use Power BI Desktop. You can use Tabular Editor for all steps but it’s a more advanced tool so you might want to use something you’re more familiar with.

  1. Start by opening your dataset in Power BI Desktop. For best performance, drop all imported dimension tables that relate to the large fact table, reimport them in DirectQuery, and then change their storage mode to Dual. That’s because currently Power BI Desktop doesn’t let you switch from Import to DirectQuery and there is no workaround.
  2. Open the Power Query Editor. By default, every table will have one partition and if you used Power BI Desktop, the partition will be an M partition (meaning it will have a Power Query). Change the Source step of the first partition to use a custom query (click the gear icon next to the Source step and then enter your custom SQL statement as per you requirements. For example, if you want the default partition to import the last six months and you target SQL Server, the Source step might look like this:
    let
    Source = Sql.Database("XPS", "AdventureWorksDW2012", [Query="select * from FactResellerSales where OrderDate >= DATEADD(month, -6, GETDATE())"])
    in Source
  3. Publish the dataset to Power BI Service. Close Power BI Desktop and promise yourself never to use it again for that dataset (if you can’t live without it, see the aforementioned note about deployment pipelines). That’s because if you republish the model, PBI Desktop will nuke the custom partitions in the deployed dataset (nice work here Microsoft). Remember that the only partition design supported and preserved by Power BI Desktop is the incremental refresh policy but you can’t use incremental refresh for this scenario.
  4. Connect the Tabular Editor to the published dataset XMLA endpoint (File, Open, From DB). As a prerequisite, make sure to enable the XMLA Endpont for Read/Write in the capacity settings.
  5. Rename the first partition, e.g. ResellerSales-Import.
  6. Duplicate the partition and rename the second one ResellerSales-DQ. Change its Mode property to DirectQuery. Change its source query to slice the historical data.
    let
    Source = Sql.Database("XPS", "AdventureWorksDW2012", [Query="select * from FactResellerSales where OrderDate < DATEADD(month, -6, GETDATE())"])
    in Source
  7. Save (or deploy) the changes to Power BI Service and refresh the dataset.
  8. (Optional) Open SQL Server Profiler connected to the published dataset. User Power BI Desktop or Analyze in Excel to create a report that queries the fact table by date. Notice that when the date filter falls within the last six months, there are no events in the profiler because the query is answered by the Tabular cache. However, when the date filter is outside that period, the profiler shows a SQL SELECT statement because that data is left in the data source to reduce the model’s memory footprint at the expense of performance.

This is what your partition design should look like:

Hybrid tables go even further than composite models by allowing you to mix storage modes within a table and having partitions in Import or DirectQuery storage modes. They enable two scenarios: real-time “hot” partitions and leaving infrequently accessed data in DirectQuery data. What I would like to see Microsoft improve in future is a) let us switch from Import to DirectQuery although this might break calculated columns and Power Query transforms, b) extend Power BI Desktop to support custom partitions so that we can connect Tabular Editor as an external tool and change the partition design and save it in the pbix file, and c) extend Power BI Desktop to preserve the partition design on deploy.

Power BI Bookmark Navigator – A Better Hack

As a report author, you are constantly pressed to fit more visuals into a single page. The November release of Power BI Desktop introduced the Power BI Bookmark Navigator, which simplifies the process of creating a tabbed interface, such as this one.

Since Power BI doesn’t support visual containers or a “menu” visual, you must resort to the awful hack of hiding and showing UX elements by bookmarking them. This reminds me of the beginning of my career as a developer where we didn’t have widgets and we had to hack our way through implementing a custom navigation “experience” by toggling visibility. Alas, this continues in the 21st century but at least the hack got simplified. To implement the tabbed interface:

  1. Add two (or more) overlapping visuals.
  2. Add two bookmarks (Bar Chart and Column Chart) that show and hide the appropriate visual. Don’t worry about hidden visuals impacting the report performance because Power BI doesn’t process them.
  3. Add the two bookmarks to a Tabbed Interface bookmark group.
  4. In Report View, go to the Insert ribbon, expand the Buttons menu, and then click Navigators, “Bookmark navigator”.

Currently, Power BI supports two navigators. The “Page navigator” adds a tabbed navigation menu with a tab for each report page to let the user navigate to a given page by clicking the corresponding tab. The navigator that will inspire more interest is the “Bookmark navigator”.

  1. Notice that by default the navigator adds a tab for each bookmark defined in the report, but in this case, you just need to restrict it to the two bookmarks that you previously created. With the navigator selected, expand the Bookmarks section in the “Format navigator” pane, and select the “Tabbed Interface” bookmark group.
  2. Position the navigator above the two visuals. Remember that in Power BI Desktop, you need to press Ctrl when you click that navigator tabs to switch between the visuals.

Limitations and bugs:

  1. The previously selected tab gets stuck in a highlighted state, so you must hover on it to make it appear “unselected”.
  2. Hierarchical navigation is not supported. For example, you might want to build a page navigation experience like in Power BI apps. However, you can’t define a hierarchy, such as to start the user at the bookmark group level and then drill down to bookmarks.
  3. Although you can somewhat customize the tab appearance, no UX designer will probably be impressed. For example, one feature that could be useful to free up more page real estate is to be able to toggle the navigator visibility.

Drillthrough Paginated Reports

I’m helping a client convert a few SSRS reports from SharePoint to Power BI Premium Per User (PPU). SSRS is of course near and dear to my heart because of all the work I’ve done around it circa 2004-2010 (yep, it’s been that long), books, MVP awards, etc. Since its humble beginnings, SSRS have had a solid architecture that excelled in extensibility. You’d be hardly pressed to face a requirement that couldn’t meet with SSRS back then.

Unfortunately, most of these extensibility features, such as custom assemblies, custom security, custom delivery extensions, custom renderers (essentially everything related to custom code) didn’t make it to paginated reports in Power BI Premium. Not many companies are using these features, so they probably won’t be a showstopper for your migration. To their credit, Microsoft is closing the gap between SSRS and paginated reports. As of now, the feature limits that you might run into are:

Missing FeatureWorkaround
Shared data sources/datasetsReport-specific (embedded) data sources/datasets (yep, a maintenance nightmare)
Drillthrough report actionsChange to URL action and provide URL to drillthrough reports passing filters on the URL
Document mapNo workaround

Instead of drillthrough report actions, implement URL-based actions. Assuming embedding for your organization, here are the high-level steps:

  1. Deploy the drillthrough report to Power BI. Again, you must deploy to a Premium or PPU workspace. Run the report. It should run successfully. Go to File, Embed menu and copy the iframe code (assuming you want to embed the report in your company’s portal). The iframe code should look like this:
    <iframe width="800" height="600" src="https://app.powerbi.com/rdlEmbed?reportId=b82ed928-d9d9-42b5-b6cc-a6e8f3e9dc4d&autoAuth=true&ctid=<your tenant id>" frameborder="0" allowFullScreen="true"></iframe>
  2. Open the main report in Visual Studio or Power BI Report Builder. In the report properties, Code tab, define a public constant to the drillthrough report(s). Or, you can define internal parameters.

  3. Now find all instances of drillthrough actions in the main report. If there are many, it might be easier to open the report RDL in Visual studio and search for <Drillthrough>. For each textbox with drillthrough replace the Drillthrough section in the <ActionInfo> element with a Hyperlink, such as
<ActionInfo>
 <Actions>
 <Action>
 <Hyperlink>=String.Format("{0}&amp;rp:ProfileID={1}&amp;rp:Month1={2}&amp;rp:Month2={3}", Code.reportMigrationDrillthroughUri, Parameters!ProfileID.Value, Parameters!Month1.Value, Parameters!Month2.Value)</Hyperlink>
 </Action>
 </Actions>
</ActionInfo>
  1. In the example above, the positional string replace will replace the {0} placeholder with the report drillthrough constant you defined before. Then, you provide values for the drillthrough report parameters (in this case, the drillthrough report takes three parameters).
  2. Test the main report and drillthrough links. If all is well, publish the main report to Power BI. Obtain its embedded iframe code and add it to your app page.

Use the secure embed iframe URL for the main report so that it’s embedded in the app instead of opening in a new browser tab. Unfortunately, the drillthrough reports will open in a separate tab and I couldn’t find a workaround to render them inside the iframe of the main report.

Prologika Newsletter Fall 2021

LabCorp operates one of the largest clinical laboratory networks in the world. It also has an Interactive Response Technology system that healthcare vendors can use to conduct case studies.  Thanks to the cloud data analytics solution implemented by Prologika, LabCorp and its vendors can now analyze data across case studies. Read this newsletter to learn more about the solution architecture and business value.

 

Business Needs

The data for each study was saved into a separate on-prem SQL database. The total number of databases was more than 1,000. After the initial assessment, Prologika realized that one of the main gaps was that vendors couldn’t report across their studies or gain performance insights from studies conducted by other vendors. Further, as the IRT system evolves over time and to accommodate special requests, there were scheme differences between different versions.

LabCorp underscored the importance of consolidating the data from multiple studies into a single repository. They envisioned a cloud-based PaaS BI solution that would extract data from all the on-prem databases without impacting the system performance and centralize it into an enterprise data warehouse. Vendors would log an external portal that will deliver embedded reports. The first iteration was focused on analyzing audit and log data to gain strategic insights, such as how many users are using the system.

Solution

After assessing the current state and objectives, Prologika recommended and implemented the following architecture:

Prologika implemented an Azure Data Factory (ADF) configurable framework to extract data from the on-prem databases hosted on two production SQL Servers in parallel. The framework would stage the data into partitioned parquet files in Azure Data Lake Storage (ADLS).

Then Prologika created views in Synapse Serverless to consolidate the file extracts. For example, an Audit table could exist in several on-prem databases on both production servers. While each table would be staged in a separate file (or multiple files if the table supports incremental extraction), a Synapse Serverless would present a consolidated view across all files. We were impressed by the capabilities and performance of Synapse Serverless.

Another ADF process would extract the data from the Synapse Serverless views and load the data into a data warehouse hosted in a Synapse SQL pool. Although we considered other methods, such as using a linked server in a Azure SQL Managed Instance to Synapse Serverless to avoid data staging, we settled on Synapse SQL mainly for its scalability. Finally, a Power BI semantic model was created and Power BI Embedded used to deliver reports.

Benefits

The solution delivered the following benefits to LabCorp:

  • Data consolidation – Data was extracted with minimum impact to the operational systems and consolidated in ADLS.
  • Data virtualization – Thanks to the Synapse Serverless flexible support of schema differences, virtual views were created.
  • Scalability – Synapse SQL pool can scale almost indefinitely.
  • Secure and fast insights – Each vendor can access only their data.


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

logo