Power BI Hybrid Tables

A while back a client wanted to avoid importing a large snapshot fact table with loan balances because its memory footprint would require them to upgrade to a higher Power BI premium plan. This of course required leaving the table in DirectQuery mode at the expense of query performance. Luckily, most users would be interested in the latest six months of data. To speed up performance, we opted for aggregations. However, to complicate things further, they had M2M relationships between dimensions and the fact table which Power BI aggregations don’t support. So, we had to roll out our own “aggregation hits” by redirecting DAX measures either to the aggregated table if the as-of date was in the last six months or to the DirectQuery table otherwise.

Seasoned BI pros might recall that Multidimensional supports measure groups with a mixed storage by creating MOLAP and ROLAP partitions within the same table. The recently announced Power BI hybrid tables carry this concept to Tabular models hosted in Power BI Premium or PPU. This enables two new scenarios:

  • Implement real-time “hot” partitions – For best report performance, you can continue importing and refreshing the data periodically, but you can also implement a “hot” partition configured for DirectQuery to show the latest changes. This is the scenario supported by the incremental refresh policy discussed in the announcement.
  • Leave infrequently accessed data in DirectQuery – This is the scenario that I believe would inspire more interest to address the above requirement.

Implementing real-time partitions

The easiest way to implement a real-time partition is by defining an incremental refresh policy. All you must do is check the “Get the latest data in real time with DirectQuery” checkbox. This will add a DirectQuery partition to the end of the partition design created by Power BI. When the scheduled refresh runs, Power BI will refresh the historical partitions as it would normally do. However, all queries that request data after the scheduled refresh date (at the day boundary) will be sent to the data source. Consequently, your model will have new data that is inserted into the table. I suggest you also configure the report pages that show the real-time data for automatic page refresh (in Power BI Desktop, select the page and turn on the “Page refresh” slider) so that the visuals poll for data changes at a predefined cadence.

Leave infrequently accessed data in DirectQuery

Think of this scenario as the opposite of the real-time partitions because the frequently requested data is imported while the historical data is DirectQuery. Because Power BI Desktop doesn’t support custom partitions, you must use another tool, such as Tabular Editor or SSDT, to configure the partitions by connecting it to the published dataset via the XMLA endpoint (or working with a local *.bim file). If you prefer Power BI Desktop, another option could be to create the partitions in a published test or production model, use Power BI Desktop for development, and configure deployment pipelines to propagate the changes and preserve the partition design in the non-development environments. You’d probably need only two partitions:

  • Historical partition – Specify a SQL statement that queries the historical data with a WHERE clause that qualifies rows using a relative date, such as six months before the system date. Change the partition mode to DirectQuery.
  • Current partition – Specify a SQL statement that defines the slice for the frequently used data. Change the partition mode to Import.

Here are the high-level steps to implement the custom partition design with Tabular Editor assuming you would create the partitions in a published dataset. Note that the first step use Power BI Desktop. You can use Tabular Editor for all steps but it’s a more advanced tool so you might want to use something you’re more familiar with.

  1. Start by opening your dataset in Power BI Desktop. For best performance, drop all imported dimension tables that relate to the large fact table, reimport them in DirectQuery, and then change their storage mode to Dual. That’s because currently Power BI Desktop doesn’t let you switch from Import to DirectQuery and there is no workaround.
  2. Open the Power Query Editor. By default, every table will have one partition and if you used Power BI Desktop, the partition will be an M partition (meaning it will have a Power Query). Change the Source step of the first partition to use a custom query (click the gear icon next to the Source step and then enter your custom SQL statement as per you requirements. For example, if you want the default partition to import the last six months and you target SQL Server, the Source step might look like this:
    let
    Source = Sql.Database("XPS", "AdventureWorksDW2012", [Query="select * from FactResellerSales where OrderDate >= DATEADD(month, -6, GETDATE())"])
    in Source
  3. Publish the dataset to Power BI Service. Close Power BI Desktop and promise yourself never to use it again for that dataset (if you can’t live without it, see the aforementioned note about deployment pipelines). That’s because if you republish the model, PBI Desktop will nuke the custom partitions in the deployed dataset (nice work here Microsoft). Remember that the only partition design supported and preserved by Power BI Desktop is the incremental refresh policy but you can’t use incremental refresh for this scenario.
  4. Connect the Tabular Editor to the published dataset XMLA endpoint (File, Open, From DB). As a prerequisite, make sure to enable the XMLA Endpont for Read/Write in the capacity settings.
  5. Rename the first partition, e.g. ResellerSales-Import.
  6. Duplicate the partition and rename the second one ResellerSales-DQ. Change its Mode property to DirectQuery. Change its source query to slice the historical data.
    let
    Source = Sql.Database("XPS", "AdventureWorksDW2012", [Query="select * from FactResellerSales where OrderDate < DATEADD(month, -6, GETDATE())"])
    in Source
  7. Save (or deploy) the changes to Power BI Service and refresh the dataset.
  8. (Optional) Open SQL Server Profiler connected to the published dataset. User Power BI Desktop or Analyze in Excel to create a report that queries the fact table by date. Notice that when the date filter falls within the last six months, there are no events in the profiler because the query is answered by the Tabular cache. However, when the date filter is outside that period, the profiler shows a SQL SELECT statement because that data is left in the data source to reduce the model’s memory footprint at the expense of performance.

This is what your partition design should look like:

Hybrid tables go even further than composite models by allowing you to mix storage modes within a table and having partitions in Import or DirectQuery storage modes. They enable two scenarios: real-time “hot” partitions and leaving infrequently accessed data in DirectQuery data. What I would like to see Microsoft improve in future is a) let us switch from Import to DirectQuery although this might break calculated columns and Power Query transforms, b) extend Power BI Desktop to support custom partitions so that we can connect Tabular Editor as an external tool and change the partition design and save it in the pbix file, and c) extend Power BI Desktop to preserve the partition design on deploy.