Migrating Fabric Import Semantic Models to Direct Lake

I’ve recently written about strategies for addressing memory pressures with large Fabric semantic models, and I mentioned that one of them is switching to Direct Lake. This blog captures my experience of migrating a real-life import semantic model to Direct Lake.

About the project

In this case, the client had a 40 GB semantic model with 250 million rows spread across two fact tables. The semantic model imported data from a Google BigQuery (GBQ) data warehouse. The client applied every trick in the book to optimize the model, but still found themselves forced to upgrade from a Power BI P1 capacity to P2, and then to P3.

I’ve written in the past about my frustration with Power BI/Fabric capacity resource limits. While the 25 GB RAM grant of a P1/F64 capacity for each dataset is generous for smaller semantic models, such as for self-service BI, it’s inadequate for large organizational semantic models. Ultimately, the developers must face gut-wrenching decisions, such as whether to split the model into smaller semantic models, to obey what are in my opinion artificially low and inflexible memory limits. As a reference, my laptop has more memory than the P1/F64 grant, and 1 GB of server RAM costs about $10.

So, we decided to replicate the GBQ data to a Fabric lakehouse and try Direct Lake in order to avoid dataset refresh, which requires at least twice the memory. Granted, replicating data is an awkward solution, but currently Direct Lake requires data to be in a Fabric repository (Lakehouse or Data Warehouse) in the same tenant.

I’d like to indulge myself and imagine a future where other columnar database vendors follow Microsoft and Databricks and embrace Delta storage instead of proprietary formats, as in the case of GBQ. This would allow semantic models to map directly to the vendor’s database, thus avoiding replication and facilitating cross-vendor BI architectures.

Replicating data

I used a Fabric Copy Job to replicate about 50 tables (all tables in one job) from GBQ to a Fabric lakehouse. Overall, the team was happy with how easy it is to set up a Copy Job and with its performance (a full replication took about 40 minutes). As it stands (currently in preview), I uncovered a few Copy Job shortcomings compared to the ADF Copy activity:

  1. No incremental extraction (currently in preview for other data sources, but not for GBQ).
  2. Doesn’t support mixing different options, such as incremental extraction for some tables and full load for others, or Overwrite mode for some tables and Append for others. Currently, you must split the tables into separate jobs to meet such requirements. In addition, copying multiple tables doesn’t give you the option to use a custom SQL statement to extract the data.
  3. Bugs. It looks like every time we made a change to the source, such as renaming the destination tables, the explicit column mappings were lost, to the point where we had to stop using them.
  4. Cannot change the job’s JSON definition, such as when you want to quickly make find-and-replace changes.
  5. The user interface is clunky and difficult to work with. For example, you can’t resize or maximize the screen.

Semantic model migration notes

Here are a few notes about the actual migration of the semantic model to Direct Lake that go beyond the officially documented Direct Lake limitations:

  1. First, I used the Microsoft Semantic Link “Migration to Direct Lake” notebook. The notebook is very easy to use and did a respectable job of switching the table partitions to Direct Lake.
  2. In this project, the client used a Power Query function to translate the column names. Luckily, the client had a mapping table. I used ChatGPT to update the model by asking it to look up each column in the mapping table and derive what the sourceColumn mapping should be (see the first sketch after this list). Most of the time was spent fixing these bindings.
  3. Next, I used the preview feature of Power BI Desktop to create a project connected to the Direct Lake model and changed the model setting to Direct Lake behavior to avoid falling back to DirectQuery.
  4. I quickly found that Direct Lake is very picky about metadata. Even if one column mapping is wrong, it invalidates the entire model. This manifests as errors showing for each table when you switch to the Model tab in Power BI Desktop.
  5. Attempting to refresh the model in Power BI Desktop to figure out what’s wrong produces all sorts of nonsensical errors. Instead, I used the “Refresh now” task in Power BI Service. This shows an error indicator next to the model that you can click to see the error description, and you have to tackle each error one at a time by fixing it in the model.bim file in Power BI Desktop. Again, most errors were caused by wrong column mappings.
  6. The Microsoft notebook doesn’t migrate field parameters successfully. I had to look up the extended properties for the main field and add them manually to the model.bim file:
    "extendedProperties": [
      {
        "type": "json",
        "name": "ParameterMetadata",
        "value": {
          "version": 3,
          "kind": 2
        }
      }
    ]
  7. The client had a Date dimension with dates starting at 1/1/1899 in order to support spurious dates from legacy data sources. This caused all measures to produce the error “A DateTime value is outside of the transportation protocol’s supported range. Dates must be between ‘1899-12-30T00:00:00’ and ‘9999-12-31T23:59:59’.” In import mode, Power BI auto-fixes bad dates, but in Direct Lake you must fix them on your own. So, we nuked 1899 in the date table.
  8. The original model had two date-related tables mapped to the same DW Date table. This is not allowed in Direct Lake, so I had to clone the Date table one more time in the lakehouse.
  9. The original model used dummy tables to organize measures. These tables were created as DAX calculated tables, which (together with calculated columns) are not allowed in Direct Lake. Instead, I created dummy tables with one column using a Python notebook in the lakehouse (see the second sketch after this list).
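
To give an idea of how the column rebinding in item 2 can be scripted, here is a minimal Python sketch that patches the sourceColumn bindings in model.bim from a mapping table. The file names and the CSV layout are illustrative assumptions, not the client’s actual artifacts.

import csv
import json

# Load the friendly-name-to-lakehouse-column mapping (assumed CSV layout)
with open("mapping.csv", newline="") as f:
    mapping = {row["FriendlyName"]: row["LakehouseColumn"] for row in csv.DictReader(f)}

# Load the semantic model definition
with open("model.bim", encoding="utf-8") as f:
    model = json.load(f)

# Rebind each column whose friendly name appears in the mapping table
for table in model["model"]["tables"]:
    for column in table.get("columns", []):
        if column["name"] in mapping:
            column["sourceColumn"] = mapping[column["name"]]

with open("model.bim", "w", encoding="utf-8") as f:
    json.dump(model, f, indent=2)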
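
And here is a minimal sketch of the notebook code behind item 9 that creates a one-column dummy table in the lakehouse; the table and column names are illustrative:

from pyspark.sql import SparkSession

# In a Fabric notebook the Spark session is already available; this line just makes the sketch self-contained
spark = SparkSession.builder.getOrCreate()

# A single-row, single-column Delta table whose only purpose is to host measures
df = spark.createDataFrame([("_",)], ["DummyColumn"])
df.write.format("delta").mode("overwrite").saveAsTable("MeasureGroup")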

Using dummy tables to organize measures is a popular technique which I personally avoid for two main reasons. A while back I assessed a semantic model which had thousands of measures assigned to one Measures dummy table. This caused significant report performance degradation. Also, this approach confuses Power BI Q&A (I’m not sure about Copilot). I don’t know if Microsoft has resolved these issues, but I personally don’t use dummy tables. Instead, I assign measures to the actual fact tables.

Summary

Other than that, and after a few days of struggle, the model was successfully blessed by Fabric as a Direct Lake model. Preliminary testing shows that performance is on par with import mode, and we are optimistic that this approach will significantly reduce the model’s memory footprint and possibly allow the client to downgrade the capacity, but more testing is warranted. This will probably justify another blog in the near future, so stay tuned.

 

Atlanta Microsoft BI Group Meeting on July 7th (Getting Started with Copilot in Microsoft Fabric)

Atlanta BI fans, please join us in person for our next meeting on Monday, July 7th at 18:30 ET. Shabnam Watson (BI consultant and Microsoft MVP) will introduce you to Microsoft Fabric copilots. And your humble correspondent will walk you through some of the latest Power BI and Fabric enhancements. For more details and sign up, visit our group page.

Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: Microsoft Fabric is revolutionizing data analytics, and Copilot is here to make it even smarter! Now available across all Fabric capacities, Copilot is more accessible than ever, allowing users at all levels to take advantage of AI-driven efficiencies. This session is a beginner-friendly introduction to Copilot within Microsoft Fabric, showcasing how AI can streamline data workflows, enhance insights, and boost productivity.

We’ll dive into what Copilot in Fabric is and how it fits into different data workflows, helping with things like data engineering, analytics, and reporting. You’ll see real-world examples of how Copilot makes tasks—like writing SQL queries, creating Power BI visuals, and summarizing data—way easier.

By the end of this session, attendees will have a solid foundation in Copilot’s capabilities within Fabric and feel ready to incorporate AI-driven efficiencies into their data workflows.

Speaker: Shabnam is a business intelligence consultant and owner of ABI Cube, a company that specializes in delivering data solutions using the Microsoft Data Platform. She has over 20 years of experience and is recognized as a Microsoft Data Platform MVP for her technical excellence and community involvement.

She is passionate about helping organizations harness the power of data to drive insights and innovation. She has a deep expertise in Microsoft Analysis Services, Power BI, Azure Synapse Analytics, and Microsoft Fabric. She is also a speaker, blogger, and organizer for SQL Saturday Atlanta – BI version, where she shares her knowledge and best practices with the data community.

Sponsor: Shabnam Watson (Abicube)


SaaS Losers and Winners: Workday and NetSuite

“Well, I’ll be damned
Here comes your ghost again
But that’s not unusual…”

“Diamonds and Rust”, Joan Baez

I’ve said in the past that I consider it a travesty when a SaaS provider doesn’t give you direct access to the data in its native storage, such as via ODBC and OLE DB providers, forcing you to use file extracts or (often horrible) APIs. This greatly inhibits data integration scenarios, such as extracting data for data warehousing. I’ve written on this subject many times, including here, here and here. Let’s look at how two popular vendors have approached data access.

SaaS Loser: Workday

Workday is a popular HR cloud platform. Unfortunately, the only way to get your data out is via their horrible APIs, as a large law firm has discovered. If you use Workday, I recommend you explore the Workday Query Language (WQL) so you don’t have to parse the API results yourself. A third-party (CDATA) driver makes the WQL calls easier, but if you don’t want to rely on external software, try the following approach.

Workday’s REST API, particularly the /data endpoint, supports WQL queries directly, allowing custom applications to execute WQL without a third-party tool. The /data endpoint (e.g., https://wd2-impl-services1.workday.com/ccx/api/wql/v1/tenant_name/data) accepts WQL queries, including joins based on Workday’s object relationships. Once you enable WQL, you can use ADF or another data integration tool to send SQL-like queries:

SELECT w.Worker_ID, w.First_Name, o.Name
FROM Workers w
LEFT JOIN Organizations o ON w.Organization_ID = o.ID
WHERE w.Active = 1

Results are returned as JSON that you need to flatten into a dataset, but at least you don’t have to parse the horrible nested payload yourself!
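
Here is a minimal Python sketch of what calling the /data endpoint looks like, assuming you’ve already obtained an OAuth bearer token; the WQL query and field names are illustrative:

import requests
import pandas as pd

# Tenant URL from the example above; the token is a placeholder
url = "https://wd2-impl-services1.workday.com/ccx/api/wql/v1/tenant_name/data"
headers = {"Authorization": "Bearer <access_token>"}
params = {"query": "SELECT workdayID, lastName FROM allWorkers"}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()

# The rows come back under "data"; json_normalize flattens nested objects into columns
df = pd.json_normalize(response.json()["data"])
print(df.head())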

SaaS Winner: NetSuite

NetSuite is a very popular ERP platform. I’m currently in the middle of a data integration project with NetSuite, which provides two interfaces:

  • REST (and SOAP) API: This is your best bet if you want to automate writes to NetSuite, such as creating orders and invoices. The REST interface also supports sending SQL queries via the SuiteQL endpoint (https://<account_id>.suitetalk.api.netsuite.com/services/rest/query/v1/suiteql). This is a good option if you want to use SQL for lookups and queries that don’t return a lot of data (see the sketch after this list).
  • ODBC: Use this option to extract massive datasets, such as when loading a data warehouse. To dump some cold water on NetSuite here, not many integration tools come with native NetSuite providers, so you must resort to the NetSuite ODBC driver (this requires a self-hosted runtime if you use Azure Data Factory). Further, since NetSuite is an Oracle product and probably uses an Oracle backend database, performance is suboptimal, as I have come to expect every time I integrate with Oracle. But it’s surely a better option than dealing with REST APIs.
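
For reference, here is a minimal Python sketch of a SuiteQL call against the endpoint above; the bearer token is a placeholder (NetSuite requires OAuth, typically token-based authentication), and the query is illustrative:

import requests
import pandas as pd

url = "https://<account_id>.suitetalk.api.netsuite.com/services/rest/query/v1/suiteql"
headers = {
    "Authorization": "Bearer <access_token>",  # placeholder; set up OAuth per NetSuite docs
    "Prefer": "transient",                     # header required by the SuiteQL endpoint
    "Content-Type": "application/json",
}
body = {"q": "SELECT id, tranid, trandate FROM transaction WHERE type = 'CustInvc'"}

response = requests.post(url, headers=headers, json=body)
response.raise_for_status()

# Result rows come back under "items"
df = pd.json_normalize(response.json()["items"])
print(df.head())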


 

Live Monitoring of Fabric Mirroring

A current project called for mirroring a Google BigQuery dataset to Fabric. This feature is currently in private preview, so don’t try to find it. However, the tips I share here should be applicable to other available mirroring scenarios, such as mirroring from Azure SQL Database.

One of the GBQ tables was a transaction fact table with some 130 million rows. The issue was that the mirroring window showed this table as replicating normally with a green Running status, but we waited and waited and nothing happened…


First, the “Rows Replicated” counter doesn’t update continuously (it doesn’t report live mirroring info). Instead, it’s updated when the table is fully replicated, because Delta table operations are atomic.

So, how do we monitor the initial synchronization live? We can use Azure Storage Explorer to directly access the delta files of the mirrored database, as described here. After some experimenting (shorten the URL below to the workspace name only to see all OneLake artifacts in the workspace), I’ve found that the itemType of a mirrored database is MountedRelationalDatabase. So, the URL to an Adventure Works mirrored database in a workspace named Fabric Tutorial would be
https://onelake.dfs.fabric.microsoft.com/Fabric Tutorial/Adventure Works.MountedRelationalDatabase/

This lets us get to the underlying delta tables and delta logs in Azure Storage Explorer, which of course update as data is being replicated. Examining the underlying storage is also useful for detecting potential issues. Here is an example of the first table that was successfully replicated. Note that there is a parquet file containing the actual data.

And here is an example of the second table. As we can see, the delta log was created, but there is no parquet file. So, something went wrong, although according to Fabric all is well.

To summarize, the Fabric mirroring UI leaves a lot of room for improvement. By examining the delta log and data files of a specific table, we can discover how the initial synchronization is progressing.
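
OneLake also supports the ADLS Gen2 APIs, so the same check can be scripted. Here is a minimal Python sketch that lists the files of the mirrored database using the azure-storage-file-datalake package; the Tables subfolder is an assumption about the mirrored database layout:

from azure.identity import InteractiveBrowserCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes the workspace as the file system
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=InteractiveBrowserCredential(),
)
fs = service.get_file_system_client("Fabric Tutorial")

# Parquet files hold the replicated data; _delta_log JSON files record the commits
for p in fs.get_paths(path="Adventure Works.MountedRelationalDatabase/Tables", recursive=True):
    print(p.name, p.content_length)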

 

Atlanta Microsoft BI Group Meeting on June 2nd (Power BI monitoring with Log Analytics Workspace)

Atlanta BI fans, please join us in person for our next meeting on Monday, June 2nd at 18:30 ET. Akshata Revankar (Senior Data & Applied Scientist with Microsoft) will show us how to streamline and optimize your Power BI environment using Azure Data Explorer (ADX) and Kusto Query Language (KQL). And your humble correspondent will walk you through some of the latest Power BI and Fabric enhancements. For more details and sign up, visit our group page.

Delivery: In-person

Level: Intermediate

Food: Pizza and drinks will be provided

 

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

Overview: Are you an admin overseeing multiple Power BI workspaces and dealing with the challenge of monitoring activities and performance across all of them in a unified view? This session is designed precisely for you. Join us as we delve into a comprehensive guide on how to streamline and optimize your Power BI environment using Azure Data Explorer (ADX) and Kusto Query Language (KQL).

Speaker: Akshata Revankar (Senior Data & Applied Scientist with Microsoft) has 20+ years of experience in the data engineering and data reporting space. She has worked with Oracle Database, SQL Server, SSIS, Informatica PowerCenter, Hadoop systems, Qlik and Power BI. Akshata enjoys being in the data space and learning new things.

Sponsor: At CloudStaff, we believe technology should make work easier, not more complicated. We help big companies use AI and automation to solve real problems and make employees’ lives better (CloudStaff.ai).


First Look at Fabric Translytical Task Flows

The May release of Power BI Desktop includes a new feature called Translytical Task Flows, which aims to augment Power BI reports with rudimentary writeback capabilities, such as making corrections to data behind a report. Previously, one way to accomplish this was to integrate the report with Power Apps, as I demonstrated a while back here. My claim to fame was that Microsoft liked this demo so much that it ran for years on big monitors in the local Microsoft office!

Are translytical flows a better way to implement report writeback? I followed the steps to test this feature and here are my thoughts.

The Good

I like that translytical flows don’t require external integration and additional licensing. By contrast, the Power Apps integration required implementing an app and incurring additional licensing cost.

I like that Microsoft is getting serious about report writeback and has extended Power BI Desktop with specific features to support it, such as action buttons, new slicers, and button-triggered report refresh.

I like that you can configure the action button to refresh the report after writeback so you can immediately see the changes (assuming DirectQuery or Direct Lake semantic models). I tested the feature with a report connected to a published dataset, and it works. Of course, if the model imports data, refreshing the report won’t show the latest data.

The Bad

Currently, you must use the new button, text, or list slicers, which can only provide a single value to the writeback function. There is no validation, unless you use a list slicer. From the end user’s perspective, every modifiable field requires a separate text box. Ideally, I’d like to see Microsoft extend the Table visual to allow editing in place.

Translytical flows require a Python function to propagate the changes (see the sketch below). Although this opens new possibilities (you can do more things with custom code), what happened to the low-code, no-code mantra?
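
To illustrate, here is a minimal sketch of such a function, based on the documented Fabric user data functions pattern; the connection alias, table, and column names are illustrative assumptions:

import fabric.functions as fn

udf = fn.UserDataFunctions()

@udf.connection(argName="sqlDB", alias="WritebackDB")  # alias of a Fabric SQL database connection
@udf.function()
def update_comment(sqlDB: fn.FabricSqlConnection, row_id: int, comment: str) -> str:
    # Each slicer or text box on the report supplies a single value to a parameter
    conn = sqlDB.connect()
    cursor = conn.cursor()
    cursor.execute("UPDATE dbo.Feedback SET Comment = ? WHERE Id = ?", comment, row_id)
    conn.commit()
    return f"Updated row {row_id}"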

The Ugly

Currently, the data can be written to only four destinations:

  • Fabric SQL DB (Azure SQL DB provisioned in Fabric)
  • Fabric Lakehouse
  • Fabric Warehouse
  • Fabric Mirrored DB

Notice the “Fabric” prefix in all four options? Customers will probably interpret this as another attempt to force them into Fabric. Not to mention that this limitation excludes 99% of real-life scenarios where the data is in other data sources. I surely hope Microsoft will open this feature to external data sources in the future.

So, is the Power Apps integration for writeback obsolete? Not really, because it is more flexible and provides a better user experience, at the expense of additional licensing cost.

In summary, as they stand today, translytical task flows attempt to address basic writeback needs within Fabric, such as changing a limited number of report fields or performing mass updates on a single field. They are heavily dependent on Fabric and support writeback only to Fabric data sources. Keep an eye on this feature with the hope that it will evolve over time into something more useful.


 

App Integration Tools and Patterns

“…That’s when the teardrops start, fella…
…Dirty deeds and they’re done dirt cheap”

“Dirty deeds done dirt cheap”, AC/DC

A few months ago, I did an assessment for a large organization seeking to modernize their data analytics processes. As a part of standardizing the toolset, the client was considering Salesforce MuleSoft for both application and data integration. After all, the vendor told them the tool can do “everything”. Indeed, taken from the vendor page, we learn that “MuleSoft is a platform that gives IT the tools to automate everything. This includes integrating data and systems, automating workflows and processes, and creating incredible digital experiences — all on a single, easy-to-use platform.”

Let’s parse this claim.

Defining integration types

Quick as a bunny and excited that I finally have one tool that does it all, we hop to Gartner’s Magic Quadrant for Data Integration Tools. Surprise: Salesforce MuleSoft is not even there. Where is MuleSoft? It’s in Gartner’s Magic Quadrant for Integration Platform as a Service and the Magic Quadrant for API Management. Confused? Let’s start with definitions.

Data integration (note the focus on data)

Gartner defines data integration as the discipline that includes architectural patterns, tools, and methodologies for accessing, harmonizing, transforming, processing, and moving data across various endpoints and infrastructures. It ensures consistent access and delivery of data across different subject areas and structures to meet business and application needs.

Integration Platform (note the focus on applications)

“Gartner defines integration platform as a service (iPaaS) as a vendor-managed cloud service that enables end users to implement integrations between a variety of applications, services and data sources, both internal and external to their organization.”

API Management (note the focus on APIs)

“Gartner defines the application programming interface (API) management market as the market for software that supports API life cycle stages such as planning and design, implementation and testing, deployment and operation, and versioning and retirement.”

So, should we use MuleSoft for ETL and data integration patterns that typically involve large data volumes, such as loading a data warehouse? I say we don’t. You probably could, but it would be like trying to fit a square peg into a round hole. Did that client try it despite my advice against it? They did, and came to the same realization. Should we consider MuleSoft for enterprise-level app integration, such as something happening in one system triggering something in another system by calling APIs? Probably, as that’s what it’s designed to do anyway. Let’s choose the right tool for the job.

Application integration done dirt cheap

What about a true application integration scenario? In a current assessment, a client uses a third-party app which stores data in a SQL Server database. Each subsidiary has its own database that tracks invoicing locally through the app. However, a central NetSuite ERP system is used to consolidate invoices from all subsidiaries. So, the integration task at hand is to implement a bi-directional synchronization where each new local invoice (and its associated line items) is exported to NetSuite and the local invoice is then updated with the corresponding NetSuite transaction id (it’s more complicated than this but you get the idea).

Now this is where we can surely use “specialized” tools like MuleSoft, Boomi, Celigo, etc., right? And we surely pay a premium, to the tune of thousands of dollars per month. I don’t claim to know these tools, but for the most part they appear to me as nice UI wrappers on top of the vendor APIs. They certainly save you from dealing with the actual API complexities, JSON payloads, etc., but should you lock yourself in and break the bank for small-scale app integration? Nowadays, everyone wants to do more with less.

Therefore, knowing and using the Microsoft Azure stack, my personal choice is to first consider Azure Data Factory. True, it’s a generic data integration tool, and I must learn how to call the vendor APIs and deal with JSON, but it’s not a big deal (AI can help here), as the sketch below illustrates. ADF can even do webhooks. And it will save you tons of money.
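
ADF pipelines are defined in JSON, so to keep the illustration readable, here is a minimal Python sketch of the sync pattern from the invoice scenario above: push new local invoices to NetSuite and write the returned transaction id back. The connection string, payload shape, and table/column names are illustrative assumptions:

import pyodbc
import requests

NS_URL = "https://<account_id>.suitetalk.api.netsuite.com/services/rest/record/v1/invoice"
HEADERS = {"Authorization": "Bearer <access_token>", "Content-Type": "application/json"}

conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=.;DATABASE=Invoicing;Trusted_Connection=yes")
cursor = conn.cursor()
cursor.execute("SELECT InvoiceId, CustomerId, Amount FROM dbo.Invoice WHERE NetSuiteId IS NULL")

for invoice_id, customer_id, amount in cursor.fetchall():
    payload = {"entity": {"id": str(customer_id)}, "item": {"items": [{"amount": float(amount)}]}}
    response = requests.post(NS_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    # NetSuite returns the id of the new record in the Location header
    netsuite_id = response.headers["Location"].rsplit("/", 1)[-1]
    cursor.execute("UPDATE dbo.Invoice SET NetSuiteId = ? WHERE InvoiceId = ?", netsuite_id, invoice_id)
    conn.commit()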


 

Atlanta Microsoft BI Group Meeting on May 5th (Optimize Power BI Reports and Semantic Models Using External Tools)

Atlanta BI fans, please join us in person for our next meeting on Monday, May 5th at 18:30 ET. Lakshmana R Boya will show us how to optimize Power BI reports by identifying the performance and size issues using “Measure Killer” community app. Your humble correspondent will walk you through some of the Power BI and Fabric enhancements of late. For more details and sign up, visit our group page.

Delivery: In-person

Level: Intermediate

Food: Pizza and drinks will be provided

 

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

Overview: We will deep dive into how we can optimize Power BI reports and semantic models by identifying the performance and size issues using the “Measure Killer” app.

Sponsor: Prologika (https://prologika.com) helps organizations of all sizes to make sense of data by delivering tailored BI solutions that drive actionable insights and maximize ROI. Your BI project will be your best investment, we guarantee it!


Two Takeaways from FabCon 2025

The 2025 Fabric Community Conference has just concluded. There were many announcements, but I found two of them intriguing enough to make the case for Fabric more compelling should a suitable project come around:

  • OneSecurity – Analysis Services Tabular and Power BI semantic models have long supported comprehensive and flexible row-level security (RLS), such as for scenarios requiring dynamically denying rows. Unfortunately, column-level security (aka object-level security) has been less flexible, as I explain in more detail here. Currently, OneSecurity doesn’t support dynamic security, but I hope it eventually will, such as by allowing a SELECT statement to evaluate runtime conditions. So, keep an eye on OneSecurity as it evolves if you face more complex column-level security requirements.
  • Copilots for all F SKUs – Previously, I complained that Microsoft had established that AI must require premium pricing starting at $5,000/mo (F64 and above). But now they’ve promised that by the end of April copilots will go mainstream and become available in all F SKUs should you need AI assistance.

Alas, there were no major announcements about Fabric Warehouse. As with Synapse, Microsoft went back to the drawing board with Fabric Warehouse. Consequently, its T-SQL surface area doesn’t support various features I use. For example, IDENTITY and MERGE were promised long ago but unfortunately haven’t materialized. I personally don’t care much about relational-like lakehouses or feature bundles. I don’t dare even dream about “advanced” features that SQL Server has had for a long time, such as multiple databases, temporal tables, etc. Without robust warehousing, I still find it difficult to substantiate a good case for Fabric for most of my projects.


Atlanta Microsoft BI Group Meeting on April 7th (Source Control for Power BI)

Atlanta BI fans, please join us in person for our next meeting on Monday, April 7th at 18:30 ET. Neel Pathak will explain how to do source control for Power BI with Git integration. Your humble correspondent will walk you through some of the Power BI and Fabric enhancements of late. For more details and sign up, visit our group page.

Presentation: Source Control for Power BI
Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: The days of sharing pbix files and copying in changes are gone. Git integration now enables development teams to collaborate and provides version control/rollbacks.

Speaker: Neel Pathak (Senior Consultant with 3Cloud) is a BI and analytics professional with expertise in Power BI and guiding clients through their analytics transformation journeys.

Sponsor: 3Cloud is the #1 Microsoft Azure partner. No other partner combines 3Cloud’s expertise in Azure with a deep understanding of infrastructure, data and analytics, and app innovation. We have proven our impact and are trusted with clients’ most complex business problems. We look for people who share our passion for technology and want to unleash their potential.
