Direct Lake Composite Models

I’ve mentioned previously in the “A Couple of Direct Lake Gotchas” post that, unlike Power BI reports, Excel surprised me by not showing user-defined hierarchies in a Direct Lake model. Direct Lake comes with other gotchas, such as not supporting DAX calculated columns and SQL views. I normally don’t use calculated columns, but they can come in handy, such as for flattening a parent-child hierarchy outside ETL. And I like SQL views as an insurance policy for making quick transforms or filters on top of loaded tables to avoid ETL changes.

Recently, Microsoft introduced composite Direct Lake models, which I demonstrated as part of a pilot project, mainly to preserve the Excel report experience for financial users.

Direct Lake Only vs Composite Direct Lake

I view composite Direct Lake models as the best of both worlds. This table summarizes their characteristics:

| Characteristic | Direct Lake Only | Composite (Direct Lake + Import) |
|---|---|---|
| Availability | Generally available | Public preview |
| Storage mode | Direct Lake | Some tables, such as dimensions, in Import mode; others, such as fact tables, in Direct Lake |
| Tooling | Web modeling, Power BI Desktop | Web modeling only (limited support in Power BI Desktop) |
| Refresh required | No (Fabric monitors the Delta log) | Imported tables must be refreshed, such as overnight for dimension changes |
| Memory consumption | Columns used in reports are paged in and out | Refresh requires at least twice the memory of the imported objects |
| SQL views | No | Yes |
| Calculated columns | No | Yes, such as PATH and PATHITEM to flatten parent-child hierarchies outside ETL |
| User-defined hierarchies | Power BI reports only | Power BI and Excel |
| Power Query | No | Yes |
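The table mentions PATH and PATHITEM for flattening parent-child hierarchies. To illustrate the idea outside DAX, here is a minimal Python sketch of what these functions compute, using a made-up employee-to-manager mapping:

```python
def path(node_id, parent_of):
    # Pipe-delimited ancestor chain, root first -- the idea behind DAX PATH().
    chain = []
    node = node_id
    while node is not None:
        chain.append(str(node))
        node = parent_of[node]
    return "|".join(reversed(chain))

def path_item(p, n):
    # nth item (1-based) of a PATH-style string -- the idea behind DAX PATHITEM().
    parts = p.split("|")
    return parts[n - 1] if 1 <= n <= len(parts) else None

# Hypothetical employee -> manager mapping (None marks the root)
parent_of = {1: None, 2: 1, 3: 1, 4: 2}
print(path(4, parent_of))     # "1|2|4"
print(path_item("1|2|4", 2))  # "2"
```

In a composite model you’d do this with DAX calculated columns on the imported dimension; the sketch just shows the transformation itself.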

Therefore, composite Direct Lake models could be a good compromise between real-time BI and flexibility. For example, now you can implement the following configuration:
1. Dimensions in Import mode, refreshed overnight since they probably don’t change frequently anyway.
2. Large fact tables, or tables requiring real-time BI, in Direct Lake without refresh.

Lessons Learned

If composite models sound appealing, you might be eager to convert an existing Direct Lake model to composite. Here are some issues/gotchas that I ran into doing so:

  1. The web modeling experience (currently the only way to add imported tables using Microsoft tooling) showed the Get Data buttons disabled. After some reverse-engineering of a brand new model, I fixed it by changing the connection expression in the *.bim file to use OneLake (previously, it was pointing to a Fabric warehouse):
"expression": [
  "let",
  " Source = AzureStorage.DataLake(\"https://onelake.dfs.fabric.microsoft.com/44744d84-...\", [HierarchicalNavigation=true])",
  2. If you’re switching many tables from Direct Lake, you can automate the conversion in two ways:
  • A Semantic Link Labs notebook:
from sempy_labs import migration

migration.migrate_direct_lake_to_import(
    dataset="Your_Direct_Lake_Model_Name",
    workspace="Your_Workspace_Name",
    mode="import"  # or "directquery"
)
  • A C# script in Tabular Editor
  3. In my case, since there weren’t that many tables, I converted the dimension table partitions manually to “M” partitions, as in this example (change the table-specific values, such as the partition name and the Item, for each table):
"partitions": [
  {
    "name": "Account",
    "mode": "import",
    "source": {
      "expression": [
        "let",
        " Source = Sql.Database(\"<guid>.datawarehouse.fabric.microsoft.com\", \"<warehouse name>\"),",
        " #\"Navigation 1\" = Source{[Schema = \"dbo\", Item = \"DimAccount\"]}[Data]",
        "in",
        " #\"Navigation 1\""
      ],
      "type": "m"
    }
  }
]
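If you later need to apply this same manual edit to many tables, the *.bim change is easy to script. Here is a stdlib-only Python sketch; the model shape is a minimal stand-in and the M expression is a placeholder, so adapt both to your actual .bim file:

```python
def to_import_partition(bim, table_name, m_expression):
    # Switch table_name's partitions in a .bim/TMSL model dict to a single
    # Import-mode "M" partition (the same edit shown manually above).
    for table in bim["model"]["tables"]:
        if table["name"] != table_name:
            continue
        table["partitions"] = [{
            "name": table_name,
            "mode": "import",
            "source": {"expression": m_expression, "type": "m"},
        }]
        return True
    return False

# Hypothetical minimal model for illustration; in practice you would
# json.load() the .bim file, apply the change, and json.dump() it back.
bim = {"model": {"tables": [{"name": "Account", "partitions": []}]}}
m_expr = ["let", " Source = ...", "in", " Source"]  # placeholder M query
to_import_partition(bim, "Account", m_expr)
print(bim["model"]["tables"][0]["partitions"][0]["mode"])  # import
```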
  4. After switching a partition from Direct Lake to Import in a Power BI Desktop project and synchronizing to the connected published model, Fabric rejected the change, complaining that an existing Direct Lake table can’t be switched to Import storage. As a workaround, I dropped the connected model.
  5. Being in public preview, composite Direct Lake is rough around the edges. I got various complaints about missing credentials, which I fixed in the dataset settings.
  6. Although the documentation says that web modeling is the only tooling experience, Power BI Desktop worked for me just as with the Direct Lake only counterpart. However, Power Query and Get Data are currently available only on the web (unless you add the tables directly in the *.bim file).

A “Limited” Performance Note

I know everyone is interested in performance. I did some limited performance tests by tracing a massive query against equivalent Direct Lake Only and composite Direct Lake models. On a cold cache, Composite outperformed Direct Lake Only by some 20%. On a warm cache, I surprisingly saw the reverse: Direct Lake Only outperformed Composite five to six times. Please don’t take these numbers at face value; more than likely, your results will vary. For example, in a previous blog I said that I saw much better performance with SWITCH in Import mode vs. Direct Lake. Test!

 

Atlanta Microsoft BI Group Meeting on March 2nd (Your First Steps in Microsoft Fabric Using Just SQL)

Atlanta BI fans, please join us in person for our next meeting on Monday, March 2nd at 18:30 ET. Shabnam Watson will show you how you can apply your SQL skills in Microsoft Fabric. And your humble correspondent will walk you through some of the latest Power BI and Fabric enhancements. I will sponsor the meeting. For more details and sign up, visit our group page.

Delivery: In-person
Level: Beginner/Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: New to Microsoft Fabric? Don’t worry—you already know more than you think. In this beginner-friendly session, we’ll explore how your existing SQL skills translate directly into Fabric without needing to learn Spark, Python, or unfamiliar engineering tools. You’ll see how SQL can be applied across Fabric items to explore, shape, and analyze data with confidence. If you’re just beginning your Fabric journey, this session offers a simple, approachable path to success using the skills you already have.

Speaker: Shabnam is a business intelligence consultant and owner of ABI Cube, a company that specializes in delivering data solutions using the Microsoft Data Platform. She has over 20 years of experience and is recognized as a Microsoft Data Platform MVP for her technical excellence and community involvement. She is passionate about helping organizations harness the power of data to drive insights and innovation. She has deep expertise in Microsoft Analysis Services, Power BI, Azure Synapse Analytics, and Microsoft Fabric. She is also a speaker, blogger, and organizer for SQL Saturday Atlanta – BI version, where she shares her knowledge and best practices with the data community.

Sponsor: Prologika (https://prologika.com) helps organizations of all sizes to make sense of data by delivering tailored BI solutions that drive actionable insights and maximize ROI. Your BI project will be your best investment, we guarantee it!


Atlanta Microsoft BI Group Meeting on February 2nd (Power BI Translytical Taskflows)

Atlanta BI fans, please join us in person for our next meeting on Monday, February 2nd at 18:30 ET. Sukhwant Kaur (Product Manager at Microsoft) will show you how to supercharge your Power BI reports with translytical taskflows. And your humble correspondent will walk you through some of the latest Power BI and Fabric enhancements. CloudStaff.ai will sponsor the meeting. For more details and sign up, visit our group page.

Delivery: In-person
Level: Beginner/Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: Join us for an engaging session exploring how to build powerful translytical applications using Power BI, Functions, and SQL Database within Microsoft Fabric. We’ll discuss best practices for integrating analytics and transactional workloads, demonstrate real-world use cases, and provide actionable tips for leveraging Fabric’s unified platform. This talk is ideal for data professionals interested in bridging analytics and operations for enhanced business value.

Speaker: Sukhwant has served as a Product Manager at Microsoft for the past few development cycles. During this time, she’s focused on the entire product management lifecycle, from working with development teams and user experience to collaborating with cross-functional teams, to drive customer satisfaction and ensure products not only meet but exceed customer expectations.

Sponsor: At CloudStaff.ai we’re making work MORE. HUMAN. We believe in the power of technology to enhance human potential, not replace it. Our innovative AI and automation solutions are designed to make work easier, more efficient, and more meaningful. We help businesses of all sizes streamline their operations, boost productivity, and solve real-world challenges. Our approach combines cutting-edge technology with a deep understanding of human needs, creating solutions that work the way people do! https://cloudstaff.ai


Atlanta Microsoft BI Group Meeting on January 5th (Visual Calculations in Power BI)

Atlanta BI fans, please join us in person for our next meeting on Monday, January 5th at 18:30 ET. Dean Jurecic will show you how Power BI visual calculations can simplify the process of writing DAX. And your humble correspondent will walk you through some of the latest Power BI and Fabric enhancements. Key2 Consulting will sponsor the meeting. For more details and sign up, visit our group page.

Delivery: In-person
Level: Beginner/Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: Do you sometimes get lost in a sea of complicated DAX and wonder if there is an easier way? Is it difficult to drive self-service reporting in your organization because business users aren’t familiar with the nuances of DAX and Semantic Models? Visual Calculations might be able to help!

Introduced in 2024 and currently in preview, this feature is designed to simplify the process of writing DAX and combines the simplicity of calculated columns with the on-demand calculation flexibility of measures. This session is an overview of Visual Calculations and how they can be used to quickly produce results including:
• Background
• Example Use Cases
• Performance
• Considerations and Limitations

Speaker: Dean Jurecic is a business intelligence analyst and consultant specializing in Power BI and Microsoft Fabric with experience across diverse industries, including utilities, retail, government, and education. Dean is a Fabric Community Super User who holds a number of Microsoft certifications and has participated in the “Ask the Experts” program for Power BI at the Microsoft Fabric Community Conference.

Sponsor: Key2 Consulting is a cloud analytics consultancy that helps business leaders maximize their data. We are a Microsoft Gold-Certified Partner and our specialty is the Microsoft cloud analytics stack (Azure, Power BI, SQL Server).


First Look at Fabric IQ: The Good, The Bad, and The Ugly

Telegraph sang a song about the world outside
Telegraph road got so deep and so wide
Like a rolling river…

“Telegraph Road”, Dire Straits

At Ignite in November 2025, Microsoft introduced Fabric IQ. I made a note to go beyond the marketing hype and check if Fabric IQ makes any sense. The next thing I know, around the holidays I’m talking about ontologies with an enterprise strategy manager from an airline company and a McKinsey consultant.

Ontology – A branch of philosophy, ontology is the study of being that investigates the nature of existence, the features all entities have in common, and how they are divided into basic categories of being. In computer science and AI, ontology refers to a set of concepts and categories in a subject area or domain that shows their properties and the relations between them.

So, what better way to spend the holidays than to play with new shaky software?

What is Fabric IQ?

According to Microsoft, Fabric IQ is “a unified intelligence platform developed by Microsoft that enhances data management and decision-making through semantic understanding and AI capabilities.” Clear enough? If not, consider this: if you view Fabric as Microsoft’s answer to Palantir Foundry, then Fabric IQ is the Microsoft equivalent of Foundry’s Ontology, whose success apparently inspired Microsoft.

Therefore, my unassuming layman definition of Fabric IQ is a metadata layer on top of data in Fabric that defines entities and their relationships so that AI can make sense of and relate the underlying data.

For example, you may have an organizational semantic model built on top of an enterprise data warehouse (EDW) that spans several subject areas. And then you might have some data that isn’t in the EDW and is therefore outside the semantic model, such as HR file extracts in a lakehouse. You can use Fabric IQ as the glue that bridges that data together. And so, when the user asks the agent to “correlate revenue by employee with the hours they worked”, the agent knows where to go for answers.
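To make the glue idea concrete, here is a toy sketch of what an ontology boils down to: entities, where their data lives, and the business keys that relate them. All names are made up for illustration; Fabric IQ’s actual object model is considerably richer:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    source: str  # where the underlying data lives (made-up labels below)
    keys: list = field(default_factory=list)

@dataclass
class Relationship:
    from_entity: str
    to_entity: str
    on: str  # shared business key that lets an agent join the data

# Toy ontology for the "revenue vs. hours worked" question above
employee = Entity("Employee", "lakehouse: HR file extracts", ["EmployeeId"])
revenue = Entity("Revenue", "semantic model: EDW", ["EmployeeId"])
timesheet = Entity("TimeSheet", "lakehouse: HR file extracts", ["EmployeeId"])
ontology = [
    Relationship("Revenue", "Employee", on="EmployeeId"),
    Relationship("TimeSheet", "Employee", on="EmployeeId"),
]
print(len(ontology))  # 2
```

With metadata like this, an agent can see that revenue (in the semantic model) and hours (in the lakehouse) relate through EmployeeId, even though the data lives in different places.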

Following this line of thinking, Microsoft BI practitioners may view Fabric IQ as a Power BI composite semantic model on steroids. The big difference is that a composite model can only reference other semantic models while Fabric IQ can span data in multiple formats.

The Good

Palantir had a head start of a decade or so over Microsoft Fabric, but even in its preview stage, I like a thing or two about Fabric IQ from what I’ve seen so far:

  • Its ontology can span Power BI semantic models (with caveats explained in the next section), powered by best-in-class technology. As I mentioned before, this allows you to bridge all the business logic and calculations you carefully crafted in a semantic model to the rest of your Fabric data estate.
  • Fabric IQ integrates with other Microsoft technologies, such as Real-Time Intelligence (eventhouses), Copilot Studio, and Graph. This tight integration turns Fabric into a true “intelligence platform,” reducing duplicated logic, one-off models, and maintenance while enabling multi-hop reasoning and real-time operational agents.
  • Democratized and no-code friendly – Visual tools allow business users to build and evolve the ontology, lowering barriers compared to more engineering-heavy alternatives. Making it easy to use has always been a Microsoft strength.
  • Groundbreaking semantics for AI Agents: Fabric IQ elevates AI from pattern-matching to true business understanding, allowing agents to reason over cascading effects, constraints, and objectives—leading to more reliable, auditable decisions and automation.
  • Compared to Palantir, I also like that Fabric OneLake has standardized on the open Delta Parquet format and embraced data movement tools Microsoft BI pros and business users are already familiar with, such as dataflows and pipelines, to bring data into Fabric and therefore into Fabric IQ.

The Bad

I hope some of these limitations will be lifted after the preview but:

  • Only Direct Lake semantic models are accessible to AI agents. Import and DirectQuery models are not currently supported for entity and relationship binding. Not only does this limitation rule out pretty much 99.9% of existing semantic models, but it also prevents useful business scenarios, such as accessing the data where it is with DirectQuery instead of duplicating it in OneLake.
  • No automatic ontology building – It requires cross-functional agreement on business definitions, workshops, and governance, which is labor-intensive for organizations without mature semantic models. I hope Microsoft will simplify this process, similar to how Purview automates scans.
  • Risk of overhype vs. delivery gap – We’ve seen this before when new products got unveiled with a lot of fanfare, only to be abandoned later.

The Ugly

OneLake-centric dependency. Except for shortcuts to Delta Parquet files, which can stay external, your data must be in OneLake. What about the enterprises with investments in Google BigQuery, Teradata, Snowflake, and even SQL Server or Azure SQL DB? Gotta bring that data over to OneLake. Even shortcut transformations on CSV, Parquet, and JSON files in OneLake, S3, or Google Cloud Storage copy the data to OneLake. By contrast, Palantir has at least limited support for virtual tables over popular file formats, such as Parquet, Iceberg, and Delta.

What happened to all the investments in data virtualization and logical warehouses that Microsoft has made over the years, such as PolyBase and the deprecated Polaris engine in Synapse Serverless? What’s this fascination with copying data and having it all in OneLake? Why can’t we build Fabric IQ on top of true data virtualization?

This is where I thought semantic models in DirectQuery mode could be used as a workaround to avoid copying data over from supported data sources, but alas, Fabric IQ doesn’t like them yet.

Summary

Microsoft Fabric IQ is a metadata layer on top of Fabric data to build ontologies and expose relevant data to AI reasoning. It will be undoubtedly appealing to enterprise customers with complex data estates and existing investments in Power BI and Fabric. However, as it stands, Fabric IQ is OneLake-centric. Expect Microsoft to invest heavily in Fabric and Fabric IQ to compete better with Palantir.

Performance and Cost Considerations from Power BI Pro/PPU to Fabric

What performance and cost considerations should you keep in mind if you are currently on Power BI Pro/PPU, but Fabric looks increasingly enticing and you want to upgrade an existing workspace to Fabric? For example, let’s say you’ve started with a pay-per-user workspace, but now you want that workspace to have Fabric features, such as Copilot, Lakehouse, etc. Or, as a typical use case for small to mid-size companies, you could have a Corporate BI workspace with org semantic model(s) that you want to transition to Fabric, such as to take advantage of DirectLake.

Performance

Performance is difficult to translate because Power BI Pro/PPU runs in a shared capacity, meaning compute resources (v‑cores) are pooled across many tenants and dynamically allocated, whereas Fabric capacities are dedicated, meaning that Microsoft grants specific resources expressed as a number of cores and memory. Therefore, Fabric performance is predictable while Pro/PPU might not be, although I’m yet to hear a client complain about unpredictable performance.

Also, keep in mind that Power BI Pro limits you to a quota of 1 GB per dataset, PPU to 100 GB per dataset, and Fabric starts at 3 GB per dataset with F2 and doubles the grant up the chain. This is important for semantic models with imported data.

Although the tool wasn’t designed for estimating upgrade scenarios, you could start with the Fabric Capacity Estimator (preview) to get an initial ballpark estimate for the Fabric capacity. Start low, then monitor the capacity performance using the Microsoft Fabric Capacity Metrics app and be prepared to upgrade if necessary, such as when more parallelism is needed. 

Cost

This is easier. Here are the advertised, undiscounted and unreserved prices:

  • Power BI Pro: $14/user/month (free with M365 E5 plan)
  • PPU: $24/user/month ($14 discount with M365 E5 plan)
  • Fabric: Starts at $262.80 per month with F2 and doubles the price up the chain. Finding what capacity you need requires evaluating what workloads you will be running to ensure you have enough resources.
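The doubling rule makes the pay-as-you-go math easy to sketch. Here is a small Python helper; the base price is the undiscounted list price quoted above, and list prices change over time, so verify against the current Azure pricing page:

```python
def fabric_payg_monthly(sku, f2_monthly=262.80):
    # Monthly pay-as-you-go price for an F-SKU, assuming the price
    # doubles at each step up the chain (F2, F4, F8, ...).
    assert sku >= 2 and (sku & (sku - 1)) == 0, "F-SKUs are powers of two"
    steps = sku.bit_length() - 2  # F2 -> 0 steps, F4 -> 1, and so on
    return f2_monthly * (2 ** steps)

print(f"F64: ${fabric_payg_monthly(64):,.2f}/month")  # F64: $8,409.60/month
```

Remember that this is compute only; the F64 threshold for free viewer access (next paragraph) often matters as much as the raw capacity price.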

It’s important to note that Fabric capacities lower than F64 require a Power BI Pro license for every user who accesses shared content, regardless of whether they view or create content. Microsoft Copilot and ChatGPT got this wrong by adamantly claiming that viewers don’t require a Pro license, while Grok got it right, so be careful which agent you use when researching. The Fabric Capacity Estimator also correctly identifies the required Pro licenses.

Of course, Fabric gives you features unfortunately not available in the pay per user licensing plans, so the actual decision in favor of Fabric will probably transcend just performance and cost. When evaluating the performance of the lower Fabric SKUs, you might find the following blogs I wrote on this subject helpful:

Notes on Fabric F2 Performance: Warehouse ETL

Notes on Fabric F2 Performance: Report Load

 

Prologika Newsletter Winter 2025


If Microsoft Fabric is in your future, you need to come up with a strategy to get your data in Fabric OneLake. That’s because the holy grail of Fabric is the Delta Parquet file format. The good news is that all Fabric data ingestion options (Dataflows Gen 2, pipelines, Copy Job and notebooks) support this format and the Microsoft V-Order extension that’s important for Direct Lake performance. Fabric also supports mirroring data from a growing list of data sources. This could be useful if your data is outside Fabric, such as EDW hosted in Google BigQuery, which is the scenario discussed in this newsletter.

Avoiding mirroring issues

A recent engagement required replicating some DW tables from Google BigQuery to a Fabric Lakehouse. We considered the Fabric mirroring feature for Google BigQuery (back then in private preview, now in public preview) and learned some lessons along the way:

1. 400 Error during replication configuration – Caused by attempting to use a read-only GBQ dataset that is linked to another GBQ dataset, but the link was broken.

2. Internal System Error – Again caused by GBQ linked datasets which are read-only. Fabric mirroring requires GBQ change history to be enabled on tables so that it can track changes and only mirror incremental changes after first initial load.

3. (Showstopper for this project) The two permissions that raised security red flags are bigquery.datasets.create and bigquery.jobs.create. To grant those permissions, you must assign one of these BigQuery roles:

• BigQuery Admin

• BigQuery Data Editor

• BigQuery Data Owner

• BigQuery Studio Admin

• BigQuery User

All these roles grant other permissions, and the client was cautious about data security. In the end, we ended up using a nightly Fabric Copy Job to replicate the data.

Fabric Copy Job Pros and Cons

The client was overall pleased with the Fabric Copy Job.

Pros

  • 250 million rows replicated in 30-40 seconds!
  • You can use a single job to replicate all tables in Overwrite mode.
  • In the simplest case, you don’t need to create pipelines.

Cons

The Copy Job is work in progress and subject to various limitations.

  • No incremental extraction
  • You can’t mix different load options (Append and Overwrite), so you must split tables into separate jobs
  • No custom SQL SELECT when copying multiple tables
  • (Bug) Lost explicit column bindings when making changes
  • Cannot change the job’s JSON file
  • The user interface is clunky and difficult to work with
  • No failure notification mechanism. As a workaround: add Copy Job to data pipeline or call it via REST API
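For the failure-notification workaround, the Copy Job can be triggered (and then polled) through the Fabric REST job scheduler. The sketch below only builds the request URL; the endpoint shape and the CopyJob job type reflect my reading of the public Fabric REST API, so verify them against the official docs before relying on this:

```python
from urllib.parse import quote

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def run_item_job_url(workspace_id, item_id, job_type="CopyJob"):
    # On-demand job-instance endpoint for a Fabric item. POST this URL
    # with a bearer token to start the job; the response's Location
    # header points at the job instance, which you poll for a
    # Completed/Failed status and alert on failure.
    return (f"{FABRIC_API}/workspaces/{quote(workspace_id)}"
            f"/items/{quote(item_id)}/jobs/instances?jobType={quote(job_type)}")

print(run_item_job_url("your-workspace-guid", "your-copyjob-guid"))
```

Wrapping this in a notebook or pipeline with a Teams/email step on failure gives you the notification the Copy Job itself lacks.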

Summary

In summary, the Fabric Google BigQuery built-in mirroring could be useful for real-time data replication. However, it relies on GBQ change history which requires certain permissions. Kudos to Microsoft for their excellent support during the private preview.


Teo Lachev
Prologika, LLC | Making Sense of Data

Atlanta Microsoft BI Group Meeting on December 1st (Migrating Semantic Models to Fabric Direct Lake)

Atlanta BI fans, please join us in person for our next meeting on Monday, December 1st at 18:30 ET. I’ll show you how Fabric Direct Lake semantic models can help you tackle long refresh cycles and scalability headaches, and I’ll also walk you through some of the latest Power BI and Fabric enhancements. Improving will sponsor the meeting. For more details and sign up, visit our group page.

Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: Are your Power BI semantic models hitting memory limits? Tired of bending backwards to mitigate long refresh cycles and scalability headaches? Join me for a deep dive into Fabric Direct Lake — a game-changing feature that can help enterprise customers eliminate refreshes, lower licensing cost, and work with production-scale data instantly.

You’ll learn:
-Why Direct Lake is a breakthrough for large semantic models
-How to migrate from Import mode with real-world tools and strategies
-Common pitfalls and how to avoid them
-Performance insights and practical tips from actual projects

Bonus: See how AI tools like Grok, Copilot or ChatGPT can streamline your migration process!

Whether you’re a BI pro, data engineer, or decision-maker, this session will equip you with the knowledge to scale smarter, design better, and deliver faster.

Speaker: Teo Lachev is a consultant, author, and mentor, with a focus on Microsoft BI. Through his Atlanta-based company Prologika (a Microsoft Gold Partner in Data Analytics and Data Platform) he designs and implements innovative solutions that bring tremendous value to his clients. Teo has authored and co-authored several books, and he has been leading the Atlanta Microsoft Business Intelligence group since he founded it in 2010. Microsoft has recognized Teo’s contributions to the community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) Data Platform status for 15 years. Microsoft selected Teo as one of only 30 FastTrack Solution Architects for Power BI worldwide.

Sponsor: Prologika (https://prologika.com) helps organizations of all sizes to make sense of data by delivering tailored BI solutions that drive actionable insights and maximize ROI. Your BI project will be your best investment!

Presentation Slides


Atlanta Microsoft BI Group Meeting on November 3rd (Semantic Link Labs: A Link to the Future)

Atlanta BI fans, please join us in person for our next meeting on Monday, November 3rd at 18:30 ET. Jason Romans (Microsoft MVP) will show you how to use Semantic Link Labs to troubleshoot unreliable reports and semantic models. And your humble correspondent will walk you through some of the latest Power BI and Fabric enhancements. Improving will sponsor the meeting. For more details and sign up, visit our group page.

Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: It’s dangerous to go alone—take Semantic Link Labs!
When users are the first to discover that a Power BI report is broken, the damage is already done. Trust is lost, adoption slows, and credibility suffers. Instead of wandering into these traps unprepared, what if you had the Master Sword in hand—ready to defeat broken models and guard against treacherous usability pitfalls? That’s the power of Semantic Link Labs.

In this session, we’ll set out on a quest through Microsoft Fabric notebooks and pipelines, using Semantic Link Labs as our weapon and shield against unreliable reports. Along the way, we’ll face down the “mini-bosses” of BI development:
• Reports that collapse due to structural changes
• Models that underperform because best practices were skipped
• Usability pitfalls that make reports “technically fine” but functionally broken for end users

You’ll learn how to install and configure Semantic Link Labs, explore its legendary features, and see how it integrates seamlessly into Fabric. We’ll then take it a step further, automating health checks and governance with notebooks and pipelines—turning one-time fixes into repeatable spells.

By the end of this adventure, you’ll uncover your own “Triforce of Best Practices”—a report that tracks the best practices of all semantic models in your environment. You’ll leave equipped with a map, a shield, and the Master Sword itself: the tools you need to keep your BI world in legendary shape, where broken reports are discovered early, performance issues are vanquished, and best practices reign supreme.

Speaker: Jason Romans is a Business Intelligence engineer in Nashville, TN working with the Microsoft Business Intelligence stack. Jason is a Microsoft MVP who started his career as a DBA and over the years moved to working in his passion of Business Intelligence and data modeling. His first computer was a Commodore 64 and he’s been hooked ever since.
Blog: www.thedaxshepherd.com
Sessionize: https://sessionize.com/jason-romans/
LinkedIn: https://www.linkedin.com/in/jason-r-sql-jar

Sponsor: Improving is a leading IT professional services firm committed to helping companies achieve lasting success through modern technology. With core expertise in AI, Data, and Applications, we specialize in transforming legacy systems, building cloud-native platforms, and delivering intelligent, future-ready solutions for today’s complex business needs.


Replicating BigQuery to Fabric

A recent engagement required replicating some DW tables from Google BigQuery to a Fabric Lakehouse. We considered the Fabric mirroring feature (back then in private preview, now publicly available) and learned some lessons along the way:

1. 400 Error during replication configuration – Caused by attempting to use a read-only GBQ dataset that is linked to another GBQ dataset, but the link was broken.

2. Internal System Error – Again caused by GBQ linked datasets which are read-only. Fabric mirroring requires GBQ change history to be enabled on tables so that it can track changes and only mirror incremental changes after first initial load.

3. (Showstopper) The two permissions that raised security red flags are bigquery.datasets.create and bigquery.jobs.create. To grant those permissions, you must assign one of these BigQuery roles:

• BigQuery Admin

• BigQuery Data Editor

• BigQuery Data Owner

• BigQuery Studio Admin

• BigQuery User

All these roles grant other permissions, and the client was cautious about data security. In the end, we ended up using a nightly Fabric Copy Job to replicate the data.

In summary, the Fabric Google BigQuery built-in mirroring could be useful for real-time data replication. However, it relies on GBQ change history which requires certain permissions. Kudos to Microsoft for their excellent support during the private preview.