Data Virtualization

Telegraph sang a song about the world outside
Telegraph road got so deep and so wide
Like a rolling river…

The Telegraph Road, Dire Straits

At Ignite in November, 2025, Microsoft introduced Fabric IQ. I noted to go beyond the marketing hype and check if Fabric IQ makes any sense. The next thing I know, around the holidays I’m talking to an enterprise strategy manager from an airline company and McKinsey consultant about ontologies.

Ontology – A branch of philosophy, ontology is the study of being that investigates the nature of existence, the features all entities have in common, and how they are divided into basic categories of being. In computer science and AI, ontology refers to a set of concepts and categories in a subject area or domain that shows their properties and the relations between them.

So, what better way to spend the holidays than to play with new shaky software?

What is Fabric IQ?

According to Microsoft, Fabric IQ is “a unified intelligence platform developed by Microsoft that enhances data management and decision-making through semantic understanding and AI capabilities.” Clear enough? If not, if you view Fabric as Microsoft’s answer to Palantir’s Foundry, then Fabric IQ is the Microsoft equivalent of Palantir’s Foundry Ontology, whose success apparently inspired Microsoft.

Therefore, my unassuming layman definition of Fabric IQ is a metadata layer on top of data in Fabric that defines entities and their relationships so that AI can make sense of and relate the underlying data.

For example, you may have an organizational semantic model built on top of an enterprise data warehouse (EDW) that spans several subject areas. And then you might have some data that isn’t in EDW and therefore outside the semantic model, such as HR file extracts in a lakehouse. You can use Fabric IQ as a glue that bridges that data together. And so, when the user asks the agent “correlate revenue by employee with hours they worked”, the agent knows where to go for answers.

Following this line of thinking, Microsoft BI practitioners may view Fabric IQ as a Power BI composite semantic model on steroids. The big difference is that a composite model can only reference other semantic models while Fabric IQ can span data in multiple formats.

The Good

Palantir had a head start of a decade or so compared to Microsoft Fabric, but yet even in its preview stage, I like a thing or two about Fabric IQ from what I’ve seen so far:

Its oncology can span Power BI semantic models (with caveats explained in the next section), powered by best-in-class technology. As I mentioned before, this allows you to bridge all the business logic and calculations you carefully crafted in a semantic model to the rest of your Fabric data estate.
Fabric IQ integrates with other Microsoft technologies, such as real-time intelligence (eventhouses), Copilot Studio, Graph. This tight integration turns Fabric into a true “intelligence platform,” reducing duplicated logic, one-off models, and maintenance while enabling multi-hop reasoning and real-time operational agents.
Democratized and no-code friendly – Visual tools allow business users to build and evolve the ontology, lowering barriers compared to more engineering-heavy alternatives. Making it easy to use has always been a Microsoft strength.
Groundbreaking semantics for AI Agents: Fabric IQ elevates AI from pattern-matching to true business understanding, allowing agents to reason over cascading effects, constraints, and objectives—leading to more reliable, auditable decisions and automation.
Compared to Palantir, I also like that Fabric OneLake has standardized on an open Delta Parquet format and embraced data movements tools Microsoft BI pros and business users are already familiar with, such as Dataflows and pipelines, to bring data in Fabric and therefore Fabric IQ.

The Bad

I hope some of these limitations will be lifted after the preview but:

Only DirectLake semantic models are accessible to AI agents. Import and DirectQuery models are not currently supported for entity and relationships binding. Not only does this limitation rule out pretty much 99.9% of the existing semantic models, but it also prevents useful business scenarios, such as accessing the data where it is with DirectQuery instead of duplicating the data in OneLake.
No automatic ontology building – It requires cross-functional agreement on business definitions, workshops, and governance—labor-intensive for organizations without mature semantic models. I hope Microsoft will simplify this process like how Purview has automated scans.
Risk of overhype vs. delivery gap – We’ve seen this before when new products got unveiled with a lot of fanfare, only to be abandoned later.

The Ugly

OneLake-centric dependency. Except for shortcuts to Delta Parquet files which can be kept external, your data must be in OneLake. What about these enterprises with investments in Google BigQuery, Teradata, Snowflake, and even SQL Server or Azure SQL DB? Gotta bring that data over to OneLake. Even shortcut transformations to CSV, Parquet, JSON files in OneLake, S3, Google Cloud Storage, will copy the data to OneLake. By contrast, Palantir has limited support for virtual tables to some popular file formats, such as Parquet, Iceberg, Delta, etc.

What happened to all the investments in data virtualization and logical warehouses that Microsoft has made over years, such as PolyBase and the deprecated Polaris in Synapse Serverless? What’s this fascination with copying data and having all the data in OneLake? Why can’t we build Fabric IQ on top of true data virtualization?

Which is where I was thinking that semantic models with DirectQuery can be used as a workaround to avoid copying data over from supported data sources, but alas Fabric IQ doesn’t like them yet.

Summary

Microsoft Fabric IQ is a metadata layer on top of Fabric data to build ontologies and expose relevant data to AI reasoning. It will be undoubtedly appealing to enterprise customers with complex data estates and existing investments in Power BI and Fabric. However, as it stands, Fabric IQ is OneLake-centric. Expect Microsoft to invest heavily in Fabric and Fabric IQ to compete better with Palantir.

Let’s face it, the larger the company, the more difficult is to achieve the dream of single enterprise data warehouse (EDW). In a typical mid-size to large organization, data is found in many data repositories and integrating all this data is difficult. I’m doing an assessment and strategy engagement now for a unit in a large organization, and they need access to at least 10 other on-premises systems, including two very large repositories. Naturally, they don’t want to import all of this data, which could be millions of rows per day, and recreate their own copy of these large corporate repositories. So what to do?

In my “QUO VADIS DATA WAREHOUSE?” newsletter, I defined a logical data warehouse (LDW), also known as data virtualization, is an emerging technology that allows you to access the data where it is. Don’t we have linked servers in SQL Server that do this? We do and they might work to a point. But what if you want scale out distributed queries to achieve better performance? In today’s SQL PASS SUMMIT keynote, Day 1 Keynote “A.C.I.D. Intelligence” (A.C.I.D stands for Algorithms, Cloud, IoT, Data), Rohan Kumar showed something that I think it’s very important and it deserves much more attention than occasional references in blogs. It showed where Microsoft is bringing PolyBase and how this technology could be the Microsoft implementation for data virtualization.

In SQL Server 2016, PolyBase allows you to access data in on-premises Hadoop cluster and in Azure Blob Storage. For example, you can store some files in HDFS and define an external PolyBase table. Then, you can have a query with heterogeneous join between a local SQL Server table and the external table. Rohan showed that Microsoft will extend PolyBase to other popular SQL and NoSQL databases. More importantly, it showed that just like an MPP appliance, such as Microsoft Analytics Platform System (APS) or Azure SQL Data Warehouse, a SQL Server node would allow you to combine multiple SQL Server instances as compute notes so that you scale out access to these data sources. For example, if you have two SQL Server compute nodes and you use PolyBase to access an Oracle database, you’ll be essentially spreading the query across these nodes in parallel and then combine the results. Of course, just like linked servers, there are technical challenges, such as cases where SQL Server might need to move data to the other node. Rohan mentioned that the SQL Server query optimizer will have smarts to optimize heterogeneous joins.

If you’re in the market for a logical data warehouse vendor, don’t rule out Microsoft. Stay tuned for more news around PolyBase and the investments Microsoft makes in this area after the Metanautix acquisition.

Posts

First Look at Fabric IQ: The Good, The Bad, and The Ugly

What is Fabric IQ?

The Good

The Bad

The Ugly

Summary

The Future of Microsoft Logical Data Warehouse

Follow Us

Subscribe to our quarterly newsletter

Categories

Tag Archive for: Data Virtualization

Posts

What is Fabric IQ?

The Good

The Bad

The Ugly

Summary

Follow Us

Subscribe to our quarterly newsletter

Categories