Replicating BigQuery to Fabric Reloaded
In a previous post, I referred to an engagement where we used the Fabric Copy Job activity to replicate Google BigQuery tables to Fabric so that we could use Direct Lake semantic models. A few months later, the client reported that they had switched from the Copy Job to notebooks that use the Spark BigQuery connector, citing two main benefits:
- Much better copy performance – Although the Copy Job copied tables in parallel, the Spark BigQuery connector significantly reduced the data transfer time. The Copy Job took about 40 minutes to fully copy all tables. With the notebook, the tables are copied sequentially rather than in parallel, yet most tables take 20 to 30 seconds and one huge 140M fact table takes around 2 minutes to copy fully. Altogether, the required tables load in about 20 to 22 minutes, roughly half the Copy Job's time. When I researched the difference in performance, I discovered that the Spark BigQuery connector (the official one from Google) uses the high-performance BigQuery Storage Read API, a highly optimized, columnar, parallel reader designed for analytical workloads. It can push down filters, projections (column selection), and sometimes aggregations directly to BigQuery, and it streams data very efficiently to Spark executors. By contrast, the Copy Job is generic, and I don't expect a notebook to deliver a similar performance gain with other sources, such as Azure SQL DB.
- Custom code flexibility – For example, the client implemented data-driven metadata discovery to determine which columns to copy per table. In addition, they can trigger the notebook execution via the REST API.
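To make both points more concrete, here is a minimal sketch of what such a notebook could look like. It is not the client's actual code. The project ID, the `TABLE_COLUMNS` metadata, and the helper names (`bigquery_reader_options`, `target_table_name`, `copy_table`, `copy_all`) are all hypothetical. The sketch assumes that the Spark BigQuery connector is available to the Spark session, that a default lakehouse is attached to the notebook, and that authentication is configured separately (the connector accepts credentials through its own options, which I have left out here).

```python
from typing import Dict, List

# Hypothetical metadata: which columns to replicate per BigQuery table.
# In a real notebook this could come from a control table instead.
TABLE_COLUMNS: Dict[str, List[str]] = {
    "sales.fact_orders": ["order_id", "order_date", "amount"],
    "sales.dim_customer": ["customer_id", "customer_name"],
}


def bigquery_reader_options(project: str, table: str, row_filter: str = "") -> Dict[str, str]:
    """Build the connector options for reading one table."""
    options = {"table": f"{project}.{table}"}
    if row_filter:
        # The connector's "filter" option is pushed down to BigQuery.
        options["filter"] = row_filter
    return options


def target_table_name(table: str) -> str:
    """Map 'dataset.table' to a lakehouse table name (assumed naming rule)."""
    return table.replace(".", "_")


def copy_table(spark, project: str, table: str, columns: List[str], row_filter: str = "") -> None:
    """Read one BigQuery table through the connector and overwrite a Delta table."""
    reader = spark.read.format("bigquery")
    for key, value in bigquery_reader_options(project, table, row_filter).items():
        reader = reader.option(key, value)
    # Selecting columns lets the connector read only those columns.
    df = reader.load().select(*columns)
    df.write.format("delta").mode("overwrite").saveAsTable(target_table_name(table))


def copy_all(spark, project: str, table_columns: Dict[str, List[str]]) -> None:
    """Copy the tables one after another, as in the client's notebook."""
    for table, columns in table_columns.items():
        copy_table(spark, project, table, columns)
```

In a Fabric notebook, where `spark` is already defined, the whole run would then be something like `copy_all(spark, "my-gcp-project", TABLE_COLUMNS)`. Because the column list is plain data, changing what gets replicated doesn't require touching the copy logic.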
In summary, there are various options to replicate data from Google BigQuery to Fabric, including mirroring, Copy Job, and notebooks. Each approach has its pros and cons. Notebooks using the Spark BigQuery connector would probably give you the best throughput for batch-oriented, massive replication.




