A couple of examples:
I’m looking forward to Grok 3!
Presentation: Fabric Warehouse vs. Fabric Lakehouse: Choosing the Right Architecture for Your Environment
Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided
Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A
Overview: The modern data landscape demands scalable, flexible, and efficient architectures to support diverse business needs. With Microsoft Fabric, two leading paradigms have emerged to address these challenges: the Fabric Warehouse and the Fabric Lakehouse. While both tools aim to provide robust solutions for data storage, processing, and analytics, their approaches, strengths, and trade-offs differ significantly.
This presentation explores the core concepts, architectures, and use cases of the Fabric Warehouse and Fabric Lakehouse. I will compare their performance in areas such as data integration, scalability, and cost-efficiency. Attendees will gain insights into how these approaches align with specific business objectives and workloads, enabling informed decisions about which model best suits their organization’s data strategy.
Sponsor: Protiviti (www.protiviti.com) is a global consulting firm that delivers deep expertise, objective insights, a tailored approach and unparalleled collaboration to help leaders confidently face the future. Protiviti and its independent and locally owned member firms provide clients with consulting and managed solutions in finance, technology, operations, data, digital, legal, HR, risk and internal audit through a network of more than 90 offices in over 25 countries.
I recently conducted an assessment for a client facing memory pressure in Power BI Premium. You know those pesky out-of-memory errors when refreshing a biggish dataset. They started with P1, moved to P2, and are now on P3, but still more memory is needed to satisfy the appetite of a full refresh. The runtime memory footprint of the problematic semantic model with imported data is 45 GB, and they’ve done their best to optimize it. This newsletter outlines a few strategies to tackle excessive memory consumption with large semantic models. Unfortunately, given the current state of Power BI boxed capacities, no option is perfect, and in the end a compromise will probably be needed somewhere between latency and performance.
Since its beginning, Power BI Pro per-user licensing (and later Premium Per User (PPU) licensing) has been very attractive. Many organizations with a limited number of report users flocked to Power BI to save cost. However, organizations with more BI consumers gravitated toward premium licensing, where they could have an unlimited number of report readers for a fixed monthly fee starting at a list price of $5,000/mo for P1. Sounds like a great deal, right?
I must admit that I detest the premium licensing model because it boxes you into certain resource constraints, such as 8 backend cores and 25 GB of RAM for P1. There are no custom configurations that let you balance compute and memory needs. And while there is an auto-scale compute model, it’s very coarse and applies only to processing cores. The memory constraints are especially problematic given that imported models are memory resident and require more than twice their size in memory for a full refresh. From the outside, these memory constraints seem artificially low to force clients into perpetual upgrades. The new Fabric F capacities that supersede the P plans are even more expensive, justifying the price increase with the added flexibility to pause the capacity, which is often impractical.
It looks to me like premium licensing is a pretty good deal for Microsoft. Outgrown the 25 GB of RAM in P1? Time to shell out another $5K per month for 25 GB more, even if you don’t need more compute power. Meanwhile, the price of 32 GB of RAM is less than $100 and falling.
It would be great if at some point Power BI introduced custom capacities. Even better, how about auto-scaling, where the capacity resources (both memory and CPU) scale up and down on demand within minutes, such as adding more memory during refresh and reducing it when the refresh is over?
Strategies to combat out-of-memory scenarios
So, what should you do if you are strapped for cash? Consider evaluating and adopting one or more of the following memory-saving techniques:
Considering Direct Lake storage
If Fabric is in your future, one relatively new option for tackling out-of-memory scenarios that deserves to be evaluated and added to the list is semantic models configured for Direct Lake storage. Direct Lake on-demand loading should utilize memory much more efficiently for interactive operations, such as Power BI report execution. This is a bonus on top of the fact that Direct Lake models don’t require refresh. Eliminating refresh could save a tremendous amount of memory to start with, even if you apply advanced techniques such as incremental refresh or hybrid tables to models with imported data.
I did limited testing to compare the performance of import and Direct Lake storage and posted detailed results in the “Fabric Direct Lake: Memory Utilization with Interactive Operations” blog.
I concluded that if Direct Lake is an option for you, it should be at the forefront of your efforts to combat out-of-memory errors with large datasets.
On the downside, more than likely you’ll have to implement ETL processes to synchronize your data warehouse to a Fabric lakehouse, unless your data is in Fabric to start with or you use Fabric database mirroring for the currently supported data sources (Azure SQL DB, Cosmos DB, and Snowflake). I’m not counting the data synchronization time as a downside.
In a recent project, a client needed guidance on where to handle data quality and enrichment processes. Like cakes and ogres, a classic BI solution has layers. Starting upstream in the data pipeline and moving downstream are data sources, ODS (data staging and change data tracking), EDW (star schema) and semantic models.
In general, unless absolutely necessary, I’m against master data management (MDM) systems because of their complexity and cost. Instead, I believe that most data quality and enrichment tasks can be addressed efficiently without formal MDM. I’m also a big believer in tackling these issues as far upstream as possible, ideally in the data source.
Consider the following guidelines:
Happy Thanksgiving! What better way for me to spend a Thanksgiving week than doing more AI? After a year of letting it (and other Microsoft LLM offerings) simmer, and awakened by the latest AI hoopla from the Ignite conference, I took another look at Microsoft Copilot Studio. For the uninitiated, Copilot Studio lets you implement AI-powered smart bots (“agents”) for deriving knowledge from documents or websites. Basically, Copilot Studio is to retrieval-augmented generation (RAG) apps what Power BI is to self-service BI.
Copilot Studio licensing starts at $200 per month for up to 25,000 messages (interactions between user and agent) although at Ignite Microsoft hinted that pay-as-you-go licensing will be coming.
The Good
A few months ago, when I discussed RAG apps, it was obvious that a lot of custom code had to be written to glue the services together and implement the user interface. Microsoft Copilot Studio has the potential to change and simplify this. It offers a Power Automate-like environment for no-code, low-code implementation of AI agents and therefore opens new possibilities for faster implementation of various specialized AI agents across the enterprise. I was impressed by how easy the process was and how capable the tool is at creating more complex topics, such as conditional branches based on user input.
Like Power BI, the tool gets additional appeal from its integration with the Microsoft ecosystem. For example, it can index SharePoint and OneDrive documents, and it can integrate with Power Automate, Azure AI Search, and Azure OpenAI.
I was impressed by how easy it is to use the tool to connect to and intelligently search an existing website. For now, I see this as its main strength. Organizations can quickly implement agents to help their employees or external users derive knowledge from intranet or Internet websites.
To demonstrate this, I implemented an agent to index my blog and embedded it below for you to try out before my free trial expires. Please feel free to ask more sophisticated questions, such as “What’s the author’s sentiment toward Fabric?” or “What are the pros and cons of Fabric?”. Or try “I need help with a Power BI budget” (I got innovative here and implemented a conditional topic with branches depending on the budget you specify). I instructed the tool to stay only within the content of my website, so the answers are not diluted by other public sources. Given that no custom code was written, Copilot Studio is pretty impressive.
The Bad
Everyone wants to be autonomous, and AI agents are no exception. In fact, “autonomous agent” is the buzzword of the AI world today. Not to be outdone, Copilot Studio claims that it can “build agents that operate independently to dynamically plan, learn, and escalate on your behalf”. However, as the tool stands today, I don’t think there is much to this claim. Or it could be that my definition of “autonomous” is different from Microsoft’s.
To me, an autonomous agent must be capable of making decisions and taking actions on its own. It’s like telling your assistant that you’re planning a trip, giving her some constraints, such as how much to spend on hotel and air, and letting her make the travel reservations. As it stands, Copilot Studio offers none of this. It follows a workflow you specify. Again, it’s more or less a smarter bot than the ones you see on many websites.
However, at Ignite Microsoft claimed that autonomy is coming, so it will be interesting to see how the tool evolves. Don’t get me wrong: even as it stands, I believe the tool has enormous potential for more intelligent search and retrieval of information.
The Ugly
My basic complaint as of now is performance. It took the tool 10 minutes to index a PDF document. Then, in a momentary lapse of reason, I connected it to an Azure SQL Database with Adventure Works with 15 tables (the maximum number of tables currently supported), and it still wasn’t done indexing after a day. Given that many implementations would require searching data in relational databases, I believe this is not acceptable. Not to mention there isn’t much insight into how far along the indexing is, nor a way to limit the number of fields it should index.
Therefore, I believe most real-world architectures for implementing AI agents will take the path Copilot Studio -> Azure AI Search -> Azure OpenAI, where Copilot Studio is used for implementing the UI and workflows (topics and actions), while the data indexing is done by Azure AI Search with semantic ranking in conjunction with Azure OpenAI for vector embeddings.
Overview: The landscape for developing enterprise-scale models has never been more exciting than it is now! Developer mode in Power BI Desktop and the new TMDL language unlock new scenarios that previously weren’t possible, such as great source control and co-development experiences with Git integration. Additionally, the TMDL Visual Studio Code extension offers a new, powerful, and efficient code-first semantic modeling experience. Join us to discover the new and powerful ways you can leverage TMDL to accelerate your model development and get a sneak peek into the TMDL roadmap from the Power BI product team.
Speaker: Rui Romano is an experienced Microsoft professional with a deep passion for data and analytics. He has spent the last decade helping companies make better data-driven decisions and is known for his innovative and practical solutions to complex problems. He currently works as a Product Manager at Microsoft on the Power BI product team, focusing on Pro-BI experiences.
At the same time, temporal tables are somewhat more difficult to work with. For example, you must disable system versioning before you alter the table. Here is the approach recommended by the documentation for altering the schema:
BEGIN TRANSACTION
ALTER TABLE [dbo].[CompanyLocation] SET (SYSTEM_VERSIONING = OFF);
ALTER TABLE [CompanyLocation] ADD Cntr INT IDENTITY (1, 1);
ALTER TABLE [dbo].[CompanyLocation] SET
(
    SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[CompanyLocationHistory])
);
COMMIT;
However, if you follow these steps, you will be greeted with the following error when you attempt to restore system versioning in the third step:
Cannot set SYSTEM_VERSIONING to ON when SYSTEM_TIME period is not defined and the LEDGER=ON option is not specified.
Instead, the following works:
ALTER TABLE <system versioned table> SET (SYSTEM_VERSIONING = OFF); -- disable system versioning temporarily

-- make schema changes, such as adding new columns

ALTER TABLE <system versioned table> ADD PERIOD FOR SYSTEM_TIME (StartDate, EndDate); -- restore the time period

ALTER TABLE [mulesoft].[Employee] SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [mulesoft].[EmployeeHistory])); -- restore system versioning
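After re-enabling versioning, it might be worth confirming that the table is again system-versioned and linked to the expected history table. A quick sanity check against the catalog views could look like this (a sketch; adjust the table name to yours):

SELECT t.name AS table_name, t.temporal_type_desc, h.name AS history_table -- SYSTEM_VERSIONED_TEMPORAL_TABLE means versioning is back on
FROM sys.tables t
LEFT JOIN sys.tables h ON t.history_table_id = h.object_id -- resolve the linked history table, if any
WHERE t.name = N'Employee'; -- the table you just altered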
Presentation: Accelerating your Fabric Data Estate with AI & Copilot
Delivery: In-person
Date: November 4th, 2024
Time: 18:30 – 20:30 ET
Level: Beginner to Intermediate
Food: Pizza and drinks will be provided
Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A
Venue
Improving Office
11675 Rainwater Dr
Suite #100
Alpharetta, GA 30009
Overview: In this presentation, we will explore the groundbreaking AI and Copilot capabilities within Microsoft Fabric, a comprehensive platform designed to enhance productivity and collaboration. By leveraging advanced machine learning algorithms and natural language processing, Microsoft Fabric’s AI/Copilot not only streamlines workflows but also provides intelligent insights and automation, empowering users to achieve more with less effort. Join us as we delve into the features and functionalities that make Microsoft Fabric an indispensable tool for modern enterprises.
Sponsor: CloudStaff.ai
Given the existing technology limitations, you have two implementation choices for subsequent role-playing dimensions: duplicating the dimension table (either in the DW or the semantic model) or denormalizing the dimension fields into the fact table. The following table presents the pros and cons of each option:
| Option | Pros | Cons |
| --- | --- | --- |
| Duplicate the dimension table in the semantic model or DW | No or minimal impact on ETL; minimal maintenance in the semantic model; all dimension attributes are available | Metadata complexity and confusion (potentially mitigated with perspectives that filter the metadata for a specific subject area) |
| Denormalize dimension fields into the fact table | Avoids role-playing dimension instances; more intuitive model for business users | Increased fact table size and memory footprint; impact on ETL; limited number of dimension attributes; visited dimension changes must be tracked as Type 2 with incremental extraction (while they could be Type 1); if applicable, inability to reuse the role-playing dimension for another fact table and do cross-fact-table analysis |
So, which approach should you take? The middle path might make sense. If you need only a limited number of fields for the second role-playing dimension, you could add them to the fact table to avoid another dimension and confusion. For example, if you have a DimEmployee dimension and you need a second instance for the person making the changes to the fact table, you can add the administrator’s full name to the fact table assuming you need only this field from DimEmployee.
By contrast, if you need most of the fields in the role-playing instances, then cloning might make more sense. For example, analyzing fact data by shipped date or due date, which requires the established hierarchies in DimDate, could be addressed by cloning DimDate. Then, to avoid confusion, consider using Tabular Editor to create perspectives for each subject area, where each perspective includes only the role-playing dimensions applicable to that subject area.
Yet, a narrow-case third option exists when you only need role-playing measures, such as SalesAmountByShipDate and SalesAmountByDueDate. This scenario can be addressed by forcing DAX measures to “travel” the inactive relationship by using the USERELATIONSHIP function.
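For example, a minimal sketch of such a measure could look like the following (the FactResellerSales[ShipDateKey] and DimDate[DateKey] columns are placeholder names for illustration; substitute the columns that participate in your inactive relationship):

SalesAmountByShipDate =
CALCULATE (
    SUM ( FactResellerSales[SalesAmount] ), -- the base aggregation
    USERELATIONSHIP ( FactResellerSales[ShipDateKey], DimDate[DateKey] ) -- activate the inactive ship-date relationship for this calculation only
)

The regular measure keeps using the active relationship (typically the order date), so both perspectives coexist without cloning DimDate.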