-
Handling Data Quality and Data Enrichment
December 7, 2024 / No Comments »
Some of the most difficult and tedious issues in almost every BI project are handling data quality issues (aka exceptions) and data enrichment. Examples of data quality issues typically originate from wrong data entries or violated business rules, such as misspelled and misclassified products. Data enrichment tasks may go beyond the design of the original data source, such as introducing new taxonomies for products or customers. In a recent project, a client needed guidance on where to handle data quality and enrichment processes. Like cakes and ogres, a classic BI solution has layers. Starting upstream in the data pipeline and moving downstream are data sources, ODS (data staging and change data tracking), EDW (star schema) and semantic models. In general, unless absolutely necessary, I'm against master data management (MDM) systems because of their complexity and cost. Instead, I believe that most data quality and enrichment tasks can be addressed efficiently...
-
LLM Adventures: Microsoft Copilot Studio (The Good, The Bad, and The Ugly)
November 28, 2024 / No Comments »
"A momentary lapse of reason That binds a life to a life You won't regret, you will never forget There'll be no sleep in here tonight" "The Slip", Pink Floyd Happy Thanksgiving! What a better way for me to spend a Thanksgiving week than doing more AI? After a year of letting it (and other Microsoft LLM offerings) simmer and awaken by the latest AI hoopla from the Ignite conference, I took another look at Microsoft Copilot Studio. For the uninitiated, Copilot Studio lets you implement AI-powered smart bots (“agents”) for deriving knowledge from documents or websites. Basically, you can view the relationship of Copilot Studio to Retrieval-augmented Generation apps as what Power BI is to self-service BI. Copilot Studio licensing starts at $200 per month for up to 25,000 messages (interactions between user and agent) although at Ignite Microsoft hinted that pay-as-you-go licensing will be coming. The Good A...
-
Atlanta Microsoft BI Group Meeting on December 2nd (Semantic Modeling as Code)
November 26, 2024 / No Comments »
Atlanta BI fans, please join us online for our next meeting on Monday, December 2nd at 5PM ET (please note the change to our usual meeting time to accommodate the presenter). Rui Romano (Product Manager at Microsoft) will discuss how the new TMDL language for Power BI models can unlock new scenarios that previously weren't possible. For more details and sign up, visit our group page. Presentation: "Semantic Modeling as Code" with TMDL using Power BI Desktop Developer Mode (PBIP) and VS Code Delivery: Online Level: Intermediate to Advanced Overview: The landscape for developing enterprise-scale models has never been more exciting than it is now! Developer mode in Power BI Desktop and the new TMDL language unlock new scenarios that previously weren't possible, such as great source control and co-development experiences with Git integration. Additionally, the TMDL Visual Studio Code extension offers a new, powerful and efficient, code-first semantic modeling experience....
-
Temporal Tables
November 10, 2024 / No Comments »
I like SQL Server temporal tables for implementing ODS-style tables and change data tracking (CDS) for three main reasons: SQL Server maintains the system versioning. By contrast, I have witnessed erroneous Start/End dates for pretty much all home grown implementation. Further, SQL Server grains the changes at millisecond level. There is a clean separation between the current state of data and historical data. SQL Server separates the historical changes to a history table. You can establish a flexible data retention policy. A retention policy can be established at database or table level. SQL Server take care of purging the expired data. At the same time, temporal tables are somewhat more difficult to work with. For example, you must disable system versioning before you alter the table. Here is the recommended approach for altering the schema by the documentation: BEGIN TRANSACTION ALTER TABLE [dbo].[CompanyLocation] SET (SYSTEM_VERSIONING = OFF); ALTER TABLE [CompanyLocation]...
-
Atlanta Microsoft BI Group Meeting on November 4th (Accelerating your Fabric Data Estate with AI & Copilot)
October 29, 2024 / No Comments »
Atlanta BI fans, please join us in person for our next meeting on Monday, November 4th at 6:30 PM ET. Stacey Jones (Principal Data & AI Cross-Solution Architect at Microsoft) and Elayne Jones (Solutions Architect at Coca-Cola Bottlers Sales and Services) will explore the AI and Copilot capabilities within Microsoft Fabrics. And I'll help you catch up on Microsoft BI latest. I will sponsor the event which marks the 14th anniversary of the Atlanta Microsoft BI Group! For more details and sign up, visit our group page. Details Presentation: Accelerating your Fabric Data Estate with AI & Copilot Delivery: In-person Date: November 4th, 2024 Time: 18:30 – 20:30 ET Level: Beginner to Intermediate Food: Pizza and drinks will be provided Agenda: 18:15-18:30 Registration and networking 18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing) 19:00-20:15 Main presentation 20:15-20:30 Q&A Venue Improving Office 11675 Rainwater Dr Suite #100 Alpharetta, GA 30009 Overview: In this...
-
Implementing Role-playing Dimensions in Power BI
October 11, 2024 / No Comments »
Role-playing dimensions are a popular business requirement but yet challenging to implement in Power BI (and Tabular) due to a long-standing limitation that two tables can’t be joined multiple times with active relationships. Declarative relationships are both a blessing and a curse and, in this case, we are confronted with their limitations. Had Power BI allowed multiple relationships, the user must be prompted which path to take. Interestingly, a long time ago Microsoft considered a user interface for the prompting but dropped the idea for unknown reasons. Given the existing technology limitations, you have two implementation choices for implementing subsequent role-playing dimensions: duplicating the dimension table (either in DW or semantic model) or denormalizing the dimension fields into the fact table. The following table presents pros and cons of each option: Option Pros Cons Duplicate dimension table in semantic model or DW No or minimum impact on ETL Minimum...
-
Generative AI: Excessive Reinforcement Learning
August 28, 2024 / No Comments »
Do you know that without human intervention the Generative AI responses would be too toxic after training the model from the garbage on Internet? Do you know that OpenAI hired Kenyan workers to make ChatGPT less toxic? To throw in some jargon, this is called RLHF (reinforcement learning with human feedback). Not sure who Microsoft hired to cleanse copilots, but apparently they went overboard. Here is a Windows Copilot jerk-knee response that I got to a prompt asking when to open with 2 No Trump in contract bridge. I see, Generative AI refuse to generate since "trump" is a taboo topic nowadays. AI dummy!
-
Fabric Direct Lake: Memory Utilization with Interactive Operations
August 15, 2024 / No Comments »
As I mentioned in my Power BI and Fabric Capacities: Thinking Outside the Box, memory limits of Fabric capacities could be rather restrictive for large semantic models with imported data. One relatively new option to combat out-of-memory scenarios that deserves to be evaluated and added to the list if Fabric is in your future is semantic models configured for Direct Lake storage. The blog covers results of limited testing that I did comparing side by side the memory utilization of two identical semantic models with the first one configured to import data and the second to use Direct Lake storage. If you need a Direct Lake primer, Chris Webb has done a great job covering its essentials here and here. As a disclaimer, the emphasis is on limited as these results reflect my personal observations based on some isolated tests I’ve done lately. Your results may and probably will vary...
-
Fabric Capacity Limits
August 14, 2024 / No Comments »
Here is table that is getting more and more difficult to find as searching for Fabric capacity limits returns results about CU compute units (for the most part meaningless in my opinion). I embed in a searchable format below before it vanishes on Internet. The most important column for semantic modeling is the max memory which denotes the upper limit of memory Fabric will grant a semantic model. SKU Max memory (GB)1, 2 Max concurrent DirectQuery connections (per semantic model)1 Max DirectQuery parallelism3 Live connection (per second)1 Max memory per query (GB)1 Model refresh parallelism Direct Lake rows per table (in millions)1, 4 Max Direct Lake model size on OneLake (GB)1, 4 F2 3 5 1 2 1 1 300 10 F4 3 5 1 2 1 2 300 10 F8 3 10 1 3.75 1 5 300 10 F16 5 10 1 7.5 2 10 300 20 F32 10...
-
Atlanta Microsoft BI Group Meeting on September 3rd (Create Code Copilots with Large Language Models)
August 12, 2024 / No Comments »
Atlanta BI fans, please join us in person for the next meeting on Monday, September 3th at 6:30 PM ET. Your humble correspondent will show you how to use Large Language Models, such as ChatGPT, to create your own copilots for Text2SQL and Text2DAX. I'll also help you catch up on Microsoft BI latest. I will sponsor the event which marks the 14th anniversary of the Atlanta Microsoft BI Group! For more details and sign up, visit our group page. Details Presentation: Create Code Copilots with Large Language Models Delivery: In-person Date: September 3rd, 2024 Time: 18:30 – 20:30 ET Level: Beginner to Intermediate Food: Pizza and drinks will be provided Agenda: 18:15-18:30 Registration and networking 18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing) 19:00-20:15 Main presentation 20:15-20:30 Q&A Venue Improving Office 11675 Rainwater Dr Suite #100 Alpharetta, GA 30009 Overview: Resistance is futile! Instead of fearing that AI...