From Prompt to Insight: My Daily Dance with AI

August 10, 2025/0 Comments/in Blog/by Prologika - Teo Lachev

“I go checking out the reports, digging up the dirt
You get to meet all sorts in this line of work
And when I find the reason, I still can’t get used to it
And what have you got at the end of the day?
What have you got to take away?”

Private investigations, Dire Straits

Here we go

Me: “Write code to do this and that.”

LLM: “I’m glad to help. Here is the code.”

Me: “It doesn’t work because of this error <nasty error message follows>”

LLM: “You get this error because…Here is the correct code.”

Me: “Doesn’t work again because of this new error <nastier error message follows>.”

LLM: “You get this error because…Here is the correction of the corrected code.”

After N iterations and mutual blame, we either get eventually to working code or give up and start cursing each other. LLM usually quits first, claiming I have exhausted my quota, so I start harassing the next vendor.

Me: “Why didn’t get it right the first time?”

LLM: “That’s rude…I’m learning and I can make mistakes…don’t hurt my feelings.”

Teo’s top 5 LLM professional wishes

These typical exchanges inspired by top 5 LLM wishes:

If you are still learning, why can’t you be more humble and less assertive? This reminds me of some members of my family whose level of assertiveness is a reverse correlation with their knowledge of the subject. But it could be that LLMs are designed to act as humans in this regard too.
When it comes to code generation, can we use the latest versions, class signatures, etc.? We all know how quickly programing interfaces evolve.
Even better, can you compile the code to ensure that at least I don’t get compile errors?
Best, can you actually run the code instead of claiming that the code will produce the desired outcome?
When you substantiate your claims with references, can you ensure that they do what I asked you to do? Can you display a warning that you’re reasoning over some code example that is N years old?

Admiration lives on

Other than that, I keep on being impressed with LLMs. Specifically, I’m impressed by their reasoning and code generation capabilities, especially when it comes to pioneering languages that have decided to plant their flag in lands unknown, such as Power BI DAX, Power Query M, and Azure Data Factory (whatever bizarre expression language it adopted).

As of now, I believe that experts and architects who have solid foundation skills are in position to gain the most as I won’t trust AI to make architectural or design decisions.

Speaking of being impressed, the latest gem I’ve discovered was Microsoft Copilot Screen Sharing. I used it recently to analyze charts from the Fabric Capacity Metrics app whose primary design goal appears to be leaving the user utterly confused or convinced that it’s time to upgrade their Fabric capacity (see these red spikes? time for upgrade!). In my humble opinion, its output could have been much more useful if it had a chart showing the average resource utilization instead of actual, but I digress. However, the Screen Sharing feature saved taking screenshots and intelligently pointed out what the issue was.

On the downside, ChatGPT did a better job with screenshots. For example, it correctly identified ‘AS’ as Analysis Services workload and came up with better conclusions. Luckily, having multiple assistants it’s not an issue and they don’t complain unless you start abusing them…

Prologika Newsletter Summer 2025

June 16, 2025/0 Comments/in Newsletter/by Prologika - Teo Lachev

The May release of Power BI Desktop added a feature called Translytical Task Flows which aims to augment Power BI reports with rudimentary writeback capabilities, such as to make corrections to data behind a report. Previously, one way to accomplish this was to integrate the report with Power Apps as I demonstrated a while back here. My claim to fame was that Microsoft liked this demo so much that it was running for years on big monitors in the local Microsoft office!

Are translytical flows a better way to implement report writeback? I followed the steps to test this feature and here are my thoughts.

The Good

I like that translytical flows don’t require external integration and additional licensing. By contrast, the Power Apps integration required implementing an app and incurring additional licensing cost.

I like that Microsoft is getting serious about report writeback and has extended Power BI Desktop with specific features to support it, such as action buttons, new slicers, and button-triggered report refresh.

I like that you can configure the action button to refresh the report after writeback so you can see immediately the changes (assuming DirectQuery or DirectLake semantic models). I tested the feature with a report connected to a published dataset and it works. Of course, if the model imports data, refreshing the report won’t show the latest.

The Bad

Currently, you must use the new button, text, or list slicers, which can only provide a single value to the writeback function. No validation, except if you use a list slicer. From end user experience, every modifiable field would need a separate textbox. This is horrible UI! Ideally, I’d like to see Microsoft extending the Table visual to allow editing in place.

Translytical flows require a Python function to propagate the changes. Although this opens new possibilities (you can do more things with custom code), what happened to low-code, no-code mantra?

The Ugly

Currently, the data can be written to only four destinations:

Fabric SQL DB (Azure SQL DB provisioned in Fabric)
Fabric Lakehouse
Fabric Warehouse
Fabric Mirrored DB

Notice the “Fabric” prefix in all four options? Customers will probably interpret this as another attempt to force them into Fabric. Not to mention that this imitation excludes 99% of real-life scenarios where the data is in other data sources. I surely hope Microsoft will open this feature to external data sources in future.

So, is the Power Apps integration for writeback obsolete? Not really because it is more flexible and provides better user experience at the expense of additional licensing cost.

In summary, as they stand today, transalytical task flows attempt to address basic writeback needs within Fabric, such as changing a limited number of report fields or performing massive updates on a single field. They are heavily dependent on Fabric and support writeback to only Fabric data sources. Keep an eye on this feature with the hope that it will evolve over time to something more useful.

Teo Lachev
Prologika, LLC | Making Sense of Data

Prologika Newsletter Spring 2025

March 15, 2025/0 Comments/in Newsletter/by Prologika - Teo Lachev

Looking for easy ways to create intelligent bots or Retrieval-Augmented Generation (RAG) apps? Microsoft Copilot Studio should help. After a year of letting it (and other Microsoft LLM offerings) simmer and inspired by the latest hoopla from the Ignite conference, I took another look at Microsoft Copilot Studio and share my findings in this newsletter. For the uninitiated, Copilot Studio lets you implement AI-powered smart bots (“agents”) for deriving knowledge from documents or websites. Basically, you can view the relationship of Copilot Studio to Retrieval-augmented Generation apps as what Power BI is to self-service BI.

Copilot Studio licensing starts at $200 per month for up to 25,000 messages (interactions between user and agent) although at Ignite Microsoft hinted that pay-as-you-go licensing will be coming.

The Good

A few months ago, when I discussed RAG apps, it was obvious that a lot of custom code had to be written to glue the services together and implement the user interface. Microsoft Copilot Studio has the potential to change and simplify this. It offers a Power Automate-like environment for no-code, low-code implementation of AI agents and therefore opens new possibilities for faster implementation of various and specialized AI agents across the enterprise. I was impressed by how easy the process was and how capable the tool was to create more complex topics, such as conditional branches based on user input.

Like Power BI, the tool gets additional appeal from its integration with the Microsoft ecosystem. For example, it can index SharePoint and OneDrive documents. It can integrate with Power Automate, Azure AI Search and Azure Open AI.

I was impressed by how easy is to use the tool to connect to and intelligently search an existing website. For now, I see this as being its main strength. Organizations can quickly implement agents to help their employees or external users to derive knowledge from intranet or Internet websites.

To demonstrate this, I implemented an agent to index my blog and embedded it below for you to try it out before my free trial expires. Please feel free to ask more sophisticated questions, such as “What’s the author’s sentiment toward Fabric?”, “What are the pros and cons of Fabric?”. Or “I need help with Power BI budget” (I got innovative here and implemented a conditional topic with branches depending on the budget you specify). I instructed the tool to stay only within the content of my website, so the answers are not diluted from other public sources. Given that no custom code was written, Copilot Studio is pretty impressive.

The Bad

Everyone wants to be autonomous and AI agents are no exception. In fact, “autonomous agent” is the buzzword of AI world today. Not to be outdone, Copilot Studio claims that it can “build agents that operate independently to dynamically plan, learn, and escalate on your behalf”. However, as the tool stands today, I don’t think there is much to this claim. Or it could be that my definition of “autonomous” is different than Microsoft’s.

To me, an autonomous agent must be capable of making decisions and taking actions on its own. Like you tell your assistant that you plan a trip, give her some constraints, such as how much to spend on hotel and air, and let her make travel reservations. As it stands, Copilot Studio offers none of this. It follows a workflow you specify. Again, its output is more or less a smarter bot than the ones you see on many websites.

However, at Ignite Microsoft claimed that autonomy is coming so it will be interesting to see how the tool will evolve. Don’t get me wrong. Even as it stands, I believe the tool has enormous potential for more intelligent search and retrieval of information.

The Ugly

My basic complaint as of now is performance. It took the tool 10 minutes to index a PDF document. Then in a momentary lapse of reason, I connected it to an Azure SQL Database with Adventure Works with 15 tables (the max number of tables currently supported) and it’s still not done indexing after a day. Given that many AI implementations would require searching the data in relational databases, this is not acceptable. Not to mention there isn’t much insight on how far it’s done indexing or limit the number of fields it should index.

Therefore, I believe most real-world architectures for implementing AI agents will take the path Copilot Studio->Azure AI Search ->Azure Open AI, where Copilot Studio is used for implementing the UI and workflows (topics and actions), while the data indexing is done by Azure AI Search with semantic ranking in conjunction with Azure Open AI for embedded vectors.

Teo Lachev
Prologika, LLC | Making Sense of Data

LLM Adventures: xAI Grok

February 18, 2025/0 Comments/in Blog/by Prologika - Teo Lachev

I’ve been recently impressed with xAI Grok compared to Microsoft Copilot. Grok performs better that Copilot which seems to be crippled with political correctness to a point of absurdity although it seems to be getting more permissive of late. But the most important Grok advantage is that it’s always recent… and it doesn’t crash every now and then.

A couple of examples:

Question: Where is this image from?

1. Microsoft Copilot – what exactly is inappropriate is not clear.
2. Grok (impressive!) – This image depicts the rock formations and cave dwellings of Cappadocia, a historical region in central Turkey. The unique rock structures and the cave homes carved into them are iconic features of this area, particularly around the town of Göreme.

Question: What are the most important political events to watch for tomorrow (as of 2/18/2025)?
1. Microsoft Copilot (chaos of past and current events):
  1. Presidential Primaries: Former President Donald Trump and President Joe Biden have both secured enough delegates to become their parties’ presumptive nominees for president. The primaries in Georgia, Mississippi, and Washington state have been particularly significant.
  2. Senate Hearings: There are ongoing confirmation hearings for various cabinet positions, including the Department of Homeland Security and the Department of Housing and Urban Development.
  3. International Relations: Discussions and negotiations on international issues, such as the ongoing conflict in Ukraine and relations with China, are likely to be in the spotlight.
  4. Domestic Policy Debates: Debates on domestic policies, including immigration reform, climate change, and economic policies, will continue to be important topics
2. Grok – Based on the available information, here are some of the most important political events to watch for on February 19, 2025:

- 1. Pat King’s Sentencing in Canada: This event is noted in posts on X, indicating its significance as people are discussing its impact. Pat King was involved in the Freedom Convoy protests, and his sentencing could have broader implications on discussions around civil liberties and protest laws in Canada.
  2. Global Political Dynamics: Although not specific to February 19, the context from web sources suggests a focus on ongoing geopolitical tensions and diplomatic engagements. For instance, the broader political calendar highlights events like the NATO defense ministers’ meeting in Brussels, which might influence or be influenced by daily developments in international relations.

I’m looking forward to Grok 3!

LLM Adventures: Microsoft Copilot Studio (The Good, The Bad, and The Ugly)

November 28, 2024/0 Comments/in Blog/by Prologika - Teo Lachev

“A momentary lapse of reason
That binds a life to a life
You won’t regret, you will never forget
There’ll be no sleep in here tonight”
“The Slip”, Pink Floyd

Happy Thanksgiving! What a better way for me to spend a Thanksgiving week than doing more AI? After a year of letting it (and other Microsoft LLM offerings) simmer and awaken by the latest AI hoopla from the Ignite conference, I took another look at Microsoft Copilot Studio. For the uninitiated, Copilot Studio lets you implement AI-powered smart bots (“agents”) for deriving knowledge from documents or websites. Basically, you can view the relationship of Copilot Studio to Retrieval-augmented Generation apps as what Power BI is to self-service BI.

Copilot Studio licensing starts at $200 per month for up to 25,000 messages (interactions between user and agent) although at Ignite Microsoft hinted that pay-as-you-go licensing will be coming.

The Good

A few months ago, when I discussed RAG apps, it was obvious that a lot of custom code had to be written to glue the services together and implement the user interface. Microsoft Copilot Studio has the potential to change and simplify this. It offers a Power Automate-like environment for no-code, low-code implementation of AI agents and therefore opens new possibilities for faster implementation of various and specialized AI agents across the enterprise. I was impressed by how easy the process was and how capable the tool was to create more complex topics, such as conditional branches based on user input.

Like Power BI, the tool gets additional appeal from its integration with the Microsoft ecosystem. For example, it can index SharePoint and OneDrive documents. It can integrate with Power Automate, Azure AI Search and Azure Open AI.

I was impressed by how easy is to use the tool to connect to and intelligently search an existing website. For now, I see this as being its main strength. Organizations can quickly implement agents to help their employees or external users to derive knowledge from intranet or Internet websites.

To demonstrate this, I implemented an agent to index my blog and embedded it below for you to try it out before my free trial expires. Please feel free to ask more sophisticated questions, such as “What’s the author’s sentiment toward Fabric?”, “What are the pros and cons of Fabric?”. Or “I need help with Power BI budget” (I got innovative here and implemented a conditional topic with branches depending on the budget you specify). I instructed the tool to stay only within the content of my website, so the answers are not diluted from other public sources. Given that no custom code was written, Copilot Studio is pretty impressive.

The Bad

Everyone wants to be autonomous and AI agents are no exception. In fact, “autonomous agent” is the buzzword of AI world today. Not to be outdone, Copilot Studio claims that it can “build agents that operate independently to dynamically plan, learn, and escalate on your behalf”. However, as the tool stands today, I don’t think there is much to this claim. Or it could be that my definition of “autonomous” is different than Microsoft’s.

To me, an autonomous agent must be capable of making decisions and taking actions on its own. Like you tell your assistant that you plan a trip, give her some constraints, such as how much to spend on hotel and air, and let her make travel reservations. As it stands, Copilot Studio offers none of this. It follows a workflow you specify. Again, its output is more or less a smarter bot than the ones you see on many websites.

However, at Ignite Microsoft claimed that autonomy is coming so it will be interesting to see how the tool will evolve. Don’t get me wrong. Even as it stands, I believe the tool has enormous potential for more intelligent search and retrieval of information.

The Ugly

My basic complaint as of now is performance. It took the tool 10 minutes to index a PDF document. Then in a momentary lapse of reason, I connected it to an Azure SQL Database with Adventure Works with 15 tables (the max number of tables currently supported) and it’s still not done indexing after a day. Given that many implementations would require searching the data in relational databases, I believe this is not acceptable. Not to mention there isn’t much insight on how far it’s done indexing or limit the number of fields it should index.

Therefore, I believe most real-world architectures for implementing AI agents will take the path Copilot Studio->Azure AI Search ->Azure Open AI, where Copilot Studio is used for implementing the UI and workflows (topics and actions), while the data indexing is done by Azure AI Search with semantic ranking in conjunction with Azure Open AI for embedded vectors.

Atlanta Microsoft BI Group Meeting on September 3rd (Create Code Copilots with Large Language Models)

August 12, 2024/0 Comments/in Blog, Events/by Prologika - Teo Lachev

Atlanta BI fans, please join us in person for the next meeting on Monday, September 3th at 6:30 PM ET. Your humble correspondent will show you how to use Large Language Models, such as ChatGPT, to create your own copilots for Text2SQL and Text2DAX. I’ll also help you catch up on Microsoft BI latest. I will sponsor the event which marks the 14th anniversary of the Atlanta Microsoft BI Group! For more details and sign up, visit our group page.

Details

Presentation: Create Code Copilots with Large Language Models
Delivery: In-person
Date: September 3rd, 2024
Time: 18:30 – 20:30 ET
Level: Beginner to Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Venue
Improving Office
11675 Rainwater Dr
Suite #100
Alpharetta, GA 30009

Overview: Resistance is futile! Instead of fearing that AI will take over our jobs, embrace it and apply it to outsource mundane work and create a new class of applications that were not possible before. In this session, I’ll introduce you through the fascinating world of Large Language Models (LLMs) and one of their practical applications in creating Text2SQL and Text2DAX copilots. I’ll demonstrate how LLMs open new opportunities for intelligent exploration. As an optional challenge, bring your laptop, download the code from my website (use the download link below), and follow along using your favorite AI chat, such as ChatGPT, which is what I’ll use for the demos, Microsoft Copilot, Meta AI, Google Gemini, or Perplexity.ai. You’ll also discover how you can automate LLM-powered copilots using Python and Azure OpenAI.

Code download link: Create Code Copilots with Large Language Models

Speaker: Teo Lachev is a BI consultant, author, and mentor. Through his Atlanta-based company Prologika (https://prologika.com) he designs and implements innovative solutions that bring tremendous value to his clients and help them make sense of data. Teo has authored and co-authored many books, and he has been leading the Atlanta Microsoft Business Intelligence group since he founded it in 2010. Microsoft has recognized Teo’s contributions to the community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) Data Platform status for 15 years. Microsoft selected Teo as one of only 30 FastTrack Solution Architects for Power Platform worldwide.

Sponsor: Prologika (https://prologika.com)

LLM Adventures: RAG Apps

July 19, 2024/0 Comments/in Blog/by Prologika - Teo Lachev

This post summarizes my research around the increasingly popular RAG apps, and it’s meant more as an internal memo to myself to summarize existing findings should one day a suitable project comes along. However, someone starting with RAG development might find this useful (we are all LLM rookies). RAG is a fascinating topic and presents another great case for generative AI in data analytics.

RAG (retrieval-augmented generation) apps apply AI to let end users intelligently search data, such as PDF or Word documents, using natural questions. The most common scenario is for searching internal data because public LLM models don’t have access to your corporate data repositories and therefore know nothing about your data.

Suppose your HR department has accumulated a large knowledge base of files detailing internal policies, such as health plans. Using a home-grown RAG app, the user can type natural questions, such as “Which plan supports vision?” and the RAG application should retrieve and rank related documents. Or a law firm might have documents related to matters and be interested in letting internal users ask natural questions to find the details of a specific case. As you can imagine, many companies can benefit from RAG <insert your company smart search needs here>.

Understanding RAG

Here is a typical high-level RAG architecture:

The user uses some sort of UI, such as a web portal, to submit the natural question. The app backend gets the question and submits it to a search service (Retrieval), such as Azure AI Search. Previously, you have configured Azure AI Search to create an index (knowledge base) by indexing the data found in files or popular databases. The following Azure AI Search features should be strongly considered (ideally, both should be implemented):

Embedding vectors – As a part of indexing the data, each text chunk should have an embedding vector, such as a vector produced by Azure OpenAI LLM embedding model. This allow Azure AI Search to find related documents when related terms are used in the question, such as “vision exam” and “eye exam”, but the underlying documents don’t contain these exact terms.
Semantic ranking – Once the related documents are identified, Azure AI Search can apply ranking to sort the related documents in descending order of relevance using an optional feature called Semantic Ranker.

Processing more complicated data can be done with Azure Document Intelligence Service (previously Form Recognizer). It’s capable of extracting data from more complicated structured files, such as invoices, medical forms, etc.

Once Azure AI Search returns the related text sections, the natural question and text are then sent to Azure OpenAI (Augmentation) which does the magic of parsing the semantics of the user input and describing the output in natural language typical to LLM-powered chat apps (Generative AI), such as “The plan A has coverage for vision”.

Further reading and code samples

A good starting point to learn about LLM in general and RAG is the Generative AI for Beginners free training (kudos to Kevin Jourdain for the hint). You can find more implementation-specific and in-depth RAG knowledge especially for Azure-based implementations in the excellent Pamela Fox’s YouTube Channel, such as the “Vector search, RAG, and Azure AI search” presentation.

Microsoft has also provided a code intensive end-to-end Python app. This app uses custom code for extracting the text using the Azure Document Intelligence Service, chunking the text, generating embeddings, integrating with Azure AI Search and Azure Open AI, and implementing a web app for the user interface.

An easier code to start with is the AG Academy’s Python sample. It uses custom code for chunking the text (either using the Python PDFReader library or Microsoft Document Intelligence service), and then integrating with Azure AI Search and Azure OpenAI.

Observations

Naturally, given that my developer skills are somewhat rusty (long live low-code BI solutions!), I’m inclined to find a low(er)-code approach that saves development effort. It looks like Microsoft is reasoning along the same lines as they’ve came up with the Azure AI Search Import and Vectorize Wizard (currently in preview). Based on my research, this wizard can avoid tons of custom code to:

Extract and chunk text.
Process documents of different types stored in the same folder (for some reason the files must be in ADLS Gen 1 or Fabric One Lake, since ADLS Gen 2 hierarchical folders are not supported).
Generate embedding vectors by integrating with Azure OpenAI embedding model, such as text-embedding-ada-002.
Refresh the index on schedule.

You still must write code for the user interface and integrating with Azure AI Search and Azure OpenAI. I’d love to see Microsoft enhancing the wizard to integrate further with Azure OpenAI text-based models in order to avoid the current two-step process that requires custom code for getting the results from Azure Search and them sending them to Azure OpenAI.

LLM Adventures: Text2DAX

July 11, 2024/0 Comments/in Blog/by Prologika - Teo Lachev

In my previous post, I covered how large language models, such as ChatGPT, can be used to convert natural queries to SQL. Let’s now see how Text2DAX fairs. But wait, we have a Microsoft Fabric copilot already for this, right? Yes, but what happens when you click the magic button in PBI Desktop? You are greeted that you need to purchase F64 or larger capacity. It’s a shame that Microsoft has decided that AI should be a super premium feature. Given this horrible predicament, what would a mortal developer strapped for cash do? Create their own copilot of course!

Building upon the previous sample, this is remarkably simple. As I mentioned in the first blog, one great LLM feature is the loose input. To make the schema generation simple, I obtained the table and column names from the Adventure Works semantic model (a *.pbix file opened in PBI Desktop) by running the following query in DAX Studio:

select Table_Name as TableName, Column_Name as ColumnName 
from $SYSTEM.DBSCHEMA_COLUMNS 
where column_olap_type = 'attribute' and table_name <>'$MySelectedMeasure' and table_name <> '$Parameter' and table_name <> '$DynamicMeasure' 
order by table_name

This gave me the model schema in a two-column format (I had to remove the $ in front of the table name which is a relic from OLAP days where dimension names were prefixed with $). Also, notice that I haven’t included model relationships yet.

TableName,ColumnName
"Customer","FirstName"
"Customer","HouseOwnerFlag"
…

Then, I used the following prompts:

As a DAX expert, examine a Power BI data model with the following table definitions where TableName lists the tables and ColumnName lists the columns, and provide DAX query that can answer user questions based on the data model, you will think step by step throughout and return DAX statement directly without any additional explanation.

<Table definition>
TableName,ColumnName
"Customer","FirstName"
"Customer","HouseOwnerFlag"
...

As with Text2SQL, I used a user role prompt to feed a natural question “What are the three best selling cities for the “AWC Logo Cap” product?” followed by an assistant role prompt containing the expected DAX query (noticed that I’ve made a mistake by not using a SUM function to summarize the SalesAmount column):

EVALUATE
TOPN (
3,
SUMMARIZECOLUMNS (
'DimGeography'[City],
"SalesAmount", CALCULATE('FactInternetSales'[SalesAmount], 'DimProduct'[EnglishProductName] = "AWC Logo Cap")
)
)
ORDER BY [SalesAmount] DESC

Then I started asking questions and my amazement grew as the complexity increased. Here are some samples:

What are the three best selling cities for the “AWC Logo Cap” product?

EVALUATE
TOPN (
3,
SUMMARIZECOLUMNS (
'Geography'[City],
"SalesAmount",
CALCULATE (
SUM('InternetSales'[SalesAmount]), -- notice that LLM fixed my syntax error
'Product'[ProductName] = "AWC Logo Cap"
)
),
[SalesAmount], DESC)

Which products sold the most?

EVALUATE
TOPN (
10,
SUMMARIZECOLUMNS (
'Product'[ProductName],
"TotalSalesAmount",
SUM('InternetSales'[SalesAmount])
),
[TotalSalesAmount], DESC)

What was YTD sales variance for 2012 compared to last year?
— Here LLM switch to a measure definition

VAR CurrentYearSales =
CALCULATE (
SUM('InternetSales'[SalesAmount]),
'Date'[CalendarYear] = 2012,
DATESYTD('Date'[Date])
)
VAR PreviousYearSales =
CALCULATE (
SUM('InternetSales'[SalesAmount]),
'Date'[CalendarYear] = 2011,
DATESYTD('Date'[Date])
)
VAR SalesVariance = CurrentYearSales - PreviousYearSales
RETURN SalesVariance

What was the customer churn in 2012?

Now, this is impressive! To start with, LLM had to discover what a “churn” is and to figure out that it needs a date context. At its first attempt, it created an incorrect measure because the Date dimension doesn’t relate to the Customer dimension.

VAR CustomersAtStartOfYear =
CALCULATE (
DISTINCTCOUNT('Customer'[CustomerKey]),
'Date'[CalendarYear] = 2011
)
VAR CustomersAtEndOfYear =
CALCULATE (
DISTINCTCOUNT('Customer'[CustomerKey]),
'Date'[CalendarYear] = 2012
)
VAR NewCustomers =
CALCULATE (
DISTINCTCOUNT('Customer'[CustomerKey]),
'Date'[CalendarYear] = 2012,
NOT EXISTS (
CALCULATETABLE (
'Customer',
'Date'[CalendarYear] = 2011
),
'Customer'[CustomerKey]
)
)
VAR ChurnedCustomers = CustomersAtStartOfYear - CustomersAtEndOfYear + NewCustomers
RETURN ChurnedCustomers

Relationships are important concept in Power BI semantic modeling, and they are both a blessing and a curse. Thanks to these explicit paths, the end user doesn’t need how to join tables. But they also introduce rigidity and are subject of various limitations, such as that Power BI doesn’t support diamond or closed loop relationships (interestingly, at some point Microsoft was planning to implement UI to prompt the user which path to take but this never materialized.

Let’s now tell the model about the relationships. I added the following after the Table Definitions section to enumerate a few of the model relationships.

<Relationships>
InternetSales[OrderDate]='Date'[Date]
InternetSales[ShipDate]='Date'[Date] (inactive)
InternetSales[CustomerKey]=Customer[CustomerKey]
InternetSales[ProductKey]=Product[ProductKey]

Now running the previous query, we get a much better outcome because the measure now evaluates over the InternetSales table. It would have blown my mind if it had used CustomerID (business key) instead of CustomerKey (surrogate key) 😊

VAR CustomersAtStartOfYear =
CALCULATE (
DISTINCTCOUNT('InternetSales'[CustomerKey]),
'Date'[CalendarYear] = 2011
)
VAR CustomersAtEndOfYear =
CALCULATE (
DISTINCTCOUNT('InternetSales'[CustomerKey]),
'Date'[CalendarYear] = 2012
)
VAR RetainedCustomers =
CALCULATE (
DISTINCTCOUNT('InternetSales'[CustomerKey]),
'Date'[CalendarYear] = 2012,
FILTER (
'Customer',
CALCULATE (
DISTINCTCOUNT('InternetSales'[CustomerKey]),
'Date'[CalendarYear] = 2011
) > 0
)
)
VAR ChurnedCustomers = CustomersAtStartOfYear - RetainedCustomers
RETURN ChurnedCustomers

Which customers churned in 2012?

–Correctly switching to a DAX query

EVALUATE
EXCEPT (
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date'[CalendarYear] = 2011
),
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date'[CalendarYear] = 2012
)
)

Which customers churned in 2012? Show me the customer name.

EVALUATE
SELECTCOLUMNS (
EXCEPT (
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date'[CalendarYear] = 2011
),
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date'[CalendarYear] = 2012
)
),
"CustomerName", LOOKUPVALUE('Customer'[FullName], 'Customer'[CustomerKey], [CustomerKey])
)

Which customers churned in 2012? Use ship date instead of order date
–Notice how LLM adds USERELATIONSHIP.

EVALUATE
SELECTCOLUMNS (
EXCEPT (
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date',
'Date'[CalendarYear] = 2011,
USERELATIONSHIP('InternetSales'[ShipDate], 'Date'[Date])
),
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date',
'Date'[CalendarYear] = 2012,
USERELATIONSHIP('InternetSales'[ShipDate], 'Date'[Date])
)
),
"CustomerName", LOOKUPVALUE('Customer'[FullName], 'Customer'[CustomerKey], [CustomerKey])
)

In summary, it appears that LLM can effectively assist us in writing code. The emphasis is on assist because I view the LLM role as a second set of eyes. Hey, what do you think about this problem I’m trying to solve here? LLM doesn’t absolve us from doing our homework and learning the fundamentals, nor it can compensate for improper design. While LLM might not always generate the optimum code and might sometimes fabricate, it can definitely assist you in creating business calculations, generating test queries, and learning along the way.

BTW, you can use any of the publicly available LLM apps, such as Copilot, ChatGPT, Google Gemini or Perplexity (you don’t need the sample app I’ve demonstrated) for Text2SQL and Text2DAX and probably you would obtain similar results if you give it the right prompts. I used this app because I was interested in automating the process.

LLM Adventures: Text2SQL

July 7, 2024/0 Comments/in Blog/by Prologika - Teo Lachev

As inspired by Kevin Jordain, who clued me about the increased demand for using natural questions to analyze client’s data, John Kerski, who did a great presentation to our Atlanta BI Group on integrating Power Query with Azure OpenID, and Kyle Hale, who believes (wrongly 😊) that the Databricks Genie will make Power BI and semantic modeling obsolete, I set on a quest to find how effective Large Language Models (LLM) are in generating SQL from natural questions, also known as Text2SQL. What a better way to spend some free time around the 4^th of July holiday, right?

I have to say that I was impressed with LLM. I used the excellent Ric Zhou’s Text2SQL sample as a starting point inside Visual Studio Code. The sample uses the Python streamlit framework to create a web app that submits natural questions to Azure OpenAI.

My humble contributions were:

I switched to the Azure OpenAI chatgpt-4o model.
Because the Python openai module has deprecated ChatCompletion, I transitioned the code to use the new way of interacting, as I explained in the related GitHub issue.
I used the AdventureWorksDW2019 database which I deployed to Azure SQL Database.

I was amazed how simple the LLM input was. Given that it’s trained with many popular languages, including SQL, all you have to do is provide some context, database schema (generated in a simple format by a provided tool), and a few prompts:

[{'role': 'system', 'content': 'You are smart SQL expert who can do text to SQL, following is the Azure SQL database data model <database schema>},
{'role': 'user', 'content': 'What are the three best selling cities for the "AWC Logo Cap" product?'},
{'role': 'assistant', 'content': 'SELECT TOP 3 A.City, sum(SOD.LineTotal) AS TotalSales \\nFROM [SalesLT].[SalesOrderDe...BY A.City \\nORDER BY TotalSales DESC; \\n'},
{'role': 'user', 'content': natural question here'}]

Let’s decipher this:

The first prompt provides context to the model for our intention to act as a SQL expert.
The sample includes a tool that generates the database schema consisting of table and columns in the following format below. Noticed that referential integrity constraints are not included (the model doesn’t need to know how the tables are related!) There is nothing special in the syntax. It is fluid (impressive!), which is a paradigm shift from structured XML or JSON where missing a bracket crashes the system. So, I could have expressed the schema using the following notation and it would still work (ideally, we should also provide the column data types):
Table: FactSalesQuota
– SalesQuotaKey
– other columns

or
Table: FactSalesQuota
Columns: column1, column2

or
CREATE TABLE (DML here)

<Table definition>
[dbo].[FactSalesQuota]
-SalesQuotaKey
-EmployeeKey
-DateKey
-CalendarYear
-CalendarQuarter
-SalesAmountQuota
-Date

[dbo].[DimAccount]
-AccountKey 
-…

The next prompt assumes the role of an end user who will ask a natural question, such as ‘What are the three best selling cities for the “AWC Logo Cap” product?’
The next prompt assumes the role of an assistant who hints the model what the correct query should be.
The final prompt acts as a user who now provides what natural question you enter.

The sample also supports running the query against the database and visualizing the results as a table, bar chart, area chart, or line chart. Here are some results

What are the three best selling cities for the “AWC Logo Cap” product?

SELECT TOP 3 G.City, SUM(FIS.SalesAmount) AS TotalSales
FROM [dbo].[FactInternetSales] FIS
JOIN [dbo].[DimProduct] P ON FIS.ProductKey = P.ProductKey
JOIN [dbo].[DimCustomer] C ON FIS.CustomerKey = C.CustomerKey
JOIN [dbo].[DimGeography] G ON C.GeographyKey = G.GeographyKey
WHERE P.EnglishProductName = 'AWC Logo Cap'
GROUP BY G.City
ORDER BY TotalSales DESC;

Which products sold the most?

SELECT P.EnglishProductName, SUM(FIS.SalesAmount) AS TotalSales
FROM [dbo].[FactInternetSales] FIS
JOIN [dbo].[DimProduct] P ON FIS.ProductKey = P.ProductKey
GROUP BY P.EnglishProductName
ORDER BY TotalSales DESC;

Which customers bought something in 2012 but didn’t buy anything in 2011?

SELECT DISTINCT C.CustomerKey, C.FirstName, C.LastName
FROM [dbo].[FactInternetSales] FIS2012
JOIN [dbo].[DimCustomer] C ON FIS2012.CustomerKey = C.CustomerKey
WHERE FIS2012.OrderDateKey BETWEEN 20120101 AND 20121231
AND C.CustomerKey NOT IN (
SELECT DISTINCT FIS2011.CustomerKey
FROM [dbo].[FactInternetSales] FIS2011
WHERE FIS2011.OrderDateKey BETWEEN 20110101 AND 20111231);

All SQL statements were syntactically and logically correct. Pretty amazing, right? Especially considering that the model had to figure out how the tables related, such as that DimCustomer is related to FactInternetSales but not to FactResellerSales.

In summary, it appears that LLM can be used effectively for generating SQL from natural text. This is opening new possibilities for ad hoc data analytics and for assisting users with learning or generating code. I believe that simplified schema is very important, such as star schema. This underscores the importance of schema design for data analytics, something that AI can’t do (yet).

What about Text2DAX? Can LLM do it? You bet! DAX happens to be one of the programming languages used for training, so stay tuned for the next blog post where I’ll present you with a poor man’s DAX copilot.

Posts