LLM Adventures: Text2DAX

In my previous post, I covered how large language models, such as ChatGPT, can be used to convert natural queries to SQL. Let’s now see how Text2DAX fairs. But wait, we have a Microsoft Fabric copilot already for this, right? Yes, but what happens when you click the magic button in PBI Desktop? You are greeted that you need to purchase F64 or larger capacity. It’s a shame that Microsoft has decided that AI should be a super premium feature. Given this horrible predicament, what would a mortal developer strapped for cash do? Create their own copilot of course!

A screenshot of a chat Description automatically generated

Building upon the previous sample, this is remarkably simple. As I mentioned in the first blog, one great LLM feature is the loose input. To make the schema generation simple, I obtained the table and column names from the Adventure Works semantic model (a *.pbix file opened in PBI Desktop) by running the following query in DAX Studio:

select Table_Name as TableName, Column_Name as ColumnName 
from $SYSTEM.DBSCHEMA_COLUMNS 
where column_olap_type = 'attribute' and table_name <>'$MySelectedMeasure' and table_name <> '$Parameter' and table_name <> '$DynamicMeasure' 
order by table_name

This gave me the model schema in a two-column format (I had to remove the $ in front of the table name which is a relic from OLAP days where dimension names were prefixed with $). Also, notice that I haven’t included model relationships yet.

TableName,ColumnName
"Customer","FirstName"
"Customer","HouseOwnerFlag"
…

Then, I used the following prompts:

As a DAX expert, examine a Power BI data model with the following table definitions where TableName lists the tables and ColumnName lists the columns, and provide DAX query that can answer user questions based on the data model, you will think step by step throughout and return DAX statement directly without any additional explanation.

<Table definition>
TableName,ColumnName
"Customer","FirstName"
"Customer","HouseOwnerFlag"
...

As with Text2SQL, I used a user role prompt to feed a natural question “What are the three best selling cities for the “AWC Logo Cap” product?” followed by an assistant role prompt containing the expected DAX query (noticed that I’ve made a mistake by not using a SUM function to summarize the SalesAmount column):

EVALUATE
TOPN (
3,
SUMMARIZECOLUMNS (
'DimGeography'[City],
"SalesAmount", CALCULATE('FactInternetSales'[SalesAmount], 'DimProduct'[EnglishProductName] = "AWC Logo Cap")
)
)
ORDER BY [SalesAmount] DESC

Then I started asking questions and my amazement grew as the complexity increased. Here are some samples:

What are the three best selling cities for the “AWC Logo Cap” product?

EVALUATE
TOPN (
3,
SUMMARIZECOLUMNS (
'Geography'[City],
"SalesAmount",
CALCULATE (
SUM('InternetSales'[SalesAmount]), -- notice that LLM fixed my syntax error
'Product'[ProductName] = "AWC Logo Cap"
)
),
[SalesAmount], DESC)

Which products sold the most?

EVALUATE
TOPN (
10,
SUMMARIZECOLUMNS (
'Product'[ProductName],
"TotalSalesAmount",
SUM('InternetSales'[SalesAmount])
),
[TotalSalesAmount], DESC)

What was YTD sales variance for 2012 compared to last year?
— Here LLM switch to a measure definition

VAR CurrentYearSales =
CALCULATE (
SUM('InternetSales'[SalesAmount]),
'Date'[CalendarYear] = 2012,
DATESYTD('Date'[Date])
)
VAR PreviousYearSales =
CALCULATE (
SUM('InternetSales'[SalesAmount]),
'Date'[CalendarYear] = 2011,
DATESYTD('Date'[Date])
)
VAR SalesVariance = CurrentYearSales - PreviousYearSales
RETURN SalesVariance

What was the customer churn in 2012?

Now, this is impressive! To start with, LLM had to discover what a “churn” is and to figure out that it needs a date context. At its first attempt, it created an incorrect measure because the Date dimension doesn’t relate to the Customer dimension.

VAR CustomersAtStartOfYear =
CALCULATE (
DISTINCTCOUNT('Customer'[CustomerKey]),
'Date'[CalendarYear] = 2011
)
VAR CustomersAtEndOfYear =
CALCULATE (
DISTINCTCOUNT('Customer'[CustomerKey]),
'Date'[CalendarYear] = 2012
)
VAR NewCustomers =
CALCULATE (
DISTINCTCOUNT('Customer'[CustomerKey]),
'Date'[CalendarYear] = 2012,
NOT EXISTS (
CALCULATETABLE (
'Customer',
'Date'[CalendarYear] = 2011
),
'Customer'[CustomerKey]
)
)
VAR ChurnedCustomers = CustomersAtStartOfYear - CustomersAtEndOfYear + NewCustomers
RETURN ChurnedCustomers

Relationships are important concept in Power BI semantic modeling, and they are both a blessing and a curse. Thanks to these explicit paths, the end user doesn’t need how to join tables. But they also introduce rigidity and are subject of various limitations, such as that Power BI doesn’t support diamond or closed loop relationships (interestingly, at some point Microsoft was planning to implement UI to prompt the user which path to take but this never materialized.

Let’s now tell the model about the relationships. I added the following after the Table Definitions section to enumerate a few of the model relationships.

<Relationships>
InternetSales[OrderDate]='Date'[Date]
InternetSales[ShipDate]='Date'[Date] (inactive)
InternetSales[CustomerKey]=Customer[CustomerKey]
InternetSales[ProductKey]=Product[ProductKey]

Now running the previous query, we get a much better outcome because the measure now evaluates over the InternetSales table. It would have blown my mind if it had used CustomerID (business key) instead of CustomerKey (surrogate key) 😊

VAR CustomersAtStartOfYear =
CALCULATE (
DISTINCTCOUNT('InternetSales'[CustomerKey]),
'Date'[CalendarYear] = 2011
)
VAR CustomersAtEndOfYear =
CALCULATE (
DISTINCTCOUNT('InternetSales'[CustomerKey]),
'Date'[CalendarYear] = 2012
)
VAR RetainedCustomers =
CALCULATE (
DISTINCTCOUNT('InternetSales'[CustomerKey]),
'Date'[CalendarYear] = 2012,
FILTER (
'Customer',
CALCULATE (
DISTINCTCOUNT('InternetSales'[CustomerKey]),
'Date'[CalendarYear] = 2011
) > 0
)
)
VAR ChurnedCustomers = CustomersAtStartOfYear - RetainedCustomers
RETURN ChurnedCustomers

Which customers churned in 2012?

–Correctly switching to a DAX query

EVALUATE
EXCEPT (
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date'[CalendarYear] = 2011
),
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date'[CalendarYear] = 2012
)
)

Which customers churned in 2012? Show me the customer name.

EVALUATE
SELECTCOLUMNS (
EXCEPT (
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date'[CalendarYear] = 2011
),
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date'[CalendarYear] = 2012
)
),
"CustomerName", LOOKUPVALUE('Customer'[FullName], 'Customer'[CustomerKey], [CustomerKey])
)

Which customers churned in 2012? Use ship date instead of order date
–Notice how LLM adds USERELATIONSHIP.

EVALUATE
SELECTCOLUMNS (
EXCEPT (
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date',
'Date'[CalendarYear] = 2011,
USERELATIONSHIP('InternetSales'[ShipDate], 'Date'[Date])
),
CALCULATETABLE (
VALUES('Customer'[CustomerKey]),
'InternetSales',
'Date',
'Date'[CalendarYear] = 2012,
USERELATIONSHIP('InternetSales'[ShipDate], 'Date'[Date])
)
),
"CustomerName", LOOKUPVALUE('Customer'[FullName], 'Customer'[CustomerKey], [CustomerKey])
)

In summary, it appears that LLM can effectively assist us in writing code. The emphasis is on assist because I view the LLM role as a second set of eyes. Hey, what do you think about this problem I’m trying to solve here? LLM doesn’t absolve us from doing our homework and learning the fundamentals, nor it can compensate for improper design. While LLM might not always generate the optimum code and might sometimes fabricate, it can definitely assist you in creating business calculations, generating test queries, and learning along the way.

BTW, you can use any of the publicly available LLM apps, such as Copilot, ChatGPT, Google Gemini or Perplexity (you don’t need the sample app I’ve demonstrated) for Text2SQL and Text2DAX and probably you would obtain similar results if you give it the right prompts. I used this app because I was interested in automating the process.