App Integration Tools and Patterns

“…That’s when the teardrops start, fella…
…Dirty deeds and they’re done dirt cheap”

“Dirty deeds done dirt cheap”, AC/DC

A few months ago, I did an assessment for a large organization seeking to modernize their data analytics processes. As a part of standardizing the toolset, the client was considering Salesforce MuleSoft for both application and data integration. After all, the vendor told them the tool can do “everything”. Indeed, taken from the vendor page, we learn that “MuleSoft is a platform that gives IT the tools to automate everything. This includes integrating data and systems, automating workflows and processes, and creating incredible digital experiences — all on a single, easy-to-use platform.”

Let’s parse this claim.

Defining integration types

Quick as a bunny and excited that I finally have one tool that does it all, we hop to Gartner’s Magic Quadrant for Data Integration Tools. Surprise, Salesforce MuleSoft is not even there. Where is MuleSoft? It’s on Gartner’s Magic Quadrant for Integration Platform as a Service and the Magic Quadrant for API Management. Confused? Let’s start with definitions.

Data integration (note the focus on data)

“Gartner defines data integration as the discipline that includes architectural patterns, tools, and methodologies for accessing, harmonizing, transforming, processing, and moving data across various endpoints and infrastructures. It ensures consistent access and delivery of data across different subject areas and structures to meet business and application needs.”

Integration Platform (note the focus on applications)

“Gartner defines integration platform as a service (iPaaS) as a vendor-managed cloud service that enables end users to implement integrations between a variety of applications, services and data sources, both internal and external to their organization.”

API Management (note the focus on APIs)

“Gartner defines the application programming interface (API) management market as the market for software that supports API life cycle stages such as planning and design, implementation and testing, deployment and operation, and versioning and retirement.”

So, should we use MuleSoft for ETL and data integration patterns that typically involve large data volumes, such as loading a data warehouse? I say we shouldn’t. You probably could, but it would be like trying to fit a square peg into a round hole. Did that client try it despite my advice against it? They did and came to the same realization. Should we consider MuleSoft for enterprise-level app integration, such as when something happening in one system triggers something in another system via API calls? Probably, since that’s what it’s designed to do anyway. Let’s choose the right tool for the job.

Application integration done dirt cheap

What about a true application integration scenario? In a current assessment, a client uses a third-party app which stores data in a SQL Server database. Each subsidiary has its own database that tracks invoicing locally through the app. However, a central NetSuite ERP system is used to consolidate invoices from all subsidiaries. So, the integration task at hand is to implement a bi-directional synchronization where each new local invoice (and its associated line items) is exported to NetSuite and the local invoice is then updated with the corresponding NetSuite transaction id (it’s more complicated than this but you get the idea).

Now this is where we can surely use “specialized” tools like MuleSoft, Boomi, Celigo, etc., right? And we surely pay a premium, to the tune of thousands of dollars per month. I don’t claim to know these tools, but for the most part they appear to me as nice UI wrappers on top of the vendor APIs. They certainly spare you from dealing with the actual API complexities, JSON payloads, etc., but should you lock yourself in and break the bank for small-scale app integration? Nowadays, everyone wants to do more with less.

Therefore, knowing and using the Microsoft Azure stack, my personal choice is to first consider Azure Data Factory. True, it’s a generic data integration tool and I must learn how to call the vendor APIs and deal with JSON, but it’s not a big deal (AI can help here). ADF can even do webhooks. And it will save you tons of money.
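To make this concrete, the glue logic boils down to three API calls per invoice: read the new local invoices, create them in the ERP, and write the returned transaction id back to the local database. Below is a minimal JavaScript (Node.js) sketch of that loop. The endpoint URLs, field names, and bearer token are hypothetical placeholders, not the actual NetSuite or vendor APIs (a real NetSuite integration requires OAuth and its own record schema); in ADF the same three steps would typically map to lookup and web activities in a pipeline.

// Minimal sketch of the sync loop. All endpoints, fields, and auth below are
// hypothetical placeholders; a real NetSuite integration requires OAuth and its record schema.
const LOCAL_API = 'https://local-app.example.com/api';   // hypothetical local invoicing app API
const ERP_API = 'https://erp.example.com/api';           // hypothetical NetSuite-like endpoint
const TOKEN = process.env.API_TOKEN;                     // assumed bearer token

async function syncNewInvoices() {
  // 1. Get local invoices that don't have an ERP transaction id yet
  const res = await fetch(`${LOCAL_API}/invoices?exported=false`, {
    headers: { Authorization: `Bearer ${TOKEN}` }
  });
  const invoices = await res.json();

  for (const inv of invoices) {
    // 2. Create the invoice (with its line items) in the ERP
    const created = await fetch(`${ERP_API}/invoices`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${TOKEN}` },
      body: JSON.stringify({ externalId: inv.id, lines: inv.lineItems })
    });
    const { transactionId } = await created.json();

    // 3. Write the ERP transaction id back to the local invoice
    await fetch(`${LOCAL_API}/invoices/${inv.id}`, {
      method: 'PATCH',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${TOKEN}` },
      body: JSON.stringify({ erpTransactionId: transactionId })
    });
  }
}

syncNewInvoices().catch(console.error);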


Atlanta Microsoft BI Group Meeting on May 5th (Optimize Power BI Reports and Semantic Models Using External Tools)

Atlanta BI fans, please join us in person for our next meeting on Monday, May 5th at 18:30 ET. Lakshmana R Boya will show us how to optimize Power BI reports by identifying performance and size issues using the “Measure Killer” community app. Your humble correspondent will walk you through some of the Power BI and Fabric enhancements of late. For more details and sign up, visit our group page.

Delivery: In-person

Level: Intermediate

Food: Pizza and drinks will be provided

 

Agenda:

18:15-18:30 Registration and networking

18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)

19:00-20:15 Main presentation

20:15-20:30 Q&A

Overview: We will deep dive into how we can optimize Power BI reports and semantic models by identifying the performance and size issues using the “Measure Killer” app.

Sponsor: Prologika (https://prologika.com) helps organizations of all sizes to make sense of data by delivering tailored BI solutions that drive actionable insights and maximize ROI. Your BI project will be your best investment, we guarantee it!


Two Takeaways from FabCon 2025

The 2025 Fabric Community conference has just concluded. There were many announcements, but I found two of them intriguing because they make the case for Fabric more compelling should a suitable project come around:

  • OneSecurity – Analysis Services Tabular and Power BI semantic models have long supported comprehensive and flexible row-level security (RLS), such as scenarios that require dynamically denying rows. Unfortunately, column-level security (aka object-level security) has been less flexible, as I explain in more detail here. Currently, OneSecurity doesn’t support dynamic security, but I hope it eventually will, such as by allowing a SELECT statement to evaluate runtime conditions. So, keep an eye on OneSecurity as it evolves if you face more complex column-level security requirements.
  • Copilots for all F SKUs – Previously, I complained that Microsoft had established that AI must require premium pricing starting at $5,000/mo (F64 and above). But now they’ve promised that by the end of April copilots would go mainstream and be available in all F SKUs should you need AI assistance.

Alas, no major announcements about Fabric Warehouse. As with Synapse, Microsoft went back to the drawing board with Fabric Warehouse. Consequently, its T-SQL surface area doesn’t support various features I use. For example, IDENTITY and MERGE were promised long ago but unfortunately haven’t materialized. I personally don’t care for the most part about relational-like lakehouses or feature bundles. I don’t dare even dream about “advanced” features that SQL Server has had for a long time, such as multiple databases, temporal tables, etc. Without robust warehousing, I still find it difficult to substantiate a good case for Fabric for most of my projects.


Atlanta Microsoft BI Group Meeting on April 7th (Source Control for Power BI)

Atlanta BI fans, please join us in person for our next meeting on Monday, April 7th at 18:30 ET. Neel Pathak will explain how to do source control for Power BI with Git integration. Your humble correspondent will walk you through some of the Power BI and Fabric enhancements of late. For more details and sign up, visit our group page.

Presentation: Source Control for Power BI
Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: The days of sharing pbix files and copying in changes are gone. Git integration now enables development teams to collaborate and provides version control/rollbacks.

Speaker: Neel Pathak (Senior Consultant with 3Cloud), a BI and analytics professional with expertise in Power BI and in guiding clients through their analytics transformation journeys.

Sponsor: 3Cloud is the #1 Microsoft Azure partner. No other partner combines 3Cloud’s expertise in Azure with a deep understanding of infrastructure, data and analytics and app innovation. We have proven our impact and are trusted with client’s most complex business problems. We look for people who share our passion for technology and want to unleash their potential.


Hacking Adventures: Automating Button Click

Sometimes you may encounter annoying training content that forces you to watch a section before you can explicitly advance to the next one. I never understood why I have to do this, or sit through long trainings altogether, if I already know the answers and can pass the naive quizzes. Usually, a web button to continue gets enabled after you watch a video or after some time elapses.

With some investigative work and low-code JavaScript, you can easily “automate” the process to some extent, such as until it’s time for the quiz, as follows:

  1. The trick is to find the id of the button and what makes it enabled/disabled. In the Edge browser, right-click the “button” and then click Inspect. This will open the Dev Tools and you’ll see the button HTML definition. You can right-click the HTML code and then click Copy OuterHTML to paste it into Notepad for inspection. In my case, the element id is “button-next”. When the button is disabled, the app adds a disabled attribute as follows:
    <button … disabled="" type="button" id="button-next">…</button>
  2. In the browser Dev Tools Console (F12), copy and paste the following JavaScript code, and then press Enter (note that the first time you paste code, the Console will prompt you to type allow pasting before it accepts it):
    function checkAndClick() {
      const targetElement = document.getElementById('button-next'); // Get the element by ID

      if (targetElement) { // Check if the element exists
        if (!targetElement.hasAttribute('disabled')) { // Check if it does NOT have the 'disabled' attribute
          targetElement.click(); // Simulate a click
          console.log('Element clicked!');
        } else {
          console.log('Element is disabled.');
        }
      } else {
        console.log('Element not found.');
      }
    }

    // Set an interval to run the function every 2 seconds
    const intervalId = setInterval(checkAndClick, 2000);

    // Call clearInterval(intervalId) to stop the timer when no longer needed
    

    This code checks every two seconds whether the disabled attribute has been removed from the button and, if so, “clicks” the button to advance the content. Hope this will save you precious time for more productive work. Happy training!

 

 

LLM Adventures: xAI Grok

I’ve been recently impressed with xAI Grok compared to Microsoft Copilot. Grok performs better than Copilot, which seems to be crippled by political correctness to the point of absurdity, although it seems to be getting more permissive of late. But the most important Grok advantage is that its knowledge is always current… and it doesn’t crash every now and then.

A couple of examples:

  1. Question: Where is this image from?

(Image: a group of people walking in a cave)

    1. Microsoft Copilot – it refused, flagging the image as inappropriate; what exactly is inappropriate is not clear.
    2. Grok (impressive!) – This image depicts the rock formations and cave dwellings of Cappadocia, a historical region in central Turkey. The unique rock structures and the cave homes carved into them are iconic features of this area, particularly around the town of Göreme.
  2. Question: What are the most important political events to watch for tomorrow (as of 2/18/2025)?
    1. Microsoft Copilot (chaos of past and current events):
      1. Presidential Primaries: Former President Donald Trump and President Joe Biden have both secured enough delegates to become their parties’ presumptive nominees for president. The primaries in Georgia, Mississippi, and Washington state have been particularly significant.
      2. Senate Hearings: There are ongoing confirmation hearings for various cabinet positions, including the Department of Homeland Security and the Department of Housing and Urban Development.
      3. International Relations: Discussions and negotiations on international issues, such as the ongoing conflict in Ukraine and relations with China, are likely to be in the spotlight.
      4. Domestic Policy Debates: Debates on domestic policies, including immigration reform, climate change, and economic policies, will continue to be important topics
    2. Grok – Based on the available information, here are some of the most important political events to watch for on February 19, 2025:
      1. Pat King’s Sentencing in Canada: This event is noted in posts on X, indicating its significance as people are discussing its impact. Pat King was involved in the Freedom Convoy protests, and his sentencing could have broader implications on discussions around civil liberties and protest laws in Canada.
      2. Global Political Dynamics: Although not specific to February 19, the context from web sources suggests a focus on ongoing geopolitical tensions and diplomatic engagements. For instance, the broader political calendar highlights events like the NATO defense ministers’ meeting in Brussels, which might influence or be influenced by daily developments in international relations.

I’m looking forward to Grok 3!

Atlanta Microsoft BI Group Meeting on February 3rd (Fabric Warehouse vs. Fabric Lakehouse)

Atlanta BI fans, please join us in person for our next meeting on Monday, February 3rd at 18:30 ET. Jeff Levy (Data Architect @ Protiviti) will present the core concepts, architectures, and use cases of the Fabric Warehouse and Fabric Lakehouse. Your humble correspondent will walk you through the most significant Power BI and Fabric enhancements of late. For more details and sign up, visit our group page.

Presentation: Fabric Warehouse vs. Fabric Lakehouse: Choosing the Right Architecture for Your Environment
Delivery: In-person
Level: Intermediate
Food: Pizza and drinks will be provided

Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (news, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A

Overview: The modern data landscape demands scalable, flexible, and efficient architectures to support diverse business needs. With Microsoft Fabric, two leading paradigms have emerged to address these challenges: the Fabric Warehouse and the Fabric Lakehouse. While both tools aim to provide robust solutions for data storage, processing, and analytics, their approaches, strengths, and trade-offs differ significantly.

This presentation explores the core concepts, architectures, and use cases of the Fabric Warehouse and Fabric Lakehouse. I will compare their performance in areas such as data integration, scalability, and cost-efficiency. Attendees will gain insights into how these approaches align with specific business objectives and workloads, enabling informed decisions about which model best suits their organization’s data strategy.

Sponsor: Protiviti (www.protiviti.com) is a global consulting firm that delivers deep expertise, objective insights, a tailored approach and unparalleled collaboration to help leaders confidently face the future. Protiviti and its independent and locally owned member firms provide clients with consulting and managed solutions in finance, technology, operations, data, digital, legal, HR, risk and internal audit through a network of more than 90 offices in over 25 countries.


TMDL View in Power BI Desktop

I had the privilege to participate in the early preview program of the new TMDL View in Power BI Desktop which is currently in public preview in the latest January release of Power BI Desktop. Without reiterating what was said in the announcement, I’d like to mention three main benefits of this feature:

  1. Ability to access the entire model metadata – This includes features that don’t have a user interface in Power BI Desktop. Traditionally, BI developers have been relying on Tabular Editor to do so. Now you have another option, although it requires knowing the TMDL language. Alas, TMDL doesn’t come with a user interface, although it does support autocomplete.
  2. Ability to copy specific model features from one Power BI Desktop file to another – For example, in the screenshot below, I have scripted a calculation group. Now, I can open another Power BI Desktop file, copy the script and apply it. Of course, the target model must include the referenced entities, otherwise I’ll get an error.
  3. Automating tasks – Hopefully, in the near future Microsoft will support creating add-ins to automate certain tasks, much like creating macros in Excel by programming the Excel VBA object model. For example, a developer should be able to use the Tabular Object Model (TOM) API to create TMDL scripts and apply them to a semantic model.

Handling Data Quality and Data Enrichment

Some of the most difficult and tedious issues in almost every BI project are handling data quality issues (aka exceptions) and data enrichment. Data quality issues typically originate from wrong data entries or violated business rules, such as misspelled or misclassified products. Data enrichment tasks may go beyond the design of the original data source, such as introducing new taxonomies for products or customers.

In a recent project, a client needed guidance on where to handle data quality and enrichment processes. Like cakes and ogres, a classic BI solution has layers. Starting upstream in the data pipeline and moving downstream are data sources, ODS (data staging and change data tracking), EDW (star schema) and semantic models.

In general, unless absolutely necessary, I’m against master data management (MDM) systems because of their complexity and cost. Instead, I believe that most data quality and enrichment tasks can be addressed efficiently without formal MDM. I’m also a big believer in tackling these issues as far upstream as possible, ideally in the data source.

Consider the following guidelines:

  1. There are different types of exceptions and enrichment requirements, and they range in complexity and resolution domain. Each case must be triaged by the team to determine how to best handle it by examining the data pipeline in the upstream-to-downstream direction and traversing the layers.
  2. As a best practice, exceptions and enrichment tasks should be dealt with as far upstream as possible, because the further downstream the correction is made, the narrower its availability to consumers will be and the more implementation effort might be required.
    1. Data Source – examples include wrong entries, such as a wrong employee job start date, or adding custom fields to implement categories required for standard hierarchies. The team should always start with the source system in an attempt to address data quality issues at the source.
    2. ODS – examples include corrections not possible in the upstream layers but necessary for other consumers besides EDW.
    3. EDW – examples include corrections applicable only to analytics, such as an IsAttendance flag indicating which attendance events should be evaluated.
    4. Semantic layer – examples include calculated columns for data enrichment that are best done in DAX, as well as additional fields the end user needs in a composite model.
  3. Shortcuts – When resolving the change upstream is time prohibitive, it should be permissible to handle the change further downstream while waiting for changes to an upstream system. For example, if fixing an issue in the data source is rather involved, a temporary fix can be done in ODS to prototype the change and expedite it to consumers. However, the team will remove that fix once the source system is enhanced.
  4. When consumers prefer access to uncorrected data, a pair of fields could be introduced as upstream as possible, such as:
    1. EmployeeId – raw field
    2. EmployeeIdCorrected – corrected field. Joins to the related dimension table will be done on this field.


LLM Adventures: Microsoft Copilot Studio (The Good, The Bad, and The Ugly)

“A momentary lapse of reason
That binds a life to a life
You won’t regret, you will never forget
There’ll be no sleep in here tonight”
“One Slip”, Pink Floyd

Happy Thanksgiving! What better way for me to spend Thanksgiving week than doing more AI? After a year of letting it (and other Microsoft LLM offerings) simmer, and awakened by the latest AI hoopla from the Ignite conference, I took another look at Microsoft Copilot Studio. For the uninitiated, Copilot Studio lets you implement AI-powered smart bots (“agents”) for deriving knowledge from documents or websites. Basically, Copilot Studio is to retrieval-augmented generation (RAG) apps what Power BI is to self-service BI.

Copilot Studio licensing starts at $200 per month for up to 25,000 messages (interactions between user and agent) although at Ignite Microsoft hinted that pay-as-you-go licensing will be coming.

The Good

A few months ago, when I discussed RAG apps, it was obvious that a lot of custom code had to be written to glue the services together and implement the user interface. Microsoft Copilot Studio has the potential to change and simplify this. It offers a Power Automate-like environment for no-code, low-code implementation of AI agents and therefore opens new possibilities for faster implementation of various and specialized AI agents across the enterprise. I was impressed by how easy the process was and how capable the tool was to create more complex topics, such as conditional branches based on user input.

Like Power BI, the tool gets additional appeal from its integration with the Microsoft ecosystem. For example, it can index SharePoint and OneDrive documents. It can integrate with Power Automate, Azure AI Search and Azure Open AI.

I was impressed by how easy it is to use the tool to connect to and intelligently search an existing website. For now, I see this as its main strength. Organizations can quickly implement agents to help their employees or external users derive knowledge from intranet or Internet websites.

To demonstrate this, I implemented an agent to index my blog and embedded it below for you to try out before my free trial expires. Please feel free to ask more sophisticated questions, such as “What’s the author’s sentiment toward Fabric?” or “What are the pros and cons of Fabric?”. Or “I need help with Power BI budget” (I got innovative here and implemented a conditional topic with branches depending on the budget you specify). I instructed the tool to stay only within the content of my website, so the answers are not diluted by other public sources. Given that no custom code was written, Copilot Studio is pretty impressive.

The Bad

Everyone wants to be autonomous and AI agents are no exception. In fact, “autonomous agent” is the buzzword of the AI world today. Not to be outdone, Copilot Studio claims that it can “build agents that operate independently to dynamically plan, learn, and escalate on your behalf”. However, as the tool stands today, I don’t think there is much to this claim. Or it could be that my definition of “autonomous” is different from Microsoft’s.

To me, an autonomous agent must be capable of making decisions and taking actions on its own. It’s like telling your assistant that you’re planning a trip, giving her some constraints, such as how much to spend on hotel and airfare, and letting her make the travel reservations. As it stands, Copilot Studio offers none of this. It follows a workflow you specify. Again, its output is more or less a smarter bot than the ones you see on many websites.

However, at Ignite Microsoft claimed that autonomy is coming so it will be interesting to see how the tool will evolve. Don’t get me wrong. Even as it stands, I believe the tool has enormous potential for more intelligent search and retrieval of information.

The Ugly

My basic complaint as of now is performance. It took the tool 10 minutes to index a PDF document. Then, in a momentary lapse of reason, I connected it to an Azure SQL Database with Adventure Works and 15 tables (the maximum number of tables currently supported), and it’s still not done indexing after a day. Given that many implementations would require searching data in relational databases, I believe this is not acceptable. Not to mention there isn’t much insight into how far along the indexing is, or a way to limit the number of fields it should index.

Therefore, I believe most real-world architectures for implementing AI agents will take the path Copilot Studio -> Azure AI Search -> Azure OpenAI, where Copilot Studio is used for implementing the UI and workflows (topics and actions), while the data indexing is done by Azure AI Search with semantic ranking, in conjunction with Azure OpenAI for embedding vectors.
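For reference, the retrieval-plus-grounding part of that path is plain RAG and takes surprisingly little code. Below is a minimal JavaScript sketch that calls the two REST endpoints directly; the service, index, and deployment names are hypothetical placeholders, the index is assumed to have a content field, and the api-version values may need adjusting for your environment. Copilot Studio would sit on top of something like this to provide the UI and topic workflows.

// Minimal RAG sketch: retrieve passages from Azure AI Search, then ask an Azure OpenAI
// chat deployment to answer grounded in those passages. Names below are placeholders.
const SEARCH_ENDPOINT = 'https://my-search.search.windows.net';  // hypothetical search service
const SEARCH_INDEX = 'blog-index';                               // hypothetical index name
const OPENAI_ENDPOINT = 'https://my-openai.openai.azure.com';    // hypothetical OpenAI resource
const DEPLOYMENT = 'gpt-4o';                                     // hypothetical chat deployment

async function askWithRag(question) {
  // 1. Retrieve the top matching documents from the search index
  const searchRes = await fetch(
    `${SEARCH_ENDPOINT}/indexes/${SEARCH_INDEX}/docs/search?api-version=2023-11-01`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'api-key': process.env.SEARCH_KEY },
      body: JSON.stringify({ search: question, top: 3 })
    }
  );
  const { value: docs } = await searchRes.json();
  const context = docs.map(d => d.content).join('\n---\n'); // assumes a 'content' field in the index

  // 2. Ask the chat model to answer using only the retrieved context
  const chatRes = await fetch(
    `${OPENAI_ENDPOINT}/openai/deployments/${DEPLOYMENT}/chat/completions?api-version=2024-02-01`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'api-key': process.env.OPENAI_KEY },
      body: JSON.stringify({
        messages: [
          { role: 'system', content: 'Answer only from the provided context.' },
          { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` }
        ]
      })
    }
  );
  const chat = await chatRes.json();
  return chat.choices[0].message.content;
}

askWithRag('What are the pros and cons of Fabric?').then(console.log).catch(console.error);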