Sentiment Analysis with Power BI

A recent ask from an airline company was to perform sentiment analysis on comments in surveys collected from their customers. Sentiment analysis is a machine learning task that requires natural language processing.

In Power BI, we have at least two ways to approach this requirement: Cognitive Services and custom code, such as by using the Python Natural Language Toolkit (NLTK).

This post compares the pros and cons of each option based on my impressions so far.

| | Cognitive Services | Python |
|---|---|---|
| Licensing | Included in Premium or Embedded capacity; or provisioned separately with an Azure subscription and Power BI Pro | Freely available |
| Provisioning | Already provisioned with Premium or Embedded (need to enable AI workloads) | Install Python; install the pandas, matplotlib, and nltk packages |
| Language detection | Yes | No |
| Data refresh | No gateway required | Personal gateway required |
| Enhanced dataset metadata | Supported | Not supported |

Cognitive Services

Cognitive Services is an Azure PaaS cloud service that supports text analytics and image recognition. It’s automatically included in Power BI Premium or Embedded capacities (make sure that AI workloads are enabled in the capacity settings). If your organization doesn’t have Power BI Premium or Embedded, you can provision Cognitive Services in Azure (requires an Azure subscription) and then write a custom Power Query function to invoke its APIs, as demonstrated by this tutorial. If you provision Cognitive Services outside Power BI Premium, you’ll be charged per transaction. In the case of Power BI, the number of transactions equates to the number of rows scored in your table. So, if you refresh a table with 1,000 rows five times and calculate the sentiment polarity score for each row on every refresh, you’ll be charged for 5,000 transactions.
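To make the billing arithmetic concrete, here is a small Python sketch. The function names are mine, and the price is deliberately a parameter you supply yourself: I'm not quoting an actual Azure rate, so check the current pricing page before plugging in a number.

```python
def billed_transactions(rows, refreshes):
    """Each scored row on each refresh counts as one transaction."""
    return rows * refreshes


def estimated_cost(rows, refreshes, price_per_1000):
    """Rough cost given a caller-supplied price per 1,000 transactions.

    price_per_1000 is a placeholder input, not a real Azure price.
    """
    return billed_transactions(rows, refreshes) / 1000 * price_per_1000


# The example from the text: 1,000 rows refreshed five times.
print(billed_transactions(rows=1_000, refreshes=5))  # 5000
```

The takeaway is that refresh frequency multiplies the bill, so scoring only new or changed rows (for example, in a dataflow with incremental refresh) is worth considering for large tables.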

You can integrate Power BI with Cognitive Services in a Power BI dataflow or within Power Query in Power BI Desktop. The latter option requires specifying a premium or embedded capacity if you want to go code-free and use the Text Analytics feature (Home ribbon in Power Query). Otherwise, you must write M code as the above tutorial shows.

One Cognitive Services feature that proved very useful is automatic language detection. In my case, I had comments in different languages. When each row is processed, Power BI sends a “transaction” to Cognitive Services. If you set the second parameter (language) of the API call to null, Cognitive Services will try to detect the language on its own!

Refreshing data and rescoring do not require a Power BI gateway because Cognitive Services is a cloud service.

Python

When budget is tight or you can’t get help from IT to provision Cognitive Services, Python might come to the rescue. The main advantage of this option is that it is free. But you need at least a few lines of Python code (or much more if English is not the only language you need to support), as this article demonstrates. You must install Python (TIP: install it from python.org, as Anaconda doesn’t work with Python scripts because there isn’t a way to start the Anaconda environment before the script runs), configure Power BI for Python scripting, and install the pandas, matplotlib, and nltk packages. A great feature of Power Query is that you can add a Python transformation that calls your Python script as one of the Power Query transformation steps.
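For illustration, here is a minimal sketch of what such a Python transformation step might look like. It is not necessarily what the linked article does. I'm assuming NLTK's VADER analyzer, a text column named "Comment" (a hypothetical name), and the conventional ±0.05 cutoffs on the VADER compound score for labeling. Power BI hands the current table to the script as a pandas DataFrame named `dataset`.

```python
def polarity_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def make_vader_scorer():
    """Return a function text -> compound score, backed by NLTK's VADER."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(str(text))["compound"]


def add_sentiment(df, text_column, scorer):
    """Return a copy of df with Sentiment and SentimentLabel columns added."""
    out = df.copy()
    out["Sentiment"] = out[text_column].fillna("").map(scorer)
    out["SentimentLabel"] = out["Sentiment"].map(polarity_label)
    return out


# Inside the Power Query "Run Python script" step, where `dataset` already
# exists ("Comment" is an assumed column name):
# dataset = add_sentiment(dataset, "Comment", make_vader_scorer())
```

Keeping the scorer as a separate function makes it easy to swap VADER for something else later without touching the rest of the step. Remember that VADER's lexicon is English-only.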

As far as I could tell, handling multiple languages is not an easy task with Python NLTK. You can easily detect the language, but there are no built-in sentiment dictionaries for any language other than English. In addition, when you publish your Power BI Desktop file with Python transformations, you need to set up a gateway. The enterprise gateway doesn’t support Python scripts, so you must install a personal gateway on the machine that was used to develop the Power BI Desktop file.
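To show what "detecting the language" can look like in a few lines, here is a naive stopword-overlap heuristic. This is only a sketch: the tiny word lists below are hand-picked for illustration (in practice NLTK's stopwords corpus could supply much fuller lists), and the function name is my own. It guesses the language so that you can, for instance, send only English rows to an English-only sentiment scorer.

```python
import re

# Tiny illustrative stopword lists (assumption: hand-picked, not exhaustive).
STOPWORDS = {
    "english": {"the", "and", "was", "is", "of", "to", "in", "a", "it"},
    "spanish": {"el", "la", "y", "de", "que", "en", "los", "fue", "muy"},
    "german": {"der", "die", "und", "das", "ist", "nicht", "war", "sehr", "ein"},
}


def guess_language(text):
    """Return the language whose stopwords overlap most with the text."""
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(guess_language("The flight was late and the crew was rude"))     # english
print(guess_language("El vuelo fue muy bueno y la comida excelente"))  # spanish
```

Detection is the easy half. The scoring step still needs a sentiment lexicon or model for each non-English language, which is where the extra work comes in.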

As a last caveat, note that the Power BI Desktop “Enhanced Dataset Metadata” feature (currently in preview) doesn’t support R and Python scripts yet. So, if the Power Query preview pane works but you get an error when importing your data in Power BI Desktop, you’ve probably turned this feature on. To resolve this, turn it off and then create a new Power BI Desktop file.