-
Organizational Semantic Model
September 6, 2020 / No Comments »
I am delivering a data governance assessment for an enterprise client. As a part of the effort to migrate reporting from MicroStrategy to Power BI, the client wants to improve data analytics. The gap analysis interviews with the business leaders revealed common pitfalls: no single version of truth, data is hard to come by, business users don't know what data sources exist, business users spend more time in data wrangling than analytics, data quality is bad, IT is overwhelmed with report requests, report proliferation and duplication, and so on... Sounds familiar? As I mentioned many times in my blog, an enterprise data warehouse (EDW) plays a critical role in overcoming the above challenges, but it's not enough. A semantic model is needed and I extolled its virtues in my "Why Semantic Layer?" newsletter. In the Microsoft BI world, Analysis Services Tabular is commonly used to implement such models that are...
-
Section Hiked A.T. in Georgia
September 5, 2020 / No Comments »
My wife and I started section hiking the Appalachian Trail during weekends to escape the summer heat and the virus. The A.T. runs for 2,200 miles from Georgia to Maine, with 78.6 miles in Georgia. Today we finished the Georgia part and entered North Carolina. We actually covered twice the distance (averaging 10-12 miles per section) because we had to come back each time to where we parked. We started hiking with the great Atlanta Outdoor Club back in February 2020. But when the virus hit, group hikes were put on hold, so we were left to our own devices. Hiking somehow grew on me, perhaps because it is a metaphor for life. There are ups and downs. Some sections are hard and require a great deal of effort and perspiration, while others are easy. There are exhilarating views but there are also areas with overgrown vegetation. Perceived risks, such as...
-
Azure Synapse: The Good, The Bad, and The Ugly
August 23, 2020 / No Comments »
Cloud deployments are the norm nowadays for new software projects, including BI. And Azure Synapse shows great potential for modern cloud-based data analytics. Here are some high-level pros and cons to keep in mind for implementing Azure Synapse-centered solutions that I harvested from my real-life projects and workshops.
The Good
There is plenty to like in Azure Synapse, which is the evolution of Azure SQL DW. If you're tasked to implement a cloud-based data warehouse, you have a choice among three Azure SQL Server-based PaaS offerings: Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse. In a nutshell, Azure SQL Database and Azure SQL MI are optimized for OLTP workloads. For example, they have full logging enabled and replicate each transaction across replicas. Full logging is usually a no-no for decent-size DW workloads because of the massive ETL changes involved. In addition, to achieve good performance,...
-
Uploading Files to ADLS Gen2 with Python and Service Principal Authentication
August 13, 2020 / No Comments »
I had an integration challenge recently. I set up Azure Data Lake Storage for a client and one of their customers wants to use Python to automate the file upload from macOS (yep, it must be Mac). They found the command-line azcopy not to be automatable enough. So, I whipped up the following Python code. I configured service principal authentication to restrict access to a specific blob container instead of using Shared Access Policies which require PowerShell configuration with Gen 2. The comments below should be sufficient to understand the code.
### install dependencies
# install Azure CLI https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest
# pip install azure-identity
# pip install azure-storage-blob
# upgrade or install pywin32 to build 282 to avoid error "DLL load failed: %1 is not a valid Win32 application" while importing azure.identity
# pip install pywin32 --upgrade
# IMPORTANT! set the four environment (bash) variables as per https://docs.microsoft.com/en-us/azure/developer/python/configure-local-development-environment?tabs=cmd
# Note that AZURE_SUBSCRIPTION_ID is enclosed with double quotes while...
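The excerpt above cuts off before the upload code itself, so here is a minimal sketch of what such an upload could look like, not the original script. It assumes the azure-identity and azure-storage-blob packages are installed and that the service principal is supplied through the AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET environment variables. The helper names and the account, container, and file names are hypothetical.

```python
import os

# Variables read by azure-identity for environment-based service principal login.
REQUIRED_ENV_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_env_vars(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def account_url(storage_account):
    """Build the blob endpoint URL for a storage account name."""
    return f"https://{storage_account}.blob.core.windows.net"


def upload_file(storage_account, container, local_path, blob_name=None):
    """Upload one local file to a container, overwriting any existing blob."""
    # Imported lazily so the helpers above work without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    missing = missing_env_vars()
    if missing:
        raise RuntimeError(f"Set these environment variables first: {missing}")

    service = BlobServiceClient(
        account_url(storage_account), credential=DefaultAzureCredential()
    )
    blob = service.get_blob_client(
        container=container, blob=blob_name or os.path.basename(local_path)
    )
    with open(local_path, "rb") as data:
        blob.upload_blob(data, overwrite=True)
```

With the environment variables set, a call such as `upload_file("mystorageaccount", "uploads", "report.csv")` would upload the file under its own name. The service principal still needs a data role on the target container for the upload to be authorized.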
-
Presenting Analytics in a Day Workshop on August 20th
August 11, 2020 / No Comments »
In partnership with Microsoft, I'm delivering a complimentary, one-day, Analytics in a Day virtual workshop on August 20th, 9 AM – 5 PM Eastern Time. Targeting BI developers, architects and technology decision makers interested in achieving a single version of truth with organizational BI, this workshop is designed to guide and accelerate your journey towards a modern data warehouse to power your business with Azure Synapse, Azure Data Factory, Azure Data Lake, and Power BI. The first half of the day from 9 AM – 1 PM will help you better understand how to create an analytics solution that goes from data ingestion to insights using Azure Synapse Analytics and Power BI, empower self-service analytics, and enable a truly data-driven culture in your business. Part of the workshop will be dedicated to hands-on training to help you get started on your cloud analytics journey. The second half of the day from...
-
Discipline at the Core, Flexibility at the Edge
August 6, 2020 / No Comments »
I'm preparing to teach the brand new Analytics in a Day course by Microsoft. This course emphasizes the business value and technical fundamentals for implementing a modern cloud DW using Azure Synapse, ADF, Data Lake, and Power BI. The second half of the class is focused on Power BI and its role in creating organizational semantic models and self-service models from Synapse. I liked the best practices that Microsoft shares based on how they've adopted BI over the years and the challenges they faced with self-service BI, including: inconsistent data definitions, hierarchies, metrics, and KPIs; analysts spending 75% of their time collecting and compiling data; 78% of reports being created in "offline environments"; over 350 centralized finance tools and systems; and approximately $30M annual spend on "shadow applications". Indeed, many vendors tout only self-service BI which can quickly lead to chaos. By contrast, I have found that most successful data-driven organizations have both organizational...
-
Atlanta MS BI and Power BI Group Meeting on August 3rd
July 30, 2020 / No Comments »
Our group celebrates its 10th anniversary! Please join us online for the next Atlanta MS BI and Power BI Group meeting on Monday, August 3rd, at 6:30 PM. Chris Hamill from the Power BI CAT team will share techniques on creating performant reports without sacrificing design. For more details, visit our group page and don't forget to RSVP (fill in the RSVP survey if you're planning to attend).
Presentation: Power BI Report Design Techniques for Performance
Date: August 3rd, 2020
Time: 6:30 – 8:30 PM ET
Place: Microsoft Teams meeting
Overview: As you have likely observed, a performant model with optimized DAX can suffer greatly if the front-end design is too heavy. Front-end report developers are often challenged with balancing performance and the richness of user requirements. Chris Hamill from the Power BI CAT team will share techniques on creating performant reports without sacrificing...
-
Why You Need a Trusted Advisor
July 24, 2020 / No Comments »
I've been providing advisory services to a Fortune 500 organization for a few months now. Like many large organizations, they adopted Power BI Premium. However, they have provisioned only one Power BI Premium P1 node, which has been showing signs of overutilization. In the process, I discovered they have purchased 40 Power BI Premium cores with 32 cores left unutilized! In other words, they used 1/5 of what they've been paying Microsoft as Power BI Premium fees. How did they arrive at this unfortunate situation? A year or so ago, they used the Power BI Premium Calculator to estimate the licensing cost on their own. They plugged in 10,000 users and got a recommendation for 5 P1 nodes (or 40 cores). And that's what they bought, assuming that they would get a cluster of five P1 nodes that would load balance the reports across nodes. When they set up Power BI...
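The arithmetic behind those numbers is simple enough to check in a few lines. This is only an illustrative sketch: the function name is made up, and the 8 cores per P1 node follows from the 5 nodes = 40 cores recommendation quoted above.

```python
from fractions import Fraction

# 5 P1 nodes were quoted as 40 cores, so each P1 node accounts for 8 cores.
CORES_PER_P1_NODE = 8


def core_utilization(purchased_cores, provisioned_nodes,
                     cores_per_node=CORES_PER_P1_NODE):
    """Return (used cores, unused cores, fraction of purchased cores in use)."""
    used = provisioned_nodes * cores_per_node
    unused = purchased_cores - used
    return used, unused, Fraction(used, purchased_cores)


used, unused, share = core_utilization(purchased_cores=40, provisioned_nodes=1)
print(f"{used} cores in use, {unused} idle, {share} of the purchase utilized")
```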
-
Conquering Time Zones
July 16, 2020 / No Comments »
A client has Power BI models connected to Dynamics Online. Dynamics stores all dates in UTC instead of keeping the time offset, such as 7/14/2020 1:21:29 AM +00:00. Naturally, the users want to see dates localized to the US Eastern Time zone. Easy, right? Use the Power Query ToLocal time transformation (in the Transform ribbon, expand Time, and then click To Local) to offset with the desired number of hours. But there are a few issues with this approach: It will work fine in Power BI Desktop (assuming you are in the Eastern Time zone because Power BI will pick up your computer settings), but it won't work when you publish the dataset to powerbi.com. The reason for this is that Microsoft sets the time zone of Azure servers to UTC irrespective of the geo location of the Power BI data center. So, when Power BI Service refreshes the dataset, it...
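To illustrate the underlying conversion outside of Power Query, here is a small Python sketch (not the approach taken in the post) that converts a UTC timestamp to US Eastern Time by naming the time zone explicitly instead of relying on the machine's local settings. It assumes Python 3.9+ with time zone data available on the system; the function name is hypothetical. Because Eastern Time is UTC-5 in winter and UTC-4 during daylight saving time, a fixed hour offset would be wrong for part of the year.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

EASTERN = ZoneInfo("America/New_York")


def to_eastern(utc_value):
    """Convert a UTC datetime to US Eastern Time, honoring daylight saving."""
    if utc_value.tzinfo is None:
        # Treat naive values as UTC, which is how the source system stores them.
        utc_value = utc_value.replace(tzinfo=timezone.utc)
    return utc_value.astimezone(EASTERN)


# The sample value from the text: 7/14/2020 1:21:29 AM +00:00
summer = to_eastern(datetime(2020, 7, 14, 1, 21, 29, tzinfo=timezone.utc))
winter = to_eastern(datetime(2020, 1, 14, 1, 21, 29, tzinfo=timezone.utc))
print(summer.isoformat())  # 2020-07-13T21:21:29-04:00
print(winter.isoformat())  # 2020-01-13T20:21:29-05:00
```

The result does not depend on the time zone of the server running the code, which is the property that matters when the refresh happens on a machine set to UTC.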
-
Designing Responsive Power BI Reports (Part 2)
July 6, 2020 / No Comments »
In my previous post about responsive reports, I said that changes to the report design can dramatically reduce the report load time. But you will undoubtedly reach a point where you cannot optimize the report any further. After all, having a page with a single visual might be fast, but I doubt it would deliver much business value. What else can be done to speed up the reports and ideally achieve a page load time of a few seconds? Our next stop for that project is to revisit the solution architecture. The client initially opted for a hybrid architecture where the reports use a live connection to on-prem SSAS Tabular models. However, a significant amount of time was spent in Power BI showing "loading data" before rendering the report. What took so long? In this case, Power BI was hosted in the Azure West region, the Tabular server in the company data center...