Prologika Newsletter Winter 2015

Power BI and You


I'm excited to announce the availability of my latest (7th) book – Applied Microsoft Power BI! Currently, this is the only book on Power BI. The book page has more information about the book, including the front matter (with a foreword by Jen Underwood), the source code, and a sample chapter (Chapter 1, "Introduction to Power BI"). You can order the paper copy on Amazon, and soon on other popular retail channels. I predict that 2016 will be the year of Power BI, and I hope that this book will help you get the most out of it and bring your data to life! And if you're looking for instructor-led training on Power BI and Microsoft BI, please check our training classes.

 


 

Let's face it. Without supporting data, you are just another person with an opinion. But data is useless if you can't derive knowledge from it. And this is where Power BI can help you. While writing the book and helping customers use Power BI, I've been astonished by its breadth of features and the development momentum Microsoft has put behind it. The Power BI cloud service gets major features every week, while Power BI Desktop is updated every month! Although this makes it hard for people like me who are writing books, it's a great value proposition for you.

Not to mention that Power BI has the best business model: most of it is free! Power BI Desktop and Power BI Mobile are free. The Power BI service is free, with a paid Power BI Pro subscription option, following a freemium model. Cost was the biggest hindrance to Power BI adoption, and that has now been turned around completely. You can't beat free! In this newsletter, I'll revisit how Power BI can benefit different users in your organization.


Power BI for Business Users

To clarify the term, a business user is someone in your organization who is mostly interested in consuming BI artifacts, such as reports and dashboards. Business users can utilize Power BI to connect to popular cloud services, such as Salesforce, Marketo, Google Analytics, Dynamics CRM, and many more. With a few clicks, a business user can use content packs to connect to cloud data and gain insights from predefined reports and dashboards, and create custom reports. Other cloud-hosted providers build profitable businesses around SaaS cloud BI but Power BI does it for free!

With a few clicks, business users can analyze data from files and cubes without having to create data models. And they can also view Power BI dashboards and reports on mobile devices so they are always informed while they are on the go. Again, all of this for free!

Power BI for Data Analysts

A data analyst or BI analyst is a power user who has the skills and desire to create self-service data models. Leveraging Microsoft's prior investment in Power Pivot, Power View, and Power Query, Power BI lets business analysts import data from virtually everywhere and create sophisticated self-service models whose features are on a par with professional models and cubes. And now that there is native support for many-to-many relationships, there shouldn't be a requirement you can't meet with Power BI.

As a data analyst, you have a choice about your toolset because you can create models in either Excel or Power BI Desktop. While other vendors charge hefty licensing fees for desktop modeling tools, Power BI Desktop is free and it gets updates every month! Think of Power BI Desktop as the unification of Power Pivot, Power Query, and Power View. Previously available as Excel add-ins, these tools now blend into a single flow. No more guessing which add-in to use and where to find it! Because many data analysts use R for data analysis and statistics, Power BI recently added support for R scripts and for visualizing data using the R plotting capabilities.

Power BI for Pros

BI pros and IT pros have much to gain from Power BI. An IT pro can establish a trustworthy environment that promotes sharing of BI artifacts. To do so, IT can set up Power BI workspaces that allow authorized users to see the same BI content. If IT needs to distribute BI artifacts to a wider audience, such as the entire organization, they can create an organizational content pack and publish it to the Power BI Content Gallery. Then coworkers can search, discover, and use the content pack. And IT can set up an organizational gateway to centralize and grant access to on-premises data.

The scenario that BI pros will probably be most excited about is hybrid BI solutions, where the report definitions (not the data) are hosted in Power BI, but corporate data remains in relational databases and cubes. This is a scenario that Prologika is planning for a cloud-averse Fortune 10 company in order to empower their users with mobile reports and dashboards. But that's not all! BI pros can also implement predictive and real-time solutions that integrate with Power BI, and the book has the details.

Power BI for Developers

Power BI has plenty to offer to developers as well because it's built on an open and extensible architecture that embraces popular protocols and standards, such as REST, JSON, and OAuth. For years, Microsoft didn't have a good solution for embedding interactive reports in custom apps. Power BI enables this scenario by allowing developers to embed dashboard tiles and interactive reports. Soon it will also support custom authentication.

Microsoft has also published the required "custom visuals" interfaces to allow developers to implement and publish custom visuals using any of the JavaScript-based visualization frameworks, such as D3.js, WebGL, Canvas, or SVG. Do you need visualizations that Power BI doesn't support to display data more effectively? With some coding wizardry, you can implement your own, such as the Sparkline visual that I published to the Power BI visuals gallery!

In summary, no matter what data visualization or data analytics requirements you have, Power BI should be at the forefront, and you ought to evaluate its breadth of features. Remember that Power BI is part of a holistic vision that Microsoft has for delivering cloud and on-premises data analytics. When planning your on-premises BI solutions, consider the Microsoft public reporting roadmap. Keep in mind that you can use both Power BI (cloud-based data analytics) and the SQL Server box product on-premises to implement synergetic solutions that bring your data to life!

As you'd probably agree, the BI landscape is fast-moving and it might be overwhelming. If you need any help with planning and implementing your next-generation BI solution, don't hesitate to contact me. You can trust Prologika, a Microsoft Gold Partner and premier BI firm, to help you plan and implement your data analytics projects, and rest assured that you'll get the best service.

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

EVENTS & RESOURCES

Prologika: "Applied Microsoft Power BI" book by Teo Lachev
SQL Saturday BI: “What’s New for BI in SQL Server 2016” presentation by Teo Lachev and “Introduction to R” presentation by Neal Waterstreet on 1/9/2016
Atlanta BI Group: Power BI presentation by Patrick LeBlanc on 1/25/2016

 

Prologika Newsletter Fall 2015

Is ETL (E)ating (T)hou (L)ive?

Before we get to the subject of this newsletter, I'm happy to announce the availability of my latest class – Applied Power BI Service. As you've probably heard by now, Power BI Service (or Power BI 2.0) is Microsoft's latest cloud-based analytics service with a simple promise: 5 seconds to sign up, 5 minutes to wow! If you're ready to disrupt how your organization is analyzing data, please contact me to schedule this class at your convenience and get immediate value.


 

What Not To Do

Back to the newsletter subject, let's start with a story that was inspired by true events, as they say in the movies. Not long ago, a man wanted a nice and modern house for his family. He hired a couple of well-known builders, but they didn't deliver what the man really wanted. Then the man hired a third builder, who built him a great house (or a close approximation of the grand vision). Everyone was happy and they lived happily ever after…or at least until the man sold the house to another man.

The second owner had more pressing needs and another vision for the house. Not only did the house have to accommodate his family, but now it also had to entertain hordes of guests, so it had to be expanded. To cut down costs, the second man decided to maintain the house on his own or outsource whatever he couldn't do to a cheap builder. The new owner hastily put up new rooms and did other renovations as necessary. Expansion and new construction were his highest priorities, and there was never time for proper maintenance or for reinforcing the house infrastructure so that it could accommodate the new demands. Needless to say, not much time passed before the infrastructure gave up. For example, it took days for clogged pipes to drain, and guests were not happy. Did I mention the man had promised his guests the sun and the moon?

What does this have to do with Extraction, Transformation, and Loading (ETL)? Data is rapidly growing nowadays, while ETL processing windows are shrinking. You must do more with less. And ETL usually becomes a performance bottleneck that stands in the way of your current and future BI initiatives.

What To Do

How did the story end? The story didn't end and it will never end. If you have a house, you can't just focus on renovations and additions. You must also maintain it, and you must budget for that. One day a member of the man's family did something out of the ordinary and the entire infrastructure collapsed. There wasn't a way to find out why, and the family was scurrying around trying to apply quick fixes. Finally, the second man hastily hired the original builder to assess the situation. Among other things that the builder did to resolve the crisis, he recommended changes and proactive maintenance along the following ten tenets:

  1. Parallelism – The chances are that you have an ETL framework that orchestrates package execution, logs errors, etc. And the chances are that the framework executes packages sequentially. With all the bandwidth modern servers have, there is no excuse if your framework doesn't support parallel execution. That's because many ETL tasks, such as ODS loads and loading dimensions and independent fact tables, can benefit greatly from parallel execution. For example, at Prologika we use an ETL framework that supports a configurable degree of parallelism. Once you configure which packages can run in parallel, the framework distributes the packages across parallel flows.
  2. Incremental extraction – If you have small data volumes, you might get away with fully loading the source data, but most systems would require incremental extraction (see the T-SQL sketch after this list). Again, this is something the ETL framework is best suited to handle.
  3. Volume stats – ETL must log important data volume metrics, such as the number of rows extracted, inserted, updated, and deleted. It should also log how many days were processed since the last incremental extraction and additional context that might be useful for troubleshooting purposes, such as what parameters were passed to stored procedures.
  4. Targeted execution – I recommend you add a target execution duration for each package. Then, ETL will log the actual duration so that you can detect performance deviations from the norm.
  5. Daily monitoring – I suggest you implement and publish a dashboard, such as one built with Excel Power Pivot, and monitor it daily. For example, the dashboard should include a Package Execution KPI that flags excessive execution durations in red based on the performance metrics you established in step 4.
  6. Regression analysis – Once things "normalize", create a one-time Extended Events session (assuming SQL Server) to capture the query plans for all significant queries. If during daily monitoring you discover a performance deviation, run the session again focusing on the slow package and compare the query plan with the baseline. Analyze both query plans to find if and why they have changed. To make this easier, when SQL Server 2016 ships, consider upgrading to take advantage of the new Query Store feature.
  7. Cold data archiving – If you have lots of source data, e.g. billions of rows, consider archiving historical data that no one cares about, such as by uploading it to Azure Table storage.
  8. Project deployment – Consider upgrading to SSIS 2012 or above to benefit from its project deployment so that you can get task-level performance analysis in addition to easier development.
  9. Avoid locking – Use "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED" at the beginning of your stored procedures or freeform SQL to avoid read locks (also shown in the T-SQL sketch after this list). I prefer using this statement instead of the NOLOCK hint for its brevity and so that I don't miss a table.
  10. ELT pattern – I saved the best for last. I'm a big fan of the ELT pattern, and I usually try to get out of the SSIS designer as fast as I can. Instead of transformations in the ETL data flow, consider the ELT pattern for its performance and maintenance benefits. For more information about the ELT pattern, read my blog "3 Techniques to Save BI Implementation Effort".
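
To make a few of these tenets more concrete, here is a minimal T-SQL sketch that combines incremental extraction (tenet 2), volume-stats logging (tenet 3), and the READ UNCOMMITTED setting (tenet 9). The schema, table, column, and procedure names are hypothetical and only illustrate the pattern; adapt them to your own ETL framework and metadata tables.

-- Hypothetical incremental load procedure
CREATE PROCEDURE etl.LoadFactSales
AS
BEGIN
    -- Tenet 9: avoid read locks without sprinkling NOLOCK hints on every table
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    -- Tenet 2: pick up only the rows added since the last successful run
    DECLARE @LastLoadedDate datetime2 =
        (SELECT LastLoadedDate FROM etl.LoadWatermark WHERE TableName = 'FactSales');
    DECLARE @RowsInserted int;

    INSERT INTO stage.FactSales (OrderID, OrderDate, Amount)
    SELECT OrderID, OrderDate, Amount
    FROM src.Sales
    WHERE OrderDate > @LastLoadedDate;

    SET @RowsInserted = @@ROWCOUNT;

    -- Tenet 3: log the data volume and the context needed for troubleshooting
    INSERT INTO etl.PackageLog (PackageName, RowsInserted, WatermarkUsed, RunDate)
    VALUES ('LoadFactSales', @RowsInserted, @LastLoadedDate, SYSDATETIME());

    -- Advance the watermark for the next incremental run
    UPDATE etl.LoadWatermark
    SET LastLoadedDate = SYSDATETIME()
    WHERE TableName = 'FactSales';
END;

In a real framework, the watermark and log tables would be managed by the framework itself, and the load would typically also handle updates and deletes.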

As you'd probably agree, the BI landscape is fast-moving and it might be overwhelming. You can trust Prologika, a Microsoft Gold Partner and premier BI firm, to help you plan and implement your data analytics projects.

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

EVENTS & RESOURCES

Prologika: Applied Power BI Service training by Prologika (online or instructor-led)
Atlanta BI Group: Enhancing Data Analysis and Predictive Analytics with NoSQL by Cornell A. Emile on September 28th
Atlanta BI Group: ETL Architecture Reusable Design Patterns and Best Practices by Stephen Davis on October 26th

Prologika Newsletter Summer 2015

What’s New for BI in Office 2016?


While you might still be trying to convince management to upgrade to Office 2013, the next version of Microsoft Office (Office 2016) is on the horizon and scheduled to be released sometime in the fall. Meanwhile, you can download and test the Public Preview. Naturally, you might wonder what's new for BI, given that Excel is Microsoft's premium desktop BI tool. In this newsletter, I'll share my favorite Excel 2016 BI features (most significant listed first).


 

Power View Connectivity to Multidimensional Cubes

I don't know why it took more than two years for Microsoft to add this feature (initially released for Power View in SharePoint), but you will finally be able to use Power View in Excel 2016 to connect to OLAP cubes, just like we can use Power View in SharePoint as a front end to cubes. What this means to you is that you can preserve your OLAP investment by allowing business users to use Power View to build dashboards and perform ad hoc data analytics.


Native Power Query Integration

If you follow the Power Query story, you know that it has an increasingly important place in the Microsoft BI strategy. In fact, all data roads in Power BI 2.0 go through Power Query because of its integration with Power BI Designer. Besides not having to go through extra steps to download and install Power Query, the native integration allows you to programmatically manipulate Power Query objects using VBA, C#, or PowerShell code and to record macros, as explained in more detail in this document.

Continuing down the integration path, you can now easily enable all "Power" features from File > Options > Advanced > Turn on data analysis features. In the past you had to enable each add-in separately using File > Options > Add-Ins > Manage "COM Add-ins" > Go.


Power Pivot Improvements

As you probably know, your business users can use Power Pivot to build sophisticated self-service BI models that are on a par with organizational models. Moving forward to Office 2016, Power Pivot models will support many-to-many relationships. Unfortunately, this feature didn't make it into the Office 2016 public preview, but when released it should work in the same way as in the Power BI Designer. Continuing further down the list, you can now rename tables, columns, and calculated fields/measures in the Power Pivot add-in, and the changes will automatically propagate to Excel and existing reports. In other words, metadata changes are no longer breaking changes to existing reports.

And it gets easier to define relationships. When you are building a Data Model PivotTable with two or more tables that have no relationships defined, you get a notification to run automatic relationship detection. This will detect and create all relationships between the tables used for the Data Model PivotTable so that you don't have to do any of the work yourself. DAX also adds many new functions, which are documented here.

Usability Improvements

Another long-anticipated feature made the cut. The Excel Field List added a Search feature to allow the end user to filter names of tables, columns, and measures. This works across all data structures: Power Pivot, Tabular, and Multidimensional.


Similar to native pivot tables, pivot reports connected to Power Pivot models now support Time grouping. For example, this feature allows the end user to group a report at a day level to any time interval, e.g. year and then quarter.


Regular Excel slicers now have a multi-select option, so the user doesn't need to remember to hold down the Ctrl key to select multiple items.


Forecasting Functions

Recognizing the need for time series forecasting, Excel adds forecasting functions, which are documented here. For example, the Forecast.ETS function predicts a future value based on existing (historical) values by using the AAA version of the Exponential Smoothing (ETS) algorithm. In the example below, Forecast.ETS calculates the forecasted value for the date in cell O16, using the historical values in the range P5:P15 across the dates in the time series in the range O5:O15.

=FORECAST.ETS(O16, P5:P15, O5:O15)
The function also supports arguments for confidence interval and seasonality.

As you'd agree, the BI landscape is fast-moving and it might be overwhelming. You can trust Prologika, a Microsoft Gold Partner and premier BI firm, to help you plan and implement your data analytics projects.

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

EVENTS & RESOURCES

Atlanta BI Group: Microsoft and Revolution Analytics: What’s the Add-Value? by Dr. Mark Tabladillo on June 29th
Atlanta BI Group: Sentiment Analysis with Big Data & Machine Learning by Paco Gonzalez on August 31st

Prologika Newsletter Spring 2015


A while back I met with a client that was considering overhauling their BI. They asked me whether traditional data warehousing still makes sense or whether they should consider a logical data warehouse, Big Data, or some other "modern variant". This newsletter discusses where data warehousing is going and explains how different data architectures complement, instead of compete with, each other.

 


QUO VADIS DATA WAREHOUSE?

The following diagram illustrates the "classic" BI architecture consisting of source data, data staging, data warehouse, data model, and presentation layers.

[Diagram: the classic BI architecture]

Yes, not only is the data warehouse not obsolete, but it plays a central role in this architectural stack. Almost always, data needs to be imported, cleansed, and transformed before it can be analyzed. The ETL effort typically takes 60-80% of the BI implementation effort. And currently there isn't a better way to store the transformed data than to have a central data repository, typically referred to as a "data warehouse".

That's because a data warehouse database is specifically designed for analytics.

Ideally, most or even all of the dimensional data should be centrally managed by business users (data stewards) outside the data warehouse. However, the unfortunate truth is that not many companies invest in master data management. Instead, they kick the data quality can down the road and pay a much higher price for data quality later, but that is a topic for another newsletter. A well-thought-out architecture should also include a data model (semantic) layer, whose purpose and advantages I've outlined in my "Why Semantic Layer?" newsletter.

BIG DATA

Irrespective of the huge vendor propaganda surrounding Big Data, based on my experience most of you still tackle much smaller datasets (usually in the range of millions of rows, or billions at worst). This is not Big Data, since a single (SMP) database server can accommodate such datasets. This is good news for you because Big Data typically equates to big headaches. For those of you who truly have Big Data, its implementation should complement, instead of replace, your data warehouse. Even though the advancements in the popular Big Data technologies are removing or mitigating some of the Big Data concerns, such as slow queries, these technologies are still "write once, read many", meaning that they are not designed for ETL and data changes.

Moreover, a core tenet of data warehousing is providing a user-friendly schema that supports ad hoc reporting and self-service BI requirements. By contrast, Big Data is typically implemented as a "data lake" where data is simply parked without any transformation. For more information about Big Data and how it can fit into your BI architecture, read my Big Data newsletter.

LOGICAL DATA WAREHOUSE

Recently, logical data warehouses (LDW) have gained some traction and backing from vendors, including Composite (acquired by Cisco), Denodo, and others. Logical data warehousing is also known as data federation and data virtualization. The idea is simple – consolidate and share the data in a controlled manner with all users and applications across the enterprise. Data is made available as virtual views on top of existing data sources, with additional features, such as discovery and caching. Again, the goal here is not to replace the traditional data warehouse, but to make its data, plus the data from other systems, readily available for self-service BI and/or custom applications.

Logical data warehousing is at a very early stage of adoption. In my opinion, the companies that will benefit most from it are large organizations with many data repositories, where data availability is a major barrier to enabling self-service BI. If you believe that your organization might benefit from a logical data warehouse, you might not need to make a major investment. If your organization has an Office 365 Power BI subscription, your first step could be leveraging the Power Query capabilities for dataset sharing and discovery. The process can work like this:

  1. Designated users promote virtual views in the form of Power Query queries to Office 365.
  2. A data steward verifies and approves these datasets.
  3. Once signed in to Office 365, other users can search and preview these queries, and import the associated data into self-service BI models.

The following snapshot shows how a business user can search and preview a published query that returns Product List data.

[Screenshot: searching and previewing a published query]

One caveat is that these shared queries are currently Power Query-specific, and they can only be consumed by Microsoft self-service BI tools, which currently include Excel and Power BI Designer. I recommended to Microsoft that shared queries be exposed as OData feeds to enable additional integration scenarios.

SELF-SERVICE BI

Self-service BI allows power users to create their own data models by using specialized tools, such as Power Pivot, Tableau, QlikView, and so on. Despite what some "pure" self-service BI vendors proclaim, self-service BI is not a replacement for data warehousing. In fact, a data warehouse is often a prerequisite and a major enabler for self-service BI. That's because a data warehouse provides consumers with clean and trusted data that can be further enriched with external data.

As you'd agree, the BI landscape is fast-moving and it might be overwhelming. You can trust Prologika, a Microsoft Gold Partner and premier BI firm, to help you plan and implement your data analytics projects.

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

EVENTS & RESOURCES

PASS Data Analytics Conference, April 20-22, Santa Clara, California
Atlanta BI Group: Overview of R by Neal Waterstreet on March 30th
Friendlier Data Profiling with the SSIS Data Profiler Task by Julie Smith on April 27th
SQL Saturday Atlanta on May 16th

Prologika Newsletter Winter 2014


After an appetizer of embedded Power View reports, Microsoft proceeded to the main course that is a true Christmas gift – a public preview of what's coming in its Power BI offering. For lack of a better term, I'll call it Power BI.NEXT. In this newsletter, you'll see why Power BI.NEXT is much more than just an incremental release of the Office 365 Power BI, as you might have thought after reading the Microsoft announcement. As a participant in the private preview, I'm really excited about the new direction and the capabilities it opens.

 


WHY POWER BI.NEXT?

What do the United States of America and Microsoft BI have in common? They both decided to become independent for the welfare of their community. USA by choice. Microsoft BI by market conditions.

After more than a decade working with BI technologies and a variety of customers, I do believe that Microsoft BI is the best BI platform on the market. However, you and I both know that it's not perfect. One ongoing challenge is that various product groups have had a stake in Microsoft BI since Microsoft got serious about BI (circa 2004). Microsoft management promoted SharePoint as a platform for sharing BI artifacts. A major effort went into extending SharePoint with SSRS in SharePoint integrated mode, PerformancePoint, Power Pivot, etc. But the hefty price tag of SharePoint and its complexity hampered the adoption of BI on premises. Power BI for Office 365 alleviated some of these issues, but many customers find its kitchen-sink approach too overwhelming and cost-prohibitive if all they want is the ability to deploy BI artifacts to the cloud.

Similarly, seeking more market share and motivation for customers to upgrade, Excel added Power Pivot and Power View and was promoted as Microsoft's premium BI tool on the desktop. But adding new BI features to Microsoft Office turned out to be a difficult proposition. A case in point – it's been two years since DAXMD was released, and Power View in Excel 2013 still doesn't support Multidimensional. In addition, because self-service BI was added to Excel later on, business users find it difficult to navigate the cornucopia of features. For example, there are at least three ways to import data in Excel (the native Excel data import capabilities, Power Pivot, and Power Query).

Meanwhile, new self-service BI players entered the marketplace, which was previously dominated by mega players. Although lacking technologically in their offerings, they gained in agility, simplicity, vendor neutrality, and aggressive pre-sales support. It appears that these market conditions caused a major shift in the Microsoft BI cloud vision that culminated in Power BI.NEXT.

WHAT’S DIFFERENT?

In my opinion, the main difference between Power BI.NEXT and Power BI for Office 365 is that Power BI.NEXT removes the Excel, SharePoint, and Office 365 adoption barriers. On the desktop, it introduces the Power BI Designer – a new BI design tool that unifies Power Query, Power Pivot, Power View and Power Map in a single environment that runs completely outside Excel (Excel BI features are still available for Excel users). Power BI Designer targets business users willing to create self-service data models and dashboards. Powered by Power Query, the designer simplifies data acquisition from a plethora of data sources.


Similar to Power Pivot, Power BI Designer imports data (it seems that the preview version always imports data) into the xVelocity in-memory store, backed up by a file in a zip format for durable storage (*.pbix file extension). For reporting, it uses integrated Power View reports that now render by default in HTML5 (if you inspect the page source of a deployed dashboard, you'll see that it uses jquery, angular, and the rest of the mobile dev stack) and natural language Q&A (first introduced in Power BI for Office 365).

Another welcome addition is the new visualizations that have been demonstrated in many public events, including funnel charts, tree maps, gauges, and others. Although the designer doesn't support this feature yet, the deployed dashboards demonstrate navigation to other report views for more detailed analysis. Unfortunately, action-based drillthrough and conditional formatting are not supported at this point.

Yet another great feature is the ability to connect dashboards deployed to the cloud to on-premise organizational Tabular models (support for Multidimensional is expected at a later time). This is great news for Office 365 customers who aren’t willing to move their data to the cloud yet but are OK with having reports in Power BI. This is accomplished by installing a new bridge connector called Analysis Services Connector Preview that runs independently from the Power BI Data Management Gateway which you might have used to refresh Power Pivot models deployed to Power BI for Office 365.

Another indication of the management shift that reflects market realities is the initial focus on native mobile applications for iPad (already available in the App Store) and Android.

Last but not least, always catering to developers, Microsoft has big plans. The Power BI Developer page describes the initial REST APIs that are currently in preview. I’m particularly interested in these APIs that will allow us to “integrate rich analytics into your application by embedding Power BI dashboards, tiles and reports” so we don’t have to rely on third-party controls.

WHAT’S NOT TO LIKE?

Although I’m very excited about Power BI.NEXT, there are some important features lacking in the preview. Currently, it’s a cloud-only offering and you can’t deploy it on premise. I hope we don’t have to wait for another management shift for this to happen given that the majority of my customers are not ready for the cloud yet.

Continuing down the list, the preview customization capabilities of Power View are still limited. To meet dashboard requirements more effectively, we need at least basic customization, such as conditional formatting and action-based drillthrough (as supported in SSRS).

As I mentioned previously, the ability to embed reports on web pages, such as to provide customer-facing reporting, is a must too. I hope the new Power BI will be more agile and add these features soon. Meanwhile, give your feedback and vote for the features you need!

As you'd agree, the BI landscape is fast-moving and it might be overwhelming. You can trust Prologika, a Microsoft Gold Partner and premier BI firm, to help you plan and implement your data analytics projects.

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

EVENTS & RESOURCES

“Microsoft BI 2014 Review” presentation by Teo Lachev
Applied Excel and Analysis Services e-learning course by Teo Lachev
Atlanta BI Group: “Success in Business Intelligence requires Emotional Intelligence” by Javier Guillen on Jan 26th

Prologika Newsletter Fall 2014



According to Gartner, one main data analytics trend this year and beyond will be predictive analytics. "Increasing competition, cost and regulatory pressures will motivate business leaders to adopt more prescriptive analytics, making business decisions smarter and more repeatable and reducing personnel costs". Prologika has been helping our customers implement predictive analytics for revenue forecasting and outlier detection, and training data scientists in predictive analytics. This newsletter introduces you to Azure Machine Learning.


WHAT IS AZURE MACHINE LEARNING?

Microsoft unveiled the public preview of Azure Machine Learning (previously known as project Passau) in July 2014. Your humble correspondent has been participating in the private Preview Program to learn it firsthand and provide feedback. Microsoft Azure Machine Learning is a web-based service for cloud-based predictive analytics. You upload your data to the cloud, and then define datasets and workflows to create "experiments". As you would recall, you can create organizational data mining models using the SQL Server Analysis Services and Excel data mining capabilities. Microsoft Azure Machine Learning is to organizational data mining models what Power Pivot is to Analysis Services, in the sense that it democratizes predictive analytics. It targets business users willing to create predictive models. The following figure shows a simple experiment to predict a customer's likelihood to purchase a product.

[Figure: Azure Machine Learning experiment]

HOW DOES IT WORK?

The process of creating an experiment is simple:

  1. The user subscribes to the Azure Machine Learning service. For the pricing details of the public preview service, see this page (be aware that with all cloud-based offerings, everything is subject to change). Currently, the user is charged 38 cents per hour to use the service.
  2. The user defines the input data by uploading a file. Supported file formats include CSV, tab-delimited, plain text, Attribute-Relation File Format (ARFF), Zip, and R Object or Workspace. In addition, Azure Machine Learning can connect to Azure cloud data sources, such as HDInsight or Azure SQL Database.
  3. The user creates an experiment, which is a fancy term for a workflow that defines the data transformations. For example, the user might want to filter the input dataset or use just a sample of the input data. In my case, I used the Split data transformation to divide the input dataset into two parts: one that is used for training the model, and a second one that is used for predictions.
  4. The user selects a Machine Learning algorithm(s) that will be used for predictions. In this case, I used the Two-Class Boosted Decision Tree algorithm. This is where users will need some guidance and training about which mining algorithm to use for the task at hand.
  5. The user trains the model at which point the model learns about the data and discovers patterns.
  6. The user uses a new dataset or enters data manually to predict (score) with the model. For example, if the model was trained to predict a customer's probability of purchasing a product from past sales history, a list of new customers can be used as input to find which customers are the most likely buyers. The Score Model task allows you to visualize the output and to publish it as a web service that is automatically provisioned, load-balanced, and auto-scaled, so that you can implement business solutions that integrate with the web service.

WHY SHOULD YOU CARE?

Implementing predictive solutions has traditionally been a difficult undertaking. Azure Machine Learning offers the following benefits to your BI initiatives:

  1. It simplifies predictive analytics by targeting business users.
  2. As a cloud-based offering, it doesn’t require any upfront investment in software and hardware.
  3. It promotes collaboration because an experiment can be shared among multiple people.
  4. Unlike SQL Server data mining, it supports workflows, similar to some high-end predictive products, such as SAS.
  5. It supports R. Hundreds of existing R modules can be directly used.
  6. It supports additional mining models that are not available in SQL Server Analysis Services and that came from Microsoft Research.
  7. It allows implementing automated business solutions that integrate with the Azure Machine Learning service.

 

As you’d agree, the BI landscape is fast-moving and it might be overwhelming. As a Microsoft Gold Partner and premier BI firm, you can trust us to help you plan and implement your data analytics projects.

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Business Intelligence

EVENTS & RESOURCES

Atlanta BI Group: DAX 101 on September 29th
Atlanta BI Group: Predictive Analytics From Theory to Practice on October 27th

Prologika Newsletter Summer 2014


During the Day 1 keynote at the 2014 BA Conference, Microsoft committed to BI on your terms. They said that all new features will eventually be available both on premises and in the cloud. We've also learned that hybrid scenarios will be possible in the near future, such as hosting Power View and SSRS reports in the cloud connected to on-premises data sources. Based on my conversations with customers, many people are confused about which path (on-premises or cloud) to take. I hope this newsletter helps clarify.

 

 


BI ON YOUR TERMS

The following table should help you understand how the Microsoft on-prem and cloud BI offerings compare at a high level. Choosing between on-prem and cloud BI is essentially a decision about where SharePoint would be installed.

[Table: on-premises vs. cloud BI options]

Self-service and Team BI

Microsoft markets Power BI as "self-service BI with the familiarity of Office and the power of the cloud". It's important to clarify, however, that the self-service BI tools (Power Pivot, Power View, Power Query, and Power Map) that are shown in green in the diagram below are implemented as add-ins to Excel and thus are available at no additional cost to Excel users.

[Diagram: Microsoft self-service BI tools]

In other words, you don’t need an Office 365 or Power BI license to use these tools. You can purchase Office from a retailer, such as Amazon, and have all of the Excel “power” components:
Power Pivot: Integrated with Excel 2013 (needs to be downloaded for Excel 2010)
Power Query: Needs to be downloaded for both Excel 2010 and 2013
Power Map: Needs to be downloaded for Excel 2013 (doesn’t support Excel 2010).
Power View: Integrated with Excel 2013 (not available for Excel 2010, but available in SharePoint Server integrated with SQL Server 2012)

The orange box on the right is the SharePoint cloud infrastructure that enables team BI for exploring and sharing BI artifacts. You can think of it as an upgrade to Office 365 that is available on a subscription basis. For example, while Office 365 (Midsize Business, E3, E4) allows Excel workbooks with pivot and Power View reports to be rendered online, it currently limits the workbook size to 30 MB (a more detailed Office 365 feature breakdown is available online). By contrast, Power BI supports models up to 250 MB in size, and on-prem SharePoint supports files up to 2 GB. Note that the cloud data size limits are likely to change upwards. So, the model size is one factor to consider when planning your team BI environment.

Office Click-to-Run

Most corporate Office installations are performed by downloading and running the MSI installer. The MSI setup is a perpetual one (you pay for a version once and you’re entitled to fixes for it). If you have an Office 365 subscription, you can install the Click-To-Run version of Office. This is a subscription-based Office 365 setup. You continuously pay for using the software and you’re entitled to fixes and the latest features within the O365 plan you’re subscribed to. I wrote more about the Office C2R setup in this blog.

Organizational BI

At this point, both Office 365 and Power BI are all about self-service and team BI delivered with Excel workbooks that embed Power Pivot models. The typical scenario is that a power user would create a BI model and reports on the desktop and share these artifacts by deploying them to the cloud. The rest of the BI tools that are typically used for organizational BI (SSIS, SSRS, SSAS, PerformancePoint Services) are not yet available in the cloud.

As I mentioned in the beginning, at the BA conference Microsoft said that they plan to host SSRS and Power View in the cloud in the near future to enable hybrid scenarios. Meanwhile, the only way to build MS cloud-based organizational solutions would be to deploy them to Azure virtual machines, with the exception of the SQL Server Database Engine and Reporting Services, which are available as Platform as a Service (PaaS) Azure offerings.

Additional Features

As an upgrade to Office 365, Power BI has additional features that are currently only available in Power BI, such as Q&A, data stewardship, and so on. I discussed these features in more detail in my previous newsletter. Microsoft plans to release new BI features first to Power BI and then to on-prem SQL Server and SharePoint. An example of such a feature is Power View Time Series Forecasting, which was just released to Power BI.

 

As you'd agree, the BI landscape is fast-moving and it might be overwhelming. You can trust Prologika, a Microsoft Gold Partner and premier BI firm, to help you plan and implement your BI projects.

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Business Intelligence

EVENTS & RESOURCES

Atlanta BI Group: Microsoft BI for Financial Reporting by Justin Stephens on June 30th
Atlanta BI Group: Integrating Data Mining by Mark Tabladillo on July 28th
Atlanta BI Group: Making Your SSIS Fly Beyond the Sky by Carlos Rodriguez on August 25th

Prologika Newsletter Spring 2014


I hope you are enjoying the beginning of the spring. The focus of this newsletter is self-service BI with Microsoft Power BI. But before I start, I strongly encourage those of you in the Southeast USA to register and attend the SQL Saturday Atlanta event which will be held this year on May 3rd. We had a record attendance last year with some 570 people attending SQL Saturday Atlanta! You can also sign up for the preconference sessions on Friday, May 2nd, one of which will be delivered by your humble correspondent.


POWER BI

Nowadays, every organization has some sort of self-service BI tooling, ranging from Excel spreadsheets and Access databases to sophisticated self-service data models implemented by power users. Microsoft self-service BI is centered around Excel with Power Pivot on the desktop and SharePoint on the server. SharePoint does offer a wealth of features that you might find appealing, as I've recently discussed in my "Is SharePoint Overkill for BI" post. However, Microsoft self-service BI has traditionally been criticized for its deployment complexity and steep price tag.

To mitigate these concerns, a few years ago Microsoft introduced Office 365 (E3 or E4 plan required) and SharePoint Online (Plan 2 required) to allow you to deploy self-service BI artifacts created with Power Pivot and Power View to the cloud. Recently, Microsoft unveiled Power BI as the next step in its BI cloud strategy. Power BI is essentially a cloud combo of all the self-service "power" tools (Power Pivot, Power View, Power Query, and Power Map), available on a subscription basis and running on Microsoft Azure.

What’s to like?

Here are some reasons why you should consider Power BI:

Avoid SharePoint on-premises installation and licensing
Microsoft maintains and scales SharePoint, not your IT department. Instead of paying for on-premises SharePoint Server and SQL Server licenses, you pay a per-user subscription price using one of three pricing options.

Get new features
Some self-service BI features are currently only available in Power BI. The most important ones are natural queries (aka Q&A), Power View mobile support, sharing Power Query queries, and great Power Map geospatial visualizations.

Always up to date
As I’ve explained recently in my “The Office Click-To-Run Setup” post, Office 365 and Power BI subscribers can install the Office Click-To-Run setup and receive new features continuously.


Cautions

Besides the usual concerns about security and data ownership associated with a cloud deployment, here are some additional considerations to keep in mind:

Self-service BI only
Unlike Azure virtual machines, Power BI can’t be configured as an extension to your Active Directory domain. Therefore, you can’t deploy Power View reports connected to organizational Multidimensional or Tabular models. However, Microsoft has implemented a data management gateway that you can install on premises to schedule your cloud Power Pivot data models for automatic refresh from your on-premise data sources.

Compatibility issues
If you decide to install Office Click-To-Run, your Office installation will be updated periodically while your on-premise SharePoint won’t. Consequently, you might not be able to deploy Power Pivot models that have the latest features to your on-premise SharePoint instance. In other words, if you use the Office Click-to-Run setup you should commit to deploying to Power BI to share data models and reports with other users.

Size limitations
Currently, Power BI limits your models to 250 MB in size. However, because of Power Pivot's efficient data compression, you should be able to pack millions of rows into your model.

I believe that cloud BI has come of age, and the Power BI ease of deployment and rich assortment of features should warrant your attention. To start a trial or learn more about the Power BI capabilities and visualizations, visit Power BI.

 

As you'd agree, the BI landscape is fast-moving and it might be overwhelming. You can trust Prologika, a Microsoft Gold Partner and premier BI firm, to help you plan and implement your BI projects.

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Business Intelligence

EVENTS & RESOURCES

Atlanta BI Group: Custom ETL Frameworks with SSIS 2012 by Aneel Ismaily on 3/31
Atlanta BI Group: Power BI by Brian Jackson on 4/28
SQL Saturday Atlanta: A full day of SQL and BI on 5/3

Prologika Newsletter Winter 2013


As I'm enjoying my vacation in tropical Cancun, Mexico, I hope you've been enjoying the holidays and planning your BI initiatives for 2014. Speaking of planning, you may have heard that Microsoft is working hard on the next release of SQL Server, version 2014, which is expected in the first half of next year. Naturally, you may wonder what's new in the forthcoming release in the BI arena. If so, keep on reading.

 


Just when you thought that you could finally catch your breath after the SQL Server 2012 and Office 2013 waves, SQL Server 2014 is looming on the horizon. Relax, version 2014 will be a database-focused release with no major changes to any of the BI services: SSAS, SSRS, SSIS, Power Pivot, MDS, and DQS. However, there are a couple of important enhancements that you might find more or less relevant to your BI projects. If they sound interesting, I encourage you to download and install the second Community Technology Preview (CTP2) of SQL Server 2014 so you can try out and plan for the new features. CTP2 is a feature-complete build, so no new features are expected in the release bits.

Memory-optimized Tables

Previously code-named Hekaton, this new in-memory storage option allows you to host SQL Server tables in memory in order to improve I/O. Interestingly, Microsoft is the only vendor that ships the in-memory technology in the box without charging more for it. Moreover, this new enhancement doesn't require new skills, as it integrates seamlessly with SQL Server. It really is an enhancement. While predominantly targeting OLTP applications, I believe memory-optimized tables could be useful for narrow-case BI scenarios as well, such as improving the performance of ETL processes by using in-memory staging tables.
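
To give you a rough idea of what's involved, here is a minimal sketch of declaring a staging table as memory-optimized in SQL Server 2014. The database, schema, table, and column names are hypothetical; a real project would also need to pick an appropriate bucket count and durability setting.

-- The database first needs a filegroup that can hold memory-optimized data
ALTER DATABASE DW ADD FILEGROUP DW_InMemory CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE DW ADD FILE (NAME = 'DW_InMemory', FILENAME = 'C:\Data\DW_InMemory')
    TO FILEGROUP DW_InMemory;

-- A hypothetical staging table hosted in memory; SCHEMA_ONLY skips logging the data,
-- which is usually acceptable for a staging table that is reloaded on every ETL run
CREATE TABLE stage.SalesStaging
(
    OrderID int NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    OrderDate datetime2 NOT NULL,
    Amount money NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

Keep in mind that a SCHEMA_ONLY table loses its contents on restart, so it only makes sense for data you can reload from the source.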

Consider an ETL process that requires complex transformations with heavy I/O as part of the nightly data warehouse load. With a few minor schema changes, you could configure the staging table in memory and gain significant performance improvements just by doing so. How much improvement? Results will vary depending on the ETL specifics, of course. According to Microsoft, many applications see very significant performance gains, on average 10X, with results up to 30X. Much like optimizing hardware configurations, the results you get depend on how much effort you put into the project. You can do a very simplistic, low-cost project, defining hot tables as memory-optimized and making no other changes, or you can invest more effort and rewrite stored procedures to be natively compiled for much better results. Here are additional resources to get you started with this great technology:

SQL Server 2014 In-Memory Technology Blog Series Introduction
Getting Started with SQL Server 2014 In-Memory OLTP
In-Memory OLTP: Q & A Myths and Realities
Architectural Overview of SQL Server 2014’s In-Memory OLTP Technology

Clustered Columnstore Indexes

Introduced in SQL Server 2012, columnstore indexes were intended to speed up typical BI-type queries that aggregate many rows but request only a few columns. As we reported in this Microsoft case study “Records Management Firm Saves $1 Million, Gains Faster Data Access with Microsoft BI”, with columnstore indexes resource-intensive ETL queries achieved 10X performance gains. Specifically, their execution time dropped from about 20 minutes to less than 2 minutes.

On the downside, columnstore indexes were not updatable in SQL Server 2012, requiring ETL processes to drop and recreate the index after the load completed. That wasn't a huge issue for this specific project because creating the index on a fact table with 1.5 billion rows took about 20 minutes. Even better, in SQL Server 2014 Microsoft introduces clustered columnstore indexes, which are updatable, so you don't have to drop the index. To gain the most from columnstore indexes, make sure that the queries that use them execute in batch mode and not in row-by-row mode.
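
For reference, here is a minimal sketch of creating an updatable clustered columnstore index in SQL Server 2014; the fact table and column names are hypothetical.

-- Convert a hypothetical fact table to clustered columnstore storage;
-- in SQL Server 2014 the index is updatable, so ETL can load new rows without dropping it
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;

-- A typical aggregation query that benefits from the columnstore; check the actual
-- execution plan to confirm that the scan operator runs in batch mode
SELECT ProductKey, SUM(SalesAmount) AS SalesAmount
FROM dbo.FactSales
GROUP BY ProductKey;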

For more information about columnstore index internals and improvements in 2014, read the excellent “Enhancements to SQL Server Column Stores” whitepaper by Microsoft.

 

As you know, the BI landscape is fast-moving and it might be overwhelming. If you have Software Assurance benefits, don’t forget that as a Microsoft Gold Partner and premier BI firm, Prologika can use your SA vouchers and work with you to help you plan and implement your BI initiatives, including:

  • Analyze your current environment and determine a vision for a BI solution
  • Define a plan to launch a BI reporting and analysis solution
  • Upgrade or migrate to SQL Server 2012

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Business Intelligence

EVENTS & RESOURCES

Atlanta BI Group: Bus Matrix – the foundation of your Dimensional Data Model by Bill Anton
Atlanta BI Group: Data Warehousing by Lance England
SQL Saturday Atlanta: May 3rd, 2014

Prologika Newsletter Fall 2013


I hope you had a great summer. The chances are that your organization has a centralized data repository, such as an ODS or a data warehouse, but you might not be using it to the fullest. Do you want a single version of the truth? Of course you do. Do you want to empower your business users to create their own reports and offload reporting effort from IT? I bet this is one of your top requirements. A semantic layer could help you achieve these objectives and much more… In this newsletter, I'll discuss the importance of having a semantic layer that bridges users and data.

 

 


WHY SEMANTIC LAYER?

A semantic layer sits between the data and presentation layers. In the world of Microsoft BI, its role is fulfilled by the Business Intelligence Semantic Model (BISM), which encompasses the Multidimensional and Tabular implementation paths. The first chapter (you can download it from the book page) of my latest book, "Applied Microsoft SQL Server 2012 Analysis Services (Tabular Modeling)", explains this in more detail.


Here are 10 reasons why you should have a semantic layer:

  1. Abstraction
    In general, semantics relates to discovering the meaning of the message behind the words. In the context of data and BI, semantics represents the user’s perspective of data: how the end user views the data to derive knowledge from it. Before we expose data to end users, we need to translate the machine-friendly database structures and terminology into a user-friendly semantic layer that describes the business problems to be solved. To address this need, you create a semantic layer.
  2. Reducing reporting effort
    Based on my experience, the typical IT department is caught in a never-ending struggle to create operational reports. A semantic layer is beneficial because it empowers business users to create their own ad hoc reports. For example, one of the nice features of Analysis Services is that the entity relationships become embedded in the model. So, end users don’t have to know how to relate the Product to Sales entities. They just select which fields they want on the report (Excel, Power View) and the model knows how to relate and aggregate data.
  3. Eliminating “spreadmarts”
    It seems that all the end users want is Excel data dumps so they can create pivot reports to slice and dice data. In many cases, self-service BI is actually a polished and extended version of the “spreadmart” trend. If this bothers you (and it should), the only way to minimize and eliminate this issue is to have a semantic layer that allows business users to analyze the data from any subject area without having to export the data. Moreover, a semantic layer is a true self-service BI enabler because it allows the users to focus on data analysis instead of data preparation and model creation – skills that many users lack and might struggle with.
  4. Performance
    The semantic layer should provide excellent performance when aggregating massive amounts of data. For example, in a real-life project we were able to deliver, within milliseconds, operational reports that require aggregating a billion rows. Try to do that with relational reporting, especially when you need more involved calculations, such as YTD, QTD, parallel period, etc.
  5. Reducing cost
    Do you have biggish data and struggle with report performance? Related to the previous point, having a semantic layer might save you the millions of dollars you would otherwise spend on MPP systems, such as Teradata or Netezza, to overcome performance limitations (to a point).
  6. Centralizing business logic
    The unfortunate reality that we’re facing quite often is that many important business metrics end up being defined and redefined either in complex SQL code or reports. This presents maintenance, implementation, and testing challenges. Instead, you can encapsulate metrics and KPIs where they belong – in your semantic model so they can be reused across reports and dashboards.
  7. Rich client support
    Many reporting tools are designed to integrate and support popular semantic models. For example, Microsoft provides Excel on the desktop and the SharePoint-based Power View tool that allows business users to create their own reports. Do you like a specific third-party reporting tool? The chances are that it supports Analysis Services.
  8. Security
    How much time do you spend implementing custom security frameworks for authorizing users to access data they are allowed to see on reports? Moving to Analysis Services, you’ll find that the model can apply security on connect. I wrote more about this in my article “Protect UDM with Dimension Data Security”.
  9. Isolation
    Because a semantic layer sits on top of the relational database, it could provide a natural separation between reports and data. For example, assuming distributed deployment, a long-running ETL job in the database won’t impact the performance of the reports serviced by Analysis Services.
  10. Additional BI capabilities
    A semantic layer can further enrich your BI solution with appealing features, such as actions, KPIs, and predictive analytics.

HOW DO YOU CHOOSE A SEMANTIC LAYER?

Different vendors have different ideologies and implementations of semantic layers. Some, such as Oracle and MicroStrategy, implement the semantic layer as a thin, pass-through layer whose main goal is to centralize access to data sources, metadata, and business calculations in a single place. For example, a Fortune 100 company that had a major investment in Oracle engaged us to recommend a solution for complementing their OBIEE semantic layer with a user-friendly reporting tool that would allow end users to create ad hoc transactional reports from their Oracle Exadata-based enterprise data warehouse. In the OBIEE world, the role of the semantic layer is fulfilled by the Common Enterprise Information Model, which encompasses data sources, metrics, calculations, definitions, and hierarchies. When users submit report queries, the Oracle BI Server compiles the incoming requests and auto-generates native queries against the data sources plugged into the Common Enterprise Information Model. In theory, this "thin" implementation should work pretty well. In reality, several issues surface:

Performance
Any tool, whether a reporting tool or a semantic layer, that auto-generates queries should be highly suspect when it comes to performance. First, no matter how well the tool is designed, it's unlikely that it would generate efficient queries in all cases. Second, because the tool sends the query to the underlying data source, the overall report performance is determined by how fast the data source crunches data. In this case, the organization was facing performance issues with data aggregation. To circumvent them, their BI developers had implemented summarized fact tables. Once the user reached the lowest level of the summary fact table, OBIEE would allow the user to drill down to a chained table, such as the actual fact table. Needless to say, business users were complaining that analysis was limited to the dimensionality of the summarized fact table. Further, to avoid severe performance issues with historical reports, the BI developers had taken the "no big aggregations, no problem" approach by forcing a time filter on each request that touches the large fact tables. This effectively precluded historical analysis by year, quarter, etc.

Scalability
When a query involves multiple data sources, OBIEE sends a query to each data source, loads the data in memory, and then joins the datasets together to find matching data. This also presents performance and scalability challenges. To solve performance and scalability issues with "thin" semantic layers and biggish data, organizations ultimately resort to expensive back-end MPP systems, such as Teradata or Netezza, which can parallelize data crunching and compensate for inefficient auto-generated queries. But this is a costly solution!

Common denominator
You're limited to a subset of the supported features of the underlying data source. Such an architecture ultimately locks you into the data sources supported by the vendor. What happens when there is a new version of the data source that has some cool new features? Well, you need to wait until the vendor officially supports it.

The big difference between Microsoft and other vendors in terms of semantic layer implementation is that by default both Multidimensional and Tabular store data on the server. This proprietary storage is designed for fast data crunching, even with billions of rows. You pay a price for loading and refreshing the data, but you enjoy great query performance with historical and trend reports. And you get a great ROI because you may not need an expensive MPP appliance or excessive data maintenance. Queries sent to BISM are served by BISM, and not by the RDBMS. There is no need for summarized tables and time filters.

As you know, the BI landscape is fast-moving and it might be overwhelming. If you have Software Assurance benefits, don’t forget that as a Microsoft Gold Partner and premier BI firm, Prologika can use your SA vouchers and work with you to help you plan and implement your BI initiatives, including:

  • Analyze your current environment and determine a vision for a BI solution
  • Define a plan to launch a BI reporting and analysis solution
  • Upgrade or migrate to SQL Server 2012

Regards,

Teo Lachev

Teo Lachev
President and Owner
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Business Intelligence

EVENTS & RESOURCES

Atlanta BI Group: Dimensional Modeling 101 presentation by Julie Smith on September 30th
Atlanta BI Group: Head-to-Head on Maps by Dave Tangren on October 28th
SQL Saturday Charlotte: Best Practices for Implementing Enterprise BI Solution presentation by Teo Lachev on October 19th