Beware Fast Columnar Databases

A large company uses the SAP HANA ERP system. Users require real-time access to transactional data. To avoid performance degradation, SLT replication (trigger-based change data capture) replicates data to another SAP HANA system that is used solely for reporting. The problem is that the more detailed a report gets and the more columns it has, the slower it runs, and SAP HANA throws out-of-memory exceptions.

SAP HANA is an in-memory columnar database like Tabular. So, it stores data in columns, not rows. Columnar databases are primarily designed for analytical reports, which typically have a few columns (sales by customer, product, date) but can potentially aggregate large datasets. As the reporting grain lowers and more columns are added (order number, order line item, customer name, phone number, etc.), a columnar database has to cross-join more and more columns. This is not efficient and performance quickly degrades no matter how fast the storage is. SSAS Tabular and Power BI are no different. SAP HANA complicates the issue further by preventing direct access to tables and requiring “analytical” views that join tables and potentially nest other views.
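To make the point concrete, here is a minimal sketch of a toy column store. The `ORDERS` data and both function names are made up for illustration, and real engines add compression and vectorized scans, so treat it as a mental model rather than how SAP HANA or Tabular is implemented. An aggregate touches two column arrays, while a detail report has to read every requested column and stitch the values back into rows.

```python
# Toy column store: one list per column. This layout is an assumption made
# for illustration; real engines add compression and vectorized scans.
ORDERS = {
    "order_number": [1001, 1002, 1003],
    "customer": ["Acme", "Contoso", "Acme"],
    "product": ["Bike", "Helmet", "Bike"],
    "phone": ["555-0100", "555-0101", "555-0100"],
    "amount": [500.0, 40.0, 520.0],
}


def sales_by(store, group_col, value_col):
    """Analytical query: scans only two column arrays, however wide the table is."""
    totals = {}
    for key, value in zip(store[group_col], store[value_col]):
        totals[key] = totals.get(key, 0.0) + value
    return totals


def detail_rows(store, wanted):
    """Detail report: every requested column is read and stitched back into rows."""
    column_arrays = [store[name] for name in wanted]
    return [dict(zip(wanted, values)) for values in zip(*column_arrays)]
```

Calling `sales_by(ORDERS, "customer", "amount")` reads two arrays regardless of how many columns the table has, while `detail_rows(ORDERS, list(ORDERS))` reads all five, and the work grows with every column added to the report.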

So, what’s the solution? Use the right tool for the job. A relational database is designed for transactional reports and it’s very efficient for joining tables together on indexed columns. In this case, performance and user satisfaction would probably be much better if SAP HANA replicated to an SQL Server database instead of to a columnar database. Use columnar databases for analytical reports. Every technology has limits and “super-fast” in-memory columnar databases are no exception. Resist the vendor propaganda.

Prologika Newsletter Summer 2018

Microsoft Common Data Services

BI and data integration projects often benefit from an operational data store (ODS), whose benefits and design I discussed in my “Designing an Operational Data Store (ODS)” newsletter. A corporate ODS typically falls into the organizational BI area, which means that it’s implemented and sanctioned by IT. Wouldn’t it be nice to let Business stage the data needed for business applications and analytics? Of course, it would! Think of the Microsoft Common Data Services as a cloud staging database or ODS by Business and for Business. But before I discuss the details, make sure to review our Terms of Use, which have been updated as part of our commitment to transparency and to address the requirements set forth by the new European privacy law (General Data Protection Regulation). By continuing to use the Prologika website and its online services, you consent that you have read, understand and accept the terms of the Prologika Privacy Policy. If you have any questions regarding our updated Privacy Policy, please contact us by writing to info@prologika.com.

What’s Common Data Services?

Microsoft introduced Common Data Services as a part of the reimagined Business Application Platform as a “one connected platform that empowers everyone to innovate” and to put all the data you need into a standardized data model. Common Data Services consists of two offerings: Common Data Service for Apps (CDS for Apps) and Common Data Service for Analytics (CDS for Analytics).

Why two flavors? Think of the Microsoft Common Data Service for Apps (CDS for Apps) as a cloud OLTP-like repository by Business and for Business. Officially introduced in 2016 and running on Azure SQL Database, CDS for Apps is now the entity and data model behind Dynamics 365. This is where Dynamics 365 stores its data. Because it’s transaction-oriented, it’s layered on top of SQL Server. By contrast, Common Data Service for Analytics (CDS for Analytics) is oriented towards supporting analytical requirements.

How Do They Compare?

The following table compares the two CDS types.

Feature | CDS for Apps | CDS for Analytics
Primary usage | OLTP | OLAP
Primary tool for loading data | PowerApps/Power Query | Power Query
Primary tool for reading data | PowerApps/Power BI | Power BI
Data storage | Azure SQL Database | Azure Blob Storage (a CSV text file per entity and a JSON file for the schema)
Power BI connectivity mechanism | OData | Azure Blob Storage
Pricing | Included in PowerApps plans | Included in Power BI Pro/Premium
Storage quota | 10 GB per database | Restricted by associated app workspace quota
Add-ons | Logic and validation | Power BI Insights apps

Both CDS types support standardized entities, whose definitions are documented in the GitHub repository of the Common Data Model. Currently, the schema of these entities is designed and controlled by Microsoft and it’s limited to Dynamics entities, such as Account, Opportunity, and so on. However, Microsoft hopes that other vendors will provide solutions and extend the CDS schema. Of course, because CDS is your database, you can extend it with your own custom entities. Note that both CDS types target business users willing to store and analyze data in a business-friendly staging database. Over time Microsoft hopes that partners will deliver more value to CDS by implementing apps (CDS apps are like the prepackaged apps that already exist in Power BI, such as for Salesforce and Dynamics). Let’s now highlight some of the differences between the two CDS flavors.

Common Data Service for Apps

The main usage scenario for CDS for Apps is to jumpstart the development of PowerApps applications with a standardized data model that you can extend to your own needs.

The Good

There is a lot to like about CDS for Apps. Let’s start with pricing. Other vendors, such as Oracle and Teradata, have similar visions and products but their offerings are very expensive. The CDS for Apps pricing is included in the PowerApps licensing model because PowerApps is the primary client for creating CDS for Apps-centered solutions. Using CDS outside selected Dynamics 365 plans (that include it already) will cost you at least $7 per user and per month. CDS for Apps is more than just a data repository. It’s a business application platform with a collection of data, business rules, processes, plugins and more. In this regard, it resembles SQL Server Master Data Services (MDS). The modeler can:

  • Define and change entities, fields, relationships, and constraints. For example, the screenshot shows a custom Device Order entity that I’ve created.
  • Define business rules, such as to prepopulate Ship Date based on Order Date.
  • Secure data to ensure that users can see it only if you grant them access. Role-based security allows you to control access to entities for different users within your organization.

Besides the original PowerApps canvas apps (like InfoPath forms), CDS for Apps also opens the possibility to create model-driven PowerApps applications (these require the PowerApps P2 plan). Model-driven apps are somewhat like Access data forms but more versatile. Because PowerApps knows CDS for Apps, you can create the app bottom-up, i.e. start with CDS for Apps and then generate the app based on the actual schema and data. For example, you can use PowerApps to build a model-driven app that implements an approval workflow. Model-driven apps are a new style of PowerApps application that makes it easy to build entity forms, entity views, and workflows. How do you get data into CDS for Apps custom entities? Your PowerApps app can write to them. Or, you can create and schedule a project that uses Power Query (yep, the same one as in Power BI) to load data from somewhere into CDS for Apps.

The Bad

How do you get data out of CDS for Apps, such as to import data from some entities into a Power BI model? Microsoft has released a preview build of the Common Data Service for Apps connector for Power BI. However, this connector is even slower than the Dynamics connector. It uses the OData v4 Web API. Based on my limited tests, it took the connector about a minute to download 40,000 rows from Dynamics, clocking 10% slower than the Dynamics connector. To make things worse, the connector doesn’t support query folding, so Power BI must download the entire dataset before Power Query applies filters. The connector also doesn’t support REST filter and select predicates, so you can’t filter data or select a subset of columns at the source. Microsoft is actively working on improving the connector performance and it might get better in time.
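For reference, this is roughly what filtering and column selection at the source look like in OData v4, which is what I’d expect query folding to produce. It’s a minimal sketch: `$select` and `$filter` are standard OData v4 system query options, but the base URL, the `accounts` entity set, and the field names are hypothetical, and `build_odata_url` is a helper I made up for illustration.

```python
from urllib.parse import quote, urlencode

# Hypothetical service root; substitute your own organization URL.
BASE_URL = "https://contoso.example.com/api/data/v9.0"


def build_odata_url(entity_set, select=None, filter_expr=None):
    """Build an OData v4 URL that asks the server to trim columns and rows."""
    params = {}
    if select:
        params["$select"] = ",".join(select)  # only these columns come back
    if filter_expr:
        params["$filter"] = filter_expr  # only matching rows come back
    # Keep "$" and "," readable; spaces in the filter become %20.
    query = urlencode(params, quote_via=quote, safe="$,")
    return f"{BASE_URL}/{entity_set}" + (f"?{query}" if query else "")
```

A call such as `build_odata_url("accounts", ["name", "revenue"], "revenue gt 100000")` yields a URL that lets the server do the trimming, instead of shipping every row and column to Power Query first.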

Continuing down the list of limitations, CDS for Apps doesn’t support change tracking (to capture changes to a given row) or incremental loads, such as to load or refresh only the data that has changed since yesterday or the previous month. These are all essential features that could make an ODS even more valuable.
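To illustrate what an incremental load means here, below is a minimal sketch of the common watermark pattern. This is the general technique and not something CDS for Apps offers; the `modified_on` column name, the sample rows, and the `changed_since` helper are all assumptions for the example.

```python
from datetime import datetime


def changed_since(rows, watermark):
    """Return rows modified after the watermark, plus the watermark to store for the next run."""
    changed = [row for row in rows if row["modified_on"] > watermark]
    next_watermark = max((row["modified_on"] for row in changed), default=watermark)
    return changed, next_watermark


# Hypothetical source rows; "modified_on" is an assumed column name.
SOURCE = [
    {"id": 1, "modified_on": datetime(2018, 6, 1)},
    {"id": 2, "modified_on": datetime(2018, 6, 20)},
    {"id": 3, "modified_on": datetime(2018, 6, 21)},
]
```

Each refresh passes in the watermark saved by the previous run, so only the rows that changed in between are transferred. Without a reliable modified timestamp or change tracking at the source, the only option left is a full reload.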

The Ugly

For years people have complained that after migrating from on-premises Dynamics to the cloud, they lost the ability to connect to its database directly and had to rely on the REST APIs (slow) or the Data Export Service to export the data to an SQL Server database (fast but requires additional effort and budget). Unfortunately, although CDS for Apps stores data in Azure SQL Database, Microsoft doesn’t expose its database directly to get data out fast and bypass the REST endpoint. When I raised this issue with Microsoft, I got feedback that CDS for Apps is a business platform and there are layers on top of data to handle security, rules, calculations, and so on. However, the argument that CDS for Apps is more than just a database is nonsensical to me. Try to explain to a customer that cakes have layers and CDS for Apps has layers, and therefore getting something out of it is slow. As I mentioned, the “layered nature” of CDS is conceptually like MDS. In fact, I see a lot of overlap. MDS also supports rules, security, etc. but it doesn’t force me to go through the web service interface if all I need is the raw data. Hence, my wish for direct connectivity to the Azure SQL Database endpoint of CDS for Apps.

Common Data Service for Analytics

CDS for Analytics is a standard feature of Power BI so every Power BI Pro user can access it. CDS for Analytics is exposed to the end user in Power BI as datapools. A datapool is a collection of entities associated with a Power BI app workspace. An entity maps to a text file in Azure Storage. Business users will rely on Power Query to populate (manually or via a scheduled refresh) entities in CDS for Analytics. You can access the workspace datapool in the workspace content page.
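Since an entity is just a CSV text file with a JSON file describing its schema, any tool can in principle read it. Here is a minimal sketch of the idea. The JSON layout below (`name`, `attributes`, `type`) is an assumption I made up for illustration and not the actual schema format, and the sample data is inlined as strings instead of being read from Azure Storage.

```python
import csv
import io
import json

# Hypothetical schema document; the real JSON layout may differ.
SCHEMA_JSON = """
{"name": "Customer",
 "attributes": [{"name": "CustomerId", "type": "int"},
                {"name": "Name", "type": "string"}]}
"""

# Hypothetical entity file: one headerless CSV row per record.
ENTITY_CSV = "1,Acme\n2,Contoso\n"

# Map assumed schema type names to Python converters.
CASTS = {"int": int, "float": float, "string": str}


def load_entity(schema_text, csv_text):
    """Pair each CSV value with its attribute name and cast it to the declared type."""
    schema = json.loads(schema_text)
    attributes = schema["attributes"]
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        rows.append(
            {attr["name"]: CASTS[attr["type"]](value)
             for attr, value in zip(attributes, record)}
        )
    return schema["name"], rows
```

The takeaway is that the schema file is what turns untyped text into typed entities, which is also what lets a third party produce the files outside Power Query.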

The Good

I can think of three primary scenarios where CDS for Analytics can deliver value as it stands today:

  • Offline data staging – Let’s say IT doesn’t allow direct connectivity to LOB applications but you need to create some reports on top of this data. You can stage the data as text files in CDS for Analytics. I don’t think CDS for Analytics would bring much value if you could connect directly to the data source in Power BI Desktop. The more you move the data, the more problems you may run into. At least for now, having apps on top of text files doesn’t look like a good reason to me but I guess we have to see what apps will become available in time.
  • Prepackaged third-party solutions – Some time ago, a software vendor asked me how they can deploy a solution to Power BI for their customers but still retain ownership. Back then I didn’t have a good answer but CDS for Analytics might be a good option now. In fact, besides Power Query as the primary tool for loading entities, any service that can write to Azure Storage can bring data to CDS for Analytics. The ISV can write the entities as CSV files and tell CDS for Analytics to “mount” the storage container. CDS for Analytics can now see these mounted entities and treat them as part of the whole. Worried about protecting intellectual property? Currently only the Insight App installer would have access to the installed workspace and artifacts (other users in the organization would just see the published reports which are shared with them).
  • Prepackaged insights – Like CDS for Apps, CDS for Analytics understands the Common Data Model. Over time, Microsoft and partners can contribute prepackaged “insights” that are built on top of popular LOB apps, such as Dynamics or Salesforce.

Pricing is also right. CDS for Analytics is included in Power BI although its storage counts towards the workspace quota. Another thing I like about CDS for Analytics is that the Power BI connector is very fast, unlike the CDS for Apps connector.

The Bad

As of now, datapools support only a small subset of the Power Query connectors. This is probably just a temporary limitation for the preview cycle. I’d imagine that all Power BI connectors for cloud and on-premises data sources will eventually be available. Continuing on the list of limitations, like CDS for Apps, CDS for Analytics doesn’t support incremental refreshes so be careful downloading millions of rows every night.

The Ugly

CDS for Analytics promises to break silos but a datapool is associated with a Power BI workspace. This architecture fragments CDS for Analytics into Power BI workspaces. However, most users would probably require access to common entities, such as Customer and Product. Not only is this not possible but the datapool storage is also limited by the workspace quota. So, if you are a Power BI Pro user who has access to an app workspace, you’re currently limited to a 10 GB storage quota, which includes not only Power BI datasets but also CDS entities. I wish that CDS had no association with workspaces and that it was designed as a global staging area, just like Azure Storage. Microsoft has promised at some point in the future to allow you to reference entities between datapools in different workspaces and create calculated entities on top of them.

The success of Common Data Service for Apps will depend largely on adoption and contributions by Microsoft partners. Although it lacks typical ODS features and fast connectivity, CDS for Apps gains in “business platform” features. CDS for Analytics and Power BI Insights are new additions to Power BI. CDS for Analytics delivers an Operational Data Store (ODS) to business users that is populated and maintained by business users. Microsoft and partners can augment CDS for Analytics with Power BI Insights apps.

Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

Atlanta MS BI and Power BI Group Meeting on June 25

MS BI fans, join us for the next Atlanta MS BI and Power BI Group meeting on Monday, June 25th at 6:30 PM. Your humble correspondent will share 10 ways Power BI can help augment your existing or envisioned Power BI strategy. DevScope will sponsor the meeting and demo their PowerBI Robots offering. For more details, visit our group page and don’t forget to RSVP (use the RSVP survey on the main page) if you’re planning to attend.

Presentation: 10 Ways to Empower Your BI Strategy with Power BI
Level: Intermediate
Date: June 25, 2018
Time: 6:30 – 8:30 PM ET
Place: South Terraces Building (Auditorium Room), 115 Perimeter Center Place, Atlanta, GA 30346

Overview: Not sure what value Power BI can bring to your BI infrastructure? Join me to discuss 10 ways Power BI can help augment your existing or envisioned Power BI strategy. If you’re interested in Power BI but you’re not sure how it fits within your organizational data strategy, this event is for you. Discussion points include:

  • Organizational BI
  • Self-service BI
  • Cloud vs. on-premises deployments
  • Predictive analytics
  • External reporting
  • Integrated solutions

Get your Power BI questions answered and see demos along the way.

Speaker: Teo Lachev is a consultant, author, and mentor, with a focus on Microsoft Business Intelligence. Through his Atlanta-based company “Prologika” (a Microsoft Gold Partner in Data Analytics) he designs and implements innovative BI solutions that bring tremendous value to his clients. Teo has authored and co-authored several SQL Server BI books, and he has been leading the Atlanta Microsoft BI and Power BI group since he founded it in 2010. Microsoft has recognized Teo’s expertise and contributions to the technical community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) status since 2004.
Sponsor: DevScope is a young, dynamic and experienced company, specialized in mentoring and development services in Web environments, and a pioneer in the region, integrating Microsoft technology, products and solutions. DevScope implements business and technology solutions with established and emerging technologies every day. DevScope projects are usually based on the latest technologies available from Microsoft, and many times, those same technologies are not yet available in the market.
Prototypes with Pizza: DevScope Power BI Robots by Rui Romano, DevScope

Common Data Service for Analytics: The Good, the Bad, the Ugly

UPDATE 11/15/2018: Common Data Service for Analytics is superseded by Power BI dataflows. Find the updated review here.

In a previous blog I discussed the Common Data Service for Apps (CDS for Apps). I explained that CDS for Apps is more suitable for OLTP-type applications, which is why its main client is PowerApps apps saving data in normalized tables. Since good things shouldn’t come alone, Microsoft is readying another CDS flavor, Common Data Service for Analytics (CDS for Analytics), which is oriented towards supporting analytical requirements. Microsoft provided a good introduction to CDS for Analytics in the “Common Data Service for Analytics (CDS-A) and Power BI – an Introduction” video and “Introduction to Common Data Service For Analytics” video. Without rehashing what has been already announced and said, I’d like to share a few notes from what I’ve learned so far.

The following table compares the two CDS types.

Feature | CDS for Apps | CDS for Analytics
Primary usage | OLTP | OLAP
Primary tool for loading data | PowerApps/Power Query | Power Query
Primary tool for reading data | PowerApps/Power BI | Power BI
Data storage | Azure SQL Database | Azure Blob Storage (a CSV text file per entity and a JSON file for the schema)
Power BI connectivity mechanism | OData | Azure Blob Storage
Pricing | Included in PowerApps plans | Included in Power BI Pro/Premium
Storage quota | 10 GB per database | Restricted by associated app workspace quota
Add-ons | Logic and validation | Power BI Insights apps

Note that both CDS types target business users willing to store and analyze data in a business-friendly staging database. Over time Microsoft hopes that partners will deliver more value to CDS by implementing apps (CDS apps are like the prepackaged apps that already exist in Power BI, such as for Salesforce and Dynamics).

In fact, apps are so important for the success of CDS that Microsoft listed the CDS for Analytics-powered apps, dubbed Power BI Insights, as another product in the Power BI portfolio.

CDS for Analytics is a standard feature of Power BI so every Power BI Pro user can access it. CDS for Analytics is exposed to the end user in Power BI as datapools. A datapool is a collection of entities associated with a Power BI app workspace. An entity maps to a text file in Azure Storage. Business users will rely on Power Query to populate (manually or via a scheduled refresh) entities in CDS for Analytics. You can access the workspace datapool in the workspace content page.

The Good

I can think of three primary scenarios where CDS for Analytics can deliver value as it stands today:

  1. Offline data staging – Let’s say IT doesn’t allow direct connectivity to LOB applications but you need to create some reports on top of this data. You can stage the data as text files in CDS for Analytics. I don’t think CDS for Analytics would bring much value if you could connect directly to the data source in Power BI Desktop. The more you move the data, the more problems you may run into. At least for now, having apps on top of text files doesn’t look like a good reason to me but I guess we have to see what apps will become available in time.
  2. Prepackaged third-party solutions – Some time ago, a software vendor asked me how they can deploy a solution to Power BI for their customers but still retain ownership. Back then I didn’t have a good answer but CDS for Analytics might be a good option now. In fact, besides Power Query as the primary tool for loading entities, any service that can write to Azure Storage can bring data to CDS for Analytics. The ISV can write the entities as CSV files and tell CDS for Analytics to “mount” the storage container. CDS for Analytics can now see these mounted entities and treat them as part of the whole. Worried about protecting intellectual property? Currently only the Insight App installer would have access to the installed workspace and artifacts (other users in the organization would just see the published reports which are shared with them).
  3. Prepackaged insights – Like CDS for Apps, CDS for Analytics understands the Common Data Model. Over time, Microsoft and partners can contribute prepackaged “insights” that are built on top of popular LOB apps, such as Dynamics or Salesforce.

Pricing is also right. CDS for Analytics is included in Power BI although its storage counts towards the workspace quota. Another thing I like about CDS for Analytics is that the Power BI connector is very fast, unlike the CDS for Apps connector.

The Bad

As of now, datapools support only a small subset of the Power Query connectors. This is probably just a temporary limitation for the preview cycle. I’d imagine that all Power BI connectors for cloud and on-premises data sources will eventually be available. Continuing on the list of limitations, like CDS for Apps, CDS for Analytics doesn’t support incremental refreshes so be careful downloading millions of rows every night.

The Ugly

CDS for Analytics promises to break silos but a datapool is associated with a Power BI workspace. This architecture fragments CDS for Analytics into Power BI workspaces. However, most users would probably require access to common entities, such as Customer and Product. Not only is this not possible but the datapool storage is also limited by the workspace quota. So, if you are a Power BI Pro user who has access to an app workspace, you’re currently limited to a 10 GB storage quota, which includes not only Power BI datasets but also CDS entities. I wish that CDS had no association with workspaces and that it was designed as a global staging area, just like Azure Storage. Microsoft has promised at some point in the future to allow you to reference entities between datapools in different workspaces and create calculated entities on top of them.

CDS for Analytics and Power BI Insights are new additions to Power BI. CDS for Analytics delivers an Operational Data Store (ODS) to business users that is populated and maintained by business users. Microsoft and partners can augment CDS for Analytics with Power BI Insights apps.

Power BI Feature Discrepancies for Data Acquisition

Power BI churns out new features fast but not all features are available everywhere. In one of the most confusing aspects of Power BI, feature availability varies by data acquisition method (data import vs. live connections) and across Power BI product offerings (Power BI Desktop, Power BI Service, Power BI Report Server, and Power BI Mobile). The product documentation and UI have gaps and don’t always clearly indicate feature availability. For example, while you can add a Q&A button in Power BI Desktop irrespective of whether you import or connect live, the button won’t work with live connections because Q&A for Creators is available only when data is imported.

This blog is my first attempt to clarify feature availability based on the data acquisition method. All features work when you import data. Therefore, this option is not listed in the table. Direct connections, however, are not that fortunate. DirectQuery is when you connect directly to any data source except Analysis Services. Because Power BI knows more about Analysis Services, direct connectivity to Analysis Services is listed separately. A while back I asked Microsoft why certain features are not supported with live connections to Analysis Services, especially to Tabular. To me, there shouldn’t be any difference between data import and using an external Tabular connection. In the end, it’s all Tabular behind the scenes. The answer back then was related to performance concerns for chatty features with external Tabular models. I continue pushing Microsoft to eliminate these discrepancies over time, where possible, and to provide more feature parity. The table omits features that are supported by all three connectivity options.

Feature | DirectQuery | Live Connection to Analysis Services [1]
Binning | Yes | No
Calculated column | Yes [2] | No
Calculated measure | Yes [2] | Yes [2]
Change field data type | Yes | No
Change field formatting | Yes | No
Clustering | No | No
Custom groups | Yes | No
Data categories for fields | Yes | No
Explain Increase/decrease (Power BI Desktop) | No | No
Fields properties | Yes | No
Hierarchies | Yes | No
Power Query | Yes | No
Q&A in dashboards (Q&A for consumers) | No | Tabular only
Q&A in reports (Q&A for creators) | No | No
Quick Insights (Power BI Service) | No | No
Relationships | Yes | No
Row-level Security (Power BI Desktop) | Yes | No [3]
Synonyms | Yes | No
What-if | No | No
  1. In general, besides calculated measures for Tabular, no modeling features are available with live connections to Analysis Services. This makes sense to avoid a semantic model (Power BI Desktop) over a semantic model in Analysis Services. In fact, Data, Relationships, and Query Editor (Power Query) are not available when connecting directly to Analysis Services.
  2. See this page for DAX limitations in DirectQuery mode.
  3. Analysis Services has its own security mechanism.