Optimizing Dimension Data Security

Dimension data security is a great feature of Analysis Services and one of the top reasons to strongly consider a semantic layer in your BI implementations, as I mentioned in my “Why Semantic Layer” newsletter. That’s because the server automatically secures the cube at connect time so you don’t have to implement your own security framework and secure every report query. If you are new to dimension data security, I covered it in various places including this article that I wrote for SQL Magazine a while back. One thing to be aware of is that the server applies dimension data security when the user sends the first query after the connection is established even if this query doesn’t specifically request the secured dimension, e.g. SELECT NULL ON 0 FROM [CubeName]. Many things happen at this point (see the steps after “Dimension security is applied to all attributes” in this Mosha blog). This initialization sequence is the price you need to pay to have your entire cube secured. The initialization duration depends on many factors, including the size of the dimension, the size of the allowed set, and the cube script. However, you have some control over the initial performance hit or at least to offset it.

Let’s consider a real-life scenario. A customer has a medium-size cube (about 50 GB) and dimension data security needs to be applied to an Account dimension whose dimension key attribute had about four million members. Specifically, the allowed set had to be applied on an attribute that had about 300,000 members. In this case, the security policies are stored in a factless fact table (for more information how dimension data security works with a factless fact table, see the second part of the above article). The initial initialization time for an allowed set of only two accounts was 9 seconds which is not so bad. However, if the allowed set return 20,000 allowed accounts, the user had to wait for 30 seconds to connect which can surely lead to user complaints.

Reducing the Autoexists Performance Hit

When the server applied dimension data security, it applies it to all attributes within the secured dimension. This behavior is known as Autoexists. In this case, the Account dimension had about 30 attributes. Although dimension security is applied on a single attribute (Level 3), the server automatically secures the rest of the attributes to allow only the members that exist with the allowed members of the secured attribute (if the allowed set returns ‘USA’, Autoexists allows only cities within USA). One way to determine how much time the server spends applying Autoexists is to enable and monitor in the Query Dimension event in the Profiler. Another way would be to remove all the cube script (except the first CALCULATE statement) and track the query execution time before and after.

As I said, the larger the allowed set and the dimension size, the more time will be spent in propagating security to the attributes in the secured dimension. To reduce this time, consider introducing a new dimension, e.g. Account Filter, that has as fewer attributes as possible. Ideally, it should include only the secured attribute although you might need to include the dimension key attribute so you don’t have to make changes to the fact tables in order to join them to the dimension at a higher level. In our case, we tested three scenarios:

Scenario 1: Security applied on the original dimension

An allowed set of two members took 1.7 seconds in Autoexists. An allowed set 20,000 members took 13 seconds in Autoexists.

Scenario 2: Security applied on a new Account Filter dimension that had three attributes (dimension key, secured attribute, and another attribute required to meet the business requirements for security).

An allowed set of two members took 1.5 seconds in Autoexists. An allowed set 20,000 members took 3.5 seconds in Autoexists.

Scenario 3: Security applied on a new Account Filter dimension that had two attributes (secured attribute and another attribute required to meet the business requirements for security). This scenario required changes to the fact tables to bring in the secured attribute in order to join them to the Account Filter dimension.

An allowed set of two members took 1.2 seconds in Autoexists. An allowed set 20,000 members took 1.7 seconds in Autoexists.

Optimizing the Cube Script

This cube had about 3,000 lines, most of them basic calculated members (a best practice is to have calculations defined in the semantic layer instead of reports or elsewhere). The items to pay attention to in the script are static sets and default members (default members can be defined in the dimension itself or in the script). A static set is evaluated once per the user session. With regards to dimension data security, we’ve found that the server evaluates the static set in the context of the allowed members. If you turn the Execute MDX Script Current event to see how much time the server spends evaluating each line in the script. In our case, I’ve found that the following static set took some 10 seconds to evaluate with an allowed set of 20,000 members:

CREATE SET [MonthsWithInventory] AS Exists([Date].[Fiscal by Month].[Month].Members, , “Inventory”); 

When I converted the set to dynamic (CREATE DYNAMIC SET), the overhead disappeared from the initialization time. However, be aware that during the static set evaluation the server warms the cache with the results of the static set. So switching to a dynamic set really saves you the cost of evaluating the Exists query which in this case was expensive. Consequently, the first query will execute faster. But it obviously means that the cache that would have been built up as part of evaluating that set is now not available for other operations. Therefore, subsequent queries that requests data by month might be slower.

What I’ve found also is that overwriting the default members contributes to the initialization times. If you have an option, remove explicit attribute defaults. Interestingly, trying to conditionally bypass the defaults didn’t eliminate the overhead, such as using the following approach:

ALTER CUBE CURRENTCUBE UPDATE DIMENSION [Account].[Active Flag], Default_Member = iif(IsEmpty(CustomData()), [Account].[Active Flag].&[Active], [Account].[Active Flag].DefaultMember);

 

Special thanks to Akshai Mirchandani on the SSAS team for shedding light in dark places.

The SharePoint 2013 Data Model

Somehow, this PowerPivot for SharePoint gem escaped my attention but it’s a very important deployment enhancement that was more than likely initiated by Power BI and the cloud. Starting with SQL Server 2012 Service Pack 1 and SharePoint 2013, you can set up a dedicated SSAS server in SharePoint mode outside the SharePoint farm and configure the farm to point to it. This was possible as a result of all the integration work done between Excel Calculation Services, SharePoint and AS for the 2013 release. Previously, the SSAS in SharePoint configuration mode instance had to be installed on the SharePoint application server.

This brings more flexibility to scaling out your PowerPivot environment outside SharePoint and simplifies the PowerPivot setup. For example, to scale out Analysis Services, you can simply add a new SSAS server configured in SharePoint mode. Excel Calculation Services will balance the load between all available SSAS servers automatically based on health statistics obtained from these server instances. Same thing if you want to make the backend fault-tolerant. If one server goes down, the workbooks that used to be on the failed server will automatically get redeployed on the others by Excel Calculation Services. Specifically, the next time the workbook is accessed by a user, ECS would realize that the workbook is no longer on any active node and the workbook would get loaded once again on a different node.

To register a new server in SharePoint 2013:

  1. Open the SharePoint Central Administration, in the Application Management section, click Manage service applications.
  2. On the Manage Service Applications page, click the Excel Services service application that you want to manage.
  3. On the Manage Excel Services page, click Data Model.

102713_0124_TheSharePoi1

Then, in the Excel Services Application Data Model Settings, click Add Server to register the new server.

102713_0124_TheSharePoi2

The PowerPivot setup is simplified as well because you don’t need to install the PowerPivot for SharePoint add-in, as explained here. However, the add-in is still required for the PowerPivot for SharePoint full capabilities, including PowerPivot Gallery, scheduled data refresh, and the PowerPivot management dashboard. Finally, having a dedicated SSAS server might save you money if you need to scale out. That’s because you no longer need to license SSAS on each SharePoint application server assuming you don’t need the other SQL Server components, such as SSRS. Further savings can be realized by deploying SSAS in SharePoint as another instance to an existing SSAS server (Multidimensional or Tabular) you have. To my understanding, the same version instances are covered by a single license.

Presenting at Visual Analytics Science and Technology Conference

I believe that the Pareto 80/20 principle applies well to a sound BI strategy where 80% of the BI needs should be addressed by Organizational BI (data warehouse, semantic layer, dashboards, operational reports, descriptive analytics, predictive analytics, big data, etc.) and 20% should be left for self-service BI. One of the areas where self-service BI excels is to promote lateral thinking. A business users can quickly test ideas and hypotheses by importing, mashing up, and visualizing data. This is why I’m excited to present at the Visual Analytics Science and Technology Conference in Atlanta on October 16th. The session title is “Self Help BI: How Self-service BI Promotes Lateral Thinking” and it’s included in the “Using Visual Analytics to Foster Lateral Thinking about Business Problems” workshop. I’ll show how a business users can use the Excel reporting and data mining capabilities to test ideas and then share the results by deploying to the cloud.

WORKSHOP: Using Visual Analytics to Foster Lateral Thinking about Business Problems

DATE & TIME: October 16, 08:30 am – 12:10 pm ET

LOCATION: A 706+707

Lateral thinking (also called divergent thinking) is a critical part of the sense making process. Moreover, researchers in the field of Visual Analytics have recognized that iterative and sequential rounds of ‘Convergent Thinking’ and ‘Lateral Thinking’ are necessary for arriving at the most insightful observations. The same pattern is at the heart of ‘Design Thinking’ practiced by creative professionals. The approach leads them to holistic problem solutions that exceed what could be achieved through pure a ‘Convergent Thinking’ approach. However, most of the BI and analytics systems used by business organizations include tools and interactive features (like filtering, sorting, selecting or ‘data brushing’) that are primarily ‘convergent’ in nature. There is little or no support for lateral ideation. Yet lateral thinking has a fairly well developed body of knowledge and includes easy to use techniques for developing out-of-the-box, creative design ideas. Business practitioners regularly use ideation techniques like ‘6 Hats’, ‘HIT Matrix’ and ‘BrainWriting’ to bring structure to brainstorming sessions that seek out-of-the-box business ideas. Most new design or business ideas start as a ‘sketch’ or early visualization of the idea. But there are no formal tools or techniques in commercial BI systems that allow business users to develop ‘sketches’ of alternative business scenarios though forced and controlled data experiments – just the way a ‘HIT matrix’ or ‘6 Hats’ technique does in Lateral Thinking workshops. This session will introduce a number of ideation techniques designed for lateral thinking and foster discussion around how these approaches can be leveraged in the visual analytics context. The session will bring together researchers and practitioners to think about ways this can be done, and discuss challenges and potential solutions.

I hope you can make it!

Oracle Developer Tools for Visual Studio

In my previous blog, I was somewhat hard on Oracle for not testing their connectivity software property. On a positive note, I like their effort to integrate Oracle Developer Tools with Visual Studio. This video provides a nice intro to this feature. Basically, Oracle Developer Tools for Visual Studio fulfills the same role as the Visual Studio built-in support for SQL Server, including designing objects, querying databases, and even tuning queries in Server Explorer.

It’s great to see vendors working together and Oracle Developer Tools for Visual Studio is a true affirmation that times are changing for better.

100413_0045_OracleDevel1

Seeking Oracle

Getting two mega vendors to work together – always fun. On a positive note, the experience further reinforced my love for SQL Server.

Scenario: Installing Oracle 11g 64-bit and 32-bit clients on Windows Server R2 64-bit machine for both development and testing. As you would recall, BIDS (aka SSDS in SQL Server 2012) is a 32-bit application. So, is PowerPivot if it runs in 32 bit Excel. On the other hand, the server products, such as SSRS and SSAS, are x64. So, the changes are that you’ll need both the 32-bit and 64-bit OLE DB Oracle providers since Microsoft discontinued its Oracle OLE DB provider support.

Issue 1: It looks like this deployment scenario wasn’t on the Oracle support radar and probably was never tested. First of all, based on previous experience, you need to install both the 64 bit and 32 bit full Oracle clients with the Administrator option. Other options didn’t work for me. However, the 64-bit client install completes fine but the 32-bit client fails. Once the 32-bit setup goes to the Finish step, it just vanishes into a thin air, probably as a result of an unhandled exception. No errors in the Oracle inventory logs or anywhere else.

Resolution: How do we get the 32-bit client to actually install SxS with the 64-bit client? A registry hack of course, because it tries to use the 64-bit inventory folder (C:\Program Files\Oracle\Inventory) and this is not allowed. So, open regedit and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ORACLE. Change the inst_loc setting from C:\Program Files\Oracle\Inventory to C:\Program Files (x86)\Oracle\Inventory, as follows:

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ORACLE]

inst_loc=C:\Program Files (x86)\Oracle\Inventory

Issue 2: Now that we got the 32-bit client installed, what do we get in BIDS when we attempt to set up a data source to Oracle? The Oracle OLE DB provider is missing.

Resolution: Run the Oracle 32-bit client a second time but choose the Custom option. Then, select the Oracle OLE DB provider to get it registered properly.

 

1122.ora1.png-550x0

Now we get the Oracle OLE DB Provider to show in BIDS and we should be able to connect in both x32 and x64 clients and everything should just work nicely, right?

4331.ora2.png-550x0

Issue 3: After all of this setup extravaganza, what speed do you get when reading rows from Oracle in SSAS or PowerPivot? About 20,000 rows/sec. Now, SSAS is capable of processing 100K+ rows/sec on a single thread so something is wrong here. I got the same pathetic speed with different customers so the issue is probably with the OLE DB provider. I tried a few things but I couldn’t get better performance. I’m hearing that Oracle drivers from other vendors (DataDirect and Attunity were specifically mentioned) have better performance but I haven’t tried them. Let me know if you have a better luck.