OLAP PivotTable Extensions for Excel 2010

Kudos to Greg Galloway (SQL Server MVP) for upgrading his fantastic OLAP PivotTable Extensions add-in for Excel 2010 (now available in both 32-bit and 64-bit flavors). This add-in is one of the first third-party tools I install on a new machine. Since the Excel team seems to be ignoring customer feedback about improving the Excel BI features, OLAP PivotTable Extensions provides what’s lacking, including:

  • Excellent search capabilities with Field List, Dimension Data, and attribute options. By contrast, the Excel 2010 search is limited to searching within a hierarchy level only, which is pretty much useless with parent-child hierarchies. If I already know what level the member is on, I don’t have to search, right?
  • Getting the MDX query – Excel refuses to expose the actual MDX.
  • Calculated members – Another feature that business users keep asking about but Excel doesn’t budge on. The Calculations tab of the add-in lets you create calculated members (see the MDX sketch after this list for the kind of definition involved). It would be great if this were more user-friendly. Ideally, the user would get the same Calculated Member dialog that’s included in Reporting Services.
  • Filtering of attribute members

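For readers who haven’t worked with calculated members, here is a minimal sketch of the kind of MDX definition the Calculations tab boils down to. The measure names are from the Adventure Works sample cube and the calculation is purely illustrative; the add-in essentially just asks you for the member name and the expression and handles the plumbing.

    -- A query-scoped calculated member (illustrative; Adventure Works measure names assumed)
    WITH MEMBER [Measures].[Internet Gross Profit Margin] AS
        ([Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost])
        / [Measures].[Internet Sales Amount],
        FORMAT_STRING = 'Percent'
    SELECT [Measures].[Internet Gross Profit Margin] ON COLUMNS
    FROM [Adventure Works]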

Yet Another Report Timeout Solved

Every now and then, I run into mysterious report timeouts. This time the client had a farm of two report servers integrated with MOSS, with long-running reports that would time out after two minutes or so. Normally, you shouldn’t have such report monsters, but life is not perfect, right? The client had done due diligence and checked or changed every possible timeout setting, including:

  1. Report Processing Timeout in the Reporting Services Server Defaults settings in SharePoint Central Administration.
  2. The httpRuntime settings in the web.config files on all SharePoint farm servers:

    <httpRuntime maxRequestLength="51200" executionTimeout="9000" />

  3. The DatabaseQueryTimeout value in the RSReportServer.config file, increased from 120 to 900 seconds (15 minutes).

    <Add Key="DatabaseQueryTimeout" Value="900"/>

  4. The connection timeout in IIS for the web site on each MOSS server, increased from 120 to 240 seconds.

Alas, no help. Reports would still time out. After much investigation, we discovered a server timeout setting on the load balancer, which was set to 2 minutes. After increasing it to 15 minutes, the problem went away. So many timeout settings, so little time…

Want more timeout tips? Check out this blog which I wrote four years ago.

Atlanta.MBI Website

The website for the Atlanta Microsoft Business Intelligence SIG is up and running, although it’s still a work in progress.

http://atlantabi.sqlpass.org

1. Please register.

2. Please fill in the two polls on the first page for the first meeting attendance and topic of interest.

3. You can post suggestions for our first meeting on the Discussions page (Meetings forum) as a reply to my first post there.

4. Please use the General forum in the Discussions for any general questions.

5. Please spread the news and redirect Atlanta BI fans to http://atlantabi.sqlpass.org.

Venue for Atlanta.MBI Found

The great search for a meeting place for the Atlanta BI SIG is over! I am happy to report that I found the perfect place. Matrix Resources graciously offered to host and sponsor our meetings in a training room at their premises at 115 Perimeter Center Place #250, Atlanta, GA, 30346. I visited their location and another location today and I think their place is great. The room can accommodate 50 people and has a projector. The location is nice too, since I was looking for a place in this area to accommodate the traffic concerns of as many people as possible. I personally live in Norcross and it would have been nice to take advantage of Data Profit’s meetup offer, but it would have been too selfish :)

I also booked the meeting days for the rest of the year. I suggest we meet on the last Monday of every month from 6:30 to 8:30 PM as follows:

8.23, 9.27, 10.25, 11.22 and 12.27

Our first meeting will be on August 23rd at 6:30 PM.

My focus now is to put together the Atlanta BI website that will be hosted on sqlpass.org. I’ll let you know when it’s ready so you can register and stay on top of the latest announcements. Stay tuned!

Venue for Atlanta.MBI Wanted

To follow up on my recent blog about forming an Atlanta-based Microsoft Business Intelligence Special Interest Group, I am planning to have our first meeting in August. The effort now is to find a venue where we can meet. If you have a confirmed location, please post its address to the discussion thread. I’ll compile a list and update this blog and the discussion thread.

Interactive Sorting and Matrix

A customer requested the ability to interactively sort a matrix on rows and columns. When they click the sort indicator in a row, the matrix should sort its columns based on that row’s values. For example, if the first row is clicked and we are in descending sort mode, the following screenshot shows the desired behavior.

[Screenshot: matrix columns sorted in descending order by the values of the first row]

When they click a column’s sort indicator, all row groups should sort within the clicked column. Sounds simple? I’ll be quick to point out that interactive sort is currently not supported directly inside a matrix with dynamic row groups and column groups. The only supported scenario for interactive sort with a matrix is a sort indicator in the matrix corner, which sorts based on overall column aggregates when a dynamic row group is present. That case is supported because the interactive sort doesn’t reach across from the column to the row axis of the matrix; it just sorts column groups relative to each other based on the overall aggregate of an entire column group instance. Robert Bruckner explained this in the discussion forum and provided a sample report.

So, column sort is out of the question unless you get really innovative and post back to the report by following the approach I presented in my Remembering Report Parameters blog, changing the actual query once you know the row and column coordinates. As for sorting on rows, I got this working somewhat by making the following changes.

[Screenshot: interactive sort property settings]

I set the Choose What To Sort option to the name of the column group, Sort By to the data field (measure), and Apply This Sorting to the name of the dataset. I said this works “somewhat” because a subsequent sort on another row doesn’t work. The user must refresh the report to go back to the unsorted state before clicking another row.

It would be nice if the tablix told you upfront that this isn’t supported so folks don’t waste time thinking that they didn’t get the options right.

Atlanta Microsoft Business Intelligence Special Interest Group (SIG) Forming

To Atlanta Business Intelligence fans:

Some of you have approached Geoff Hiten and me about forming an Atlanta-based Microsoft Business Intelligence group. After an intense internal discussion and brainstorming involving SQL PASS, Atlanta.MDF, and the local Microsoft office, we agreed that we should form such a group as a special interest group (SIG) within Atlanta.MDF. I volunteered to lead the group. I started a discussion on the Atlanta.MDF discussion list to solicit feedback and gauge interest in this endeavor. What should we do so you can get the most from this group? Please share your opinion by posting to the thread!

Attend One BI Training Class – Get One Consulting Hour Free!

I am back from vacation in Florida and I am all rested despite the intense sun exposure and the appearance of some tar from the oil spill. I have scheduled the next two runs of my online training classes:

Applied Reporting Services 2008 Online Training Class
Date: June 28 – June 30, 2010
Time: Noon – 4:30 pm EDT; 9 am – 1:30 pm PDT

Applied Analysis Services 2008 Online Training Class
Date: July 7 – July 9, 2010
Time: Noon – 4:30 pm EDT; 9 am – 1:30 pm PDT

Yes, I am also tossing in an hour of consulting with me, to spend any way you want, absolutely FREE! This is your chance to pick my brain about that nasty requirement your boss wants you to implement. So, sign up while the offer lasts. Don’t forget that you can request custom dates if you enroll several people from your company.

In Search of Aggregations

The other day I decided to spend some time educating myself better on the subject of aggregations. Much to my surprise, no matter how hard I tried hitting different aggregations in the Internet Sales measure group of the Adventure Works cube, the Get Data From Aggregation event in SQL Server Profiler always indicated an Aggregation 1 hit.
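
For context, the kind of query I mean is a plain slice-and-dice query along these lines (just a sketch with Adventure Works names; any simple data-only query against the measure group behaves the same way):

    -- A simple query I expected to hit one of the Internet Sales aggregations
    SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
           [Date].[Calendar Year].MEMBERS ON ROWS
    FROM [Adventure Works]

And yet, regardless of how I sliced it, the trace kept reporting: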

    Aggregation 1    100,111111111111111111111

I found this strange given that all partitions in the Internet Sales measure group share the same aggregation design, which has 57 aggregations in the SQL Server 2008 R2 version of Adventure Works. And I couldn’t relate the Aggregation 1 aggregation to any of the Internet Sales aggregations. BTW, the aggregation number reported by the profiler is a hex number, so Aggregation 16 is actually Aggregation 22 (0x16 = 22 in decimal) in the Advanced View of the Aggregations tab in Cube Designer. With some help from Chris Webb and Darren Gosbell, we realized that Aggregation 1 is an aggregation from the Exchange Rates measure group. Too bad the profiler doesn’t tell you which measure group the aggregation belongs to. But since the Aggregation 1 vector has only two comma-separated sections, you can deduce that the containing measure group references only two dimensions. This is exactly the case with the Exchange Rates measure group because it references only the Date and Destination Currency dimensions.

But this still doesn’t answer the question of why the queries can’t hit any of the other aggregations. As it turned out, the Internet Sales and Reseller Sales measure groups include measures with measure expressions that convert currency amounts to US dollars, such as Internet Sales Amount, Reseller Sales Amount, etc. Because the server resolves measure expressions at the leaf members of the dimensions at run time, it cannot benefit from any aggregations. Basically, the server says “this measure value is a dynamic value and pre-aggregated summaries won’t help here”. More interestingly, even if the query requests another measure from these measure groups that doesn’t have a measure expression, it won’t take advantage of any aggregations that belong to these measure groups. That’s because when you query for one measure value in a measure group, the server actually returns all the measure values in that measure group. The storage engine always fetches all measures, including a measure based on a measure expression – even when it may not have been requested by the current query. So, it doesn’t really matter that you requested a different measure – the server will always aggregate from the leaves.
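
For reference, a measure expression is a per-measure MDX expression that the server evaluates at the leaf grain of the related dimensions before any aggregation happens. In Adventure Works the currency conversion is along these lines (a rough sketch from memory; check the measure’s MeasureExpression property in the cube for the exact definition):

    -- Approximate MeasureExpression of [Internet Sales Amount]:
    -- each leaf-level amount is converted using the Average Rate
    -- measure from the Exchange Rates measure group
    [Internet Sales Amount] * [Average Rate]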

To make a long story short, if a measure group has measures with measure expressions, don’t bother adding aggregations to that measure group. So, all those 57 aggregations in Internet Sales and Reseller Sales that Adventure Works demonstrates are pretty much useless. To benefit from an aggregation design in these measure groups, you either have to move the measures with expressions to a separate measure group or not use measure expressions at all. As a material boy who tries to materialize calculations as much as possible for better performance, I’d happily go for the latter approach.

BTW, while we are on the subject of aggregations, I highly recommend you watch Chris Webb’s excellent Designing Effective Aggregation with Analysis Services presentation to learn what aggregations are and how to design them. Finally, unlike the partition size recommendations that I discussed in this blog, aggregations are likely to help even with much smaller datasets. Chris mentions that aggregations may help even with 1-2 million rows in the fact table. The issue with aggregations is often about building the correct ones and, in some cases, being careful not to build too many.

To Partition or Not – This is the Question

As Analysis Services users undoubtedly know, partitioning and aggregations are the two core tenets of a good Analysis Services data design. Or, at least they have been since its first release. The thing, though, is that a lot has changed since then. Disks and memory have gotten faster and cheaper, and Analysis Services has been getting faster and more intelligent with each new release. You should catch up with these trends, of course, and re-think some old myths, the most prevalent being that you must partition and add aggregations to every cube to get faster queries.

In theory, partitioning reduces the data slice the storage engine needs to cover, and the smaller the data slice, the less time the SE will spend reading data. But there is a great deal of confusion about partitioning throughout BI land. Folks don’t know when to partition and how to partition. They just know that they have to do it since everybody does it. The feedback from the Analysis Services team contributes somewhat to this state of affairs. For example, the Microsoft SQL Server 2005 Analysis Services Performance Guide advocates a partitioning strategy of 15-20 million rows per partition or 2 GB as a maximum partition size. However, the Microsoft SQL Server 2008 Analysis Services Performance Guide is somewhat silent on the subject of performance recommendations for partitioning. The only thing you will find is:

For nondistinct count measure groups, tests with partition sizes in the range of 200 megabytes (MB) to up to 3 gigabytes (GB) indicate that partition size alone does not have a substantial impact on query speeds.

Notice the huge data size range – 200 MB to 3 GB. Does this make you doubtful about partitioning for getting faster query times? And why did they remove the partition size recommendation? As it turns out, folks with really large cubes (billions of rows) would follow this recommendation blindly and end up with thousands of partitions! This is an even worse problem than having no partitions at all because the server wastes time managing all these partitions. Unfortunately (or fortunately), I’ve never had the chance to deal with such massive cubes. Nevertheless, I would like to suggest the following partitioning strategy for small to mid-size cubes:

  1. Start with no partitions.
  2. Use XPerf (see my Using XPerf to Test Analysis Services blog) to test a few queries that aggregate data only (no MDX functions or script involved, and no aggregations; see the query sketch after this list).
  3. Analyze the I/O time, that is, the time spent in reading data from disk. This is the time that partitioning may help you reduce.
  4. Compare this time with the overall query time. As I explained in that blog, the storage engine is responsible for retrieving and aggregating data. You may very well find that most of the query time is spent aggregating data at the grain requested by the query and only a small portion is spent reading data from disk.
  5. If the I/O time is substantial, partition your measure groups and compare times again. Otherwise, forget about partitions and see if aggregations can help you reduce the time spent in aggregating data.
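
To make step 2 concrete, here is the flavor of query I have in mind: plain slice-and-dice, no calculations and no script dependencies, run on a cold cache so the storage engine actually hits the disk (a sketch; the cube, measure, and attribute names are Adventure Works placeholders):

    -- A data-only query for timing the storage engine (step 2)
    -- Run it on a cold cache so the I/O time shows up in the trace
    SELECT [Measures].[Internet Order Quantity] ON COLUMNS,
           [Product].[Category].[Category].MEMBERS ON ROWS
    FROM [Adventure Works]
    WHERE ([Date].[Calendar Year].&[2007])

Note that the WHERE slice on a single year is also the kind of predicate that benefits from a partition-by-year scheme, because the storage engine has to touch only the matching partition.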

If the above sounds too complicated, here are some general recommendations that originate from the Analysis Services team to give you rough size guidelines about partitioning:

  • For tiny/small cubes (e.g. less than 100 million rows), don’t partition at all. I don’t have statistics but I won’t be surprised if 80% of the cubes in existence are smaller than that.
  • For mid-size cubes (say 100 million records to 10 billion records), partition based on good slices and data load strategies. This is where you could follow the 20 million/2 GB guidelines that lead to 10 to 500 partitions.
  • For large cubes, consider larger but fewer partitions. Perhaps go with larger old partitions and smaller new partitions if possible. Choose an intelligent partitioning strategy where you understand the slices that your typical queries are going to apply and try to optimize based on that.

One area where partitioning does help is manageability, if that’s important to you. For example, partitions can be processed independently and incrementally (e.g. adding only new data), can have different aggregation designs, and can combine fact tables (e.g. actual, budget, and scenario tables in one measure group). Partitioning can also help you reduce the overall cube processing time on machines with multiple CPUs and cores, because processing is a highly parallel operation and partitions are processed in parallel. So, you may still want to add a few partitions, e.g. partition by year, to get the cube to process faster. Just be somewhat skeptical about partitioning for improving query performance.