Gartner’s 2013 Business Intelligence Magic Quadrant
Gartner released the 2013 BI Magic Quadrant report. And the winner is…
Fellow MVP Andrew Brust discusses Gartner's newly released DW (not BI, mind you) Magic Quadrant. It looks like Gartner fell in love with Teradata, but Microsoft is moving up as well thanks to the advances in its Parallel Data Warehouse (PDW) and in-memory offerings.
I ran into a severe performance issue with the Aggregate function and unrelated dimensions, which I reported on Connect. In this context, an unrelated dimension is a dimension that doesn't join any measure group in the cube. Why have unrelated dimensions? My preferred way to implement time calculations is to use a regular dimension that is added to the cube but not related to any measure group. Then, I use scope assignments to overwrite the time calculations, e.g. YTD, MTD, and so on. For example, the scope assignment for YTD might look like this:
Scope
(
    [Relative Date].[YTD]
);
    this = Aggregate
    (
        {[Relative Date].[Current Period]} * PeriodsToDate([Date].[Fiscal].[Fiscal Year], [Date].[Fiscal].CurrentMember)
    );
End Scope;
Notice the use of the Aggregate function, which at execution time maps to the default aggregation function of the underlying measure. For some reason, with SQL Server 2012 a query that uses the Relative Date dimension experiences a significant performance hit. Replacing Aggregate with Sum fixes the issue, assuming you can sum up the affected measure to produce the time calculations.
What options does a PowerPivot user have to refresh data in a PowerPivot model on the desktop, aka PowerPivot for Excel?
Prior to Excel 2013, the answer was just one – manual refresh by clicking either the Refresh button in the PowerPivot window or the Refresh button in the Existing Connections dialog box. Note that the Refresh button in the Excel ribbon doesn't work since Excel doesn't know anything about PowerPivot. Nor does checking the “Refresh data when opening the file” checkbox on the PowerPivot connection. In Excel 2010, these options won't reopen the PowerPivot connections to the data sources. Instead, the net effect is that they will simply refresh the pivot reports from the pivot cache, which is not what you're after.
Starting with Excel 2013, however, Excel and PowerPivot play better together as I discussed in my What’s New in Office 2013 BI blog. And, now we have additional options to refresh data:
3. Finally, you can also automate PowerPivot tasks, including refreshing data. For example, if you want to refresh the ResellerSales table on open, you can add the following line to the Workbook_Open() event:
ActiveWorkbook.Model.ModelTables("ResellerSales").Refresh
Speaking of automation, everything you can do in the Excel UI is exposed in the object model and automatable in Excel 2013. Here is another example of adding a table from the Excel workbook to the model and then creating a relationship between that table and another table which is already in the model:
Workbooks("Book1").Connections.Add "WorksheetConnection_Book1!Table2", "", "WORKSHEET;Book1", "Book1!Table2", 7, True, False
ActiveWorkbook.Model.ModelRelationships.Add ActiveWorkbook.Model.ModelTables("Table1").ModelTableColumns("Name"), ActiveWorkbook.Model.ModelTables("Table2").ModelTableColumns("Name")
And don't forget that when you deploy your PowerPivot model to SharePoint, you can automate the data refresh on a schedule.
Besides the usual hassles of configuring PerformancePoint, including insufficient permissions to databases and service accounts, version 2013 requires the SQL Server 2008 R2 drivers. This is surprising, considering that SharePoint 2013 shipped after SQL Server 2012.
In a previous blog, I explained how to configure the SharePoint 2013 BI Center. When you go to any of the PerformancePoint-related links, such as Dashboards or PerformancePoint Content, you’ll see the following ribbon.
The Dashboard Designer button is the new way to launch the PerformancePoint Designer. However, when you attempt to configure a data source pointing to Analysis Services, you will likely get an error. The first stop for troubleshooting SharePoint and PerformancePoint issues is of course the Windows Event Log. If you examine the Windows Event Log, you’ll see that PerformancePoint fails to load the 10.0 version of Microsoft.AnalysisServices.AdomdClient. This is the version that’s included in SQL Server 2008 R2. This sends you to the SQL Server 2008 R2 Feature Pack page, from where you can download and install the Microsoft SQL Server 2008 R2 ADOMD.NET library. Now, you can connect to Analysis Services.
The next trip to the SQL Server 2008 R2 Feature Pack page will happen when you try to import (not create) KPIs defined in an Analysis Services cube. This time, the error in the Event Log indicates that PerformancePoint requires the 10.0 version of the Microsoft.AnalysisServices DLL, which represents Analysis Services Management Objects (AMO). Back on the SQL Server 2008 R2 Feature Pack page, you need to download and install Microsoft SQL Server 2008 R2 Analysis Management Objects. While you're there, you might as well download and install the Microsoft Analysis Services OLE DB Provider for Microsoft SQL Server 2008 R2, although I don't think PerformancePoint uses it.
Where should I put my BI reports? Should I upload them to department-level SharePoint sites or put them in one place?
These are common questions that we get from customers. To deliver on the promise of pervasive BI, my preference is to centralize BI artifacts in a single place. Ideally, this BI repository should be the SharePoint Business Intelligence Center. If organizational security is required, you can control security at the SharePoint site or library level. For example, you can create department-specific PowerPivot galleries.
The BI Center is one of the SharePoint site templates that is specifically designed to host BI reports. In SharePoint 2013, Microsoft has extended the BI Center to accommodate various types of BI documents.
When I first installed the BI Center in SharePoint 2013 (click Settings (the wheel in the top-right corner), Site Contents, New Subsite, Enterprise tab, Business Intelligence Center), I was confused. The images are not clickable, and the default home page doesn't offer much else. It turned out that, by default, the BI Center doesn't add a navigation menu. To fix this:
This is what the resulting navigation pane might look like. One added benefit of creating your links with the SharePoint structural navigation, which might not be obvious, is that the links reflect security. For example, if the user doesn't have permission to a library, the user won't see the link. Note that additional steps might be required, such as enabling the library content types, as with SharePoint 2010.
In a previous blog “Installing HDInsight Server for Windows”, I introduced you to the Microsoft HDInsight Server for Windows. Recall that HDInsight Server for Windows is a Windows-based Hadoop distribution that offers two main benefits for Big Data customers:
And, in previous blogs, I’ve introduced you to Hadoop. Recall that there are two main reasons for using Hadoop for storing and processing Big Data:
This blog continues the HDInsight Server for Windows journey. As many of you probably don’t have experience in Unix or Java, I’ll show you how HDInsight makes it easy to write MapReduce jobs on a Windows machine.
Note Writing MapReduce jobs can be complex. If all you need is some crude data analysis, you should consider an abstraction layer, such as Hive, which is capable of deriving the schema and generating the MapReduce jobs for you. This doesn't mean that experience with MapReduce isn't useful. When processing the files goes beyond just imposing a schema on the data and querying the results, you might need programming logic, such as in The New York Times Archive case.
As a prerequisite, I installed HDInsight on my Windows 8 laptop. Because of its prerelease status, the CTP of HDInsight Server for Windows currently supports a single node only, which is fine for development and testing. My task is to analyze the same dataset that I used in the MS BI Guy Does Hadoop (Part 2 – Taking Hadoop for a Spin) blog. The dataset (temp.txt) contains temperature readings from weather stations around the world, and it represents the weather datasets kept by the National Climatic Data Center (NCDC). You will find the sample dataset in the source code attached to this blog. It has the following content (the most important parts are the year, found at offset 15, and the temperature, found at offset 88).
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
The Microsoft .NET SDK for Hadoop facilitates the programming effort required to code MapReduce jobs in .NET. To install it:
This command will download the required Hadoop binaries and add them as references in your project.
The Map job is responsible for parsing the input (the weather dataset), deriving the schema from it, and generating a key-value pair for the data that we’re interested in. In our case, the key will be the year and the value will be the temperature measure for that year. The Map class derives from the MapperBase class defined in Microsoft.Hadoop.MapReduce.dll.
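Here is a minimal sketch of such a mapper, assuming the SDK's Map(inputLine, context) override on MapperBase and the record offsets called out in the sample data above; the class name matches the FirstJob.TemperatureMapper type referenced in the job output later:

using Microsoft.Hadoop.MapReduce;

namespace FirstJob
{
    public class TemperatureMapper : MapperBase
    {
        // Invoked once per input line; emits (year, temperature) as a string key-value pair
        public override void Map(string inputLine, MapperContext context)
        {
            if (inputLine.Length < 92) return;               // skip malformed records

            string year = inputLine.Substring(15, 4);        // year starts at offset 15
            string reading = inputLine.Substring(87, 5);     // signed temperature, e.g. +0022 or -0011

            int temperature;
            if (int.TryParse(reading, out temperature))      // the leading +/- sign is accepted by TryParse
            {
                context.EmitKeyValue(year, temperature.ToString());
            }
        }
    }
}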
At runtime, HDInsight will parse the file content and invoke the Map method once for each line in the file. In our case, the Map job is simple. We parse the input and extract the temperature and year. If the parsing operation is successful, we return the key-value pair. The end result will look like this:
(1950, 0)
(1950, 22)
(1950, −11)
(1949, 111)
(1949, 78)
Suppose that we want to get the maximum temperature for each year. Because each weather station might have multiple readings (lines in the input file) for the same year, we need to combine the results and find the maximum temperature for the year. This is analogous to GROUP BY in SQL. The following Reduce job gets the work done:
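(A minimal sketch, assuming the SDK's ReducerCombinerBase base class and its Reduce(key, values, context) signature; the class name matches the FirstJob.TemperatureReducer type referenced in the job output later.)

using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

namespace FirstJob
{
    public class TemperatureReducer : ReducerCombinerBase
    {
        // Receives all temperature values emitted for a given key (year) and emits the maximum
        public override void Reduce(string key, IEnumerable<string> values, ReducerCombinerContext context)
        {
            int maxTemperature = values.Select(int.Parse).Max();
            context.EmitKeyValue(key, maxTemperature.ToString());
        }
    }
}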
The Reduce job is even simpler. The Hadoop framework pre-processes the output of the Map jobs before it's sent to the Reduce function. This processing sorts and groups the key-value pairs by key, so the input to the Reduce job will look like this:
(1949, [111, 78])
(1950, [0, 22, −11])
In our case, the only thing left for the Reduce job is to loop through the values for a given key (year) and return the maximum value, so the final output will be:
(1949, 111)
(1950, 22)
Instead of deploying to Hadoop each time you make a change during the development and testing lifecycle, you can add another project, such as a Console Application, and use it as a test harness to test the MapReduce code. For your convenience, Microsoft provides a StreamingUnit class in Microsoft.Hadoop.MapReduce.dll. Here is what our test harness code looks like:
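(A minimal sketch follows, assuming the StreamingUnit.Execute<TMapper, TReducer> generic overload that takes an enumerable of input lines; the test file name and the Result property on the returned object are assumptions about this preview SDK.)

using System;
using System.Collections.Generic;
using System.IO;
using FirstJob;
using Microsoft.Hadoop.MapReduce;

class Program
{
    static void Main()
    {
        // Read the test input file one line at a time into a string collection
        var input = new List<string>();
        foreach (string line in File.ReadLines("temp.txt"))
        {
            input.Add(line);
        }

        // Run the mapper and reducer in-process, without deploying to the Hadoop cluster
        var output = StreamingUnit.Execute<TemperatureMapper, TemperatureReducer>(input);

        // Inspect the emitted (year, maximum temperature) pairs
        foreach (string line in output.Result)   // property name assumed; check StreamingUnitOutput in your SDK build
        {
            Console.WriteLine(line);
        }
    }
}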
The code uses a test input file. It reads the content of the file one line at a time and adds each line as a new element to a string collection. Then, the code calls the StreamingUnit.Execute method to initiate the MapReduce job.
Once the code is tested, it’s time to deploy the dataset and MapReduce jobs to Hadoop.
Note When you execute the hadoop command shell in the previous step, the file will be uploaded to your folder. However, if you use the JavaScript interactive console found in the HDInsight Dashboard, the file will be uploaded to the hadoop folder in HDFS because the console runs under the hadoop user. Consequently, the MapReduce job won't be able to find the file. So, use the hadoop command prompt instead.
2. Browse the file system using the web interface (http://localhost:50070) to see that the file is in your folder.
3. Finally, we need to execute the job with HadoopJobExecutor, which can be called in various ways. The easiest way is to use MRRunner:
D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug>.\mrlib\mrrunner -dll FirstJob.dll
File dependencies to include with job:
[Auto-detected] D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\FirstJob.dll
[Auto-detected] D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Microsoft.Hadoop.MapReduce.dll
[Auto-detected] D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Newtonsoft.Json.dll
>>CMD: c:\hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd jar c:\hadoop\hadoop-1.1.0-SNAPSHOT\lib\hadoop-streaming.jar -D “mapred.map.max.attempts=1” -D “mapred.reduce.max.attempts=1” -input inpu
emp -mapper ..\..\jars\Microsoft.Hadoop.MapDriver.exe -reducer ..\..\jars\Microsoft.Hadoop.ReduceDriver.exe -file D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.MapDriver.e
p\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.ReduceDriver.exe -file D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.CombineDriver.exe -file “D:\MyApp\Hadoop\MapRedu
irstJob.dll” -file “D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Microsoft.Hadoop.MapReduce.dll” -file “D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Newtonsoft.Json.dll” -cmdenv “MSFT_HADOOP_MA
-cmdenv “MSFT_HADOOP_MAPPER_TYPE=FirstJob.TemperatureMapper” -cmdenv “MSFT_HADOOP_REDUCER_DLL=FirstJob.dll” -cmdenv “MSFT_HADOOP_REDUCER_TYPE=FirstJob.TemperatureReducer”
packageJobJar: [D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.MapDriver.exe, D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.ReduceDriver.exe, D:\MyApp
Job\bin\Debug\MRLib\Microsoft.Hadoop.CombineDriver.exe, D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\FirstJob.dll, D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Microsoft.Hadoop.MapReduce.dll, D
e\FirstJob\bin\Debug\Newtonsoft.Json.dll] [/C:/Hadoop/hadoop-1.1.0-SNAPSHOT/lib/hadoop-streaming.jar] C:\Users\Teo\AppData\Local\Temp\streamjob7017247708817804198.jar tmpDir=null
12/12/28 12:35:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
log4j:ERROR Failed to rename [C:\Hadoop\hadoop-1.1.0-SNAPSHOT\logs/hadoop.log] to [C:\Hadoop\hadoop-1.1.0-SNAPSHOT\logs/hadoop.log.2012-12-27].
12/12/28 12:35:20 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/28 12:35:20 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/28 12:35:20 INFO streaming.StreamJob: getLocalDirs(): [c:\hadoop\hdfs\mapred\local]
12/12/28 12:35:20 INFO streaming.StreamJob: Running job: job_201212271510_0010
12/12/28 12:35:20 INFO streaming.StreamJob: To kill this job, run:
12/12/28 12:35:20 INFO streaming.StreamJob: C:\Hadoop\hadoop-1.1.0-SNAPSHOT/bin/hadoop job -Dmapred.job.tracker=localhost:50300 -kill job_201212271510_0010
12/12/28 12:35:20 INFO streaming.StreamJob: Tracking URL: http://127.0.0.1:50030/jobdetails.jsp?jobid=job_201212271510_0010
12/12/28 12:35:21 INFO streaming.StreamJob: map 0% reduce 0%
12/12/28 12:35:38 INFO streaming.StreamJob: map 100% reduce 0%
12/12/28 12:35:50 INFO streaming.StreamJob: map 100% reduce 100%
12/12/28 12:35:56 INFO streaming.StreamJob: Job complete: job_201212271510_0010
12/12/28 12:35:56 INFO streaming.StreamJob: Output: output/Temp
4. Using the web interface or the JavaScript console, go to the output folder and view the part-00000 file to see the output (should match your testing results).
As another year is winding down, it's time to review and plan ahead. 2012 was a great year for both Prologika and BI. On the business side of things, we achieved Microsoft Gold BI and Silver Data Platform competencies. We added new customers and consultants. We completed several important projects, and Microsoft acknowledged two of them.
2012 was an eventful year for Microsoft BI. SQL Server 2012 was released in March. It added important BI enhancements, including Power View, PowerPivot v2, Reporting Services End-User Alerting, Analysis Services in Tabular mode, Data Quality Services, Integration Services enhancements, MDS Add-In for Excel, Reporting in the Cloud, and self-service BI for Big Data with the Excel Hive add-in. The next BI wave came with Office 2013 and added important organizational and self-service BI features, including PowerPivot Integration in Excel 2013, Power View Integration in Excel 2013, Excel updatable web reports in SharePoint, productivity enhancements (Flash Fill, Quick Explore, Quick Analysis, and so on), PerformancePoint theming support and enhanced filtering, better mobile BI support, and self-service BI in Office 365.
Microsoft added support for Big Data and Hadoop both on cloud with HDInsight Service and on-premises with the CTP release of HDInsight Server. Finally, we got the public prerelease bits of DAXMD to connect Power View to multidimensional cubes.
As we witnessed, BI is a very important part of the Microsoft data strategy. Although overwhelming at times, I hope the trend will continue in 2013 and beyond. In the spirit of the season, here is my top 5 wish list:
Most importantly, I hope to see Microsoft BI having a renewed focus on customers in 2013. We should listen more to our customers; as geeks, we sometimes get too caught up in technology and learn our lessons the hard way.
Stay happy and healthy in 2013!
Microsoft announced yesterday the availability of the Community Technology Preview (CTP) of Microsoft SQL Server 2012 With Power View for Multidimensional Models (aka DAXMD). As a participant in the CTP program, I'm very excited about this enhancement. Now customers can leverage their investment in OLAP and empower business users to author Power View ad-hoc reports and dashboards from Analysis Services cubes. Previously, Power View supported only PowerPivot workbooks or Analysis Services Tabular models as data sources. I'm not going to repeat what T.K. Anand said in the announcement. Instead, I want to emphasize a few key points:
Kudos to the SSAS and SSRS teams for listening to customers and working together on this feature!
I’ll be presenting What’s New in Excel 2013 and SharePoint 2013 BI at our Atlanta BI Group on Monday, December 3rd.
Microsoft has recently released the 2013 versions of Excel and SharePoint. Both technologies include major enhancements for self-service and organizational BI. Join us to review these new features. Learn how business users can quickly analyze and understand data with Power Pivot, which is now natively supported in Excel. See how Power View enables rich data visualization and fun with data, both on the desktop and on the server. Understand the new Excel and SharePoint features for organizational BI that open new opportunities for analyzing OLAP and Tabular models.