Installing HDInsight Server for Windows

As you’ve probably heard, Microsoft rebranded its Big Data offerings as HDInsight, which currently encompasses two key services:

  • Windows Azure HDInsight Service (formerly known as Hadoop-based Services on Windows Azure) – This is a cloud-based Hadoop distribution hosted on Windows Azure.
  • Microsoft HDInsight Server for Windows – A Windows-based Hadoop distribution that offers two main benefits for Big Data customers:
    • An officially supported Hadoop distribution on Windows Server – Previously, you could set up Hadoop on Windows only as an unsupported installation (via Cygwin) for development purposes. What this means for you is that you can now set up a Hadoop cluster on servers running the Windows Server OS.
    • Extended reach of the Hadoop ecosystem – .NET developers can now write MapReduce jobs in .NET code, such as C#.

Both services are available as preview offerings, and changes are expected as they evolve. The Installing the Developer Preview of Apache Hadoop-based services on Windows article covers the setup steps pretty well. I decided to set up HDInsight Server for Windows via the Microsoft Web Platform Installer on my Windows 8 laptop.

Note Initially, I planned to install HDInsight Server for Windows on a VM running Windows Server 2012 Standard Edition. Although the installer completed successfully, it failed to create the sites and shortcuts to the dashboards (Hadoop Name Node, Dashboard, and MapReduce). This was probably caused by the fact that the server was configured as a domain controller. There is an ongoing discussion about this issue on the Microsoft HDInsight forum.

The Windows 8 setup failed to create the shortcut to the dashboard. However, the following steps fixed the issue:

1. Open an Administrator PowerShell prompt and change the execution policy so that PowerShell accepts scripts.

PS:> Set-ExecutionPolicy RemoteSigned

2. Navigate to the C:\HadoopFeaturePackSetup\HadoopFeaturePackSetupTools folder:

cd C:\HadoopFeaturePackSetup\HadoopFeaturePackSetupTools

3. Install HadoopWebApi:

.\winpkg.ps1 ..\Packages\HadoopWebApi-winpkg.zip install -CredentialFilePath c:\Hadoop\Singlenodecreds.xml

4. Install the dashboard:

.\winpkg.ps1 ..\Packages\HadoopDashboard-winpkg.zip install -CredentialFilePath c:\Hadoop\Singlenodecreds.xml

This should create the shortcuts on the desktop and you should be able to navigate to http://localhost:8085 to access the dashboard.
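If you want a quick sanity check that the dashboard site responds before opening the browser, you can ping it from the same PowerShell prompt. This is just a convenience sketch; it assumes the default port 8085 used by the setup above and requires PowerShell 3.0 (included with Windows 8).

# A quick sketch: confirm the dashboard answers on the default port.
$response = Invoke-WebRequest -Uri "http://localhost:8085" -UseBasicParsing
"Dashboard returned HTTP status $($response.StatusCode)"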


From here, you can open the Interactive Console, and your experience should be the same as with the Windows Azure HDInsight Service. David Zhang provides great coverage of the Interactive Console in his video presentation “Introduction to the Hadoop on Azure Interactive JavaScript Console”.

BTW, HDInsight Server installs a set of Windows services that correspond to the daemons Hadoop runs when installed on UNIX.
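If you’re curious which services the setup registered on your machine, something like the following should list them. This is only a sketch; I’m assuming the service display names contain “hadoop”, so adjust the filter if your installation uses different names.

# A sketch: list the Windows services registered by the Hadoop setup.
# Assumes the display names contain "hadoop"; adjust the filter if needed.
Get-Service | Where-Object { $_.DisplayName -like "*hadoop*" } | Format-Table Name, DisplayName, Status -AutoSize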


Hadoop and Big Data Tonight with Atlanta BI Group

Atlanta BI Group is meeting tonight. The topic is Hadoop and Big Data, presented by Ketan Dave, and our sponsor is Enterprise Software Solutions.

With the wide acceptance of open source technologies, Hadoop/MapReduce has become a viable option for implementing data solutions that range from hundreds of terabytes to petabytes. The scalability, reliability, versatility, and cost benefits of Hadoop-based systems are displacing the traditional approach to data solutions. Microsoft has partnered with Hadoop vendors and recently announced plans to make data on Hadoop accessible from Excel, easily linked to SQL Server and its business intelligence, analytical, and reporting tools, and managed through Active Directory.

I hope you can make it!

SharePoint 2013 and SQL Server 2012

As I mentioned before, Microsoft released SharePoint 2013 and Office 2013, and the bits are now available on MSDN Subscriber Downloads. I am sure you are eager to try the new BI features. One thing you need to be aware of, though, is that you need SQL Server 2012 Service Pack 1 in order to integrate the BI features (PowerPivot for SharePoint and SSRS) with SharePoint 2013. If you run the RTM version of the SQL Server 2012 setup, you won’t get too far because it will fail the installation rule that requires SharePoint 2010. That’s because the setup doesn’t know anything about SharePoint 2013, and the latest release includes major architectural changes.

The logical question, then, is where is SQL Server 2012 SP1, now that it’s a prerequisite for SharePoint 2013 BI? As far as I know, there isn’t a confirmed ship date yet, but it should arrive soon. I suspect Microsoft will announce it at PASS.

Business Intelligence on Surface

Microsoft started shipping the cool Surface RT tablets this week, with Surface Pro to follow in January. Naturally, a BI person would attempt BI in Excel, only to find that the BI features and Power View (there is no Silverlight support) are not there, as Kasper explains. It’s important to know that the RT version of Surface is limited to native Windows 8 applications: it comes preinstalled with Office 2013, but you can’t install non-Windows 8 applications. If you are interested in BI or in running non-Windows 8 apps, you need the Pro version. This means that you and I need to wait until January next year, so no Surface for Christmas.

Windows 8 Hyper-V Support

I installed Windows 8 shortly after it was released, and I’ve been running it for almost two months now. I’m still in learning mode, teaching an old dog new tricks. I have a few gripes here and there, such as the need to juggle two screens (the Start screen and the desktop), the absence of the Start button and its search, the apparent quest to dumb down the OS, and the fact that my QuickBooks PDF converter doesn’t work anymore.

However, I do like the performance boost (Windows 8 really flies on a solid-state disk). I also keep discovering gems here and there, and one of them is the Hyper-V support. For those of you who need to run virtual machines, such as to test prerelease bits or run other guest operating systems, you know that in the pre-Windows 8 era you had to rely on either VirtualBox or VMware because Virtual PC wasn’t capable of running a 64-bit guest Windows (required for SharePoint, for example). Now, Hyper-V is built into Windows 8, and installing it is a snap. Steven Sinofsky provides nice coverage of Windows 8 Hyper-V in his blog post “Bringing Hyper-V to Windows 8”, which also includes a short video of how to install it. Trust me, you’ll really appreciate how simple the install process is.
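If you prefer the command line over the Turn Windows features on or off dialog, client Hyper-V can also be enabled from an elevated PowerShell prompt. This is only a sketch (I used the dialog myself); the feature name below is the documented one for Windows 8, and a reboot is required either way.

# A sketch: enable client Hyper-V on Windows 8 Pro/Enterprise from an elevated prompt.
# A reboot is required after the feature is enabled.
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All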

Anticipating the forthcoming release of SharePoint 2013, today I stood up a Windows Server 2012 VM using Hyper-V on a Windows 8 host, and I loved the experience. There is no need to install extensions to enable keyboard and mouse integration, or to go through bizarre steps to resize the virtual disk. For example, expanding the disk to free up more space is a simple two-step procedure (a PowerShell sketch follows the list):

  1. Use Hyper-V Manager to edit and expand the virtual disk.
  2. Start the VM. Assuming a Windows guest OS, use Disk Management to extend the volume to include the newly allocated space.
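The same two steps can be scripted with the Hyper-V and Storage cmdlets that ship with Windows 8 and Windows Server 2012. The sketch below uses made-up paths, drive letters, and sizes; shut down the VM before resizing the disk, and run the partition commands inside the guest.

# Step 1 (on the host, with the VM shut down): grow the virtual disk. Path and size are placeholders.
Resize-VHD -Path "D:\VMs\SP2013.vhdx" -SizeBytes 60GB

# Step 2 (inside the Windows guest): extend the volume into the newly allocated space. Assumes drive C.
$max = (Get-PartitionSupportedSize -DriveLetter C).SizeMax
Resize-Partition -DriveLetter C -Size $max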

Tip If you plan to stand up a VM for BI testing, plan for at least a 40 GB virtual disk to install Windows Server 2012, SQL Server 2012, and SharePoint. This is where a second disk (I highly recommend a solid-state disk) comes in handy.
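For what it’s worth, provisioning such a VM from PowerShell is just as simple. Another rough sketch with made-up names, paths, and sizes, using the Hyper-V cmdlets available on Windows 8 and Windows Server 2012:

# A rough sketch: create a 40 GB dynamically expanding disk and a VM that uses it. All names and sizes are placeholders.
New-VHD -Path "E:\VMs\BITest.vhdx" -SizeBytes 40GB -Dynamic
New-VM -Name "BITest" -MemoryStartupBytes 8GB -VHDPath "E:\VMs\BITest.vhdx"
Start-VM -Name "BITest"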

Office 2013 and SharePoint 2013 Go Live

I have to admit this came as a surprise to me as I expected them to go live next year. Microsoft just announced that Office 2013 and SharePoint 2013 went live. Time for celebrating, learning, and adopting these great technologies!

Check out my coverage of what’s new in Office and SharePoint 2013 BI.

Prologika Newsletter Fall 2012

I started a newsletter and I plan to update it on a quarterly basis. The topic of the first issue is Big Data. If you like the newsletter, you’re welcome to subscribe and forward.

Your Data Warehouse in the Cloud?

I spoke with a customer today that has implemented Salesforce.com. For those who are not familiar with Salesforce, it’s a popular cloud-based customer relationship management (CRM) product. As it usually happens, just when you’re done with the Salesforce implementation, you’re immediately faced with the challenge of consolidated reporting. It won’t be long before the Vice President of Sales asks you to integrate the sales data residing in Salesforce.com with your on-premises data sources. In this case, the customer went to Dreamforce (the premier Salesforce conference) in search of integration options and was advised to solve the report consolidation issue by … migrating their multi-terabyte data warehouse to Salesforce.com!

I’m sure that this approach makes perfect sense to Salesforce.com, but it’s hardly in the customer’s best interest. First, although Salesforce is extensible and you can add custom objects (tables), Salesforce.com is not designed to host relational databases. As far as I know, it doesn’t have ETL tools, an analytical layer, or comprehensive reporting capabilities. Second, even with the enormous recent strides in cloud computing and ever-decreasing storage prices, it’s hard to imagine anyone moving a data warehouse to the cloud; it’s just cost-prohibitive to do so. Third, there are data logistics challenges to populating a cloud-based data warehouse, such as uploading gigabytes of data from on-premises databases to the cloud over the Internet.

At Prologika, we advise our customers to keep data where it belongs: operational data in the on-premises data warehouse and sales data in the Salesforce.com cloud. By design, the data warehouse is an enterprise repository for storing and consolidating data from operational data sources. We design Integration Services packages that retrieve data by integrating with the Salesforce.com web service and import it into the data warehouse. This opens all kinds of interesting analytical possibilities, such as implementing forecasting reports that combine actual and opportunity revenue.
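In our projects the extraction happens inside Integration Services packages, but the underlying idea is easy to sketch outside of SSIS as well. The snippet below only illustrates calling the Salesforce REST query endpoint with a SOQL statement from PowerShell; the instance URL, API version, access token, and query are made-up placeholders, and a real implementation would authenticate via OAuth first and handle result paging.

# Illustration only: pull opportunity data from the Salesforce REST API with a SOQL query.
# $instanceUrl and $accessToken are placeholders obtained from the OAuth login flow.
$instanceUrl = "https://na1.salesforce.com"
$accessToken = "<access token>"
$soql = "SELECT Id, Name, Amount, CloseDate FROM Opportunity"
$uri = "$instanceUrl/services/data/v26.0/query?q=" + [uri]::EscapeDataString($soql)
$result = Invoke-RestMethod -Uri $uri -Headers @{ Authorization = "Bearer $accessToken" }
$result.records | Format-Table Name, Amount, CloseDate -AutoSize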

When the customer requires reports that source data from the data warehouse, we implement a web service endpoint residing on the customer’s premises that allows Salesforce.com to pull data from the data warehouse and cubes. Or, if it’s acceptable for the salespeople to be logged in to the customer’s network, we extend Salesforce to pass parameters to operational reports hosted on premises, such as in SharePoint. This bi-directional integration lets our customers keep data where it belongs while allowing each system to obtain data from the other.

Sometimes, it’s better to keep your head and data out of the cloud no matter how modern and exciting this might sound.


MVP for Eight Years!

Just got the news that my MVP (SQL Server) award has been extended for another year! This is my eighth renewal.

Fix for “Wide” MDX Query Performance Degradation

We ran into a situation where a wide MDX query requesting many regular and semi-additive measures side by side can degrade Analysis Services server performance. As I explained in this connect bug report, the customer had a Reporting Services report that includes many measures (close to a hundred). Users insisted on having this report so they could export all the results to Excel and pivot on them. Of course, one of the advantages of having a cube is to make this exact requirement unnecessary, but old habits die hard. We noticed that when the report includes 75 measures from a single measure group, the query executes within 20 seconds. However, when more measures from the same measure group are added, the query performance degrades to minutes. The measures are either regular (persisted) measures or simple calculated-member wrappers and don’t use any intensive formulas or scope assignments. The important point is that the query requests both additive and semi-additive measures (in our case, the semi-additive measures use the LastChild function).
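If you want to reproduce a similar measurement on your own cube, timing the query from a client is straightforward. The sketch below uses the ADOMD.NET client from PowerShell; the server, catalog, and MDX text are placeholders for your own wide query, and the ADOMD.NET assembly ships with the SQL Server feature pack.

# A sketch: time an MDX query with ADOMD.NET from PowerShell. Server, catalog, and MDX are placeholders.
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices.AdomdClient")
$conn = New-Object Microsoft.AnalysisServices.AdomdClient.AdomdConnection("Data Source=localhost;Catalog=MyDatabase")
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = "SELECT { [Measures].[Measure1], [Measures].[Measure2] } ON 0, [Date].[Calendar].Members ON 1 FROM [MyCube]"
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$reader = $cmd.ExecuteReader()
while ($reader.Read()) { }   # drain the result set so the full query executes
$reader.Close()
$sw.Stop()
"Query completed in $($sw.Elapsed.TotalSeconds) seconds"
$conn.Close()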

Microsoft determined that the performance degradation occurs when the number of overlapping calculations for a part of the cube space exceeds a threshold, causing the server to switch from block mode to cell-by-cell evaluation for the affected query space. Microsoft was quick to provide us with a private fix that addresses the issue by distinguishing whether the overlapping calculations are due to semi-additive measures or to scope assignments in the cube script or query-scope calculations (the reason the original limit was present). It also allows the administrator to configure the threshold by adjusting the MaxCoverItemOverlapCount setting in the msmdsrv.ini file, although raising this number too high could hurt performance in some cases where cell-by-cell evaluation of highly complex calculations is faster than the block evaluation method. The official fix will be included in CU4 for SQL Server 2012, expected in mid-October 2012. I’m not sure if the fix will make it to older versions of SQL Server.