<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Big Data &#8211; Prologika</title>
	<atom:link href="https://prologika.com/tag/big-data/feed/" rel="self" type="application/rss+xml" />
	<link>https://prologika.com</link>
	<description>Business Intelligence Consulting and Training in Atlanta</description>
	<lastBuildDate>Tue, 16 Feb 2021 09:59:58 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Microsoft Acquires Metanautix</title>
		<link>https://prologika.com/microsoft-acquires-metanautix/</link>
					<comments>https://prologika.com/microsoft-acquires-metanautix/#respond</comments>
		
		<dc:creator><![CDATA[Prologika - Teo Lachev]]></dc:creator>
		<pubDate>Fri, 01 Jan 2016 22:52:00 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Big Data]]></category>
		<guid isPermaLink="false">/CS/blogs/blog/archive/2016/01/01/microsoft-acquires-metanautix.aspx</guid>

					<description><![CDATA[If you&#8217;ve missed the announcement from a couple of weeks ago, Microsoft acquired Metanautix &#8211; a startup founded by ex-Google engineers who worked on BigQuery (aka Dremel). Technical details are [&#8230;]]]></description>
										<content:encoded><![CDATA[<p>If you&#8217;ve missed the <a href="http://blogs.microsoft.com/blog/2015/12/18/microsoft-acquires-metanautix-to-help-customers-connect-data-for-business-insights/">announcement</a> from a couple of weeks ago, Microsoft acquired Metanautix &ndash; a startup founded by ex-Google engineers who worked on BigQuery (aka Dremel). Technical details are scarce at this point. In fact, the Metanautix website doesn&#8217;t exist anymore, but there are YouTube videos and slides, such as this <a href="https://www.youtube.com/watch?v=QZy1NuIZGto">one</a>. A while back, I wrote about <a href="https://prologika.com/prologika-newsletter-spring-2015/">logical data warehouses</a>, which come in different shapes and names, such as software-defined data marts, distributed data, and what I call brute-force queries, such as <a href="https://www.youtube.com/watch?v=Tj0gW4XI6vU">Amazon QuickSight</a>. It looks like, with this acquisition, Microsoft is hoping to take a step in this direction, especially when it comes to Big Data analysis. </p>
<p>From what I was able to gather online to connect the pieces, Metanautix Quest uses a SQL-like language to define tables that point to wherever the data resides, such as in HDFS, flat files, or an RDBMS. The syntax to define and query a table might look like this: </p>
<p><span style="font-family:Courier New; font-size:10pt">DEFINE TABLE t AS /path/to/data/* </span></p>
<p><span style="font-family:Courier New; font-size:10pt">SELECT TOP(signal1, 100), COUNT(*) FROM t </span></p>
<p>I believe that the original Google implementation would leave the data on the Google File System (GFS). However, it looks like Metanautix always brings the data into an in-memory columnar store, similar to how Tabular stores the data. When the user sends a query (the query could relate data from multiple stores), a multi-level serving tree algorithm is used to parallelize the query and fetch the data with distributed joins, as described in more detail in the &#8220;Dremel: Interactive Analysis of Web-Scale Datasets&#8221; whitepaper by Google. According to the whitepaper, this query execution pattern far outperforms MapReduce queries. </p>
<p>While I was reading about Metanautix, I couldn&#8217;t help but ask myself &#8220;how is it different from Tabular if it brings the data in?&#8221; Yet, from the announcement: </p>
<p><em>&#8220;With Metanautix technology, IT teams can connect a diversity of their company&#8217;s information across private and public clouds, <strong>without having</strong> to go through the costly and complex process of moving data into a centralized system.&#8221; </em></p>
<p>It might be that Metanautix is more scalable when it comes to Big Data, although I don&#8217;t see how this could happen if the data is not left in situ. We shall see as details start coming in. &#8220;In the coming months, we will have more to share about how we will bring Metanautix technology into the Microsoft data platform, including SQL Server and the Cortana Analytics Suite.&#8221; One thing is for sure: as with logical data warehouses, Metanautix won&#8217;t solve your data integration challenges, and it&#8217;s not a replacement for a data warehouse. From what I can tell, it could help with ad hoc analysis across distributed datasets without having to build analytical models, with all the pros and cons surrounding that approach. </p>
<p><img decoding="async" src="/wp-content/uploads/2016/02/010116_2251_MicrosoftAc1.png" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://prologika.com/microsoft-acquires-metanautix/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Programming MapReduce Jobs with HDInsight Server for Windows</title>
		<link>https://prologika.com/programming-mapreduce-jobs-with-hdinsight-server-for-windows/</link>
					<comments>https://prologika.com/programming-mapreduce-jobs-with-hdinsight-server-for-windows/#comments</comments>
		
		<dc:creator><![CDATA[Prologika - Teo Lachev]]></dc:creator>
		<pubDate>Fri, 28 Dec 2012 20:18:00 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<guid isPermaLink="false">/CS/blogs/blog/archive/2012/12/28/programming-mapreduce-jobs-with-hdinsight-server-for-windows.aspx</guid>

					<description><![CDATA[In a previous blog &#8220;Installing HDInsight Server for Windows&#8221;, I introduced you to the Microsoft HDInsight Server for Windows. Recall that HDInsight Server for Windows is a Windows-based Hadoop distribution [&#8230;]]]></description>
										<content:encoded><![CDATA[<p>In a previous <a href="/CS/blogs/blog/archive/2012/10/31/installing-hdinsight-server-for-windows.aspx">blog</a> &#8220;Installing HDInsight Server for Windows&#8221;, I introduced you to the Microsoft HDInsight Server for Windows. Recall that HDInsight Server for Windows is a Windows-based Hadoop distribution that offers two main benefits for Big Data customers:</p>
<ul>
<li>An officially supported Hadoop distribution on Windows Server – Previously, you could set up Hadoop on Windows only as an unsupported installation (via Cygwin) for development purposes. What this means for you is that you can now set up a Hadoop cluster on servers running a Windows Server OS.</li>
<li>Extends the reach of the Hadoop ecosystem to .NET developers by allowing them to write MapReduce jobs in .NET code, such as C#.</li>
</ul>
<p>And, in previous <a href="#">blogs</a>, I&#8217;ve introduced you to Hadoop. Recall that there are two main reasons for using Hadoop for storing and processing Big Data:</p>
<ul>
<li>Storage – You can store massive files in a distributed and fault-tolerant file system (HDFS) without worrying that hardware failure will result in a loss of data.</li>
<li>Distributed processing – When you outgrow the limitations of a single server, you can distribute job processing across the nodes in a Hadoop cluster. This allows you to perform crude data analysis directly on files stored in HDFS or execute any other type of job that can benefit from parallel execution.</li>
</ul>
<p>This blog continues the HDInsight Server for Windows journey. As many of you probably don&#8217;t have experience in Unix or Java, I&#8217;ll show you how HDInsight makes it easy to write MapReduce jobs on a Windows machine.</p>
<p style="background: #f2f2f2;"><strong>Note</strong> Writing MapReduce jobs can be complex. If all you need is some crude data analysis, you should consider an abstraction layer, such as <a href="/CS/blogs/blog/archive/2012/06/24/ms-guy-does-hadoop-part-3-hive.aspx">Hive</a>, which is capable of deriving the schema and generating the MapReduce jobs for you. This doesn&#8217;t mean that experience with MapReduce is not useful. When processing goes beyond just imposing a schema on the data and querying the results, you might need programming logic, such as in <a href="http://open.blogs.nytimes.com/2008/05/21/the-new-york-times-archives-amazon-web-services-timesmachine/">The New York Times Archive</a> case.</p>
<p>As a prerequisite, I installed HDInsight on my Windows 8 laptop. Because of its prerelease status, the CTP of HDInsight Server for Windows currently supports a single node only, which is fine for development and testing. My task is to analyze the same dataset that I used in the MS BI Guy Does Hadoop (Part 2 – Taking Hadoop for a Spin) <a href="/CS/blogs/blog/archive/2012/06/09/ms-bi-guy-does-hadoop-part-2-taking-hadoop-for-a-spin.aspx">blog</a>. The dataset (temp.txt) contains temperature readings from weather stations around the world, and it represents the weather datasets kept by the <a href="http://www.ncdc.noaa.gov/oa/ncdc.html">National Climatic Data Center (NCDC)</a>. You will find the sample dataset in the <a href="#" target="_blank">source code</a> attached to this blog. It has the following content (the most important parts are highlighted in red: the year at offset 15 and the temperature at offset 88).</p>
<p><span style="font-size: 9pt;">006701199099999<span style="color: red;"><strong>1950</strong></span>051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+0<span style="color: red;">0<strong>00</strong></span>1+99999999999 </span></p>
<p><span style="font-size: 9pt;">004301199099999<span style="color: red;"><strong>1950</strong></span>051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+0<span style="color: red;">0<strong>22</strong></span>1+99999999999 </span></p>
<p><span style="font-size: 9pt;">004301199099999<span style="color: red;"><strong>1950</strong></span>051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-0<span style="color: red;">0<strong>11</strong></span>1+99999999999 </span></p>
<p><span style="font-size: 9pt;">004301265099999<span style="color: red;"><strong>1949</strong></span>032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+0<span style="color: red;">1<strong>11</strong></span>1+99999999999 </span></p>
<p><span style="font-size: 9pt;">004301265099999<span style="color: red;"><strong>1949</strong></span>032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+0<span style="color: red;">0<strong>78</strong></span>1+99999999999 </span></p>
<p><span style="font-size: 11pt;">Note that the data is stored in its raw format and no schema was imposed on it. The schema will be derived at runtime by parsing the file content. </span></p>
<h1>Installing Microsoft .NET SDK for Hadoop</h1>
<p>The <a href="http://hadoopsdk.codeplex.com/">Microsoft .NET SDK for Hadoop</a> facilitates the programming effort required to code MapReduce jobs in .NET. To install it:</p>
<ol>
<li>Install <a href="http://docs.nuget.org/docs/start-here/installing-nuget">NuGet</a> first. NuGet is a Visual Studio extension that makes it easy to add, remove, and update libraries and tools in Visual Studio projects that use the .NET Framework.</li>
<li>Open Visual Studio (2010 or 2012) and create a new C# Class Library project.</li>
<li>Go to Tools <span style="font-family: Wingdings;">ð</span> Library Package Manager <span style="font-family: Wingdings;">ð</span> Package Manager Console.</li>
<li>
<div>In the Package Manager Console window that opens in the bottom of the screen, enter:<br />
<span style="color: #253340; font-family: Consolas; font-size: 9pt;">install-package Microsoft.Hadoop.MapReduce –pre</span></div>
<p>This command will download the required Hadoop binaries and add them as references in your project.</p></li>
</ol>
<h1>Coding the Map Job</h1>
<p>The Map job is responsible for parsing the input (the weather dataset), deriving the schema from it, and generating a key-value pair for the data that we&#8217;re interested in. In our case, the key will be the year and the value will be the temperature measure for that year. The Map class derives from the MapperBase class defined in Microsoft.Hadoop.MapReduce.dll.</p>
<p><img fetchpriority="high" decoding="async" class="alignnone wp-image-2135 size-full" src="/wp-content/uploads/2012/12/122812_2018_Programming1.png" alt="122812_2018_Programming1" width="505" height="393" srcset="https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming1.png 505w, https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming1-450x350.png 450w, https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming1-300x233.png 300w" sizes="(max-width: 505px) 100vw, 505px" /></p>
<p>At runtime, HDInsight will parse the file content and invoke the Map method once for each line in the file. In our case, the Map job is simple. We parse the input and extract the temperature and year. If the parsing operation is successful, we return the key-value pair. The end result will look like this:</p>
<p><span style="color: #253340; font-family: Consolas; font-size: 9pt;">(1950, 0) </span></p>
<p><span style="color: #253340; font-family: Consolas; font-size: 9pt;">(1950, 22) </span></p>
<p><span style="color: #253340; font-size: 9pt;"><span style="font-family: Consolas;">(1950, </span><span style="font-family: Times New Roman;">−</span><span style="font-family: Consolas;">11) </span></span></p>
<p><span style="color: #253340; font-family: Consolas; font-size: 9pt;">(1949, 111) </span></p>
<p><span style="color: #253340; font-family: Consolas; font-size: 9pt;">(1949, 78) </span></p>
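<p>The actual C# Map class is shown in the screenshot above. As a rough, language-agnostic sketch (Python here, purely illustrative and not the SDK code), the parsing logic boils down to fixed-offset substring extraction, assuming the layout described earlier: the year at offset 15, and the temperature sign at offset 87 followed by the digits at offset 88.</p>

```python
def map_temperature(line):
    """Emit a (year, temperature) pair from one fixed-width weather record.

    The year occupies characters 15-18; the temperature is the '+' or '-'
    sign at character 87 followed by the four digits starting at offset 88.
    """
    year = line[15:19]
    temp = int(line[87:92])  # int() honors the leading sign: "-0011" -> -11
    return (year, temp)

record = ("0067011990999991950051507004+68750+023550FM-12+038299999"
          "V0203301N00671220001CN9999999N9+00001+99999999999")
print(map_temperature(record))  # -> ('1950', 0)
```

<p>A production Map implementation would typically also skip lines that fail to parse, as the C# version does by returning a pair only when parsing succeeds.</p>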
<h1>Coding the Reduce Job</h1>
<p>Suppose that we want to get the maximum temperature for each year. Because each weather station might have multiple readings (lines in the input file) for the same year, we need to combine the results and find the maximum temperature per year. This is analogous to GROUP BY in SQL. The following Reduce job gets the work done:</p>
<p><img decoding="async" class="alignnone wp-image-2136 size-full" src="/wp-content/uploads/2012/12/122812_2018_Programming2.png" alt="122812_2018_Programming2" width="555" height="250" srcset="https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming2.png 555w, https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming2-450x203.png 450w, https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming2-300x135.png 300w" sizes="(max-width: 555px) 100vw, 555px" /></p>
<p>The Reduce job is even simpler. The Hadoop framework pre-processes the output of the Map jobs before it&#8217;s sent to the Reduce function. This processing sorts and groups the key-value pairs by key, so the input to the Reduce job will look like this:</p>
<p><span style="color: #253340; font-family: Consolas; font-size: 9pt;">(1949, [111, 78]) </span></p>
<p><span style="color: #253340; font-family: Consolas; font-size: 9pt;">(1950, [0, 22, −11]) </span></p>
<p>In our case, the only thing left for the Reduce job is to loop through the values for a given key (year) and return the maximum value, so the final output will be:</p>
<p><span style="color: #253340; font-family: Consolas; font-size: 9pt;">(1949, 111) </span></p>
<p><span style="color: #253340; font-family: Consolas; font-size: 9pt;">(1950, 22) </span></p>
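<p>To make the grouping step concrete, here is an illustrative Python sketch (again, not the SDK code from the screenshots) of what happens between the two phases and inside Reduce:</p>

```python
from collections import defaultdict

def shuffle(pairs):
    """Sort and group mapper output by key, as Hadoop does before Reduce."""
    grouped = defaultdict(list)
    for year, temp in pairs:
        grouped[year].append(temp)
    return dict(sorted(grouped.items()))

def reduce_max(year, temps):
    """Return the maximum temperature for a year (the GROUP BY analogy)."""
    return (year, max(temps))

mapped = [("1950", 0), ("1950", 22), ("1950", -11), ("1949", 111), ("1949", 78)]
for year, temps in shuffle(mapped).items():
    print(reduce_max(year, temps))  # ('1949', 111) then ('1950', 22)
```
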
<h1>Testing MapReduce</h1>
<p>Instead of deploying to Hadoop each time you make a change during the development and testing lifecycle, you can add another project, such as a Console Application, and use it as a test harness to test the MapReduce code. For your convenience, Microsoft provides a StreamingUnit class in Microsoft.Hadoop.MapReduce.dll. Here is what our test harness code looks like:</p>
<p><img decoding="async" class="alignnone wp-image-2138 size-full" src="/wp-content/uploads/2012/12/122812_2018_Programming3.png" alt="122812_2018_Programming3" width="499" height="377" srcset="https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming3.png 499w, https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming3-450x340.png 450w, https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming3-300x227.png 300w" sizes="(max-width: 499px) 100vw, 499px" /></p>
<p>The code uses a test input file. It reads the content of the file one line at a time and adds each line as a new element to an instance of ArrayList. Then, the code calls the StreamingUnit.Execute method to initiate the MapReduce job.</p>
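<p>If you want a feel for what the harness exercises end to end, the whole job can be simulated in a few lines of plain Python (an illustration only; the real harness drives the C# mapper and reducer through StreamingUnit):</p>

```python
def run_job(lines):
    """Toy in-memory MapReduce: map each record, group by year, reduce to max."""
    grouped = {}
    for line in lines:
        year, temp = line[15:19], int(line[87:92])  # map step
        grouped.setdefault(year, []).append(temp)   # shuffle/group step
    return {year: max(temps) for year, temps in grouped.items()}  # reduce step

records = [
    "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
    "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
]
print(run_job(records))  # -> {'1950': 22, '1949': 111}
```
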
<h1>Deploying to Hadoop</h1>
<p>Once the code is tested, it&#8217;s time to deploy the dataset and MapReduce jobs to Hadoop.</p>
<ol>
<li>Deploy the file to the Hadoop HDFS file system.<br />
<span style="font-family: Courier New;">C:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin&gt;hadoop fs -copyFromLocal D:\MyApp\Hadoop\MapReduce\temp.txt input/Temp/input.txt</span></li>
</ol>
<p style="background: #f2f2f2;"><strong>Note</strong> When you execute the hadoop command shell in the previous step, the file will be uploaded to your folder. However, if you use the JavaScript interactive console found in the HDInsight Dashboard, the file will be uploaded to the Hadoop folder in HDFS because the console runs under the hadoop user. Consequently, the MapReduce job won&#8217;t be able to find the file. So, use the hadoop command prompt instead.</p>
<p>2. Browse the file system using the web interface (<a href="http://localhost:50070">http://localhost:50070</a>) to see that the file is in your folder.</p>
<p><img loading="lazy" decoding="async" class="alignnone wp-image-2140 size-full" src="/wp-content/uploads/2012/12/122812_2018_Programming4.png" alt="122812_2018_Programming4" width="505" height="157" srcset="https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming4.png 505w, https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming4-450x140.png 450w, https://prologika.com/wp-content/uploads/2012/12/122812_2018_Programming4-300x93.png 300w" sizes="auto, (max-width: 505px) 100vw, 505px" /></p>
<p>3. Finally, we need to execute the job with HadoopJobExecutor, which can be called in various ways. The easiest way is to use MRRunner:<br />
<span style="font-family: Courier New;">D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug&gt;.\mrlib\mrrunner -dll FirstJob.dll</span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">File dependencies to include with job:[Auto-detected] D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\FirstJob.dll </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">[Auto-detected] D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Microsoft.Hadoop.MapReduce.dll </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">[Auto-detected] D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Newtonsoft.Json.dll </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">&gt;&gt;CMD: c:\hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd jar c:\hadoop\hadoop-1.1.0-SNAPSHOT\lib\hadoop-streaming.jar -D &#8220;mapred.map.max.attempts=1&#8221; -D &#8220;mapred.reduce.max.attempts=1&#8221; -input inpu </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">emp -mapper ..\..\jars\Microsoft.Hadoop.MapDriver.exe -reducer ..\..\jars\Microsoft.Hadoop.ReduceDriver.exe -file D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.MapDriver.e </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">p\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.ReduceDriver.exe -file D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.CombineDriver.exe -file &#8220;D:\MyApp\Hadoop\MapRedu </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">irstJob.dll&#8221; -file &#8220;D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Microsoft.Hadoop.MapReduce.dll&#8221; -file &#8220;D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Newtonsoft.Json.dll&#8221; -cmdenv &#8220;MSFT_HADOOP_MA </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">-cmdenv &#8220;MSFT_HADOOP_MAPPER_TYPE=FirstJob.TemperatureMapper&#8221; -cmdenv &#8220;MSFT_HADOOP_REDUCER_DLL=FirstJob.dll&#8221; -cmdenv &#8220;MSFT_HADOOP_REDUCER_TYPE=FirstJob.TemperatureReducer&#8221; </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">packageJobJar: [D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.MapDriver.exe, D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\MRLib\Microsoft.Hadoop.ReduceDriver.exe, D:\MyApp </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">Job\bin\Debug\MRLib\Microsoft.Hadoop.CombineDriver.exe, D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\FirstJob.dll, D:\MyApp\Hadoop\MapReduce\FirstJob\bin\Debug\Microsoft.Hadoop.MapReduce.dll, D </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">e\FirstJob\bin\Debug\Newtonsoft.Json.dll] [/C:/Hadoop/hadoop-1.1.0-SNAPSHOT/lib/hadoop-streaming.jar] C:\Users\Teo\AppData\Local\Temp\streamjob7017247708817804198.jar tmpDir=null </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform&#8230; using builtin-java classes where applicable </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">log4j:ERROR Failed to rename [C:\Hadoop\hadoop-1.1.0-SNAPSHOT\logs/hadoop.log] to [C:\Hadoop\hadoop-1.1.0-SNAPSHOT\logs/hadoop.log.2012-12-27]. </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:20 WARN snappy.LoadSnappy: Snappy native library not loaded </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:20 INFO mapred.FileInputFormat: Total input paths to process : 1 </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:20 INFO streaming.StreamJob: getLocalDirs(): [c:\hadoop\hdfs\mapred\local] </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:20 INFO streaming.StreamJob: Running job: job_201212271510_0010 </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:20 INFO streaming.StreamJob: To kill this job, run: </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:20 INFO streaming.StreamJob: C:\Hadoop\hadoop-1.1.0-SNAPSHOT/bin/hadoop job -Dmapred.job.tracker=localhost:50300 -kill job_201212271510_0010 </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:20 INFO streaming.StreamJob: Tracking URL: http://127.0.0.1:50030/jobdetails.jsp?jobid=job_201212271510_0010 </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:21 INFO streaming.StreamJob: map 0% reduce 0% </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:38 INFO streaming.StreamJob: map 100% reduce 0% </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:50 INFO streaming.StreamJob: map 100% reduce 100% </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:56 INFO streaming.StreamJob: Job complete: job_201212271510_0010 </span></p>
<p><span style="font-family: Courier New; font-size: 8pt;">12/12/28 12:35:56 INFO streaming.StreamJob: Output: output/Temp </span></p>
<p>4. Using the web interface or the JavaScript console, go to the output folder and view the part-00000 file to see the output (it should match your testing results).</p>
<p><img loading="lazy" decoding="async" class="alignnone wp-image-2141 size-full" src="/wp-content/uploads/2012/12/122812_2018_Programming5.png" alt="122812_2018_Programming5" width="268" height="173" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://prologika.com/programming-mapreduce-jobs-with-hdinsight-server-for-windows/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>Installing HDInsight Server for Windows</title>
		<link>https://prologika.com/installing-hdinsight-server-for-windows/</link>
					<comments>https://prologika.com/installing-hdinsight-server-for-windows/#respond</comments>
		
		<dc:creator><![CDATA[Prologika - Teo Lachev]]></dc:creator>
		<pubDate>Thu, 01 Nov 2012 02:10:00 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<guid isPermaLink="false">/CS/blogs/blog/archive/2012/10/31/installing-hdinsight-server-for-windows.aspx</guid>

					<description><![CDATA[As you&#8217;ve probably heard the news, Microsoft rebranded their Big Data offerings as HDInsight that currently encompasses two key services: Windows Azure HDInsight Service (formerly known as Hadoop-based Services on [&#8230;]]]></description>
										<content:encoded><![CDATA[<p>As you&#8217;ve probably heard the news, Microsoft rebranded their Big Data offerings as HDInsight that currently encompasses two key services:</p>
<ul>
<li>Windows Azure HDInsight Service (formerly known as Hadoop-based Services on Windows Azure) – This is a cloud-based Hadoop distribution hosted on Windows Azure.</li>
<li>
<div><a href="http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW">Microsoft HDInsight Server</a> for Windows – A Windows-based Hadoop distribution that offers two main benefits for Big Data customers:</div>
<ul>
<li>An officially supported Hadoop distribution on Windows Server – Previously, you could set up Hadoop on Windows only as an unsupported installation (via Cygwin) for development purposes. What this means for you is that you can now set up a Hadoop cluster on servers running a Windows Server OS.</li>
<li>Extends the reach of the Hadoop ecosystem to .NET developers and allows them to write MapReduce jobs in .NET code, such as C#.</li>
</ul>
</li>
</ul>
<p>Both services are available as preview offerings and changes are expected as they evolve. The <a href="http://social.technet.microsoft.com/wiki/contents/articles/14141.installing-the-developer-preview-of-apachetm-hadooptm-based-services-on-windows.aspx">article</a> &#8220;Installing the Developer Preview of Apache Hadoop-based Services on Windows&#8221; covers the setup steps pretty well. I decided to set up HDInsight Server for Windows via the <a href="http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW">Microsoft Web Platform Installer</a> on my Windows 8 laptop.</p>
<p style="background: #d0cece;"><strong>Note</strong> Initially, I planned to install HDInsight Server for Windows on a VM running Windows Server 2012 Standard Edition. Although the installer completed successfully, it failed to create the sites and shortcuts to the dashboards (Hadoop Name Node, Dashboard, and MapReduce). This was probably caused by the fact that the server was configured as a domain controller. There is an ongoing discussion about this issue on the Microsoft HDInsight <a href="http://social.msdn.microsoft.com/Forums/en-US/hdinsight/thread/a0a25c89-2d28-4f52-83e2-5161211f7d28">forum</a>.</p>
<p>The Windows 8 setup failed to create the shortcut to the dashboard. However, the following steps fixed the issue:</p>
<p>1. Open an Administrator PowerShell prompt and relax the PowerShell execution policy so that it accepts scripts.</p>
<p><span style="font-family: Courier New; font-size: 10pt;">PS:&gt; Set-ExecutionPolicy RemoteSigned </span></p>
<p>2. Navigate to the C:\HadoopFeaturePackSetup\HadoopFeaturePackSetupTools folder:</p>
<p><span style="color: black; font-family: Courier New; font-size: 10pt;">cd C<span style="color: #666666;">:<span style="color: black;">\HadoopFeaturePackSetup\HadoopFeaturePackSetupTools </span></span></span></p>
<p>3. Install HadoopWebApi:</p>
<p><span style="color: black; font-family: Courier New; font-size: 10pt;">.\winpkg.ps1 ..\Packages\HadoopWebApi-winpkg.zip install -CredentialFilePath c:\Hadoop\Singlenodecreds.xml </span></p>
<p>4. Install the dashboard:</p>
<p><span style="color: black; font-family: Courier New;">.\winpkg.ps1 ..\Packages\HadoopDashboard-winpkg.zip install -CredentialFilePath c:\Hadoop\Singlenodecreds.xml </span></p>
<p>This should create the shortcuts on the desktop and you should be able to navigate to http://localhost:8085 to access the dashboard.</p>
<p><img loading="lazy" decoding="async" class="alignnone wp-image-2160 size-full" src="/wp-content/uploads/2012/11/110112_0205_InstallingH1.png" alt="110112_0205_InstallingH1" width="397" height="341" srcset="https://prologika.com/wp-content/uploads/2012/11/110112_0205_InstallingH1.png 397w, https://prologika.com/wp-content/uploads/2012/11/110112_0205_InstallingH1-300x258.png 300w" sizes="auto, (max-width: 397px) 100vw, 397px" /></p>
<p>From here, you can open the Interactive Console and your experience should be the same as Windows Azure HDInsight Service. David Zhang has a <a href="http://www.youtube.com/watch?v=alPMYcomUEs">great coverage</a> of how you can use the Interactive Console in his video presentation &#8220;Introduction to the Hadoop on Azure Interactive JavaScript Console&#8221;.</p>
<p>BTW, HDInsight Server installs a set of Windows services that correspond to the daemons of a UNIX Hadoop installation.</p>
<p><img loading="lazy" decoding="async" class="alignnone wp-image-2161 size-full" src="/wp-content/uploads/2012/11/110112_0205_InstallingH2.png" alt="110112_0205_InstallingH2" width="250" height="193" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://prologika.com/installing-hdinsight-server-for-windows/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Prologika Newsletter Fall 2012</title>
		<link>https://prologika.com/prologika-newsletter-fall-2012/</link>
					<comments>https://prologika.com/prologika-newsletter-fall-2012/#respond</comments>
		
		<dc:creator><![CDATA[Prologika - Teo Lachev]]></dc:creator>
		<pubDate>Fri, 12 Oct 2012 12:35:21 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Newsletter]]></category>
		<guid isPermaLink="false">/CS/blogs/blog/archive/2012/10/12/prologika-newsletter-fall-2012.aspx</guid>

					<description><![CDATA[I started a newsletter and I plan to update it on a quarterly basis. The topic of the first issue is Big Data. If you like the newsletter, you&#8217;re welcome [&#8230;]]]></description>
										<content:encoded><![CDATA[<p>I started a <a href="/community/newsletter-2/">newsletter</a> and I plan to update it on a quarterly basis. The topic of the first issue is Big Data. If you like the newsletter, you&#8217;re welcome to subscribe and forward.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://prologika.com/prologika-newsletter-fall-2012/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
