Big Data Twitter Demo
So, you have (or have heard of) classic BI, self-service BI, Big Data BI, descriptive BI, predictive BI, or even prescriptive BI, but do you have real-time BI? I've been doing quite a bit of research and work in that area lately. As you can imagine, real-time BI requires a different architecture, one capable of processing streams of data (sometimes thousands of events) in real time. The premium Microsoft technology for Complex Event Processing (CEP) is StreamInsight (it requires a SQL Server license). Microsoft also has a lightweight, open-source .NET library called Rx (Reactive Extensions), which does event streaming but lacks many of the StreamInsight features, such as windowing. To demonstrate how classic BI, Big Data, and real-time BI can play together, Microsoft put together a great sample: the Big Data Twitter Demo.
The demo allows you to subscribe to one or more Twitter topics of interest. It uses StreamInsight to listen to the Twitter activity and extract the tweets that match the topics you "subscribe" to. In the screenshot below, I'm intercepting tweets about Microsoft and SQL Server. The demo then saves the results to a SQL Server table for offline analysis with PowerPivot and Power View (a sample Excel workbook with reports is included). In addition, the demo stores the results in a Hadoop cluster on Windows Azure (HDInsight). I guess the idea is to truncate the operational SQL Server store on a regular basis while archiving all data on Hadoop for future analysis. The demo also includes a dashboard that displays the matching tweets and the hit rate in real time. Behind the scenes, the application uses Web Sockets to communicate with the jQuery code on the client, which updates the dashboard content. (IMO, SignalR would have been a better choice here, since Web Sockets have limited browser and platform support.) For more information about how all of this works, Mike Wilmot covers the demo in more detail.
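To make the "matching tweets and hit rate" idea more concrete, here is a minimal sketch in plain C# and LINQ. It is not the demo's StreamInsight query, which this post does not show. The Tweet class, the HitRateSketch class, and both method names are hypothetical, and the substring match is only an assumption about how a topic might be matched. It counts matching tweets per fixed, non-overlapping (tumbling) time window.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tweet shape; the demo's real event type is not shown in this post.
public sealed class Tweet
{
    public DateTime CreatedAt { get; set; }
    public string Text { get; set; }
}

public static class HitRateSketch
{
    // True when the tweet text contains at least one subscribed topic (case-insensitive).
    public static bool Matches(Tweet tweet, IEnumerable<string> topics)
    {
        return topics.Any(topic =>
            tweet.Text.IndexOf(topic, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Buckets matching tweets into tumbling windows of the given length and
    // returns the number of hits keyed by each window's start time.
    public static IDictionary<DateTime, int> HitsPerWindow(
        IEnumerable<Tweet> tweets, IEnumerable<string> topics, TimeSpan window)
    {
        var topicList = topics.ToList();
        return tweets
            .Where(t => Matches(t, topicList))
            .GroupBy(t => new DateTime(
                t.CreatedAt.Ticks - (t.CreatedAt.Ticks % window.Ticks),
                t.CreatedAt.Kind))
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Calling HitRateSketch.HitsPerWindow(tweets, new[] { "Microsoft", "SQL Server" }, TimeSpan.FromSeconds(10)) would give one count per 10-second window. A CEP engine computes the same kind of aggregate incrementally as events arrive, rather than over a finished collection as this sketch does.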
This is a very impressive demo, and I can imagine how much effort went into building it. I personally believe that we'll see more demand for real-time applications, especially when coupled with predictive analytics, such as detecting outliers or forecasting volumes.
On the downside, a few days after Microsoft released the demo, Twitter discontinued Basic Authentication, which the demo uses to authenticate with Twitter (you need a Twitter account to run it). Twitter now uses OAuth, so I had to tweak the code. Specifically, I added OAuthTokens.cs and WebRequestBuilder.cs from Patrick Smith's Twitterizer library to the StreamInsight.Demos.Twitter.Common class library in the demo. In the same library, I changed the Read method in the TwitterStreaming class as follows:
public TextReader Read() {
    var url = GetURL();

    // Basic Authentication (obsolete; Twitter no longer accepts it)
    //var request = HttpWebRequest.Create(url);
    //request.Timeout = _config.Timeout;
    //request.Credentials = new NetworkCredential(_config.Username, _config.Password);
    //var response = request.GetResponse();

    // Twitter now uses OAuth, which is much more complex to implement,
    // so you need wrapper classes, such as those in Twitterizer.
    OAuthTokens tokens = new OAuthTokens();
    tokens.ConsumerKey = "<your Twitter consumer key>";
    tokens.ConsumerSecret = "<your Twitter consumer secret>";
    tokens.AccessToken = "<your Twitter access token>";
    tokens.AccessTokenSecret = "<your Twitter access token secret>";

    // Build and sign the GET request with the OAuth tokens, then execute it.
    WebRequestBuilder requestBuilder = new WebRequestBuilder(new Uri(url), HTTPVerb.GET, tokens);
    var response = requestBuilder.ExecuteRequest();

    // The caller reads the tweet stream from the returned reader.
    return new StreamReader(response.GetResponseStream());
}
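For context on how the returned reader might be consumed, here is a minimal sketch. It assumes the stream is newline-delimited, with one message per line and blank lines sent as keep-alives. The demo's own parsing code is not shown in this post, so the StreamLines class and the ReadMessages method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamLines
{
    // Yields one raw message per non-blank line until the reader is exhausted.
    // Assumption: blank lines are keep-alive signals and carry no data.
    public static IEnumerable<string> ReadMessages(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0)
            {
                continue; // skip keep-alive lines
            }
            yield return line;
        }
    }
}
```

With the Read method above, a caller could wrap the reader in a using block and iterate over StreamLines.ReadMessages(reader), handing each line to whatever parses the tweet.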
To get the dashboard to work, I also had to use Safari, because for some obscure reason Web Sockets won't work for me with IE 10 on Windows 8. If I have time, I plan to cover StreamInsight and real-time BI in more detail in future posts.