February 4, 2014 · Cloud Computing Hadoop HDInsights

Why Windows HDInsights is going nowhere


Everyone wants to do big data, and Microsoft is no exception. Nobody wants to miss the chance to jump on the big data bandwagon and cash in. There are already many players in the market: AWS, Cloudera, MapR, Hortonworks, IBM, Intel and, of course, the open source Hadoop ecosystem. The stakes are high and Microsoft knows it. That is why they jumped in with Windows Azure and Windows HDInsights: IaaS and PaaS.

I had a chance to work with Microsoft's IaaS and PaaS services. I have been using Windows HDInsights for some time now, and based on my experience with AWS and Cloudera-based services I can tell that HDInsights is going nowhere. It is a half-baked solution thrown out in haste to establish a presence. And the strategy seems to be simple: grab some lab rats and improve the offering at clients’ expense. HDInsights comes with Hadoop, Hive, Oozie, Pig and Sqoop preinstalled, and even desktop shortcuts to the JobTracker and NameNode UIs. Now that's some neat work. The cluster stores all its data on the Windows blob store (similar to S3) by default.

Microsoft has come up with what it is best at (understatement?): a good UI, with few-click abstractions for complex tasks such as spinning up a Hadoop cluster. Once one is over the feel-good factor of the UI, clear problems come up.

Here is a list of the areas I found most problematic:

would take 30-60 seconds.
- A running cluster with a small or moderate data set would take at least 3 minutes to finish a MapReduce job, while the same job with the same configuration would finish in 45 seconds on AWS!
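
To put the job timings above in perspective, here is a minimal sketch of the arithmetic. The function name `slowdown_factor` is made up for illustration, and the only inputs are the two figures quoted above; this is not a benchmark.

```python
def slowdown_factor(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times longer the slow run took than the fast run."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


# Figures quoted above: at least 3 minutes on HDInsights, 45 seconds on AWS.
hdinsights_seconds = 3 * 60
aws_seconds = 45

factor = slowdown_factor(hdinsights_seconds, aws_seconds)
print(f"HDInsights took at least {factor:.1f}x as long as AWS")
```

Since 3 minutes is a lower bound, the same job took at least four times as long on HDInsights.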

And the list could go on. One could argue that they are still adapting to the ecosystem, but they are doing it at clients’ expense. The only use case I can see HDInsights being fit for is running the Hadoop wordcount example!