Learning Hadoop: The benefits of Hadoop commercial distributions

What are the benefits of using a commercial distribution of Hadoop? And what are the popular commercial distributions of Hadoop?

Hadoop, the preeminent open-source platform for retrieving, processing, storing, and analyzing very large amounts of data, has grown tremendously from core components modeled on designs published by Google into a powerful ecosystem of supporting tools. There are tools for integrating, streaming, storing, searching, and retrieving data, and tools for security and resource management, among others. New tools keep emerging at a rapid pace.

On top of the normal management of a Hadoop cluster, a team has to keep these tools on versions that are compatible with each other, keep patches up to date, plug in new tools as they become available, and make sure everything works well together. For a small team this can become overwhelming. Using a commercial distribution of Hadoop alleviates the problem.

Commercial distributions of Hadoop bundle the various tools of the ecosystem in compatible versions, ensure that they all work together, apply patches, package the software so that it is easy to download and install, and provide tools for managing the platform. For production projects that support important business goals, it’s best to use a commercial distribution instead of trying to handle it all on your own. This gives your team more time to focus on building business solutions instead of solving pesky technology issues.
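To make the version-compatibility chore concrete, here is a minimal sketch of the kind of check a team would otherwise maintain by hand. Everything in it is an assumption for illustration: the component names (`storage`, `query`, `streaming`), the bundle names, and the version numbers are invented and do not describe any real distribution.

```python
# Hypothetical compatibility matrix: each bundle lists component versions
# that are assumed to have been tested together. All names and numbers
# here are invented for illustration only.
TESTED_BUNDLES = {
    "bundle-1": {"storage": "1.0", "query": "0.9", "streaming": "0.5"},
    "bundle-2": {"storage": "2.0", "query": "1.2", "streaming": "0.8"},
}


def matching_bundles(installed):
    """Return the names of bundles whose versions all match the installed ones."""
    return [
        name
        for name, versions in TESTED_BUNDLES.items()
        if all(installed.get(component) == version
               for component, version in versions.items())
    ]


def mismatches(installed, bundle_name):
    """Return {component: (installed, expected)} for components that differ."""
    expected = TESTED_BUNDLES[bundle_name]
    return {
        component: (installed.get(component), version)
        for component, version in expected.items()
        if installed.get(component) != version
    }


if __name__ == "__main__":
    # A cluster where one component lags behind the tested bundle.
    cluster = {"storage": "2.0", "query": "0.9", "streaming": "0.8"}
    print(matching_bundles(cluster))
    print(mismatches(cluster, "bundle-2"))
```

With a commercial distribution, the vendor effectively maintains and tests this matrix for you, so the installed set always corresponds to one known-good bundle.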

Some of the most popular commercial distributions of Hadoop (not in any specific order) are:

  • Cloudera Distribution Including Apache Hadoop (CDH)
    • Some major technology vendors, such as Oracle and Dell, offer their own flavors of CDH
  • Hortonworks Data Platform (HDP)
  • Amazon Elastic MapReduce
  • MapR Hadoop Distribution
  • IBM Open Platform
  • Microsoft Azure’s HDInsight
  • Pivotal Big Data Suite
  • Datameer Professional
  • DataStax Enterprise Analytics

I will provide details of the various distributions in future posts.