Hadoop Version History and Features

By | April 12

Versions and Features

Hadoop has seen significant interest over the past few years. This has led to a proportional uptick in features and bug fixes. Some of these features were so significant or had such a sweeping impact that they were developed on branches. As you might expect, this in turn led to a somewhat dizzying array of releases and parallel lines of development.

Here is a whirlwind tour of the various lines of development and their status. This information is also depicted visually in Figure 4-1.


The 0.20 branch of Hadoop is extremely stable and has seen quite a bit of production burn-in. This branch has been one of the longest-lived branches in Hadoop’s history since being at Apache, with the first release appearing in April 2009. CDH2 and CDH3 are both based off of this branch, albeit with many features and bug fixes from 0.21, 0.22, and 1.0 back-ported.


One of the features missing from 0.20 was support for file appends in HDFS. Apache HBase relies on the ability to sync its write-ahead log (that is, to force file contents to disk), which, under the hood, uses the same basic functionality as file append. Append was considered a potentially destabilizing feature and many disagreed on the implementation, so it was relegated to a branch. This branch was called 0.20-append. No official release was ever made from the 0.20-append branch.


Yahoo!, one of the major contributors to Apache Hadoop, invested in adding full Kerberos support to core Hadoop. It later contributed this work back to Hadoop in the form of the 0.20-security branch, a version of Hadoop 0.20 with Kerberos authentication support. This branch would later be released as the 0.20.20X releases.


There was a strong desire within the community to produce an official release of Hadoop that included the 0.20-security work. The 0.20.20X releases contained not only security features from 0.20-security, but also bug fixes and improvements on the 0.20 line of development. Generally, it no longer makes sense to deploy these releases as they’re superseded by 1.0.0.


The 0.21 branch was cut from Hadoop trunk and released in August 2010. This was considered a developer preview or alpha-quality release, intended to highlight some of the features that were in development at the time. Despite the warning from the Hadoop developers, a small number of users deployed the 0.21 release anyway. This release does not include security, but does have append.


Hold on, because this is where the story gets weird. In December 2011, the Hadoop community released version 0.22, which was based on trunk, like 0.21 was. This release includes security, but only for HDFS. Also a bit strange: 0.22 was released after 0.23, yet with less functionality. This is a consequence of when the 0.22 branch was cut from trunk.


In November 2011, version 0.23 of Hadoop was released. Also cut from trunk, 0.23 includes security, append, YARN, and HDFS federation. This release has been dubbed a developer preview or alpha-quality release. This line of development is superseded by 2.0.0.


In a continuing theme of confusion, version 1.0.0 of Hadoop was released from the 0.20.205 line of development. This means that 1.0.0 does not contain all of the features and fixes found in the 0.21, 0.22, and 0.23 releases. That said, it does include security.


In May 2012, version 2.0.0 was released from the 0.23.0 branch and, like 0.23.0, is considered alpha-quality. Mainly, this is because it includes YARN and removes the traditional MRv1 jobtracker and tasktracker daemons. While YARN is API-compatible with MRv1, the underlying implementation is different enough that it requires more significant testing before being considered production-ready.

Figure 4-1. Hadoop branches and releases


