Know Hadoop 3.0 Features and Enhancements
Hadoop reached its 1.0 release in 2011 and has since undergone major changes across three major versions. Hadoop 3.0 was released on 13 December 2017.
What is in Hadoop 3.0?
Minimum required Java version increased from Java 7 to Java 8.
Shell script rewrite. In Hadoop 2.0 the shell scripts were difficult to follow: developers had to read almost all of them to find the correct environment variable for an option and how to set it, whether for java.library.path, the Java classpath, or GC options. In Hadoop 3.0 the shell scripts have been rewritten to fix long-standing bugs and add new features.
Support for more than two NameNodes. Earlier HDFS high-availability setups had a single active NameNode and a single standby NameNode; Hadoop 3.0 allows users to run multiple standby NameNodes. For instance, by configuring three NameNodes and five JournalNodes, the cluster is able to tolerate the failure of two nodes rather than just one.
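The arithmetic behind the "two nodes rather than just one" figure can be sketched as follows. This is a simplified model, not Hadoop code: it assumes JournalNodes need a majority to stay writable and that one surviving NameNode is enough to serve the cluster. The function names are hypothetical.

```python
def journal_failures_tolerated(journal_nodes: int) -> int:
    """Edits need acknowledgement from a majority of JournalNodes,
    so N JournalNodes can lose at most (N - 1) // 2 of them."""
    if journal_nodes < 1:
        raise ValueError("at least one JournalNode is required")
    return (journal_nodes - 1) // 2


def namenode_failures_tolerated(namenodes: int) -> int:
    """Assumption: a single surviving NameNode can become active,
    so N NameNodes can lose at most N - 1 of them."""
    if namenodes < 1:
        raise ValueError("at least one NameNode is required")
    return namenodes - 1


def cluster_failures_tolerated(namenodes: int, journal_nodes: int) -> int:
    """The weaker of the two limits bounds the whole setup."""
    return min(namenode_failures_tolerated(namenodes),
               journal_failures_tolerated(journal_nodes))


print(cluster_failures_tolerated(2, 3))  # 1: one active, one standby, three JournalNodes
print(cluster_failures_tolerated(3, 5))  # 2: the three-NameNode, five-JournalNode example
```

Under this model, adding a third NameNode without also growing the JournalNode quorum from three to five would still leave the cluster at one tolerated failure.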
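Default ports of multiple services have been changed. Ports that previously fell in the Linux ephemeral port range, where they could collide with ports already handed out to other applications, have been moved out of it. The change affects the NameNode, Secondary NameNode, DataNode, and KMS.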
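As a small illustration of the port move, the sketch below checks a few old and new defaults against the Linux ephemeral range. The range bounds (32768 to 61000) and the port numbers are recalled from the Hadoop 3.0 release notes and should be verified against the hdfs-default.xml of the release in use; the names in the code are hypothetical.

```python
# Assumed Linux ephemeral port range, inclusive at both ends.
EPHEMERAL_RANGE = range(32768, 61001)

# (old default, new default) pairs; values are assumptions to verify
# against the configuration shipped with your Hadoop release.
PORT_CHANGES = {
    "NameNode HTTP UI": (50070, 9870),
    "Secondary NameNode HTTP UI": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode HTTP UI": (50075, 9864),
}


def in_ephemeral_range(port: int) -> bool:
    """Return True if the OS might hand this port out to any client socket."""
    return port in EPHEMERAL_RANGE


for service, (old, new) in PORT_CHANGES.items():
    print(f"{service}: {old} (ephemeral: {in_ephemeral_range(old)})"
          f" -> {new} (ephemeral: {in_ephemeral_range(new)})")
```

Anything that hard-codes the old ports, such as firewall rules, monitoring checks, or bookmarked UI links, needs updating after an upgrade.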
Hadoop now supports integration with Microsoft Azure Data Lake and Aliyun Object Storage System as alternative Hadoop-compatible filesystems.
Intra-DataNode balancer. A single DataNode manages multiple disks. During normal write operations, the disks fill up evenly. However, adding or replacing disks can lead to significant skew within a DataNode. The existing HDFS balancer does not handle this situation, because it concerns itself with inter-DataNode, not intra-DataNode, skew. Hadoop 3.0 handles it with new intra-DataNode balancing functionality, which is invoked through the hdfs diskbalancer CLI.
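To make "skew within a DataNode" concrete, here is a minimal sketch that compares each disk's utilization with the node-wide average. It is an illustration only and not the algorithm the disk balancer actually uses; the function name, the disk names, and the sizes are all made up.

```python
from typing import Dict, Tuple


def disk_skew(disks: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each disk to (its utilization - node-wide utilization).

    `disks` maps a disk name to (used_gb, capacity_gb). A positive value
    means the disk is fuller than the node average, a negative value
    means it is emptier.
    """
    total_used = sum(used for used, _ in disks.values())
    total_capacity = sum(capacity for _, capacity in disks.values())
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    node_utilization = total_used / total_capacity
    return {name: used / capacity - node_utilization
            for name, (used, capacity) in disks.items()}


# Two disks that filled evenly, plus a freshly added empty disk.
node = {
    "disk1": (800.0, 1000.0),
    "disk2": (800.0, 1000.0),
    "disk3": (0.0, 1000.0),
}
for name, skew in disk_skew(node).items():
    print(f"{name}: {skew:+.2f}")
```

In this example the two original disks sit well above the node average and the new disk far below it, even though the DataNode as a whole may look perfectly balanced against its peers.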