封尘网

Make learning a habit!

Compiling the hadoop-2.7.3 source with snappy compression support

Hadoop 2.7.3 is a minor release based on the stable Hadoop 2.7.2 line; the main goal of this build is to enable snappy compression support.


Below is an overview of the main features and improvements in Hadoop 2.7.3:

Common

  • Authentication improvements when using an HTTP proxy server. This is particularly useful when accessing WebHDFS through a proxy.
  • A new Hadoop metrics sink that can write directly to Graphite.
  • Specification work related to the Hadoop Compatible Filesystem (HCFS) effort.

HDFS

  • Support for POSIX-style filesystem extended attributes.
  • Using the OfflineImageViewer, clients can browse an fsimage via the WebHDFS API.
  • The NFS gateway received a number of supportability improvements and bug fixes.
  • The SecondaryNameNode, JournalNode, and DataNode web UIs have been modernized with HTML5 and JavaScript.

YARN

  • YARN's REST APIs now support write/modify operations; users can submit and kill applications through them.
  • The YARN Timeline Store, which stores generic and application-specific information for applications, supports Kerberos authentication.
  • The Fair Scheduler supports dynamic hierarchical user queues.

Build environment used here: a virtual machine
CentOS 6.5, 64-bit
JDK: 1.7.0
Maven: 3.3.9
Findbugs: 3.0.1
protobuf: 2.5.0
ant: 1.9.7

Memory: 2 GB

The build downloads a large number of Maven artifacts, so it is fairly demanding on the network; build speed mostly depends on how fast those dependencies download.

Source tarball download:

http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz

tar xf hadoop-2.7.3-src.tar.gz
cd hadoop-2.7.3-src

[root@moban hadoop-2.7.3-src]# head -15 BUILDING.txt   # check the minimum build requirements

Build instructions for Hadoop
----------------------------------------------------------------------------------
Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
* Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
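The requirements list above can be sanity-checked before starting, e.g. with a short shell loop (a minimal sketch; the tool names are the usual binaries and may differ on your system):

```shell
#!/bin/sh
# check_tools: report whether each given command is resolvable on PATH.
check_tools() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
        fi
    done
}

# Tool names assumed from the requirements list; adjust for your system.
check_tools java mvn protoc cmake gcc
```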

Now install the required libraries:

yum -y install svn ncurses-devel gcc*
yum -y install lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel

Build and install protobuf:

Download: https://github.com/google/protobuf
tar zxvf protobuf-2.5.0.tar.gz
mv protobuf-2.5.0 /usr/local/
cd /usr/local/protobuf-2.5.0
./configure
make && make install

Verify the installation:

[root@moban protobuf-2.5.0]# protoc
Missing input file.

Check the version:

[root@moban protobuf-2.5.0]# protoc --version
libprotoc 2.5.0

If it prints libprotoc 2.5.0 as above, protobuf is installed correctly.
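Beyond the version check, protoc can be smoke-tested by compiling a trivial .proto file (a sketch with a made-up Ping message; protobuf 2.5 uses proto2 syntax, so `required` is valid):

```shell
#!/bin/sh
# Smoke-test protoc: compile a throwaway proto2 message to Python.
tmp=$(mktemp -d)
cat > "$tmp/test.proto" <<'EOF'
message Ping {
  required string msg = 1;
}
EOF
if protoc --proto_path="$tmp" --python_out="$tmp" "$tmp/test.proto"; then
    echo "protoc OK"
else
    echo "protoc FAILED"
fi
rm -rf "$tmp"
```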

ant download:

http://apache.fayea.com/ant/binaries/apache-ant-1.9.7-bin.tar.gz

tar xf apache-ant-1.9.7-bin.tar.gz -C /usr/local

findbugs-3.0.1 download:

http://prdownloads.sourceforge.net/findbugs/findbugs-3.0.1.tar.gz
tar xf findbugs-3.0.1.tar.gz
mv findbugs-3.0.1 /usr/local/

Check the version:

[root@moban local]# findbugs -version
3.0.1

maven download:

http://mirrors.cnnic.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz

tar xf apache-maven-3.3.9-bin.tar.gz -C /usr/local/

Install snappy-1.1.3

Download: https://github.com/google/snappy/releases/download/1.1.3/snappy-1.1.3.tar.gz
tar xf snappy-1.1.3.tar.gz -C /usr/local/
cd /usr/local/snappy-1.1.3
./configure
make && make install

After installation, verify the result:

[root@moban snappy-1.1.3]# ls -lh /usr/local/lib |grep snappy
-rw-r--r-- 1 root root 462K Dec 20 12:06 libsnappy.a
-rwxr-xr-x 1 root root  955 Dec 20 12:06 libsnappy.la
lrwxrwxrwx 1 root root   18 Dec 20 12:06 libsnappy.so -> libsnappy.so.1.3.0
lrwxrwxrwx 1 root root   18 Dec 20 12:06 libsnappy.so.1 -> libsnappy.so.1.3.0
-rwxr-xr-x 1 root root 223K Dec 20 12:06 libsnappy.so.1.3.0
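One optional caveat: on CentOS 6, /usr/local/lib is not necessarily on the runtime linker path, so if later tools fail to load libsnappy.so, the path can be registered (a sketch, run as root; the file name usr-local.conf is arbitrary):

```shell
# Register /usr/local/lib with the dynamic linker (run as root).
echo "/usr/local/lib" > /etc/ld.so.conf.d/usr-local.conf
ldconfig
# Confirm the linker now resolves the snappy library:
ldconfig -p | grep snappy
```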

JDK download:

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Configure the environment variables:

vim ~/.bash_profile

export PATH
export JAVA_HOME=/usr/local/jdk1.7.0_79
export JRE_HOME=/usr/local/java/jre
export CLASSPATH=$CLASSPATH:./:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export MAVEN_HOME=/usr/local/apache-maven-3.3.9
export SCALA_HOME=/usr/local/scala-2.11.8       # only needed when building Spark
export FINDBUGS_HOME=/usr/local/findbugs-3.0.1
export PROTOBUF_HOME=/usr/local/protobuf-2.5.0
export ANT_HOME=/usr/local/apache-ant-1.9.7
export PATH=$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin:$SCALA_HOME/bin:$FINDBUGS_HOME/bin:$ANT_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

source ~/.bash_profile

Build command:

mvn clean package -Pdist,native -DskipTests -Dtar -Dbundle.snappy -Dsnappy.lib=/usr/local/lib

Parameter explanation:

  • -Pdist,native : build the distribution together with the recompiled native Hadoop libraries;
  • -DskipTests : skip the tests;
  • -Dtar : package the final result as a tarball;
  • -Dbundle.snappy : bundle snappy support (the official binary download does not include it);
  • -Dsnappy.lib=/usr/local/lib : the path where snappy was installed on the build machine.
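Since the full build runs for about an hour, it can be convenient to run it unattended and keep a log (a sketch, not from the original article; nohup is just one option):

```shell
#!/bin/sh
# Run the hour-long build unattended; stdout and stderr both go to
# build.log so failures can be inspected afterwards.
cd /root/soft/hadoop-2.7.3-src && \
nohup mvn clean package -Pdist,native -DskipTests -Dtar \
      -Dbundle.snappy -Dsnappy.lib=/usr/local/lib \
      > build.log 2>&1 &
# Watch progress with: tail -f build.log
```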

Build result:

[INFO] ------------------------------------------------------------------------ 
[INFO] Reactor Summary: 
[INFO]  
[INFO] Apache Hadoop Main ................................. SUCCESS [  2.469 s] 
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [  1.331 s] 
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  1.582 s] 
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  3.130 s] 
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.227 s] 
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.957 s] 
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  4.687 s] 
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  7.643 s] 
[INFO] Apache Hadoop Auth ................................. SUCCESS [  7.308 s] 
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  2.713 s] 
[INFO] Apache Hadoop Common ............................... SUCCESS [01:34 min] 
[INFO] Apache Hadoop NFS .................................. SUCCESS [  5.755 s] 
[INFO] Apache Hadoop KMS .................................. SUCCESS [02:47 min] 
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.054 s] 
[INFO] Apache Hadoop HDFS ................................. SUCCESS [03:55 min] 
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [02:31 min] 
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [03:53 min] 
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  3.825 s] 
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.043 s] 
[INFO] hadoop-yarn ........................................ SUCCESS [  0.029 s] 
[INFO] hadoop-yarn-api .................................... SUCCESS [01:23 min] 
[INFO] hadoop-yarn-common ................................. SUCCESS [01:42 min] 
[INFO] hadoop-yarn-server ................................. SUCCESS [  0.057 s] 
[INFO] hadoop-yarn-server-common .......................... SUCCESS [ 15.157 s] 
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 19.710 s] 
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  3.938 s] 
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 10.158 s] 
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 19.264 s] 
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [  5.756 s] 
[INFO] hadoop-yarn-client ................................. SUCCESS [  8.633 s] 
[INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [  3.851 s] 
[INFO] hadoop-yarn-applications ........................... SUCCESS [  0.029 s] 
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  2.681 s] 
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  1.785 s] 
[INFO] hadoop-yarn-site ................................... SUCCESS [  0.046 s] 
[INFO] hadoop-yarn-registry ............................... SUCCESS [  6.323 s] 
[INFO] hadoop-yarn-project ................................ SUCCESS [  3.170 s] 
[INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.149 s] 
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 21.400 s] 
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 17.242 s] 
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  4.433 s] 
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 10.327 s] 
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [  6.912 s] 
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [01:59 min] 
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  1.805 s] 
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  7.366 s] 
[INFO] hadoop-mapreduce ................................... SUCCESS [  2.525 s] 
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 33.129 s] 
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  8.177 s] 
[INFO] Apache Hadoop Archives ............................. SUCCESS [  1.759 s] 
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  6.449 s] 
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  4.048 s] 
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  2.997 s] 
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  1.976 s] 
[INFO] Apache Hadoop Extras ............................... SUCCESS [  3.139 s] 
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  6.732 s] 
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  5.706 s] 
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [33:55 min] 
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 53.715 s] 
[INFO] Apache Hadoop Client ............................... SUCCESS [  6.341 s] 
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.645 s] 
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  6.669 s] 
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  7.077 s] 
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.021 s] 
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 43.940 s] 
[INFO] ------------------------------------------------------------------------ 
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------ 
[INFO] Total time: 01:00 h
[INFO] Finished at: 2016-12-20T11:57:33+08:00
[INFO] Final Memory: 124M/766M

The compilation itself is not that long, about one hour in total. The rest comes down to network speed while fetching dependencies; when people complain about how long the build takes, an unstable network and slow dependency downloads are usually the real cause.

Problems encountered during the build:

Error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (dist) on project hadoop-kms: An Ant BuildException has occured: exec returned: 2 around Ant part ...<exec dir="/root/soft/hadoop-2.7.3-src/hadoop-common-project/hadoop-kms/target" executable="sh" failonerror="true">... @ 10:118 in /root/soft/hadoop-2.7.3-src/hadoop-common-project/hadoop-kms/target/antrun/build-main.xml
[ERROR] -> [Help 1]

Possible causes and fixes:

  • ant may not be installed correctly, or its environment variables are misconfigured.

  • If (1) is confirmed to be fine, then apache-tomcat-6.0.44.tar.gz probably failed to download completely.

  • Add a regular user, e.g. hadoop, and give it ownership of the whole hadoop-2.7.3-src directory:

useradd hadoop
chown -R hadoop:hadoop hadoop-2.7.3-src

Another observation: Maven downloads dependencies automatically during the build, and apache-tomcat-6.0.44.tar.gz ends up being downloaded twice:

[mkdir] Created dir: /root/soft/hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs-httpfs/downloads
[mkdir] Created dir: /root/soft/hadoop-2.7.3-src/hadoop-common-project/hadoop-kms/downloads

The same file is stored twice, in two different directories:

[root@moban hadoop-2.7.3-src]# ls /root/soft/hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs-httpfs/downloads
apache-tomcat-6.0.44.tar.gz

[root@moban hadoop-2.7.3-src]# ls /root/soft/hadoop-2.7.3-src/hadoop-common-project/hadoop-kms/downloads
apache-tomcat-6.0.44.tar.gz

If it were downloaded only once, the build would obviously take less time. This file is not stored in the local Maven repository; it is fetched per project, and the downloads directories above are created when those projects are built. This is only an observation and does not affect the build.
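If a copy of apache-tomcat-6.0.44.tar.gz already exists locally, it can be pre-seeded into both downloads directories before the build so that neither module has to fetch it (a sketch; SRC and TREE are placeholder variables for your local copy and source tree):

```shell
#!/bin/sh
# Pre-seed the Tomcat tarball into both module downloads dirs so
# neither the httpfs nor the kms build has to fetch it again.
# SRC and TREE are assumptions: point them at your own paths.
SRC=${SRC:-/root/soft/apache-tomcat-6.0.44.tar.gz}
TREE=${TREE:-/root/soft/hadoop-2.7.3-src}
for d in hadoop-hdfs-project/hadoop-hdfs-httpfs/downloads \
         hadoop-common-project/hadoop-kms/downloads; do
    mkdir -p "$TREE/$d"
    if [ -f "$SRC" ]; then
        cp "$SRC" "$TREE/$d/"
    fi
done
```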

Finally, how do you confirm that the finished build supports snappy compression?

[root@moban native]# pwd
/root/soft/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/lib/native
[root@moban native]# ll libsnappy.*
-rw-r--r-- 1 root root 472950 Dec 20 15:08 libsnappy.a
-rwxr-xr-x 1 root root    955 Dec 20 15:08 libsnappy.la
lrwxrwxrwx 1 root root     18 Dec 20 15:08 libsnappy.so -> libsnappy.so.1.3.0
lrwxrwxrwx 1 root root     18 Dec 20 15:08 libsnappy.so.1 -> libsnappy.so.1.3.0 
-rwxr-xr-x 1 root root 228177 Dec 20 15:08 libsnappy.so.1.3.0

If these files are present in this directory, the build successfully bundled snappy support. The same check can be done after installing Hadoop, with the command:

hadoop checknative
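hadoop checknative prints one line per native library with a true/false flag, so the snappy check can be scripted (a sketch, assuming hadoop is on PATH of an installed node; the sample line in the comment reflects the usual output format):

```shell
#!/bin/sh
# Scriptable snappy check. `hadoop checknative` prints lines of the form
#   snappy:  true /usr/local/hadoop/lib/native/libsnappy.so.1
# so grepping for "true" on the snappy line is enough.
if hadoop checknative 2>/dev/null | grep "^snappy:" | grep -q "true"; then
    echo "snappy: supported"
else
    echo "snappy: NOT supported"
fi
```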
