Compiling Hadoop from Source, Installation and Configuration


Original reference blog: https://blog.csdn.net/liweihope/article/details/89605340

Table of contents

- Preface
- Build environment requirements
- Install dependency libraries
- Upload the software to the system
- Install the JDK
- Install Maven
- Install protobuf
- Compile Hadoop
- Pseudo-distributed deployment
- Configure SSH
- Modify the configuration files
- Format the NameNode
- Start Hadoop
- Verify Hadoop
- Check whether compression is supported

Preface

The hadoop-2.6.0-cdh5.7.0.tar.gz downloaded directly from the official site does not support compression after deployment, which is unacceptable in production. You can check which native libraries Hadoop supports with the hadoop checknative command:

#This deployment from the downloaded binary package does not support compression

[hadoop@hadoop001 ~]$ hadoop checknative

19/04/27 10:22:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Native library checking:

hadoop: false

zlib: false

snappy: false

lz4: false

bzip2: false

openssl: false

19/04/27 10:22:34 INFO util.ExitUtil: Exiting with status 1

[hadoop@hadoop001 ~]$

Build environment requirements

- Unix System
- JDK 1.7+
- Maven 3.0 or later
- Findbugs 1.3.9 (if running findbugs)
- ProtocolBuffer 2.5.0
- CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
- Zlib devel (if compiling native code)
- openssl devel (if compiling native hadoop-pipes)
- Internet connection for first build (to fetch all Maven and Hadoop dependencies)

Component    Version
CentOS       centos 6.6
Hadoop       hadoop-2.6.0-cdh5.7.0-src.tar.gz
JDK          jdk-7u80-linux-x64.tar.gz
Maven        apache-maven-3.3.9-bin.tar.gz
protobuf     protobuf-2.5.0.tar.gz
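Before starting, it can help to see which of these tools are already on the machine. A minimal check along these lines (assuming the tools are on PATH; anything missing simply reports "command not found") shows the versions the build will pick up:

java -version        #should report 1.7.x after the JDK step below
mvn -version         #3.0 or later
protoc --version     #must print libprotoc 2.5.0
cmake --version      #2.6 or newer
gcc --version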

Note: the JDK used for the build must be version 1.7; building with JDK 1.8 fails. You can keep several JDK versions installed and switch between them by uncommenting the one you need in the environment variables and commenting out the others. The overall process: after downloading the software listed above, install the JDK, Maven, protobuf and the related dependencies, compile the Hadoop source with Maven, then deploy the resulting Hadoop build and verify that it works.

Install dependency libraries

As the root user:

[root@hadoop001 ~]# yum install -y svn ncurses-devel

[root@hadoop001 ~]# yum install -y gcc gcc-c++ make cmake

[root@hadoop001 ~]# yum install -y openssl openssl-devel svn ncurses-devel zlib-devel libtool

[root@hadoop001 ~]# yum install -y snappy snappy-devel bzip2 bzip2-devel lzo lzo-devel lzop autoconf automake cmake


Upload the software to the system

A hadoop user already existed on my system; if you do not have one, create the hadoop user first.

#Create three directories: software, the Maven local repository, and source code

[hadoop@hadoop001 ~]$ mkdir soft maven_repo source

[hadoop@hadoop001 ~]$ cd soft/

[hadoop@hadoop001 soft]$ rz

[hadoop@hadoop001 soft]$ ll

total 202192

-rw-r--r-- 1 hadoop hadoop 8491533 Apr 27 09:19 apache-maven-3.3.9-bin.tar.gz

-rw-r--r-- 1 hadoop hadoop 42610549 Apr 27 09:19 hadoop-2.6.0-cdh5.7.0-src.tar.gz

-rw-r--r-- 1 hadoop hadoop 153530841 Apr 27 09:18 jdk-7u80-linux-x64.tar.gz

-rw-r--r-- 1 hadoop hadoop 2401901 Apr 27 09:18 protobuf-2.5.0.tar.gz


Install the JDK

As the root user, extract the archive. The installation directory must be /usr/java, and remember to change the owner to root after extracting.
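If /usr/java does not exist yet (an assumption about a fresh system), create it before extracting:

[root@hadoop001 ~]# mkdir -p /usr/java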

[root@hadoop001 ~]# tar -zxvf /home/hadoop/soft/jdk-7u80-linux-x64.tar.gz -C /usr/java/

[root@hadoop001 ~]# cd /usr/java

[root@hadoop001 java]# ll

total 169220

drwxr-xr-x 8 uucp 143 4096 Apr 11 2015 jdk1.7.0_80

drwxr-xr-x 8 root root 4096 Apr 11 2015 jdk1.8.0_45

-rw-r--r-- 1 root root 173271626 Jan 31 00:34 jdk-8u45-linux-x64.gz

[root@hadoop001 java]# chown -R root:root jdk1.7.0_80

[root@hadoop001 java]# ll

total 169220

drwxr-xr-x 8 root root 4096 Apr 11 2015 jdk1.7.0_80

drwxr-xr-x 8 root root 4096 Apr 11 2015 jdk1.8.0_45

-rw-r--r-- 1 root root 173271626 Jan 31 00:34 jdk-8u45-linux-x64.gz

[root@hadoop001 java]#


Add the environment variables. Because JDK 1.8 was installed on my machine earlier, I just comment out the 1.8 entry here.

[root@hadoop001 java]# vi /etc/profile

#Add the following environment variables

#export JAVA_HOME=/usr/java/jdk1.8.0_45

export JAVA_HOME=/usr/java/jdk1.7.0_80

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

#Test whether Java was installed successfully

[root@hadoop001 java]# source /etc/profile

[root@hadoop001 java]# java -version

java version "1.7.0_80"

Java(TM) SE Runtime Environment (build 1.7.0_80-b15)

Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

[root@hadoop001 java]#


Install Maven

As the hadoop user:

[root@hadoop001 java]# su - hadoop

#create ~/app first if it does not exist yet
[hadoop@hadoop001 ~]$ mkdir -p ~/app

[hadoop@hadoop001 ~]$ tar -zxvf ~/soft/apache-maven-3.3.9-bin.tar.gz -C ~/app/


Add environment variables

#Modify the hadoop user's environment variables

[hadoop@hadoop001 ~]$ vi ~/.bash_profile

#Add or modify the following. Note that MAVEN_OPTS sets the memory available to Maven, to keep the build from failing due to too little memory

export MAVEN_HOME=/home/hadoop/app/apache-maven-3.3.9

export MAVEN_OPTS="-Xms1024m -Xmx1024m"

export PATH=$MAVEN_HOME/bin:$PATH

#Test whether Maven was installed successfully

[hadoop@hadoop001 ~]$ source ~/.bash_profile

[hadoop@hadoop001 ~]$ which mvn

~/app/apache-maven-3.3.9/bin/mvn

[hadoop@hadoop001 ~]$
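As an optional extra check, mvn -version also reports which JDK Maven runs on; after the steps above it should show Maven 3.3.9 and Java 1.7.0_80:

[hadoop@hadoop001 ~]$ mvn -version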


Configure Maven

[hadoop@hadoop001 protobuf-2.5.0]$ vim ~/app/apache-maven-3.3.9/conf/settings.xml

#Set the location of the Maven local repository
<localRepository>/home/hadoop/maven_repo/repo</localRepository>

#Add the Aliyun central repository mirror. Note that it must be placed inside the <mirrors></mirrors> element, otherwise the build will fail later
<mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
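To confirm the local repository setting was picked up (an optional check, not part of the original walkthrough), the maven-help-plugin can print the effective value; it appears among the INFO output:

[hadoop@hadoop001 ~]$ mvn help:evaluate -Dexpression=settings.localRepository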


Tip: on a slow network, the first mvn build can take an extremely long time to download dependencies or even fail. If you or someone else already has a populated local repository, you can copy that Maven local repository into /home/hadoop/maven_repo/, as sketched below.
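A rough sketch of seeding the repository that way, assuming the borrowed repository has already been uploaded to ~/soft/repo_backup (that path is only an example):

#copy an existing Maven repository into the configured local repository location
[hadoop@hadoop001 ~]$ cp -r ~/soft/repo_backup/* /home/hadoop/maven_repo/repo/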

Install protobuf

As the hadoop user, extract the archive:

[hadoop@hadoop001 ~]$ tar -zxvf ~/soft/protobuf-2.5.0.tar.gz -C ~/app/

[hadoop@hadoop001 ~]$ cd ~/app/protobuf-2.5.0

[hadoop@hadoop001 protobuf-2.5.0]$ ls

aclocal.m4 config.sub depcomp install-sh Makefile.am python

autogen.sh configure editors INSTALL.txt Makefile.in README.txt

CHANGES.txt configure.ac examples java missing src

config.guess CONTRIBUTORS.txt generate_descriptor_proto.sh ltmain.sh protobuf-lite.pc.in vsprojects

config.h.in COPYING.txt gtest m4 protobuf.pc.in


Build protobuf

#./configure --prefix specifies where the compiled package will be placed, i.e. the software installation path

#make compiles the code, make install installs it

[hadoop@hadoop001 protobuf-2.5.0]$ ./configure --prefix=/home/hadoop/app/protobuf-2.5.0

[hadoop@hadoop001 protobuf-2.5.0]$ make

[hadoop@hadoop001 protobuf-2.5.0]$ make install


Add environment variables

[hadoop@hadoop001 protobuf-2.5.0]$ vi ~/.bash_profile

#Append the following two lines

export PROTOBUF_HOME=/home/hadoop/app/protobuf-2.5.0

export PATH=$PROTOBUF_HOME/bin:$PATH

#Test whether it took effect; if libprotoc 2.5.0 is printed, it works

[hadoop@hadoop001 protobuf-2.5.0]$ source ~/.bash_profile

[hadoop@hadoop001 protobuf-2.5.0]$ protoc --version

libprotoc 2.5.0

[hadoop@hadoop001 protobuf-2.5.0]$


Compile Hadoop

As the hadoop user, extract the source:

[hadoop@hadoop001 ~]$ tar -zxvf ~/soft/hadoop-2.6.0-cdh5.7.0-src.tar.gz -C ~/source/

#Enter the Hadoop source directory
[hadoop@hadoop001 ~]$ cd ~/source/hadoop-2.6.0-cdh5.7.0/

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ ls

BUILDING.txt hadoop-client hadoop-mapreduce1-project hadoop-project LICENSE.txt

cloudera hadoop-common-project hadoop-mapreduce-project hadoop-project-dist NOTICE.txt

dev-support hadoop-dist hadoop-maven-plugins hadoop-tools pom.xml

hadoop-assemblies hadoop-hdfs-project hadoop-minicluster hadoop-yarn-project README.txt


Compile Hadoop so that it supports compression: mvn clean package -Pdist,native -DskipTests -Dtar

#Enter the Hadoop source directory

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cd ~/source/hadoop-2.6.0-cdh5.7.0/

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ mvn clean package -Pdist,native -DskipTests -Dtar

#Run the build. The first build downloads many dependency jars; how long it takes depends on your network speed, so be patient

#I copied someone else's local repository into mine, so my build was fast
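If the build dies partway through (a flaky network is the usual cause), Maven does not have to start from scratch: the -rf/--resume-from option restarts the reactor at a given module. The module name below is only a placeholder; use the module reported in the error message:

#resume the build from the module that failed (":hadoop-hdfs" is a placeholder)
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ mvn package -Pdist,native -DskipTests -Dtar -rf :hadoop-hdfs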


If the build fails, the key part of the error may look like the following (skip this if you hit no error; I did not encounter it myself, someone else did, so it is included here for reference):

[FATAL] Non-resolvable parent POM for org.apache.hadoop:hadoop-main:2.6.0-cdh5.7.0: Could not transfer artifact com.cloudera.cdh:cdh-root:pom:5.7.0 from/to cdh.repo (https://repository.cloudera.com/artifactory/cloudera-repos): Remote host closed connectio

#Analysis: the file https://repository.cloudera.com/artifactory/cloudera-repos/com/cloudera/cdh/cdh-root/5.7.0/cdh-root-5.7.0.pom could not be downloaded, even though the VM could ping the remote repository, which is puzzling.

#Solution: go to the target directory inside the local repository and fetch the file with wget, then re-run the build command; or, as in the optional tip above, put the required jars directly into the local repository
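A sketch of that wget workaround; the directory follows Maven's standard local-repository layout under the repo path configured earlier, so adjust it if your repository lives elsewhere:

[hadoop@hadoop001 ~]$ mkdir -p /home/hadoop/maven_repo/repo/com/cloudera/cdh/cdh-root/5.7.0
[hadoop@hadoop001 ~]$ cd /home/hadoop/maven_repo/repo/com/cloudera/cdh/cdh-root/5.7.0
[hadoop@hadoop001 5.7.0]$ wget https://repository.cloudera.com/artifactory/cloudera-repos/com/cloudera/cdh/cdh-root/5.7.0/cdh-root-5.7.0.pom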


A successful build ends with:

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------


Check the built package, hadoop-2.6.0-cdh5.7.0.tar.gz:

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ ll /home/hadoop/source/hadoop-2.6.0-cdh5.7.0/hadoop-dist/target/

total 563956

drwxrwxr-x 2 hadoop hadoop 4096 Apr 27 12:42 antrun

drwxrwxr-x 3 hadoop hadoop 4096 Apr 27 12:42 classes

-rw-rw-r-- 1 hadoop hadoop 1998 Apr 27 12:42 dist-layout-stitching.sh

-rw-rw-r-- 1 hadoop hadoop 690 Apr 27 12:42 dist-tar-stitching.sh

drwxrwxr-x 9 hadoop hadoop 4096 Apr 27 12:42 hadoop-2.6.0-cdh5.7.0

-rw-rw-r-- 1 hadoop hadoop 191842632 Apr 27 12:42 hadoop-2.6.0-cdh5.7.0.tar.gz

-rw-rw-r-- 1 hadoop hadoop 7314 Apr 27 12:42 hadoop-dist-2.6.0-cdh5.7.0.jar

-rw-rw-r-- 1 hadoop hadoop 385574240 Apr 27 12:43 hadoop-dist-2.6.0-cdh5.7.0-javadoc.jar

-rw-rw-r-- 1 hadoop hadoop 4855 Apr 27 12:42 hadoop-dist-2.6.0-cdh5.7.0-sources.jar

-rw-rw-r-- 1 hadoop hadoop 4855 Apr 27 12:42 hadoop-dist-2.6.0-cdh5.7.0-test-sources.jar

drwxrwxr-x 2 hadoop hadoop 4096 Apr 27 12:42 javadoc-bundle-options

drwxrwxr-x 2 hadoop hadoop 4096 Apr 27 12:42 maven-archiver

drwxrwxr-x 3 hadoop hadoop 4096 Apr 27 12:42 maven-shared-archive-resources

drwxrwxr-x 3 hadoop hadoop 4096 Apr 27 12:42 test-classes

drwxrwxr-x 2 hadoop hadoop 4096 Apr 27 12:42 test-dir


Pseudo-distributed deployment

Since I had already deployed hadoop-2.6.0-cdh5.7.0 before, I rename the old directory first:

[hadoop@hadoop001 app]$ mv hadoop-2.6.0-cdh5.7.0 hadoop-2.6.0-cdh5.7.0-20190427bak


Extract the built package

[hadoop@hadoop001 target]$ pwd

/home/hadoop/source/hadoop-2.6.0-cdh5.7.0/hadoop-dist/target

[hadoop@hadoop001 target]$ tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app


Configure the environment variables (I had already configured them before, so this is not needed again on my machine; if you have not, configure them like this):

[hadoop@hadoop001 ~]$ vi ~/.bash_profile

export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

[hadoop@hadoop001 ~]$ source ~/.bash_profile

[hadoop@hadoop001 ~]$ which hadoop

~/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop


Configure SSH

Why configure SSH? Because Hadoop uses SSH when starting its daemons.

[hadoop@hadoop001 ~]$ rm -rf ~/.ssh

[hadoop@hadoop001 ~]$ ssh-keygen

[hadoop@hadoop001 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[hadoop@hadoop001 ~]$ chmod 600 ~/.ssh/authorized_keys

#Test whether SSH works. The first SSH connection asks whether to continue connecting; answer yes. On success the date is printed

[hadoop@hadoop001 ~]$ ssh hadoop001 date


Modify the configuration files

Edit hadoop-env.sh:

[hadoop@hadoop001 hadoop]$ pwd

/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop

[hadoop@hadoop001 hadoop]$ vi hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_80


Edit core-site.xml:

#Here I create a hadoop_tmp directory

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ mkdir hadoop_tmp

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ chmod 777 hadoop_tmp

[hadoop@hadoop001 hadoop]$ vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9000</value>
    </property>
    <!-- Set the temporary directory explicitly; the default is under the /tmp/hadoop directory, and Linux periodically cleans /tmp -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/hadoop_tmp</value>
    </property>
</configuration>


Edit hdfs-site.xml:

[hadoop@hadoop001 hadoop]$ vi hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop001:50090</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>hadoop001:50091</value>
    </property>
</configuration>


Edit mapred-site.xml:

[hadoop@hadoop001 hadoop]$ cp mapred-site.xml.template mapred-site.xml

[hadoop@hadoop001 hadoop]$ vi mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>


Edit yarn-site.xml:

[hadoop@hadoop001 hadoop]$ vi yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>


Add the slave node

[hadoop@hadoop001 ~]$ vim ~/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves

#Change the default localhost to hadoop001

hadoop001


Format the NameNode

[hadoop@hadoop001 ~]$ hdfs namenode -format

......

#If "has been successfully formatted" appears, the format succeeded

Re-format filesystem in Storage Directory /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y

19/04/27 15:00:42 INFO namenode.FSImage: Allocated new BlockPoolId: BP-835614160-127.0.0.1-1556348442807

19/04/27 15:00:43 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.

19/04/27 15:00:43 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

19/04/27 15:00:43 INFO util.ExitUtil: Exiting with status 0

19/04/27 15:00:43 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at hadoop001/127.0.0.1

************************************************************/

[hadoop@hadoop001 hadoop]$


Start Hadoop

[hadoop@hadoop001 sbin]$ ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [hadoop001]

hadoop001: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop001.out

hadoop001: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hadoop001.out

Starting secondary namenodes [hadoop001]

hadoop001: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-secondarynamenode-hadoop001.out

starting yarn daemons

starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-hadoop001.out

hadoop001: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hadoop001.out

#jps should show the five daemon processes

[hadoop@hadoop001 sbin]$ jps

868 ResourceManager

456 NameNode

1353 DataNode

1419 Jps

720 SecondaryNameNode

961 NodeManager

[hadoop@hadoop001 sbin]$
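Besides jps, the web UIs are a quick sanity check. Assuming the default ports for this Hadoop version are unchanged, the NameNode UI listens on 50070 and the ResourceManager UI on 8088; an HTTP 200 response means the daemon is serving:

[hadoop@hadoop001 sbin]$ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop001:50070
[hadoop@hadoop001 sbin]$ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop001:8088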


Verify Hadoop

HDFS verification:

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -ls /

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -put README.txt hdfs://hadoop001:9000/

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -ls /

Found 1 items

-rw-r--r-- 1 hadoop supergroup 1366 2019-04-27 15:39 /README.txt

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$
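As one more small check (not in the original steps), reading the file back should print the start of the README:

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -cat /README.txt | head -3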


YARN verification:

#Run this example job to verify YARN and check the result

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 4 10


Check whether compression is supported

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hadoop checknative

19/04/27 16:04:13 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native

19/04/27 16:04:13 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library

Native library checking:

hadoop: true /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/lib/native/libhadoop.so.1.0.0

zlib: true /lib64/libz.so.1

snappy: true /usr/lib64/libsnappy.so.1

lz4: true revision:99

bzip2: true /lib64/libbz2.so.1

openssl: true /usr/lib64/libcrypto.so

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$


References:
① Main reference blog: https://blog.csdn.net/qq_32641659/article/details/89074365#commentBox
② On Hadoop native libraries: http://blog.sina.com.cn/s/blog_3d9e90ad0102wqrp.html
③ Introduction to Protocol Buffers: https://blog.csdn.net/carson_ho/article/details/70568606
④ ./configure, make, make install explained: https://www.cnblogs.com/tinywan/p/7230039.html and https://www.cnblogs.com/qiaopei/p/5550168.html
