1. Virtual Machine Environment Preparation

Hostname    IP              Memory  OS
hadoop101   192.168.10.11   2G      Ubuntu 20.04.3 LTS
hadoop102   192.168.10.12   2G      Ubuntu 20.04.3 LTS
hadoop103   192.168.10.13   2G      Ubuntu 20.04.3 LTS

The following steps must be performed on all three servers.

1.1 Add hostname mappings

ren@hadoop101:~$ cat /etc/hosts
192.168.10.11 hadoop101
192.168.10.12 hadoop102
192.168.10.13 hadoop103
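
A quick sanity check that the mappings resolve:

ren@hadoop101:~$ ping -c 1 hadoop102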

1.2 Passwordless SSH between the three hosts

ren@hadoop101:~$ ssh-keygen
ren@hadoop101:~$ ssh-copy-id ren@192.168.10.11
ren@hadoop101:~$ ssh-copy-id ren@192.168.10.12
ren@hadoop101:~$ ssh-copy-id ren@192.168.10.13
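
A quick check that passwordless login works (run on each host; it should print all three hostnames without a password prompt):

ren@hadoop101:~$ for h in hadoop101 hadoop102 hadoop103; do ssh $h hostname; done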

1.3 Create project directories

ren@hadoop101:~$ sudo mkdir /opt/{module,software}
ren@hadoop101:~$ sudo chown -R ren:ren /opt/*

2. Hadoop Deployment

2.1 Cluster deployment plan

Service  hadoop101            hadoop102                      hadoop103
HDFS     NameNode, DataNode   DataNode                       SecondaryNameNode, DataNode
YARN     NodeManager          ResourceManager, NodeManager   NodeManager

2.2 Common ports

Port                               Hadoop 2.x   Hadoop 3.x
NameNode internal RPC              8020/9000    8020/9000/9820
NameNode HTTP UI                   50070        9870
YARN web UI (MapReduce job view)   8088         8088
JobHistory server web UI           19888        19888

2.3 Download resources

Download the resources from Baidu Cloud:
Link: https://pan.baidu.com/s/1VrFIfFpxq4S0nrrMGcKw3w  Extraction code: 986o

2.4 Upload the resources to the software directory

ren@hadoop101:~$ ll /opt/software/
total 767908
drwxr-xr-x 2 ren ren 4096 Mar 2 09:39 ./
drwxr-xr-x 4 root root 4096 Mar 1 18:56 ../
-rw-r--r-- 1 ren ren 607792249 Mar 1 18:57 hadoop-3.3.1-aarch64.tar.gz
-rw-r--r-- 1 ren ren 115 Mar 1 18:58 hadoop.sh
-rw-r--r-- 1 ren ren 178509312 Mar 1 18:58 jdk-8u321-fcs-bin-b07-linux-aarch64-15_dec_2021.tar
-rw-r--r-- 1 ren ren 75 Mar 1 18:58 jdk.sh
-rwxrwxr-x 1 ren ren 157 Mar 2 09:39 jpsall*
-rwxrwxr-x 1 ren ren 1086 Mar 2 09:30 myhadoop*
-rwxrwxr-x 1 ren ren 691 Mar 1 19:00 xsync*

2.5 Install the JDK

Extract the JDK into the module directory

ren@hadoop101:/opt/software$ tar xf jdk-8u321-fcs-bin-b07-linux-aarch64-15_dec_2021.tar -C ../module/

Add the JDK environment variables system-wide

ren@hadoop101:/opt/software$ sudo cp jdk.sh /etc/profile.d/
ren@hadoop101:/opt/software$ source /etc/profile.d/jdk.sh
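
The contents of jdk.sh are not shown here; judging by the extracted directory name (the same JDK path the jpsall script uses later), it presumably looks something like this sketch:

# /etc/profile.d/jdk.sh (assumed contents)
export JAVA_HOME=/opt/module/jdk1.8.0_321
export PATH=$PATH:$JAVA_HOME/bin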

Make sure java produces output like the following

ren@hadoop101:/opt/software$ java -version
java version "1.8.0_321"
Java(TM) SE Runtime Environment (build 1.8.0_321-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.321-b07, mixed mode)

2.6 Install Hadoop

Extract Hadoop into the module directory

ren@hadoop101:/opt/software$ tar xf hadoop-3.3.1-aarch64.tar.gz -C ../module/

Add the Hadoop environment variables system-wide

ren@hadoop101:/opt/software$ sudo cp hadoop.sh /etc/profile.d/
ren@hadoop101:/opt/software$ source /etc/profile.d/hadoop.sh
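
Likewise, hadoop.sh is not shown; a plausible sketch given the install path used throughout this guide:

# /etc/profile.d/hadoop.sh (assumed contents)
export HADOOP_HOME=/opt/module/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin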

Test that the installation succeeded

ren@hadoop101:/opt/module$ hadoop version
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T10:51Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /opt/module/hadoop-3.3.1/share/hadoop/common/hadoop-common-3.3.1.jar

2.7 Configure Hadoop

2.7.1 Default configuration files

These reference copies ship with the distribution and list every default value:

./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

2.7.2 Custom configuration files

etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:8020</value>
        <description>Address of the NameNode</description>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.3.1/data</value>
        <description>Directory where Hadoop stores its data</description>
    </property>

    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>ren</value>
        <description>Static user for the HDFS web UI, set to ren</description>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop101:9870</value>
        <description>NameNode web UI address</description>
    </property>

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop103:9868</value>
        <description>SecondaryNameNode (2NN) web UI address</description>
    </property>
</configuration>

etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <description>Use the shuffle auxiliary service for MapReduce</description>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <description>Address of the ResourceManager</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop102</value>
    </property>

    <property>
        <description>Environment variables inherited by containers</description>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>

    <property>
        <description>Enable log aggregation</description>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <description>Log aggregation server URL</description>
        <name>yarn.log.server.url</name>
        <value>http://hadoop101:19888/jobhistory/logs</value>
    </property>

    <property>
        <description>Keep aggregated logs for 7 days</description>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>

etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <description>Run MapReduce jobs on YARN</description>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop101:10020</value>
        <description>JobHistory server address</description>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop101:19888</value>
        <description>JobHistory server web UI address</description>
    </property>
</configuration>

2.7.3 Configure workers

etc/hadoop/workers (this file must not contain blank lines or trailing spaces)

hadoop101
hadoop102
hadoop103

3. Sync the Project

Sync the Hadoop project directory on hadoop101 to hadoop102 and hadoop103
using the custom xsync script:

#!/bin/bash

# 1. Check the argument count
if [ $# -lt 1 ]; then
    echo "Not enough arguments!"
    exit 1
fi

# 2. Loop over every host in the cluster
for host in hadoop101 hadoop102 hadoop103; do
    echo "==================== $host ===================="
    # 3. Loop over each file/directory argument and send it
    for file in "$@"; do
        # 4. Check that the file exists
        if [ -e "$file" ]; then
            # 5. Resolve the parent directory (following symlinks)
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            # 6. Get the file's base name
            fname=$(basename "$file")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done

3.1 Sync the directory

ren@hadoop101:/opt/module$ xsync /opt/module/

4. Start the Cluster

4.1 If the cluster is being started for the first time, format the NameNode on the hadoop101 node

hdfs namenode -format

4.2 Start the cluster

Here we use the custom myhadoop script.
Usage: myhadoop <start|stop>

#!/bin/bash
if [ $# -lt 1 ]; then
    echo "No args input..."
    exit 1
fi

case $1 in
"start")
    echo " =================== Starting the Hadoop cluster ==================="
    echo " --------------- starting hdfs ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.3.1/sbin/start-dfs.sh"
    echo " --------------- starting yarn ---------------"
    ssh hadoop102 "/opt/module/hadoop-3.3.1/sbin/start-yarn.sh"
    echo " --------------- starting historyserver ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.3.1/bin/mapred --daemon start historyserver"
    ;;
"stop")
    echo " =================== Stopping the Hadoop cluster ==================="
    echo " --------------- stopping historyserver ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.3.1/bin/mapred --daemon stop historyserver"
    echo " --------------- stopping yarn ---------------"
    ssh hadoop102 "/opt/module/hadoop-3.3.1/sbin/stop-yarn.sh"
    echo " --------------- stopping hdfs ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.3.1/sbin/stop-dfs.sh"
    ;;
*)
    echo "Input args error..."
    ;;
esac

ren@hadoop101:/opt/module/hadoop-3.3.1$ myhadoop start

4.3 Check the cluster status

Here we use the custom jpsall script.
Usage: jpsall

#!/bin/bash
# Run jps on every host over SSH to list the Hadoop JVM processes
for host in hadoop101 hadoop102 hadoop103; do
    echo "=============== $host ==============="
    ssh "$host" /opt/module/jdk1.8.0_321/bin/jps
done

Check the cluster processes.
Make sure they are all present. As configured here:
NameNode and JobHistoryServer on hadoop101
ResourceManager on hadoop102
SecondaryNameNode on hadoop103

ren@hadoop101:/opt/module/hadoop-3.3.1$ jpsall
=============== hadoop101 ===============
17907 JobHistoryServer
17172 NameNode
17383 DataNode
21559 Jps
17689 NodeManager
=============== hadoop102 ===============
19267 DataNode
19495 ResourceManager
19851 NodeManager
22975 Jps
=============== hadoop103 ===============
16487 NodeManager
16167 DataNode
20984 Jps
16333 SecondaryNameNode

4.4 Cluster web UIs

HDFS web UI
Web management page for the cluster's files:
http://hadoop101:9870/

YARN web UI
YARN's resource scheduling page:
http://hadoop102:8088

JobHistory web UI
http://hadoop101:19888/jobhistory

4.5 Start/stop individual service components

If a service on a single machine misbehaves, it can be started or stopped individually. Start/stop HDFS components:

hdfs --daemon start/stop namenode/datanode/secondarynamenode

Start/stop YARN components:

yarn --daemon start/stop resourcemanager/nodemanager
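
For example, to bounce only the DataNode on the current host:

hdfs --daemon stop datanode
hdfs --daemon start datanode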

5. Basic Cluster Tests

5.1 Upload files to the cluster

Create a directory in HDFS

ren@hadoop101:/opt/module/hadoop-3.3.1$ hadoop fs -mkdir /input

Upload a small file to HDFS

hadoop fs -put /opt/module/hadoop-3.3.1/wcinput/word.txt /input
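
Confirm the upload by listing and reading the file back from HDFS:

hadoop fs -ls /input
hadoop fs -cat /input/word.txt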

Where the data is actually stored on disk; since word.txt is far smaller than one HDFS block, the block file's content is identical to the uploaded file:

ren@hadoop101:/opt/module/hadoop-3.3.1$ cat data/dfs/data/current/BP-2027323945-192.168.10.11-1646133053594/current/finalized/subdir0/subdir0/blk_1073741825
test
hehe
haha
test
za
zhe

5.2 Run a job and save the output to HDFS

ren@hadoop101:/opt/module/hadoop-3.3.1$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /wcountput
2022-03-02 15:44:37,508 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop102/192.168.10.12:8032
2022-03-02 15:44:37,882 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/ren/.staging/job_1646190571735_0001
2022-03-02 15:44:38,221 INFO input.FileInputFormat: Total input files to process : 1
2022-03-02 15:44:38,302 INFO mapreduce.JobSubmitter: number of splits:1
2022-03-02 15:44:38,450 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1646190571735_0001
2022-03-02 15:44:38,450 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-03-02 15:44:38,561 INFO conf.Configuration: resource-types.xml not found
2022-03-02 15:44:38,562 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-03-02 15:44:38,716 INFO impl.YarnClientImpl: Submitted application application_1646190571735_0001
2022-03-02 15:44:38,741 INFO mapreduce.Job: The url to track the job: http://hadoop102:8088/proxy/application_1646190571735_0001/
2022-03-02 15:44:38,741 INFO mapreduce.Job: Running job: job_1646190571735_0001
2022-03-02 15:44:43,845 INFO mapreduce.Job: Job job_1646190571735_0001 running in uber mode : false
2022-03-02 15:44:43,851 INFO mapreduce.Job: map 0% reduce 0%
2022-03-02 15:44:46,897 INFO mapreduce.Job: map 100% reduce 0%
2022-03-02 15:44:51,958 INFO mapreduce.Job: map 100% reduce 100%
2022-03-02 15:44:52,996 INFO mapreduce.Job: Job job_1646190571735_0001 completed successfully
2022-03-02 15:44:53,065 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=58
FILE: Number of bytes written=545001
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=128
HDFS: Number of bytes written=32
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1323
Total time spent by all reduces in occupied slots (ms)=2244
Total time spent by all map tasks (ms)=1323
Total time spent by all reduce tasks (ms)=2244
Total vcore-milliseconds taken by all map tasks=1323
Total vcore-milliseconds taken by all reduce tasks=2244
Total megabyte-milliseconds taken by all map tasks=1354752
Total megabyte-milliseconds taken by all reduce tasks=2297856
Map-Reduce Framework
Map input records=6
Map output records=6
Map output bytes=51
Map output materialized bytes=58
Input split bytes=101
Combine input records=6
Combine output records=5
Reduce input groups=5
Reduce shuffle bytes=58
Reduce input records=5
Reduce output records=5
Spilled Records=10
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=75
CPU time spent (ms)=560
Physical memory (bytes) snapshot=498315264
Virtual memory (bytes) snapshot=4984467456
Total committed heap usage (bytes)=403177472
Peak Map Physical memory (bytes)=287354880
Peak Map Virtual memory (bytes)=2484170752
Peak Reduce Physical memory (bytes)=210960384
Peak Reduce Virtual memory (bytes)=2500296704
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=27
File Output Format Counters
Bytes Written=32

5.3 View the results in the web UI

Each job's output is saved to HDFS. Here wordcount counts how many times each word appears in the text, and a new wcountput directory shows up in the file browser.

View the result files
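
The same result can also be read from the command line; the reducer output file is typically named part-r-00000:

hadoop fs -cat /wcountput/part-r-00000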

Open the YARN scheduling page to inspect the job's execution parameters.

From the job page, jump to the history page to see the execution details.

Because checking logs host by host is inconvenient on a cluster, the web UI aggregates logs and links them to each job; open them from the YARN scheduling page.

6. Common Errors and Solutions

6.1 Firewall not disabled, or YARN not started

    INFO client.RMProxy: Connecting to ResourceManager at hadoop108/192.168.10.108:8032
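
On Ubuntu 20.04 the firewall is normally ufw; a quick check and, for a lab cluster like this one, a fix (assuming ufw is what is blocking the ports):

sudo ufw status
sudo ufw disable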

6.2 Hostname misconfiguration

The hostname must not be the same as the cluster name; also remove any 127.0.0.1 hadoop101 entry from /etc/hosts.

6.3 IP address misconfiguration

6.4 SSH not configured properly

The sync script will report errors.

6.5 Cluster started inconsistently as root and as ren

Start the cluster as the regular user. Starting it as root requires the special configuration below; never mix users when starting services, which is why starting everything through the unified script is best.

In the Hadoop sbin directory:
add the following parameters at the top of both start-dfs.sh and stop-dfs.sh

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Likewise, add the following at the top of start-yarn.sh and stop-yarn.sh:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

6.6 Careless edits to configuration files

Services will behave abnormally at startup; check the service logs for specifics.

6.7 Hostname not recognized

java.net.UnknownHostException: hadoop102: hadoop102
        at java.net.InetAddress.getLocalHost(InetAddress.java:1475)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)

Solution:
(1) Add 192.168.10.102 hadoop102 to the /etc/hosts file.
(2) Do not use special hostnames such as hadoop or hadoop000.

6.8 Only one of DataNode and NameNode will run at a time

This typically happens after reformatting the NameNode without first clearing the old data: the new cluster ID no longer matches the ID the DataNodes recorded. Delete the data and logs directories on every node, then reformat the NameNode.

6.9 jps shows a process is gone, but restarting the cluster reports it is already running

The cause is stale temp files for the launched processes in the /tmp directory under the Linux root filesystem. Delete the cluster-related temp files, then restart the cluster.
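
A sketch of the cleanup, assuming the default PID directory (/tmp) and file naming (hadoop-<user>-<daemon>.pid):

ls /tmp/hadoop-ren-*.pid    # inspect the stale PID files first
rm /tmp/hadoop-ren-*.pid    # then remove them and restart the cluster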

6.10 jps does not work

Cause: the global hadoop/java environment variables have not taken effect. Solution: source the /etc/profile file.