CentOS 搭建 Hadoop3 高可用集群

2023-11-02 07:44

本文主要是介绍CentOS 搭建 Hadoop3 高可用集群,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

Hadoop FullyDistributed Mode 完全分布式

spark101spark102spark103
192.168.171.101192.168.171.102192.168.171.103
namenodenamenode
journalnodejournalnodejournalnode
datanodedatanodedatanode
nodemanagernodemanagernodemanager
recource managerrecource manager
job history
job logjob logjob log

1. 准备

1.1 升级操作系统和软件

yum -y update

升级后建议重启

1.2 安装常用软件

yum -y install gcc gcc-c++ autoconf automake cmake make rsync vim man zip unzip net-tools zlib zlib-devel openssl openssl-devel pcre-devel tcpdump lrzsz tar wget openssh-server

1.3 修改主机名

hostnamectl set-hostname spark01
hostnamectl set-hostname spark02
hostnamectl set-hostname spark03

1.4 修改IP地址

vim /etc/sysconfig/network-scripts/ifcfg-ens160

网卡 配置文件示例

TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens32"
DEVICE="ens32"
ONBOOT="yes"
IPADDR="192.168.171.101"
PREFIX="24"
GATEWAY="192.168.171.2"
DNS1="192.168.171.2"
IPV6_PRIVACY="no"

1.5 关闭防火墙

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/configsetenforce 0
systemctl stop firewalld
systemctl disable firewalld

1.6 修改hosts配置文件

vim /etc/hosts

修改内容如下:

192.168.171.101	spark01
192.168.171.102	spark02
192.168.171.103	spark03

1.7 上传软件配置环境变量

在所有主机节点创建软件目录

mkdir -p /opt/soft 

以下操作在 hadoop101 主机上完成

进入软件目录

cd /opt/soft

下载 JDK

wget https://download.oracle.com/otn/java/jdk/8u391-b13/b291ca3e0c8548b5a51d5a5f50063037/jdk-8u391-linux-x64.tar.gz?AuthParam=1698206552_11c0bb831efdf87adfd187b0e4ccf970

下载 zookeeper

wget https://dlcdn.apache.org/zookeeper/zookeeper-3.8.3/apache-zookeeper-3.8.3-bin.tar.gz

下载 hadoop

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz

解压 JDK 修改名称

解压 zookeeper 修改名称

解压 hadoop 修改名称

tar -zxvf jdk-8u391-linux-x64.tar.gz -C /opt/soft/
mv jdk1.8.0_391/ jdk-8
tar -zxvf apache-zookeeper-3.8.3-bin.tar.gz
mv apache-zookeeper-3.8.3-bin zookeeper-3
tar -zxvf hadoop-3.3.5.tar.gz -C /opt/soft/
mv hadoop-3.3.5/ hadoop-3

配置环境变量

vim /etc/profile.d/my_env.sh

编写以下内容:

export JAVA_HOME=/opt/soft/jdk-8
export set JAVA_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"export ZOOKEEPER_HOME=/opt/soft/zookeeper-3export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=rootexport YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=rootexport HADOOP_HOME=/opt/soft/hadoop-3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoopexport PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

生成新的环境变量

注意:分发软件和配置文件后 在所有主机执行该步骤

source /etc/profile

2. zookeeper

2.1 编辑配置文件

cd $ZOOKEEPER_HOME/conf
vim zoo.cfg
# 心跳单位,2s
tickTime=2000
# zookeeper-3初始化的同步超时时间,10个心跳单位,也即20s
initLimit=10
# 普通同步:发送一个请求并得到响应的超时时间,5个心跳单位也即10s
syncLimit=5
# 内存快照数据的存储位置
dataDir=/home/zookeeper-3/data
# 事务日志的存储位置
dataLogDir=/home/zookeeper-3/datalog
# 当前zookeeper-3节点的端口 
clientPort=2181
# 单个客户端到集群中单个节点的并发连接数,通过ip判断是否同一个客户端,默认60
maxClientCnxns=1000
# 保留7个内存快照文件在dataDir中,默认保留3个
autopurge.snapRetainCount=7
# 清除快照的定时任务,默认1小时,如果设置为0,标识关闭清除任务
autopurge.purgeInterval=1
#允许客户端连接设置的最小超时时间,默认2个心跳单位
minSessionTimeout=4000
#允许客户端连接设置的最大超时时间,默认是20个心跳单位,也即40s,
maxSessionTimeout=300000
#zookeeper-3 3.5.5启动默认会把AdminService服务启动,这个服务默认是8080端口
admin.serverPort=9001
#集群地址配置
server.1=spark01:2888:3888
server.2=spark02:2888:3888
server.3=spark03:2888:3888
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/zookeeper-3/data
dataLogDir=/home/zookeeper-3/datalog 
clientPort=2181
maxClientCnxns=1000
autopurge.snapRetainCount=7
autopurge.purgeInterval=1
minSessionTimeout=4000
maxSessionTimeout=300000
admin.serverPort=9001
server.1=spark01:2888:3888
server.2=spark02:2888:3888
server.3=spark03:2888:3888

2.2 保存后根据配置文件创建目录

在每台服务器上执行

mkdir -p /home/zookeeper-3/data
mkdir -p /home/zookeeper-3/datalog

2.3 myid

spark01

echo 1 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid

spark02

echo 2 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid

spark03

echo 3 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid

2.4 编写zookeeper-3开机启动脚本

在/etc/systemd/system/文件夹下创建一个启动脚本zookeeper-3.service

注意:在每台服务器上编写

cd /etc/systemd/system
vim zookeeper.service

内容如下:

[Unit]
Description=zookeeper
After=syslog.target network.target[Service]
Type=forking
# 指定zookeeper-3 日志文件路径,也可以在zkServer.sh 中定义
Environment=ZOO_LOG_DIR=/home/zookeeper-3/datalog
# 指定JDK路径,也可以在zkServer.sh 中定义
Environment=JAVA_HOME=/opt/soft/jdk-8
ExecStart=/opt/soft/zookeeper-3/bin/zkServer.sh start
ExecStop=/opt/soft/zookeeper-3/bin/zkServer.sh stop
Restart=always
User=root
Group=root[Install]
WantedBy=multi-user.target
[Unit]
Description=zookeeper
After=syslog.target network.target[Service]
Type=forking
Environment=ZOO_LOG_DIR=/home/zookeeper-3/datalog
Environment=JAVA_HOME=/opt/soft/jdk-8
ExecStart=/opt/soft/zookeeper-3/bin/zkServer.sh start
ExecStop=/opt/soft/zookeeper-3/bin/zkServer.sh stop
Restart=always
User=root
Group=root[Install]
WantedBy=multi-user.target
systemctl daemon-reload
# 等所有主机配置好后再执行以下命令
systemctl start zookeeper
systemctl enable zookeeper
systemctl status zookeeper

3. hadoop

修改配置文件

cd  $HADOOP_HOME/etc/hadoop
  • hadoop-env.sh
  • core-site.xml
  • hdfs-site.xml
  • workers
  • mapred-site.xml
  • yarn-site.xml

hadoop-env.sh 文件末尾追加

export JAVA_HOME=/opt/soft/jdk-8
export HADOOP_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=rootexport YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License. See accompanying LICENSE file.
--><!-- Put site-specific property overrides in this file. --><configuration><property><name>fs.defaultFS</name><value>hdfs://lihaozhe</value></property><property><name>hadoop.tmp.dir</name><value>/home/hadoop/data</value></property><property><name>ha.zookeeper.quorum</name><value>spark01:2181,spark02:2181,spark03:2181</value></property><property><name>hadoop.http.staticuser.user</name><value>root</value></property><property><name>dfs.permissions.enabled</name><value>false</value></property><property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property><property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License. See accompanying LICENSE file.
--><!-- Put site-specific property overrides in this file. --><configuration><property><name>dfs.nameservices</name><value>lihaozhe</value></property><property><name>dfs.ha.namenodes.lihaozhe</name><value>nn1,nn2</value></property><property><name>dfs.namenode.rpc-address.lihaozhe.nn1</name><value>spark01:8020</value></property><property><name>dfs.namenode.rpc-address.lihaozhe.nn2</name><value>spark02:8020</value></property><property><name>dfs.namenode.http-address.lihaozhe.nn1</name><value>spark01:9870</value></property><property><name>dfs.namenode.http-address.lihaozhe.nn2</name><value>spark02:9870</value></property><property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://spark01:8485;spark02:8485;spark03:8485/lihaozhe</value></property><property><name>dfs.client.failover.proxy.provider.lihaozhe</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property><property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property><property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_rsa</value></property><property><name>dfs.journalnode.edits.dir</name><value>/home/hadoop/journalnode/data</value></property><property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property><property><name>dfs.safemode.threshold.pct</name><value>1</value></property>
</configuration>

workers

spark01
spark02
spark03

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License. See accompanying LICENSE file.
--><!-- Put site-specific property overrides in this file. --><configuration><property><name>mapreduce.framework.name</name><value>yarn</value></property><property><name>mapreduce.application.classpath</name><value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value></property><!-- yarn历史服务端口 --><property><name>mapreduce.jobhistory.address</name><value>spark01:10020</value></property><!-- yarn历史服务web访问端口 --><property><name>mapreduce.jobhistory.webapp.address</name><value>hadoop102:19888</value></property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License. See accompanying LICENSE file.
-->
<configuration><!-- Site specific YARN configuration properties --><property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property><property><name>yarn.resourcemanager.cluster-id</name><value>cluster1</value></property><property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property><property><name>yarn.resourcemanager.hostname.rm1</name><value>spark01</value></property><property><name>yarn.resourcemanager.hostname.rm2</name><value>spark02</value></property><property><name>yarn.resourcemanager.webapp.address.rm1</name><value>spark01:8088</value></property><property><name>yarn.resourcemanager.webapp.address.rm2</name><value>spark02:8088</value></property><property><name>yarn.resourcemanager.zk-address</name><value>spark01:2181,spark02:2181,spark03:2181</value></property><property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property><property><name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property><property><name>yarn.nodemanager.env-whitelist</name><value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value></property><!-- 是否将对容器实施物理内存限制 --><property><name>yarn.nodemanager.pmem-check-enabled</name><value>false</value></property><!-- 是否将对容器实施虚拟内存限制。 --><property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property><!-- 开启日志聚集 --><property><name>yarn.log-aggregation-enable</name><value>true</value></property><!-- 设置yarn历史服务器地址 --><property><name>yarn.log.server.url</name><value>http://spark01:19888/jobhistory/logs</value></property><!-- 保存的时间7天 --><property><name>yarn.log-aggregation.retain-seconds</name><value>604800</value></property>
</configuration>

4. 配置ssh免密钥登录

创建本地秘钥并将公共秘钥写入认证文件

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id root@spark01
ssh-copy-id root@spark02
ssh-copy-id root@spark03
ssh root@spark01
exit
ssh root@spark02
exit
ssh root@spark03
exit

5. 分发软件和配置文件

scp -r /etc/profile.d root@spark02:/etc
scp -r /etc/profile.d root@spark03:/etc
scp -r /opt/soft/zookeeper-3 root@spark02:/opt/soft
scp -r /opt/soft/zookeeper-3 root@spark03:/opt/soft
scp -r /opt/soft/hadoop-3/etc/hadoop/* root@spark02:/opt/soft/hadoop-3/etc/hadoop/
scp -r /opt/soft/hadoop-3/etc/hadoop/* root@spark03:/opt/soft/hadoop-3/etc/hadoop/

6. 在各服务器上使环境变量生效

source /etc/profile

7. 启动zookeeper

7.1 myid

spark01

echo 1 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid

spark02

echo 2 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid

spark03

echo 3 > /home/zookeeper-3/data/myid
more /home/zookeeper-3/data/myid

7.2 启动服务

在各节点执行以下命令

systemctl daemon-reload
systemctl start zookeeper
systemctl enable zookeeper
systemctl status zookeeper

7.3 验证

jps
zkServer.sh status

8. Hadoop初始化

1.	启动三个zookeeper:zkServer.sh start
2.	启动三个JournalNode:hadoop-daemon.sh start journalnode 或者 hdfs --daemon start journalnode
7.	在其中一个namenode上格式化:hdfs namenode -format
8.	把刚刚格式化之后的元数据拷贝到另外一个namenode上a)	启动刚刚格式化的namenode :hadoop-daemon.sh start namenodeb)	在没有格式化的namenode上执行:hdfs namenode -bootstrapStandbyc)	启动第二个namenode: hadoop-daemon.sh start namenode
9.	在其中一个namenode上初始化hdfs zkfc -formatZK
10.	停止上面节点:stop-dfs.sh
11.	全面启动:start-all.sh
12. 启动resourcemanager节点 yarn-daemon.sh start resourcemanager
start-yarn.shhttp://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.5.0.tar高版本不需要执行第 1213. 启动历史服务
mapred --daemon start historyserver
14 15 16 不需要执行
14、安全模式hdfs dfsadmin -safemode enter  
hdfs dfsadmin -safemode leave15、查看哪些节点是namenodes并获取其状态
hdfs getconf -namenodes
hdfs haadmin -getServiceState spark0116、强制切换状态
hdfs haadmin -transitionToActive --forcemanual spark01

重点提示:

# 关机之前 依关闭服务
stop-yarn.sh
stop-dfs.sh
# 开机后 依次开启服务
start-dfs.sh
start-yarn.sh

或者

# 关机之前关闭服务
stop-all.sh
# 开机后开启服务
start-all.sh
#jps 检查进程正常后开启胡哦关闭在再做其它操作

9. 修改windows下hosts文件

C:\Windows\System32\drivers\etc\hosts

追加以下内容:

192.168.171.101	hadoop101
192.168.171.102	hadoop102
192.168.171.103	hadoop103

Windows11 注意 修改权限

  1. 开始搜索 cmd

    找到命令头提示符 以管理身份运行

    命令提示符cmd

    命令提示符cmd

  2. 进入 C:\Windows\System32\drivers\etc 目录

    cd drivers/etc
    

    命令提示符cmd

  3. 去掉 hosts文件只读属性

    attrib -r hosts
    

    dos命令去掉文件只读属性

  4. 打开 hosts 配置文件

    start hosts
    

    dos命令打开文件

  5. 追加以下内容后保存

    192.168.171.101	spark01
    192.168.171.102	spark02
    192.168.171.103	spark03
    

10. 测试

12.1 浏览器访问hadoop集群

浏览器访问: http://spark01:9870

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

浏览器访问:http://spark01:8088

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

浏览器访问:http://spark01:19888/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

12.2 测试 hdfs

本地文件系统创建 测试文件 wcdata.txt

vim wcdata.txt
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive
FlinkHBase Flink
Hive StormHive Flink HadoopHBase
HiveHadoop Spark HBase StormHBase
Hadoop Hive FlinkHBase Flink Hive StormHive
Flink HadoopHBase Hive
Spark HBaseHive Flink
Storm Hadoop HBase SparkFlinkHBase
StormHBase Hadoop Hive

在 HDFS 上创建目录 /wordcount/input

hdfs dfs -mkdir -p /wordcount/input

查看 HDFS 目录结构

hdfs dfs -ls /
hdfs dfs -ls /wordcount
hdfs dfs -ls /wordcount/input

上传本地测试文件 wcdata.txt 到 HDFS 上 /wordcount/input

hdfs dfs -put wcdata.txt /wordcount/input

检查文件是否上传成功

hdfs dfs -ls /wordcount/input
hdfs dfs -cat /wordcount/input/wcdata.txt

12.2 测试 mapreduce

计算 PI 的值

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi 10 10

单词统计

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordcount /wordcount/input/wcdata.txt /wordcount/result
hdfs dfs -ls /wordcount/result
hdfs dfs -cat /wordcount/result/part-r-00000

11. 元数据

hadoop101

cd /home/hadoop_data/dfs/name/current
ls

看到如下内容:

edits_0000000000000000001-0000000000000000009  edits_inprogress_0000000000000000299  fsimage_0000000000000000298      VERSION
edits_0000000000000000010-0000000000000000011  fsimage_0000000000000000011           fsimage_0000000000000000298.md5
edits_0000000000000000012-0000000000000000298  fsimage_0000000000000000011.md5       seen_txid

查看fsimage

hdfs oiv -p XML -i fsimage_0000000000000000011

将元数据内容按照指定格式读取后写入到新文件中

hdfs oiv -p XML -i fsimage_0000000000000000011 -o /opt/soft/fsimage.xml

查看edits

将元数据内容按照指定格式读取后写入到新文件中

hdfs oev -p XML -i edits_inprogress_0000000000000000299  -o /opt/soft/edit.xml

这篇关于CentOS 搭建 Hadoop3 高可用集群的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/329151

相关文章

使用Python实现快速搭建本地HTTP服务器

《使用Python实现快速搭建本地HTTP服务器》:本文主要介绍如何使用Python快速搭建本地HTTP服务器,轻松实现一键HTTP文件共享,同时结合二维码技术,让访问更简单,感兴趣的小伙伴可以了... 目录1. 概述2. 快速搭建 HTTP 文件共享服务2.1 核心思路2.2 代码实现2.3 代码解读3.

MySQL双主搭建+keepalived高可用的实现

《MySQL双主搭建+keepalived高可用的实现》本文主要介绍了MySQL双主搭建+keepalived高可用的实现,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,... 目录一、测试环境准备二、主从搭建1.创建复制用户2.创建复制关系3.开启复制,确认复制是否成功4.同

CentOS 7部署主域名服务器 DNS的方法

《CentOS7部署主域名服务器DNS的方法》文章详细介绍了在CentOS7上部署主域名服务器DNS的步骤,包括安装BIND服务、配置DNS服务、添加域名区域、创建区域文件、配置反向解析、检查配置... 目录1. 安装 BIND 服务和工具2.  配置 BIND 服务3 . 添加你的域名区域配置4.创建区域

Centos环境下Tomcat虚拟主机配置详细教程

《Centos环境下Tomcat虚拟主机配置详细教程》这篇文章主要讲的是在CentOS系统上,如何一步步配置Tomcat的虚拟主机,内容很简单,从目录准备到配置文件修改,再到重启和测试,手把手带你搞定... 目录1. 准备虚拟主机的目录和内容创建目录添加测试文件2. 修改 Tomcat 的 server.X

使用DeepSeek搭建个人知识库(在笔记本电脑上)

《使用DeepSeek搭建个人知识库(在笔记本电脑上)》本文介绍了如何在笔记本电脑上使用DeepSeek和开源工具搭建个人知识库,通过安装DeepSeek和RAGFlow,并使用CherryStudi... 目录部署环境软件清单安装DeepSeek安装Cherry Studio安装RAGFlow设置知识库总

Linux搭建Mysql主从同步的教程

《Linux搭建Mysql主从同步的教程》:本文主要介绍Linux搭建Mysql主从同步的教程,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录linux搭建mysql主从同步1.启动mysql服务2.修改Mysql主库配置文件/etc/my.cnf3.重启主库my

国内环境搭建私有知识问答库踩坑记录(ollama+deepseek+ragflow)

《国内环境搭建私有知识问答库踩坑记录(ollama+deepseek+ragflow)》本文给大家利用deepseek模型搭建私有知识问答库的详细步骤和遇到的问题及解决办法,感兴趣的朋友一起看看吧... 目录1. 第1步大家在安装完ollama后,需要到系统环境变量中添加两个变量2. 第3步 “在cmd中

CentOS系统Maven安装教程分享

《CentOS系统Maven安装教程分享》本文介绍了如何在CentOS系统中安装Maven,并提供了一个简单的实际应用案例,安装Maven需要先安装Java和设置环境变量,Maven可以自动管理项目的... 目录准备工作下载并安装Maven常见问题及解决方法实际应用案例总结Maven是一个流行的项目管理工具

本地搭建DeepSeek-R1、WebUI的完整过程及访问

《本地搭建DeepSeek-R1、WebUI的完整过程及访问》:本文主要介绍本地搭建DeepSeek-R1、WebUI的完整过程及访问的相关资料,DeepSeek-R1是一个开源的人工智能平台,主... 目录背景       搭建准备基础概念搭建过程访问对话测试总结背景       最近几年,人工智能技术

5分钟获取deepseek api并搭建简易问答应用

《5分钟获取deepseekapi并搭建简易问答应用》本文主要介绍了5分钟获取deepseekapi并搭建简易问答应用,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需... 目录1、获取api2、获取base_url和chat_model3、配置模型参数方法一:终端中临时将加