hadoop/hdfs QJM notes

2024-03-05 04:20

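Hadoop HDFS has a role called the JournalNode. Its job is to store the log of modifications made to HDFS (the edit log). If the NameNode (NN) goes down, its state is recovered by replaying this log.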
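
The replay idea can be made concrete with a minimal Python sketch under simplifying assumptions. The `EditOp` record, its two op types (`mkdir`, `delete`) and the `replay` function are hypothetical illustrations, not HDFS classes; real NameNode edit-log records are far richer. What it shows is that state is rebuilt purely by applying logged operations in transaction-id order. Tracking the last applied transaction id also makes a repeated replay harmless.

```python
from dataclasses import dataclass


# Hypothetical, simplified edit-log record (not the real HDFS op format).
@dataclass(frozen=True)
class EditOp:
    txid: int  # transaction id, increases monotonically
    op: str    # "mkdir" or "delete" in this toy model
    path: str


def replay(edits, namespace=None, last_applied_txid=0):
    """Rebuild namespace state by applying edits in txid order.

    Edits with txid <= last_applied_txid are skipped, so replaying
    the same log a second time does not change the result.
    """
    ns = set(namespace or ())
    for e in sorted(edits, key=lambda e: e.txid):
        if e.txid <= last_applied_txid:
            continue  # already applied earlier
        if e.op == "mkdir":
            ns.add(e.path)
        elif e.op == "delete":
            # drop the path itself and everything underneath it
            prefix = e.path.rstrip("/") + "/"
            ns = {p for p in ns if p != e.path and not p.startswith(prefix)}
        else:
            raise ValueError(f"unknown op: {e.op}")
        last_applied_txid = e.txid
    return ns, last_applied_txid


log = [
    EditOp(1, "mkdir", "/user"),
    EditOp(2, "mkdir", "/user/alice"),
    EditOp(3, "mkdir", "/tmp"),
    EditOp(4, "delete", "/user"),
]
state, txid = replay(log)
print(sorted(state), txid)  # ['/tmp'] 4
```

Passing a previously recovered `namespace` and `last_applied_txid` resumes replay from that point instead of from an empty namespace.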

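In the original setup, these edit logs were all kept on the active NameNode (NN). A second node (the Secondary NameNode) periodically merged them into a compacted checkpoint and then sent the merged file back to the active NN. The key problem is that if this single NN went down, the entire cluster went down with it. The solution has two parts: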

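1. Have the NN write its edit log to JournalNodes on several machines instead of keeping it locally. Several machines are used in case some of them fail. A write counts as successful once a majority of the JournalNodes acknowledge it, so 3 JournalNodes tolerate 1 failure (in general, 2N+1 JournalNodes tolerate N failures).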

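2. Run two NNs, one active and one standby. If the active NN fails, ZooKeeper-based failover promotes the standby NN to active. The standby keeps itself in sync by replaying the edit log stored on the JournalNodes, so at failover it can catch up to exactly the state the active NN was in when it failed. This makes the cluster considerably more fault tolerant.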
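
The majority rule in point 1 can be sketched in a few lines of Python. This is a toy model that assumes hypothetical `JournalNode` and `quorum_write` names. Writes go to the nodes one after another rather than in parallel. None of the fencing or recovery logic used by the real Quorum Journal Manager is included.

```python
class JournalNode:
    """Toy in-memory stand-in for a JournalNode (hypothetical, not the HDFS class)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.log = []

    def append(self, txid, record):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.log.append((txid, record))


def quorum_write(journals, txid, record):
    """Send one edit to every JournalNode; succeed only if a majority acks."""
    acks = 0
    for jn in journals:
        try:
            jn.append(txid, record)
            acks += 1
        except ConnectionError:
            pass  # a failed node simply does not count toward the quorum
    majority = len(journals) // 2 + 1
    if acks < majority:
        raise RuntimeError(
            f"txid {txid}: only {acks}/{len(journals)} acks, need {majority}"
        )
    return acks


jns = [JournalNode(f"jn{i}") for i in range(3)]
print(quorum_write(jns, 1, "mkdir /user"))  # 3
jns[0].up = False
print(quorum_write(jns, 2, "mkdir /tmp"))   # 2: one node down, still a majority
jns[1].up = False
try:
    quorum_write(jns, 3, "delete /tmp")
except RuntimeError as err:
    print(err)  # txid 3: only 1/3 acks, need 2
```

In the last call, `jn2` has still stored txid 3 even though the write as a whole failed. The real system has to reconcile such partial writes during recovery; this sketch ignores them.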

The original text follows.

Background

Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

This impacted the total availability of the HDFS cluster in two major ways:

1. In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.
2. Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.

The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.
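
A hot standby stays current by continuously tailing the shared edit log, and it applies every remaining edit before it takes over. The Python sketch below models only that catch-up step. The `NameNode` class and its methods are hypothetical, and the `shared_log` list stands in for the edits held on the JournalNodes. ZooKeeper-based failure detection is not modeled, and neither is the fencing that stops the old active from continuing to write.

```python
class NameNode:
    """Hypothetical toy NameNode that only tracks a set of directory paths."""

    def __init__(self, name):
        self.name = name
        self.state = "standby"
        self.applied_txid = 0
        self.namespace = set()

    def apply(self, txid, op, path):
        if op == "mkdir":
            self.namespace.add(path)
        elif op == "delete":
            self.namespace.discard(path)
        self.applied_txid = txid

    def tail(self, shared_log, max_ops=None):
        """Apply edits newer than applied_txid (a hot standby does this continuously)."""
        pending = [e for e in shared_log if e[0] > self.applied_txid]
        if max_ops is not None:
            pending = pending[:max_ops]  # simulate a standby that is lagging
        for txid, op, path in pending:
            self.apply(txid, op, path)

    def become_active(self, shared_log):
        # Read *all* remaining edits before serving clients,
        # so no acknowledged write is lost in the failover.
        self.tail(shared_log)
        self.state = "active"


shared_log = []  # stands in for the edits stored on the JournalNodes
active, standby = NameNode("nn1"), NameNode("nn2")
active.state = "active"
edits = [("mkdir", "/a"), ("mkdir", "/b"), ("delete", "/a")]
for txid, (op, path) in enumerate(edits, start=1):
    active.apply(txid, op, path)
    shared_log.append((txid, op, path))

standby.tail(shared_log, max_ops=2)  # standby is one edit behind
print(standby.applied_txid)          # 2
active.state = "down"                # simulated crash of the active NN
standby.become_active(shared_log)
print(standby.state, standby.applied_txid, sorted(standby.namespace))  # active 3 ['/b']
```

Because the standby has already applied most edits while tailing, the catch-up at failover is small. This is what makes the failover fast compared with a cold restart.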