Problem: HBase RegionServer Crashes Frequently

2024-02-24 22:20

Error log
2019-09-21 20:42:17,264 INFO org.apache.hadoop.hbase.ScheduledChore: Chore: CompactionChecker missed its start time
2019-09-21 20:42:17,273 WARN org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 156013ms
GC pool 'ParNew' had collection(s): count=1 time=156080ms
2019-09-21 20:42:17,264 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 158843ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2019-09-21 20:42:17,281 WARN org.apache.hadoop.hbase.ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1569069581136,"responsesize":2051,"method":"Scan","processingtimems":156145,"client":"10.97.202.19:58322","queuetimems":0,"class":"HRegionServer"}
2019-09-21 20:42:17,300 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hdh19,60020,1568940808648: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:426)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:331)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:345)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8617)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:426)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:331)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:345)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8617)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:327)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1158)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:966)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.YouAreDeadException): org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead server
......
2019-09-21 20:42:17,621 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x86cf6a57553f9a7 has expired, closing socket connection
2019-09-21 20:42:17,621 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hdh19,60020,1568940808648: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase regionserver:60020-0x86cf6a57553f9a7 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:700)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:611)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2019-09-21 20:42:42,269 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getChildren failed after 4 attempts
2019-09-21 20:42:42,269 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase Unable to list children of znode /hbase/replication/rs/hdh19,60020,1568940808648
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs/hdh19,60020,1568940808648
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:456)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:484)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1476)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1398)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1280)
    at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:187)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:310)
    at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:180)
    at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:172)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2162)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1088)
    at java.lang.Thread.run(Thread.java:748)
2019-09-21 20:42:42,270 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs/hdh19,60020,1568940808648
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:456)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:484)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1476)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1398)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1280)
    at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:187)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:310)
    at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:180)

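The production log shows that a long GC pause caused the crash. A single ParNew collection took about 156 seconds, and during that time the RegionServer could not heartbeat to ZooKeeper. ZooKeeper expired its session and dropped the node, and the HMaster began treating hdh19 as a dead server. When the RegionServer resumed and reported in, the master rejected the report with YouAreDeadException. HBase deliberately stops a RegionServer that has lost its ZooKeeper session, because requests for its regions may already have been routed to other nodes.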
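
To check whether other RegionServer logs contain pauses long enough to cause this, you can scan the JvmPauseMonitor warnings and compare them against the session timeout. Below is a minimal Python sketch, not an HBase tool. find_long_pauses is a hypothetical helper, and its regex assumes the exact wording of the JvmPauseMonitor line in the log above. Pass the timeout actually negotiated with ZooKeeper as session_timeout_ms; it can be lower than the value HBase requests.

```python
import re

# Matches the JvmPauseMonitor warning seen in the log above, e.g.
# "... Detected pause in JVM or host machine (eg GC): pause of approximately 156013ms"
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")


def find_long_pauses(lines, session_timeout_ms):
    """Return (line_number, pause_ms) for every reported JVM pause that is
    longer than the ZooKeeper session timeout (hypothetical helper)."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        match = PAUSE_RE.search(line)
        if match and int(match.group(1)) > session_timeout_ms:
            hits.append((line_no, int(match.group(1))))
    return hits


if __name__ == "__main__":
    log = [
        "2019-09-21 20:42:17,273 WARN org.apache.hadoop.hbase.util.JvmPauseMonitor: "
        "Detected pause in JVM or host machine (eg GC): pause of approximately 156013ms",
        "GC pool 'ParNew' had collection(s): count=1 time=156080ms",
    ]
    print(find_long_pauses(log, session_timeout_ms=90_000))   # [(1, 156013)]
    print(find_long_pauses(log, session_timeout_ms=300_000))  # []
```

A hit means the JVM was frozen longer than the session could survive, so the session expiry and the YouAreDeadException that follows are expected.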

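Solution
Increase the ZooKeeper session timeout used by HBase.

Set HBase's ZooKeeper session timeout to 5 minutes.
Raising only the HBase timeout is not enough. The ZooKeeper server caps any timeout a client requests at its own maximum session timeout, so that maximum must be raised as well.
Set the ZooKeeper maximum session timeout to 5 minutes.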
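
The need for both settings follows from how a ZooKeeper server handles the timeout a client requests. It clamps the value into [minSessionTimeout, maxSessionTimeout], and by default those bounds are 2 × tickTime and 20 × tickTime. The sketch below models only that clamping. The function name is hypothetical, and tickTime=2000 ms is an assumption taken from ZooKeeper's sample zoo.cfg. On the HBase side, the requested value is normally zookeeper.session.timeout in hbase-site.xml, in milliseconds. On the ZooKeeper side, the cap is maxSessionTimeout in zoo.cfg.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms=2000,
                          min_session_ms=None, max_session_ms=None):
    """Model how a ZooKeeper server clamps a client's requested session timeout.

    Unset bounds fall back to ZooKeeper's documented defaults:
    minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
    """
    lower = min_session_ms if min_session_ms is not None else 2 * tick_time_ms
    upper = max_session_ms if max_session_ms is not None else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))


five_minutes_ms = 5 * 60 * 1000  # 300000 ms

# Only the HBase side raised: the server caps the request at 20 * tickTime.
print(negotiated_timeout_ms(five_minutes_ms))                                  # 40000
# HBase side raised and zoo.cfg maxSessionTimeout raised to match.
print(negotiated_timeout_ms(five_minutes_ms, max_session_ms=five_minutes_ms))  # 300000
```

If only the HBase setting is raised, the 5-minute request is silently cut to 40 seconds. That is still far shorter than the 156-second pause in the log. A longer timeout also delays detection of a RegionServer that has really died, so it gives GC pauses headroom rather than removing them.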

http://www.chinasem.cn/article/743547
