问题 HBase RegionServer频繁挂掉

2024-02-24 22:20

本文主要是介绍问题 HBase RegionServer频繁挂掉,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

错误日志
2019-09-21 20:42:17,264 INFO org.apache.hadoop.hbase.ScheduledChore: Chore: CompactionChecker missed its start time
2019-09-21 20:42:17,273 WARN org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 156013ms
GC pool 'ParNew' had collection(s): count=1 time=156080ms
2019-09-21 20:42:17,264 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 158843ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2019-09-21 20:42:17,281 WARN org.apache.hadoop.hbase.ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1569069581136,"responsesize":2051,"method":"Scan","processingtimems":156145,"client":"10.97.202.19:58322","queuetimems":0,"class":"HRegionServer"}
2019-09-21 20:42:17,300 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hdh19,60020,1568940808648: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead serverat org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:426)at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:331)at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:345)at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8617)at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead serverat org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:426)at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:331)at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:345)at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8617)at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)at java.lang.reflect.Constructor.newInstance(Constructor.java:423)at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:327)at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1158)at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:966)at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.YouAreDeadException): org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead server
......
2019-09-21 20:42:17,621 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x86cf6a57553f9a7 has expired, closing socket connection
2019-09-21 20:42:17,621 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hdh19,60020,1568940808648: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase regionserver:60020-0x86cf6a57553f9a7 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expiredat org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:700)at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:611)at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2019-09-21 20:42:42,269 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getChildren failed after 4 attempts
2019-09-21 20:42:42,269 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase Unable to list children of znode /hbase/replication/rs/hdh19,60020,1568940808648
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs/hdh19,60020,1568940808648at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:456)at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:484)at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1476)at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1398)at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1280)at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:187)at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:310)at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:180)at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:172)at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2162)at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1088)at java.lang.Thread.run(Thread.java:748)
2019-09-21 20:42:42,270 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs/hdh19,60020,1568940808648at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:456)at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:484)at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1476)at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1398)at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1280)at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:187)at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:310)at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:180)

通过线上日志可以看到 hbase由于GC时间较长,zk服务自动剔除该hbase节点,关闭当前连接,这种情况下,hbase框架选择停止了不能连接到zookeeper的 hbase regionserver,因为请求到这个超时节点的请求可能已经转到其他的节点。

解决方法
提高hbase zk的超时时间

hbase设置超时时间5分钟
只设置hbase的超时时间是不够的的,还需要设置zk的最大超时时间
zk最大超时时间5分钟

这篇关于问题 HBase RegionServer频繁挂掉的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/743547

相关文章

SpringBoot启动报错的11个高频问题排查与解决终极指南

《SpringBoot启动报错的11个高频问题排查与解决终极指南》这篇文章主要为大家详细介绍了SpringBoot启动报错的11个高频问题的排查与解决,文中的示例代码讲解详细,感兴趣的小伙伴可以了解一... 目录1. 依赖冲突:NoSuchMethodError 的终极解法2. Bean注入失败:No qu

MySQL新增字段后Java实体未更新的潜在问题与解决方案

《MySQL新增字段后Java实体未更新的潜在问题与解决方案》在Java+MySQL的开发中,我们通常使用ORM框架来映射数据库表与Java对象,但有时候,数据库表结构变更(如新增字段)后,开发人员可... 目录引言1. 问题背景:数据库与 Java 实体不同步1.1 常见场景1.2 示例代码2. 不同操作

如何解决mysql出现Incorrect string value for column ‘表项‘ at row 1错误问题

《如何解决mysql出现Incorrectstringvalueforcolumn‘表项‘atrow1错误问题》:本文主要介绍如何解决mysql出现Incorrectstringv... 目录mysql出现Incorrect string value for column ‘表项‘ at row 1错误报错

如何解决Spring MVC中响应乱码问题

《如何解决SpringMVC中响应乱码问题》:本文主要介绍如何解决SpringMVC中响应乱码问题,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录Spring MVC最新响应中乱码解决方式以前的解决办法这是比较通用的一种方法总结Spring MVC最新响应中乱码解

pip无法安装osgeo失败的问题解决

《pip无法安装osgeo失败的问题解决》本文主要介绍了pip无法安装osgeo失败的问题解决,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需要的朋友们下面随着小编来一... 进入官方提供的扩展包下载网站寻找版本适配的whl文件注意:要选择cp(python版本)和你py

解决Java中基于GeoTools的Shapefile读取乱码的问题

《解决Java中基于GeoTools的Shapefile读取乱码的问题》本文主要讨论了在使用Java编程语言进行地理信息数据解析时遇到的Shapefile属性信息乱码问题,以及根据不同的编码设置进行属... 目录前言1、Shapefile属性字段编码的情况:一、Shp文件常见的字符集编码1、System编码

Spring MVC使用视图解析的问题解读

《SpringMVC使用视图解析的问题解读》:本文主要介绍SpringMVC使用视图解析的问题解读,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录Spring MVC使用视图解析1. 会使用视图解析的情况2. 不会使用视图解析的情况总结Spring MVC使用视图

Redis解决缓存击穿问题的两种方法

《Redis解决缓存击穿问题的两种方法》缓存击穿问题也叫热点Key问题,就是⼀个被高并发访问并且缓存重建业务较复杂的key突然失效了,无数的请求访问会在瞬间给数据库带来巨大的冲击,本文给大家介绍了Re... 目录引言解决办法互斥锁(强一致,性能差)逻辑过期(高可用,性能优)设计逻辑过期时间引言缓存击穿:给

Java程序运行时出现乱码问题的排查与解决方法

《Java程序运行时出现乱码问题的排查与解决方法》本文主要介绍了Java程序运行时出现乱码问题的排查与解决方法,包括检查Java源文件编码、检查编译时的编码设置、检查运行时的编码设置、检查命令提示符的... 目录一、检查 Java 源文件编码二、检查编译时的编码设置三、检查运行时的编码设置四、检查命令提示符

Jackson库进行JSON 序列化时遇到了无限递归(Infinite Recursion)的问题及解决方案

《Jackson库进行JSON序列化时遇到了无限递归(InfiniteRecursion)的问题及解决方案》使用Jackson库进行JSON序列化时遇到了无限递归(InfiniteRecursi... 目录解决方案‌1. 使用 @jsonIgnore 忽略一个方向的引用2. 使用 @JsonManagedR