解决金仓数据库KingbaseES V8R3 shared_buffer占用过多导致实例崩溃的问题

本文主要是介绍解决金仓数据库KingbaseES V8R3 shared_buffer占用过多导致实例崩溃的问题,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

问题描述

备库意外宕机,从集群日志只看出发生了主备切换,备库一直持续恢复备库没有成功,从数据库日志看到如下报错:

terminating connection because of crash of another server process
DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

根据报错提示,怀疑当时并发太高,或者业务繁忙导致shared_buffer不够用,进而导致数据库宕机。由于V8R3版本数据库没有办法收集kwr报告,所以不容易定位这个判断。

分析

现在模拟实验:

测试环境:

shared_buffer 设置成16MB
max_wal_size  设置成32MB
create table test01(id integer, val char(1024)); insert into test01 values(generate_series(1,2888600),repeat( chr(int4(random()*26)+65),1024));TEST=# create table test01(id integer, val char(1024));
CREATE TABLE
TEST=# insert into test01 values(generate_series(1,2888600),repeat( chr(int4(random()*26)+65),1024));等待......
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

ps命令看到了每个process,其中process13674占用了大量内存

数据库日志警告发生corrupted shared memory 。实例崩溃,发生重启。

在这之前触发了大量检查点,这也符合预期,因为已经把max_wal_size调的足够小。需要不断写出page以保证足够的shared_buffer满足insert。

数据库也给出了合理建议增加参数“max_wal_size”大小。

2022-05-25 15:38:04 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:05 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:05 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:05 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:05 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:06 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:06 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:07 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:07 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:07 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:07 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:07 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:07 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:09 CST LOG:  checkpoints are occurring too frequently (2 seconds apart)
2022-05-25 15:38:09 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:09 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:09 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:10 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:10 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:12 CST LOG:  checkpoints are occurring too frequently (2 seconds apart)
2022-05-25 15:38:12 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:12 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:12 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:12 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:12 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:13 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:13 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:14 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:14 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:15 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:15 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:15 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:15 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:15 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:15 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:16 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:16 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:17 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:17 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:17 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:17 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:18 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:18 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:18 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:18 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:18 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:18 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:19 CST LOG:  server process (PID 13674) was terminated by signal 9: Killed
2022-05-25 15:38:19 CST DETAIL:  Failed process was running: insert into test01 values(generate_series(1,2888600),repeat( chr(int4(random()*26)+65),1024));
2022-05-25 15:38:19 CST LOG:  terminating any other active server processes
2022-05-25 15:38:19 CST WARNING:  terminating connection because of crash of another server process
2022-05-25 15:38:19 CST DETAIL:  The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-05-25 15:38:19 CST HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-05-25 15:38:19 CST LOG:  all server processes terminated; reinitializing
2022-05-25 15:38:19 CST LOG:  database system was interrupted; last known up at 2022-05-25 15:38:19 CST
2022-05-25 15:38:19 CST LOG:  database system was not properly shut down; automatic recovery in progress
2022-05-25 15:38:19 CST LOG:  redo starts at 0/8F050338
2022-05-25 15:38:19 CST LOG:  redo wal segment count 1
2022-05-25 15:38:19 CST LOG:  invalid record length at 0/8FA6C178: wanted 24, got 0
2022-05-25 15:38:19 CST LOG:  complete: 1/1
2022-05-25 15:38:19 CST LOG:  redo done at 0/8FA6C108
2022-05-25 15:38:19 CST LOG:  MultiXact member wraparound protections are now enabled
2022-05-25 15:38:19 CST LOG:  redo done at 0/8FA6C108
2022-05-25 15:38:19 CST LOG:  MultiXact member wraparound protections are now enabled
2022-05-25 15:38:19 CST LOG:  database system is ready to accept connections
2022-05-25 15:38:19 CST LOG:  autovacuum launcher started
2022-05-25 15:38:19 CST LOG:  starting syslogical supervisor
2022-05-25 15:38:19 CST LOG:  starting syslogical database manager for database TEST
2022-05-25 15:38:19 CST LOG:  manager worker [13929] at slot 0 generation 1 detaching cleanly
2022-05-25 15:38:20 CST LOG:  starting syslogical database manager for database TEMPLATE1
2022-05-25 15:38:20 CST LOG:  manager worker [13930] at slot 0 generation 2 detaching cleanly
2022-05-25 15:38:20 CST LOG:  starting syslogical database manager for database TEMPLATE2
2022-05-25 15:38:20 CST LOG:  manager worker [13932] at slot 0 generation 3 detaching cleanly
2022-05-25 15:38:20 CST LOG:  starting syslogical database manager for database SAMPLES
2022-05-25 15:38:20 CST LOG:  manager worker [13935] at slot 0 generation 4 detaching cleanly
2022-05-25 15:38:20 CST LOG:  starting syslogical database manager for database SECURITY
2022-05-25 15:38:20 CST LOG:  manager worker [13940] at slot 0 generation 5 detaching cleanly

再查看那个占用内存高的进程已经被干掉。

需要说明的是同样的环境,我在KingbaseV8R6上并没有复现,也没有发生宕机。能看到插入时间比较慢,看到进程占用内存没有如此之高。

总结:

由于突然性大并发导致数据库资源使用上限是常有之事,我们尽量和业务协商保持业务稳定,如有新上业务要提前评估内存,cpu,io使用情况后做决定。是否有可用内存以供增加,不然很容易像以上例子导致数据库崩溃。

尽量升级到高版本规避此问题。或在系统级限定资源消费上限。

这篇关于解决金仓数据库KingbaseES V8R3 shared_buffer占用过多导致实例崩溃的问题的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/318102

相关文章

Spring Security基于数据库验证流程详解

Spring Security 校验流程图 相关解释说明(认真看哦) AbstractAuthenticationProcessingFilter 抽象类 /*** 调用 #requiresAuthentication(HttpServletRequest, HttpServletResponse) 决定是否需要进行验证操作。* 如果需要验证,则会调用 #attemptAuthentica

好题——hdu2522(小数问题:求1/n的第一个循环节)

好喜欢这题,第一次做小数问题,一开始真心没思路,然后参考了网上的一些资料。 知识点***********************************无限不循环小数即无理数,不能写作两整数之比*****************************(一开始没想到,小学没学好) 此题1/n肯定是一个有限循环小数,了解这些后就能做此题了。 按照除法的机制,用一个函数表示出来就可以了,代码如下

hdu1043(八数码问题,广搜 + hash(实现状态压缩) )

利用康拓展开将一个排列映射成一个自然数,然后就变成了普通的广搜题。 #include<iostream>#include<algorithm>#include<string>#include<stack>#include<queue>#include<map>#include<stdio.h>#include<stdlib.h>#include<ctype.h>#inclu

MySQL数据库宕机,启动不起来,教你一招搞定!

作者介绍:老苏,10余年DBA工作运维经验,擅长Oracle、MySQL、PG、Mongodb数据库运维(如安装迁移,性能优化、故障应急处理等)公众号:老苏畅谈运维欢迎关注本人公众号,更多精彩与您分享。 MySQL数据库宕机,数据页损坏问题,启动不起来,该如何排查和解决,本文将为你说明具体的排查过程。 查看MySQL error日志 查看 MySQL error日志,排查哪个表(表空间

安卓链接正常显示,ios#符被转义%23导致链接访问404

原因分析: url中含有特殊字符 中文未编码 都有可能导致URL转换失败,所以需要对url编码处理  如下: guard let allowUrl = webUrl.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed) else {return} 后面发现当url中有#号时,会被误伤转义为%23,导致链接无法访问

如何解决线上平台抽佣高 线下门店客流少的痛点!

目前,许多传统零售店铺正遭遇客源下降的难题。尽管广告推广能带来一定的客流,但其费用昂贵。鉴于此,众多零售商纷纷选择加入像美团、饿了么和抖音这样的大型在线平台,但这些平台的高佣金率导致了利润的大幅缩水。在这样的市场环境下,商家之间的合作网络逐渐成为一种有效的解决方案,通过资源和客户基础的共享,实现共同的利益增长。 以最近在上海兴起的一个跨行业合作平台为例,该平台融合了环保消费积分系统,在短

购买磨轮平衡机时应该注意什么问题和技巧

在购买磨轮平衡机时,您应该注意以下几个关键点: 平衡精度 平衡精度是衡量平衡机性能的核心指标,直接影响到不平衡量的检测与校准的准确性,从而决定磨轮的振动和噪声水平。高精度的平衡机能显著减少振动和噪声,提高磨削加工的精度。 转速范围 宽广的转速范围意味着平衡机能够处理更多种类的磨轮,适应不同的工作条件和规格要求。 振动监测能力 振动监测能力是评估平衡机性能的重要因素。通过传感器实时监

【机器学习】高斯过程的基本概念和应用领域以及在python中的实例

引言 高斯过程(Gaussian Process,简称GP)是一种概率模型,用于描述一组随机变量的联合概率分布,其中任何一个有限维度的子集都具有高斯分布 文章目录 引言一、高斯过程1.1 基本定义1.1.1 随机过程1.1.2 高斯分布 1.2 高斯过程的特性1.2.1 联合高斯性1.2.2 均值函数1.2.3 协方差函数(或核函数) 1.3 核函数1.4 高斯过程回归(Gauss

缓存雪崩问题

缓存雪崩是缓存中大量key失效后当高并发到来时导致大量请求到数据库,瞬间耗尽数据库资源,导致数据库无法使用。 解决方案: 1、使用锁进行控制 2、对同一类型信息的key设置不同的过期时间 3、缓存预热 1. 什么是缓存雪崩 缓存雪崩是指在短时间内,大量缓存数据同时失效,导致所有请求直接涌向数据库,瞬间增加数据库的负载压力,可能导致数据库性能下降甚至崩溃。这种情况往往发生在缓存中大量 k

6.1.数据结构-c/c++堆详解下篇(堆排序,TopK问题)

上篇:6.1.数据结构-c/c++模拟实现堆上篇(向下,上调整算法,建堆,增删数据)-CSDN博客 本章重点 1.使用堆来完成堆排序 2.使用堆解决TopK问题 目录 一.堆排序 1.1 思路 1.2 代码 1.3 简单测试 二.TopK问题 2.1 思路(求最小): 2.2 C语言代码(手写堆) 2.3 C++代码(使用优先级队列 priority_queue)