Pitfalls You May Encounter During MHA Failover (MySQL)

2023-10-10 21:10


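1. Master-slave data consistency

1.1 How to keep master and slave data consistent

In MySQL, committing a transaction involves writing undo, writing redo, writing the binlog, writing the data files, and so on. A crash at any of these steps can leave the master and slave inconsistent. To avoid this, adjust the relevant options on both master and slave so that even a crash does not lose replicated data.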

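Configuration changes on the MASTER

innodb_flush_log_at_trx_commit = 1 --> redo log: 1 writes and flushes to disk at every commit; 2 writes to the OS cache at commit (data may be lost if the OS crashes); 0 writes only to the redo log buffer at commit (data may be lost if mysqld crashes)

sync_binlog = 1 --> binlog: 1 syncs to disk at every commit; 0 writes to the OS cache and leaves flushing to the OS

This ensures every committed transaction is flushed to disk immediately, and in particular that the binlog for each transaction reaches disk promptly.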

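Configuration changes on the SLAVE

master_info_repository = "TABLE"

relay_log_info_repository = "TABLE"

relay_log_recovery = 1

This keeps the replication metadata on the slave in InnoDB tables, so it is protected by InnoDB's transactional safety. It also enables automatic relay-log recovery: after a crash, the slave re-fetches events from the master starting at the executed binlog position recorded in relay_log_info and applies them again, which avoids losing part of the data.

With these settings, master and slave data should stay consistent under normal conditions.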
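The settings above can be checked mechanically. The following is a minimal sketch, assuming the options live in a `[mysqld]` group of an option file passed in as text; the function name `check_crash_safe` and the role labels are hypothetical, and the parser does not handle `!include` directives.

```python
import configparser

# Recommended values taken from the text above; role names are illustrative.
RECOMMENDED = {
    "master": {"innodb_flush_log_at_trx_commit": "1", "sync_binlog": "1"},
    "slave": {
        "master_info_repository": "TABLE",
        "relay_log_info_repository": "TABLE",
        "relay_log_recovery": "1",
    },
}


def check_crash_safe(cnf_text, role):
    """Return {option: (expected, actual)} for every setting that differs."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    mysqld = {}
    if parser.has_section("mysqld"):
        for key, value in parser.items("mysqld"):
            # Option files treat '-' and '_' in option names as equivalent.
            mysqld[key.replace("-", "_")] = (value or "").strip().strip("\"'")
    problems = {}
    for option, expected in RECOMMENDED[role].items():
        actual = mysqld.get(option)
        if actual is None or actual.upper() != expected:
            problems[option] = (expected, actual)
    return problems


sample = """
[mysqld]
innodb_flush_log_at_trx_commit = 2
sync_binlog = 1
master_info_repository = "TABLE"
relay-log-recovery = 1
"""
print(check_crash_safe(sample, "master"))
print(check_crash_safe(sample, "slave"))
```

An option that is absent from the file is reported with an actual value of `None`, since the sketch cannot know the server's compiled-in default.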

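1.2 Data is inconsistent but replication status looks normal

• binlog_format='STATEMENT'

As long as the tables touched by the replicated statements have the same structure, replication status is unaffected by whether the data itself matches.

• binlog_format='ROW'

1. When the table has a primary key or unique index, the slave only matches on that key while applying the relay-log; it does not check whether the other columns equal the original values on the master.

2. Rows have been updated or deleted directly on the slave, and the master never touches those rows again.

Guaranteeing consistency therefore requires periodically checking and re-syncing with the pt tools (Percona Toolkit, for example pt-table-checksum and pt-table-sync).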
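The ROW-format case can be illustrated with a toy model. This sketch assumes tables are plain dictionaries keyed by primary key and that the row event's after image carries only the changed columns (a simplification, similar in spirit to a minimal row image); `apply_row_update` and `table_checksum` are hypothetical helpers, and the checksum is not the algorithm pt-table-checksum uses.

```python
import hashlib


def apply_row_update(table, event):
    """Locate the row by primary key only and overwrite the changed columns.

    Nothing compares the remaining columns with the master's original
    values, so a drifted row still accepts the event without error.
    """
    table[event["pk"]].update(event["after"])


def table_checksum(table):
    """Toy whole-table checksum over rows sorted by primary key."""
    digest = hashlib.sha256()
    for pk in sorted(table):
        digest.update(repr((pk, sorted(table[pk].items()))).encode())
    return digest.hexdigest()


master = {1: {"name": "alice", "score": 10}}
slave = {1: {"name": "alice", "score": 99}}  # score was changed on the slave

# The master updates the name; the row event carries only the new name.
event = {"pk": 1, "after": {"name": "bob"}}
apply_row_update(master, event)
apply_row_update(slave, event)  # applies cleanly, replication looks healthy

print(slave[1])
print(table_checksum(master) == table_checksum(slave))
```

The event applies on both sides without complaint, and only an explicit comparison such as the checksum reveals that the two copies differ.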

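2. relay_log_recovery && relay_log_purge

Sometimes we want MySQL to keep relay-logs around longer, for example to fill in missing data after a high-availability switchover, so we set relay_log_purge=0, which stops the SQL_Thread from automatically deleting a relay-log once it has finished executing it.

What can go wrong with relay_log_recovery=1 && relay_log_purge=0

• When MySQL crashes or is stopped, the SQL_Thread may not have executed all of the relay-log, so part of the last relay-log is fetched again into a new file. In other words, that data is present twice.

• If the SQL_Thread is following very closely, it may read and execute events that the IO_Thread has written to the relay-log but not yet synced to disk. In that case, a portion of the data is missing from both the old and the new files.

This does not matter for replication itself, but anyone reading the relay-logs to extract data must keep it in mind, otherwise the result is inconsistent data.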
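The duplication described in the first bullet can be modelled in a few lines. This is a toy simulation, not MySQL's actual recovery code: relay files are lists of events identified by master binlog position, and `recover_relay_logs` and `duplicated_positions` are hypothetical names.

```python
def recover_relay_logs(relay_files, exec_pos, master_binlog, purge):
    """Toy model of relay_log_recovery=1 on restart.

    A new relay file is created and every master event after exec_pos is
    fetched again. Old files are kept when purge is False
    (relay_log_purge=0) and dropped otherwise.
    """
    refetched = [ev for ev in master_binlog if ev["pos"] > exec_pos]
    kept = [] if purge else list(relay_files)
    return kept + [refetched]


def duplicated_positions(relay_files):
    """Return master positions that appear more than once across all files."""
    seen, dupes = set(), []
    for relay_file in relay_files:
        for ev in relay_file:
            if ev["pos"] in seen:
                dupes.append(ev["pos"])
            seen.add(ev["pos"])
    return dupes


master_binlog = [
    {"pos": 100, "row": "row1"},
    {"pos": 200, "row": "row2"},
    {"pos": 300, "row": "row_new"},
]
old_relay = [list(master_binlog)]  # IO_Thread had already fetched everything
exec_pos = 200                     # SQL_Thread stopped before row_new

kept_all = recover_relay_logs(old_relay, exec_pos, master_binlog, purge=False)
purged = recover_relay_logs(old_relay, exec_pos, master_binlog, purge=True)
print(duplicated_positions(kept_all))
print(duplicated_positions(purged))
```

Replication itself is unaffected because the SQL_Thread resumes from the new file; the duplicate only matters to a reader that walks every relay file, which is what a relay-log based recovery tool does.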

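3. Pitfalls you may hit during MHA failover

In a traditional (position-based) replication environment, MHA uses the Latest Slave's relay-log to fill in the differences between the other slaves and the Latest Slave. In a GTID environment, it fills in the data from the binlog via change master to and no longer depends on the relay-log.

To make the scenarios easy to reproduce, this article uses manual failover to see how MHA behaves when it hits the pitfalls above. The test environment is the basic setup from "MHA manual failover procedure (traditional replication & GTID replication)".

3.1 Duplicated relay-log events

Pause the SQL_Thread by hand and then shut down the MySQL instance, to simulate a SQL_Thread that has not executed all of the relay-log.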

relay_log_recovery=1 && relay_log_purge=0

# Test data, abbreviated

Node1 writes the first record -> Node3 stops io_thread

Node1 writes the second record ->

1. Node2 (slave): stop slave sql_thread;

2. Node1 (master): write a new row, row_new

3. Node2 (slave): shutdown;

4. Node2 (slave): start mysql, then start slave;

-> Node2 stops io_thread

Node1 writes the third record

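With the slave's SQL_Thread paused, the master writes new data, which the IO_Thread fetches and writes to the relay-log. When the slave's mysql instance is then restarted, the IO_Thread re-fetches from the master starting at the executed binlog position recorded in relay_log_info, and the events are applied again. As a result, parsing the relay-logs shows row_new fetched twice.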

[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover

...

Fri Apr 13 18:09:37 2018 - [info] * Phase 3.4: Master Log Apply Phase..

Fri Apr 13 18:09:37 2018 - [info]

Fri Apr 13 18:09:37 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.

Fri Apr 13 18:09:37 2018 - [info] Starting recovery on 192.168.85.134(192.168.85.134:3307)..

Fri Apr 13 18:09:37 2018 - [info] Generating diffs succeeded.

Fri Apr 13 18:09:37 2018 - [info] Waiting until all relay logs are applied.

Fri Apr 13 18:09:37 2018 - [info] done.

Fri Apr 13 18:09:37 2018 - [debug] Stopping SQL thread on 192.168.85.134(192.168.85.134:3307)..

Fri Apr 13 18:09:37 2018 - [debug] done.

Fri Apr 13 18:09:37 2018 - [info] Getting slave status..

Fri Apr 13 18:09:37 2018 - [info] This slave(192.168.85.134)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000005:484). No need to recover from Exec_Master_Log_Pos.

Fri Apr 13 18:09:37 2018 - [debug] Current max_allowed_packet is 4194304.

Fri Apr 13 18:09:37 2018 - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.

Fri Apr 13 18:09:37 2018 - [info] Connecting to the target slave host 192.168.85.134, running recover script..

Fri Apr 13 18:09:37 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.134 --slave_ip=192.168.85.134 --slave_port=3307 --apply_files=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.21-log --timestamp=20180413180912 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug --slave_pass=xxx

Fri Apr 13 18:09:45 2018 - [info]

Concat all apply files to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180413180912.binlog ..

Copying the first binlog file /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180413180912.binlog.. ok.

Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog..

parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123

Binlog Checksum enabled

parse_init_headers:file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type=35 server_id=1323307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154

Got previous gtids log event:154.

parse_init_headers:file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type=34 server_id=1323307 length=65 nextmpos=1209 prevrelay=154 cur(post)relay=219

dumped up to pos 154. ok.

/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog has effective binlog events from pos 154.

Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog position 154 to tail(507).. ok.

Concat succeeded.

All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180413180912.binlog .

MySQL client version is 5.7.21. Using --binary-mode.

Applying differential binary/relay log files /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog on 192.168.85.134:3307. This may take long time...

FATAL: applying log files failed with rc 1:0!

Error logs from ZST3:/var/log/masterha/app1/relay_log_apply_for_192.168.85.134_3307_20180413180912_err.log (the last 200 lines)..

mysql: [Warning] Using a password on the command line interface can be insecure.

...

ERROR 1062 (23000) at line 92: Duplicate entry '3' for key 'PRIMARY'

--------------

BINLOG '

NoDQWhMrMRQAPwAAAAMEAAAAAG4AAAAAAAEACXJlcGxjcmFzaAAHcHlfdXNlcgAEAw8SDwVgAAAe

AA7JJu9M

NoDQWh4rMRQAVgAAAFkEAAAAAG4AAAAAAAEAAgAE//ADAAAAIGM3MzExZWQ0LTNmMDEtMTFlOC05

ODg4LTAwMGMyOWMxmZ+bIJ4HMTMyMzMwN3PJaGg=

'

--------------

Bye

at /usr/bin/apply_diff_relay_logs line 515

eval {...} called at /usr/bin/apply_diff_relay_logs line 475

main::main() called at /usr/bin/apply_diff_relay_logs line 120

Fri Apr 13 18:09:45 2018 - [debug] Setting max_allowed_packet back to 4194304 succeeded.

Fri Apr 13 18:09:45 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1398] Applying diffs failed with return code 22:0.

Fri Apr 13 18:09:45 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1561] Recovering master server failed.

Fri Apr 13 18:09:45 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53

Fri Apr 13 18:09:45 2018 - [debug] Disconnected from 192.168.85.133(192.168.85.133:3307)

Fri Apr 13 18:09:45 2018 - [debug] Disconnected from 192.168.85.134(192.168.85.134:3307)

Fri Apr 13 18:09:45 2018 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.85.132(192.168.85.132:3307)

Master 192.168.85.132(192.168.85.132:3307) is down!

Check MHA Manager logs at ZST3 for details.

Started manual(interactive) failover.

Invalidated master IP address on 192.168.85.132(192.168.85.132:3307)

The latest slave 192.168.85.133(192.168.85.133:3307) has all relay logs for recovery.

Selected 192.168.85.134(192.168.85.134:3307) as a new master.

Recovering master server failed.

Got Error so could not continue failover from here.

[root@ZST3 app1]#

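The MHA switchover fails. The cause is that the data Node3 obtains from the Latest Slave contains duplicate records, so applying the differential logs fails. The duplicates are also visible inside relay_from_read_to_latest_**.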
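The failure mode is the same one any keyed table shows when a row is inserted twice. As a rough analogy only, the sketch below replays a list of insert events into an in-memory SQLite table (not MySQL, and not MHA's apply_diff_relay_logs); the table name `demo_rows` and the helper `apply_events` are made up for illustration.

```python
import sqlite3


def apply_events(conn, events):
    """Insert each (id, name) event in order; stop at the first key conflict.

    Returns (number_of_events_applied, error_message_or_None).
    """
    applied = 0
    for row_id, name in events:
        try:
            conn.execute(
                "INSERT INTO demo_rows (id, name) VALUES (?, ?)", (row_id, name)
            )
        except sqlite3.IntegrityError as exc:
            return applied, str(exc)
        applied += 1
    return applied, None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_rows (id INTEGER PRIMARY KEY, name TEXT)")

# The diff contains row_new twice, as in the relay-log duplication case.
diff_events = [(1, "row1"), (2, "row2"), (3, "row_new"), (3, "row_new")]
applied, error = apply_events(conn, diff_events)
print(applied, error)
```

The first three events apply and the fourth is rejected as a duplicate key, which corresponds to the "Duplicate entry '3' for key 'PRIMARY'" error in the log above.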

3.2 Missing relay-log

Simulating a SQL_Thread that follows very closely is hard to do, but a slave with missing relay-logs can be simulated in a roundabout way.

relay_log_recovery=1 && relay_log_purge=1

# Test data, abbreviated

Node1 writes the first record -> Node3 stops io_thread

Node1 writes the second record -> Node2 runs flush relay logs twice; -> Node2 stops io_thread

Node1 writes the third record

The goal is to get the relay-log holding the second record purged, so that the Latest Slave no longer has enough relay-logs to recover the other slaves.

[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover

...

Fri Apr 13 15:26:39 2018 - [info] * Phase 3.3: Determining New Master Phase..

Fri Apr 13 15:26:39 2018 - [info]

Fri Apr 13 15:26:39 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..

Fri Apr 13 15:26:39 2018 - [info] Checking whether 192.168.85.133 has relay logs from the oldest position..

Fri Apr 13 15:26:39 2018 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000001 --latest_rmlp=1303 --target_mlf=mysql-bin.000001 --target_rmlp=643 --server_id=1333307 --workdir=/var/log/masterha/app1 --timestamp=20180413152622 --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info --relay_dir=/data/mysql/mysql3307/data/ --debug :

Opening /data/mysql/mysql3307/data/relay-log.info... ok.

Relay log found at /data/mysql/mysql3307/data, up to relay-bin.000004

Fast relay log position search failed. Reading relay logs to find..

Reading relay-bin.000004

parse_init_headers:file=relay-bin.000004 event_type=15 server_id=1333307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123

Binlog Checksum enabled

parse_init_headers:file=relay-bin.000004 event_type=35 server_id=1333307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154

Got previous gtids log event:154.

parse_init_headers:file=relay-bin.000004 event_type=15 server_id=1323307 length=119 nextmpos=0 prevrelay=154 cur(post)relay=273

Master Version is 5.7.21-log

Binlog Checksum enabled

parse_init_headers:file=relay-bin.000004 event_type=34 server_id=1323307 length=65 nextmpos=1038 prevrelay=273 cur(post)relay=338

get_starting_mlp:file=relay-bin.000004 event_type=2 server_id=1323307 length=85 next=1123

relay-bin.000004 contains master mysql-bin.000001 from position 1123

Reading relay-bin.000003

parse_init_headers:file=relay-bin.000003 event_type=15 server_id=1333307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123

Binlog Checksum enabled

parse_init_headers:file=relay-bin.000003 event_type=35 server_id=1333307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154

Got previous gtids log event:154.

parse_init_headers:file=relay-bin.000003 event_type=15 server_id=1323307 length=119 nextmpos=0 prevrelay=154 cur(post)relay=273

parse_init_headers:file=relay-bin.000003 event_type=4 server_id=1333307 length=47 nextmpos=320 prevrelay=273 cur(post)relay=320

Reading relay-bin.000002

No such file or directory:/data/mysql/mysql3307/data/relay-bin.000002 at /usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm line 102

Fri Apr 13 15:26:40 2018 - [warning] 192.168.85.133 does not have all relay logs. Maybe some logs were purged.

Fri Apr 13 15:26:40 2018 - [warning] None of latest servers have enough relay logs from oldest position. We can not recover oldest slaves.

Fri Apr 13 15:26:40 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln947] None of the latest slaves has enough relay logs for recovery.

Fri Apr 13 15:26:40 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53

Fri Apr 13 15:26:40 2018 - [debug] Disconnected from 192.168.85.133(192.168.85.133:3307)

Fri Apr 13 15:26:40 2018 - [debug] Disconnected from 192.168.85.134(192.168.85.134:3307)

Fri Apr 13 15:26:40 2018 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.85.132(192.168.85.132:3307)

Master 192.168.85.132(192.168.85.132:3307) is down!

Check MHA Manager logs at ZST3 for details.

Started manual(interactive) failover.

Invalidated master IP address on 192.168.85.132(192.168.85.132:3307)

None of the latest slaves has enough relay logs for recovery.

Got Error so could not continue failover from here.

[root@ZST3 app1]#

The MHA switchover fails. The cause is that the Latest Slave does not hold enough relay-logs to recover the other slaves.
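The check that fails here is essentially a coverage test: do the surviving relay files span every master position between the oldest slave and the latest one? Below is a minimal sketch of that idea, assuming a single master binlog file and that each relay file is summarized as a (start, end) range of master positions. The function `relay_logs_cover` is hypothetical and is not how apply_diff_relay_logs is implemented; the numbers 643, 1123 and 1303 come from the log above, while the end of the relay-bin.000004 range is assumed to be the latest position.

```python
def relay_logs_cover(relay_ranges, target_pos, latest_pos):
    """Return True when the ranges cover [target_pos, latest_pos] without a gap.

    relay_ranges: list of (start_pos, end_pos) master positions held by each
    relay file that still exists on the latest slave.
    """
    reached = target_pos
    for start, end in sorted(relay_ranges):
        if start > reached:
            return False  # gap: a relay log in between was purged
        reached = max(reached, end)
        if reached >= latest_pos:
            return True
    return reached >= latest_pos


# Only relay-bin.000004 still holds master events, starting at position 1123.
surviving = [(1123, 1303)]
print(relay_logs_cover(surviving, target_pos=643, latest_pos=1303))

# Had the purged file (assumed here to hold 643..1123) survived, it would pass.
print(relay_logs_cover([(643, 1123), (1123, 1303)], target_pos=643, latest_pos=1303))
```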

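So whenever MHA needs relay-logs to recover data, duplicated or missing relay-logs make it fail immediately and the switchover aborts.

Automatic failover first collects every [server] configured with candidate_master=1, then picks the one with the most recent logs; if several are equally up to date, the new master is chosen by the order of the [server] sections.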
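That selection rule can be sketched as follows. This covers only the rule as stated in the sentence above; MHA's real selection logic takes more settings into account, and the function `pick_new_master`, the dictionary keys, and the behavior of returning `None` when no candidate exists are all assumptions made for illustration.

```python
def pick_new_master(servers):
    """Pick a new master from servers listed in [server] section order.

    Each server is a dict with 'host', 'candidate_master' (bool) and
    'log_pos', a (binlog_file, position) tuple describing how far it has read.
    """
    candidates = [s for s in servers if s["candidate_master"]]
    if not candidates:
        return None  # assumption: the stated rule does not cover this case
    newest = max(s["log_pos"] for s in candidates)
    for server in candidates:  # first match wins, so config order breaks ties
        if server["log_pos"] == newest:
            return server["host"]


servers = [
    {"host": "192.168.85.132", "candidate_master": False, "log_pos": ("mysql-bin.000005", 484)},
    {"host": "192.168.85.133", "candidate_master": True, "log_pos": ("mysql-bin.000005", 484)},
    {"host": "192.168.85.134", "candidate_master": True, "log_pos": ("mysql-bin.000005", 484)},
]
print(pick_new_master(servers))
```

Comparing `(file, position)` tuples works here because binlog file names are zero-padded, so string order matches numeric order.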

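In a traditional replication environment, if a "problem slave" ends up as the Latest Slave, the switchover fails whether the failover is manual or automatic. So use GTID whenever possible.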

3.3 default-character-set

Added 2018/7/26 15:07

In a GTID environment, running save_binary_logs --command=save to save the differences between the Dead Master/Binlog Server and the Latest Slave fails with:

mysqlbinlog: [ERROR] unknown variable 'default-character-set=utf8'

/etc/mysql.cnf contains

[client]

default-character-set=utf8

Commenting out this line lets the switchover proceed normally (no restart needed). Why?

In a GTID environment, save_binary_logs runs a command like this:

Executing command: mysqlbinlog --start-position=1013 /data/mysql/mysql3307/logs/mysql-bin.000009

Like mysqladmin, mysqlbinlog reads the option files /etc/my.cnf /etc/mysql/my.cnf /usr/local/mysql/etc/my.cnf ~/.my.cnf. mysqladmin reads the [mysqladmin] and [client] groups; mysqlbinlog likewise reads [mysqlbinlog] and [client].

With the character-set setting above present in one of those option files, print the default arguments mysqlbinlog picks up:

[root@ZST1 ~]# mysqlbinlog --print-defaults

mysqlbinlog would have been started with the following arguments:

--character-set-server=utf8

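In other words, mysqlbinlog --start-position=1013 /data/mysql/mysql3307/logs/mysql-bin.000009 is equivalent to

mysqlbinlog --character-set-server=utf8 --start-position=1013 /data/mysql/mysql3307/logs/mysql-bin.000009

But mysqlbinlog does not support a variable such as --character-set-server (the same goes for default-character-set picked up from the [client] group), so it fails.

The fix is to comment out the character-set setting in the option file, or to give mysqlbinlog an alias: alias mysqlbinlog='mysqlbinlog --no-defaults'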
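Both remedies can be expressed in a short script. This is a sketch under stated assumptions: `client_options_breaking_mysqlbinlog` and `mysqlbinlog_command` are hypothetical helpers, the set of unsupported options contains only the one named in the error above, and the option file is passed in as text rather than read from disk.

```python
import configparser

# Only the option from the error message above; extend as needed.
UNSUPPORTED_BY_MYSQLBINLOG = {"default_character_set"}


def client_options_breaking_mysqlbinlog(cnf_text):
    """List [client] options in cnf_text that mysqlbinlog would reject."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("client"):
        return []
    names = (key.replace("-", "_") for key, _ in parser.items("client"))
    return sorted(name for name in names if name in UNSUPPORTED_BY_MYSQLBINLOG)


def mysqlbinlog_command(start_position, binlog_path):
    """Build an argument list that skips all option files.

    --no-defaults has to be the first option on the command line.
    """
    return [
        "mysqlbinlog",
        "--no-defaults",
        f"--start-position={start_position}",
        binlog_path,
    ]


cnf = "[client]\ndefault-character-set=utf8\n"
print(client_options_breaking_mysqlbinlog(cnf))
print(mysqlbinlog_command(1013, "/data/mysql/mysql3307/logs/mysql-bin.000009"))
```

The argument list can be handed to `subprocess.run` as is. A shell alias only affects interactive shells, so whether it also reaches the mysqlbinlog call made by save_binary_logs depends on how that script is invoked; commenting out the option is the variant the article confirms.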
