Verifying MySQL Master-Slave Replication with pt-table-checksum

2023-10-10 20:38


pt-table-checksum is an online data-consistency verification tool for MySQL master-slave replication topologies. It runs on the master and checksums the replicated tables on both the master and the replica side to determine whether the data is consistent. When the check finishes, the tool reports the objects that differ from the master.

1. Scenarios that lead to master-slave inconsistency

- Non-deterministic statements on the master (e.g. CURRENT_USER(), UUID()); see the sketch after this list
- An incorrect failover procedure
- Mistaken operations, or DML executed directly on the slave
- Rolling upgrades
- Mixing transactional and non-transactional storage engines
- Skipping replication events (SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N)
- Temporary tables
- Replication filters
- UPDATE/DELETE with a LIMIT clause but no ORDER BY
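As a quick illustration of the first case: under statement-based replication the statement text is what goes into the binary log, so a non-deterministic function is re-evaluated on the replica and can produce a different value there. A minimal sketch, assuming a hypothetical table t1 (not part of the demo environment) and binlog_format=STATEMENT:

-- Hypothetical table used only for this illustration.
CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, token CHAR(36));
-- With binlog_format=STATEMENT this INSERT replicates as text, so the replica
-- re-evaluates UUID() and stores a different token than the master did.
INSERT INTO t1(token) VALUES (UUID());
-- Row-based replication (binlog_format=ROW) logs the resulting row instead and
-- avoids this class of drift.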

2. pt-table-checksum features

pt-table-checksum connects to the server you specify, and finds databases and tables that match the filters you specify (if any). It works one table at a time, so it does not accumulate large amounts of memory or do a lot of work before beginning to checksum. This makes it usable on very large servers. We have used it on servers with hundreds of thousands of databases and tables, and trillions of rows. No matter how large the server is, pt-table-checksum works equally well.

One reason it can work on very large tables is that it divides each table into chunks of rows, and checksums each chunk with a single REPLACE..SELECT query. It varies the chunk size to make the checksum queries run in the desired amount of time. The goal of chunking the tables, instead of doing each table with a single big query, is to ensure that checksums are unintrusive and don't cause too much replication lag or load on the server. That's why the target time for each chunk is 0.5 seconds by default.

The tool keeps track of how quickly the server is able to execute the queries, and adjusts the chunks as it learns more about the server's performance. It uses an exponentially decaying weighted average to keep the chunk size stable, yet remain responsive if the server's performance changes during checksumming for any reason. This means that the tool will quickly throttle itself if your server becomes heavily loaded during a traffic spike or a background task, for example.

After pt-table-checksum finishes checksumming all of the chunks in a table, it pauses and waits for all detected replicas to finish executing the checksum queries. Once that is finished, it checks all of the replicas to see if they have the same data as the master, and then prints a line of output with the results.
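The per-chunk comparison works because the checksum statements themselves are replicated. A simplified sketch of the idea, reusing the test.checksum table from the demo below (the real queries are generated by the tool and shown later with --explain; this is only an outline, not the exact statements):

-- Executed on the master and written to the binary log as a statement, so each
-- replica recomputes the chunk's row count and CRC against its own data.
REPLACE INTO test.checksum (db, tbl, chunk, this_cnt, this_crc)
SELECT 'test', 't', 1, COUNT(*),
       COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', id, ename)) AS UNSIGNED)), 10, 16)), 0)
  FROM test.t;
-- The master then copies its own result into master_cnt/master_crc; this UPDATE
-- replicates too, so every replica ends up holding both values side by side.
UPDATE test.checksum
   SET master_cnt = this_cnt, master_crc = this_crc
 WHERE db = 'test' AND tbl = 't' AND chunk = 1;
-- On a replica, rows where this_crc <> master_crc (or the counts differ) mark
-- the inconsistent chunks.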

3. Demonstrating pt-table-checksum

-- Environment: Master 192.168.1.8, Slave 192.168.1.12; replication is already set up
-- In this demo, the mysql prompt is: user@hostname[database]
-- e.g. master@localhost[test] means the session is on the master; a slave prompt means the session is on the slave
-- The replication filters are as follows:
[root@vdbsrv4 ~]# mysql -uroot -p -e "show slave status\G"|grep "Replicate"
Enter password: 
              Replicate_Do_DB: sakila,test
          Replicate_Ignore_DB: mysql
a. Environment preparation
-- Grant privileges to the account used to run the checksum. Note: if replication of the mysql system database is not enabled, create the same account on the slave as well.
master@localhost[test]> grant select, process, super, replication slave on *.* to
 ->  'checksums'@'192.168.1.%' identified by 'xxx';
Query OK, 0 rows affected (0.00 sec)

-- Create a table on the master and insert a few rows
master@localhost[test]> create table t(id tinyint primary key auto_increment,ename varchar(20));
Query OK, 0 rows affected (0.01 sec)

master@localhost[test]> insert into t(ename) values('Leshami'),('Henry'),('Jack');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

-- Query the result on the slave
slave@localhost[test]> select * from t;
+----+---------+
| id | ename   |
+----+---------+
|  1 | Leshami |
|  2 | Henry   |
|  3 | Jack    |
+----+---------+

-- Simulate inconsistency: delete a row on the slave
slave@localhost[test]> delete from t where id=2;

b. Single-table check
-- Run pt-table-checksum
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> -dtest -tt --nocheck-replication-filters \
> --no-check-binlog-format  --replicate=test.checksum
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T10:14:32      0      1        3      1      0  0.031 test.t

TS      : time the check completed.
ERRORS  : number of errors and warnings raised during the check.
DIFFS   : 0 means consistent, 1 means inconsistent. With --no-replicate-check it is always 0; with --replicate-check-only only information about differences is shown.
ROWS    : number of rows in the table.
CHUNKS  : number of chunks the table was divided into.
SKIPPED : number of chunks skipped due to errors, warnings, or being too large.
TIME    : execution time.
TABLE   : name of the table that was checked.

-- Check the checksum result from the slave with a SQL script
slave@localhost[test]> system more check_sync_stat.sql;
SELECT 
    db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM
    test.checksum
WHERE
    (master_cnt <> this_cnt
        OR master_crc <> this_crc
        OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db , tbl;

slave@localhost[test]> source check_sync_stat.sql;
+------+-----+------------+--------+
| db   | tbl | total_rows | chunks |
+------+-----+------------+--------+
| test | t   |          2 |      1 |
+------+-----+------------+--------+
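To see which chunk actually differs, the checksum table also stores each chunk's boundaries, so a query along these lines (a sketch against the same test.checksum table) narrows the inconsistency down before re-syncing:

SELECT db, tbl, chunk, lower_boundary, upper_boundary
FROM
    test.checksum
WHERE
    (master_cnt <> this_cnt
        OR master_crc <> this_crc
        OR ISNULL(master_crc) <> ISNULL(this_crc));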
-- Insert a row on the slave
slave@localhost[test]> insert into t(ename) values('Robin');
Query OK, 1 row affected (0.00 sec)

slave@localhost[test]> select * from t;
+----+---------+
| id | ename   |
+----+---------+
|  1 | Leshami |   #Author : Leshami
|  3 | Jack    |   #Blog   : http://blog.csdn.net/leshami
|  4 | Robin   |
+----+---------+

-- Run pt-table-checksum on the master again (omitted here), then check the result
slave@localhost[test]> source check_sync_stat.sql;
+------+-----+------------+--------+
| db   | tbl | total_rows | chunks |
+------+-----+------------+--------+
| test | t   |          3 |      1 |
+------+-----+------------+--------+

c. How pt-table-checksum works
-- With --explain, the checksum is not executed; instead the tool lists the SQL statements it would actually run
-- From the documentation:
Show, but do not execute, checksum queries (disables --[no]empty-replicate-table). If specified
twice, the tool actually iterates through the chunking algorithm, printing the upper and lower boundary values
for each chunk, but not executing the checksum queries.

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> -dtest -tt --nocheck-replication-filters \
> --no-check-binlog-format  --replicate=test.checksum --explain
--
-- test.t
--
REPLACE INTO `test`.`checksum` (db, tbl, chunk, chunk_index, lower_boundary,
upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt,
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ename`,
CONCAT(ISNULL(`ename`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`/*checksum table*/
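Per the documentation excerpt above, passing --explain twice makes the tool walk the chunking algorithm and print each chunk's lower and upper boundary values without checksumming anything. A hedged invocation against the same demo environment might look like this (output omitted):

pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
  -dtest -tt --nocheck-replication-filters \
  --no-check-binlog-format --replicate=test.checksum --explain --explain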
d. Database-level check
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> --databases=sakila --nocheck-replication-filters --no-check-binlog-format \
> --replicate=test.checksum
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T13:52:17      0      0      200      1      0  0.083 sakila.actor
08-06T13:52:17      0      0      603      1      0  0.024 sakila.address
08-06T13:52:17      0      0      16      1      0  0.012 sakila.category
08-06T13:52:17      0      0      600      1      0  0.025 sakila.city
08-06T13:52:17      0      0      109      1      0  0.019 sakila.country
08-06T13:52:17      0      0      599      1      0  0.019 sakila.customer
08-06T13:52:17      0      0    1000      1      0  0.035 sakila.film
08-06T13:52:17      0      0    5462      1      0  0.295 sakila.film_actor
08-06T13:52:17      0      0    1000      1      0  0.019 sakila.film_category
08-06T13:52:17      0      0    1000      1      0  0.015 sakila.film_text
08-06T13:52:17      0      0    4581      1      0  0.041 sakila.inventory
08-06T13:52:17      0      0        6      1      0  0.012 sakila.language
08-06T13:52:18      0      0    16049      1      0  0.367 sakila.payment
08-06T13:52:18      0      0    16044      1      0  0.357 sakila.rental
08-06T13:52:18      0      0        2      1      0  0.013 sakila.staff
08-06T13:52:18      0      0        2      1      0  0.012 sakila.store

-- Drop a table on the slave
slave@localhost[test]> drop table sakila.payment;
Query OK, 0 rows affected (0.01 sec)

-- Run pt-table-checksum again; the following message appears
08-06T13:56:42 Skipping table sakila.payment because it has problems on these replicas:
Table sakila.payment does not exist on replica vdbsrv4
This can break replication.  If you understand the risks, specify --no-check-slave-tables to disable this check.
08-06T13:56:42 Error checksumming table sakila.payment: DBD::mysql::db selectrow_hashref failed:
Table 'sakila.payment' doesn't exist 
[for Statement "EXPLAIN SELECT * FROM `sakila`.`payment` WHERE 1=1"] at /usr/bin/pt-table-checksum line 6530.d、多从校验
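As the message above suggests, the check for missing tables on replicas can be disabled once the risk is understood. A hedged re-run for this situation, simply adding the flag named in the error, might look like:

pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
  --databases=sakila --nocheck-replication-filters --no-check-binlog-format \
  --replicate=test.checksum --no-check-slave-tables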
e. Multi-replica check
-- The following demonstrates master-slave consistency checking with multiple replicas
-- By default:
-- Parameter: --recursion-method ; type: array; default: processlist,hosts.
--            Preferred recursion method for discovering replicas.
--            pt-table-checksum performs several "REPLICA CHECKS" before and while running.

master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                          |
+-----------+------+------+-----------+--------------------------------------+
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|        1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+------+------+-----------+--------------------------------------+

root@localhost[(none)]> show variables like 'port';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| port          | 3307  |
+---------------+-------+

root@localhost[(none)]> delete from test.t where id=1;
Query OK, 1 row affected (0.00 sec)

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest \
> -tt --nocheck-replication-filters --no-check-binlog-format --replicate=test.checksum \
> --recursion-method=hosts

# A software update is available:
#  * The current version for Percona::Toolkit is 2.2.14.

            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T16:12:52      0      1        3      1      0  0.034 test.t

4. Parameter description

--nocheck-replication-filters
  Do not check replication filters; recommended. Use --databases to limit the check to specific databases.
--no-check-binlog-format
  Do not check the binlog format; without this option the tool reports an error when the binlog format is ROW.
--replicate-check-only
  Only print information about data that is out of sync.
--replicate=
  Write checksum information to the specified table; writing it into the database being checked is recommended.
--databases=
  Databases to check; separate multiple databases with commas.
--tables=
  Tables to check; separate multiple tables with commas.
  h=127.0.0.1 : master address
  u=root      : user name
  p=123456    : password
  P=3306      : port
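As a usage note on --replicate-check-only: once checksums have been written via --replicate, the stored results can be re-examined without checksumming the tables again. A hedged example against the demo environment:

pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
  --replicate=test.checksum --replicate-check-only \
  --nocheck-replication-filters --no-check-binlog-format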

5. Common issues

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql \
> --nocheck-replication-filters --replicate=test.checksums
Replica vdbsrv4 has binlog_format MIXED which could cause pt-table-checksum to break replication. 
Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.If you understand the risks, specify --no-check-binlog-format to disable this check.
上面描述的是关于使用mixed日志格式时的问题  [root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql \
> --nocheck-replication-filters --no-check-binlog-format
DBD::mysql::db do failed: Access denied for user 'checksums'@'192.168.1.%' to database 'percona' 
[for Statement "CREATE DATABASE IF NOT EXISTS `percona` /* pt-table-checksum */"] 
at /usr/bin/pt-table-checksum line 10743.
07-29T08:42:03 --replicate database percona does not exist and it cannot be created automatically. 
You need to create the database.
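One hedged way out of the error above, assuming the default --replicate=percona.checksums location is actually wanted, is to create the schema up front and let the checksums account write to it:

-- On the master; a hypothetical fix for the default --replicate=percona.checksums location
CREATE DATABASE IF NOT EXISTS percona;
GRANT ALL ON percona.* TO 'checksums'@'192.168.1.%';

Note that with the Replicate_Do_DB filter shown earlier (sakila,test), writes to a percona schema would not replicate, which is why this demo writes checksums to test.checksum instead.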
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt \
> --nocheck-replication-filters --no-check-binlog-format  --replicate=test.checksum
Cannot connect to P=3306,h=vdbsrv4,p=...,u=checksums
Diffs cannot be detected because no slaves were found.  
Please read the --recursion-method documentation for information.

            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T10:03:10      0      0        3      1      0  0.023 test.t

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt \
> --nocheck-replication-filters --no-check-binlog-format \
> --replicate=test.checksum --recursion-method=hosts
Cannot connect to P=3306,h=,p=...,u=checksums
Cannot connect to P=3307,h=,p=...,u=checksums
Diffs cannot be detected because no slaves were found.  
Please read the --recursion-method documentation for information.

            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T16:02:27      0      0        3      1      0  0.016 test.t

master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                          |
+-----------+------+------+-----------+--------------------------------------+
|        1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
+-----------+------+------+-----------+--------------------------------------+

-- Add the report_host parameter and restart the slave
[root@vdbsrv4 ~]# grep report_host /etc/my.cnf
report_host='192.168.1.12'

master@localhost[(none)]> show slave hosts;
+-----------+--------------+------+-----------+--------------------------------------+
| Server_id | Host        | Port | Master_id | Slave_UUID                          |
+-----------+--------------+------+-----------+--------------------------------------+
|        11 | 192.168.1.12 | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|        1 | 192.168.1.12 | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+--------------+------+-----------+--------------------------------------+
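Besides setting report_host, another documented way to make replicas discoverable when processlist/hosts discovery fails is a DSN table. A hedged sketch; the schema, table name, and DSN values below are illustrative, and the exact --recursion-method syntax should be checked against your Percona Toolkit version's documentation:

-- Created on the master, in a schema the checksums account can write to.
CREATE TABLE test.dsns (
  id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  parent_id INT DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL
);
INSERT INTO test.dsns (dsn) VALUES ('h=192.168.1.12,P=3306,u=checksums,p=xxx');
INSERT INTO test.dsns (dsn) VALUES ('h=192.168.1.12,P=3307,u=checksums,p=xxx');
-- Then point the tool at it:
--   pt-table-checksum ... --recursion-method=dsn=h=192.168.1.8,D=test,t=dsns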

