Linux|centos7-postgresql数据库|yum安装数据库和配置repmgr高可用集群以及repmgr的日常管理工作

本文主要是介绍Linux|centos7-postgresql数据库|yum安装数据库和配置repmgr高可用集群以及repmgr的日常管理工作,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

一、

前言

postgresql 的yum部署其实还是有点东西的,本文就做一个小小的记录,高可用方面repmgr插件还是非常不错的,但如何部署以及部署后如何使用也是一个难点,因此,也在本文里做一个记录

环境介绍:

第一台服务器:操作系统是centos7,内核版本是3.10,IP地址是192.168.123.17  VMware虚拟机,内存4G,cpu4核,计划在此服务器上安装postgresql12.4版本以及repmgr高可用插件。该服务器是主节点

第二台服务器:操作系统是centos7,内核版本是3.10,IP地址是192.168.123.20  VMware虚拟机,内存4G,cpu4核,计划在此服务器上安装postgresql12.4版本以及repmgr高可用插件。该服务器是从节点

二、

详细的yum安装postgreslq数据库

1,配置yum源

yum安装postgresql比较简单,源配置官方源就可用了,命令如下(2台服务器都执行):

yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

为了防止某些库安装的时候提示缺少依赖,增加阿里云的库和阿里云的epel库

wget -O /etc/yum.repos.d/epel.repo https://mirrors.aliyun.com/repo/epel-7.repo
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo

 增加不常用的rpm包依赖仓库:

[centos-sclo-sclo]
name=CentOS-7 - SCLo sclo
baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos/7/sclo/$basearch/sclo/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-SCLo[centos-sclo-rh]
name=CentOS-7 - SCLo rh
baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos/7/sclo/$basearch/rh/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-SCLo

2,

安装postgresql的组件和postgresql的server服务(2台服务器都执行)

组件比较多,主要是lib库(postgresql运行的依赖库),插件扩展库(postgresql的插件库),开发库(一些postgresql的特殊组件需要使用的库),连接库(开发微服务使用的库)

后面的几个看自己的需求安装不,比如postgresql12-plperl

版本号根据自己需求来,我这里是12.4, 不想安这个低版本的,修改12.4就可以了,比如全部的12.4修改为12.8

yum install postgresql12-server-12.4 postgresql12-libs-12.4 postgresql12-devel-12.4 postgresql12-contrib-12.4 postgresql12-tcl postgresql12-odbc postgresql12-12.4  postgresql12-plperl-12.4 postgresql12-pltcl-12.4 postgresql12-plpython-12.4  postgresql12-plpython3-12.4 postgresql12-llvmjit-12.4 postgresql12-devel-12.4  -y

3,

postgresql的环境变量定义和一些特殊处理(两台服务器都执行)

在/etc/profile 文件末尾添加如下变量,计划postgresql安装在/data/pgsql/data目录下

PGDATA=/data/pgsql/data
export PGDATA
PGHOME=/data/pgsql
export PGHOME
PATH=$PATH:$PGHOME/bin:$PGDATA:/usr/pgsql-12/bin/
export PATH  PGHOME PGDATA

写入完成后,激活变量:

source /etc/profile

切换用户到普通用户postgres,将该用户的环境变量文件全部注释掉,结果如下:

-bash-4.2$ whoami
postgres
-bash-4.2$ cat ~/.bash_profile 
#[ -f /etc/profile ] && source /etc/profile
#PGDATA=/var/lib/pgsql/12/data
#export PGDATA
# If you want to customize your settings,
# Use the file below. This is not overridden
# by the RPMS.
#[ -f /var/lib/pgsql/.pgsql_profile ] && source /var/lib/pgsql/.pgsql_profile

注释的原因是我们不希望使用默认的/var/lib/12目录存放数据库的数据文件,注销普通用户再次su - postgres 重新激活变量后,在普通用户下,可以看到postgresql数据库相关变量已经固定了:

[root@centos10 media]# su - postgres
Last login: Sat Mar 30 05:19:48 CST 2024 on pts/0
-bash-4.2$ echo $PGDATA
/data/pgsql/data

4,创建相关目录并赋予普通用户的权限,准备初始化工作(root用户执行,两台服务器都执行):

mkdir -p /data/pgsql/
chown -Rf postgres. /data/pgsql

 

5,执行数据库初始化(仍然是切换到普通用户postgres):

由于前面的准备工作比较充分,环境变量已经完全配置好了,因此,执行initdb就可以完成初始化了,/data/pgsql/data/目录以及其下的文件将在初始化的时候创建好

初始化日志如下:

[root@centos7 media]# su - postgres
Last login: Sat Mar 30 05:25:23 CST 2024 on pts/0
-bash-4.2$ initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".Data page checksums are disabled.creating directory /data/pgsql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Shanghai
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... okinitdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.Success. You can now start the database server using:pg_ctl -D /data/pgsql/data -l logfile start

6,

在root用户下管理postgresql的启停

由于是yum安装的,因此,启停脚本是已经写好的,可以直接用,但我们数据库的数据文件存放路径修改了,因此,需要先修改一下启停脚本:

/usr/lib/systemd/system/postgresql-12.service

Environment=PGDATA=/var/lib/pgsql/12/data/
修改为
Environment=PGDATA=/data/pgsql/data/

这个启停脚本名字太长了,修改一下名字 就叫pg12吧:

mv /usr/lib/systemd/system/postgresql-12.service /usr/lib/systemd/system/pg12.service

启动postgresql数据库并查看进行确认是否启动成功:

systemctl enable pg12 && systemctl start pg12

查看postgreslq的进程:

[root@centos7 media]# ps aux |grep postgre
postgres 16161  0.2  0.4 397292 17424 ?        Ss   05:51   0:00 /usr/pgsql-12/bin/postmaster -D /data/pgsql/data/
postgres 16164  0.0  0.0 251676  2008 ?        Ss   05:51   0:00 postgres: logger   
postgres 16166  0.0  0.0 397292  2060 ?        Ss   05:51   0:00 postgres: checkpointer   
postgres 16167  0.0  0.0 397292  2288 ?        Ss   05:51   0:00 postgres: background writer   
postgres 16168  0.0  0.0 397292  2064 ?        Ss   05:51   0:00 postgres: walwriter   
postgres 16169  0.0  0.0 397844  3212 ?        Ss   05:51   0:00 postgres: autovacuum launcher   
postgres 16170  0.0  0.0 251672  2120 ?        Ss   05:51   0:00 postgres: stats collector   
postgres 16171  0.0  0.0 397844  2808 ?        Ss   05:51   0:00 postgres: logical replication launcher   
root     16173  0.0  0.0 112712   964 pts/0    S+   05:51   0:00 grep --color=auto postgre

7,

远程连接和postgres用户密码设置

修改主配置文件 /data/pgsql/data/postgresql.conf 

listen_addresses = '*'          
port = 15433                            
max_connections = 1000   
wal_level = logical

这里是设置数据库连接端口是15433 

修改安全连接文件pg_hba.conf:

末尾添加如下两行,这两行的意思是复制用户replication可以随意连接,但其它用户远程连接的时候需要密码,本地登录的时候不校验密码

host    replication     all             0.0.0.0/0                 trust
host    all             all             0.0.0.0/0           md5

8,

本地登录数据库设置密码

su - postgres -c "psql -p15433"alter user postgres with password '123456';

此时,数据库算是完全安装完毕,连接端口15433,连接密码123456 后面就不在重复这些信息了

随意安装一个插件,看看lib库什么的是否正常:

[root@centos10 ~]# su - postgres -c "psql -p15433"
psql (12.4)
Type "help" for help.postgres=# create extension 
adminpack           btree_gin           dblink              file_fdw            hstore_plperlu      insert_username     jsonb_plperl        jsonb_plpythonu     ltree_plpython3u    pg_buffercache      pgrowlocks          pg_visibility       plpython3u          postgres_fdw        tablefunc           unaccent
amcheck             btree_gist          dict_int            fuzzystrmatch       hstore_plpython2u   intagg              jsonb_plperlu       lo                  ltree_plpythonu     pgcrypto            pg_stat_statements  plperl              plpythonu           refint              tcn                 "uuid-ossp"
autoinc             citext              dict_xsyn           hstore              hstore_plpython3u   intarray            jsonb_plpython2u    ltree               moddatetime         pg_freespacemap     pgstattuple         plperlu             pltcl               seg                 tsm_system_rows     xml2
bloom               cube                earthdistance       hstore_plperl       hstore_plpythonu    isn                 jsonb_plpython3u    ltree_plpython2u    pageinspect         pg_prewarm          pg_trgm             plpython2u          pltclu              sslinfo             tsm_system_time     
postgres=# create extension pg_trgm ;
CREATE EXTENSION

二,

repmgr的基本安装部署和配置

repmgr的介绍:

Repmgr是2ndQuadrant(第二象限公司)开发的一款复制的开源工具套件,用于管理PostgreSQL服务器集群中的复制和故障转移。

最初,它主要是为了简化流副本的管理,后来发展成为一个完整的故障转移管理套件。它通过设置备用服务器,监视复制以及执行管理任务(如故障转移或手动切换操作)的工具,增强了PostgreSQL内置的热备份功能。

其中各组件功能如下:

No.1 Repmgrd 守护进程

它主动监视复制集群中的服务器并执行以下任务:

1)监控和记录集群复制性能;

2)通过检测主服务器故障并提升最合适的备用服务器来执行故障转移;

3)将有关群集中事件的通知提供给用户定义的脚本,该脚本可以执行诸如通过电子邮件发送警报等任务;

4)repmgrd 根据本地数据库角色不同,其功能也不同:

  • 主库:repmgrd仅监控本地数据库,负责自动恢复、同异步切换;

  • 备库:repmgrd监控本地数据库和主数据库,负责自动切换、复制槽删除。

No.2 Repmgr命令管理

用于执行管理任务的命令行工具,主要有以下方面作用:

1)设置备用服务器;

2)将备用服务器升级为主服务器;

3)切换主服务器和备用服务器;

4)显示复制群集中的服务器状态。

No.3 用户和元数据 为了有效地管理复制集群,repmgr提供专用数据库存储和管理有关repmgr集群服务的相关信息。

此模式在 部署repmgr服务时,由repmgr扩展自动创建,该扩展在初始化repmgr -administered集群(repmgr主寄存器) 的第一步中安装,包含以下对象:

  • 表:

    repmgr.events:记录感兴趣的事件;

    repmgr.nodes:复制群集中每个服务器的连接和状态信息 ;

    repmgr.monitoring_history:repmgrd写入的历史备用监视信息。

  • 视图

    repmgr.show_nodes:基于表repmgr.nodes,另外显示服务器上游节点的名称。

    repmgr.replication_status:启用repmgrd的监视时,显示每个备用数据库的当前监视状态。

    repmgr元数据模式可以存储在现有的数据库或在自己的专用数据库。

请注意,repmgr元数据模式不能驻留在不属于repmgr管理的复制集群的数据库服务器上。数据库用户必须可供repmgr访问此数据库并执行必要的更改。此用户不需要是超级用户,但是某些操作(如初始安装repmgr扩展)将需要超级用户连接(可以使用命令 行选项--superuser在需要时指定 )。

1、

计划repmgr安装方式为yum,安装命令如下:

yum install repmgr_12-5.4.1       repmgr_12-devel-5.4.1  repmgr_12-llvmjit-5.4.1  -y

2、

主数据库装载repmgr插件

修改主数据库17的主配置文件,data/pgsql/data/postgresql.conf,修改完毕后重启服务器

主要是定义数据库的端口,以及归档命令

listen_addresses = '*'
port = 15433
max_connections = 1000	
shared_buffers = 128MB	
dynamic_shared_memory_type = posix
wal_level = logical	
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on	
archive_command = 'test ! -f /data/pgsql/arclog/%f && cp %p /data/pgsql/arclog/%f'		# 
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%a.log'	
log_truncate_on_rotation = on		
log_rotation_age = 1d			
log_rotation_size = 0			
log_line_prefix = '%m [%p] '		
log_timezone = 'Asia/Shanghai'
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.UTF-8'							
lc_monetary = 'en_US.UTF-8'			
lc_numeric = 'en_US.UTF-8'			
lc_time = 'en_US.UTF-8'	
default_text_search_config = 'pg_catalog.english'
shared_preload_libraries = 'repmgr'
wal_log_hints = on					

DETAIL: pg_rewind requires "wal_log_hints" to be enabled 

根据以上配置,创建目录/data/pgsql/arclog并赋予postgres这个用户权限,最后重启服务器

mkdir -p /data/pgsql/arclog
chown -Rf postgres. /data/pgsql/arclog/
systemctl restart pg12
ntpdate ntp.aliyun.com

3、

主节点17repmgr的配置文件 /etc/repmgr/12/repmgr.conf 增加如下内容:

node_id=1
node_name='pg1'
conninfo='host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=123456'
data_directory='/data/pgsql/data'
replication_user='repmgr'
replication_type='physical'
#location='pg1'                                  
use_replication_slots=true
witness_sync_interval=15
log_level='DEBUG'
log_facility='STDERR'
log_file='/var/log/repmgr/pg2.log'
pg_bindir='/usr/pgsql-12/bin/'
ssh_options='-q -o ConnectTimeout=10'
promote_check_timeout=60
promote_check_interval=1
node_rejoin_timeout=60
failover='automatic'
#priority=100                                                   
connection_check_type=ping
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
service_start_command = '/usr/bin/systemctl start pg12'
service_stop_command = '/usr/bin/systemctl stop pg12'
service_restart_command = '/usr/bin/systemctl restart pg12'
service_reload_command = '/usr/bin/systemctl reload pg12'
service_promote_command = '/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data'
repmgrd_service_start_command = '/usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose'
repmgrd_service_stop_command = '/usr/bin/kill `cat /run/repmgr/repmgrd-12.pid`'

 

4、

根据以上配置文件,数据库添加repmgr用户

[root@centos7 ~]# su - postgres -c 'psql -p15433'
psql (12.4)
Type "help" for help.postgres=# create user repmgr with password '123456';
CREATE ROLE
postgres=# create database repmgr;
CREATE DATABASEALTER USER repmgr WITH REPLICATION;

5、

切换普通用户,利用repmgr注册主节点

[root@centos7 ~]# su - postgres
Last login: Sat Mar 30 23:08:48 CST 2024 on pts/0
-bash-4.2$ repmgr primary register
INFO: connecting to primary database...
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL: 
fe_sendauth: no password suppliedDETAIL: attempted to connect using:user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=

失败了,原因是没有传递密码变量,执行变量命令,并重新注册,注册的时候临时给repmgr用户超级权限,注册成功

export PGPASSWORD="123456"-bash-4.2$ repmgr primary register -S postgres
WARNING: following problems with command line parameters detected:--superuser ignored when executing PRIMARY REGISTER
INFO: connecting to primary database...
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
NOTICE: attempting to install extension "repmgr"
DEBUG: established superuser connection as "postgres"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

可以看到,扩展插件repmgr启用了,具体路径为repmgr数据库下的名为repmgr的schema下

 命令行也可以佐证确实是安装了插件repmgr:

repmgr=# set search_path to repmgr ;
SET
repmgr=# \dxList of installed extensionsName   | Version |   Schema   |            Description             
---------+---------+------------+------------------------------------plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural languagerepmgr  | 5.4     | repmgr     | Replication manager for PostgreSQL
(2 rows)

如果觉得注册的不对,可以重新注册,重新注册命令如下:

-bash-4.2$ repmgr primary unregister -S postgres --force
WARNING: following problems with command line parameters detected:--superuser ignored when executing PRIMARY UNREGISTER
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: node "pg1" (ID: 1) was successfully unregistered

看到 successfully后,再次执行注册就可以,查看repmgr集群的状态:

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                       
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------1  | pg1  | primary | * running |          | pg1      | 100      | 1        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr

6、

从节点20的repmgr的配置文件

node_id=2
node_name='pg2'
conninfo='host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456'
data_directory='/data/pgsql/data'
replication_user='repmgr'
replication_type='physical'
#location='pg2'
use_replication_slots=true
witness_sync_interval=15
log_level='DEBUG'
log_facility='STDERR'
log_file='/var/log/repmgr/pg2.log'
pg_bindir='/usr/pgsql-12/bin/'
ssh_options='-q -o ConnectTimeout=10'
promote_check_timeout=60
promote_check_interval=1
node_rejoin_timeout=60
failover='automatic'
#priority=120
connection_check_type=ping
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
service_start_command = '/usr/bin/systemctl start pg12'
service_stop_command = '/usr/bin/systemctl stop pg12'
service_restart_command = '/usr/bin/systemctl restart pg12'
service_reload_command = '/usr/bin/systemctl reload pg12'
service_promote_command = '/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data'
repmgrd_service_start_command = '/usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose'
repmgrd_service_stop_command = '/usr/bin/kill `cat /run/repmgr/repmgrd-12.pid`'

7、

从节点利用流复制搭建主从集群,通过repmgr命令来执行(postgres这个普通用户执行)

repmgr standby clone -h 192.168.123.17 -p15433 -d repmgr --force  -U repmgr -c

HINT: this may take some time; consider using the -c/--fast-checkpoint option 

日志如下:

NOTICE: destination directory "/data/pgsql/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.123.17 port=15433 user=repmgr dbname=repmgr
DETAIL: current installation size is 32 MB
DEBUG: 1 node records returned by source node
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: upstream_node_id determined as 1
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/data/pgsql/data" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/data/pgsql/data"
DEBUG: replication slot "repmgr_slot_2" exists but is inactive; reusing
NOTICE: starting backup (using pg_basebackup)...
INFO: executing:/usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /data/pgsql/data -h 192.168.123.17 -p 15433 -U repmgr -c fast -X stream -S repmgr_slot_2 
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
could not identify current directory: No such file or directory
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: /usr/bin/systemctl start pg12
HINT: after starting the server, you need to register this standby with "repmgr standby register"

命令执行完成后,就可以启动从节点了:

systemctl start pg12

查看进程可以看到从节点正常复制了:

[root@centos10 ~]# ps aux |grep postgre
root      1637  0.0  0.0 191784  2336 pts/0    S    Mar30   0:00 su - postgres
postgres  1638  0.0  0.0 115444  2064 pts/0    S+   Mar30   0:00 -bash
postgres  1744  0.3  1.1 450504 46412 ?        Ss   00:05   0:00 /usr/pgsql-12/bin/postmaster -D /data/pgsql/data/
postgres  1745  0.0  0.0 260664  2084 ?        Ss   00:05   0:00 postgres: logger   
postgres  1746  0.0  0.0 450708  3100 ?        Ss   00:05   0:00 postgres: startup   recovering 000000010000000000000013
postgres  1747  0.0  0.0 450504  2112 ?        Ss   00:05   0:00 postgres: checkpointer   
postgres  1748  0.0  0.0 450504  2120 ?        Ss   00:05   0:00 postgres: background writer   
postgres  1749  0.0  0.0 262784  2008 ?        Ss   00:05   0:00 postgres: stats collector   
postgres  1750  0.4  0.1 457520  4700 ?        Ss   00:05   0:00 postgres: walreceiver   streaming 0/130007C8
root      1752  0.0  0.0 112712   964 pts/1    S+   00:05   0:00 grep --color=auto postgre

主节点正常发送wal日志了:

[root@centos7 ~]# ps aux |grep postgres
postgres  2205  0.0  1.1 450504 46464 ?        Ss   00:01   0:00 /usr/pgsql-12/bin/postmaster -D /data/pgsql/data/
postgres  2207  0.0  0.0 260664  2084 ?        Ss   00:01   0:00 postgres: logger   
postgres  2209  0.0  0.1 450908  4180 ?        Ss   00:01   0:00 postgres: checkpointer   
postgres  2210  0.0  0.0 450764  3916 ?        Ss   00:01   0:00 postgres: background writer   
postgres  2211  0.0  0.1 450504  6296 ?        Ss   00:01   0:00 postgres: walwriter   
postgres  2212  0.0  0.0 451784  3352 ?        Ss   00:01   0:00 postgres: autovacuum launcher   
postgres  2213  0.0  0.0 262784  2196 ?        Ss   00:01   0:00 postgres: archiver   last was 000000010000000000000012.00000028.backup
postgres  2214  0.0  0.0 262940  2352 ?        Ss   00:01   0:00 postgres: stats collector   
postgres  2215  0.0  0.0 451624  2856 ?        Ss   00:01   0:00 postgres: logical replication launcher   
root      2216  0.0  0.0 191784  2336 pts/0    S    00:01   0:00 su - postgres
postgres  2217  0.0  0.0 115444  2004 pts/0    S+   00:01   0:00 -bash
postgres  2260  0.0  0.0 451812  3848 ?        Ss   00:05   0:00 postgres: walsender repmgr 192.168.123.20(38408) streaming 0/130008B0
postgres  2265  0.0  0.1 452464  7044 ?        Ss   00:07   0:00 postgres: postgres postgres 192.168.123.1(56048) idle
root      2294  0.0  0.0 112712   960 pts/1    S+   00:10   0:00 grep --color=auto postgres

8、

注册从节点

-bash-4.2$ repmgr standby register
INFO: connecting to local node "pg2" (ID: 2)
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered

9、

查看集群状态

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------1  | pg1  | primary | * running |          | default  | 100      | 1        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=1234562  | pg2  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456



三、

自动故障转移和测试

如果需要启用自动故障转移,那么,我们需要开启repmgrd服务,每个节点都开启

开启比较简单,一条命令就可以了

systemctl start repmgr-12

服务启动完毕后,可以查看进程是否正确:

[root@centos7 ~]# ps aux |grep repm
postgres  5915  0.2  0.2 453488  9716 ?        Ss   09:28   0:00 postgres: repmgr repmgr 192.168.123.17(35544) idle
postgres  5917  0.0  0.0  84276  2160 ?        S    09:28   0:00 /usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose
root      5925  0.0  0.0 112712   960 pts/3    S+   09:29   0:00 grep --color=auto repm

注意,这里问题来了,该服务是依赖于postgresql数据库的进程的,数据库必须是在线的才可以启停repmgrd服务

那么,很多人会疑惑,这个服务有什么用?其实很简单,就是故障转移用的,这些是关键配置,在发生宕机的情况下,这些命令会自动的启停数据库,例如,主节点宕机,那么,从节点的该服务将会在规定时间内检查主节点,如果一直找不到主,将会把自己提升为主,先执行这个命令:

promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'

然后执行这个命令:

service_promote_command = '/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data'

特别注意:这些命令全部需要绝对路径,否则会报找不到命令,配置文件修改后,重启repmgrd服务就可以生效

promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
service_start_command = '/usr/bin/systemctl start pg12'
service_stop_command = '/usr/bin/systemctl stop pg12'
service_restart_command = '/usr/bin/systemctl restart pg12'
service_reload_command = '/usr/bin/systemctl reload pg12'
service_promote_command = '/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data'
repmgrd_service_start_command = '/usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose'
repmgrd_service_stop_command = '/usr/bin/kill `cat /run/repmgr/repmgrd-12.pid`'

下面是将主节点数据库服务关闭后,从节点的repmgrd服务打印的日志/var/log/repmgr/pg2.log

[2024-03-31 09:13:29] [INFO] 2 total nodes registered
[2024-03-31 09:13:29] [INFO] primary node  "pg1" (ID: 1) and this node have the same location ("default")
[2024-03-31 09:13:29] [INFO] no other sibling nodes - we win by default
[2024-03-31 09:13:29] [DEBUG] election result: WON
[2024-03-31 09:13:29] [NOTICE] this node is the only available candidate and will now promote itself
[2024-03-31 09:13:29] [DEBUG] get_node_record():SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1
[2024-03-31 09:13:29] [INFO] promote_command is:"/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file"
[2024-03-31 09:13:29] [NOTICE] redirecting logging output to "/var/log/repmgr/pg2.log"[2024-03-31 09:13:29] [DEBUG] connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
[2024-03-31 09:13:29] [DEBUG] connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
[2024-03-31 09:13:29] [DEBUG] connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
[2024-03-31 09:13:29] [NOTICE] promoting standby to primary
[2024-03-31 09:13:29] [DETAIL] promoting server "pg2" (ID: 2) using "/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data"
[2024-03-31 09:13:29] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2024-03-31 09:13:29] [DEBUG] setting node 2 as primary and marking existing primary as failed
[2024-03-31 09:13:29] [NOTICE] STANDBY PROMOTE successful
[2024-03-31 09:13:29] [DETAIL] server "pg2" (ID: 2) was successfully promoted to primary
[2024-03-31 09:13:29] [DEBUG] result of promote_command: 0
[2024-03-31 09:13:29] [INFO] checking state of node 2, 1 of 6 attempts

再次启动原主节点的postgresql数据库,在主节点查看集群状态,可以发现脑裂了:

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+----------------------+----------+----------+----------+----------+------------------------------------------------------------------------------1  | pg1  | primary | * running            |          | default  | 100      | 1        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=1234562  | pg2  | standby | ! running as primary |          | default  | 100      | 2        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456WARNING: following issues were detected- node "pg2" (ID: 2) is registered as standby but running as primary

在从节点,可以看到显示的不一样,不过仍然是脑裂:

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------1  | pg1  | primary | - failed  | ?        | default  | 100      |          | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=1234562  | pg2  | primary | * running |          | default  | 100      | 2        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456WARNING: following issues were detected- unable to connect to node "pg1" (ID: 1)

解决方案:

首先,在从节点强制踢掉自己,数据库服务不能关闭,命令如下:

repmgr primary unregister --force

然后关闭从节点的数据库服务,重新从现主17服务器拉取:

repmgr standby clone -h 192.168.123.17 -p15433 -d repmgr --force  -U repmgr -c

同步拉取完毕后,在重新强制注册为standby节点,就恢复正常了:

-bash-4.2$ repmgr standby register --force
INFO: connecting to local node "pg2" (ID: 2)
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr password=1234562 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered

再次查看集群状态,发现恢复正常了:

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------1  | pg1  | primary | * running |          | default  | 100      | 1        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=1234562  | pg2  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456

总结:

脑裂的处理步骤是,首先,将异常的节点也就是 running as primary的节点踢掉,这个节点在哪,就在哪个节点踢掉,踢节点也就是unregister,踢完后standby clone,重新克隆的时候需要停止异常节点数据库服务,也就是原从节点的数据库服务,完事后在重新注册并再次启动原从节点的数据库就可以了;而脑裂产生的原因是原主节点数据库挂掉后,又重新启动数据库服务了,重新启动数据库后,原主认为自己是主节点,而原从节点已经提升为主节点,它也认为自己是主节点,造成集群内有两个主节点的信息了,那么,此时集群的主从复制就是混乱状态

如果脑裂很快发现按上述步骤没什么问题,但如果时间很长才发现,那么,需要根据pg_controlldata命令确定哪个是主,哪个是从,Latest checkpoint's oldestXID高的为主,反之为从,在从节点按上述操作即可

-bash-4.2$ pg_controldata 
pg_control version number:            1201
Catalog version number:               201909212
Database system identifier:           7351900556498275811
Database cluster state:               in production
pg_control last modified:             Sun 31 Mar 2024 11:41:05 AM CST
Latest checkpoint location:           0/2F001528
Latest checkpoint's REDO location:    0/2F0014F0
Latest checkpoint's REDO WAL file:    00000001000000000000002F
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:3382
Latest checkpoint's NextOID:          16454
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        479
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  3382
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Sun 31 Mar 2024 11:41:05 AM CST
Fake LSN counter for unlogged rels:   0/3E8
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    logical
wal_log_hints setting:                on
max_connections setting:              1000
max_worker_processes setting:         8
max_wal_senders setting:              10
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
Mock authentication nonce:            e76af00554364729517eece5fa0fcf6f7f0662da3a154cabb730c2ab24edd267

重新加入节点:

repmgr node rejoin -h 192.168.123.20 -p 15433 -Upostgres -d repmgr --force-rewind

四,

主备手动切换

repmgr standby switchover --force -S postgres

这个命令比较方便,在从节点执行,不过需要配置免密,并且需要standby从库有上级,比如下面的,Upstream这一栏里,表明pg1的上级是pg2,pg2是主节点,pg1是从节点,

-bash-4.2$ repmgr cluster show
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                           
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------1  | pg1  | standby |   running | pg2      | default  | 100      | 2        | host=192.168.123.17 port=15433 user=repmgr dbname=repmgr password=1234562  | pg2  | primary | * running |          | default  | 100      | 3        | host=192.168.123.20 port=15433 user=repmgr dbname=repmgr password=123456

那么,想快速的把这两个节点调换一个位置就是上面的switchover命令了,非常的方便,比如,现在我又想把pg2切换到从,pg1切换到主,那么,应该是在pg1上执行switchover命令

在切换期间,pg2的数据库需要重启一次,重启完毕后,看pg1的日志如下:

EBUG: minimum of 1 free slots (0 for siblings) required; 10 available
NOTICE: attempting to pause repmgrd on 2 nodes
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: pausing repmgrd on node "pg1" (ID: 1)
DEBUG: pausing repmgrd on node "pg2" (ID: 2)
NOTICE: local node "pg1" (ID: 1) will be promoted to primary; current primary "pg2" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "pg2" (ID: 2)
DEBUG: remote_command():ssh -o Batchmode=yes -q -o ConnectTimeout=10 192.168.123.20 /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf node service --action=stop --checkpoint --superuser=postgres
NOTICE: issuing CHECKPOINT on node "pg2" (ID: 2) 
DETAIL: executing server command "/usr/bin/systemctl stop pg12"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 4 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 5 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 6 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 7 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 8 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 9 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 10 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 11 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 12 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 13 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 14 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 15 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 16 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_OK
DEBUG: sleeping 1 second until next check
INFO: checking for primary shutdown; 17 of 60 attempts ("shutdown_check_timeout")
DEBUG: ping status is: PQPING_NO_RESPONSE
DEBUG: remote_command():ssh -o Batchmode=yes -q -o ConnectTimeout=10 192.168.123.20 /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf node status --is-shutdown-cleanly
NOTICE: current primary has been cleanly shut down at location 0/3C000028
DEBUG: local node last receive LSN is 0/3C0000A0, primary shutdown checkpoint LSN is 0/3C000028
NOTICE: promoting standby to primary
DETAIL: promoting server "pg1" (ID: 1) using "/usr/pgsql-12/bin/pg_ctl promote -D /data/pgsql/data"
waiting for server to promote.... done
server promoted
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
DEBUG: setting node 1 as primary and marking existing primary as failed
NOTICE: STANDBY PROMOTE successful
DETAIL: server "pg1" (ID: 1) was successfully promoted to primary
DEBUG: executing:/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf  --no-wait -d \'user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433\' node rejoin > /tmp/node-rejoin.1711860263.log 2>&1 && echo "1" || echo "0"
DEBUG: remote_command():ssh -o Batchmode=yes -q -o ConnectTimeout=10 192.168.123.20 /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf  --no-wait -d \'user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433\' node rejoin > /tmp/node-rejoin.1711860263.log 2>&1 && echo "1" || echo "0"
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL: 
could not connect to server: Connection refusedIs the server running on host "192.168.123.20" and acceptingTCP/IP connections on port 15433?DETAIL: attempted to connect using:user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=
ERROR: execution of "repmgr node rejoin" on demotion candidate "pg2" (ID: 2) failed
DETAIL: check log file "/tmp/node-rejoin.1711860263.log" on "pg2" for details
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
ERROR: connection to database failed
DETAIL: 
could not connect to server: Connection refusedIs the server running on host "192.168.123.20" and acceptingTCP/IP connections on port 15433?DETAIL: attempted to connect using:user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=
INFO: sleeping 1 second; 1 of 60 attempts ("standby_reconnect_timeout") to reconnect to demoted primary
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
NOTICE: replication slot "repmgr_slot_1" deleted on node 2
NOTICE: switchover was successful
DETAIL: node "pg1" is now primary and node "pg2" is attached as standby
DEBUG: unpausing repmgrd on node "pg1" (ID: 1)
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: unpausing repmgrd on node "pg2" (ID: 2)
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
NOTICE: STANDBY SWITCHOVER has completed successfully

在强调一次,需要配置免密才可以使用此功能,因为从切换到主的时候,需要ssh到主做一点事情,切换完毕后,原主也就是新从需要在注册一下,到pg2执行:

-bash-4.2$ repmgr standby register -F
INFO: connecting to local node "pg2" (ID: 2)
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered

再看集群状态就十分正常了:

-bash-4.2$ repmgr service status
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.20 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr password=123456 dbname=repmgr host=192.168.123.17 port=15433 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="ID | Name | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+------+---------+-----------+----------+---------+------+---------+--------------------1  | pg1  | primary | * running |          | running | 7631 | no      | n/a                2  | pg2  | standby |   running | pg1      | running | 5511 | no      | 0 second(s) ago    

这篇关于Linux|centos7-postgresql数据库|yum安装数据库和配置repmgr高可用集群以及repmgr的日常管理工作的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/865627

相关文章

Spring Security基于数据库验证流程详解

Spring Security 校验流程图 相关解释说明(认真看哦) AbstractAuthenticationProcessingFilter 抽象类 /*** 调用 #requiresAuthentication(HttpServletRequest, HttpServletResponse) 决定是否需要进行验证操作。* 如果需要验证,则会调用 #attemptAuthentica

服务器集群同步时间手记

1.时间服务器配置(必须root用户) (1)检查ntp是否安装 [root@node1 桌面]# rpm -qa|grep ntpntp-4.2.6p5-10.el6.centos.x86_64fontpackages-filesystem-1.41-1.1.el6.noarchntpdate-4.2.6p5-10.el6.centos.x86_64 (2)修改ntp配置文件 [r

Zookeeper安装和配置说明

一、Zookeeper的搭建方式 Zookeeper安装方式有三种,单机模式和集群模式以及伪集群模式。 ■ 单机模式:Zookeeper只运行在一台服务器上,适合测试环境; ■ 伪集群模式:就是在一台物理机上运行多个Zookeeper 实例; ■ 集群模式:Zookeeper运行于一个集群上,适合生产环境,这个计算机集群被称为一个“集合体”(ensemble) Zookeeper通过复制来实现

CentOS7安装配置mysql5.7 tar免安装版

一、CentOS7.4系统自带mariadb # 查看系统自带的Mariadb[root@localhost~]# rpm -qa|grep mariadbmariadb-libs-5.5.44-2.el7.centos.x86_64# 卸载系统自带的Mariadb[root@localhost ~]# rpm -e --nodeps mariadb-libs-5.5.44-2.el7

Centos7安装Mongodb4

1、下载源码包 curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-rhel70-4.2.1.tgz 2、解压 放到 /usr/local/ 目录下 tar -zxvf mongodb-linux-x86_64-rhel70-4.2.1.tgzmv mongodb-linux-x86_64-rhel70-4.2.1/

HDFS—集群扩容及缩容

白名单:表示在白名单的主机IP地址可以,用来存储数据。 配置白名单步骤如下: 1)在NameNode节点的/opt/module/hadoop-3.1.4/etc/hadoop目录下分别创建whitelist 和blacklist文件 (1)创建白名单 [lytfly@hadoop102 hadoop]$ vim whitelist 在whitelist中添加如下主机名称,假如集群正常工作的节

Hadoop集群数据均衡之磁盘间数据均衡

生产环境,由于硬盘空间不足,往往需要增加一块硬盘。刚加载的硬盘没有数据时,可以执行磁盘数据均衡命令。(Hadoop3.x新特性) plan后面带的节点的名字必须是已经存在的,并且是需要均衡的节点。 如果节点不存在,会报如下错误: 如果节点只有一个硬盘的话,不会创建均衡计划: (1)生成均衡计划 hdfs diskbalancer -plan hadoop102 (2)执行均衡计划 hd

hadoop开启回收站配置

开启回收站功能,可以将删除的文件在不超时的情况下,恢复原数据,起到防止误删除、备份等作用。 开启回收站功能参数说明 (1)默认值fs.trash.interval = 0,0表示禁用回收站;其他值表示设置文件的存活时间。 (2)默认值fs.trash.checkpoint.interval = 0,检查回收站的间隔时间。如果该值为0,则该值设置和fs.trash.interval的参数值相等。

NameNode内存生产配置

Hadoop2.x 系列,配置 NameNode 内存 NameNode 内存默认 2000m ,如果服务器内存 4G , NameNode 内存可以配置 3g 。在 hadoop-env.sh 文件中配置如下。 HADOOP_NAMENODE_OPTS=-Xmx3072m Hadoop3.x 系列,配置 Nam