MogDB 5.0.6 New Feature: A Production Deployment Plan for CM with Dual NICs
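I wrote this article right after finishing a stretch of overtime.
Preface
The core-system database of a large nationwide industry must run on a dual-NIC architecture with two physically isolated networks, A and B; this layout has become the industry standard. The newly released MogDB 5.0.6 adds CM support for deploying streaming replication across two network segments, providing NIC-level high availability and disaster recovery (requires PTK 1.4 or later). With dual-segment support, a node keeps working normally after a single-segment failure, which gives the cluster high-availability management for segment-level faults and keeps the database foundation running stably.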
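CM overview
CM (Cluster Manager) is cluster resource management software. It supports monitoring of custom resources and provides primary/standby status monitoring, network communication fault monitoring, file system fault monitoring, and automatic primary/standby switchover on failure. MogDB and CM are decoupled and can be installed separately. CM provides management services for the database cluster.
CM features
- Primary/standby role arbitration for database instances
- High-availability arbitration for CM itself
- Monitoring of database instance running status
- Resource checks on the nodes hosting database instances
- Start and stop of the database cluster, nodes, and instances
- Cluster status query and status update
- Switchover and failover of database instances
- Dual-network virtual IP (VIP) support
- Two-node CM deployment
- CM event triggers
- Dual network segment support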
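Production deployment plan
The figure shows the complete architecture. MogDB supports a primary/standby layout across two machine rooms. Networks A and B are physically isolated, each connected to its own switch (A and B) with no connectivity between them, and one VIP is attached on each of A and B to serve the application servers.
Streaming replication network plan
NICs C and D are bonded as Bond0. A dedicated VLAN is carved out across switches A and B and stacked with VLAN trunking, which gives the streaming replication network switch-level high availability and carries its traffic.
The plan deployed this time
This deployment uses a one-primary, one-standby architecture. As above, networks A and B carry business traffic, while NICs C and D are bonded as Bond0 and combined with VLAN trunking to give streaming replication a redundant path.
Installation and deployment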
Host | Business network A | Business network B | Streaming replication network |
---|---|---|---|
mogdb1 | 10.130.0.5 | 10.130.4.5 | 192.168.1.100 |
mogdb2 | 10.130.0.7 | 10.130.4.7 | 192.168.1.101 |
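Configure config.yaml
PTK has been adapted for CM: setting the ha_ips parameter in config.yaml at install time is enough to enable dual-segment deployment. PTK 1.4 or later is required.
- PTK download: https://www.mogdb.io/downloads/ptk/all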
vi config.2024-04-01T17_37_01.yaml
global:
  cluster_name: ats_cs
  user: omm506
  group: omm506
  db_password: pTk6YmQwZmY1ODU8QDw9PUM/Q085ZDFjV2g5dGZFQVRpSlJfQ2tGeG1GTzB3WjRfa3lDSmpsdjVTcVdVLUE=
  db_port: 28000
  base_dir: /opt/mogdb506
  db_conf:
    log_min_messages: 'DEBUG5'
  ssh_option:
    port: 22
    user: root
    key_file: "/root/.ssh/id_rsa"
  cm_option:
    cm_server_port: 15300
    cm_server_conf:
      ddb_type: 1
      enable_ssl: on
      third_party_gateway_ip: 10.130.7.254,10.130.3.254
      cms_enable_failover_on2nodes: 'true'
      cms_enable_db_crash_recovery: 'true'
      log_min_messages: 'DEBUG5'
    cm_agent_conf:
      enable_ssl: on
      log_min_messages: 'DEBUG5'
db_servers:
  - host: 10.130.4.5
    role: primary
    az_name: AZ1
    az_priority: 1
    ha_ips: [10.130.0.5]
    ssh_option:
      password: pTk6MzkzYTk5ZTg8QDw9PUM/REpvRHhwYk1LWGRjS3dER0I5RC1NNFVnMFNjalU1NUpQSGVCWXFHTU5LLVU=
  - host: 10.130.4.7
    role: standby
    az_name: AZ1
    az_priority: 1
    ha_ips: [10.130.0.7]
    ssh_option:
      password: pTk6MzkzYTk5ZTg8QDw9PUM/REpvRHhwYk1LWGRjS3dER0I5RC1NNFVnMFNjalU1NUpQSGVCWXFHTU5LLVU=
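Before running the installer it can help to sanity-check the addresses in this file. The sketch below is not part of PTK; it is a hypothetical pre-flight check that assumes a /22 prefix (the 255.255.252.0 netmask these hosts report in `route -n`) and verifies that each server's host and ha_ips sit in different segments and that every segment has an arbitration gateway.

```python
import ipaddress

# Values copied from the config above. PREFIX is an assumption taken from
# the 255.255.252.0 netmask of the business networks on these hosts.
PREFIX = 22
servers = [
    {"host": "10.130.4.5", "ha_ips": ["10.130.0.5"]},
    {"host": "10.130.4.7", "ha_ips": ["10.130.0.7"]},
]
gateways = "10.130.7.254,10.130.3.254".split(",")


def segment(ip):
    """Return the network segment that an address belongs to."""
    return ipaddress.ip_network(f"{ip}/{PREFIX}", strict=False)


def check_dual_segment(servers, gateways):
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    gateway_segments = {segment(g) for g in gateways}
    for server in servers:
        addrs = [server["host"], *server["ha_ips"]]
        segs = [segment(a) for a in addrs]
        if len(set(segs)) != len(segs):
            problems.append(f"{server['host']}: host and ha_ips share a segment")
        for addr, seg in zip(addrs, segs):
            if seg not in gateway_segments:
                problems.append(f"{addr}: no arbitration gateway in {seg}")
    return problems


print(check_dual_segment(servers, gateways) or "layout looks consistent")
```

The check is deliberately strict about gateways because a two-node cluster relies on third_party_gateway_ip for arbitration on each segment.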
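Check the NIC gateways
A two-node deployment (one primary, one standby) requires the third_party_gateway_ip parameter in the yaml configuration file as the arbitration IP. With a single segment this is one third-party gateway; with two segments it takes two gateways separated by a comma. The setting used here is: third_party_gateway_ip: 10.130.7.254,10.130.3.254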
[root@mogdb1 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.130.7.254 0.0.0.0 UG 102 0 0 em1
0.0.0.0 10.130.3.254 0.0.0.0 UG 103 0 0 em2
10.130.0.0 0.0.0.0 255.255.252.0 U 103 0 0 em2
10.130.4.0 0.0.0.0 255.255.252.0 U 102 0 0 em1
192.168.1.0 0.0.0.0 255.255.255.0 U 300 0 0 bond0
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
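The gateway list can also be derived from the routing table rather than typed by hand. The following is a small illustrative sketch, not a PTK feature: the helper name `default_gateways` is made up here, and it simply picks the default routes (destination 0.0.0.0 with the G flag) out of `route -n` text.

```python
# Sample text copied from the `route -n` output above.
ROUTE_OUTPUT = """\
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.130.7.254 0.0.0.0 UG 102 0 0 em1
0.0.0.0 10.130.3.254 0.0.0.0 UG 103 0 0 em2
10.130.0.0 0.0.0.0 255.255.252.0 U 103 0 0 em2
10.130.4.0 0.0.0.0 255.255.252.0 U 102 0 0 em1
192.168.1.0 0.0.0.0 255.255.255.0 U 300 0 0 bond0
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
"""


def default_gateways(route_output):
    """Return (interface, gateway) pairs for every default route."""
    pairs = []
    for line in route_output.splitlines():
        fields = line.split()
        # A route row has 8 columns; header lines are skipped by the checks below.
        if len(fields) == 8 and fields[0] == "0.0.0.0" and "G" in fields[3]:
            pairs.append((fields[7], fields[1]))
    return pairs


gateways = default_gateways(ROUTE_OUTPUT)
print(",".join(gw for _, gw in gateways))  # 10.130.7.254,10.130.3.254
```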
Install the database
Adding the --install-cm flag installs CM together with the database; CM can also be deployed separately.
ptk install -f config.2024-04-01T17_37_01.yaml --install-cm -p MogDB-5.0.6-Kylin-x86_64-all.tar.gz
INFO[2024-04-22T16:21:59.149] PTK version: 1.4.1 release
INFO[2024-04-22T16:21:59.149] Loading configuration from config.2024-04-01T17_37_01.yaml
If you choose to continue installing the software,
you accept the license agreement of the software.
[Y]: Accept and continue
[C]: Show the license agreement
[N]: Abort the installation and exit
✔ Please enter (default: Y): y
INFO[2024-04-22T16:22:00.884] CM is enabled but cm_servers is empty; using the database servers as CM servers by default
The cluster will contain 2 CM nodes, so please confirm the following cms settings:
- (Optional) db_service_vip=""
- (Required) third_party_gateway_ip="10.130.7.254,10.130.3.254"
- (Optional) cms_enable_failover_on2nodes="true"
- (Optional) cms_enable_db_crash_recovery="true"
✔ Please enter the virtual IP (IPv4 only): 10.130.4.6,10.130.0.6
The current values of these settings are:
- db_service_vip="10.130.4.6,10.130.0.6"
- third_party_gateway_ip="10.130.7.254,10.130.3.254"
- cms_enable_failover_on2nodes="true"
- cms_enable_db_crash_recovery="true"
✔ Do you want to modify them (default n) [y/n]: n
Cluster name: "ats_cs"
+--------------+------------+----------------+----------+------------+-----------------------+----------+
| az(priority) | ip | user(group) | port | role | data dir | upstream |
+--------------+------------+----------------+----------+------------+-----------------------+----------+
| AZ1(1) | 10.130.4.7 | omm506(omm506) | db:28000 | db:standby | db:/opt/mogdb506/data | - |
| | | | cm:15300 | | cm:/opt/mogdb506/cm | |
| | 10.130.4.5 | omm506(omm506) | db:28000 | db:primary | db:/opt/mogdb506/data | - |
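MogDB and CM are now installed.
Add the VIPs
Adding --log-level debug is recommended so that errors are easier to inspect and analyze. PTK automatically attaches each VIP to the address in the same network segment, so no manual mapping is needed.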
ptk cluster -n ats_cs load-cm-vip --vip 10.130.4.6 --action install --log-level debug
ptk cluster -n ats_cs load-cm-vip --vip 10.130.0.6 --action install --log-level debug
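The "same segment" matching can be pictured with a few lines of Python. This is only an illustration of the idea, not PTK's actual implementation; the function name `pick_base_ip` and the /22 prefix (from the 255.255.252.0 netmask on these hosts) are assumptions.

```python
import ipaddress


def pick_base_ip(vip, node_ips, prefix=22):
    """Return the node address that shares a segment with the VIP, or None."""
    vip_net = ipaddress.ip_network(f"{vip}/{prefix}", strict=False)
    for ip in node_ips:
        if ipaddress.ip_address(ip) in vip_net:
            return ip
    return None


mogdb1_ips = ["10.130.4.5", "10.130.0.5"]
for vip in ("10.130.4.6", "10.130.0.6"):
    print(vip, "->", pick_base_ip(vip, mogdb1_ips))
```

With the addresses in this article, 10.130.4.6 pairs with 10.130.4.5 and 10.130.0.6 pairs with 10.130.0.5.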
After the VIPs are added, a new cm_resource.json configuration file appears under /opt/mogdb506/cm/cm_agent, containing "float_ip": "10.130.4.6" and "float_ip": "10.130.0.6".
{
"resources": [{
"name": "VIP_az996037",
"resources_type": "VIP",
"instances": [{
"node_id": 1,
"res_instance_id": 6001,
"inst_attr": "base_ip=10.130.4.5"
}, {
"node_id": 2,
"res_instance_id": 6002,
"inst_attr": "base_ip=10.130.4.7"
}],
"float_ip": "10.130.4.6"
}, {
"name": "VIP_az40655",
"resources_type": "VIP",
"instances": [{
"node_id": 1,
"res_instance_id": 6001,
"inst_attr": "base_ip=10.130.0.5"
}, {
"node_id": 2,
"res_instance_id": 6002,
"inst_attr": "base_ip=10.130.0.7"
}],
"float_ip": "10.130.0.6"
Use cm_ctl show to view the VIP information: VIP 10.130.4.6 is attached to 10.130.4.5, and VIP 10.130.0.6 is attached to 10.130.0.5.
[omm506@mogdb1 cm_agent]$ cm_ctl show
[ Network Connect State ]
Network timeout: 6s
Current CMServer time: 2024-04-22 18:29:06
Network stat('Y' means connected, otherwise 'N'):
| \ | Y |
| Y | \ |
[ Node Disk HB State ]
Node disk hb timeout: 200s
Current CMServer time: 2024-04-22 18:29:07
Node disk hb stat('Y' means connected, otherwise 'N'):
| N | N |
[ FloatIp Network State ]
node instance base_ip float_ip_name float_ip
------------------------------------------------------
1 mogdb1 6001 10.130.4.5 VIP_az996037 10.130.4.6
1 mogdb1 6001 10.130.0.5 VIP_az40655 10.130.0.6
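The same mapping can be read straight from cm_resource.json. The sketch below embeds a copy of the resource entries shown earlier; the closing brackets of the embedded JSON are filled in as an assumption so that it parses, and `vip_map` is a hypothetical helper name.

```python
import json

# Copy of the two VIP resources from cm_resource.json above.
# Assumption: the file closes with "}]}" after the last resource.
CM_RESOURCE = """
{"resources": [
  {"name": "VIP_az996037", "resources_type": "VIP",
   "instances": [
     {"node_id": 1, "res_instance_id": 6001, "inst_attr": "base_ip=10.130.4.5"},
     {"node_id": 2, "res_instance_id": 6002, "inst_attr": "base_ip=10.130.4.7"}],
   "float_ip": "10.130.4.6"},
  {"name": "VIP_az40655", "resources_type": "VIP",
   "instances": [
     {"node_id": 1, "res_instance_id": 6001, "inst_attr": "base_ip=10.130.0.5"},
     {"node_id": 2, "res_instance_id": 6002, "inst_attr": "base_ip=10.130.0.7"}],
   "float_ip": "10.130.0.6"}
]}
"""


def vip_map(text):
    """Map each float_ip to the base_ip of every node that can host it."""
    result = {}
    for res in json.loads(text)["resources"]:
        if res.get("resources_type") != "VIP":
            continue
        # inst_attr looks like "base_ip=10.130.4.5"; keep the part after "=".
        bases = [inst["inst_attr"].split("=", 1)[1] for inst in res["instances"]]
        result[res["float_ip"]] = bases
    return result


for vip, bases in vip_map(CM_RESOURCE).items():
    print(vip, "->", bases)
```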
Change the streaming replication IPs
On both machines, edit the replconninfo[1-2] parameters in postgresql.conf as follows.
replconninfo1 = 'localhost=192.168.1.100 localport=28001 localheartbeatport=28005 localservice=28004 remotehost=192.168.1.101 remoteport=28001 remoteheartbeatport=28005 remoteservice=28004'
replconninfo2 = 'localhost=192.168.1.101 localport=28001 localheartbeatport=28005 localservice=28004 remotehost=192.168.1.100 remoteport=28001 remoteheartbeatport=28005 remoteservice=28004'
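Because these strings are long and symmetric, they are easy to mistype. As a hedged sketch, a small helper can assemble them from the two replication IPs; `build_replconninfo` is a hypothetical name, and the default ports are simply the values used in the replconninfo2 line above, not a rule taken from the MogDB documentation.

```python
def build_replconninfo(local_ip, remote_ip,
                       ha_port=28001, heartbeat_port=28005, service_port=28004):
    """Assemble a replconninfo value; both sides use the same port numbers."""
    return (
        f"localhost={local_ip} localport={ha_port} "
        f"localheartbeatport={heartbeat_port} localservice={service_port} "
        f"remotehost={remote_ip} remoteport={ha_port} "
        f"remoteheartbeatport={heartbeat_port} remoteservice={service_port}"
    )


# The value as seen from 192.168.1.101, pointing at 192.168.1.100.
print(build_replconninfo("192.168.1.101", "192.168.1.100"))
```

Swapping the two arguments yields the string for the other direction.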
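Modify pg_hba.conf
Add trust entries for the replication addresses; otherwise the error "Forbid remote connection with initial user" is reported. The entries to add are the ones under the # add MogDB comments.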
# "local" is for Unix domain socket connections only
local all all trust
host all omm506 10.130.0.5/32 trust
host all omm506 10.130.4.5/32 trust
host all omm506 10.130.0.7/32 trust
host all omm506 10.130.4.7/32 trust
# add MogDB
host all omm506 192.168.1.101/32 trust
host all omm506 192.168.1.100/32 trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host all all 10.130.0.5/32 sha256
host all all 10.130.4.5/32 sha256
host all all 10.130.0.7/32 sha256
host all all 10.130.4.7/32 sha256
# add MogDB
host all all 192.168.1.101/32 sha256
host all all 192.168.1.100/32 sha256
host all all 10.130.4.6/32 sha256
host all all 10.130.0.6/32 sha256
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local replication omm506 trust
#host replication omm506 127.0.0.1/32 trust
#host replication omm506 ::1/128 trust
# add MogDB
host all all 0.0.0.0/0 md5
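The added entries follow one pattern per address, so they can be generated instead of typed. This is a minimal sketch with a hypothetical helper name, `hba_lines`; it only formats text and does not touch pg_hba.conf.

```python
def hba_lines(user, ips, method):
    """Format one pg_hba.conf host entry per address, using a /32 mask."""
    return [f"host all {user} {ip}/32 {method}" for ip in ips]


replication_ips = ["192.168.1.101", "192.168.1.100"]

# Trust entries for the database user, then sha256 entries for everyone else,
# mirroring the two "# add MogDB" groups above.
for line in hba_lines("omm506", replication_ips, "trust"):
    print(line)
for line in hba_lines("all", replication_ips, "sha256"):
    print(line)
```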
Set the local_bind_address parameter
Set it to *.
vi /opt/mogdb506/data/postgresql.conf
local_bind_address = '*'
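Restart to apply
The output shows that the streaming replication channel is 192.168.1.100:28001-->192.168.1.101:56732, while CM manages the business IPs 10.130.4.5, 10.130.0.5, 10.130.4.7, and 10.130.0.7. The setup is complete.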
[root@mogdb1 ~]# ptk cluster -n ats_cs status --detail
[ Cluster State ]
cluster_name : ats_cs
cluster_state : Normal
database_version : MogDB 5.0.6 (build 8b0a6ca8)
active_vip : 10.130.4.6,10.130.0.6
[ CMServer State ]
id | ip | port | hostname | role
-----+-----------------------+-------+----------+----------
1 | 10.130.4.5,10.130.0.5 | 15300 | mogdb1 | primary
2 | 10.130.4.7,10.130.0.7 | 15300 | mogdb2 | standby
[ Datanode State ]
cluster_name | id | ip | port | user | nodename | db_role | state | uptime | upstream
---------------+------+------------+-------+--------+----------+---------+--------+----------+-----------
ats_cs | 6001 | 10.130.4.5 | 28000 | omm506 | dn_6001 | primary | Normal | 00:02:16 | -
| | 10.130.0.5 | | | | | | |
| 6002 | 10.130.4.7 | 28000 | omm506 | dn_6002 | standby | Normal | 00:02:17 | -
| | 10.130.0.7 | | | | | | |
[ DataNode Detail ]
--------------- 10.130.4.5:28000(dn_6001) ---------------
role : primary
data_dir : /opt/mogdb506/data
az_name : AZ1
[Senders Info]:
sender_pid : 3158066
local_role : Primary
peer_role : Standby
peer_state : Normal
state : Streaming
sender_sent_location : 0/5E95998
sender_write_location : 0/5E95998
sender_flush_location : 0/5E95998
sender_replay_location : 0/5E95998
receiver_received_location : 0/5E95998
receiver_write_location : 0/5E95998
receiver_flush_location : 0/5E95998
receiver_replay_location : 0/5E95998
sync_percent : 100%
sync_state : Sync
sync_priority : 1
sync_most_available : On
channel : 192.168.1.100:28001-->192.168.1.101:56732
--------------- 10.130.4.7:28000(dn_6002) ---------------
role : standby
data_dir : /opt/mogdb506/data
az_name : AZ1
[Receiver Info]:
receiver_pid : 3121372
local_role : Standby
peer_role : Primary
peer_state : Normal
state : Normal
sender_sent_location : 0/5E95AB8
sender_write_location : 0/5E95AB8
sender_flush_location : 0/5E95AB8
sender_replay_location : 0/5E95AB8
receiver_received_location : 0/5E95AB8
receiver_write_location : 0/5E95AB8
receiver_flush_location : 0/5E95AB8
receiver_replay_location : 0/5E95AB8
sync_percent,channel : 100%
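One thing worth verifying in this output is that the channel really runs over the replication network and not a business network. A rough check, assuming the replication network is 192.168.1.0/24 (the bond0 route on these hosts) and using a hypothetical helper `channel_on_network`, could look like this:

```python
import ipaddress
import re


def channel_on_network(channel, cidr):
    """Return True if both endpoints of 'ip:port-->ip:port' lie inside cidr."""
    match = re.fullmatch(r"\s*([\d.]+):\d+-->([\d.]+):\d+\s*", channel)
    if match is None:
        raise ValueError(f"unrecognized channel format: {channel!r}")
    network = ipaddress.ip_network(cidr)
    return all(ipaddress.ip_address(ip) in network for ip in match.groups())


channel = "192.168.1.100:28001-->192.168.1.101:56732"
print(channel_on_network(channel, "192.168.1.0/24"))  # True
```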
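CM dual-network high-availability tests
Test scenarios:
- Single NIC failure on the standby
- Network A NIC failure on the primary
- Network B NIC failure on the primary
Expected results:
- Standby single NIC failure: no impact; replication and business traffic are unaffected
- Primary network A failure: no impact; replication and business traffic are unaffected
- Primary network B failure: a primary/standby switch takes place and the standby is promoted to primary (failover)
Standby: single business network failure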
[root@mogdb2 dn_6002]# ifdown em2
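WARN : [ifdown] You are using 'ifdown' script provided by 'network-scripts', which are now deprecated.
WARN : [ifdown] 'network-scripts' will be removed from distribution in near future.
WARN : [ifdown] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.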
[root@mogdb1 ~]# ptk cluster -n ats_cs status --detail
[ Cluster State ]
cluster_name : ats_cs
cluster_state : Normal
database_version : MogDB 5.0.6 (build 8b0a6ca8)
active_vip : 10.130.4.6,10.130.0.6
[ CMServer State ]
id | ip | port | hostname | role
-----+-----------------------+-------+----------+----------
1 | 10.130.4.5,10.130.0.5 | 15300 | mogdb1 | primary
2 | 10.130.4.7,10.130.0.7 | 15300 | mogdb2 | standby
[ Datanode State ]
cluster_name | id | ip | port | user | nodename | db_role | state | uptime | upstream
---------------+------+------------+-------+--------+----------+---------+--------+----------+-----------
ats_cs | 6001 | 10.130.4.5 | 28000 | omm506 | dn_6001 | primary | Normal | 00:03:16 | -
| | 10.130.0.5 | | | | | | |
| 6002 | 10.130.4.7 | 28000 | omm506 | dn_6002 | standby | Normal | 00:03:17 | -
| | 10.130.0.7 | | | | | | |
[ DataNode Detail ]
--------------- 10.130.4.5:28000(dn_6001) ---------------
role : primary
data_dir : /opt/mogdb506/data
az_name : AZ1
[Senders Info]:
sender_pid : 3158066
local_role : Primary
peer_role : Standby
peer_state : Normal
state : Streaming
sender_sent_location : 0/5E95998
sender_write_location : 0/5E95998
sender_flush_location : 0/5E95998
sender_replay_location : 0/5E95998
receiver_received_location : 0/5E95998
receiver_write_location : 0/5E95998
receiver_flush_location : 0/5E95998
receiver_replay_location : 0/5E95998
sync_percent : 100%
sync_state : Sync
sync_priority : 1
sync_most_available : On
channel : 192.168.1.100:28001-->192.168.1.101:56732
--------------- 10.130.4.7:28000(dn_6002) ---------------
role : standby
data_dir : /opt/mogdb506/data
az_name : AZ1
[Receiver Info]:
receiver_pid : 3121372
local_role : Standby
peer_role : Primary
peer_state : Normal
state : Normal
sender_sent_location : 0/5E95AB8
sender_write_location : 0/5E95AB8
sender_flush_location : 0/5E95AB8
sender_replay_location : 0/5E95AB8
receiver_received_location : 0/5E95AB8
receiver_write_location : 0/5E95AB8
receiver_flush_location : 0/5E95AB8
receiver_replay_location : 0/5E95AB8
sync_percent,channel : 100%
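No impact: replication and business traffic are unaffected, and the cluster is fully normal.
Primary: network A failure
Bring down network A.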
[root@mogdb1 ~]# ifdown em1
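WARN : [ifdown] You are using 'ifdown' script provided by 'network-scripts', which are now deprecated.
WARN : [ifdown] 'network-scripts' will be removed from distribution in near future.
WARN : [ifdown] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Database replication remains normal: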
[2024-04-23 19:44:12.469][944266][][gs_ctl]: gs_ctl query ,datadir is /opt/mogdb506/data
HA state:
local_role : Primary
static_connections : 2
db_state : Normal
detail_information : Normal
Senders info:
sender_pid : 226949
local_role : Primary
peer_role : Standby
peer_state : Normal
state : Streaming
sender_sent_location : 0/A1575E8
sender_write_location : 0/A1575E8
sender_flush_location : 0/A1575E8
sender_replay_location : 0/A1575E8
receiver_received_location : 0/A1575E8
receiver_write_location : 0/A1575E8
receiver_flush_location : 0/A1575E8
receiver_replay_location : 0/A1575E8
sync_percent : 100%
sync_state : Sync
sync_priority : 1
sync_most_available : On
channel : 192.168.1.100:28001-->192.168.1.101:41842
Receiver info:
No information
Primary: network B failure
Bring down network B.
[root@mogdb1 ~]# ifdown em2
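WARN : [ifdown] You are using 'ifdown' script provided by 'network-scripts', which are now deprecated.
WARN : [ifdown] 'network-scripts' will be removed from distribution in near future.
WARN : [ifdown] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Device 'em2' successfully disconnected.
CM shuts down the database on the primary. From this node, cm_ctl can no longer reach any cm_server: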
[omm506@mogdb1 ~]$ cm_ctl query -Cvi
[ CMServer State ]
node node_ip instance state
----------------------------------------------------------
cm_ctl: [DoConnCmserver] ip 10.130.4.5 is not reachable.
cm_ctl: [DoConnCmserver] ip 10.130.0.5 is not reachable.
cm_ctl: [DoConnCmserver] ip 10.130.4.7 is not reachable.
cm_ctl: [DoConnCmserver] ip 10.130.0.7 is not reachable.
1 mogdb1 10.130.4.5,10.130.0.5 1 Down
2 mogdb2 10.130.4.7,10.130.0.7 2 Down
cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.
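The process list confirms it: only om_monitor, cm_agent, and the fenced UDF process remain, and gsql cannot connect.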
[omm506@mogdb1 ~]$ ps -ef |grep mogdb
pcp 2823 1 0 19:27 ? 00:00:00 /usr/bin/pmie -b -h local: -l /var/log/pcp/pmie/mogdb1/pmie.log -c config.default
omm506 8941 1 0 19:28 ? 00:00:05 /opt/mogdb506/app/bin/om_monitor -L /opt/mogdb506/log/cm/om_monitor
omm506 8945 8941 5 19:28 ? 00:01:18 /opt/mogdb506/app/bin/cm_agent
omm506 8988 1 0 19:28 ? 00:00:00 mogdb fenced UDF master process
omm506 1115099 1114674 0 19:51 pts/1 00:00:00 grep mogdb
[omm506@mogdb1 ~]$ gsql -r
failed to connect /opt/mogdb506/tmp:28000.
cm_ctl query -Cvi shows the cluster state: the former standby is now Primary Normal, so it has been promoted to primary.
[ CMServer State ]
node node_ip instance state
----------------------------------------------------------
cm_ctl: [DoConnCmserver] ip 10.130.4.5 is not reachable.
cm_ctl: [DoConnCmserver] ip 10.130.0.5 is not reachable.
1 mogdb1 10.130.4.5,10.130.0.5 1 Down
2 mogdb2 10.130.4.7,10.130.0.7 2 Primary
[ Cluster State ]
cluster_state : Degraded
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
--------------------------------------------------------------------------------------------------------------------------------------------
1 mogdb1 10.130.4.5,10.130.0.5 6001 P Down Unknown | 2 mogdb2 10.130.4.7,10.130.0.7 6002 S Primary Normal
cm_ctl show shows the VIP state: all VIPs have floated to networks A and B on mogdb2, which is transparent to the business.
[omm506@mogdb2 ~]$ cm_ctl show
cm_ctl: [DoConnCmserver] ip 10.130.4.5 is not reachable.
cm_ctl: [DoConnCmserver] ip 10.130.0.5 is not reachable.
[ Network Connect State ]
Network timeout: 6s
Current CMServer time: 2024-04-23 19:52:21
Network stat('Y' means connected, otherwise 'N'):
| \ | N |
| N | \ |
[ Node Disk HB State ]
Node disk hb timeout: 200s
Current CMServer time: 2024-04-23 19:52:22
Node disk hb stat('Y' means connected, otherwise 'N'):
| N | N |
[ FloatIp Network State ]
node instance base_ip float_ip_name float_ip
------------------------------------------------------
2 mogdb2 6002 10.130.4.7 VIP_az996037 10.130.4.6
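Summary
With the dual-segment support in MogDB 5.0.6 CM, a node keeps working normally after a single-segment failure, providing high-availability management for segment-level faults. MogDB first achieves compatibility with the traditional Oracle dual-network scheme and, to give that scheme a smooth business switchover experience, also decouples it from the IT environment, so it can keep supporting core systems stably.
About the author: a senior Oracle DBA at Enmotech (云和恩墨), focused on database operations, architecture, and industry trends, with about 12 years of operations experience on business-critical systems in finance, insurance, government, local taxation, and telecom. Formerly the company's East Region contact for emergency recovery, responsible for urgent recovery work as a second-line technical expert. Currently responsible for PG and openGauss/MogDB operations and for promoting the domestically developed MogDB database.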