DM8 DSC Cluster: Fault Simulation and Log Analysis

2024-04-29 04:32


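The previous section covered service management and backup/restore for a DSC cluster. This section examines how a DSC cluster handles a failure.
First, check the state of the database instances in the test environment: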

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME
---------- ---- ------------- --------------- ---------
SVR_VERSION                DB_VERSION
-------------------------- -------------------
START_TIME
----------------------------------------------------------------------------------------------------
STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------- ------ ----------- ----------- ------------
1          DSC0 DSC0          1               dcs0
DM Database Server x64 V8  DB Version: 0x7000a
2021-05-01 23:05:12
OPEN    NORMAL 0           0           Control node
used time: 76.219(ms). Execute id is 4.

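On the instance named DSC0, the status is normal (OPEN) and the instance is currently the cluster control node. Now check the other instance: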

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION                DB_VERSION
---------- ---- ------------- --------------- --------- -------------------------- -------------------
START_TIME                                                                                           STATUS$
---------------------------------------------------------------------------------------------------- -------
MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------ ----------- ----------- -----------
1          DSC1 DSC1          2               dcs1      DM Database Server x64 V8  DB Version: 0x7000a
2021-05-01 23:04:54                                                                                  OPEN
NORMAL 0           1           Normal node
used time: 148.055(ms). Execute id is 1.

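The instance named DSC1 is also healthy and is currently a normal node. Next, simulate a failure: force-kill the DSC0 instance process with the operating system's kill command, then confirm that the process no longer exists:
[Figure: killing the instance process and confirming it is gone]
Confirm the instance state again in DISQL. The DISQL session on DSC0 has lost its connection, and querying the instance from the DISQL session on DSC1 shows that DSC1 has switched to the control node role: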

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION
---------- ---- ------------- --------------- --------- --------------------------
DB_VERSION
-------------------
START_TIME
----------------------------------------------------------------------------------------------------
STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------- ------ ----------- ----------- ------------
1          DSC1 DSC1          2               dcs1      DM Database Server x64 V8
DB Version: 0x7000a
2021-05-01 23:04:54
OPEN    NORMAL 0           1           Control node
used time: 1.152(ms). Execute id is 2.

To follow the failover in detail, look at the records in the cluster (dmcss) log file:
cat dm_CSS0_202105.log

2021-05-01 23:50:16.096 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Detected EP DSC0[0] break in PROCESS_OPEN
2021-05-01 23:50:16.099 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as break EP
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[0], break ep[0], recover ep[255], n_ok_ep[2]
2021-05-01 23:50:17.738 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC1[1] as Control node
2021-05-01 23:50:17.740 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP1, dest_ep DSC1 seqno = 1, cmd_seq = 49
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1)
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:18.754 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:18.759 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP1] process over!
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2)
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:19.770 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP2, dest_ep DSC1 seqno = 1, cmd_seq = 52
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2)
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:21.815 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:21.817 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP2] process over!
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2) to (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP)
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:22.858 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, DOWN) failed
2021-05-01 23:50:22.865 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:50:22.866 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, MASTER_CONFIG_VIP)
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:24.930 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 3
2021-05-01 23:50:24.938 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:50:24.939 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, EP_CONFIG_VIP)
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:26.966 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 59
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, EP_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP)
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:27.980 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:27.984 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[CONFIG VIP] process over!
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[1]

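The records above show that after the DSC0 instance was forcibly killed, the dmcss process detected the break (first record at 23:50:16.096), marked that node as the break EP, and moved the cluster from (OPEN, STARTUP) into PROCESS_EP_CRASH. It then set DSC1 as the control node, ran EP_CRASH_STEP1 and EP_CRASH_STEP2, reconfigured the VIP, and returned to (OPEN, STARTUP) at 23:50:27.991, roughly 12 seconds after detection. The log also contains one [ERROR] record: during the SLAVE_CONFIG_VIP step, the css_vip_config call with DOWN for 192.168.56.121 failed. That is the overall sequence.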
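The state transitions described above can be pulled out of the dmcss log mechanically. The following is a minimal sketch, assuming the record layout seen in this log (timestamp, level, `dmcss`, then `status change from (...) to (...)`); the helper names `parse_transitions` and `episode_seconds` are illustrative and not part of any DM tool.

```python
import re
from datetime import datetime

# Pattern for dmcss "status change" records, based on the log lines shown above.
STATUS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[\w+\] dmcss .*?"
    r"status change from \((?P<old>[^)]*)\) to \((?P<new>[^)]*)\)"
)


def parse_transitions(lines):
    """Return (timestamp, (old_status, old_sub), (new_status, new_sub)) tuples."""
    out = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m is None:
            continue  # not a status-change record
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        old = tuple(part.strip() for part in m["old"].split(","))
        new = tuple(part.strip() for part in m["new"].split(","))
        out.append((ts, old, new))
    return out


def episode_seconds(transitions, phase):
    """Seconds between first entering `phase` and first leaving it."""
    start = next(ts for ts, old, new in transitions
                 if old[0] != phase and new[0] == phase)
    end = next(ts for ts, old, new in transitions
               if old[0] == phase and new[0] != phase)
    return (end - start).total_seconds()


# Three records copied from the log above (one is not a status change).
SAMPLE = [
    "2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)",
    "2021-05-01 23:50:17.738 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC1[1] as Control node",
    "2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)",
]

transitions = parse_transitions(SAMPLE)
print(episode_seconds(transitions, "PROCESS_EP_CRASH"))
```

Run against the full file (for example `parse_transitions(open("dm_CSS0_202105.log"))`), the same functions list every sub-status the cluster passed through; for the crash handling shown here the interval comes out just under 12 seconds.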
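After starting the DSC0 instance again, check the instance state through DISQL:
[Figure: DISQL query of the instance state after DSC0 restarts]
The rejoined instance is a normal node, and the control node is unchanged (still DSC1). The dmcss log written after the failed instance restarted shows what happened: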

2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css detect DB [DSC0] startup2
2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css set DB [DSC0] guid [329285185]
2021-05-01 23:57:25.355 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as recover EP
2021-05-01 23:57:25.364 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_RECOVER, SUSPEND_WORKER)
2021-05-01 23:57:25.365 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.392 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUSPEND_WORKER) to (PROCESS_RECOVER, WAIT_SUSPEND_WORKER)
2021-05-01 23:57:25.393 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64
2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65
2021-05-01 23:57:30.434 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Suspend ep worker thread is over!
2021-05-01 23:57:30.487 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC0 seqno = 0, cmd_seq = 66
2021-05-01 23:57:30.490 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC1 seqno = 1, cmd_seq = 67
2021-05-01 23:57:30.498 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SUSPEND_WORKER) to (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD)
2021-05-01 23:57:30.499 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:31.509 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:31.510 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC0 seqno = 0, cmd_seq = 69
2021-05-01 23:57:31.514 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC1 seqno = 1, cmd_seq = 70
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD) to (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD)
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:32.554 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:32.556 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP RECV, dest_ep DSC1 seqno = 1, cmd_seq = 72
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD) to (PROCESS_RECOVER, WAIT_EP_RECOVER)
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.574 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Recover ep is over!
2021-05-01 23:57:33.580 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_RECOVER) to (PROCESS_RECOVER, EP_CONFIG_VIP)
2021-05-01 23:57:33.581 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.584 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 75
2021-05-01 23:57:33.592 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, EP_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP)
2021-05-01 23:57:33.593 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP) to (PROCESS_RECOVER, MASTER_CONFIG_VIP)
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.651 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 4
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, MASTER_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP) to (PROCESS_RECOVER, SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.672 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, UP) failed
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.680 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START, dest_ep DSC0 seqno = 0, cmd_seq = 81
2021-05-01 23:57:34.688 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_STARTUP)
2021-05-01 23:57:34.689 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:35.700 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START2, dest_ep DSC0 seqno = 0, cmd_seq = 83
2021-05-01 23:57:35.709 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_STARTUP) to (PROCESS_RECOVER, AFTER_REDO)
2021-05-01 23:57:35.710 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:37.723 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 85
2021-05-01 23:57:37.731 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, AFTER_REDO) to (PROCESS_RECOVER, WAIT_EP_OPEN)
2021-05-01 23:57:37.732 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:38.741 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:38.746 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd RESUME EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 87
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_OPEN) to (PROCESS_RECOVER, WAIT_RESUME_WORKER)
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:39.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Resume ep worker thread is over!
2021-05-01 23:57:39.768 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:39.772 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP REAL OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 89
2021-05-01 23:57:39.782 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_RESUME_WORKER) to (PROCESS_RECOVER, WAIT_REAL_OPEN)
2021-05-01 23:57:39.783 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:40.793 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:40.808 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_REAL_OPEN) to (OPEN, STARTUP)
2021-05-01 23:57:40.809 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[2]

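The log shows that the CSS process detected DSC0 as soon as it started. It first set a guid for DSC0, marked the node as the recover EP, and started the recovery process. The commands issued through CSS set cmd were, in order, START NOTIFY, SUSPEND EP WORKER THREAD, DCR_LOAD, and ERROR EP ADD; this sequence brings the failed node back into the cluster. Next, CSS set cmd EP RECV performed instance recovery and the VIP was configured. Finally, after CSS set cmd EP START, EP START2, EP OPEN, and EP REAL OPEN (with RESUME EP WORKER THREAD sent to DSC1 before the last of these), the previously failed instance became a normal node and could serve clients again.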
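The command order described above can be extracted from the log rather than read by eye. This is a small sketch, assuming the `CSS set cmd <name>, dest_ep <ep> seqno = <n>, cmd_seq = <n>` layout seen in these records; `command_sequence` is a hypothetical helper name, and treating `NONE` records as resets to be skipped is an interpretation of the log, not documented behavior.

```python
import re

# Layout taken from the "CSS set cmd ..." records in the log above.
CMD_RE = re.compile(
    r"CSS set cmd (?P<cmd>.+?), dest_ep (?P<ep>\w+) "
    r"seqno = (?P<seqno>\d+), cmd_seq = (?P<cmd_seq>\d+)"
)


def command_sequence(lines):
    """Return [(command, dest_ep)] in log order, skipping 'NONE' records."""
    seq = []
    for line in lines:
        m = CMD_RE.search(line)
        if m and m["cmd"] != "NONE":
            seq.append((m["cmd"], m["ep"]))
    return seq


# Records copied from the rejoin log above.
SAMPLE = [
    "2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64",
    "2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65",
    "2021-05-01 23:57:30.434 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Suspend ep worker thread is over!",
    "2021-05-01 23:57:30.487 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC0 seqno = 0, cmd_seq = 66",
    "2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0",
]

for cmd, ep in command_sequence(SAMPLE):
    print(f"{cmd} -> {ep}")
```

Grouping the result by destination EP makes it easy to see which commands went to the rejoining instance (DSC0) and which went to the current control node (DSC1).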
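Fault handling summary:
After the DMCSS control node detects an instance failure, it first writes a Kill command to the failed instance's region of the voting disk (any instance that finds a Kill command terminates itself unconditionally). This ensures the failed instance is not still active, which could otherwise cause split-brain. DMCSS then starts the fault handling procedure, which differs somewhat depending on the type of instance that failed.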
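The fencing idea in this paragraph can be illustrated with a toy model. The sketch below is purely illustrative: the `VotingDisk` and `Instance` classes, the per-instance slot, and the `"KILL"` marker are all assumptions made for the example and say nothing about DM's real on-disk format or polling mechanism.

```python
class VotingDisk:
    """Toy stand-in for the shared voting disk: one command slot per instance."""

    def __init__(self, instance_names):
        self.slots = {name: None for name in instance_names}

    def write_kill(self, name):
        # The control DMCSS fences a suspected-failed instance.
        self.slots[name] = "KILL"


class Instance:
    """Toy database instance that polls its own slot on the voting disk."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.alive = True

    def poll(self):
        # An instance that sees a kill command exits unconditionally.
        if self.disk.slots[self.name] == "KILL":
            self.alive = False
        return self.alive


disk = VotingDisk(["DSC0", "DSC1"])
dsc0 = Instance("DSC0", disk)
dsc1 = Instance("DSC1", disk)

disk.write_kill("DSC0")  # fence DSC0 before starting fault handling
dsc0.poll()
dsc1.poll()
print(dsc0.alive, dsc1.alive)
```

The point of the design is ordering: the kill marker is written before any recovery work begins, so even an instance that is merely unreachable, rather than dead, cannot keep modifying shared storage while the surviving nodes recover.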

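DMCSS control node failure handling

  1. The active nodes re-elect a DMCSS control node
  2. The new DMCSS control node notifies the dmasmsvr and dmserver that belong to the failed DMCSS node to force-exit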

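DMASMSVR instance failure handling

  1. Suspend worker threads
  2. Update the failed node's information in the DCR
  3. Notify the dmserver on the failed node to force-exit
  4. dmasmsvr performs failure recovery
  5. Resume worker threads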

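dmserver instance failure handling

  1. Update the failed node's information in the DCR
  2. Select a new control node
  3. Notify the dmserver control node to start its fault handling procedure (see DMDSC fault handling)
  4. Wait for dmserver fault handling to finish

Node rejoin

If DMCSS detects that a failed node has recovered, it notifies the control node to start the node rejoin procedure.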

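Database instance rejoin

  1. Suspend worker threads
  2. Change the node's status
  3. Perform recovery
  4. Re-enter the STARTUP state and prepare to start
  5. OPEN the rejoining node
  6. Restart worker threads
  7. Perform the OPEN operation on the database instance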

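DMASM instance rejoin

  1. Suspend worker threads
  2. Change the node's status
  3. Perform recovery
  4. Re-enter the STARTUP state and prepare to start
  5. OPEN the rejoining node
  6. Restart worker threads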
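One way to confirm from the dmcss log that a rejoin like the ones listed above has finished is to follow the `Control Node[...]` summary records that dmcss writes after each status change. The sketch below parses them; the names `summaries` and `rejoin_complete` are hypothetical, and reading the value 255 as "no such EP" is an assumption inferred from the logs in this article.

```python
import re

# Layout taken from the "Control Node[...]" summary records in the logs above.
SUMMARY_RE = re.compile(
    r"Control Node\[(\d+)\], break ep\[(\d+)\], recover ep\[(\d+)\], n_ok_ep\[(\d+)\]"
)
NO_EP = 255  # assumption: 255 means "no break/recover EP" in these records


def summaries(lines):
    """Yield one dict per summary record, mapping 255 to None."""
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m:
            control, brk, rec, n_ok = map(int, m.groups())
            yield {
                "control": control,
                "break_ep": None if brk == NO_EP else brk,
                "recover_ep": None if rec == NO_EP else rec,
                "n_ok_ep": n_ok,
            }


def rejoin_complete(lines, expected_ok):
    """True if the last summary has no break/recover EP and all EPs are OK."""
    last = None
    for last in summaries(lines):
        pass
    return (
        last is not None
        and last["break_ep"] is None
        and last["recover_ep"] is None
        and last["n_ok_ep"] == expected_ok
    )


# Records copied from the rejoin log above.
SAMPLE = [
    "2021-05-01 23:57:25.365 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]",
    "2021-05-01 23:57:30.499 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]",
    "2021-05-01 23:57:40.809 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[2]",
]

print(rejoin_complete(SAMPLE, expected_ok=2))
```

The same records also show that the control node index stays at 1 (DSC1) for the whole rejoin, which matches what the DISQL query reported.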



