DSC Cluster Fault Simulation and Log Analysis in a DM8 Environment

2024-04-29 04:32


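The previous section covered service management and backup/restore for the DSC cluster. This section looks at how the DSC cluster handles failures.
First, check the database instances in the environment: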

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION                DB_VERSION
---------- ---- ------------- --------------- --------- -------------------------- -------------------
1          DSC0 DSC0          1               dcs0      DM Database Server x64 V8  DB Version: 0x7000a

START_TIME          STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------------------- ------- ------ ----------- ----------- ------------
2021-05-01 23:05:12 OPEN    NORMAL 0           0           Control node

used time: 76.219(ms). Execute id is 4.

Instance DSC0 is in a normal state and is currently the cluster's control node. Now check the other instance:

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION                DB_VERSION
---------- ---- ------------- --------------- --------- -------------------------- -------------------
1          DSC1 DSC1          2               dcs1      DM Database Server x64 V8  DB Version: 0x7000a

START_TIME          STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------------------- ------- ------ ----------- ----------- ------------
2021-05-01 23:04:54 OPEN    NORMAL 0           1           Normal node

used time: 148.055(ms). Execute id is 1.

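Instance DSC1 is also in a normal state and is currently an ordinary node. Next, simulate a failure: forcibly kill the DSC0 instance process with the system kill command and confirm that the process no longer exists, as shown in the figure below: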
[Figure: killing the DSC0 instance process and confirming it no longer exists]
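Confirm the instance status further in DISQL. The DISQL session on DSC0 has now lost its connection, while querying v$instance from the DISQL session on DSC1 shows that DSC1 has switched to control node: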

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION                DB_VERSION
---------- ---- ------------- --------------- --------- -------------------------- -------------------
1          DSC1 DSC1          2               dcs1      DM Database Server x64 V8  DB Version: 0x7000a

START_TIME          STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------------------- ------- ------ ----------- ----------- ------------
2021-05-01 23:04:54 OPEN    NORMAL 0           1           Control node

used time: 1.152(ms). Execute id is 2.

To analyze the process in detail, look at the records in the cluster (CSS) log file:
cat dm_CSS0_202105.log

2021-05-01 23:50:16.096 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Detected EP DSC0[0] break in PROCESS_OPEN
2021-05-01 23:50:16.099 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as break EP
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[0], break ep[0], recover ep[255], n_ok_ep[2]
2021-05-01 23:50:17.738 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC1[1] as Control node
2021-05-01 23:50:17.740 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP1, dest_ep DSC1 seqno = 1, cmd_seq = 49
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1)
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:18.754 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:18.759 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP1] process over!
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2)
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:19.770 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP2, dest_ep DSC1 seqno = 1, cmd_seq = 52
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2)
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:21.815 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:21.817 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP2] process over!
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2) to (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP)
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:22.858 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, DOWN) failed
2021-05-01 23:50:22.865 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:50:22.866 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, MASTER_CONFIG_VIP)
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:24.930 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 3
2021-05-01 23:50:24.938 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:50:24.939 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, EP_CONFIG_VIP)
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:26.966 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 59
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, EP_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP)
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:27.980 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:27.984 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[CONFIG VIP] process over!
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[1]

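The records above show that after we forcibly shut down instance DSC0, the dmcss process detected it immediately and marked DSC0 as the break EP (the failed node). The cluster status changed from the normal (OPEN, STARTUP) to PROCESS_EP_CRASH, DSC1 was elected control node, the crash-handling commands EP_CRASH_STEP1 and EP_CRASH_STEP2 were run on DSC1, and the VIP was reconfigured before the status returned to (OPEN, STARTUP) at 23:50:27. Note that css_vip_config(..., DOWN) logged an ERROR during SLAVE_CONFIG_VIP, yet the flow still ran to completion. That is roughly the whole process.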
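Reading these transitions by eye gets tedious on a longer log. Below is a minimal Python sketch that pulls out the transitions and measures how long fault handling took. It assumes only the line format visible in the excerpt above: a timestamp, then `status change from (A, B) to (C, D)`. `status_transitions` and `handling_seconds` are hypothetical helpers, not part of any DM tool.

```python
import re
from datetime import datetime

# Matches dmcss lines such as:
# "2021-05-01 23:50:16.105 [INFO] ... status change from (OPEN, STARTUP) to (X, Y)"
STATUS_RE = re.compile(
    r"^(\S+ \S+) .*status change from \((\w+), (\w+)\) to \((\w+), (\w+)\)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def status_transitions(lines):
    """Return (timestamp, (old_status, old_sub), (new_status, new_sub)) tuples."""
    transitions = []
    for line in lines:
        m = STATUS_RE.match(line)
        if m:
            ts, old_main, old_sub, new_main, new_sub = m.groups()
            transitions.append(
                (datetime.strptime(ts, TS_FORMAT), (old_main, old_sub), (new_main, new_sub))
            )
    return transitions


def handling_seconds(transitions):
    """Seconds from the first transition out of OPEN to the next one back into OPEN."""
    start = None
    for ts, old, new in transitions:
        if start is None and old[0] == "OPEN" and new[0] != "OPEN":
            start = ts
        elif start is not None and new[0] == "OPEN":
            return (ts - start).total_seconds()
    return None  # no complete episode found


# A few lines copied from the crash log above; the second is not a status change.
SAMPLE_LOG = [
    "2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)",
    "2021-05-01 23:50:17.738 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC1[1] as Control node",
    "2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1)",
    "2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)",
]

if __name__ == "__main__":
    trans = status_transitions(SAMPLE_LOG)
    for ts, old, new in trans:
        print(ts.time(), old, "->", new)
    print("fault handling took", handling_seconds(trans), "s")
```

For these sample lines the episode runs from 23:50:16.105 to 23:50:27.991, so `handling_seconds` reports 11.886 s. Feeding it the whole `dm_CSS0_202105.log` would report one value per call for the first complete episode only.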
After we start instance DSC0 again, check the instance status in DISQL:
[Figure: v$instance output after DSC0 is restarted]
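As shown, the rejoined instance is an ordinary node, and the control node (DSC1) does not change. Now analyze the log recorded after the failed instance restarted: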

2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css detect DB [DSC0] startup2
2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css set DB [DSC0] guid [329285185]
2021-05-01 23:57:25.355 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as recover EP
2021-05-01 23:57:25.364 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_RECOVER, SUSPEND_WORKER)
2021-05-01 23:57:25.365 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.392 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUSPEND_WORKER) to (PROCESS_RECOVER, WAIT_SUSPEND_WORKER)
2021-05-01 23:57:25.393 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64
2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65
2021-05-01 23:57:30.434 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Suspend ep worker thread is over!
2021-05-01 23:57:30.487 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC0 seqno = 0, cmd_seq = 66
2021-05-01 23:57:30.490 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC1 seqno = 1, cmd_seq = 67
2021-05-01 23:57:30.498 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SUSPEND_WORKER) to (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD)
2021-05-01 23:57:30.499 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:31.509 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:31.510 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC0 seqno = 0, cmd_seq = 69
2021-05-01 23:57:31.514 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC1 seqno = 1, cmd_seq = 70
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD) to (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD)
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:32.554 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:32.556 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP RECV, dest_ep DSC1 seqno = 1, cmd_seq = 72
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD) to (PROCESS_RECOVER, WAIT_EP_RECOVER)
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.574 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Recover ep is over!
2021-05-01 23:57:33.580 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_RECOVER) to (PROCESS_RECOVER, EP_CONFIG_VIP)
2021-05-01 23:57:33.581 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.584 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 75
2021-05-01 23:57:33.592 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, EP_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP)
2021-05-01 23:57:33.593 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP) to (PROCESS_RECOVER, MASTER_CONFIG_VIP)
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.651 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 4
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, MASTER_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP) to (PROCESS_RECOVER, SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.672 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, UP) failed
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.680 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START, dest_ep DSC0 seqno = 0, cmd_seq = 81
2021-05-01 23:57:34.688 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_STARTUP)
2021-05-01 23:57:34.689 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:35.700 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START2, dest_ep DSC0 seqno = 0, cmd_seq = 83
2021-05-01 23:57:35.709 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_STARTUP) to (PROCESS_RECOVER, AFTER_REDO)
2021-05-01 23:57:35.710 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:37.723 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 85
2021-05-01 23:57:37.731 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, AFTER_REDO) to (PROCESS_RECOVER, WAIT_EP_OPEN)
2021-05-01 23:57:37.732 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:38.741 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:38.746 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd RESUME EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 87
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_OPEN) to (PROCESS_RECOVER, WAIT_RESUME_WORKER)
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:39.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Resume ep worker thread is over!
2021-05-01 23:57:39.768 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:39.772 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP REAL OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 89
2021-05-01 23:57:39.782 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_RESUME_WORKER) to (PROCESS_RECOVER, WAIT_REAL_OPEN)
2021-05-01 23:57:39.783 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:40.793 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:40.808 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_REAL_OPEN) to (OPEN, STARTUP)
2021-05-01 23:57:40.809 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[2]

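The log above shows that the CSS process detects instance DSC0 as soon as it starts. It first sets a guid for DSC0, marks the node as the recover EP, and starts the recovery flow. The CSS commands issued first are START NOTIFY, SUSPEND EP WORKER THREAD, DCR_LOAD and ERROR EP ADD; through these steps the failed node is added back into the cluster. CSS then issues EP RECV to perform instance recovery and reconfigures the VIP. After EP START, EP START2, EP OPEN, RESUME EP WORKER THREAD and EP REAL OPEN, the former failed instance becomes an ordinary node and can serve clients again.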
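The command sequence described above can also be pulled out of the log mechanically. A minimal Python sketch, assuming only the `CSS set cmd <CMD>, dest_ep <EP> seqno` pattern visible in the excerpt. `NONE` entries, which appear once a command has been processed, are skipped by default. `css_commands` is a hypothetical helper, not a DM utility.

```python
import re

# "CSS set cmd <CMD>, dest_ep <EP> seqno = ..."; <CMD> may contain spaces.
CMD_RE = re.compile(r"CSS set cmd (.+?), dest_ep (\w+) seqno")


def css_commands(lines, skip_none=True):
    """Return (command, destination) pairs in log order."""
    cmds = []
    for line in lines:
        m = CMD_RE.search(line)
        if not m:
            continue  # not a command line (e.g. "... is over!")
        cmd, dest = m.groups()
        if skip_none and cmd == "NONE":
            continue
        cmds.append((cmd, dest))
    return cmds


# Lines copied from the rejoin log above.
SAMPLE_LOG = [
    "2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64",
    "2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65",
    "2021-05-01 23:57:30.434 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Suspend ep worker thread is over!",
    "2021-05-01 23:57:30.487 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC0 seqno = 0, cmd_seq = 66",
    "2021-05-01 23:57:32.556 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP RECV, dest_ep DSC1 seqno = 1, cmd_seq = 72",
    "2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0",
    "2021-05-01 23:57:34.651 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 4",
    "2021-05-01 23:57:39.772 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP REAL OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 89",
]

if __name__ == "__main__":
    for cmd, dest in css_commands(SAMPLE_LOG):
        print(f"{cmd:<26} -> {dest}")
```

Grouping the output by destination makes the division of labor visible: commands such as EP START and EP REAL OPEN go to the rejoining DSC0, while SUSPEND/RESUME EP WORKER THREAD go to the surviving DSC1.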
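Fault handling summary:
After the DMCSS control node detects an instance failure, it first writes a Kill command into the failed instance's area of the voting disk (any instance that finds a Kill command kills itself unconditionally). This prevents a failed instance from staying active and causing split-brain. DMCSS then starts the fault handling flow, which differs somewhat depending on the type of instance that failed.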
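The fencing rule above can be illustrated with a toy model. The Python sketch below assumes a hypothetical `VotingDisk` with one command slot per instance and a `KILL` token. It does not reflect DM's actual voting disk format or code; it only shows the rule that an instance finding a Kill command in its own area terminates itself before recovery starts.

```python
from dataclasses import dataclass, field

KILL = "KILL"  # hypothetical command token, not DM's on-disk encoding


@dataclass
class VotingDisk:
    """Toy voting disk: one command slot per instance (hypothetical layout)."""
    slots: dict = field(default_factory=dict)

    def write_command(self, instance: str, command: str) -> None:
        self.slots[instance] = command

    def read_command(self, instance: str):
        return self.slots.get(instance)


@dataclass
class Instance:
    name: str
    alive: bool = True

    def poll(self, disk: VotingDisk) -> None:
        # Rule from the text: an instance that finds a Kill command
        # in its own area kills itself unconditionally.
        if self.alive and disk.read_command(self.name) == KILL:
            self.alive = False


def fence_failed_instance(disk, instances, failed):
    """Control node writes KILL for the failed instance; return survivors."""
    disk.write_command(failed, KILL)
    for inst in instances:  # each instance checks its slot on its next poll
        inst.poll(disk)
    return [inst.name for inst in instances if inst.alive]


if __name__ == "__main__":
    disk = VotingDisk()
    # DSC0 is considered failed by DMCSS, but its process might still be
    # running (for example, hung), which is the split-brain risk.
    cluster = [Instance("DSC0"), Instance("DSC1")]
    print(fence_failed_instance(disk, cluster, "DSC0"))  # ['DSC1']
```

The design point is ordering: the Kill command is written before any recovery step, so even a half-alive instance removes itself before the survivors start taking over its work.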

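DMCSS control node failure handling flow

  1. The active nodes re-elect a DMCSS control node
  2. The new DMCSS control node notifies the dmasmsvr and dmserver on the node whose DMCSS failed to exit forcibly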

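DMASMSVR instance failure handling flow

  1. Suspend the worker threads
  2. Update the failed-node information in the DCR
  3. Notify the dmserver on the failed node to exit forcibly
  4. dmasmsvr performs fault recovery
  5. Resume the worker threads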

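dmserver instance failure handling flow

  1. Update the failed-node information in the DCR
  2. Select a new control node
  3. Notify the dmserver control node to start the fault handling flow (see DMDSC fault handling)
  4. Wait for dmserver fault handling to finish

Node rejoin

If DMCSS detects that a failed node has recovered, it notifies the control node to start the node rejoin flow.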
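Both logs above print a summary line of the form `Control Node[c], break ep[b], recover ep[r], n_ok_ep[n]` after each state change, and that line shows which flow is running. The Python sketch below decodes it. Treating 255 as "no node in this role" is an inference from these two logs (break ep returns to 255 once fault handling ends, and recover ep is 255 outside the rejoin flow), not documented DM behavior; `cluster_state` and its phase labels are hypothetical.

```python
import re

STATE_RE = re.compile(
    r"Control Node\[(\d+)\], break ep\[(\d+)\], recover ep\[(\d+)\], n_ok_ep\[(\d+)\]"
)
NO_EP = 255  # inferred from the logs: 255 appears when no node holds that role


def cluster_state(line):
    """Decode a dmcss summary line; return None if the line has none."""
    m = STATE_RE.search(line)
    if not m:
        return None
    ctrl, brk, rec, n_ok = map(int, m.groups())
    if brk != NO_EP:
        phase = "fault handling"  # a break EP is set
    elif rec != NO_EP:
        phase = "rejoin"          # a recover EP is set
    else:
        phase = "normal"
    return {"control": ctrl, "break": brk, "recover": rec, "n_ok_ep": n_ok, "phase": phase}


# Summary lines copied from the two logs above.
FAULT_LINE = "2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[0], break ep[0], recover ep[255], n_ok_ep[2]"
REJOIN_LINE = "2021-05-01 23:57:25.365 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]"
NORMAL_LINE = "2021-05-01 23:57:40.809 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[2]"

if __name__ == "__main__":
    for line in (FAULT_LINE, REJOIN_LINE, NORMAL_LINE):
        print(cluster_state(line))
```

Read in sequence, the decoded lines reproduce the story of the article: control passes from node 0 to node 1 during fault handling, node 0 later appears as the recover EP, and both roles return to 255 with n_ok_ep back at 2.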

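Database instance rejoin

  1. Suspend the worker threads
  2. Modify the node's status
  3. Perform recovery
  4. Re-enter the STARTUP state, ready to start
  5. OPEN the rejoining node
  6. Restart the worker threads
  7. Perform the operation that OPENs the database instance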
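The rejoin steps above can be set against the PROCESS_RECOVER sub-states that the rejoin log walks through. A small Python sketch along those lines: it records the sub-states in the order they appear in the log excerpt above and checks whether another run's sub-states keep that order. The list is copied from this single log, so other DM8 versions or configurations may show a different set. A few names line up visibly with the steps (SUSPEND_WORKER with step 1, WAIT_EP_RECOVER with step 3, WAIT_STARTUP with step 4, WAIT_RESUME_WORKER with step 6), but the source does not give a full mapping.

```python
# PROCESS_RECOVER sub-states, in the order they appear in the rejoin log above.
OBSERVED_RECOVER_ORDER = [
    "SUSPEND_WORKER",
    "WAIT_SUSPEND_WORKER",
    "CSS_SUB_STATUS_WAIT_DCR_LOAD",
    "SUB_WAIT_ERROR_EP_ADD",
    "WAIT_EP_RECOVER",
    "EP_CONFIG_VIP",
    "WAIT_EP_CONFIG_VIP",
    "MASTER_CONFIG_VIP",
    "WAIT_MASTER_CONFIG_VIP",
    "SLAVE_CONFIG_VIP",
    "WAIT_SLAVE_CONFIG_VIP",
    "WAIT_STARTUP",
    "AFTER_REDO",
    "WAIT_EP_OPEN",
    "WAIT_RESUME_WORKER",
    "WAIT_REAL_OPEN",
]


def follows_observed_order(substates, reference=OBSERVED_RECOVER_ORDER):
    """True if every sub-state is known and they never go backwards in reference order."""
    pos = -1
    for s in substates:
        if s not in reference:
            return False  # unknown sub-state: not covered by this log
        idx = reference.index(s)
        if idx < pos:
            return False  # out of order relative to the observed run
        pos = idx
    return True


if __name__ == "__main__":
    print(follows_observed_order(["SUSPEND_WORKER", "WAIT_EP_RECOVER", "WAIT_REAL_OPEN"]))
    print(follows_observed_order(["WAIT_EP_OPEN", "SUSPEND_WORKER"]))
```

Gaps are allowed on purpose, so a partial log (for example, one that starts mid-recovery) still passes as long as nothing appears out of order.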

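DMASM instance rejoin

  1. Suspend the worker threads
  2. Modify the node's status
  3. Perform recovery
  4. Re-enter the STARTUP state, ready to start
  5. OPEN the rejoining node
  6. Restart the worker threads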




http://www.chinasem.cn/article/945110
