DSC Cluster Failure Simulation and Log Analysis in a DM8 Environment

2024-04-29 04:32


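The previous installment covered service management and backup/restore for a DSC cluster. This one looks at how the DSC cluster handles failures.
First, check the database instances in the environment: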

SQL> select * from v$instance;

LINEID NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION               DB_VERSION
------ ---- ------------- --------------- --------- ------------------------- -------------------
1      DSC0 DSC0          1               dcs0      DM Database Server x64 V8 DB Version: 0x7000a

START_TIME          STATUS$ MODE$  OGUID DSC_SEQNO DSC_ROLE
------------------- ------- ------ ----- --------- ------------
2021-05-01 23:05:12 OPEN    NORMAL 0     0         Control node

used time: 76.219(ms). Execute id is 4.

Instance DSC0 is in a normal state and is currently the cluster's control node. Now check the other instance:

SQL> select * from v$instance;

LINEID NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION               DB_VERSION
------ ---- ------------- --------------- --------- ------------------------- -------------------
1      DSC1 DSC1          2               dcs1      DM Database Server x64 V8 DB Version: 0x7000a

START_TIME          STATUS$ MODE$  OGUID DSC_SEQNO DSC_ROLE
------------------- ------- ------ ----- --------- ------------
2021-05-01 23:04:54 OPEN    NORMAL 0     1         Normal node

used time: 148.055(ms). Execute id is 1.

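Instance DSC1 is also in a normal state and is currently a normal node. Next, simulate a failure: forcibly kill the DSC0 instance process with the operating system's kill command, then confirm that the process no longer exists, as shown in the figure below: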
[Figure: killing the DSC0 instance process and confirming it is gone]
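Confirm the instance states in DISQL. The DISQL session on DSC0 has lost its connection, while querying the instance from the DISQL session on DSC1 shows that DSC1 has switched to the control node: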

SQL> select * from v$instance;

LINEID NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION               DB_VERSION
------ ---- ------------- --------------- --------- ------------------------- -------------------
1      DSC1 DSC1          2               dcs1      DM Database Server x64 V8 DB Version: 0x7000a

START_TIME          STATUS$ MODE$  OGUID DSC_SEQNO DSC_ROLE
------------------- ------- ------ ----- --------- ------------
2021-05-01 23:04:54 OPEN    NORMAL 0     1         Control node

used time: 1.152(ms). Execute id is 2.

To analyze the process in detail, look at the records in the cluster (CSS) log file:
cat dm_CSS0_202105.log

2021-05-01 23:50:16.096 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Detected EP DSC0[0] break in PROCESS_OPEN
2021-05-01 23:50:16.099 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as break EP
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[0], break ep[0], recover ep[255], n_ok_ep[2]
2021-05-01 23:50:17.738 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC1[1] as Control node
2021-05-01 23:50:17.740 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP1, dest_ep DSC1 seqno = 1, cmd_seq = 49
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1)
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:18.754 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:18.759 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP1] process over!
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2)
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:19.770 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP2, dest_ep DSC1 seqno = 1, cmd_seq = 52
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2)
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:21.815 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:21.817 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP2] process over!
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2) to (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP)
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:22.858 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, DOWN) failed
2021-05-01 23:50:22.865 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:50:22.866 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, MASTER_CONFIG_VIP)
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:24.930 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 3
2021-05-01 23:50:24.938 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:50:24.939 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, EP_CONFIG_VIP)
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:26.966 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 59
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, EP_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP)
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:27.980 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:27.984 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[CONFIG VIP] process over!
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[1]

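The records show that after instance DSC0 was forcibly killed, the dmcss process detected the break immediately (first record at 23:50:16.096) and marked DSC0 as the break EP. The cluster status then moved from (OPEN, STARTUP) into the PROCESS_EP_CRASH failure-handling states: DSC1 was set as the control node, CSS sent the EP_CRASH_STEP1 and EP_CRASH_STEP2 commands to DSC1, and the VIP was reconfigured, after which the status returned to (OPEN, STARTUP) at 23:50:27.991 with Control Node[1]. Note that the css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, DOWN) call at 23:50:22.858 was logged as an ERROR, yet the state machine still continued through the remaining VIP steps. That, broadly, is the whole failover process.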
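The same sequence can be pulled out of the CSS log mechanically. The Python sketch below is not a DM tool; it only assumes the line format shown above (a millisecond timestamp followed by a "status change from (...) to (...)" message). It lists each transition and the time from the first transition back to (OPEN, STARTUP). The sample records are copied from the crash log above.

```python
import re
from datetime import datetime

# Matches dmcss records such as:
# "2021-05-01 23:50:16.105 [INFO] dmcss ... [DB]: status change from (OPEN, STARTUP) to (...)"
STATUS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"status change from \(([^)]*)\) to \(([^)]*)\)"
)


def parse_transitions(lines):
    """Return (timestamp, from_state, to_state) for every status-change record, in log order."""
    transitions = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            transitions.append((ts, m.group(2), m.group(3)))
    return transitions


def episode_seconds(transitions):
    """Elapsed seconds between the first and the last status change."""
    if len(transitions) < 2:
        return 0.0
    return (transitions[-1][0] - transitions[0][0]).total_seconds()


# Records copied from the crash log above; the "Control Node[...]" line does not match and is skipped.
SAMPLE = [
    "2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)",
    "2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[0], break ep[0], recover ep[255], n_ok_ep[2]",
    "2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2) to (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP)",
    "2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)",
]

if __name__ == "__main__":
    transitions = parse_transitions(SAMPLE)
    for ts, src, dst in transitions:
        print(ts.time(), src, "->", dst)
    print("episode:", episode_seconds(transitions), "s")
```

On the real file, passing an open file object (for example the result of open("dm_CSS0_202105.log", encoding="utf-8"), assuming the log is UTF-8 or plain ASCII) to parse_transitions should work the same way. For this crash episode the span is 23:50:16.105 to 23:50:27.991, roughly 11.9 seconds.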
After starting instance DSC0 again, check the instance states in DISQL:
[Figure: DISQL query of v$instance after DSC0 is restarted]
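As shown, the rejoined instance is a normal node and the control node (DSC1) is unchanged. Next, analyze the log written after the failed instance started: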

2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css detect DB [DSC0] startup2
2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css set DB [DSC0] guid [329285185]
2021-05-01 23:57:25.355 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as recover EP
2021-05-01 23:57:25.364 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_RECOVER, SUSPEND_WORKER)
2021-05-01 23:57:25.365 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.392 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUSPEND_WORKER) to (PROCESS_RECOVER, WAIT_SUSPEND_WORKER)
2021-05-01 23:57:25.393 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64
2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65
2021-05-01 23:57:30.434 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Suspend ep worker thread is over!
2021-05-01 23:57:30.487 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC0 seqno = 0, cmd_seq = 66
2021-05-01 23:57:30.490 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC1 seqno = 1, cmd_seq = 67
2021-05-01 23:57:30.498 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SUSPEND_WORKER) to (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD)
2021-05-01 23:57:30.499 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:31.509 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:31.510 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC0 seqno = 0, cmd_seq = 69
2021-05-01 23:57:31.514 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC1 seqno = 1, cmd_seq = 70
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD) to (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD)
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:32.554 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:32.556 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP RECV, dest_ep DSC1 seqno = 1, cmd_seq = 72
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD) to (PROCESS_RECOVER, WAIT_EP_RECOVER)
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.574 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Recover ep is over!
2021-05-01 23:57:33.580 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_RECOVER) to (PROCESS_RECOVER, EP_CONFIG_VIP)
2021-05-01 23:57:33.581 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.584 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 75
2021-05-01 23:57:33.592 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, EP_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP)
2021-05-01 23:57:33.593 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP) to (PROCESS_RECOVER, MASTER_CONFIG_VIP)
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.651 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 4
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, MASTER_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP) to (PROCESS_RECOVER, SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.672 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, UP) failed
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.680 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START, dest_ep DSC0 seqno = 0, cmd_seq = 81
2021-05-01 23:57:34.688 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_STARTUP)
2021-05-01 23:57:34.689 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:35.700 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START2, dest_ep DSC0 seqno = 0, cmd_seq = 83
2021-05-01 23:57:35.709 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_STARTUP) to (PROCESS_RECOVER, AFTER_REDO)
2021-05-01 23:57:35.710 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:37.723 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 85
2021-05-01 23:57:37.731 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, AFTER_REDO) to (PROCESS_RECOVER, WAIT_EP_OPEN)
2021-05-01 23:57:37.732 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:38.741 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:38.746 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd RESUME EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 87
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_OPEN) to (PROCESS_RECOVER, WAIT_RESUME_WORKER)
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:39.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Resume ep worker thread is over!
2021-05-01 23:57:39.768 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:39.772 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP REAL OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 89
2021-05-01 23:57:39.782 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_RESUME_WORKER) to (PROCESS_RECOVER, WAIT_REAL_OPEN)
2021-05-01 23:57:39.783 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:40.793 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:40.808 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_REAL_OPEN) to (OPEN, STARTUP)
2021-05-01 23:57:40.809 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[2]

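The log shows that the CSS process detected DSC0 as soon as it started. CSS first set DSC0's guid, marked the node as the recover EP, and started the recovery process. It then issued the commands START NOTIFY, SUSPEND EP WORKER THREAD, DCR_LOAD, and ERROR EP ADD; through this series of steps the failed node is added back into the cluster. Next, CSS set cmd EP RECV carries out instance recovery, and the VIP is configured. Finally, after EP START, EP START2, EP OPEN, RESUME EP WORKER THREAD (sent to DSC1), and EP REAL OPEN, the former failed instance becomes a normal node and can serve requests again.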
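To read the rejoin as a command list rather than a wall of log lines, a small filter over the "CSS set cmd" records is enough. This is a hypothetical helper that assumes only the "CSS set cmd NAME, dest_ep EP seqno = N, cmd_seq = M" wording seen above. NONE records, which in this log follow each completed command, are skipped by default. The sample lines are copied from the rejoin log.

```python
import re

# Matches e.g. "CSS set cmd EP RECV, dest_ep DSC1 seqno = 1, cmd_seq = 72"
CMD_RE = re.compile(
    r"CSS set cmd (.+?), dest_ep (\S+) seqno = (\d+), cmd_seq = (\d+)"
)


def command_sequence(lines, skip_none=True):
    """Return (command, dest_ep, cmd_seq) for each 'CSS set cmd' record in log order."""
    commands = []
    for line in lines:
        m = CMD_RE.search(line)
        if not m:
            continue
        name, dest, cmd_seq = m.group(1), m.group(2), int(m.group(4))
        if skip_none and name == "NONE":
            continue  # NONE just clears the command slot
        commands.append((name, dest, cmd_seq))
    return commands


SAMPLE = [
    "2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64",
    "2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65",
    "2021-05-01 23:57:30.487 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC0 seqno = 0, cmd_seq = 66",
    "2021-05-01 23:57:32.556 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP RECV, dest_ep DSC1 seqno = 1, cmd_seq = 72",
    "2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0",
]

if __name__ == "__main__":
    for name, dest, seq in command_sequence(SAMPLE):
        print(f"{seq:>3}  {name:<26} -> {dest}")
```

Keeping dest_ep in the output makes the split visible: the notification and load commands go to the rejoining DSC0, while worker suspension and EP RECV are sent to the control node DSC1.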
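Failure handling summary:
After the DMCSS control node detects an instance failure, it first writes a Kill command into the failed instance's area on the voting disk (any instance that finds a Kill command terminates itself unconditionally). This prevents the failed instance from remaining active and causing split-brain. DMCSS then starts the failure-handling procedure, which differs somewhat by instance type.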
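The Kill-command fencing rule can be illustrated with a toy model. Everything here is hypothetical: VotingDisk, Instance, poll and fence are illustration names, and a dict stands in for the per-instance areas of the voting disk, whose real layout and polling mechanism this article does not describe. The model shows only the rule stated above: the control node writes Kill into the failed instance's area first, and any instance that reads Kill from its own area exits unconditionally.

```python
from dataclasses import dataclass, field

KILL = "KILL"


@dataclass
class VotingDisk:
    """Hypothetical in-memory stand-in for the voting disk: one command slot per instance."""
    slots: dict = field(default_factory=dict)

    def write_cmd(self, instance_name, cmd):
        self.slots[instance_name] = cmd

    def read_cmd(self, instance_name):
        return self.slots.get(instance_name)


@dataclass
class Instance:
    name: str
    alive: bool = True

    def poll(self, disk):
        """Check this instance's own slot; a Kill command means unconditional exit."""
        if self.alive and disk.read_cmd(self.name) == KILL:
            self.alive = False  # stands in for the process terminating itself
        return self.alive


def fence(disk, failed_name):
    """Control-node side: write Kill into the failed instance's slot before any recovery."""
    disk.write_cmd(failed_name, KILL)


if __name__ == "__main__":
    disk = VotingDisk()
    dsc0, dsc1 = Instance("DSC0"), Instance("DSC1")
    fence(disk, "DSC0")
    print(dsc0.poll(disk), dsc1.poll(disk))  # False True
```

The ordering is the point: if the "failed" instance is in fact still running, it sees the Kill command and stops before recovery begins, so two nodes never act on the same data at once.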

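DMCSS control node failure handling

  1. The active nodes re-elect a DMCSS control node
  2. The new DMCSS control node notifies the dmasmsvr and dmserver belonging to the failed DMCSS node to exit forcibly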

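DMASMSVR instance failure handling

  1. Suspend the worker threads
  2. Update the failed-node information in the DCR
  3. Notify the dmserver on the failed node to exit forcibly
  4. dmasmsvr performs failure recovery
  5. Resume the worker threads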
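The DMASMSVR steps bracket the actual work between suspending and resuming the worker threads. The sketch below expresses that bracket as a Python context manager. The step strings and function names are hypothetical, and resuming in a finally clause (even if a step raises) is a choice made for this sketch, not documented dmasmsvr behavior.

```python
from contextlib import contextmanager


@contextmanager
def workers_suspended(log):
    """Suspend worker threads for the duration of the block (simulated by log entries)."""
    log.append("suspend worker threads")
    try:
        yield
    finally:
        # Sketch choice: resume even if a step inside the block raises.
        log.append("resume worker threads")


def handle_asm_failure(failed_node, log):
    """Hypothetical outline of the five DMASMSVR failure-handling steps."""
    with workers_suspended(log):
        log.append(f"update DCR: mark {failed_node} as failed")
        log.append(f"notify dmserver on {failed_node} to exit")
        log.append("dmasmsvr failure recovery")
    return log


if __name__ == "__main__":
    for step in handle_asm_failure("node0", []):
        print(step)
```

Tying suspend and resume to one with block makes it hard to forget the final step, which is the property the five-step list relies on.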

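dmserver instance failure handling

  1. Update the failed-node information in the DCR
  2. Select a new control node
  3. Notify the dmserver control node to start the failure-handling procedure (see DMDSC failure handling)
  4. Wait for the dmserver failure handling to finish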
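Step 4 (waiting for dmserver failure handling to finish) appears in the crash log as pairs of "CSS set cmd X" and "cmd[X] process over!" records. A hypothetical helper can measure how long each step took. It assumes each "process over" record belongs to the most recent "set cmd" with the same name; that pairing rule is read off this log, not a documented guarantee. The sample lines are copied from the crash log.

```python
import re
from datetime import datetime

TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
SET_RE = re.compile(TS + r" .*?CSS set cmd (.+?), dest_ep ")
OVER_RE = re.compile(TS + r" .*?cmd\[(.+?)\] process over!")


def _parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def command_latencies(lines):
    """Return (command, seconds) from each 'CSS set cmd X' to the next 'cmd[X] process over!'."""
    last_set = {}  # command name -> time it was most recently set
    latencies = []
    for line in lines:
        m = SET_RE.search(line)
        if m:
            last_set[m.group(2)] = _parse_ts(m.group(1))
            continue
        m = OVER_RE.search(line)
        if m and m.group(2) in last_set:
            started = last_set.pop(m.group(2))
            latencies.append((m.group(2), (_parse_ts(m.group(1)) - started).total_seconds()))
    return latencies


SAMPLE = [
    "2021-05-01 23:50:17.740 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP1, dest_ep DSC1 seqno = 1, cmd_seq = 49",
    "2021-05-01 23:50:18.754 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0",
    "2021-05-01 23:50:18.759 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP1] process over!",
    "2021-05-01 23:50:19.770 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP2, dest_ep DSC1 seqno = 1, cmd_seq = 52",
    "2021-05-01 23:50:21.817 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP2] process over!",
]

if __name__ == "__main__":
    for name, seconds in command_latencies(SAMPLE):
        print(f"{name}: {seconds:.3f} s")
```

On these records EP_CRASH_STEP1 takes about 1.0 s and EP_CRASH_STEP2 about 2.0 s. On the full log, the CONFIG VIP command set first for CSS1 and then for DSC1 would be paired with the later (DSC1) record under this rule.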

Node rejoin

If DMCSS detects that a failed node has recovered, it notifies the control node to start the node rejoin procedure.

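Database instance rejoin

  1. Suspend the worker threads
  2. Modify the node status
  3. Perform recovery
  4. Re-enter the STARTUP state and prepare to start
  5. OPEN the rejoining node
  6. Resume the worker threads
  7. Perform the OPEN operation on the database instance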
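One way to check these seven steps against the rejoin log is to map each step to the (PROCESS_RECOVER, ...) sub-states recorded there and verify that they occur in order. The mapping in REJOIN_STEPS is an interpretation of the log above, not an official Dameng table, and the *_CONFIG_VIP sub-states are deliberately left unmapped. OBSERVED is copied, in order, from the "to (PROCESS_RECOVER, ...)" records of the rejoin log.

```python
# Hypothetical mapping: an interpretation of the rejoin log above, not an official DM table.
REJOIN_STEPS = [
    ("suspend worker threads", {"SUSPEND_WORKER", "WAIT_SUSPEND_WORKER"}),
    ("modify node status", {"CSS_SUB_STATUS_WAIT_DCR_LOAD", "SUB_WAIT_ERROR_EP_ADD"}),
    ("perform recovery", {"WAIT_EP_RECOVER"}),
    ("re-enter STARTUP", {"WAIT_STARTUP", "AFTER_REDO"}),
    ("OPEN the rejoining node", {"WAIT_EP_OPEN"}),
    ("resume worker threads", {"WAIT_RESUME_WORKER"}),
    ("OPEN the database instance", {"WAIT_REAL_OPEN"}),
]


def step_of(sub_status):
    """Index of the rejoin step a sub-status belongs to, or None if unmapped."""
    for index, (_, sub_statuses) in enumerate(REJOIN_STEPS):
        if sub_status in sub_statuses:
            return index
    return None  # e.g. EP_CONFIG_VIP and the other *_CONFIG_VIP sub-states


def steps_in_order(observed):
    """True if every step is seen and the mapped steps never go backwards."""
    seen = [s for s in map(step_of, observed) if s is not None]
    return seen == sorted(seen) and set(seen) == set(range(len(REJOIN_STEPS)))


# Second element of each "to (PROCESS_RECOVER, ...)" status change in the rejoin log, in order.
OBSERVED = [
    "SUSPEND_WORKER", "WAIT_SUSPEND_WORKER",
    "CSS_SUB_STATUS_WAIT_DCR_LOAD", "SUB_WAIT_ERROR_EP_ADD",
    "WAIT_EP_RECOVER",
    "EP_CONFIG_VIP", "WAIT_EP_CONFIG_VIP", "MASTER_CONFIG_VIP",
    "WAIT_MASTER_CONFIG_VIP", "SLAVE_CONFIG_VIP", "WAIT_SLAVE_CONFIG_VIP",
    "WAIT_STARTUP", "AFTER_REDO", "WAIT_EP_OPEN",
    "WAIT_RESUME_WORKER", "WAIT_REAL_OPEN",
]

if __name__ == "__main__":
    print(steps_in_order(OBSERVED))  # True
```

Under this reading, the VIP configuration in the log sits between recovery (step 3) and re-entering STARTUP (step 4).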

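DMASM instance rejoin

  1. Suspend the worker threads
  2. Modify the node status
  3. Perform recovery
  4. Re-enter the STARTUP state and prepare to start
  5. OPEN the rejoining node
  6. Resume the worker threads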

http://www.chinasem.cn/article/945110
