Fault Simulation and Log Analysis for a DSC Cluster in a DM8 Environment

2024-04-29 04:32


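The previous section covered service management and backup/restore for the DSC cluster. This section looks at how a DSC cluster handles failures.
First, check the state of the database instances in the test environment: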

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME
---------- ---- ------------- --------------- ---------
SVR_VERSION                DB_VERSION
-------------------------- -------------------
START_TIME
-------------------
STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------- ------ ----------- ----------- ------------
1          DSC0 DSC0          1               dcs0
DM Database Server x64 V8  DB Version: 0x7000a
2021-05-01 23:05:12
OPEN    NORMAL 0           0           Control node

used time: 76.219(ms). Execute id is 4.

Instance DSC0 is in a normal state (STATUS$ is OPEN) and is currently the cluster control node. Now check the other instance:

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME
---------- ---- ------------- --------------- ---------
SVR_VERSION                DB_VERSION
-------------------------- -------------------
START_TIME
-------------------
STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------- ------ ----------- ----------- ------------
1          DSC1 DSC1          2               dcs1
DM Database Server x64 V8  DB Version: 0x7000a
2021-05-01 23:04:54
OPEN    NORMAL 0           1           Normal node

used time: 148.055(ms). Execute id is 1.

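Instance DSC1 is also in a normal state and is currently a normal (non-control) node. To simulate a failure, force-kill the DSC0 instance process with the operating system's kill command, then confirm that the process no longer exists.
[Screenshot: killing the instance process and verifying that it is gone]
Confirm the instance state in DISQL as well. The DISQL session on DSC0 has lost its connection, while querying the instance from the DISQL session on DSC1 shows that its role has switched to control node: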

SQL> select * from v$instance;

LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME
---------- ---- ------------- --------------- ---------
SVR_VERSION                DB_VERSION
-------------------------- -------------------
START_TIME
-------------------
STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE
------- ------ ----------- ----------- ------------
1          DSC1 DSC1          2               dcs1
DM Database Server x64 V8  DB Version: 0x7000a
2021-05-01 23:04:54
OPEN    NORMAL 0           1           Control node

used time: 1.152(ms). Execute id is 2.

To analyze the process in detail, look at the records in the cluster (dmcss) log file:
cat dm_CSS0_202105.log

2021-05-01 23:50:16.096 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Detected EP DSC0[0] break in PROCESS_OPEN
2021-05-01 23:50:16.099 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as break EP
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[0], break ep[0], recover ep[255], n_ok_ep[2]
2021-05-01 23:50:17.738 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC1[1] as Control node
2021-05-01 23:50:17.740 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP1, dest_ep DSC1 seqno = 1, cmd_seq = 49
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1)
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:18.754 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:18.759 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP1] process over!
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2)
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:19.770 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP2, dest_ep DSC1 seqno = 1, cmd_seq = 52
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2)
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:21.815 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:21.817 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP2] process over!
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2) to (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP)
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:22.858 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, DOWN) failed
2021-05-01 23:50:22.865 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:50:22.866 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, MASTER_CONFIG_VIP)
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:24.930 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 3
2021-05-01 23:50:24.938 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:50:24.939 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, EP_CONFIG_VIP)
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:26.966 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 59
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, EP_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP)
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:27.980 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:27.984 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[CONFIG VIP] process over!
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[1]

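The records above show that after instance DSC0 was forcibly killed, the dmcss process detected the break, marked the node as a failed node (break EP), switched the cluster state from normal (OPEN, STARTUP) to crash handling (PROCESS_EP_CRASH), elected instance DSC1 as the control node, and reconfigured the VIP. In this run the whole sequence took about 12 seconds, from 23:50:16 to 23:50:27. That is the overall flow.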
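The state changes described above can be pulled out of the dmcss log mechanically. The following is a minimal Python sketch, not a DM8 tool: it assumes the log line layout seen in this article (timestamp first, then a `status change from (A, B) to (C, D)` message), and the function names `parse_transitions` and `handling_seconds` are made up for illustration. The three sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Three lines copied from the failover log in this article.
SAMPLE = """\
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1)
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)
"""

# Assumed line layout: "<date> <time.ms> ... status change from (X, Y) to (X2, Y2)".
TRANSITION = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"status change from \((\w+), (\w+)\) to \((\w+), (\w+)\)"
)


def parse_transitions(text):
    """Return (timestamp, old_state, new_state) for every status-change line."""
    result = []
    for line in text.splitlines():
        m = TRANSITION.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            result.append((ts, (m.group(2), m.group(3)), (m.group(4), m.group(5))))
    return result


def handling_seconds(transitions):
    """Seconds between leaving (OPEN, STARTUP) and the next return to it."""
    start = None
    for ts, old, new in transitions:
        if start is None and old == ("OPEN", "STARTUP"):
            start = ts
        elif start is not None and new == ("OPEN", "STARTUP"):
            return (ts - start).total_seconds()
    return None  # handling never started or has not finished yet


if __name__ == "__main__":
    for ts, old, new in parse_transitions(SAMPLE):
        print(ts.time(), old, "->", new)
    print(f"{handling_seconds(parse_transitions(SAMPLE)):.3f} s")  # 11.886 s
```

Run against the full dm_CSS0_202105.log, the same functions would list every intermediate sub-status; the sample is trimmed only to keep the sketch short.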
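After starting the DSC0 instance again, check the instance state through DISQL:
[Screenshot: instance state in DISQL after DSC0 is restarted]
The newly joined instance is a normal node, and the control node is unchanged (it is still DSC1). Next, examine the log written after the failed instance was restarted: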

2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css detect DB [DSC0] startup2
2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css set DB [DSC0] guid [329285185]
2021-05-01 23:57:25.355 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as recover EP
2021-05-01 23:57:25.364 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_RECOVER, SUSPEND_WORKER)
2021-05-01 23:57:25.365 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.392 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUSPEND_WORKER) to (PROCESS_RECOVER, WAIT_SUSPEND_WORKER)
2021-05-01 23:57:25.393 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64
2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65
2021-05-01 23:57:30.434 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Suspend ep worker thread is over!
2021-05-01 23:57:30.487 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC0 seqno = 0, cmd_seq = 66
2021-05-01 23:57:30.490 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC1 seqno = 1, cmd_seq = 67
2021-05-01 23:57:30.498 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SUSPEND_WORKER) to (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD)
2021-05-01 23:57:30.499 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:31.509 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:31.510 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC0 seqno = 0, cmd_seq = 69
2021-05-01 23:57:31.514 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC1 seqno = 1, cmd_seq = 70
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD) to (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD)
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:32.554 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:32.556 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP RECV, dest_ep DSC1 seqno = 1, cmd_seq = 72
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD) to (PROCESS_RECOVER, WAIT_EP_RECOVER)
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.574 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Recover ep is over!
2021-05-01 23:57:33.580 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_RECOVER) to (PROCESS_RECOVER, EP_CONFIG_VIP)
2021-05-01 23:57:33.581 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.584 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 75
2021-05-01 23:57:33.592 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, EP_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP)
2021-05-01 23:57:33.593 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP) to (PROCESS_RECOVER, MASTER_CONFIG_VIP)
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.651 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 4
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, MASTER_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP) to (PROCESS_RECOVER, SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.672 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, UP) failed
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.680 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START, dest_ep DSC0 seqno = 0, cmd_seq = 81
2021-05-01 23:57:34.688 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_STARTUP)
2021-05-01 23:57:34.689 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:35.700 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START2, dest_ep DSC0 seqno = 0, cmd_seq = 83
2021-05-01 23:57:35.709 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_STARTUP) to (PROCESS_RECOVER, AFTER_REDO)
2021-05-01 23:57:35.710 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:37.723 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 85
2021-05-01 23:57:37.731 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, AFTER_REDO) to (PROCESS_RECOVER, WAIT_EP_OPEN)
2021-05-01 23:57:37.732 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:38.741 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:38.746 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd RESUME EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 87
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_OPEN) to (PROCESS_RECOVER, WAIT_RESUME_WORKER)
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:39.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Resume ep worker thread is over!
2021-05-01 23:57:39.768 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:39.772 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP REAL OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 89
2021-05-01 23:57:39.782 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_RESUME_WORKER) to (PROCESS_RECOVER, WAIT_REAL_OPEN)
2021-05-01 23:57:39.783 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:40.793 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:40.808 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_REAL_OPEN) to (OPEN, STARTUP)
2021-05-01 23:57:40.809 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[2]

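The log shows that the CSS process detected DSC0 as soon as it started. CSS first set a guid for DSC0, marked the node as the recover EP, and started the recovery process. It then issued the commands START NOTIFY, SUSPEND EP WORKER THREAD, DCR_LOAD, and ERROR EP ADD in turn; through this series of steps the failed node was added back into the cluster. Next, CSS issued EP RECV to recover the instance and configured the VIP. Finally, after EP START (followed by EP START2), EP OPEN, and EP REAL OPEN, the formerly failed instance became a normal node and could serve clients again.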
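The command sequence listed above can also be extracted from the log rather than read by eye. This is a small Python sketch under the assumption that every command line has the form `CSS set cmd <NAME>, dest_ep <EP> seqno = <n>, cmd_seq = <n>`, as in this article's log; `command_sequence` is a hypothetical helper name. Entries with the command `NONE` are skipped so that only real commands remain. The sample lines are copied from the rejoin log.

```python
import re

# Lines copied from the rejoin log in this article (trimmed for brevity).
SAMPLE = """\
2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64
2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65
2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:37.723 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 85
2021-05-01 23:57:39.772 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP REAL OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 89
"""

# Assumed message layout, taken from the log lines shown in this article.
CMD = re.compile(r"CSS set cmd (.+?), dest_ep (\w+) seqno = (\d+), cmd_seq = (\d+)")


def command_sequence(text):
    """Return (command, dest_ep) pairs in log order, skipping 'NONE' entries."""
    sequence = []
    for line in text.splitlines():
        m = CMD.search(line)
        if m and m.group(1) != "NONE":
            sequence.append((m.group(1), m.group(2)))
    return sequence


if __name__ == "__main__":
    for command, dest in command_sequence(SAMPLE):
        print(f"{dest}: {command}")
```

Grouping the pairs by destination makes it easy to see which commands went to the rejoining node (DSC0) and which went to the surviving control node (DSC1).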
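Fault handling summary:
When the DMCSS control node detects an instance failure, it first writes a Kill command to the failed instance's Voting disk area (any instance that finds a Kill command terminates itself unconditionally). This keeps the failed instance from remaining active and causing split-brain. DMCSS then starts the fault handling flow, which differs somewhat depending on the type of instance that failed.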

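DMCSS control node fault handling flow

  1. The active nodes re-elect a DMCSS control node
  2. The new DMCSS control node notifies the dmasmsvr and dmserver on the node where DMCSS failed to exit forcibly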

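DMASMSVR instance fault handling flow

  1. Suspend worker threads
  2. Update the failed-node information in the DCR
  3. Notify the dmserver on the failed node to exit forcibly
  4. dmasmsvr performs fault recovery
  5. Resume worker threads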

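dmserver instance fault handling flow

  1. Update the failed-node information in the DCR
  2. Re-select a control node
  3. Notify the dmserver control node to start the fault handling flow (see DMDSC fault handling)
  4. Wait for dmserver fault handling to finish

Node rejoin

If DMCSS detects that a failed node has recovered, it notifies the control node to start the node rejoin flow.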

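Database instance rejoin

  1. Suspend worker threads
  2. Modify the node's status
  3. Perform the recovery operation
  4. Re-enter the STARTUP state and prepare to start
  5. OPEN the rejoined node
  6. Restart worker threads
  7. Perform the OPEN operation on the database instance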

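DMASM instance rejoin

  1. Suspend worker threads
  2. Modify the node's status
  3. Perform the recovery operation
  4. Re-enter the STARTUP state and prepare to start
  5. OPEN the rejoined node
  6. Restart worker threads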
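The dmcss log in this article also prints a one-line cluster summary after each state change (`Control Node[..], break ep[..], recover ep[..], n_ok_ep[..]`), which gives a compact view of a node failing and rejoining. Below is a hedged Python sketch for reading those lines. It assumes that the value 255 means "no such EP", which is inferred from the logs above and not taken from DM8 documentation; the names `Summary` and `parse_summaries` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Summary lines copied from the failover and rejoin logs in this article.
SAMPLE = """\
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[0], break ep[0], recover ep[255], n_ok_ep[2]
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[1]
2021-05-01 23:57:25.365 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:40.809 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[2]
"""

NO_EP = 255  # Assumption: 255 is a "none" placeholder, inferred from the logs.

SUMMARY = re.compile(
    r"^(\S+ \S+) .*Control Node\[(\d+)\], break ep\[(\d+)\], "
    r"recover ep\[(\d+)\], n_ok_ep\[(\d+)\]"
)


class Summary(NamedTuple):
    time: str
    control: int               # seqno of the current control node
    break_ep: Optional[int]    # seqno of the failed EP, or None
    recover_ep: Optional[int]  # seqno of the EP being recovered, or None
    n_ok: int                  # number of healthy EPs


def _ep(value):
    """Map the assumed 'none' placeholder to None."""
    number = int(value)
    return None if number == NO_EP else number


def parse_summaries(text):
    """Return one Summary per cluster summary line, in log order."""
    rows = []
    for line in text.splitlines():
        m = SUMMARY.match(line)
        if m:
            rows.append(Summary(m.group(1), int(m.group(2)), _ep(m.group(3)),
                                _ep(m.group(4)), int(m.group(5))))
    return rows


if __name__ == "__main__":
    for row in parse_summaries(SAMPLE):
        print(row)
```

Read top to bottom, the sample shows the control node moving from seqno 0 to seqno 1, the break EP being cleared once crash handling ends, and `n_ok_ep` returning to 2 after DSC0 rejoins.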
