DM8环境DSC集群故障模拟及日志分析

2024-04-29 04:32

本文主要是介绍DM8环境DSC集群故障模拟及日志分析,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

上一节中讲到了DSC集群的服务管理和备份还原,这节对DSC集群的故障处理过程进行探讨。
首先,看一下实例环境中的数据库实例情况:

SQL> select * from v$instance;LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME
---------- ---- ------------- --------------- ---------SVR_VERSION                DB_VERSION         -------------------------- -------------------START_TIME                                                                                          ----------------------------------------------------------------------------------------------------STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE    ------- ------ ----------- ----------- ------------
1          DSC0 DSC0          1               dcs0DM Database Server x64 V8  DB Version: 0x7000a2021-05-01 23:05:12OPEN    NORMAL 0           0           Control nodeused time: 76.219(ms). Execute id is 4.

在名为DSC0的实例中,该实例状态正常,目前为集群控制节点,查一下另外的实例情况:

SQL> select * from v$instance;LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION                DB_VERSION         
---------- ---- ------------- --------------- --------- -------------------------- -------------------START_TIME                                                                                           STATUS$---------------------------------------------------------------------------------------------------- -------MODE$  OGUID       DSC_SEQNO   DSC_ROLE   ------ ----------- ----------- -----------
1          DSC1 DSC1          2               dcs1      DM Database Server x64 V8  DB Version: 0x7000a2021-05-01 23:04:54                                                                                  OPENNORMAL 0           1           Normal nodeused time: 148.055(ms). Execute id is 1.

名为DSC1的实例状态也是正常的,目前为普通节点,下面模拟故障,通过系统KILL命令将实例进程强杀,确认实例进程已经不存在了,过程如下图所示:
在这里插入图片描述
在DISQL中进一步确认实例状态,此时DSC0中的DISQL已失去连接,在DSC1中的DISQL中查询实例,显示其状态已经切换为控制节点,如下:

SQL> select * from v$instance;LINEID     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION               
---------- ---- ------------- --------------- --------- --------------------------DB_VERSION         -------------------START_TIME                                                                                          ----------------------------------------------------------------------------------------------------STATUS$ MODE$  OGUID       DSC_SEQNO   DSC_ROLE    ------- ------ ----------- ----------- ------------
1          DSC1 DSC1          2               dcs1      DM Database Server x64 V8 DB Version: 0x7000a2021-05-01 23:04:54OPEN    NORMAL 0           1           Control node
used time: 1.152(ms). Execute id is 2.

分析日志文件中的详细过程,查看集群日志文件中的记录
cat dm_CSS0_202105.log

2021-05-01 23:50:16.096 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Detected EP DSC0[0] break in PROCESS_OPEN
2021-05-01 23:50:16.099 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as break EP
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1)
2021-05-01 23:50:16.105 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[0], break ep[0], recover ep[255], n_ok_ep[2]
2021-05-01 23:50:17.738 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC1[1] as Control node
2021-05-01 23:50:17.740 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP1, dest_ep DSC1 seqno = 1, cmd_seq = 49
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1)
2021-05-01 23:50:17.748 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:18.754 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:18.759 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP1] process over!
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP1) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2)
2021-05-01 23:50:18.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:19.770 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP_CRASH_STEP2, dest_ep DSC1 seqno = 1, cmd_seq = 52
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_STEP2) to (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2)
2021-05-01 23:50:19.778 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:21.815 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:21.817 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[EP_CRASH_STEP2] process over!
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, CSS_SUB_STATUS_EP_CRASH_WAIT_STEP2) to (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP)
2021-05-01 23:50:21.848 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:22.858 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, DOWN) failed
2021-05-01 23:50:22.865 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:50:22.866 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_EP_CRASH, MASTER_CONFIG_VIP)
2021-05-01 23:50:23.925 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:24.930 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 3
2021-05-01 23:50:24.938 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:50:24.939 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_MASTER_CONFIG_VIP) to (PROCESS_EP_CRASH, EP_CONFIG_VIP)
2021-05-01 23:50:25.951 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:26.966 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 59
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, EP_CONFIG_VIP) to (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP)
2021-05-01 23:50:26.975 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[0], recover ep[255], n_ok_ep[1]
2021-05-01 23:50:27.980 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:50:27.984 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: cmd[CONFIG VIP] process over!
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_EP_CRASH, WAIT_EP_CONFIG_VIP) to (OPEN, STARTUP)
2021-05-01 23:50:27.991 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[1]

通过上面的记录可以看到,在我们将实例DSC0强行关闭后,dmcss进程立即检测到,并将该节点标记为故障节点,从正常开启状态切换为故障状态,选举DSC1实例为控制节点,并重新配置VIP,大致过程就是这样。
当我们再次启动DSC0实例后,通过DISQL查看实例状态:
在这里插入图片描述
可以看到,新加入的实例是普通节点,而原控制节点没有变化。查看故障实例启动后的日志,作以分析:

2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css detect DB [DSC0] startup2
2021-05-01 23:57:25.354 [INFO] dmcss P0000000707 T0000000000000000766  css set DB [DSC0] guid [329285185]
2021-05-01 23:57:25.355 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Set EP DSC0[0] as recover EP
2021-05-01 23:57:25.364 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (OPEN, STARTUP) to (PROCESS_RECOVER, SUSPEND_WORKER)
2021-05-01 23:57:25.365 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.392 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUSPEND_WORKER) to (PROCESS_RECOVER, WAIT_SUSPEND_WORKER)
2021-05-01 23:57:25.393 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[1]
2021-05-01 23:57:25.394 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd START NOTIFY, dest_ep DSC0 seqno = 0, cmd_seq = 64
2021-05-01 23:57:26.403 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd SUSPEND EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 65
2021-05-01 23:57:30.434 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Suspend ep worker thread is over!
2021-05-01 23:57:30.487 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC0 seqno = 0, cmd_seq = 66
2021-05-01 23:57:30.490 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd DCR_LOAD, dest_ep DSC1 seqno = 1, cmd_seq = 67
2021-05-01 23:57:30.498 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SUSPEND_WORKER) to (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD)
2021-05-01 23:57:30.499 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:31.509 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:31.510 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC0 seqno = 0, cmd_seq = 69
2021-05-01 23:57:31.514 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd ERROR EP ADD, dest_ep DSC1 seqno = 1, cmd_seq = 70
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, CSS_SUB_STATUS_WAIT_DCR_LOAD) to (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD)
2021-05-01 23:57:31.524 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:32.554 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Error ep add is over!
2021-05-01 23:57:32.556 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP RECV, dest_ep DSC1 seqno = 1, cmd_seq = 72
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SUB_WAIT_ERROR_EP_ADD) to (PROCESS_RECOVER, WAIT_EP_RECOVER)
2021-05-01 23:57:32.565 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.574 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Recover ep is over!
2021-05-01 23:57:33.580 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_RECOVER) to (PROCESS_RECOVER, EP_CONFIG_VIP)
2021-05-01 23:57:33.581 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:33.584 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd CONFIG VIP, dest_ep DSC1 seqno = 1, cmd_seq = 75
2021-05-01 23:57:33.592 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, EP_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP)
2021-05-01 23:57:33.593 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.641 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_CONFIG_VIP) to (PROCESS_RECOVER, MASTER_CONFIG_VIP)
2021-05-01 23:57:34.648 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.651 [INFO] dmcss P0000000707 T0000000000000000766  [CSS]: CSS set cmd CONFIG VIP, dest_ep CSS1 seqno = 1, cmd_seq = 4
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, MASTER_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP)
2021-05-01 23:57:34.659 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_MASTER_CONFIG_VIP) to (PROCESS_RECOVER, SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.667 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.672 [ERROR] dmcss P0000000707 T0000000000000000766  [CSS]: css_vip_config(enp0s80, 192.168.56.121, 255.255.255.0, UP) failed
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP)
2021-05-01 23:57:34.677 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:34.680 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START, dest_ep DSC0 seqno = 0, cmd_seq = 81
2021-05-01 23:57:34.688 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_SLAVE_CONFIG_VIP) to (PROCESS_RECOVER, WAIT_STARTUP)
2021-05-01 23:57:34.689 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:35.700 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP START2, dest_ep DSC0 seqno = 0, cmd_seq = 83
2021-05-01 23:57:35.709 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_STARTUP) to (PROCESS_RECOVER, AFTER_REDO)
2021-05-01 23:57:35.710 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:37.723 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 85
2021-05-01 23:57:37.731 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, AFTER_REDO) to (PROCESS_RECOVER, WAIT_EP_OPEN)
2021-05-01 23:57:37.732 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:38.741 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:38.746 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd RESUME EP WORKER THREAD, dest_ep DSC1 seqno = 1, cmd_seq = 87
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_EP_OPEN) to (PROCESS_RECOVER, WAIT_RESUME_WORKER)
2021-05-01 23:57:38.756 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:39.765 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Resume ep worker thread is over!
2021-05-01 23:57:39.768 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC1 seqno = 1, cmd_seq = 0
2021-05-01 23:57:39.772 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd EP REAL OPEN, dest_ep DSC0 seqno = 0, cmd_seq = 89
2021-05-01 23:57:39.782 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_RESUME_WORKER) to (PROCESS_RECOVER, WAIT_REAL_OPEN)
2021-05-01 23:57:39.783 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[0], n_ok_ep[2]
2021-05-01 23:57:40.793 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: CSS set cmd NONE, dest_ep DSC0 seqno = 0, cmd_seq = 0
2021-05-01 23:57:40.808 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: status change from (PROCESS_RECOVER, WAIT_REAL_OPEN) to (OPEN, STARTUP)
2021-05-01 23:57:40.809 [INFO] dmcss P0000000707 T0000000000000000766  [DB]: Control Node[1], break ep[255], recover ep[255], n_ok_ep[2]

通过上面的日志可以看到,实例DSC0启动后即被CSS进程检测到,首先为DSC0设置了 guid,并设该节点为故障恢复节点,并启动恢复过程,CSS set cmd分别为 START NOTIFY,SUSPEND EP WORKER THREAD,DCR_LOAD,ERROR EP ADD,通过这一连串的步骤,故障节点加入到集群中来,然后CSS set cmd EP RECV,进程实例恢复,配置VIP,再经过CSS set cmd EP START,CSS set cmd EP OPEN,CSS set cmd EP REAL OPEN,原故障实例转为普通节点,可对外提供服务。
故障处理小结:
DMCSS 控制节点检测到实例故障后,首先向故障实例的Voting disk 区域写入 Kill 命令(所有实例一旦发现 Kill 命令,无条件自杀),避免故障实例仍然处于活动状态,引 发脑裂,然后启动故障处理流程,不同类型实例的故障处理流程存在一些差异。

DMCSS 控制节点故障处理流程

  1. 活动节点重新选举 DMCSS 控制节点
  2. 新的 DMCSS 控制节点通知出现 DMCSS 故障节点对应的 dmasmsvr、dmserver 强制退出

DMASMSVR 实例故障处理流程

  1. 挂起工作线程
  2. 更新 DCR 的节点故障节点信息
  3. 通知故障节点对应 dmserver 强制退出
  4. dmasmsvr 进行故障恢复
  5. 恢复工作线程

dmserver 实例故障处理流程

  1. 更新 DCR 故障节点信息
  2. 重新选取一个控制节点
  3. 通知 dmserver 控制节点启动故障处理流程(参考 DMDSC 故障处理)
  4. 等待 dmserver 故障处理结束
    节点重加入
    如果检测到故障节点恢复,DMCSS 会通知控制节点启动节点重加入流程。

数据库实例重加入

  1. 挂起工作线程
  2. 修改节点的状态
  3. 执行恢复操作
  4. 重新进入 STARTUP 状态,准备启动
  5. OPEN 重加入的节点
  6. 重启工作线程
  7. 执行 OPEN 数据库实例的操作

DMASM 实例重加入

  1. 挂起工作线程
  2. 修改节点的状态
  3. 执行恢复操作
  4. 重新进入 STARTUP 状态,准备启动
  5. OPEN 重加入的节点
  6. 重启工作线程

这篇关于DM8环境DSC集群故障模拟及日志分析的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/945110

相关文章

服务器集群同步时间手记

1.时间服务器配置(必须root用户) (1)检查ntp是否安装 [root@node1 桌面]# rpm -qa|grep ntpntp-4.2.6p5-10.el6.centos.x86_64fontpackages-filesystem-1.41-1.1.el6.noarchntpdate-4.2.6p5-10.el6.centos.x86_64 (2)修改ntp配置文件 [r

HDFS—集群扩容及缩容

白名单:表示在白名单的主机IP地址可以,用来存储数据。 配置白名单步骤如下: 1)在NameNode节点的/opt/module/hadoop-3.1.4/etc/hadoop目录下分别创建whitelist 和blacklist文件 (1)创建白名单 [lytfly@hadoop102 hadoop]$ vim whitelist 在whitelist中添加如下主机名称,假如集群正常工作的节

Hadoop集群数据均衡之磁盘间数据均衡

生产环境,由于硬盘空间不足,往往需要增加一块硬盘。刚加载的硬盘没有数据时,可以执行磁盘数据均衡命令。(Hadoop3.x新特性) plan后面带的节点的名字必须是已经存在的,并且是需要均衡的节点。 如果节点不存在,会报如下错误: 如果节点只有一个硬盘的话,不会创建均衡计划: (1)生成均衡计划 hdfs diskbalancer -plan hadoop102 (2)执行均衡计划 hd

性能分析之MySQL索引实战案例

文章目录 一、前言二、准备三、MySQL索引优化四、MySQL 索引知识回顾五、总结 一、前言 在上一讲性能工具之 JProfiler 简单登录案例分析实战中已经发现SQL没有建立索引问题,本文将一起从代码层去分析为什么没有建立索引? 开源ERP项目地址:https://gitee.com/jishenghua/JSH_ERP 二、准备 打开IDEA找到登录请求资源路径位置

阿里开源语音识别SenseVoiceWindows环境部署

SenseVoice介绍 SenseVoice 专注于高精度多语言语音识别、情感辨识和音频事件检测多语言识别: 采用超过 40 万小时数据训练,支持超过 50 种语言,识别效果上优于 Whisper 模型。富文本识别:具备优秀的情感识别,能够在测试数据上达到和超过目前最佳情感识别模型的效果。支持声音事件检测能力,支持音乐、掌声、笑声、哭声、咳嗽、喷嚏等多种常见人机交互事件进行检测。高效推

搭建Kafka+zookeeper集群调度

前言 硬件环境 172.18.0.5        kafkazk1        Kafka+zookeeper                Kafka Broker集群 172.18.0.6        kafkazk2        Kafka+zookeeper                Kafka Broker集群 172.18.0.7        kafkazk3

安装nodejs环境

本文介绍了如何通过nvm(NodeVersionManager)安装和管理Node.js及npm的不同版本,包括下载安装脚本、检查版本并安装特定版本的方法。 1、安装nvm curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash 2、查看nvm版本 nvm --version 3、安装

【IPV6从入门到起飞】5-1 IPV6+Home Assistant(搭建基本环境)

【IPV6从入门到起飞】5-1 IPV6+Home Assistant #搭建基本环境 1 背景2 docker下载 hass3 创建容器4 浏览器访问 hass5 手机APP远程访问hass6 更多玩法 1 背景 既然电脑可以IPV6入站,手机流量可以访问IPV6网络的服务,为什么不在电脑搭建Home Assistant(hass),来控制你的设备呢?@智能家居 @万物互联

高并发环境中保持幂等性

在高并发环境中保持幂等性是一项重要的挑战。幂等性指的是无论操作执行多少次,其效果都是相同的。确保操作的幂等性可以避免重复执行带来的副作用。以下是一些保持幂等性的常用方法: 唯一标识符: 请求唯一标识:在每次请求中引入唯一标识符(如 UUID 或者生成的唯一 ID),在处理请求时,系统可以检查这个标识符是否已经处理过,如果是,则忽略重复请求。幂等键(Idempotency Key):客户端在每次

SWAP作物生长模型安装教程、数据制备、敏感性分析、气候变化影响、R模型敏感性分析与贝叶斯优化、Fortran源代码分析、气候数据降尺度与变化影响分析

查看原文>>>全流程SWAP农业模型数据制备、敏感性分析及气候变化影响实践技术应用 SWAP模型是由荷兰瓦赫宁根大学开发的先进农作物模型,它综合考虑了土壤-水分-大气以及植被间的相互作用;是一种描述作物生长过程的一种机理性作物生长模型。它不但运用Richard方程,使其能够精确的模拟土壤中水分的运动,而且耦合了WOFOST作物模型使作物的生长描述更为科学。 本文让更多的科研人员和农业工作者