ORA-00494: enqueue [CF] held for too long (more than 900 seconds) cause instance crash

2023-10-17 02:18

本文主要是介绍ORA-00494: enqueue [CF] held for too long (more than 900 seconds) cause instance crash,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!


故障现象:alert 日志
Restarting dead background process MMON
Tue May 16 17:24:51 2017
Process 0x0x6b0f14da8 appears to be hung while dumping
Current time = 465593890, process death time = 465533027 interval = 60000
Attempting to kill process 0x0x6b0f14da8 with OS pid = 61258
OSD kill succeeded for process 0x6b0f14da8

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.1.0 - Production
  Time: 16-5?? -2017 17:24:51
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

Tue May 16 17:24:52 2017
Errors in file /oracle/ora11g/diag/rdbms/lhdw/LHDW/trace/LHDW_m000_54933.trc:
ORA-12751: cpu time or run time policy violation
TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=172.17.0.52)(PORT=65434))
Restarting dead background process MMON
Tue May 16 17:24:51 2017
Errors in file /oracle/ora11g/diag/rdbms/lhdw/LHDW/trace/LHDW_arc0_16428.trc  (incident=258198):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4595'
Tue May 16 17:24:51 2017
Errors in file /oracle/ora11g/diag/rdbms/lhdw/LHDW/trace/LHDW_arc2_50083.trc  (incident=258214):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4595'
Incident details in: /oracle/ora11g/diag/rdbms/lhdw/LHDW/incident/incdir_258198/LHDW_arc0_16428_i258198.trc
Incident details in: /oracle/ora11g/diag/rdbms/lhdw/LHDW/incident/incdir_258214/LHDW_arc2_50083_i258214.trc
Tue May 16 17:24:51 2017
Thread 1 advanced to log sequence 86188 (LGWR switch)
  Current log# 10 seq# 86188 mem# 0: /oracle/ora11g/oradata/LHDW/redo10.log
  Current log# 10 seq# 86188 mem# 1: /oracle/ora11g/flash_recovery_area/LHDW/redo10a.log
Tue May 16 17:04:56 2017
LNS: Standby redo logfile selected for thread 1 sequence 86188 for destination LOG_ARCHIVE_DEST_2
Tue May 16 17:12:23 2017


***********************************************************************
Tue May 16 17:21:50 2017
AUD: Audit Commit Delay exceeded, written a copy to OS Audit Trail


Fatal NI connect error 12170.
Tue May 16 17:24:51 2017
Process 0x0x6b0f14da8 appears to be hung while dumping
Current time = 465593890, process death time = 465533027 interval = 60000
Attempting to kill process 0x0x6b0f14da8 with OS pid = 61258
OSD kill succeeded for process 0x6b0f14da8


Tue May 16 17:24:52 2017
Errors in file /oracle/ora11g/diag/rdbms/lhdw/LHDW/trace/LHDW_m000_54933.trc:
ORA-12751: cpu time or run time policy violation
TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505


Restarting dead background process MMON
Tue May 16 17:24:51 2017
Errors in file /oracle/ora11g/diag/rdbms/lhdw/LHDW/trace/LHDW_arc0_16428.trc  (incident=258198):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4595'
Tue May 16 17:24:51 2017
Errors in file /oracle/ora11g/diag/rdbms/lhdw/LHDW/trace/LHDW_arc2_50083.trc  (incident=258214):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4595'
Incident details in: /oracle/ora11g/diag/rdbms/lhdw/LHDW/incident/incdir_258198/LHDW_arc0_16428_i258198.trc
Incident details in: /oracle/ora11g/diag/rdbms/lhdw/LHDW/incident/incdir_258214/LHDW_arc2_50083_i258214.trc
Tue May 16 17:24:51 2017
Tue May 16 17:24:52 2017
Errors in file /oracle/ora11g/diag/rdbms/lhdw/LHDW/trace/LHDW_arc1_50394.trc  (incident=258205):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4595'
Errors in file /oracle/ora11g/diag/rdbms/lhdw/LHDW/trace/LHDW_arc3_16457.trc  (incident=258222):
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4595'
Incident details in: /oracle/ora11g/diag/rdbms/lhdw/LHDW/incident/incdir_258222/LHDW_arc3_16457_i258222.trc
Incident details in: /oracle/ora11g/diag/rdbms/lhdw/LHDW/incident/incdir_258205/LHDW_arc1_50394_i258205.trc
Killing enqueue blocker (pid=4595) on resource CF-00000000-00000000 by (pid=16428)
Killing enqueue blocker (pid=4595) on resource CF-00000000-00000000 by (pid=50083) by killing session 586.1


Killing enqueue blocker (pid=4595) on resource CF-00000000-00000000 by (pid=16428)
 by killing session 586.1
 by terminating the process
Killing enqueue blocker (pid=4595) on resource CF-00000000-00000000 by (pid=50083)ARC0 (ospid: 16428): terminating the instance due to error 2103 by terminating the process
Tue May 16 17:25:04 2017
Instance terminated by ARC0, pid = 16428


[oracle@hadoop-21 trace]$ ls -trl *4595*
-rw-r----- 1 oracle oinstall   894 May 16 17:00 LHDW_ckpt_4595.trm
-rw-r----- 1 oracle oinstall 62781 May 16 17:00 LHDW_ckpt_4595.trc



##CAUSE
I/O contention or slowness leads to control file enqueue timeout.
One particular situation that can be seen is LGWR timeout while waiting for control file enqueue, and the blocker is CKPT :

From the AWR:
1) high "log file parallel write" and "control file sequential read" waits 
2) Very slow Tablespace I/O, Av Rd(ms) of 1000-4000 ms (when lower than 20 ms is acceptable)
3) very high %iowait : 98.57%.
4) confirmed IO peak during that time  本次故障原因


Please note: Remote archive destination is also a possible cause. Networking issues can also cause this type of issue when a remote archive destination is in use for a standby database. 


###SOLUTION
Since the problem is caused by I/O contention/slow, improving the I/O performance will avoid such issue.

It is an expected behavior that the control file enqueue requestor terminates itself with ORA-239 if the state machine failed to kill the blocker. In most cases, crashing the instance will prevent cluster-wide hang. This is the default setting.


There is a workaround to resolve the issue:
Disable kill blocker code via setting _kill_controlfile_enqueue_blocker=false.


 alter system set "_kill_controlfile_enqueue_blocker"=false;


_kill_controlfile_enqueue_blocker is NOT available in 10.2.0.1, it's available from 10.2.0.2.


[oracle@hadoop-21 trace]$ dmesg|grep oracle
dracut: inactive '/dev/lhdw_oracle/lvm_sdb1' [9.91 TiB] inherit
EXT4-fs (dm-3): Unaligned AIO/DIO on inode 49283886 by oracle; performance will be poor.
INFO: task oracle:4583 blocked for more than 120 seconds.
oracle        D 000000000000001d     0  4583      1 0x00000080
INFO: task oracle:4585 blocked for more than 120 seconds.
oracle        D 0000000000000001     0  4585      1 0x00000080---------------->>>>ckpt 进程挂起 120秒,无法对控制文件进行io操作,超时持有 cf enqueue 
INFO: task oracle:4587 blocked for more than 120 seconds.
oracle        D 000000000000000a     0  4587      1 0x00000080
INFO: task oracle:4589 blocked for more than 120 seconds.
oracle        D 0000000000000010     0  4589      1 0x00000080
INFO: task oracle:4591 blocked for more than 120 seconds.
oracle        D 0000000000000000     0  4591      1 0x00000080
INFO: task oracle:56454 blocked for more than 120 seconds.
oracle        D 0000000000000000     0 56454      1 0x00000080
INFO: task oracle:2323 blocked for more than 120 seconds.
oracle        D 0000000000000000     0  2323      1 0x00000080
INFO: task oracle:4835 blocked for more than 120 seconds.
oracle        D 0000000000000008     0  4835      1 0x00000080
EXT4-fs (dm-3): Unaligned AIO/DIO on inode 49283837 by oracle; performance will be poor.
EXT4-fs (dm-3): Unaligned AIO/DIO on inode 49283406 by oracle; performance will be poor.
EXT4-fs (dm-3): Unaligned AIO/DIO on inode 49283408 by oracle; performance will be poor.
EXT4-fs (dm-3): Unaligned AIO/DIO on inode 49283859 by oracle; performance will be poor.

[oracle@hadoop-21 trace]$ find / -inum 49283837 2 > /dev/null 



##_controlfile_enqueue_timeout
This parameter is used to set the timeout limit for CF enqueue. By default, it is 900 seconds.
This timeout can be reduced using the above parameter so that the process dont have to wait for 15 minutes to get the enqueue.


##_kill_controlfile_enqueue_blocker


_kill_controlfile_enqueue_blocker  FALSE   
  enable killing controlfile enqueue blocker on timeout 


This parameter enables killing controlfile enqueue blocker on timeout.


TRUE.   Default value. Enables this mechanism and kills blocker process in CF enqueue.
FALSE.  Disables this mechanism and no blocker process in CF enqueue will be killed.


This kill blocker interface / ORA-494 was introduced in 10.2.0.4. This new mechanism will kill *any* kind of blocking process, non-background or background.


The difference will be that if the enqueue holder is a non-background process, even if it is killed, the instance can function without it.
In case the holder is a background process, for example the LGWR, the kill of the holder leads to instance crash.



If you want to avoid the kill of the blocker (background or non-background process), you can set _kill_controlfile_enqueue_blocker=false.
This means that no type of blocker will be killed anymore although the resolution to this problem should focus on why the process is holding the enqueue for so long.
Also, you may prefer to only avoid killing background processes, since they are vital to the instance, and you may want to allow the killing of non-background blockers.


In order to prevent a background blocker from being killed, the following init.ora parameter can be set to 1 (default is 2).

_kill_enqueue_blocker=1 


#for 11g:
_kill_enqueue_blocker 2
    if greater than 0 enables killing enqueue blocker


_kill_enqueue_blocker = { 0 | 1 | 2 | 3 }
0. Disables this mechanism and no foreground or background blocker process in enqueue will be killed.
1. Enables this mechanism and only kills foreground blocker process in enqueue while background process is not affected.
2. Enables this mechanism and only kills background blocker process in enqueue.
3. Default value. Enables this mechanism and kills blocker processes in enqueue.


alter system set "_kill_enqueue_blocker"=1 scope=spfile;


请勿在rac 环境修改这个参数:因为:The reason why the blocker interface with ORA-494 is kept is because, in most cases, customers would prefer crashing the instance than having a cluster-wide hang.


With this parameter, if the enqueue holder is a background process, then it will not be killed, therefore the instance will not crash. If the enqueue holder is not a background process, the new 10.2.0.4 mechanism will still try to kill it.


可以杀掉任何oracle的后台进程


总结:

os 调整 ,调整 sysctl.conf 文件

 sysctl -w vm.dirty_ratio=10
 sysctl -w vm.dirty_background_ratio=5


db 调整

 alter system set "_kill_controlfile_enqueue_blocker"=false;

alter system set "_kill_enqueue_blocker"=1 scope=spfile;



这篇关于ORA-00494: enqueue [CF] held for too long (more than 900 seconds) cause instance crash的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/222247

相关文章

cf 164 C 费用流

给你n个任务,k个机器,n个任务的起始时间,持续时间,完成任务的获利 每个机器可以完成任何一项任务,但是同一时刻只能完成一项任务,一旦某台机器在完成某项任务时,直到任务结束,这台机器都不能去做其他任务 最后问你当获利最大时,应该安排那些机器工作,即输出方案 具体建图方法: 新建源汇S T‘ 对任务按照起始时间s按升序排序 拆点: u 向 u'连一条边 容量为 1 费用为 -c,

CF 508C

点击打开链接 import java.util.Arrays;import java.util.Scanner;public class Main {public static void main(String [] args){new Solve().run() ;} }class Solve{int bit[] = new int[608] ;int l

ora-01017 ora-02063 database link,oracle11.2g通过dblink连接oracle11.2g

错误图示: 问题解决 All database links, whether public or private, need username/password of the remote/target database. Public db links are accessible by all accounts on the local database, while private

OpenStack实例操作选项解释:启动和停止instance实例

关于启动和停止OpenStack实例 如果你想要启动和停止OpenStack实例时,有四种方法可以考虑。 管理员可以暂停、挂起、搁置、停止OpenStack 的计算实例。但是这些方法之间有什么不同之处? 目录 关于启动和停止OpenStack实例1.暂停和取消暂停实例2.挂起和恢复实例3.搁置(废弃)实例和取消废弃实例4.停止(删除)实例 1.暂停和取消暂停实例

ORA-25150:不允许对区参数执行ALTERING

在用PL/SQL工具修改表存储报错: 百度一下找到原因: 表空间使用本地管理,其中的表不能修改NEXT MAXEXTENTS和PCTINCREASE参数 使用数据自动管理的表空间,其中的表可以修改NEXT MAXEXTENTS和PCTINCREASE参数

ORA-01861:文字与格式字符串不匹配

select t.*, t.rowid from log_jk_dtl t; insert into log_jk_dtl (rq,zy,kssj,jssj,memo)  values (to_date(sysdate,'yyyy-mm-dd'),'插入供应商', to_char(sysdate,'hh24:mi:ss'),to_char(sysdate,'hh24:mi:ss'),'备注'

利用PL/SQL工具连接Oracle数据库的时候,报错:ORA-12638: 身份证明检索失败的解决办法

找到相对应的安装目录:比如:E:\oracle\product\10.2.0\client_1\NETWORK\ADMIN 在里面找到:SQLNET.AUTHENTICATION_SERVICES= (NTS) 将其更改为:SQLNET.AUTHENTICATION_SERVICES= (BEQ,NONE) 或者注释掉:#SQLNET.AUTHENTICATION_SERVICES= (N

【CF】C. Glass Carving(二分 + 树状数组 + 优先队列 + 数组计数)

这题简直蛋疼死。。。。。 A了一下午 #include<cstdio>#include<queue>#include<cstring>#include<algorithm>using namespace std;typedef long long LL;const int maxn = 200005;int h,w,n;int C1[maxn],C2[maxn];int

【CF】E. Anya and Cubes(双向DFS)

根据题意的话每次递归分3种情况 一共最多25个数,时间复杂度为3^25,太大了 我们可以分2次求解第一次求一半的结果,也就是25/2 = 12,记录结果 之后利用剩余的一半求结果 s-结果 = 之前记录过的结果 就可以 时间复杂度降低为 3 ^ (n/2+1) 题目链接:http://codeforces.com/contest/525/problem/E #include<set

【CF】D. Arthur and Walls(BFS + 贪心)

D题 解题思路就是每次检查2X2的方格里是否只有一个‘*’,如果有的话这个*就需要变成‘.’,利用BFS进行遍历,入队的要求是这个点为. 一开始将所有的'.'全部加入队列,如果碰到一个'*'变成'.'就入队,判断的时候从4个方向就行判断 题目链接:http://codeforces.com/contest/525/problem/D #include<cstdio>#include<