Notes on an abnormal RAC hang on Solaris
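Symptoms
One node of the RAC cluster hung, and the other node became unavailable as well; restarting the hung node restored service. The fault occurred several times, about once a month on average.
Root cause
Check cssd.log: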
2021-05-22 13:53:50.565: [GIPCXCPT][5] gipclibMalloc: failed to allocate 10376 bytes, cowork ffffffff7cae18e8, ret gipcretOutOfMemory (28)
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: failed to read osd id for endp 104f9c390 [00000000095fea12] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_hnyx-db1_)(GIPCID=00000000-00000000-1516))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_hnyx-db1_)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 1, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 100b84550, ready 1, wobj 104f35490, sendp 104e50050flags 0x8060371e, usrFlags 0x14000 }
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: slos op : sgipcnDSAttrEndpUserData
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: slos dep : Operation not supported (48)
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: slos loc : getpeerucred
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: slos info: sid 0, failed to get creds
2021-05-22 13:53:50.585: [ CSSD][5]###################################
2021-05-22 13:53:50.585: [ CSSD][5]clssscExit: CSSD signal 11 in thread GMClientListener
2021-05-22 13:53:50.585: [ CSSD][5]###################################
2021-05-22 13:53:50.585: [ CSSD][5](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2021-05-22 13:53:50.586: [ CSSD][5]----- Call Stack Trace -----
2021-05-22 13:53:50.586: [ CSSD][5]calling call entry argument values in hex
2021-05-22 13:53:50.586: [ CSSD][5]location type point (? means dubious value)
2021-05-22 13:53:50.586: [ CSSD][5]-------------------- -------- -------------------- ----------------------------
2021-05-22 13:53:50.635: [ CSSD][5]mmap(offset=3137536, len=8192) failed with errno=11 for the file /export/home/grid/bin/ocssd.bin
2021-05-22 13:53:50.636: [ CSSD][5]mmap(offset=3137536, len=8192) failed with errno=11 for the file /export/home/grid/bin/ocssd.bin
2021-05-22 13:53:50.636: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.636: [ CSSD][5]mmap(offset=50946048, len=16384) failed with errno=11 for the file /export/home/grid/lib/libclntsh.so.11.1
2021-05-22 13:53:50.636: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.639: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.639: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.639: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.639: [ CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
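Note this line: 2021-05-22 13:53:50.565: [GIPCXCPT][5] gipclibMalloc: failed to allocate 10376 bytes, cowork ffffffff7cae18e8, ret gipcretOutOfMemory (28)
Comparing this against the symptoms, the closest match found on MOS is Document 2113841.1: insufficient stack memory for gipcd.
However, Document 2113841.1 covers an AIX environment, while this environment is Solaris. With no better lead, we decided to try its fix anyway as a last resort.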
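The triage above (spotting the allocation failure, the fatal signal, and the burst of mmap errors) can be sketched as a small log scan. This is only an illustrative sketch: the function name `summarize_cssd_log` and the category names are hypothetical, and the patterns are taken solely from the log excerpt shown above, so they may not cover other Grid Infrastructure versions.

```python
import re
from collections import Counter

# Patterns copied from the log excerpt above; category names are illustrative.
PATTERNS = {
    "out_of_memory": re.compile(r"gipcretOutOfMemory"),
    "fatal_signal": re.compile(r"clssscExit: CSSD signal (\d+)"),
    "mmap_failure": re.compile(r"mmap\(.*\) failed with errno=(\d+)"),
}


def summarize_cssd_log(lines):
    """Count each marker and remember the first line on which it appears."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                # setdefault keeps only the earliest matching line per category
                first_seen.setdefault(name, line.rstrip("\n"))
    return counts, first_seen
```

To use it, pass any iterable of lines, for example an open file handle on a copy of the log: `counts, first_seen = summarize_cssd_log(open(path, errors="replace"))`, where `path` is wherever the log was saved. An `out_of_memory` hit that precedes the `fatal_signal` line, as in the excerpt, is the ordering that pointed toward a resource limit here.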
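Solution
According to Document 2113841.1, the fix for this fault is to remove the relevant limits restrictions, for both the grid and root users.
On inspection, the stack limit for root turned out to be rather small (8192) instead of unlimited, so we recommended changing it.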
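The stack limit check can be sketched with Python's standard `resource` module, run as the user in question (grid or root). This is a minimal sketch, not the procedure from the MOS note: the helper names are hypothetical, and it assumes the 8192 figure above is in kilobytes, as a shell's `ulimit -s` would normally report it, whereas `getrlimit` returns bytes.

```python
import resource


def describe_limit(value):
    """Render an rlimit value in KB; RLIM_INFINITY means no limit is enforced."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KB"  # getrlimit reports bytes


def current_stack_limit():
    """Return (soft, hard) stack limits of the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return describe_limit(soft), describe_limit(hard)


def needs_attention(soft_description):
    """Flag any soft stack limit that is not unlimited, per the fix described above."""
    return soft_description != "unlimited"
```

Calling `current_stack_limit()` under each account shows whether the soft limit is still a finite value; a result such as `('8192 KB', 'unlimited')` would correspond to the situation described for root. How the limit is then raised persistently depends on the platform and is not covered by this sketch.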
After the change, the fault was resolved and has not recurred.
Learn the principles, build up your tools. Cultivate ideas, and write with method.