【故障处理】DG环境主库丢失归档情况下数据文件的恢复
1 BLOG文档结构图
2 前言部分
2.1 导读和注意事项
各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~:
① BBED的编译
② BBED修改文件头让其跳过归档从而可以ONLINE(重点)
③ OS命名格式转换为ASM的命名格式
④ DG环境中备库丢失数据文件的情况下的处理过程(重点)
⑤ 数据文件OFFLINE后应立即做一次RECOVER操作
⑥ BBED环境中kscnwrp的使用
⑦ 查询表空间的大小,表空间大小为空,数据文件大小为空的情况
Tips:
① 本文在itpub(http://blog.itpub.net/26736162)、博客园(http://www.cnblogs.com/lhrbest)和微信公众号(xiaomaimiaolhr)有同步更新。
② 文章中用到的所有代码,相关软件,相关资料请前往小麦苗的云盘下载(http://blog.itpub.net/26736162/viewspace-1624453/)。
③ 若网页文章代码格式有错乱,推荐使用360浏览器,也可以下载pdf格式的文档来查看,pdf文档下载地址:http://blog.itpub.net/26736162/viewspace-1624453/,另外itpub格式显示有问题,也可以去博客园地址阅读。
④ 本篇BLOG中命令的输出部分需要特别关注的地方我都用灰色背景和粉红色字体来表示,比如下边的例子中,thread 1的最大归档日志号为33,thread 2的最大归档日志号为43是需要特别关注的地方;而命令一般使用黄色背景和红色字体标注;对代码或代码输出部分的注释一般采用蓝色字体表示。
List of Archived Logs in backup set 11 Thrd Seq Low SCN Low Time Next SCN Next Time ---- ------- ---------- ------------------- ---------- --------- 1 32 1621589 2015-05-29 11:09:52 1625242 2015-05-29 11:15:48 1 33 1625242 2015-05-29 11:15:48 1625293 2015-05-29 11:15:58 2 42 1613951 2015-05-29 10:41:18 1625245 2015-05-29 11:15:49 2 43 1625245 2015-05-29 11:15:49 1625253 2015-05-29 11:15:53 [ZHLHRDB1:root]:/>lsvg -o T_XLHRD_APP1_vg rootvg [ZHLHRDB1:root]:/> 00:27:22 SQL> alter tablespace idxtbs read write; ====》2097152*512/1024/1024/1024=1G |
本文如有错误或不完善的地方请大家多多指正,ITPUB留言或QQ皆可,您的批评指正是我写作的最大动力。
2.2 相关参考文章连接
BBED |
|
【推荐】 【BBED】 SYSTEM文件头损坏的恢复(4) | http://blog.itpub.net/26736162/viewspace-2084329/ |
【推荐】 【BBED】 sys.bootstrap$ 对象的恢复 | http://blog.itpub.net/26736162/viewspace-2083621/ |
【推荐】 【BBED】丢失归档文件情况下的恢复 | http://blog.itpub.net/26736162/viewspace-2079337/ |
【推荐】 【BBED】编译及基本命令(1) | http://blog.itpub.net/26736162/viewspace-2075216/ |
【BBED】bbed常用命令 | http://blog.itpub.net/26736162/viewspace-2123465/ |
3 故障分析及解决过程
3.1 故障环境介绍
项目 | 源库 | DG库 |
db 类型 | RAC | RAC |
db version | 11.2.0.3.7 | 11.2.0.3.7 |
db 存储 | ASM | ASM |
OS版本及kernel版本 | AIX 64位 7.1.0.0 | AIX 64位 7.1.0.0 |
关系 | 主备库为RAC+RAC的物理DG环境 |
3.2 故障发生现象及报错信息
今天查询一套DG环境的表空间大小的时候,发现一个表空间的返回值为空,很奇怪,起初我以为是自己的脚本问题,可是这个脚本是自己写的,而且用了很长时间的了,还花了几分钟的时间又仔细审核了一下脚本,没发现有什么不对的地方。
查询表空间大小的脚本:
set pagesize 9999 line 9999 col TS_Name format a30 WITH WT1 AS (SELECT TS.TABLESPACE_NAME, DF.ALL_BYTES, DECODE(DF.TYPE, 'D', NVL(FS.FREESIZ, 0), 'T', DF.ALL_BYTES - NVL(FS.FREESIZ, 0)) FREESIZ, DF.MAXSIZ, TS.BLOCK_SIZE, TS.LOGGING, TS.FORCE_LOGGING, TS.CONTENTS, TS.EXTENT_MANAGEMENT, TS.SEGMENT_SPACE_MANAGEMENT, TS.RETENTION, TS.DEF_TAB_COMPRESSION, DF.TS_DF_COUNT, TS.BIGFILE, TS.STATUS FROM DBA_TABLESPACES TS, (SELECT 'D' TYPE, TABLESPACE_NAME, COUNT(*) TS_DF_COUNT, SUM(BYTES) ALL_BYTES, SUM(DECODE(MAXBYTES, 0, BYTES, MAXBYTES)) MAXSIZ FROM DBA_DATA_FILES D GROUP BY TABLESPACE_NAME UNION ALL SELECT 'T', TABLESPACE_NAME, COUNT(*) TS_DF_COUNT, SUM(BYTES) ALL_BYTES, SUM(DECODE(MAXBYTES, 0, BYTES, MAXBYTES)) FROM DBA_TEMP_FILES D GROUP BY TABLESPACE_NAME) DF, (SELECT TABLESPACE_NAME, SUM(BYTES) FREESIZ FROM DBA_FREE_SPACE GROUP BY TABLESPACE_NAME UNION ALL SELECT TABLESPACE_NAME, SUM(D.BLOCK_SIZE * A.BLOCKS) BYTES FROM GV$SORT_USAGE A, DBA_TABLESPACES D WHERE A.TABLESPACE = D.TABLESPACE_NAME GROUP BY TABLESPACE_NAME) FS WHERE TS.TABLESPACE_NAME = DF.TABLESPACE_NAME AND TS.TABLESPACE_NAME = FS.TABLESPACE_NAME(+)) SELECT (SELECT A.TS# FROM V$TABLESPACE A WHERE A.NAME = UPPER(T.TABLESPACE_NAME)) TS#, T.TABLESPACE_NAME TS_NAME, ROUND(T.ALL_BYTES / 1024 / 1024) TS_SIZE_M, ROUND(T.FREESIZ / 1024 / 1024) FREE_SIZE_M, ROUND((T.ALL_BYTES - T.FREESIZ) / 1024 / 1024) USED_SIZE_M, ROUND((T.ALL_BYTES - T.FREESIZ) * 100 / T.ALL_BYTES, 3) USED_PER, ROUND(MAXSIZ / 1024 / 1024 / 1024, 3) MAX_SIZE_G, ROUND(DECODE(MAXSIZ, 0, TO_NUMBER(NULL), (T.ALL_BYTES - FREESIZ)) * 100 / MAXSIZ, 3) USED_PER_MAX, ROUND(T.BLOCK_SIZE) BLOCK_SIZE, T.LOGGING, T.BIGFILE, T.STATUS, T.TS_DF_COUNT FROM WT1 T UNION ALL SELECT TO_NUMBER('') TS#, 'ALL TS:' TS_NAME, ROUND(SUM(T.ALL_BYTES) / 1024 / 1024, 3) TS_SIZE_M, ROUND(SUM(T.FREESIZ) / 1024 / 1024) FREE_SIZE_M, ROUND(SUM(T.ALL_BYTES - T.FREESIZ) / 1024 / 1024) USED_SIZE_M, ROUND(SUM(T.ALL_BYTES - T.FREESIZ) * 100 / SUM(T.ALL_BYTES), 3) USED_PER, ROUND(SUM(MAXSIZ) / 1024 / 1024 / 1024) MAX_SIZE, TO_NUMBER('') "USED,% of MAX Size", TO_NUMBER('') BLOCK_SIZE, '' LOGGING, MAX(T.BIGFILE), MAX(T.STATUS), TO_NUMBER('') TS_DF_COUNT FROM WT1 T ORDER BY TS#; |
结果如下图:
因为表空间是ONLINE的,若是OFFLINE的话,结果自然为空,由于只有一个数据文件,那就看看数据文件的状态:
SELECT * FROM v$datafile d WHERE d.FILE#=64;
果然数据文件是64,数据文件为OFFLINE状态,而且去备库查看的时候数据文件也是OFFLINE的。这里有一个LAST_TIME需要注意,日志为2015年4月21号,而现在都2016年9月21号了,看来是很久很久很久没有用这个数据文件了。好吧,很久没有写BLOG了,今天就以这个案例为主,说说其修复过程把。
3.2.1 健康检查报告
一、 运行
用自己的健康检查报告看一下能否发现这个问题呢?
跑完之后,生成的报告在当前目录,报告的目录大概如下所示:
巡检服务概要 | |||||
数据库总体概况 | 数据库基本信息 | 数据库大小 | 资源使用情况 | 组件和特性 | 库 |
参数文件 | 所有的初始化参数 | 关键的初始化参数 | 隐含参数 | spfile文件内容 | Statistics Level |
表空间情况 | 表空间状况信息 | 闪回空间使用情况 | 临时表空间使用情况 | Undo表空间使用情况 | 表空间扩展状况 |
数据文件状况 | 控制文件 | ||||
ASM磁盘监控 | ASM磁盘使用情况 | ASM磁盘组使用情况 | ASM磁盘组参数配置情况 | ASM实例 | |
JOB情况 | 作业运行状况 | 数据库job报错信息 |
巡检服务明细 | |||||
RMAN信息 | RMAN备份状况 | RMAN配置情况 | RMAN所有备份 | RMAN所有备份详情 | 控制文件备份 |
spfile文件备份 | RMAN归档文件备份 | 数据库闪回 | |||
归档信息 | 归档日志设置 | 归档日志生成情况 | 归档日志占用率 | 近7天日志切换频率分析 | 每天日志切换的量 |
日志组大小 | |||||
SGA信息 | SGA使用情况 | SGA配置信息 | SGA建议配置 | SGA动态组件 | PGA TARGET 建议配置 |
文件IO信息 | 文件IO分析 | 文件IO时间分析 | 全表扫描情况 | 排序情况 | |
SQL监控 | 逻辑读TOP10的SQL | 物理读TOP10的SQL | 执行时间TOP10的SQL | 执行次数TOP10的SQL | 解析次数TOP10的SQL |
版本TOP10的SQL语句 | 内存TOP10的SQL语句 | DISK_SORT严重的SQL | 垃圾SQL之RUNNING_11G | 垃圾SQL之RUNNING_10G | |
LAST快照中SQL情况 | LAST快照中执行时间最长SQL | 执行时间最长SQL | 执行时间最长的SQL报告 | ||
闪回归档 | 闪回归档配置 | 开启了闪回归档的表 | 闪回归档空间 | ||
DG库 | DG库配置情况 | DG库运行情况 | 主库DG进程 | 主库standby日志 | 备库日志应用情况 |
数据库安全 | |||||
数据库用户 | 数据库用户一览 | 拥有DBA角色的用户 | 拥有SYS角色的用户 | 角色概况 | 密码为系统默认值的用户 |
整个用户有多大 | 近一周登录错误的用户 | ||||
系统表空间用户 | SYSTEM为缺省表空间的用户 | SYSTEM为临时表空间的用户 | 系统表空间上的对象 | ||
数据库审计 | 审计参数配置 | 审计表情况 | DB中所有审计记录 |
数据库对象 | |||||
段情况 | 对象汇总 | 段的汇总 | 体积最大的10个段 | 扩展最多的10个段 | LOB段 |
不能扩展的对象 | 扩展超过1/2最大扩展度的对象 | Undo 段 | 表空间所有者 | ||
表情况 | 行链接或行迁移的表 | 超过10W行无主键的表 | 无数据有高水位的表 | ||
分区表情况 | 表大小超过10GB未建分区 | 分区最多的前10个对象 | 分区个数超过100个的表 | ||
无效对象 | 无效的对象 | 无效的普通索引 | 无效的分区索引 | 无效的触发器 | |
索引情况 | 索引个数超过5个的表 | 大表未建索引 | 组合索引与单列索引存在交叉 | 位图索引和函数索引 | 外键未建索引 |
大索引从未使用 | 索引列个数大于3 | 索引高度大于3 | 索引的统计信息过旧 | ||
并行度 | 表带有并行度 | 索引带有并行度 | |||
其他对象 | 告警日志 | 数据库目录 | 回收站情况 | 数据库链路(db_link) | 外部表 |
所有的触发器 | 序列cache小于20 | 物化视图 | type | 数据泵 |
数据库性能分析 | |||||
AWR | AWR统计 | AWR参数配置状况 | 数据库服务器主机的情况 | AWR视图中的load profile | 热块 |
最新的一次AWR报告 | |||||
ASH | ASH快照状况 | 最新的一次ASH报告 | |||
ADDM | 最新的一次ADDM | ||||
统计信息 | 统计信息是否自动收集 | 需收集统计信息的表 | 被收集统计信息的临时表 | ||
会话 | 会话概况 | 会话状态一览(当前) | 历史ACTIVE会话数 | 登录时间最长的10个会话 | 超过10小时无响应的会话 |
提交次数最多的会话 | CPU或等待最长的会话 | ||||
锁 | 查看LOCK锁情况 | 查看谁锁住了谁 | 游标使用情况 | 并行进程完成情况 | |
内存占用 | 查询共享内存占有率 | PGA占用最多的进程 | 命中率 | ||
其它 | 等待事件 | OLAP | Networking | Replication |
健康检查结果 | |||||
健康检查结果 | 健康检查结果 | 健康检查过程中脚本产生的错误 |
二、 概况
先看看数据库的概况:
三、 1级告警:数据文件OFFLINE
再看看,健康检查的结果:
有2个地方很重要,1个数据文件有OFFLINE的,第二个是序列的CACHE值小于20,并且已经有enq: SQ - contention等待事件的发生了,说明比较严重,应该修改其cache值。我们点击到相应的位置可以查看细节。
可以看到是64号文件是OFFLINE状态的。
四、 2级告警:序列问题
另外,我们看看报告中提到的序列等待问题,可以看到有6个序列的cache值设置有问题,已经导致了会话阻塞了,这部分的cache值强烈建议修改,修改语句在报告中也已经给出。
五、 2级告警:告警日志问题
告警日志问题不是很大,可以忽略。
六、 4级告警:无效对象
无效对象也可以修改一下,报告中提供了具体的脚本。
好了,报告不多看了,今天的主题是如何修复那个OFFLINE的数据问题,报告的脚本内容可以私聊我。
3.3 故障分析及解决过程
因为是DG环境,所以首先我们来恢复主库,然后再修复备库的文件问题。
SYS@oraLHRD1> select status from v$datafile d WHERE d.FILE#=64;
STATUS ------- OFFLINE
SYS@oraLHRD1> select file#,online_status,change#,ERROR from v$recover_file;
FILE# ONLINE_ CHANGE# ERROR ---------- ------- ---------- ----------------------------------------------------------------- 64 OFFLINE 1764555149
SYS@oraLHRD1> alter database datafile 64 online; alter database datafile 64 online * ERROR at line 1: ORA-01113: file 64 needs media recovery ORA-01110: data file 64: '+DATA1/oralhrs/datafile/tbs101.262.876578481'
SYS@oraLHRD1> recover datafile 64; ORA-00279: change 1764555149 generated at 03/27/2015 10:42:00 needed for thread 2 ORA-00289: suggestion : /arch/2_1128_868895513.arc ORA-00280: change 1764555149 for thread 2 is in sequence #1128
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} AUTO ORA-00308: cannot open archived log '/arch/2_1128_868895513.arc' ORA-27037: unable to obtain file status IBM AIX RISC System/6000 Error: 2: No such file or directory Additional information: 3
ORA-00308: cannot open archived log '/arch/2_1128_868895513.arc' ORA-27037: unable to obtain file status IBM AIX RISC System/6000 Error: 2: No such file or directory Additional information: 3
SYS@oraLHRD1> ! ls /arch/2_1128_868895513.arc ls: 0653-341 The file /arch/2_1128_868895513.arc does not exist.
|
可以看到要恢复64号文件需要的是1128号归档日志,从之前的查询我们也知道日志最后一次访问是2015年4月21,而现在系统的归档号为1W多了:
SELECT * FROM v$log d WHERE d.STATUS='CURRENT' ORDER BY thread#;
SELECT a.FILE#, a.NAME,a.CHECKPOINT_CHANGE#,a.LAST_CHANGE#,status FROM v$datafile a;
SELECT a.FILE#,a.NAME,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a;
那目前是数据文件OFFLINE,而归档文件又丢失了,如果想把该文件ONLINE,我们必须采用BBED来推进数据文件的SCN号到最近的日志号才可以。有关该部分的理论知识可以参考: 【BBED】丢失归档文件情况下的数据文件的恢复:http://blog.itpub.net/26736162/viewspace-2079337/
这里我们依然采用BBED来修复该问题。
注意:由于我们的环境是DG环境,所以先把备库的监听器停掉,以免恢复的过程中,主库生成的日志传递到备库,而主库日志被删除后,修复该文件就又得往前推进了,所以先把备库的监听停掉,确保主库的日志不被删除。
[ZFLHRSDB4:root]:/>crsctl stat res -t -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.LISTENER.lsnr ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 ora.LISTENER_DG.lsnr====>>>>> 这个是DG的监听器 ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 ora.asm ONLINE ONLINE zflhrsdb3 Started ONLINE ONLINE zflhrsdb4 Started ora.gsd OFFLINE OFFLINE zflhrsdb3 OFFLINE OFFLINE zflhrsdb4 ora.net1.network ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 ora.ons ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 ora.registry.acfs ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE zflhrsdb4 ora.cvu 1 ONLINE ONLINE zflhrsdb4 ora.oc4j 1 ONLINE ONLINE zflhrsdb4 ora.oralhrsg.db 1 ONLINE ONLINE zflhrsdb3 Open,Readonly 2 ONLINE ONLINE zflhrsdb4 Open,Readonly ora.scan1.vip 1 ONLINE ONLINE zflhrsdb4 ora.zflhrsdb3.vip 1 ONLINE ONLINE zflhrsdb3 ora.zflhrsdb4.vip 1 ONLINE ONLINE zflhrsdb4 [ZFLHRSDB4:root]:/> [ZFLHRSDB4:root]:/> [ZFLHRSDB4:root]:/> [ZFLHRSDB4:root]:/> [ZFLHRSDB4:root]:/>crsctl stop res ora.LISTENER_DG.lsnr CRS-2673: Attempting to stop 'ora.LISTENER_DG.lsnr' on 'zflhrsdb4' CRS-2673: Attempting to stop 'ora.LISTENER_DG.lsnr' on 'zflhrsdb3' CRS-2677: Stop of 'ora.LISTENER_DG.lsnr' on 'zflhrsdb4' succeeded CRS-2677: Stop of 'ora.LISTENER_DG.lsnr' on 'zflhrsdb3' succeeded [ZFLHRSDB4:root]:/> |
接下来就可以做恢复操作了。
3.3.1 修复主库的OFFLINE文件
首先,64号文件当前的SCN号1764555149,我们需要将其修改为15760391176,而日志号也需要转换为11087号,这些都需要转换为十六进制,如下:
SYS@oraLHRD2> SELECT TO_CHAR(1764555149, 'xxxxxxxxxxxx') CUR_SCN, 2 TO_CHAR(15760391176, 'xxxxxxxxxxxx') TARGET_SCN, 3 TO_CHAR(11087, 'xxxxxxxxxxxx') TARGET_SEQ 4 FROM DUAL;
CUR_SCN TARGET_SCN TARGET_SEQ ------------- ------------- ------------- 692cf98d 3ab647c08 3275 |
692cf98d和后边BBED查询出来的数据文件头的结果一致。
一、 BBED修改文件头推进SCN号
1、 编译BBED
首先准备BBED的环境,编译BBED,将以下4个文件拷贝到Oracle的相关的目录:
注意:文件我已上传到云盘,可以去http://blog.itpub.net/26736162/viewspace-1624453/下载。
接下来我们编译BBED:
[ZFLHRSDB1:oracle]:/oracle>ls -l $ORACLE_HOME/rdbms/lib/*sbbd* -rw-r--r-- 1 root system 1671 May 26 2010 /oracle/app/oracle/product/11.2.0/db/rdbms/lib/sbbdpt.o -rw-r--r-- 1 root system 900 May 26 2010 /oracle/app/oracle/product/11.2.0/db/rdbms/lib/ssbbded.o [ZFLHRSDB1:oracle]:/oracle>ls -l $ORACLE_HOME/rdbms/mesg/bbed* -rw-r--r-- 1 root system 8704 May 27 2010 /oracle/app/oracle/product/11.2.0/db/rdbms/mesg/bbedus.msb -rw-r--r-- 1 root system 10270 Jul 25 2000 /oracle/app/oracle/product/11.2.0/db/rdbms/mesg/bbedus.msg [ZFLHRSDB1:oracle]:/oracle>exit You have mail in /usr/spool/mail/root [ZFLHRSDB1:root]:/>chown oracle:dba /oracle/app/oracle/product/11.2.0/db/rdbms/lib/sbbdpt.o [ZFLHRSDB1:root]:/>chown oracle:dba /oracle/app/oracle/product/11.2.0/db/rdbms/lib/ssbbded.o [ZFLHRSDB1:root]:/>chown oracle:dba /oracle/app/oracle/product/11.2.0/db/rdbms/mesg/bbedus.msb [ZFLHRSDB1:root]:/>chown oracle:dba /oracle/app/oracle/product/11.2.0/db/rdbms/mesg/bbedus.msg [ZFLHRSDB1:root]:/>su - oracle [ZFLHRSDB1:oracle]:/oracle>make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk BBED=$ORACLE_HOME/bin/bbed $ORACLE_HOME/bin/bbed
Linking BBED utility (bbed) rm -f /oracle/app/oracle/product/11.2.0/db/bin/bbed ld -b64 -o /oracle/app/oracle/product/11.2.0/db/bin/bbed -L/oracle/app/oracle/product/11.2.0/db/rdbms/lib/ -L/oracle/app/oracle/product/11.2.0/db/lib/ /oracle/app/oracle/product/11.2.0/db/lib/s0main.o /oracle/app/oracle/product/11.2.0/db/rdbms/lib/ssbbded.o /oracle/app/oracle/product/11.2.0/db/rdbms/lib/sbbdpt.o -bI:/usr/lib/aio.exp `cat /oracle/app/oracle/product/11.2.0/db/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -ldbtools11 `cat /oracle/app/oracle/product/11.2.0/db/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /oracle/app/oracle/product/11.2.0/db/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnnz11 -lzt11 -lztkg11 -lztkg11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lmm -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 `cat /oracle/app/oracle/product/11.2.0/db/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /oracle/app/oracle/product/11.2.0/db/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lpls11 -lrt -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lld -lm `cat /oracle/app/oracle/product/11.2.0/db/lib/sysliblist` -lm -lsql11 /oracle/app/oracle/product/11.2.0/db/lib/nautab.o /oracle/app/oracle/product/11.2.0/db/lib/naeet.o /oracle/app/oracle/product/11.2.0/db/lib/naect.o /oracle/app/oracle/product/11.2.0/db/lib/naedhs.o ld: 0711-224 WARNING: Duplicate symbol: .aio_nwait_timeout64 ld: 0711-224 WARNING: Duplicate symbol: aio_nwait_timeout64 ld: 0711-224 WARNING: Duplicate symbol: .aio_nwait64 ld: 0711-224 WARNING: Duplicate symbol: aio_nwait64 ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information. ld: 0711-773 WARNING: Object /oracle/app/oracle/product/11.2.0/db/lib//libgeneric11.a[sdbgrfu.o], imported symbol timezone Symbol was expected to be local. Extra instructions are being generated to reference the symbol. |
2、 修复文件头的scn号
编译完成后可以使用BBED了:
SYS@oraLHRD1> set line 9999 pagesize 9999 SYS@oraLHRD1> col name format a80 SYS@oraLHRD1> select file#||' '||name||' '||bytes name from v$datafile where file#=64;
NAME -------------------------------------------------------------------------------- 64 +DATA1/oralhrs/datafile/tbs101.262.876578481 104857600 [ZFLHRSDB1:root]:/>su - grid [ZFLHRSDB1:grid]:/home/grid>asmcmd ASMCMD> cp +DATA1/oralhrs/datafile/tbs101.262.876578481 /tmp/a.dbf copying +DATA1/oralhrs/datafile/tbs101.262.876578481 -> /tmp/a.dbf ASMCMD> exit [ZFLHRSDB1:grid]:/home/grid>exit [ZFLHRSDB1:root]:/>su - oracle [ZFLHRSDB1:oracle]:/oracle>vi /tmp/file.txt [ZFLHRSDB1:oracle]:/oracle>cat /tmp/file.txt 1 /tmp/a.dbf [ZFLHRSDB1:oracle]:/oracle>bbed PASSWORD=blockedit mode=edit blocksize=8192 listfile=/tmp/file.txt BBED-00303: unable to open file '/tmp/a.dbf'
[ZFLHRSDB1:oracle]:/oracle>l /tmp/a.dbf -rw-r----- 1 grid dba 104865792 Sep 20 17:07 /tmp/a.dbf [ZFLHRSDB1:oracle]:/oracle>exit [ZFLHRSDB1:root]:/>chown oracle.dba /tmp/a.dbf [ZFLHRSDB1:root]:/>su - oracle [ZFLHRSDB1:oracle]:/oracle>bbed PASSWORD=blockedit mode=edit blocksize=8192 listfile=/tmp/file.txt BBED-00303: unable to open file 'log.bbd'
[ZFLHRSDB1:oracle]:/oracle>cd /tmp [ZFLHRSDB1:oracle]:/tmp>bbed PASSWORD=blockedit mode=edit blocksize=8192 listfile=/tmp/file.txt
BBED: Release 2.0.0.0.0 - Limited Production on Tue Sep 20 17:11:28 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> info File# Name Size(blks) ----- ---- ---------- 1 /tmp/a.dbf 0
BBED> show FILE# 1 BLOCK# 1 OFFSET 0 DBA 0x00400001 (4194305 1,1) FILENAME /tmp/a.dbf BIFILE bifile.bbd LISTFILE /tmp/file.txt BLOCKSIZE 8192 MODE Edit EDIT Unrecoverable IBASE Dec OBASE Dec WIDTH 80 COUNT 512 LOGFILE log.bbd SPOOL No
BBED> p kcvfhckp struct kcvfhckp, 160 bytes @484 struct kcvcpscn, 8 bytes @484 ====>>>>>kscnbas,这里是64号文件的当前SCN号,和之前查询来的是一致的,十进制为:1764555149 ub4 kscnbas @484 0x692cf98d ub2 kscnwrp @488 0x0000 ub4 kcvcptim @492 0x342e3478 ub2 kcvcpthr @496 0x0002 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ====>>>>>kcrbaseq,这里是64号文件的当前日志号,468转换为十进制是1128 ub4 kcrbaseq @500 0x00000468 ub4 kcrbabno @504 0x0002c2fe ub2 kcrbabof @508 0x0010 ub1 kcvcpetb[0] @512 0x06 ub1 kcvcpetb[1] @513 0x00 ub1 kcvcpetb[2] @514 0x00
《《《《。。。。。。。。篇幅原因,有省略。。。。。。。。》》》》
====》数据文件的scn被记录在文件1号block偏移量484字节开始的四个字节中
BBED> d /v dba 1,1 offset 484 count 64 File: /tmp/a.dbf (1) Block: 1 Offsets: 484 to 547 Dba:0x00400001 ------------------------------------------------------- 692cf98d 00000000 342e3478 00020000 l i,......4.4x.... 00000468 0002c2fe 00100001 06000000 l ...h............ 00000000 00000000 00000000 00000000 l ................ 00000000 00000000 00000000 00000000 l ................
<16 bytes per line>
====》AIX下存储是正序,这个和linux正好相反 BBED> modify /x 3ab645ab3 dba 1,1 offset 484 BBED-00209: invalid number (3ab645ab3)
BBED> modify /x ab647c08 dba 1,1 offset 484 BBED-00209: invalid number (ab647c08) ====》484号不能直接修改,因为ab647c08是以字母开头,必须以数字开头,那么我们从483号开始修改 BBED> modify /x 00ab647c dba 1,1 offset 483 File: /tmp/a.dbf (1) Block: 1 Offsets: 483 to 546 Dba:0x00400001 ------------------------------------------------------------------------ 00ab647c 7b000300 00342e34 78000200 0000002b 4f0002c2 fe000000 10060000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<32 bytes per line>
BBED> modify /x 08 dba 1,1 offset 487 File: /tmp/a.dbf (1) Block: 1 Offsets: 487 to 550 Dba:0x00400001 ------------------------------------------------------------------------ 08000300 00342e34 78000200 0000002b 4f0002c2 fe000000 10060000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<32 bytes per line>
BBED> d /v dba 1,1 offset 484 count 64 File: /tmp/a.dbf (1) Block: 1 Offsets: 484 to 547 Dba:0x00400001 ------------------------------------------------------- ab647c08 00030000 342e3478 00020000 l .d|.....4.4x.... 00002b4f 0002c2fe 00000010 06000000 l ..+O............ 00000000 00000000 00000000 00000000 l ................ 00000000 00000000 00000000 00000000 l ................
<16 bytes per line>
BBED>
BBED> p kcvfhckp struct kcvfhckp, 160 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0xab647c08 ub2 kscnwrp @488 0x0003 ====>>>日志号过大,所以用到了kscnwrp ub4 kcvcptim @492 0x342e3478 ub2 kcvcpthr @496 0x0002 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x00002b4f ub4 kcrbabno @504 0x0002c2fe ub2 kcrbabof @508 0x0000 ub1 kcvcpetb[0] @512 0x06 ub1 kcvcpetb[1] @513 0x00
BBED> sum apply Check value for File 1, Block 1: current = 0xcb25, required = 0xcb25 |
3、 修复数据文件头的序列号
要想跳过归档还需要数据文件头块的rba。它由seq#、log block#、偏移量(固定为16)组成,决定了数据文件从哪个归档日志的哪个位置开始应用归档。Rba位于数据文件头块偏移量500处开始连续的12个字节,有关RBA的理论知识参考:http://blog.itpub.net/26736162/viewspace-2079337/
BBED> d /v dba 1,1 offset 500 count 64 File: /tmp/a.dbf (1) Block: 1 Offsets: 500 to 563 Dba:0x00400001 ------------------------------------------------------- 00000468 0002c2fe 00100001 06000000 l ...h............ 00000000 00000000 00000000 00000000 l ................ 00000000 00000000 00000000 00000000 l ................ 00000000 00000000 00000000 00000000 l ................
<16 bytes per line>
BBED> modify /x 00003275 dba 1,1 offset 500 File: /tmp/a.dbf (1) Block: 1 Offsets: 500 to 563 Dba:0x00400001 ------------------------------------------------------------------------ 00003275 0002c2fe 00000010 06000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<32 bytes per line>
BBED> sum apply Check value for File 1, Block 1: current = 0xd21f, required = 0xd21f
BBED> modify /x 00000002 dba 1,1 offset 504 File: /tmp/a.dbf (1) Block: 1 Offsets: 504 to 567 Dba:0x00400001 ------------------------------------------------------------------------ 00000002 00000010 06000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<32 bytes per line>
BBED> sum apply Check value for File 1, Block 1: current = 0x10e1, required = 0x10e1
BBED> modify /x 00000010 dba 1,1 offset 508 File: /tmp/a.dbf (1) Block: 1 Offsets: 508 to 571 Dba:0x00400001 ------------------------------------------------------------------------ 00000010 06000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<32 bytes per line>
BBED> sum apply Check value for File 1, Block 1: current = 0xd21f, required = 0xd21f
BBED> d /v dba 1,1 offset 500 count 64 File: /tmp/a.dbf (1) Block: 1 Offsets: 500 to 563 Dba:0x00400001 ------------------------------------------------------- 00003275 00000002 00000010 06000000 l ..2u............ 00000000 00000000 00000000 00000000 l ................ 00000000 00000000 00000000 00000000 l ................ 00000000 00000000 00000000 00000000 l ................
<16 bytes per line>
BBED> p kcvfhckp struct kcvfhckp, 160 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0xab647c08 ub2 kscnwrp @488 0x0003 ub4 kcvcptim @492 0x342e3478 ub2 kcvcpthr @496 0x0002 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x00003275 ub4 kcrbabno @504 0x00000002 ub2 kcrbabof @508 0x0000 ub1 kcvcpetb[0] @512 0x06 ub1 kcvcpetb[1] @513 0x00 ub1 kcvcpetb[2] @514 0x00 ub1 kcvcpetb[3] @515 0x00 |
修复完毕,BBED的任务已经完成。
下边将文件从文件系统拷贝到ASM中:
[ZFLHRSDB1:root]:/>l /tmp/a.dbf -rw-r----- 1 oracle dba 104865792 Sep 20 17:39 /tmp/a.dbf [ZFLHRSDB1:root]:/>chown grid.dba /tmp/a.dbf [ZFLHRSDB1:root]:/>su - grid [ZFLHRSDB1:grid]:/home/grid>asmcmd ASMCMD> cp /tmp/a.dbf +DATA1/oralhrs/datafile/a.dbf copying /tmp/a.dbf -> +DATA1/oralhrs/datafile/a.dbf ASMCMD> [ZFLHRSDB1:root]:/>su - oracle [ZFLHRSDB1:oracle]:/oracle>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Tue Sep 20 17:47:21 2016
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options
SYS@oraLHRD1> alter database rename file '+DATA1/oralhrs/datafile/tbs101.262.923076161' TO '+DATA1/oralhrs/datafile/a.dbf';
Database altered.
SELECT a.FILE#,a.NAME,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a WHERE A.FILE# IN (1,2,64);
====>>>>>从截图可以看出虽然是OFFLINE状态,但是CHECKPOINT_CHANGE#已经和其它文件是一致的了。
SYS@oraLHRD1> COL CHANGE# FOR 999999999999999 SYS@oraLHRD1> select file#,online_status,change#,ERROR from v$recover_file;
FILE# ONLINE_ CHANGE# ERROR ---------- ------- ---------------- ----------------------------------------------------------------- 64 OFFLINE 15760391176
SYS@oraLHRD1> recover datafile 64; Media recovery complete.====>>>>>恢复操作成功完成。 SYS@oraLHRD1> alter database datafile 64 online;
Database altered.====>>>>>数据文件成功ONLINE了。
SYS@oraLHRD1> SYS@oraLHRD1> select file#,online_status,change#,ERROR from v$recover_file;
no rows selected====>>>>>没有需要恢复的文件了 SYS@oraLHRD1> COL CHECKPOINT_CHANGE# FOR 999999999999999 SYS@oraLHRD1> SELECT a.FILE#,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a WHERE A.FILE# IN (1,2,64); FILE# REC CHECKPOINT_CHANGE# STATUS ---------- --- ------------------ ------- 1 NO 15760391176 ONLINE 2 NO 15760391176 ONLINE 64 NO 15760480489 ONLINE |
此时再次查询表空间的占用情况,已经可以看到了TBS101的大小了:
接下来创建一个表,看看表空间是否正常:
SYS@oraLHRD1> CREATE TABLE T_TEST_LHR TABLESPACE TBS101 NOLOGGING AS SELECT * FROM DBA_OBJECTS;
Table created.
SYS@oraLHRD1> insert into t_test select * from T_TEST_LHR;
198881 rows created.
SYS@oraLHRD1> commit;
Commit complete.
SYS@oraLHRD1> ALTER SYSTEM CHECKPOINT;
System altered.
SYS@oraLHRD1> |
再次查看表空间大小:
表空间占用从原来的11M到现在的63M,正常了。
二、 修改主库的64号文件名称为ASM格式
表空间恢复了,但是文件名称还是a.dbf,接下来我们修改a.dbf为ASM的命名格式:
SYS@oraLHRD1> alter tablespace TBS101 offline ;
Tablespace altered.
SYS@oraLHRD1> EXIT Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options [ZFLHRSDB1:oracle]:/oracle>rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Wed Sep 21 11:35:39 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORAIPPS (DBID=1344172889)
RMAN> convert datafile '+DATA1/oralhrs/datafile/a.dbf' format '+DATA1';
Starting conversion at target at 2016-09-21 11:36:12 using channel ORA_DISK_1 channel ORA_DISK_1: starting datafile conversion input file name=+DATA1/oralhrs/datafile/a.dbf converted datafile=+DATA1/oralhrs/datafile/tbs101.262.923139373 channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:01 Finished conversion at target at 2016-09-21 11:36:13
RMAN> exit
Recovery Manager complete. [ZFLHRSDB1:oracle]:/oracle>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Wed Sep 21 11:36:33 2016
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options
SYS@oraLHRD1> alter tablespace TBS101 rename datafile '+DATA1/oralhrs/datafile/a.dbf' to '+DATA1/oralhrs/datafile/tbs101.262.923139373';
Tablespace altered.
SYS@oraLHRD1> alter tablespace TBS101 online;
Tablespace altered.
SYS@oraLHRD1> SYS@oraLHRD1> col name for a50 SYS@oraLHRD1> col CHECKPOINT_CHANGE# for 9999999999999 SYS@oraLHRD1> SELECT a.FILE#,a.NAME,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a WHERE A.FILE# IN (1,2,64);
FILE# NAME REC CHECKPOINT_CHANGE# STATUS ---------- -------------------------------------------------- --- ------------------ ------- 1 +DATA1/oralhrs/datafile/system.331.876578243 NO 15760485734 ONLINE 2 +DATA1/oralhrs/datafile/sysaux.330.876578347 NO 15760485734 ONLINE 64 +DATA1/oralhrs/datafile/tbs101.262.923139373 NO 15760490379 ONLINE
SYS@oraLHRD1> alter system checkpoint;
System altered.
SYS@oraLHRD1> SELECT a.FILE#,a.NAME,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a WHERE A.FILE# IN (1,2,64);
FILE# NAME REC CHECKPOINT_CHANGE# STATUS ---------- -------------------------------------------------- --- ------------------ ------- 1 +DATA1/oralhrs/datafile/system.331.876578243 NO 15760492416 ONLINE 2 +DATA1/oralhrs/datafile/sysaux.330.876578347 NO 15760492416 ONLINE 64 +DATA1/oralhrs/datafile/tbs101.262.923139373 NO 15760492416 ONLINE ====>>>>> 执行完checkpoint后,SCN号已经一致了。 |
OK,成功!主库修复完毕,接下来就剩下备库了。
3.3.2 修复备库的OFFLINE文件
查看备库的文件情况,发现64号文件依然处于OFFLINE状态。
SYS@oraLHRDG2> SELECT a.FILE#,a.NAME,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a WHERE A.FILE# IN (1,2,64);
FILE# NAME REC CHECKPOINT_CHANGE# STATUS ---------- ---------------------------------------------------- --- ------------------ ------- 1 +DATA1/oralhrsg/datafile/system.358.869055401 1.5760E+10 ONLINE 2 +DATA1/oralhrsg/datafile/sysaux.354.869047985 1.5760E+10 ONLINE 64 +DATA1/oralhrsg/datafile/tbs101.382.875442343 1764555149 OFFLINE SYS@oraLHRDG2> recover datafile 64; ORA-00283: recovery session canceled due to errors ORA-01153: an incompatible media recovery is active SYS@oraLHRDG2> recover managed standby database cancel; Media recovery complete. SYS@oraLHRDG2> recover datafile 64; ORA-00283: recovery session canceled due to errors ORA-01610: recovery using the BACKUP CONTROLFILE option must be done SYS@oraLHRDG2> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
SYS@oraLHRDG2> alter database datafile 64 online; alter database datafile 64 online * ERROR at line 1: ORA-01113: file 64 needs media recovery ORA-01110: data file 64: '+DATA1/oralhrsg/datafile/tbs101.382.875442343' |
虽然可以开启实时应用进程,但是64号文件依然不能ONLINE,因为现在的日志号已经到了1W多了,而64号文件的日志号却还在1K多,这个用日志必然不能恢复了,因为日志早不存在了嘛,难道我又得用BBED?不!!!这里我们可以从主库拷贝数据文件过来,且往下看。。。
主库用CONVERT命令备份64号文件:
[ZFLHRSDB1:oracle]:/oracle>rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Wed Sep 21 14:49:56 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORAIPPS (DBID=1344172889)
RMAN> convert datafile '+DATA1/oralhrs/datafile/tbs101.262.923139373' format '/tmp/tbs101.dbf_bk';
Starting conversion at target at 2016-09-21 14:51:16 using channel ORA_DISK_1 channel ORA_DISK_1: starting datafile conversion input file name=+DATA1/oralhrs/datafile/tbs101.262.923139373 converted datafile=/tmp/tbs101.dbf_bk channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:03 Finished conversion at target at 2016-09-21 14:51:19 将备份的文件拷贝到备库: [ZFLHRSDB1:oracle]:/tmp>scp /tmp/tbs101.dbf_bk oracle@22.166.166.16:/tmp/tbs101.dbf_bk The authenticity of host '22.166.166.16 (22.166.166.16)' can't be established. RSA key fingerprint is 7b:d6:ba:ca:b3:71:b5:0b:bf:14:f4:e4:18:5f:51:45. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '22.166.166.16' (RSA) to the list of known hosts. tbs101.dbf_bk 100% 100MB 50.0MB/s 00:02
|
在备库上转换文件为ASM格式:
[ZFLHRSDB4:root]:/>l /tmp/tbs101.dbf_bk -rw-r----- 1 oracle dba 104865792 Sep 21 14:52 /tmp/tbs101.dbf_bk [ZFLHRSDB4:root]:/>su - oracle [ZFLHRSDB4:oracle]:/oracle>rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Wed Sep 21 14:52:49 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORAIPPS (DBID=1344172889)
RMAN> convert datafile '/tmp/tbs101.dbf_bk' format '+DATA1';
Starting conversion at target at 2016-09-21 14:53:33 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=1542 instance=oraLHRDG2 device type=DISK channel ORA_DISK_1: starting datafile conversion input file name=/tmp/tbs101.dbf_bk converted datafile=+DATA1/oralhrsg/datafile/tbs101.382.923151215 channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:01 Finished conversion at target at 2016-09-21 14:53:36
RMAN> exit
Recovery Manager complete.
|
备库上进行重命名操作,若是备库上64号文件被删除了,我们此时也可以先重建64号文件:
SYS@oraLHRDG2> alter system set standby_file_management='MANUAL' SID='*';
System altered. SYS@oraLHRDG2> alter database create datafile 64 as '+DATA1';
Database altered.
SYS@oraLHRDG2> SELECT a.FILE#,a.NAME,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a WHERE A.FILE# IN (1,2,64);
FILE# NAME REC CHECKPOINT_CHANGE# STATUS ---------- ------------------------------------------------ --- ------------------ ------- 1 +DATA1/oralhrsg/datafile/system.358.869055401 1.5761E+10 ONLINE 2 +DATA1/oralhrsg/datafile/sysaux.354.869047985 1.5761E+10 ONLINE 64 +DATA1/oralhrsg/datafile/tbs101.483.923151901 1.5761E+10 OFFLINE SYS@oraLHRDG2> ALTER DATABASE DATAFILE 64 ONLINE; ALTER DATABASE DATAFILE 64 ONLINE * ERROR at line 1: ORA-01113: file 64 needs media recovery ORA-01110: data file 64: '+DATA1/oralhrsg/datafile/tbs101.483.923151901' |
可以看到64号文件有了,下边进行重命名,修改为我们从主库拷贝过来的64号文件:
SYS@oraLHRDG2> ALTER DATABASE RENAME FILE '+DATA1/oralhrsg/datafile/tbs101.483.923151901' TO '+DATA1/oralhrsg/datafile/tbs101.382.923151215'; ALTER DATABASE RENAME FILE '+DATA1/oralhrsg/datafile/tbs101.483.923151901' TO '+DATA1/oralhrsg/datafile/tbs101.382.923151215' * ERROR at line 1: ORA-01511: error in renaming log/data files ORA-01121: cannot rename database file 64 - file is in use or recovery ORA-01110: data file 64: '+DATA1/oralhrsg/datafile/tbs101.483.923151901' SYS@oraLHRDG2> ! oerr ora 01121 01121, 00000, "cannot rename database file %s - file is in use or recovery" // *Cause: Attempted to use ALTER DATABASE RENAME to rename a // datafile that is online in an open instance or is being recovered. // *Action: Close database in all instances and end all recovery sessions. |
文件在使用,不能进行重命名,该库是RAC库,我们先关闭DG,启动到MOUNT状态后再重命名:
[ZFLHRSDB4:root]:/>srvctl stop db -d oralhrsg [ZFLHRSDB4:root]:/>srvctl start db -d oralhrsg -o mount
SYS@oraLHRDG2> conn / as sysdba Connected.
SYS@oraLHRDG2> ALTER DATABASE RENAME FILE '+DATA1/oralhrsg/datafile/tbs101.483.923151901' TO '+DATA1/oralhrsg/datafile/tbs101.382.923151215';
Database altered.
SYS@oraLHRDG2> ALTER DATABASE DATAFILE 64 ONLINE;
Database altered.<<<<<<<<<---------数据文件可以ONLINE了
SYS@oraLHRDG2> col name for a50 SYS@oraLHRDG2> col CHECKPOINT_CHANGE# for 9999999999999 SYS@oraLHRDG2> SELECT a.FILE#,a.NAME,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a WHERE A.FILE# IN (1,2,64);
FILE# NAME REC CHECKPOINT_CHANGE# STATUS ---------- -------------------------------------------------- --- ------------------ ------- 1 +DATA1/oralhrsg/datafile/system.358.869055401 15760776695 ONLINE 2 +DATA1/oralhrsg/datafile/sysaux.354.869047985 15760776695 ONLINE 64 +DATA1/oralhrsg/datafile/tbs101.382.923151215 15760492416 ONLINE
SYS@oraLHRDG2> alter database open read only; alter database open read only * ERROR at line 1: ORA-10458: standby database requires recovery ORA-01194: file 64 needs more recovery to be consistent ORA-01110: data file 64: '+DATA1/oralhrsg/datafile/tbs101.382.923151215' <<<<<<<<<------------ 打开数据库依然报错,我们手动恢复一下,看看需要哪些日志,因为64号文件已经是最新的了
SYS@oraLHRDG2> recover database; ORA-00283: recovery session canceled due to errors ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SYS@oraLHRDG2> recover standby database using backup controlfile; ORA-00279: change 15760492416 generated at 09/21/2016 11:38:54 needed for thread 1 ORA-00289: suggestion : /arch/1_12918_868895513.arc ORA-00280: change 15760492416 for thread 1 is in sequence #12918
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 64 needs more recovery to be consistent ORA-01110: data file 64: '+DATA1/oralhrsg/datafile/tbs101.382.923151215'
ORA-01112: media recovery not started
<<<<<<<<<------------ 缺少12918日志,很欣慰,因为12918已经是最新的日志了,而不是最初的1K号日志,这里解决起来就很简单了,可以从主库拷贝12918日志到备库,但是这样太麻烦,我们可以开启备库的应用进程让其自动解决备库的gap问题
SYS@oraLHRDG2> alter database recover managed standby database using current logfile disconnect from session;
Database altered. |
此时查看告警日志,很欣慰看到了12918日志过来了:
Wed Sep 21 15:24:33 2016 alter database recover managed standby database using current logfile disconnect from session Attempt to start background Managed Standby Recovery process (oraLHRDG2) Wed Sep 21 15:24:33 2016 MRP0 started with pid=44, OS id=12649040 MRP0: Background Managed Standby Recovery process started (oraLHRDG2) started logmerger process Wed Sep 21 15:24:39 2016 Managed Standby Recovery starting Real Time Apply Parallel Media Recovery started with 16 slaves Waiting for all non-current ORLs to be archived... All non-current ORLs have been archived. Wed Sep 21 15:24:40 2016 Media Recovery Log /arch/1_12918_868895513.arc Media Recovery Log /arch/2_12918_868895513.arc Completed: alter database recover managed standby database using current logfile disconnect from session Datafile 64 added to flashback set Media Recovery Log /arch/2_12919_868895513.arc Media Recovery Log /arch/1_12919_868895513.arc Media Recovery Log /arch/2_12920_868895513.arc Media Recovery Log /arch/1_12920_868895513.arc Media Recovery Log /arch/2_12921_868895513.arc Media Recovery Log /arch/1_12921_868895513.arc Media Recovery Waiting for thread 2 sequence 12922 (in transit) Recovery of Online Redo Log: Thread 2 Group 12 Seq 12922 Reading mem 0 Mem# 0: +DATA1/oralhrsg/onlinelog/group_12.353.869055809 Media Recovery Waiting for thread 1 sequence 12922 (in transit) Recovery of Online Redo Log: Thread 1 Group 8 Seq 12922 Reading mem 0 Mem# 0: +DATA1/oralhrsg/onlinelog/group_8.344.869055791 |
最后我们重启备库的2个节点:
[ZFLHRSDB4:root]:/>srvctl stop db -d oralhrsg [ZFLHRSDB4:root]:/>srvctl start db -d oralhrsg [ZFLHRSDB4:root]:/> [ZFLHRSDB4:root]:/>srvctl status db -d oralhrsg Instance oraLHRDG1 is running on node zflhrsdb3 Instance oraLHRDG2 is running on node zflhrsdb4 [ZFLHRSDB4:root]:/>crsctl stat res -t -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.LISTENER.lsnr ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 ora.LISTENER_DG.lsnr ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 ora.asm ONLINE ONLINE zflhrsdb3 Started ONLINE ONLINE zflhrsdb4 Started ora.gsd OFFLINE OFFLINE zflhrsdb3 OFFLINE OFFLINE zflhrsdb4 ora.net1.network ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 ora.ons ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 ora.registry.acfs ONLINE ONLINE zflhrsdb3 ONLINE ONLINE zflhrsdb4 -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE zflhrsdb4 ora.cvu 1 ONLINE ONLINE zflhrsdb4 ora.oc4j 1 ONLINE ONLINE zflhrsdb4 ora.oralhrsg.db 1 ONLINE ONLINE zflhrsdb3 Open,Readonly 2 ONLINE ONLINE zflhrsdb4 Open,Readonly ora.scan1.vip 1 ONLINE ONLINE zflhrsdb4 ora.zflhrsdb3.vip 1 ONLINE ONLINE zflhrsdb3 ora.zflhrsdb4.vip 1 ONLINE ONLINE zflhrsdb4 [ZFLHRSDB4:root]:/> |
而数据库中64号文件已经正常了:
SYS@oraLHRDG2> SELECT a.FILE#,a.NAME,a.RECOVER,a.CHECKPOINT_CHANGE#,status FROM v$datafile_header a WHERE A.FILE# IN (1,2,64);
FILE# NAME REC CHECKPOINT_CHANGE# STATUS ---------- -------------------------------------------------- --- ------------------ ------- 1 +DATA1/oralhrsg/datafile/system.358.869055401 15760815694 ONLINE 2 +DATA1/oralhrsg/datafile/sysaux.354.869047985 15760815694 ONLINE 64 +DATA1/oralhrsg/datafile/tbs101.382.923151215 15760815694 ONLINE
SYS@oraLHRDG2> show parameter standby
NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ standby_archive_dest string ?/dbs/arch standby_file_management string MANUAL SYS@oraLHRDG2> ALTER SYSTEM SET standby_file_management='AUTO' SID='*';====>>>>> 别忘记将该参数修改回来
System altered.
SYS@oraLHRDG2> set pagesize 9999 line 9999 SYS@oraLHRDG2> col TS_Name format a30 SYS@oraLHRDG2> WITH WT1 AS 2 (SELECT TS.TABLESPACE_NAME, 3 DF.ALL_BYTES, 4 DECODE(DF.TYPE, 5 'D', 6 NVL(FS.FREESIZ, 0), 7 'T', 8 DF.ALL_BYTES - NVL(FS.FREESIZ, 0)) FREESIZ, 9 DF.MAXSIZ, 10 TS.BLOCK_SIZE, 11 TS.LOGGING, 12 TS.FORCE_LOGGING, 13 TS.CONTENTS, 14 TS.EXTENT_MANAGEMENT, 15 TS.SEGMENT_SPACE_MANAGEMENT, 16 TS.RETENTION, 17 TS.DEF_TAB_COMPRESSION, 18 DF.TS_DF_COUNT, 19 TS.BIGFILE, 20 TS.STATUS 21 FROM DBA_TABLESPACES TS, 22 (SELECT 'D' TYPE, 23 TABLESPACE_NAME, 24 COUNT(*) TS_DF_COUNT, 25 SUM(BYTES) ALL_BYTES, 26 SUM(DECODE(MAXBYTES, 0, BYTES, MAXBYTES)) MAXSIZ 27 FROM DBA_DATA_FILES D 28 GROUP BY TABLESPACE_NAME 29 UNION ALL 30 SELECT 'T', 31 TABLESPACE_NAME, 32 COUNT(*) TS_DF_COUNT, 33 SUM(BYTES) ALL_BYTES, SUM(DECODE(MAXBYTES, 0, BYTES, MAXBYTES)) 34 35 FROM DBA_TEMP_FILES D 36 GROUP BY TABLESPACE_NAME) DF, 37 (SELECT TABLESPACE_NAME, SUM(BYTES) FREESIZ 38 FROM DBA_FREE_SPACE 39 GROUP BY TABLESPACE_NAME 40 UNION ALL 41 SELECT TABLESPACE_NAME, SUM(D.BLOCK_SIZE * A.BLOCKS) BYTES 42 FROM GV$SORT_USAGE A, DBA_TABLESPACES D 43 WHERE A.TABLESPACE = D.TABLESPACE_NAME 44 GROUP BY TABLESPACE_NAME) FS 45 WHERE TS.TABLESPACE_NAME = DF.TABLESPACE_NAME 46 AND TS.TABLESPACE_NAME = FS.TABLESPACE_NAME(+)) 47 SELECT (SELECT A.TS# 48 FROM V$TABLESPACE A 49 WHERE A.NAME = UPPER(T.TABLESPACE_NAME)) TS#, 50 T.TABLESPACE_NAME TS_NAME, 51 ROUND(T.ALL_BYTES / 1024 / 1024) TS_SIZE_M, 52 ROUND(T.FREESIZ / 1024 / 1024) FREE_SIZE_M, 53 ROUND((T.ALL_BYTES - T.FREESIZ) / 1024 / 1024) USED_SIZE_M, 54 ROUND((T.ALL_BYTES - T.FREESIZ) * 100 / T.ALL_BYTES, 3) USED_PER, 55 ROUND(MAXSIZ / 1024 / 1024 / 1024, 3) MAX_SIZE_G, 56 ROUND(DECODE(MAXSIZ, 0, TO_NUMBER(NULL), (T.ALL_BYTES - FREESIZ)) * 100 / 57 MAXSIZ, 58 3) USED_PER_MAX, 59 ROUND(T.BLOCK_SIZE) BLOCK_SIZE, 60 T.LOGGING, 61 T.BIGFILE, 62 T.STATUS, 63 T.TS_DF_COUNT 64 FROM WT1 T 65 UNION ALL 66 SELECT TO_NUMBER('') TS#, 67 'ALL TS:' TS_NAME, 68 ROUND(SUM(T.ALL_BYTES) / 1024 / 1024, 3) TS_SIZE_M, 69 ROUND(SUM(T.FREESIZ) / 1024 / 1024) FREE_SIZE_M, 70 ROUND(SUM(T.ALL_BYTES - T.FREESIZ) / 1024 / 1024) USED_SIZE_M, 71 ROUND(SUM(T.ALL_BYTES - T.FREESIZ) * 100 / SUM(T.ALL_BYTES), 3) USED_PER, 72 ROUND(SUM(MAXSIZ) / 1024 / 1024 / 1024) MAX_SIZE, 73 TO_NUMBER('') "USED,% of MAX Size", 74 TO_NUMBER('') BLOCK_SIZE, 75 '' LOGGING, 76 MAX(T.BIGFILE), 77 MAX(T.STATUS), 78 TO_NUMBER('') TS_DF_COUNT 79 FROM WT1 T 80 ORDER BY TS#;
TS# TS_NAME TS_SIZE_M FREE_SIZE_M USED_SIZE_M USED_PER MAX_SIZE_G USED_PER_MAX BLOCK_SIZE LOGGING BIG STATUS TS_DF_COUNT ---------- ------------------------------ ---------- ----------- ----------- ---------- ---------- ------------ ---------- --------- --- --------- ----------- 0 SYSTEM 4096 2613 1483 36.214 4 36.214 8192 LOGGING NO ONLINE 1 1 SYSAUX 4096 1845 2251 54.955 4 54.955 8192 LOGGING NO ONLINE 1 2 UNDOTBS1 122880 117985 4895 3.983 120 3.983 8192 LOGGING NO ONLINE 4 3 TEMP 4096 4093 3 .073 4 .073 8192 NOLOGGING NO ONLINE 1 4 UNDOTBS2 122880 117544 5336 4.342 120 4.342 8192 LOGGING NO ONLINE 4 5 USERS 4096 4083 13 .308 4 .308 8192 LOGGING NO ONLINE 1 6 IPPS_DATA 1544192 358905 1185287 76.758 1508 76.758 8192 LOGGING NO ONLINE 52 7 IPPS_INDEX 512000 495662 16338 3.191 500 3.191 8192 LOGGING NO ONLINE 17 9 TBS001 100 99 1 1.063 .098 1.063 8192 LOGGING NO ONLINE 1 10 TBS101 100 89 11 11 .098 11 8192 LOGGING NO ONLINE 1 ALL TS: 2318536 1102919 1215617 52.43 2264 NO ONLINE
11 rows selected. |
最后不要忘记执行:ALTER SYSTEM SET standby_file_management='AUTO' SID='*';将standby_file_management参数修改为AUTO。
4 环境修复之后的反思
结论:数据文件OFFLINE之后必须要做的一件事就是紧接着立刻执行一次RECOVER操作(小Y之前告诉我们的~~o(∩_∩)o ~~)。
一个数据文件OFFLINE为啥修复起来这么麻烦呢?就是因为归档丢失了,但是若是我们刚开始将数据文件OFFLINE之后若能立刻执行一次RECOVER操作的话,不管中间过了多久,归档丢失了多少,最后ONLINE数据文件的时候都会直接ONLINE起来数据文件而不用做RECOVER操作。废话不多说,我们且做个实验。
项目 | source db |
db 类型 | 单实例 |
db version | 11.2.0.3.4 |
db 存储 | ASM |
OS版本及kernel版本 | AIX 64位 7.1.0.0 |
4.1 数据文件OFFLINE后没有立刻做RECOVER操作
SYS@lhrdb> col name for a60 SYS@lhrdb> select file#,name,status from v$datafile;
FILE# NAME STATUS ---------- ------------------------------------------------------------ ------- 1 +DATA/lhrdb/datafile/system.347.916601927 SYSTEM 2 +DATA/lhrdb/datafile/sysaux.340.916601927 ONLINE 3 +DATA/lhrdb/datafile/undotbs1.353.916601927 ONLINE 4 +DATA/lhrdb/datafile/users.445.916601927 ONLINE 5 +DATA/lhrdb/datafile/example.416.916602001 ONLINE 6 +DATA/lhrdb/datafile/ts_mig_chain_lhr.471.919677645 ONLINE
6 rows selected.
SYS@lhrdb> alter database datafile 6 offline; alter database datafile 6 offline * ERROR at line 1: ORA-01145: offline immediate disallowed unless media recovery enabled ====>>>>> 数据库必须归档才可以OFFLINE
SYS@lhrdb> archive log list; Database log mode No Archive Mode Automatic archival Disabled Archive destination USE_DB_RECOVERY_FILE_DEST Oldest online log sequence 265 Current log sequence 267 SYS@lhrdb> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. SYS@lhrdb> startup mount ORACLE instance started.
Total System Global Area 1720328192 bytes Fixed Size 2247072 bytes Variable Size 486540896 bytes Database Buffers 1224736768 bytes Redo Buffers 6803456 bytes Database mounted. SYS@lhrdb> alter database archivelog;
Database altered.
SYS@lhrdb> alter database open;
Database altered.
SYS@lhrdb> alter database datafile 6 offline;
Database altered.
SYS@lhrdb> select file#,name,status from v$datafile;
FILE# NAME STATUS ---------- ------------------------------------------------------------ ------- 1 +DATA/lhrdb/datafile/system.347.916601927 SYSTEM 2 +DATA/lhrdb/datafile/sysaux.340.916601927 ONLINE 3 +DATA/lhrdb/datafile/undotbs1.353.916601927 ONLINE 4 +DATA/lhrdb/datafile/users.445.916601927 ONLINE 5 +DATA/lhrdb/datafile/example.416.916602001 ONLINE 6 +DATA/lhrdb/datafile/ts_mig_chain_lhr.471.919677645 RECOVER
6 rows selected.
SYS@lhrdb> select file#,online_status,change#,ERROR from v$recover_file;
FILE# ONLINE_ CHANGE# ERROR ---------- ------- ---------- ----------------------------------------------------------------- 6 OFFLINE 7485831
SYS@lhrdb> alter system switch logfile;
System altered.
SYS@lhrdb> alter system switch logfile;
System altered.
SYS@lhrdb> alter database datafile 6 online; alter database datafile 6 online * ERROR at line 1: ORA-01113: file 6 needs media recovery ORA-01110: data file 6: '+DATA/lhrdb/datafile/ts_mig_chain_lhr.471.919677645'
SYS@lhrdb> recover datafile 6; Media recovery complete. SYS@lhrdb> alter database datafile 6 online;
Database altered.
<<<<<<<<<----- 可以看到6号文件必须先执行recover操作后才能执行ONLINE |
4.2 数据文件OFFLINE后立刻做一次RECOVER操作
SYS@lhrdb> alter database datafile 6 offline;
Database altered.
SYS@lhrdb> recover datafile 6;<<<<<<<<<----- OFFLINE后接着执行recover操作 Media recovery complete. SYS@lhrdb> select file#,online_status,change#,ERROR from v$recover_file;<<<<<<<<<--该视图查不到数据
no rows selected
SYS@lhrdb> select file#,name,status from v$datafile;
FILE# NAME STATUS ---------- ------------------------------------------------------------ ------- 1 +DATA/lhrdb/datafile/system.347.916601927 SYSTEM 2 +DATA/lhrdb/datafile/sysaux.340.916601927 ONLINE 3 +DATA/lhrdb/datafile/undotbs1.353.916601927 ONLINE 4 +DATA/lhrdb/datafile/users.445.916601927 ONLINE 5 +DATA/lhrdb/datafile/example.416.916602001 ONLINE 6 +DATA/lhrdb/datafile/ts_mig_chain_lhr.471.919677645 OFFLINE
6 rows selected.
SYS@lhrdb> alter system switch logfile;
System altered.
SYS@lhrdb> alter system switch logfile;
System altered.
SYS@lhrdb> alter system switch logfile;
System altered.
SYS@lhrdb> alter database datafile 6 online;<<<<<<<<<-----切换日志后让数据文件做ONLINE操作并不需要执行RECOVER操作
Database altered.
SYS@lhrdb> select file#,name,status from v$datafile;
FILE# NAME STATUS ---------- ------------------------------------------------------------ ------- 1 +DATA/lhrdb/datafile/system.347.916601927 SYSTEM 2 +DATA/lhrdb/datafile/sysaux.340.916601927 ONLINE 3 +DATA/lhrdb/datafile/undotbs1.353.916601927 ONLINE 4 +DATA/lhrdb/datafile/users.445.916601927 ONLINE 5 +DATA/lhrdb/datafile/example.416.916602001 ONLINE 6 +DATA/lhrdb/datafile/ts_mig_chain_lhr.471.919677645 ONLINE
6 rows selected.
SYS@lhrdb> |
实验结束,所以得养成习惯,若做了数据文件的OFFLINE操作后需要接着执行一次RECOVER操作,这样以后想啥时候ONLINE就啥时候ONLINE了。
5 总结
1、有关BBED的一些理论知识参考:http://blog.itpub.net/26736162/viewspace-2079337/
2、数据文件做OFFLINE后需接着执行一次RECOVER操作
3、最后不要忘记执行:ALTER SYSTEM SET standby_file_management='AUTO' SID='*';将standby_file_management参数修改为AUTO
4、该故障过程可以进行模拟实验,读者可以在自己的测试环境或虚拟机环境进行实验,实验操作很重要
About Me
............................................................................................................................... ● 本文作者:小麦苗,只专注于数据库的技术,更注重技术的运用 ● 本文在itpub(http://blog.itpub.net/26736162)、博客园(http://www.cnblogs.com/lhrbest)和个人微信公众号(xiaomaimiaolhr)上有同步更新 ● 本文itpub地址:http://blog.itpub.net/26736162/viewspace-2125336/ ● 本文博客园地址:http://www.cnblogs.com/lhrbest/articles/5897530.html ● 本文pdf版:http://yunpan.cn/cdEQedhCs2kFz (提取码:ed9b) ● 小麦苗云盘地址:http://blog.itpub.net/26736162/viewspace-1624453/ ● QQ群:230161599 微信群:私聊 ● 联系我请加QQ好友(642808185),注明添加缘由 ● 于 2016-09-20 10:00 ~ 2016-09-22 19:00 在中行完成 ● 文章内容来源于小麦苗的学习笔记,部分整理自网络,若有侵权或不当之处还请谅解! ● 【版权所有,文章允许转载,但须以链接方式注明源地址,否则追究法律责任】 ............................................................................................................................... 手机长按下图识别二维码或微信客户端扫描下边的二维码来关注小麦苗的微信公众号:xiaomaimiaolhr,免费学习最实用的数据库技术。 |