RMAN 11G The Data Recovery Advisor

RMAN 11G The Data Recovery Advisor

I would like to introduce you to a feature that was made available in Oracle 11g called the Data Recovery Advisor. The Data Recovery Advisor tool will automatically diagnose data failures, such as corruption or loss of persistent data on disk, and will attempt to identify and present you with repair options, which you the DBA can then review and execute. That is the quick summary of it, and it is a great little feature that can provide you with extra options to perform quick and easy failure detection, guiding you through the options available to repair and resolve the failure. What I do like about this tool is there are only a few commands you need to know in RMAN to make use of it:

list failure
advise failure
repair failure

These options are also available in Enterprise Manager and Database Control for those of you that prefer to use the GUI.

In the previous example, I had a lost datafile and showed how you can manually restore and recover it. But what I would like to do now is show you how you can use the Data Recovery Advisor to help you through this process. So first what I do is to delete one of the datafiles again, simulating a similar issue as in the previous example. But this time instead of executing the RMAN commands manually to perform the restore and recovery I will use the Data Recovery Advisor. Below are the steps that I followed to restore datafile 6 this time.

SQL> startup
ORACLE instance started.
Total System Global Area  521936896 bytes
Fixed Size                2229944 bytes
Variable Size           327158088 bytes
Database Buffers        188743680 bytes
Redo Buffers              3805184 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '+DATA/testdb/datafile/data.284.798052235'

As you can see file 6 is now missing. I login to RMAN and execute the “list failure” command to show me what the Data Recovery Advisor has picked up:

oracle@dbvlin201[/usr/local/dbvisit/standby]: rman

Recovery Manager: Release 11.2.0.3.0 - Production on Wed Oct 31 10:57:32 2012
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target /
connected to target database: TESTDB (DBID=2569715825, not open)

RMAN> list failure;
using target database control file instead of recovery catalog

List of Database Failures
=========================
Failure ID Priority Status    Time Detected       Summary
---------- -------- --------- ------------------- -------
1062       HIGH     OPEN      31/10/2012 10:32:05 One or more non-system datafiles are missing

The next step was to execute the “advise failure” command to see what the suggested course of action would be to resolve this failure.

RMAN> advise failure;

List of Database Failures
=========================
Failure ID Priority Status    Time Detected       Summary
---------- -------- --------- ------------------- -------
1062       HIGH     OPEN      31/10/2012 10:32:05 One or more non-system datafiles are missing

analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=23 device type=DISK
analyzing automatic repair options complete

Mandatory Manual Actions
========================
no manual actions available

Optional Manual Actions
=======================
1. If file +DATA/testdb/datafile/data.284.798052235 was unintentionally renamed or moved, restore it

Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Restore and recover datafile 6

Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/testdb/testdb/hm/reco_66710732.hm

As you can see above there are two options provided, the Manual and Automated options. We are going to use the Automated option, which will execute the repair script listed above. The next step then is to execute the repair command to automatically fix the issue for me. To do this I execute the “repair failure” command, and with this the restore and recovery process is started for you.

RMAN> repair failure;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/testdb/testdb/hm/reco_66710732.hm
contents of repair script:
# restore and recover datafile
restore datafile 6;
recover datafile 6;
sql 'alter database datafile 6 online';

Do you really want to execute the above repair (enter YES or NO)? YES
executing repair script

Starting restore at 31/10/2012 10:58:16
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00006 to +DATA/testdb/datafile/data.284.798052235
channel ORA_DISK_1: reading from backup piece /backups/rman/testdb/14np2kdc_1_1
channel ORA_DISK_1: piece handle=/backups/rman/testdb/14np2kdc_1_1 tag=TAG20121030T171940
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 31/10/2012 10:58:19

Starting recover at 31/10/2012 10:58:19
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00
Finished recover at 31/10/2012 10:58:20

sql statement: alter database datafile 6 online
repair failure complete

Do you want to open the database (enter YES or NO)? YES
database opened

From the above output you can see that the “repair failure” command started the restore and recovery process after confirming with me that this is what I wanted to do. It also picked up the database was still mounted and asked if it can open the database, to which I also responded with YES, and within a few minutes my database was back up and running. This just makes backup and recovery so much easier. The extract below shows you that all my datafiles are now looking good following the restore and recovery:

SQL> select file#, status, name from v$datafile;

FILE#      STATUS  NAME
---------- ------- --------------------------------------------------
1          SYSTEM  +DATA/testdb/datafile/system.256.786387593
2          ONLINE  +DATA/testdb/datafile/sysaux.257.786387599
3          ONLINE  +DATA/testdb/datafile/undotbs1.258.786387599
4          ONLINE  +DATA/testdb/datafile/users.259.786387601
5          ONLINE  +DATA/testdb/datafile/data.274.798055391
6          ONLINE  +DATA/testdb/datafile/data.284.798116297
6 rows selected.

SQL> select file#, name, status, error from v$datafile_header;

FILE#      NAME                                               STATUS  ERROR
---------- -------------------------------------------------- ------- ----------
1          +DATA/testdb/datafile/system.256.786387593         ONLINE
2          +DATA/testdb/datafile/sysaux.257.786387599         ONLINE
3          +DATA/testdb/datafile/undotbs1.258.786387599       ONLINE
4          +DATA/testdb/datafile/users.259.786387601          ONLINE
5          +DATA/testdb/datafile/data.274.798055391           ONLINE
6          +DATA/testdb/datafile/data.284.798116297           ONLINE
6 rows selected.

Once done you can use the “list failure closed” command to show you a listing of closed (resolved) failures.

Please follow below link for information ..

http://www.oracle-base.com/articles/11g/data-recovery-advisor-11gr1.php

*****************************************************************************************

As per Meta-link.

RMAN 11G : Data Recovery Advisor - RMAN command line example (Doc ID 762339.1)

GOAL

This note will provide a DBA with an understanding of how the new feature Data Recovery Advisor in 11g operates.

SOLUTION

Overview

What Is the Data Recovery Advisor?

The Data Recovery Advisor is a tool that helps you to diagnose and repair data failures and corruptions. The Data Recovery Advisor analyzes failures based on symptoms and intelligently determines optimal repair strategies. The tool can also automatically repair diagnosed failures.

The Data Recovery Advisor is available from Enterprise Manager (EM) Database Control and Grid Control. You can also use it via the RMAN command-line.

In this example I will you will see examples of via the RMAN command line utilising the DRA commands:

This DRA commands are available within RMAN:

List Failure # lists the results of previously executed failure assessments. Revalidates existing failures and closes them, if possible.
Advise Failure # presents manual and automatic repair options
Repair Failure # automatically fix failures by running optimal repair option, suggested by ADVISE FAILURE. Revalidates existing failures when completed.
Change Failure # enables you to change the status of failures.

Below is a demonstration of DRA.

1. Take Full RMAN backup;

2. RMAN> validate check logical datafile 2;

Show that the datafile is free from corruption.

RMAN> validate check logical datafile 2;

Starting validate at 2008/11/10 09:52:36

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=132 device type=DISK

channel ORA_DISK_1: starting validation of datafile

channel ORA_DISK_1: specifying datafile(s) for validation

input datafile file number=00002 name=/oradata/V11/sysaux01.dbf

channel ORA_DISK_1: validation complete, elapsed time: 00:00:25

List of Datafiles

=================

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN

---- ------ -------------- ------------ --------------- ----------

2        OK              0        28279          104896    2885460

File Name: /oradata/V11/sysaux01.dbf

Block Type Blocks Failing Blocks Processed

---------- -------------- ----------------

Data                    0            21440

Index                   0            24664

Other                   0            30513

Finished validate at 2008/11/10 09:53:02

3. Corrupt several Blocks within the sysaux tablespace

SQL> @corrupt_blocks.sql  (internal Oracle script)

4. RMAN> validate check logical datafile 2;

This should show the corruption and indicate the # of blocks corrupted.

RMAN> validate check logical datafile 2;

Starting validate at 2008/11/10 09:57:05

using channel ORA_DISK_1

channel ORA_DISK_1: starting validation of datafile

channel ORA_DISK_1: specifying datafile(s) for validation

input datafile file number=00002 name=/oradata/V11/sysaux01.dbf

channel ORA_DISK_1: validation complete, elapsed time: 00:00:07

List of Datafiles

=================

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN

---- ------ -------------- ------------ --------------- ----------

2 FAILED                 0        28273          104896    2894632

File Name: /oradata/V11/sysaux01.dbf

Block Type Blocks Failing Blocks Processed

---------- -------------- ----------------

Data                    0            21440

Index                   0            24663

Other                  10            30520

validate found one or more corrupt blocks

See trace file /u01/app/oracle/product/11.1.0/db_1/diag/rdbms/v11/V11/trace/V11_ora_22522.trc for details

Finished validate at 2008/11/10 09:57:12

Review: /u01/app/oracle/product/11.1.0/db_1/diag/rdbms/v11/V11/trace/V11_ora_22522.trc

Open the trace file to show the errors that have been picked up.

Corrupt block relative dba: 0x00800024 (file 2, block 36)

Completely zero block found during validation

Reread of blocknum=36, file=/oradata/V11/sysaux01.dbf. found same corrupt data

Reread of blocknum=36, file=/oradata/V11/sysaux01.dbf. found same corrupt data

Reread of blocknum=36, file=/oradata/V11/sysaux01.dbf. found same corrupt data

Reread of blocknum=36, file=/oradata/V11/sysaux01.dbf. found same corrupt data

Reread of blocknum=36, file=/oradata/V11/sysaux01.dbf. found same corrupt data

Hex dump of (file 2, block 37)

We can now issue list failure & list failure <number> detail to get more about the issue.

5. RMAN> list failure;
   RMAN> list failure ### detail;  ( where ### equlas the failure number)

RMAN> list failure;

List of Database Failures

=========================

Failure ID Priority Status          Time Detected            Summary

---------- -------- ---------       -------------------       -------

351261         HIGH    OPEN 2008/11/10 09:56:33 Datafile 2: '/oradata/V11/sysa

ux01.dbf' contains one or more corrupt blocks

RMAN> list failure 351261 detail;

List of Database Failures

=========================

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------------- -------

351261 HIGH OPEN 2008/11/10 09:56:33 Datafile 2: '/oradata/V11/sysa

ux01.dbf' contains one or more corrupt blocks

Impact: Some objects in tablespace might be unavailable

List of child failures for parent failure ID 351261

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------------- -------

List of Database Failures

=========================

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------------- -------

351261 HIGH OPEN 2008/11/10 09:56:33 Datafile 2: '/oradata/V11/sysa

ux01.dbf' contains one or more corrupt blocks

Impact: Some objects in tablespace might be unavailable

List of child failures for parent failure ID 351261

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------------- -------

352028 HIGH OPEN 2008/11/10 09:56:35 Block 37 in datafile 2: '/or

adata/V11/sysaux01.dbf' is media corrupt

At this point you are able to issue the advise keyword which will prompt DBA will be advise on how to solve the current scenario.

6. RMAN> Advise Failure;

List of Database Failures

=========================

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------------- -------

351261 HIGH OPEN 2008/11/10 09:56:33 Datafile 2: '/oradata/V11/sysa

ux01.dbf' contains one or more corrupt blocks

Impact: Some objects in tablespace might be unavailable

List of child failures for parent failure ID 351261

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------------- -------

352028 HIGH OPEN 2008/11/10 09:56:35 Block 37 in datafile 2: '/or

adata/V11/sysaux01.dbf' is media corrupt

..

..

..

Impact: Object I_SMB$CONFIG_PKEY owned by SYS might be unavailable

351974 HIGH OPEN 2008/11/10 09:56:33 Block 28 in datafile 2: '/or

adata/V11/sysaux01.dbf' is media corrupt

Impact: Object I_SMB$CONFIG_PKEY owned by SYS might be unavailable

analyzing automatic repair options; this may take some time

using channel ORA_DISK_1

analyzing automatic repair options complete

Mandatory Manual Actions

========================

no manual actions available

Optional Manual Actions

=======================

1. If a standby database is available, then consider a Data Guard switchover or failover

Automated Repair Options

========================

Option Repair Description

------ ------------------

1 Recover multiple corrupt blocks in datafile 2

Strategy: The repair includes complete media recovery with no data loss

Repair script: /u01/app/oracle/product/11.1.0/db_1/diag/rdbms/v11/V11/hm/reco_189110358.hm

RMAN>

Review File: : /u01/app/oracle/product/11.1.0/db_1/diag/rdbms/v11/V11/hm/reco_189110358.hm

[oracle@aulnx7 hm]$ vi reco_189110358.hm

# block media recovery for multiple blocks

recover datafile 2 block 28 to 37;

As can be seen above one option is to perform rman block recovery. The next step for the DBA is to proceed with the repair preview.

7. RMAN> repair failure preview;

Strategy: The repair includes complete media recovery with no data loss

Repair script: /u01/app/oracle/product/11.1.0/db_1/diag/rdbms/v11/V11/hm/reco_42

19629556.hm

contents of repair script:

# block media recovery for multiple blocks

recover datafile 2 block 28 to 37;

Then you are ready to perform the fix.

8. RMAN> repair failure;

Strategy: The repair includes complete media recovery with no data loss

Repair script: /u01/app/oracle/product/11.1.0/db_1/diag/rdbms/v11/V11/hm/reco_42

19629556.hm

contents of repair script:

# block media recovery for multiple blocks

recover datafile 2 block 28 to 37;

Do you really want to execute the above repair (enter YES or NO)? YES

executing repair script

Starting recover at 2008/11/10 10:18:48

using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)

channel ORA_DISK_1: specifying block(s) to restore from backup set

restoring blocks of datafile 00002

channel ORA_DISK_1: reading from backup piece /recovery_area/V11/backupset/2008_

11_10/o1_mf_nnndf_TAG20081110T093808_4kgslm1w_.bkp

channel ORA_DISK_1: piece handle=/recovery_area/V11/backupset/2008_11_10/o1_mf_n

nndf_TAG20081110T093808_4kgslm1w_.bkp tag=TAG20081110T093808

channel ORA_DISK_1: restored block(s) from backup piece 1

channel ORA_DISK_1: block restore complete, elapsed time: 00:00:35

starting media recovery

media recovery complete, elapsed time: 00:00:03

Finished recover at 2008/11/10 10:19:27

repair failure complete

RMAN>

At this point the the blocks have been fully recovered.

CHANGE FAILURE # enables you to change the status of failures.

eg. If a failure is fixed manually outside of RMAN, and you need to clear this failure,
you can change it to Status: closed

RMAN> change failure all closed;

List of Database Failures

=========================

Failure ID Priority Status    Time Detected        Summary

---------- -------- --------- -------------------- -------

3782       HIGH     OPEN      21 AUG 2012 09:53:59 One or more non-system datafiles are offline

Do you really want to change the above failures (enter YES or NO)? yes

closed 1 failures

Restrctions:

In the current release, Data Recovery Advisor supports single-instance databases. Oracle Real Application Clusters databases are not supported in 11.1.0.6 -> 11.1.0.8

Data Recovery Advisor cannot use blocks or files transferred from a standby database to repair failures on a primary database. Also, you cannot use Data Recovery Advisor to diagnose and repair failures on a standby database. However, the Data Recovery Advisor does support failover to a standby database as a repair option (as mentioned above).

********************************************************************************************

LISTING FAILURES WITH DATA RECOVERY ADVISOR FEATURE IN 11G

In 11g Data Recovery Advisor tool in RMAN is used to diagnose and repair database failures.
Data Recovery Advisor is an Oracle Database tool that automatically diagnoses data failures, determines and presents appropriate repair options, and executes repairs at the user's request

One of them is Listing failures.

In the RMAN command-line interface, the Data Recovery Advisor commands are LIST FAILURE, ADVISE FAILURE, REPAIR FAILURE, and CHANGE FAILURE

Listing Failures :-

If you suspect or know that one or more database failures have occurred, then use LIST FAILURE to obtain information about them.

List failure command wouldnot start checks for new data failure instead it would list the information about previous encountered data failure.

Executing list failure would crosscheck the previously encountered failure and if they have been repaired manually or Automatically then Data Recovery Advisor removes these failures from the LIST FAILURE output

How use List Failure feature :-

1> Connect to target database using Rman

2> RMAN> list failure;

using target database control file instead of recovery catalog
List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
22591      HIGH     OPEN      27-MAY-08     One or more non-system datafiles are  corrupt
162        HIGH     OPEN      03-DEC-07     Datafile 6: 'D:\DD.DBF' contains one  or more corrupt blocks
The status of a failure is OPEN until we use required repair option to resolve the same. Once the repair operation is performed the status changes to Closed.

Here two datafiles are report one as missing and other containing an corrupt block.

How to list DETAIL of failures individually

This command would list failures individually

Syntax :- LIST FAILURE <FAILURE ID> DETAIL;

Rman > LIST FAILURE 162 DETAIL ;RMAN> list failure 162 DETAIL;

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
162        HIGH     OPEN      03-DEC-07     Datafile 6: 'D:\DD.DBF' contains one
 or more corrupt blocks
  Impact: Some objects in tablespace CORRUPT might be unavailable
  List of child failures for parent failure ID 162
  Failure ID Priority Status    Time Detected Summary
  ---------- -------- --------- ------------- -------
  21891      HIGH     OPEN      21-MAR-08     Block 28 in datafile 6: 'D:\COR.DB
F' is media corrupt
    Impact: Object COR owned by SH might be unavailable
  1365       HIGH     OPEN      13-DEC-07     Block 16 in datafile 6: 'D:\COR.DB
F' is media corrupt
    Impact: Object COR owned by SH might be unavailable 

Details of Column information list above :-

Failure Status :- The status is either Open or Closed.

Failure Prority :-Every failure has a failure priority: CRITICAL, HIGH, or LOW. Data Recovery Advisor only assigns CRITICAL or HIGH priority to diagnosed failures.

Failure Grouping :-
For clarity, Data Recovery Advisor groups related failures together. For example, if 20 different blocks in a file are corrupted, then these failures will be grouped under a single parent failure. By default, Data Recovery Advisor lists information about the group of failures, although you can specify the DETAIL option to list information about the individual subfailures.

Time :- The time when the problem was detected.

Summary :- Gives details information about the problem

Other list commands are :-

LIST FAILURE LOW;
LIST FAILURE CLOSED;

*******************************************************************************

Oracle DBA Information at one place

Search This Blog