Recover Corrupt/Missing OCR with No Backup - (Oracle 10g)
by Jeff Hunter, Sr. Database Administrator


Contents

  1. Overview
  2. Example Configuration
  3. Recover Corrupt/Missing OCR
  4. About the Author


Overview
It happens. Not very often, but it can happen. You are faced with a corrupt or missing Oracle Cluster Registry (OCR) and have no backup to recover from. So, how can something like this occur? We know that the CRSD process on the master node creates backup copies of the OCR every 4 hours in the CRS_home/cdata directory. These backups are meant to be used with the ocrconfig -restore command to recover the OCR after it has been lost or corrupted, so how is it possible to be in a situation where the OCR needs to be recovered and you have no viable backup? Well, consider a scenario where you add a node to the cluster and, before the next automatic backup (within the 4-hour window), you find the OCR has been corrupted. You may have forgotten to create a logical export of the OCR before adding the new node or, worse yet, the logical export you took is also corrupt. In either case, you are left with a corrupt OCR and no recent backup. Talk about a bad day! Another possible scenario could be a shell script that wrongly deletes all available backups. Talk about an even worse day.
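For reference, the following is the normal recovery path when a usable backup does exist. This is only a sketch; the backup file name shown is just an example of what ocrconfig -showbackup might report for this cluster, and the commands must be run as root:

[root@racnode1 ~]# ocrconfig -showbackup        # list the automatic backups taken by CRSD
[root@racnode1 ~]# crsctl stop crs              # stop Oracle Clusterware on every node first
[root@racnode1 ~]# ocrconfig -restore /u01/app/crs/cdata/crs/backup00.ocr   # restore a physical backup
[root@racnode1 ~]# ocrconfig -import /tmp/ocr_export.dmp                    # or restore a logical export taken earlier with ocrconfig -export

The remainder of this article assumes that none of these options are available.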
In the event the OCR is corrupt on one node and all options to recover it have failed, one safe way to re-create the OCR (and consequently the voting disk) is to reinstall the Oracle Clusterware software. In order to accomplish this, a complete outage is required for the entire cluster throughout the duration of the re-install. The Oracle Clusterware software will need to be fully removed, the OCR and voting disks reformatted, all virtual IP addresses (VIPs) de-installed, and a complete reinstall of the Oracle Clusterware software will need to be performed. It should also be noted that any patches that were applied to the original clusterware install will need to be re-applied. As you can see, having a backup of the OCR and voting disk can dramatically simplify the recovery of your system!
A second and much more efficient method used to re-create the OCR (and consequently the voting disk as well) is to re-run the root.sh script from the primary node in the cluster. This is described in Doc ID: 399482.1 on the My Oracle Support web site. In my opinion, this method is quicker and much less intrusive than reinstalling Oracle Clusterware. Using root.sh to re-create the OCR/Voting Disk is the focus of this article.
It is worth mentioning that only one of the two methods mentioned above needs to be performed in order to recover from a lost or corrupt OCR. In addition to recovering the OCR, either method can also be used to restore the SCLS directories after an accidental delete. These are internal-only directories created by root.sh and, on the Linux platform, are located at /etc/oracle/scls_scr. If the SCLS directories are accidentally removed, they can only be re-created using the same methods used to re-create the OCR, which is the focus of this article.
There are two other critical files in Oracle Clusterware that, if accidentally deleted, are a bit easier to recover from:

  • Voting Disk: If there are multiple voting disks and one was accidentally deleted, check whether there are any backups of this voting disk. If there are no backups, a new one can be added using the crsctl add css votedisk command (see the sketch after this list).
  • Socket files in /tmp/.oracle or /var/tmp/.oracle: If these files are accidentally deleted, stop the Oracle Clusterware stack on that node and start it again; this will recreate the socket files. If the socket files for CSSD are deleted, the Oracle Clusterware stack may not come down, in which case the node has to be bounced.
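As a rough sketch of these two recovery paths (the voting disk path shown is taken from the example configuration later in this article; note that on 10g Release 2 a voting disk is typically added with the clusterware stack down on all nodes, using the -force flag):

[root@racnode1 ~]# crsctl stop crs                                                    # run on every node
[root@racnode1 ~]# crsctl add css votedisk /u02/oradata/racdb/CSSFile_mirror2 -force  # re-add the lost voting disk
[root@racnode1 ~]# crsctl start crs                                                   # run on every node

[root@racnode1 ~]# crsctl stop crs     # lost socket files: simply bounce Oracle Clusterware on the affected node;
[root@racnode1 ~]# crsctl start crs    # the files under /tmp/.oracle (or /var/tmp/.oracle) are recreated on startup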



Example Configuration
The example configuration used in this article consists of a two-node RAC with a clustered database named racdb.idevelopment.info running Oracle RAC 10g Release 2 on the Linux x86 platform. The two node names are racnode1 and racnode2, each hosting a single Oracle instance named racdb1 and racdb2 respectively. For a detailed guide on building the example clustered database environment, please see:
   Building an Inexpensive Oracle RAC 10g Release 2 on Linux - (CentOS 5.3 / iSCSI)

The example Oracle Clusterware environment is configured with three mirrored voting disks and two mirrored OCR files all of which are located on an OCFS2 clustered file system. Note that the voting disk is owned by the oracle user in the oinstall group with 0644 permissions while the OCR file is owned by root in the oinstall group with 0640 permissions:
[oracle@racnode1 ~]$ ls -l /u02/oradata/racdb
total 39840
-rw-r--r-- 1 oracle oinstall  10240000 Oct  9 19:33 CSSFile
-rw-r--r-- 1 oracle oinstall  10240000 Oct  9 19:36 CSSFile_mirror1
-rw-r--r-- 1 oracle oinstall  10240000 Oct  9 19:38 CSSFile_mirror2
drwxr-xr-x 2 oracle oinstall      3896 Aug 26 23:45 dbs
-rw-r----- 1 root   oinstall 268644352 Oct  9 19:27 OCRFile
-rw-r----- 1 root   oinstall 268644352 Oct  9 19:28 OCRFile_mirror

Check Current OCR File
[oracle@racnode1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4676
         Available space (kbytes) :     257444
         ID                       : 1513888898
         Device/File Name         : /u02/oradata/racdb/OCRFile
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/oradata/racdb/OCRFile_mirror
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded
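For reference, ocrcheck determines the OCR locations from the OCR location file which, on the Linux platform, is /etc/oracle/ocr.loc. The following is only a sketch of what that file would contain for this example configuration (contents differ on other platforms and installations):

[root@racnode1 ~]# cat /etc/oracle/ocr.loc
ocrconfig_loc=/u02/oradata/racdb/OCRFile
ocrmirrorconfig_loc=/u02/oradata/racdb/OCRFile_mirror
local_only=FALSE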

Check Current Voting Disk
[oracle@racnode1 ~]$ crsctl query css votedisk
 0.     0    /u02/oradata/racdb/CSSFile
 1.     0    /u02/oradata/racdb/CSSFile_mirror1
 2.     0    /u02/oradata/racdb/CSSFile_mirror2

located 3 votedisk(s).
Network Settings
Oracle RAC Node 1 - (racnode1)
Device  IP Address      Subnet          Gateway       Purpose
eth0    192.168.1.151   255.255.255.0   192.168.1.1   Connects racnode1 to the public network
eth1    192.168.2.151   255.255.255.0                 Connects racnode1 to iSCSI shared storage (Openfiler)
eth2    192.168.3.151   255.255.255.0                 Connects racnode1 (interconnect) to racnode2 (racnode2-priv)
/etc/hosts
127.0.0.1        localhost.localdomain localhost

# Public Network - (eth0)
192.168.1.151    racnode1
192.168.1.152    racnode2

# Network Storage - (eth1)
192.168.2.151    racnode1-san
192.168.2.152    racnode2-san

# Private Interconnect - (eth2)
192.168.3.151    racnode1-priv
192.168.3.152    racnode2-priv

# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.251    racnode1-vip
192.168.1.252    racnode2-vip

# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv
Oracle RAC Node 2 - (racnode2)
Device  IP Address      Subnet          Gateway       Purpose
eth0    192.168.1.152   255.255.255.0   192.168.1.1   Connects racnode2 to the public network
eth1    192.168.2.152   255.255.255.0                 Connects racnode2 to iSCSI shared storage (Openfiler)
eth2    192.168.3.152   255.255.255.0                 Connects racnode2 (interconnect) to racnode1 (racnode1-priv)
/etc/hosts
127.0.0.1        localhost.localdomain localhost

# Public Network - (eth0)
192.168.1.151    racnode1
192.168.1.152    racnode2

# Network Storage - (eth1)
192.168.2.151    racnode1-san
192.168.2.152    racnode2-san

# Private Interconnect - (eth2)
192.168.3.151    racnode1-priv
192.168.3.152    racnode2-priv

# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.251    racnode1-vip
192.168.1.252    racnode2-vip

# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv



Recover Corrupt/Missing OCR
To demonstrate the steps required to recover the OCR, it is assumed the current OCR files have been accidentally deleted and no viable backups are available. It is also assumed the CRS stack was up and running on both nodes in the cluster at the time the OCR files were removed:
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile_mirror

[root@racnode1 ~]# ps -ef | grep d.bin | grep -v grep
root       548 27171  0 Oct09 ?     00:06:17 /u01/app/crs/bin/crsd.bin reboot
oracle     575   566  0 Oct09 ?     00:00:10 /u01/app/crs/bin/evmd.bin
root      1118   660  0 Oct09 ?     00:00:00 /u01/app/crs/bin/oprocd.bin run -t 1000 -m 500 -f
oracle    1277   749  0 Oct09 ?     00:03:31 /u01/app/crs/bin/ocssd.bin


[root@racnode2 ~]# ps -ef | grep d.bin | grep -v grep
oracle     674   673  0 Oct09 ?     00:00:10 /u01/app/crs/bin/evmd.bin
root       815 27760  0 Oct09 ?     00:06:12 /u01/app/crs/bin/crsd.bin reboot
root      1201   827  0 Oct09 ?     00:00:00 /u01/app/crs/bin/oprocd.bin run -t 1000 -m 500 -f
oracle    1442   891  0 Oct09 ?     00:03:43 /u01/app/crs/bin/ocssd.bin

  1. Shutdown Oracle Clusterware on All Nodes. Although all OCR files have been lost or corrupted, the Oracle Clusterware daemons as well as the clustered database remain running. In this scenario, Oracle Clusterware and all managed resources need to be shut down in order to start the OCR recovery. Attempting to stop CRS using crsctl stop crs will fail given that it cannot write to the now lost/corrupt OCR file:
    [root@racnode1 ~]# crsctl stop crs
    OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
    With the environment in this unstable state, shut down all database instances on all nodes in the cluster and then reboot each node:
    [oracle@racnode1 ~]$ sqlplus / as sysdba
    
    SQL> shutdown immediate
    
    [root@racnode1 ~]# reboot
    
    ------------------------------------------------
    
    [oracle@racnode2 ~]$ sqlplus / as sysdba
    
    SQL> shutdown immediate
    
    [root@racnode2 ~]# reboot
    When the Oracle RAC nodes come back up, note that Oracle Clusterware will fail to start as a result of the lost/corrupt OCR file:
    [root@racnode1 ~]# crs_stat -t
    CRS-0184: Cannot communicate with the CRS daemon.
    
    [root@racnode2 ~]# crs_stat -t
    CRS-0184: Cannot communicate with the CRS daemon.
  2. Execute rootdelete.sh from All Nodes. The rootdelete.sh script can be found at $ORA_CRS_HOME/install/rootdelete.sh on all nodes in the cluster:
    [root@racnode1 ~]# $ORA_CRS_HOME/install/rootdelete.sh
    Shutting down Oracle Cluster Ready Services (CRS):
    OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
    Shutdown has begun. The daemons should exit soon.
    Checking to see if Oracle CRS stack is down...
    Oracle CRS stack is not running.
    Oracle CRS stack is down now.
    Removing script for Oracle Cluster Ready services
    Updating ocr file for downgrade
    Cleaning up SCR settings in '/etc/oracle/scls_scr'
    
    [root@racnode2 ~]# $ORA_CRS_HOME/install/rootdelete.sh
    Shutting down Oracle Cluster Ready Services (CRS):
    OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
    Shutdown has begun. The daemons should exit soon.
    Checking to see if Oracle CRS stack is down...
    Oracle CRS stack is not running.
    Oracle CRS stack is down now.
    Removing script for Oracle Cluster Ready services
    Updating ocr file for downgrade
    Cleaning up SCR settings in '/etc/oracle/scls_scr'

11g Note: on Oracle Clusterware 11g Release 2 (11.2.0 in the listing below), the equivalent scripts are located under the crs/utl directory of the Grid home:

qpass-test-rac-1.sea1@oracle [+ASM1] $ pwd
/opt/app/oragrid/11.2.0/crs/utl
qpass-test-rac-1.sea1@oracle [+ASM1] $ ls -rlt *.sh
-rw-r--r-- 1 root root 1279 Oct 28  2013 cmdllroot.sh
-rw-r--r-- 1 root root 8640 Oct 28  2013 crswrap.sh
-rw-r--r-- 1 root root  505 Oct 28  2013 diagcollection.sh
-rw-r--r-- 1 root root 6499 Oct 28  2013 gsd.sh
-rw-r--r-- 1 root root 5374 Oct 28  2013 preupdate.sh
-rwxr-xr-x 1 root root 4574 Oct 28  2013 rootaddnode.sh
-rwxr-xr-x 1 root root 5126 Oct 28  2013 rootdeinstall.sh
-rwxr-xr-x 1 root root 5922 Oct 28  2013 rootdelete.sh
-rwxr-xr-x 1 root root 1846 Oct 28  2013 rootdeletenode.sh
qpass-test-rac-1.sea1@oracle [+ASM1] $


  1. The "OCR initialization failed accessing OCR device" and PROC-26 errors can be safely ignored given the OCR is not available. The most important action is that the SCR entries are cleaned up.
    Keep in mind that if you have more than two nodes in your cluster, you need to run rootdelete.sh on all other nodes as well.
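    For clusters with more than two nodes, here is a minimal sketch of running the script on each additional node over ssh as root (racnode3 and racnode4 are hypothetical node names, /u01/app/crs is the CRS home used in this article, and root ssh access to the other nodes is assumed):
    [root@racnode1 ~]# for node in racnode3 racnode4; do ssh root@$node /u01/app/crs/install/rootdelete.sh; done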
  3. Run rootdeinstall.sh from the Primary Node. The primary node is the node from which the original Oracle Clusterware installation was performed (typically the first node). For the purpose of this example, I originally installed Oracle Clusterware from the machine racnode1, which is therefore the primary node.
    The rootdeinstall.sh script will clear out any old data from a raw storage device in preparation for the new OCR. If the OCR is on a clustered file system, new OCR file(s) will be created with null data.
    [root@racnode1 ~]# $ORA_CRS_HOME/install/rootdeinstall.sh
    Removing contents from OCR mirror device
    2560+0 records in
    2560+0 records out
    10485760 bytes (10 MB) copied, 0.0513806 seconds, 204 MB/s
    Removing contents from OCR device
    2560+0 records in
    2560+0 records out
    10485760 bytes (10 MB) copied, 0.0443477 seconds, 236 MB/s
  4. Run root.sh from the Primary Node (same node as above). Among several other tasks, this script will create the OCR and voting disk(s).
    [root@racnode1 ~]# $ORA_CRS_HOME/root.sh
    Checking to see if Oracle CRS stack is already configured
    
    Setting the permissions on OCR backup directory
    Setting up NS directories
    Oracle Cluster Registry configuration upgraded successfully
    Successfully accumulated necessary OCR keys.
    Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
    node :   
    node 1: racnode1 racnode1-priv racnode1
    node 2: racnode2 racnode2-priv racnode2
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    Now formatting voting device: /u02/oradata/racdb/CSSFile
    Now formatting voting device: /u02/oradata/racdb/CSSFile_mirror1
    Now formatting voting device: /u02/oradata/racdb/CSSFile_mirror2
    Format of 3 voting devices complete.
    Startup will be queued to init within 30 seconds.
    Adding daemons to inittab
    Expecting the CRS daemons to be up within 600 seconds.
    CSS is active on these nodes.
            racnode1
    CSS is inactive on these nodes.
            racnode2
    Local node checking complete.
    Run root.sh on remaining nodes to start CRS daemons.
  5. Run root.sh from All Remaining Nodes.
    [root@racnode2 ~]# $ORA_CRS_HOME/root.sh
    Checking to see if Oracle CRS stack is already configured
    
    Setting the permissions on OCR backup directory
    Setting up NS directories
    Oracle Cluster Registry configuration upgraded successfully
    clscfg: EXISTING configuration version 3 detected.
    clscfg: version 3 is 10G Release 2.
    Successfully accumulated necessary OCR keys.
    Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
    node :   
    node 1: racnode1 racnode1-priv racnode1
    node 2: racnode2 racnode2-priv racnode2
    clscfg: Arguments check out successfully.
    
    NO KEYS WERE WRITTEN. Supply -force parameter to override.
    -force is destructive and will destroy any previous cluster
    configuration.
    Oracle Cluster Registry for cluster has already been initialized
    Startup will be queued to init within 30 seconds.
    Adding daemons to inittab
    Expecting the CRS daemons to be up within 600 seconds.
    CSS is active on these nodes.
            racnode1
            racnode2
    CSS is active on all nodes.
    Waiting for the Oracle CRSD and EVMD to start
    Oracle CRS stack installed and running under init(1M)
    Running vipca(silent) for configuring nodeapps
    
    Creating VIP application resource on (2) nodes...
    Creating GSD application resource on (2) nodes...
    Creating ONS application resource on (2) nodes...
    Starting VIP application resource on (2) nodes...
    Starting GSD application resource on (2) nodes...
    Starting ONS application resource on (2) nodes...
    
    
    Done.
    Oracle 10.2.0.1 users should note that running root.sh on the last node will fail; most notably, the silent-mode VIPCA configuration fails because of BUG 4437727 in 10.2.0.1. Refer to my article Building an Inexpensive Oracle RAC 10g Release 2 on Linux - (CentOS 5.3 / iSCSI) to work around these errors.
    The Oracle Clusterware and Oracle RAC software in my configuration were patched to 10.2.0.4 and therefore did not receive any errors while running root.sh on the last node.
  6. Configure Server-Side ONS using racgons.
    CRS_home/bin/racgons add_config hostname1:port hostname2:port

    [root@racnode1 ~]# $ORA_CRS_HOME/bin/racgons add_config racnode1:6200 racnode2:6200
    
    [root@racnode1 ~]# $ORA_CRS_HOME/bin/onsctl ping
    Number of onsconfiguration retrieved, numcfg = 2
    onscfg[0]
       {node = racnode1, port = 6200}
    Adding remote host racnode1:6200
    onscfg[1]
       {node = racnode2, port = 6200}
    Adding remote host racnode2:6200
    ons is running ...
  7. Configure Network Interfaces for Clusterware. Log in as the owner of the Oracle Clusterware software (typically the oracle user account) and configure all network interfaces. The first step is to identify the current interfaces and IP addresses using oifcfg iflist. As discussed in the network settings section, eth0/192.168.1.0 is my public interface/network, eth1/192.168.2.0 is my iSCSI storage network (not used by Oracle Clusterware), and eth2/192.168.3.0 is the cluster_interconnect interface/network.
    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/oifcfg iflist
    eth0  192.168.1.0     <-- public interface
    eth1  192.168.2.0     <-- not used by Oracle Clusterware
    eth2  192.168.3.0     <-- cluster interconnect
    
    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.1.0:public 
    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/oifcfg setif -global eth2/192.168.3.0:cluster_interconnect
    
    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/oifcfg getif
    eth0  192.168.1.0  global  public
    eth2  192.168.3.0  global  cluster_interconnect
  8. Add TNS Listener using NETCA. As the Oracle Clusterware software owner (typically oracle), add a cluster TNS listener configuration to the OCR using netca. This may give errors if listener.ora already contains the entries. If this is the case, move listener.ora from $ORACLE_HOME/network/admin (or from the $TNS_ADMIN directory if the TNS_ADMIN environment variable is defined) to /tmp and then run netca. Add all of the listeners that were defined during the original Oracle Clusterware software installation.
    [oracle@racnode1 ~]$ export DISPLAY=:0
    
    [oracle@racnode1 ~]$ mv $TNS_ADMIN/listener.ora /tmp/listener.ora.original
    [oracle@racnode2 ~]$ mv $TNS_ADMIN/listener.ora /tmp/listener.ora.original
    
    [oracle@racnode1 ~]$ netca &
  9. Add All Resources Back to the OCR using srvctl. As a final step, log in as the Oracle Clusterware software owner (typically oracle) and add all resources back to the OCR using the srvctl command.
    Please ensure that these commands are not run as the root user account.
    Add ASM INSTANCE(S) to OCR:

    srvctl add asm -n <node_name> -i <asm_instance_name> -o <oracle_home>

    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add asm -i +ASM1 -n racnode1 -o /u01/app/oracle/product/10.2.0/db_1
    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add asm -i +ASM2 -n racnode2 -o /u01/app/oracle/product/10.2.0/db_1
    Add DATABASE to OCR:

    srvctl add database -d <db_unique_name> -o <oracle_home>

    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add database -d racdb -o /u01/app/oracle/product/10.2.0/db_1
    Add INSTANCE(S) to OCR:

    srvctl add instance -d <db_unique_name> -i <instance_name> -n <node_name>

    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d racdb -i racdb1 -n racnode1
    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d racdb -i racdb2 -n racnode2
    Add SERVICE(S) to OCR:

    srvctl add service -d <db_unique_name> -s <service_name> -r <preferred_instance_list> -P <TAF_policy>
    where TAF_policy is set to NONE, BASIC, or PRECONNECT

    [oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add service -d racdb -s racdb_srvc -r racdb1,racdb2 -P BASIC

After completing the steps above, the OCR should have been successfully recreated. Bring up all of the resources that were added to the OCR and run cluvfy to verify the cluster configuration.
[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.racdb.db   application    OFFLINE   OFFLINE
ora....b1.inst application    OFFLINE   OFFLINE
ora....b2.inst application    OFFLINE   OFFLINE
ora....srvc.cs application    OFFLINE   OFFLINE
ora....db1.srv application    OFFLINE   OFFLINE
ora....db2.srv application    OFFLINE   OFFLINE
ora....SM1.asm application    OFFLINE   OFFLINE
ora....E1.lsnr application    ONLINE    ONLINE    racnode1
ora....de1.gsd application    ONLINE    ONLINE    racnode1
ora....de1.ons application    ONLINE    ONLINE    racnode1
ora....de1.vip application    ONLINE    ONLINE    racnode1
ora....SM2.asm application    OFFLINE   OFFLINE
ora....E2.lsnr application    ONLINE    ONLINE    racnode2
ora....de2.gsd application    ONLINE    ONLINE    racnode2
ora....de2.ons application    ONLINE    ONLINE    racnode2
ora....de2.vip application    ONLINE    ONLINE    racnode2

[oracle@racnode1 ~]$ srvctl start asm -n racnode1
[oracle@racnode1 ~]$ srvctl start asm -n racnode2
[oracle@racnode1 ~]$ srvctl start database -d racdb
[oracle@racnode1 ~]$ srvctl start service -d racdb

[oracle@racnode1 ~]$ cluvfy stage -post crsinst -n racnode1,racnode2

Performing post-checks for cluster services setup

Checking node reachability...
Node reachability check passed from node "racnode1".


Checking user equivalence...
User equivalence check passed for user "oracle".

Checking Cluster manager integrity...


Checking CSS daemon...
Daemon status check passed for "CSS daemon".

Cluster manager integrity check passed.

Checking cluster integrity...


Cluster integrity check passed


Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Checking CRS integrity...

Checking daemon liveness...
Liveness check passed for "CRS daemon".

Checking daemon liveness...
Liveness check passed for "CSS daemon".

Checking daemon liveness...
Liveness check passed for "EVM daemon".

Checking CRS health...
CRS health check passed.

CRS integrity check passed.

Checking node application existence...


Checking existence of VIP node application (required)
Check passed.

Checking existence of ONS node application (optional)
Check passed.

Checking existence of GSD node application (optional)
Check passed.


Post-check for cluster services setup was successful.
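Now that the OCR and voting disks have been re-created, it would also be prudent to immediately take manual backups so you are not caught without one again. A minimal sketch (the export and copy file names are only examples):

[root@racnode1 ~]# ocrconfig -showbackup                                  # list the automatic backups taken by CRSD (may be empty until the next 4-hour cycle)
[root@racnode1 ~]# ocrconfig -export /u02/oradata/racdb/ocr_export.dmp    # logical export of the new OCR
[oracle@racnode1 ~]$ dd if=/u02/oradata/racdb/CSSFile of=/u02/oradata/racdb/CSSFile.bak bs=4k   # copy of a voting disk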

  1. For 11g, configure server-side ONS using onsconfig instead of racgons. Execute as the owner of CRS_HOME (generally oracle):
    <CRS_HOME>/install/onsconfig add_config hostname1:port hostname2:port
    $ /u01/crs/install/onsconfig add_config halinux1:6251 halinux2:6251
  2. Execute <CRS_HOME>/bin/oifcfg setif -global as the owner of CRS_HOME (generally oracle). Please review Note 283684.1 for details.

    $ /u01/crs/bin/oifcfg setif -global eth0/192.168.0.0:cluster_interconnect eth1/10.35.140.0:public
  3. Add the listener using netca. This may give errors if listener.ora already contains the entries. If this is the case, move listener.ora from $ORACLE_HOME/network/admin (or from the $TNS_ADMIN directory if the TNS_ADMIN environment variable is defined) to /tmp and then run netca. Add all of the listeners that were added earlier.
  4. Add the ASM and database resources to the OCR using the appropriate srvctl add commands as the user who owns the ASM and database resources. Please ensure that these commands are not run as the root user.
  5. Add the instances and services using the appropriate srvctl add commands. Please refer to the documentation for the exact commands.
  6. Execute cluvfy stage -post crsinst -n node1,node2 (replace node1,node2 with the node names of your cluster).

====================================================================

FIX

If none of the steps documented above can be used to restore the file that was accidentally deleted or is corrupted, then the following steps can be used to re-create/reinstantiate these files. The following steps require complete downtime on all the nodes.
  1. Shut down the Oracle Clusterware stack on all nodes using the crsctl stop crs command as the root user.
  2. Back up the entire Oracle Clusterware home.
  3. Execute <CRS_HOME>/install/rootdelete.sh on all nodes.
  4. Execute <CRS_HOME>/install/rootdeinstall.sh on the node that is designated as the first (primary) node.
  5. Verify that no clusterware daemons are still running; the following commands should return nothing:
    • ps -e | grep -i 'ocs[s]d'
    • ps -e | grep -i 'cr[s]d.bin'
    • ps -e | grep -i 'ev[m]d.bin'
  6. Execute <CRS_HOME>/root.sh on the first node.
  7. After root.sh completes successfully on the first node, execute root.sh on the rest of the nodes in the cluster.
  8. For 10g Release 2, use racgons; for 11g, use the onsconfig command. onsconfig stops and starts ONS so the changes take effect immediately, while racgons does not, so the changes won't take effect until ONS is restarted on all nodes. Examples for each are provided below.
    For 10g
    Execute as the owner (generally oracle) of CRS_HOME:
    <CRS_HOME>/bin/racgons add_config hostname1:port hostname2:port
    $ /u01/crs/bin/racgons add_config halinux1:6251 halinux2:6251
    For 11g
    Execute as the owner (generally oracle) of CRS_HOME:
    <CRS_HOME>/install/onsconfig add_config hostname1:port hostname2:port
    $ /u01/crs/install/onsconfig add_config halinux1:6251 halinux2:6251
  9. Execute <CRS_HOME>/bin/oifcfg setif -global as the owner of CRS_HOME (generally oracle). Please review Note 283684.1 for details.
    $ /u01/crs/bin/oifcfg setif -global eth0/192.168.0.0:cluster_interconnect eth1/10.35.140.0:public
  10. Add the listener using netca. This may give errors if listener.ora already contains the entries. If this is the case, move listener.ora from $ORACLE_HOME/network/admin (or from the $TNS_ADMIN directory if the TNS_ADMIN environment variable is defined) to /tmp and then run netca. Add all of the listeners that were added earlier.
  11. Add the ASM and database resources to the OCR using the appropriate srvctl add commands as the user who owns the ASM and database resources. Please ensure that these commands are not run as the root user (see the sketch after this list).
  12. Add the instances and services using the appropriate srvctl add commands. Please refer to the documentation for the exact commands.
  13. Execute cluvfy stage -post crsinst -n node1,node2 (replace node1,node2 with the node names of your cluster).
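    A minimal sketch of the srvctl commands referenced in steps 11 and 12, using the names from the example configuration earlier in this article (substitute your own node, instance, database, service, and Oracle home names):
    srvctl add asm -n racnode1 -i +ASM1 -o /u01/app/oracle/product/10.2.0/db_1
    srvctl add asm -n racnode2 -i +ASM2 -o /u01/app/oracle/product/10.2.0/db_1
    srvctl add database -d racdb -o /u01/app/oracle/product/10.2.0/db_1
    srvctl add instance -d racdb -i racdb1 -n racnode1
    srvctl add instance -d racdb -i racdb2 -n racnode2
    srvctl add service -d racdb -s racdb_srvc -r racdb1,racdb2 -P BASIC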

==============================================================
(OR)

Recreate OCR/Voting Disk Accidentally Deleted


The goal of this section is to help DBAs who have accidentally deleted the OCR, the voting disk, or other files that are required for the operation of Oracle Clusterware.
Depending on the scenario, decide which of the following steps to execute:
  • OCR
    • If the OCR has been deleted, then check if the OCR mirror is OK and vice versa. It may be prudent to use the OCR mirror to create the OCR.
    • If the OCR mirror and OCR have been deleted, then it may be faster to restore the OCR using the OCR backups.
  • Voting Disk
    • If there are multiple voting disks and one was accidentally deleted, then check if there are any backups of this voting disk. If there are no backups, then a new one can be added using the crsctl add css votedisk command.
  • SCLS Directories
    • These are internal-only directories created by root.sh. If this directory is accidentally removed, it can only be re-created by the steps given below.
  • Socket files in /tmp/.oracle or /var/tmp/.oracle
    • If these files are accidentally deleted, then stop the Oracle Clusterware on that node and restart it again. This will recreate these socket files. If the socket files for CSSD are deleted, then the Oracle Clusterware stack may not come down, in which case the node has to be bounced.
Solution:
If none of the steps given above can be used to restore the file that was accidentally deleted or is corrupted, then the following steps can be used to re-create these files. The following steps require complete downtime:
  • Shut down the Oracle Clusterware stack on all nodes as the root user: # crsctl stop crs
  • Back up the entire Oracle CRS home.
  • Execute <CRS_HOME>/install/rootdelete.sh on all nodes.
  • Execute <CRS_HOME>/install/rootdeinstall.sh from the first node only.
  • To verify the deletion, execute the following commands; they should return nothing:
    • ps -e | grep -i 'ocs[s]d'
    • ps -e | grep -i 'cr[s]d.bin'
    • ps -e | grep -i 'ev[m]d.bin'
  • Execute <CRS_HOME>/root.sh on the first node.
  • After root.sh completes successfully on the first node, execute root.sh on the rest of the nodes of the cluster.
  • For 10g Release 2, use racgons; for 11g, use the onsconfig command. onsconfig stops and starts ONS so the changes take effect immediately, while racgons does not, so the changes won't take effect until ONS is restarted on all nodes. Examples for each are provided below:
For 10g
Execute as the CRS owner (generally oracle) of CRS_HOME:
<CRS_HOME>/bin/racgons add_config hostname1:port hostname2:port
Example: $ /u01/crs/bin/racgons add_config halinux1:6251 halinux2:6251
For 11g
Execute as the owner (generally oracle) of CRS_HOME:
<CRS_HOME>/install/onsconfig add_config hostname1:port hostname2:port
Example: $ /u01/crs/install/onsconfig add_config halinux1:6251 halinux2:6251
  • Execute <CRS_HOME>/bin/oifcfg setif -global as the owner of CRS_HOME (generally oracle). Please review Note 283684.1 for details.
Example: $ /u01/crs/bin/oifcfg setif -global eth0/192.168.0.0:cluster_interconnect eth1/10.35.140.0:public
  • Add the listener using netca. This may give errors if listener.ora already contains the entries. If this is the case, rename listener.ora and then run netca. Add all of the listeners that were added earlier.
  • Add the ASM and database resources to the OCR using the srvctl add commands as the user who owns the ASM and database resources. Please ensure that these commands are not run as the root user.
  • Add the instances and services using srvctl add commands.
  • Execute cluvfy stage -post crsinst -n node1,node2
References: Metalink note 399482.1, Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide

========================================================================================
