Recover Corrupt/Missing OCR with No Backup - (Oracle 10g)
by Jeff Hunter, Sr. Database Administrator
Contents
Overview
Example Configuration
Recover Corrupt/Missing OCR
Overview
It happens. Not very often, but it can happen. You are faced with a corrupt or missing Oracle Cluster Registry (OCR) and have no backup to recover from. So, how can something like this occur? We know that the CRSD process on the master node is responsible for creating backup copies of the OCR every 4 hours in the CRS_home/cdata directory. These backups are meant to be used with the ocrconfig -restore command to recover from a lost or corrupt OCR file, so how is it possible to be in a situation where the OCR needs to be recovered and you have no viable backup? Well, consider a scenario where you add a node to the cluster and, before the next automatic backup runs (that is, within 4 hours), you find the OCR has been corrupted. You may have forgotten to create a logical export of the OCR before adding the new node or, worse yet, the logical export you took is also corrupt. In either case, you are left with a corrupt OCR and no recent backup. Talk about a bad day! Another possible scenario is a shell script that wrongly deletes all available backups. Talk about an even worse day.
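Before concluding that no backup exists, it is worth looking at the automatic backup location directly. Oracle's supported way to list the automatic backups is ocrconfig -showbackup; the shell function below is only a minimal sketch for eyeballing a directory. It assumes the backups are plain files ending in .ocr (for example backup00.ocr or day.ocr) under CRS_home/cdata/<cluster_name>, and the name latest_ocr_backup is hypothetical, not part of Oracle Clusterware.

```shell
# latest_ocr_backup DIR  (hypothetical helper, not an Oracle tool)
# Prints the most recently modified *.ocr file in DIR and succeeds, or prints
# nothing and fails when DIR contains no *.ocr files.
# Assumption: automatic OCR backups are plain files named *.ocr in DIR.
latest_ocr_backup() (
    dir=$1
    # ls -t lists newest first. If the glob matches nothing, ls is handed the
    # literal pattern, fails, and its error message is discarded.
    newest=$(ls -t "$dir"/*.ocr 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        exit 1
    fi
    printf '%s\n' "$newest"
)
```

For example, latest_ocr_backup "$ORA_CRS_HOME/cdata/<cluster_name>", where <cluster_name> stands in for your cluster's name. Anything it reports should still be checked against ocrconfig -showbackup before it is handed to ocrconfig -restore.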
In the event the OCR is corrupt on one node and all options to recover it have failed, one safe way to re-create the OCR (and consequently the voting disk) is to reinstall the Oracle Clusterware software. In order to accomplish this, a complete outage is required for the entire cluster throughout the duration of the re-install. The Oracle Clusterware software will need to be fully removed, the OCR and voting disks reformatted, all virtual IP addresses (VIPs) de-installed, and a complete reinstall of the Oracle Clusterware software will need to be performed. It should also be noted that any patches that were applied to the original clusterware install will need to be re-applied. As you can see, having a backup of the OCR and voting disk can dramatically simplify the recovery of your system!
A second and much more efficient method used to re-create the OCR (and consequently the voting disk as well) is to re-run the root.sh script from the primary node in the cluster. This is described in Doc ID: 399482.1 on the My Oracle Support web site. In my opinion, this method is quicker and much less intrusive than reinstalling Oracle Clusterware. Using root.sh to re-create the OCR/Voting Disk is the focus of this article.
It is worth mentioning that only one of the two methods mentioned above needs to be performed in order to recover from a lost or corrupt OCR. In addition to recovering the OCR, either method can also be used to restore the SCLS directories after an accidental delete. These are internal-only directories which are created by root.sh and, on the Linux platform, are located at /etc/oracle/scls_scr. If the SCLS directories are accidentally removed, they can only be re-created using the same methods used to re-create the OCR.
There are two other critical files in Oracle Clusterware that if accidentally deleted, are a bit easier to recover from:
- Voting Disk: If there are multiple voting disks and one was accidentally deleted, check whether there are any backups of that voting disk. If there are no backups, a new one can be added using the crsctl add css votedisk command.
- Socket files in /tmp/.oracle or /var/tmp/.oracle: If these files are accidentally deleted, stop Oracle Clusterware on that node and restart it. This will re-create the socket files. If the socket files for cssd are deleted, the Oracle Clusterware stack may not come down, in which case the node has to be bounced.
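To decide whether a voting disk is really gone, note that crsctl query css votedisk lists each configured voting disk with its path in the last column (the example configuration's output appears later in this article). The hypothetical helper below is a sketch that reads that output on stdin and prints any listed path that no longer exists. It assumes the 10gR2 layout, where each voting disk line begins with an index such as "0.".

```shell
# missing_votedisks  (hypothetical helper, not an Oracle tool)
# Reads "crsctl query css votedisk" output on stdin and prints each listed
# voting disk path that does not exist. Lines whose first field is an index
# such as "0." are treated as voting disk entries; the path is the last field.
missing_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ { print $NF }' |
    while IFS= read -r path; do
        # -e is true for regular files and device files alike.
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

Usage: crsctl query css votedisk | missing_votedisks prints nothing when every configured voting disk is still present.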
Example Configuration
The example configuration used in this article consists of a two-node RAC with a clustered database named racdb.idevelopment.info running Oracle RAC 10g Release 2 on the Linux x86 platform. The two node names are racnode1 and racnode2, each hosting a single Oracle instance named racdb1 and racdb2 respectively. For a detailed guide on building the example clustered database environment, please see:
Building an Inexpensive Oracle RAC 10g Release 2 on Linux - (CentOS 5.3 / iSCSI)
The example Oracle Clusterware environment is configured with three mirrored voting disks and two mirrored OCR files all of which are located on an OCFS2 clustered file system. Note that the voting disk is owned by the oracle user in the oinstall group with 0644 permissions while the OCR file is owned by root in the oinstall group with 0640 permissions:
[oracle@racnode1 ~]$ ls -l /u02/oradata/racdb
total 39840
-rw-r--r-- 1 oracle oinstall  10240000 Oct  9 19:33 CSSFile
-rw-r--r-- 1 oracle oinstall  10240000 Oct  9 19:36 CSSFile_mirror1
-rw-r--r-- 1 oracle oinstall  10240000 Oct  9 19:38 CSSFile_mirror2
drwxr-xr-x 2 oracle oinstall      3896 Aug 26 23:45 dbs
-rw-r----- 1 root   oinstall 268644352 Oct  9 19:27 OCRFile
-rw-r----- 1 root   oinstall 268644352 Oct  9 19:28 OCRFile_mirror
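Because root.sh re-creates these files during the recovery, it can be handy to confirm afterward that they ended up with the ownership and permissions shown above (oracle:oinstall 0644 for the voting disks, root:oinstall 0640 for the OCR). The function below is a hypothetical sketch; it assumes GNU coreutils stat, as found on Linux, for the -c format option.

```shell
# check_owner_mode FILE OWNER GROUP MODE  (hypothetical helper)
# Succeeds if FILE is owned by OWNER:GROUP with octal permissions MODE
# (written without a leading zero, e.g. 640). Otherwise prints what was found
# to stderr and fails. Assumes GNU stat (-c with %U, %G and %a).
check_owner_mode() (
    file=$1 want_owner=$2 want_group=$3 want_mode=$4
    actual=$(stat -c '%U %G %a' "$file") || exit 2
    if [ "$actual" = "$want_owner $want_group $want_mode" ]; then
        exit 0
    fi
    printf '%s: expected %s %s %s, found %s\n' \
        "$file" "$want_owner" "$want_group" "$want_mode" "$actual" >&2
    exit 1
)
```

For this configuration the checks would be check_owner_mode /u02/oradata/racdb/OCRFile root oinstall 640 and check_owner_mode /u02/oradata/racdb/CSSFile oracle oinstall 644, repeated for each mirror.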
Check Current OCR File
[oracle@racnode1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4676
         Available space (kbytes) :     257444
         ID                       : 1513888898
         Device/File Name         : /u02/oradata/racdb/OCRFile
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/oradata/racdb/OCRFile_mirror
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded
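With mirrored OCR files it helps to see the per-device result and not only the final summary line. As a sketch, the hypothetical function below pairs each Device/File Name line in ocrcheck output with the last word of the integrity-check line that follows it. It assumes the layout shown above; the exact wording ocrcheck uses for a failed device may differ, so treat anything other than "succeeded" as worth investigating.

```shell
# ocr_device_status  (hypothetical helper, not an Oracle tool)
# Reads ocrcheck output on stdin and prints "DEVICE RESULT" for each device,
# where RESULT is the last word of the integrity-check line after it.
ocr_device_status() {
    awk '
        /Device\/File Name/ {
            dev = $0
            sub(/^[^:]*:[ \t]*/, "", dev)   # drop the label up to the first colon
            next
        }
        /Device\/File integrity check/ && dev != "" {
            print dev, $NF
            dev = ""
        }
    '
}
```

Usage: ocrcheck | ocr_device_status. For the output above it would print each OCR path followed by "succeeded".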
Check Current Voting Disk
[oracle@racnode1 ~]$ crsctl query css votedisk
 0.     0    /u02/oradata/racdb/CSSFile
 1.     0    /u02/oradata/racdb/CSSFile_mirror1
 2.     0    /u02/oradata/racdb/CSSFile_mirror2

located 3 votedisk(s).
Network Settings
Oracle RAC Node 1 - (racnode1)

Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.151  255.255.255.0  192.168.1.1  Connects racnode1 to the public network
eth1    192.168.2.151  255.255.255.0               Connects racnode1 to iSCSI shared storage (Openfiler).
eth2    192.168.3.151  255.255.255.0               Connects racnode1 (interconnect) to racnode2 (racnode2-priv)

/etc/hosts
127.0.0.1        localhost.localdomain localhost

# Public Network - (eth0)
192.168.1.151    racnode1
192.168.1.152    racnode2

# Network Storage - (eth1)
192.168.2.151    racnode1-san
192.168.2.152    racnode2-san

# Private Interconnect - (eth2)
192.168.3.151    racnode1-priv
192.168.3.152    racnode2-priv

# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.251    racnode1-vip
192.168.1.252    racnode2-vip

# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv
Oracle RAC Node 2 - (racnode2)

Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.152  255.255.255.0  192.168.1.1  Connects racnode2 to the public network
eth1    192.168.2.152  255.255.255.0               Connects racnode2 to iSCSI shared storage (Openfiler).
eth2    192.168.3.152  255.255.255.0               Connects racnode2 (interconnect) to racnode1 (racnode1-priv)

/etc/hosts
127.0.0.1        localhost.localdomain localhost

# Public Network - (eth0)
192.168.1.151    racnode1
192.168.1.152    racnode2

# Network Storage - (eth1)
192.168.2.151    racnode1-san
192.168.2.152    racnode2-san

# Private Interconnect - (eth2)
192.168.3.151    racnode1-priv
192.168.3.152    racnode2-priv

# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.251    racnode1-vip
192.168.1.252    racnode2-vip

# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv
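The public, private interconnect and VIP names above need to resolve on every node; the root.sh output later in this article, for example, lists each node together with its -priv interconnect name. A quick sketch for catching a missing entry is the hypothetical function below, which only inspects a hosts file and ignores commented-out text. It does not consult DNS or any other name service.

```shell
# missing_host_entries HOSTS_FILE NAME...  (hypothetical helper)
# Prints each NAME that does not appear as a hostname (field 2 onward) on any
# non-comment part of a line in HOSTS_FILE.
missing_host_entries() (
    hosts_file=$1
    shift
    for name in "$@"; do
        if ! awk -v n="$name" '
                { sub(/#.*/, "") }                       # strip comments
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit (found ? 0 : 1) }
            ' "$hosts_file"; then
            printf '%s\n' "$name"
        fi
    done
)
```

Usage on either node: missing_host_entries /etc/hosts racnode1 racnode2 racnode1-priv racnode2-priv racnode1-vip racnode2-vip should print nothing.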
Recover Corrupt/Missing OCR
To describe the steps required in recovering the OCR, it is assumed the current OCR has been accidentally deleted and no viable backups are available. It is also assumed the CRS stack was up and running on both nodes in the cluster at the time the OCR files were removed:
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile
[root@racnode1 ~]# rm /u02/oradata/racdb/OCRFile_mirror

[root@racnode1 ~]# ps -ef | grep d.bin | grep -v grep
root      548 27171  0 Oct09 ?  00:06:17 /u01/app/crs/bin/crsd.bin reboot
oracle    575   566  0 Oct09 ?  00:00:10 /u01/app/crs/bin/evmd.bin
root     1118   660  0 Oct09 ?  00:00:00 /u01/app/crs/bin/oprocd.bin run -t 1000 -m 500 -f
oracle   1277   749  0 Oct09 ?  00:03:31 /u01/app/crs/bin/ocssd.bin

[root@racnode2 ~]# ps -ef | grep d.bin | grep -v grep
oracle    674   673  0 Oct09 ?  00:00:10 /u01/app/crs/bin/evmd.bin
root      815 27760  0 Oct09 ?  00:06:12 /u01/app/crs/bin/crsd.bin reboot
root     1201   827  0 Oct09 ?  00:00:00 /u01/app/crs/bin/oprocd.bin run -t 1000 -m 500 -f
oracle   1442   891  0 Oct09 ?  00:03:43 /u01/app/crs/bin/ocssd.bin
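The same check is useful in the other direction, since the clean-up steps require that crsd.bin, ocssd.bin and evmd.bin are no longer running. The hypothetical helper below is a small sketch that reads a process listing on stdin (ps -ef or ps -e) and prints which of those three daemons appear. It matches on the program name, so full paths such as /u01/app/crs/bin/crsd.bin are recognized; a command line that merely mentions one of the names (a grep, for instance) is counted as well.

```shell
# running_crs_daemons  (hypothetical helper, not an Oracle tool)
# Reads a process listing on stdin and prints each of crsd.bin, ocssd.bin and
# evmd.bin that appears in it, one per line. Prints nothing if none appear.
running_crs_daemons() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/.*\//, "", name)   # keep only the part after the last slash
                seen[name] = 1
            }
        }
        END {
            n = split("crsd.bin ocssd.bin evmd.bin", daemons, " ")
            for (j = 1; j <= n; j++)
                if (daemons[j] in seen)
                    print daemons[j]
        }
    '
}
```

Usage: ps -ef | running_crs_daemons. After crsctl stop crs has completed on a node, it should print nothing there.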
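On an 11.2 Grid Infrastructure home, the rootdelete.sh and rootdeinstall.sh scripts are found in the crs/utl directory: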
qpass-test-rac-1.sea1@oracle [+ASM1] $ pwd
/opt/app/oragrid/11.2.0/crs/utl
qpass-test-rac-1.sea1@oracle [+ASM1] $ ls -rlt *.sh
-rw-r--r-- 1 root root 1279 Oct 28 2013 cmdllroot.sh
-rw-r--r-- 1 root root 8640 Oct 28 2013 crswrap.sh
-rw-r--r-- 1 root root 505 Oct 28 2013 diagcollection.sh
-rw-r--r-- 1 root root 6499 Oct 28 2013 gsd.sh
-rw-r--r-- 1 root root 5374 Oct 28 2013 preupdate.sh
-rwxr-xr-x 1 root root 4574 Oct 28 2013 rootaddnode.sh
-rwxr-xr-x 1 root root 5126 Oct 28 2013 rootdeinstall.sh
-rwxr-xr-x 1 root root 5922 Oct 28 2013 rootdelete.sh
-rwxr-xr-x 1 root root 1846 Oct 28 2013 rootdeletenode.sh
qpass-test-rac-1.sea1@oracle [+ASM1] $
The "OCR initialization failed accessing OCR device" and PROC-26 errors reported by rootdelete.sh can be safely ignored, given that the OCR is not available. The most important action is that the SCR entries are cleaned up.
Keep in mind that if you have more than two nodes in your cluster, you need to run rootdelete.sh on all other nodes as well.
- Run rootdeinstall.sh from the Primary Node.
The primary node is the node from which the Oracle Clusterware installation was performed (typically node1). For the purpose of this example, I originally installed Oracle Clusterware from the machine racnode1, which is therefore the primary node.
The rootdeinstall.sh script will clear out any old data from a raw storage device in preparation for the new OCR. If the OCR is on a clustered file system, a new OCR file(s) will be created with null data.
[root@racnode1 ~]# $ORA_CRS_HOME/install/rootdeinstall.sh
Removing contents from OCR mirror device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.0513806 seconds, 204 MB/s
Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.0443477 seconds, 236 MB/s
- Run root.sh from the Primary Node (the same node as above).
Among several other tasks, this script will create the OCR and voting disk(s).
[root@racnode1 ~]# $ORA_CRS_HOME/root.sh
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node:
node 1: racnode1 racnode1-priv racnode1
node 2: racnode2 racnode2-priv racnode2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /u02/oradata/racdb/CSSFile
Now formatting voting device: /u02/oradata/racdb/CSSFile_mirror1
Now formatting voting device: /u02/oradata/racdb/CSSFile_mirror2
Format of 3 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        racnode1
CSS is inactive on these nodes.
        racnode2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.

- Run root.sh from All Remaining Nodes.
Oracle 10.2.0.1 users should note that running root.sh on the last node will fail. Most notable is the silent-mode VIPCA configuration failing because of BUG 4437727 in 10.2.0.1. Refer to my article Building an Inexpensive Oracle RAC 10g Release 2 on Linux - (CentOS 5.3 / iSCSI) for a workaround to these errors.
[root@racnode2 ~]# $ORA_CRS_HOME/root.sh
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node:
node 1: racnode1 racnode1-priv racnode1
node 2: racnode2 racnode2-priv racnode2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        racnode1
        racnode2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps

Creating VIP application resource on (2) nodes...
Creating GSD application resource on (2) nodes...
Creating ONS application resource on (2) nodes...
Starting VIP application resource on (2) nodes...
Starting GSD application resource on (2) nodes...
Starting ONS application resource on (2) nodes...

Done.
The Oracle Clusterware and Oracle RAC software in my configuration were patched to 10.2.0.4 and therefore did not encounter any errors while running root.sh on the last node.
- Configure Server-Side ONS using racgons.
CRS_home/bin/racgons add_config hostname1:port hostname2:port
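The add_config arguments are simply one hostname:port pair per node. For clusters with more nodes, a hypothetical helper like the one below can build that list from the node names and a single ONS port. It is a sketch that assumes every node uses the same port, as in this example (6200).

```shell
# ons_config_args PORT NODE...  (hypothetical helper)
# Prints "NODE1:PORT NODE2:PORT ..." for use with racgons add_config.
# Fails if no nodes are given or PORT is not a plain decimal number.
ons_config_args() (
    [ $# -ge 2 ] || { echo "usage: ons_config_args PORT NODE..." >&2; exit 1; }
    port=$1
    shift
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; exit 1 ;;
    esac
    out=
    for node in "$@"; do
        out="$out${out:+ }$node:$port"
    done
    printf '%s\n' "$out"
)
```

Usage: $ORA_CRS_HOME/bin/racgons add_config $(ons_config_args 6200 racnode1 racnode2). The command substitution is deliberately left unquoted so each pair becomes a separate argument.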
[root@racnode1 ~]# $ORA_CRS_HOME/bin/racgons add_config racnode1:6200 racnode2:6200
[root@racnode1 ~]# $ORA_CRS_HOME/bin/onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = racnode1, port = 6200}
Adding remote host racnode1:6200
onscfg[1]
   {node = racnode2, port = 6200}
Adding remote host racnode2:6200
ons is running ...

- Configure Network Interfaces for Clusterware.
Log in as the owner of the Oracle Clusterware software, which is typically the oracle user account, and configure all network interfaces. The first step is to identify the current interfaces and IP addresses using oifcfg iflist. As discussed in the network settings section, eth0/192.168.1.0 is my public interface/network, eth1/192.168.2.0 is my iSCSI storage network and is not used specifically by Oracle Clusterware, and eth2/192.168.3.0 is the cluster_interconnect interface/network.
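Once the setif commands have been run, oifcfg getif prints one line per registered interface with its role in the last column. As a sketch (both function names are hypothetical), the helpers below extract interfaces by role and check that both a public and a cluster_interconnect interface are defined, the two roles this cluster relies on.

```shell
# interfaces_with_role ROLE  (hypothetical helper)
# Reads "oifcfg getif" output on stdin and prints "interface subnet" for each
# line whose last field equals ROLE (for example public or cluster_interconnect).
interfaces_with_role() {
    awk -v role="$1" '$NF == role { print $1, $2 }'
}

# has_public_and_interconnect FILE  (hypothetical helper)
# Succeeds only if the saved "oifcfg getif" output in FILE assigns at least
# one interface to public and at least one to cluster_interconnect.
has_public_and_interconnect() {
    [ -n "$(interfaces_with_role public < "$1")" ] &&
        [ -n "$(interfaces_with_role cluster_interconnect < "$1")" ]
}
```

Usage: $ORA_CRS_HOME/bin/oifcfg getif > /tmp/getif.out && has_public_and_interconnect /tmp/getif.out.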
[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/oifcfg iflist
eth0  192.168.1.0   <-- public interface
eth1  192.168.2.0   <-- not used
eth2  192.168.3.0   <-- cluster interconnect

[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.1.0:public
[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/oifcfg setif -global eth2/192.168.3.0:cluster_interconnect
[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/oifcfg getif
eth0  192.168.1.0  global  public
eth2  192.168.3.0  global  cluster_interconnect

- Add TNS Listener using NETCA.
As the Oracle Clusterware software owner (typically oracle), add a cluster TNS listener configuration to the OCR using netca. netca may report errors if listener.ora already contains the listener entries. If this is the case, move listener.ora to /tmp from $ORACLE_HOME/network/admin (or from the $TNS_ADMIN directory if the TNS_ADMIN environment variable is defined) and then run netca. Add all the listeners that were created during the original Oracle Clusterware software installation.
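The listener.ora move described above can be wrapped in a small function so it is done the same way on every node. This is a hypothetical sketch: it uses $TNS_ADMIN when set and otherwise $ORACLE_HOME/network/admin, moves listener.ora to listener.ora.original in a destination directory (default /tmp, matching the commands that follow), and refuses to overwrite an earlier copy.

```shell
# stash_listener_ora [DEST_DIR]  (hypothetical helper)
# Moves listener.ora out of the active network admin directory before netca
# is run. Source: $TNS_ADMIN if set, else $ORACLE_HOME/network/admin.
# Destination: DEST_DIR/listener.ora.original (DEST_DIR defaults to /tmp).
# Prints the new path on success.
stash_listener_ora() (
    if [ -n "${TNS_ADMIN:-}" ]; then
        src_dir=$TNS_ADMIN
    elif [ -n "${ORACLE_HOME:-}" ]; then
        src_dir=$ORACLE_HOME/network/admin
    else
        echo "neither TNS_ADMIN nor ORACLE_HOME is set" >&2
        exit 1
    fi
    dest=${1:-/tmp}/listener.ora.original
    if [ ! -f "$src_dir/listener.ora" ]; then
        echo "no listener.ora in $src_dir; nothing to move" >&2
        exit 0
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists; not overwriting it" >&2
        exit 1
    fi
    mv "$src_dir/listener.ora" "$dest" && printf '%s\n' "$dest"
)
```

Run it as the oracle user on each node (stash_listener_ora with no argument targets /tmp) before starting netca.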
[oracle@racnode1 ~]$ export DISPLAY=:0
[oracle@racnode1 ~]$ mv $TNS_ADMIN/listener.ora /tmp/listener.ora.original
[oracle@racnode2 ~]$ mv $TNS_ADMIN/listener.ora /tmp/listener.ora.original
[oracle@racnode1 ~]$ netca &

- Add all Resources Back to OCR using srvctl.
As a final step, log in as the Oracle Clusterware software owner (typically oracle) and add all resources back to the OCR using the srvctl command.
Please ensure that these commands are not run as the root user account.
Add ASM INSTANCE(S) to OCR:

srvctl add asm -n <node_name> -i <asm_instance_name> -o <oracle_home>

[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add asm -i +ASM1 -n racnode1 -o /u01/app/oracle/product/10.2.0/db_1
[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add asm -i +ASM2 -n racnode2 -o /u01/app/oracle/product/10.2.0/db_1

Add DATABASE to OCR:

srvctl add database -d <db_unique_name> -o <oracle_home>

[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add database -d racdb -o /u01/app/oracle/product/10.2.0/db_1

Add INSTANCE(S) to OCR:

srvctl add instance -d <db_unique_name> -i <instance_name> -n <node_name>

[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d racdb -i racdb1 -n racnode1
[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d racdb -i racdb2 -n racnode2

Add SERVICE(S) to OCR:

srvctl add service -d <db_unique_name> -s <service_name> -r <preferred_instance_list> -P <TAF_policy>
where TAF_policy is set to NONE, BASIC, or PRECONNECT

[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/srvctl add service -d racdb -s racdb_srvc -r racdb1,racdb2 -P BASIC
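With more instances the srvctl add instance commands become repetitive. The hypothetical helpers below are a dry-run sketch: they only print commands (they never run srvctl), so the output can be reviewed and then executed as the oracle user, not as root. The INSTANCE:NODE notation is an assumption of this sketch, not srvctl syntax, and the printed commands rely on srvctl being on the PATH instead of using $ORA_CRS_HOME/bin/srvctl.

```shell
# print_instance_adds DB INSTANCE:NODE...  (hypothetical helper, dry run)
# Prints one "srvctl add instance" command per INSTANCE:NODE pair.
print_instance_adds() (
    [ $# -ge 2 ] || { echo "usage: print_instance_adds DB INSTANCE:NODE..." >&2; exit 1; }
    db=$1
    shift
    for pair in "$@"; do
        inst=${pair%%:*}    # text before the first colon
        node=${pair#*:}     # text after the first colon
        if [ "$inst" = "$pair" ] || [ -z "$inst" ] || [ -z "$node" ]; then
            echo "bad INSTANCE:NODE pair: $pair" >&2
            exit 1
        fi
        printf 'srvctl add instance -d %s -i %s -n %s\n' "$db" "$inst" "$node"
    done
)

# preferred_list INSTANCE:NODE...  (hypothetical helper)
# Prints the instance names joined by commas, the form used by -r above.
preferred_list() (
    out=
    for pair in "$@"; do
        out="$out${out:+,}${pair%%:*}"
    done
    printf '%s\n' "$out"
)
```

For this cluster, print_instance_adds racdb racdb1:racnode1 racdb2:racnode2 prints the two add instance commands shown above, and preferred_list racdb1:racnode1 racdb2:racnode2 prints racdb1,racdb2.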
After completing the steps above, the OCR should have been successfully recreated. Bring up all of the resources that were added to the OCR and run cluvfy to verify the cluster configuration.
[oracle@racnode1 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.racdb.db   application    OFFLINE   OFFLINE
ora....b1.inst application    OFFLINE   OFFLINE
ora....b2.inst application    OFFLINE   OFFLINE
ora....srvc.cs application    OFFLINE   OFFLINE
ora....db1.srv application    OFFLINE   OFFLINE
ora....db2.srv application    OFFLINE   OFFLINE
ora....SM1.asm application    OFFLINE   OFFLINE
ora....E1.lsnr application    ONLINE    ONLINE    racnode1
ora....de1.gsd application    ONLINE    ONLINE    racnode1
ora....de1.ons application    ONLINE    ONLINE    racnode1
ora....de1.vip application    ONLINE    ONLINE    racnode1
ora....SM2.asm application    OFFLINE   OFFLINE
ora....E2.lsnr application    ONLINE    ONLINE    racnode2
ora....de2.gsd application    ONLINE    ONLINE    racnode2
ora....de2.ons application    ONLINE    ONLINE    racnode2
ora....de2.vip application    ONLINE    ONLINE    racnode2

[oracle@racnode1 ~]$ srvctl start asm -n racnode1
[oracle@racnode1 ~]$ srvctl start asm -n racnode2
[oracle@racnode1 ~]$ srvctl start database -d racdb
[oracle@racnode1 ~]$ srvctl start service -d racdb

[oracle@racnode1 ~]$ cluvfy stage -post crsinst -n racnode1,racnode2

Performing post-checks for cluster services setup

Checking node reachability...
Node reachability check passed from node "racnode1".

Checking user equivalence...
User equivalence check passed for user "oracle".

Checking Cluster manager integrity...

Checking CSS daemon...
Daemon status check passed for "CSS daemon".

Cluster manager integrity check passed.

Checking cluster integrity...

Cluster integrity check passed

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Checking CRS integrity...

Checking daemon liveness...
Liveness check passed for "CRS daemon".

Checking daemon liveness...
Liveness check passed for "CSS daemon".

Checking daemon liveness...
Liveness check passed for "EVM daemon".

Checking CRS health...
CRS health check passed.

CRS integrity check passed.

Checking node application existence...

Checking existence of VIP node application (required)
Check passed.

Checking existence of ONS node application (optional)
Check passed.

Checking existence of GSD node application (optional)
Check passed.

Post-check for cluster services setup was successful.
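Once the start commands have been issued, re-running crs_stat -t should show every resource ONLINE. A hypothetical filter such as the one below lists anything that is not, assuming the Name/Type/Target/State/Host layout shown above. Note that crs_stat -t truncates long resource names (ora....b1.inst, for example); running crs_stat without -t shows the full names.

```shell
# not_online_resources  (hypothetical helper, not an Oracle tool)
# Reads "crs_stat -t" output on stdin and prints the name of every resource
# whose State column (field 4) is not ONLINE. The first two lines (column
# headings and the dashed separator) are skipped.
not_online_resources() {
    awk 'NR > 2 && NF >= 4 && $4 != "ONLINE" { print $1 }'
}
```

Usage: $ORA_CRS_HOME/bin/crs_stat -t | not_online_resources prints nothing once the whole cluster is up.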
====================================================================
FIX
If none of the steps documented above can be used to restore the file that was accidentally deleted or is corrupted, then the following steps can be used to re-create/reinstantiate these files. The following steps require complete downtime on all the nodes.
- Shut down the Oracle Clusterware stack on all the nodes using the command crsctl stop crs as the root user.
- Back up the entire Oracle Clusterware home.
- Execute <CRS_HOME>/install/rootdelete.sh on all nodes.
- Execute <CRS_HOME>/install/rootdeinstall.sh on the node which is supposed to be the first node.
- The following commands should return nothing:
  - ps -e | grep -i 'ocs[s]d'
  - ps -e | grep -i 'cr[s]d.bin'
  - ps -e | grep -i 'ev[m]d.bin'
- Execute <CRS_HOME>/root.sh on the first node.
- After root.sh has run successfully on the first node, execute root.sh on the rest of the nodes of the cluster.
- For 10gR2, use racgons; for 11g, use the onsconfig command. Using onsconfig stops and starts ONS so the changes take effect, while racgons does not, so the changes won’t take effect until ONS is restarted on all nodes. Examples for each are provided below.
For 10g
Execute as the owner (generally oracle) of CRS_HOME the command
<CRS_HOME>/bin/racgons add_config hostname1:port hostname2:port
$ /u01/crs/bin/racgons add_config halinux1:6251 halinux2:6251
For 11g
Execute as the owner (generally oracle) of CRS_HOME the command
<CRS_HOME>/install/onsconfig add_config hostname1:port hostname2:port
$ /u01/crs/install/onsconfig add_config halinux1:6251 halinux2:6251
- Execute as the owner of CRS_HOME (generally oracle)
<CRS_HOME>/bin/oifcfg setif -global (please review Note 283684.1 for details)
$ /u01/crs/bin/oifcfg setif -global eth0/192.168.0.0:cluster_interconnect eth1/10.35.140.0:public
- Add the listener using netca. netca may report errors if listener.ora already contains the listener entries. If this is the case, move listener.ora to /tmp from $ORACLE_HOME/network/admin (or from the $TNS_ADMIN directory if the TNS_ADMIN environment variable is defined) and then run netca. Add all the listeners that were added earlier.
- Add the ASM and database resources to the OCR using the appropriate srvctl add database command as the user who owns the ASM and database resources. Please ensure that this is not run as the root user.
- Add the instances and services using the appropriate srvctl add commands. Please refer to the documentation for the exact commands.
- Execute cluvfy stage -post crsinst -n node1,node2, replacing node1,node2 with the node names of the cluster.
==============================================================
(OR)
========================================================================================
Recreate OCR/Voting Disk Accidentally Deleted
The goal of this post is to help DBAs who have accidentally deleted the OCR, the voting disk, or other files that are required for the operation of Oracle Clusterware.
- OCR
- If the OCR has been deleted, then check if the OCR mirror is OK and vice versa. It may be prudent to use the OCR mirror to create the OCR.
- If the OCR mirror and OCR have been deleted, then it may be faster to restore the OCR using the OCR backups.
- Voting Disk
- If there are multiple voting disks and one was accidentally deleted, check whether there are any backups of that voting disk. If there are no backups, a new one can be added using:
crsctl add css votedisk
- SCLS Directories
- These are internal-only directories created by root.sh. If they are accidentally removed, they can only be re-created using the steps given below.
- Socket files in /tmp/.oracle or /var/tmp/.oracle
- If these files are accidentally deleted, stop Oracle Clusterware on that node and restart it. This will re-create the socket files. If the socket files for cssd are deleted, the Oracle Clusterware stack may not come down, in which case the node has to be bounced.
Solution:
If none of the steps given above can be used to restore the file that was accidentally deleted or is corrupted, then the following steps can be used to re-create these files. The following steps require complete downtime:
- Shut down the Oracle Clusterware stack on all the nodes by running the following command as the root user:
# crsctl stop crs
- Back up the entire Oracle CRS home.
- Execute the script <CRS_HOME>/install/rootdelete.sh on all nodes.
- Execute <CRS_HOME>/install/rootdeinstall.sh from the first node only.
- To verify the deletion, execute the following commands; each should return nothing:
  - ps -e | grep -i 'ocssd'
  - ps -e | grep -i 'crsd.bin'
  - ps -e | grep -i 'evmd.bin'
- Execute <CRS_HOME>/root.sh on the first node.
- After root.sh has run successfully on the first node, execute root.sh on the rest of the nodes of the cluster.
- For 10gR2, use racgons; for 11g use onsconfig command. Using onsconfig stops and starts ONS so the changes take effect, while racgons doesn’t do that so the changes won’t take effect until ONS is restarted on all nodes. Examples for each are provided below:
For 10g
Execute as the CRS owner (generally grid/oracle) of CRS_HOME the command
$ <CRS_HOME>/bin/racgons add_config hostname1:port hostname2:port
Example: $ /u01/crs/bin/racgons add_config halinux1:6251 halinux2:6251
For 11g
Execute as the owner (generally oracle) of CRS_HOME the command
$ <CRS_HOME>/install/onsconfig add_config hostname1:port hostname2:port
Example: $ /u01/crs/install/onsconfig add_config halinux1:6251 halinux2:6251
- Execute as the owner (generally oracle) of CRS_HOME the command
$ <CRS_HOME>/bin/oifcfg setif -global
Example: $ /u01/crs/bin/oifcfg setif -global eth0/192.168.0.0:cluster_interconnect eth1/10.35.140.0:public
- Add the listener using netca. netca may report errors if listener.ora already contains the listener entries. If this is the case, rename listener.ora and then run netca. Add all the listeners that were added earlier.
- Add the ASM and database resources to the OCR using the srvctl add database command as the user who owns the ASM and database resources. Please ensure that this is not run as the root user.
- Add the instances and services using srvctl add commands.
- Execute cluvfy stage -post crsinst -n node1,node2
========================================================================================