ohasd.bin Core dump, Unable To Start HAS After Kernel Panic (Doc ID 1233023.1)


Oracle Server - Enterprise Edition - Version: and later   [Release: 11.2 and later ]
Information in this document applies to any platform.


On an Oracle Restart (single-instance) system with Grid Infrastructure and ASM on Solaris SPARC 5.10, the server crashed due to a kernel panic. After the server restarted, the HAS stack does not start and ohasd.bin keeps dumping core. As a result, ASM and the database cannot be started.



2010-09-09 12:06:16
Changing directory to /u01/app/grid/product/11.2.0/grid/log//ohasd
OHASD starting
2010-09-09 12:06:16
OHASD reboot2010-09-09 12:06:20
OHASD handling signal 6
Dumping OHASD state
2010-09-09 12:06:20
Dumping OHASD stack trace
2010-09-09 12:06:20

----- Call Stack Trace ----- (simplified version)

sskgds_getcall: WARNING! *** STACK TRACE TRUNCATED ***
sskgds_getcall: WARNING! *** UNREADABLE FRAME FOUND ***
sclssutl_sigdump()+ 628

A core dump is generated in the /var/core directory.

ohasd.log shows:

2010-09-09 12:06:16.447: [ default][1] OHASD Daemon Starting. Command string :reboot
2010-09-09 12:06:16.535: [ default][1] Initializing OLR
2010-09-09 12:06:17.269: [ CRSPE][34] PE MASTER NAME:
2010-09-09 12:06:17.269: [ CRSPE][34] Starting to read configuration
2010-09-09 12:06:17.289: [ CRSPE][34] Reading (1) servers
2010-09-09 12:06:17.349: [ CRSPE][34] DM: set global config version to: 58
2010-09-09 12:06:17.350: [ CRSPE][34] DM: set pool freeze timeout to: 60000
2010-09-09 12:06:17.350: [ CRSPE][34] DM: Set event seq number to: 200000
2010-09-09 12:06:17.350: [ CRSPE][34] DM: Set threshold event seq number to: 280000
2010-09-09 12:06:17.350: [ CRSPE][34] Sent request to write event sequence number 300000 to repository
2010-09-09 12:06:17.671: [ CRSPE][34] Wrote new event sequence to repository
2010-09-09 12:06:17.823: [ CRSPE][34] Reading (12) types
2010-09-09 12:06:17.851: [ CRSPE][34] Reading (1) server pools
2010-09-09 12:06:17.918: [ CRSPE][34] Reading (17) resources
2010-09-09 12:06:20.039: [ CRSPE][34] Finished reading configuration. Parsing...
2010-09-09 12:06:20.039: [ CRSPE][34] Parsing resource types...
2010-09-09 12:06:20.192: [ default][34] Dump State Starting ...
2010-09-09 12:06:20.193: [ CRSPE][34] Dumping PE Data Model...:DM has [0 resources][0 types][0 servers][0 spools]
------------- RESOURCES:

------------- TYPES:

------------- SERVERS:

------------- SERVER POOLS:

2010-09-09 12:06:20.193: [ CRSPE][34] Dumping ICE contents...:ICE operation count: 0
2010-09-09 12:06:20.193: [ default][34] Dump State Done.


The node rebooted due to a kernel panic.


The OLR was corrupted by the kernel panic.
The core dump occurs while the resource types are being parsed. If the OLR were intact, ohasd.log would show messages like:

2010-07-22 15:45:33.673: [ CRSPE][34] Parsing resource types...
2010-07-22 15:45:33.779: [ CRSPE][34] Resource Types parsed 
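
The check described above can be automated. The following is a minimal sketch, not an Oracle-supplied tool: the function name check_types_parsed is hypothetical, and it assumes the two message strings appear in ohasd.log exactly as quoted in this note. It looks at the most recent "Parsing resource types..." entry and reports whether a "Resource Types parsed" entry follows it.

```shell
#!/bin/sh
# Hypothetical helper (name and exit codes are assumptions for illustration).
# Usage: check_types_parsed /path/to/ohasd.log
# Exit status:
#   0 = the latest "Parsing resource types..." is followed by "Resource Types parsed"
#   1 = parsing started but never completed (the symptom described in this note)
#   2 = no "Parsing resource types..." entry found in the file
check_types_parsed() {
    awk '
        # A new parse attempt resets the "parsed" flag.
        /Parsing resource types\.\.\./ { started = 1; parsed = 0 }
        # Completion only counts if a parse attempt was seen before it.
        /Resource Types parsed/        { if (started) parsed = 1 }
        END {
            if (!started) exit 2
            exit (parsed ? 0 : 1)
        }
    ' "$1"
}
```

A return value of 1 for the current ohasd.log is consistent with the corrupted-OLR case shown here; it does not by itself prove that the OLR is the cause.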


Restore the OLR from the automatic backup as the grid user:

1. Locate automatic OLR backup:
ls -l $GRID_HOME/cdata//

2. Rename the current OLR file
mv $GRID_HOME/cdata/localhost/.olr $GRID_HOME/cdata/localhost/.olr.old

3. Touch the new OLR file:
touch $GRID_HOME/cdata/localhost/.olr

4. Restore the OLR:
ocrconfig -local -restore $GRID_HOME/cdata//backup_20100722_153230.olr

5. Start HAS stack:
crsctl start has

6. Check resource status:
crsctl stat res -t
Check whether ora.cssd and ora.diskmon are ONLINE; if not, start them manually:
crsctl start res ora.cssd

7. Depending on which resources (ASM/Listener/Database) are registered in the OLR, either start them or register them in the OLR with the srvctl command.
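
Steps 2 to 6 above can be sketched as a small shell wrapper. This is an illustrative sketch only: the names DRY_RUN, run, restore_olr and is_online are hypothetical, and the OLR file and backup file paths are passed as arguments because they are site-specific. The commands themselves are the ones listed in the steps. By default the wrapper only prints the commands; set DRY_RUN=0 to execute them. The is_online helper assumes that "crsctl stat res <name>" (without -t) prints attribute lines such as "STATE=ONLINE on <host>".

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute one command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_olr OLR_FILE BACKUP_FILE
#   OLR_FILE    : the current (corrupted) OLR file from step 2
#   BACKUP_FILE : the automatic backup located in step 1
restore_olr() {
    olr_file="$1"
    backup_file="$2"
    if [ -z "$olr_file" ] || [ -z "$backup_file" ]; then
        echo "usage: restore_olr OLR_FILE BACKUP_FILE" >&2
        return 2
    fi
    # Steps 2-5; the chain stops at the first command that fails.
    run mv "$olr_file" "$olr_file.old" &&
    run touch "$olr_file" &&
    run ocrconfig -local -restore "$backup_file" &&
    run crsctl start has
}

# Reads "crsctl stat res <name>" output on stdin and succeeds if the
# resource reports STATE=ONLINE (assumed attribute-style output format).
is_online() {
    grep -q '^STATE=ONLINE'
}
```

After a real run, step 6 could then be expressed as: crsctl stat res ora.cssd | is_online || crsctl start res ora.cssd. Reviewing the dry-run output before setting DRY_RUN=0 is a cheap safeguard, since the mv in step 2 changes the live OLR file.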

