TNS Listener Crash Troubleshooting Guide.

TNS Listener Crash Troubleshooting Guide (Doc ID 793844.1)


APPLIES TO:

Oracle Net Services - Version 9.2.0.2 to 11.2.0.4 [Release 9.2 to 11.2]
Information in this document applies to any platform.
***Checked for relevance on 7-MAY-2014*** 

PURPOSE

This document is intended to supply some commonly used steps for diagnosing TNS listener crashes.

TROUBLESHOOTING STEPS

1. Confirm the time of problem.

By checking the listener log, we can find the occurrences for a possible listener crash by searching on when the TNS listener has been started. Grep for process ID, will list PID. Which can then be cross referenced for time.
$ grep pid listener.log 

Started with pid=27316  Started with pid=13285 
Started with pid=9520
Now searching on those PID values :
$ vi listener.log
/27316 

24-FEB-2009 01:07:30 * (CONNECT_DATA=(SID=ORCL)(CID=(PROGRAM=perl)(HOST=CLIENT)(USER=SCOTT ))) * (ADDRESS=(PROTOCOL=tcp)(HOST=SERVER)(PORT=59629)) * establish * ORCL * 0 
24-FEB-2009 01:07:31 * service_update * ORCL * 0 
24-FEB-2009 01:07:58 * service_update * ORCL * 0 

TNSLSNR for Solaris: Version 10.2.0.3.0 - Production on 24-FEB-2009 08:37:30 
Copyright (c) 1991, 2006, Oracle. All rights reserved. 

System parameter file is /oracle/ORCL/product/10.2/network/admin/listener.ora 
Log messages written to /oracle/ORCL/product/10.2/network/log/listener.log 
Trace information written to /oracle/ORCL/listener_trace/listener.trc 
Trace level is currently 16 

Started with pid=27316 

/13285 
26-FEB-2009 01:06:58 * (CONNECT_DATA=(SID=ORCL)(CID=(PROGRAM=perl)(HOST=CLIENT)(USER=SCOTT))) * (ADDRESS=(PROTOCOL=tcp)(HOST=SERVER)(PORT=34248)) * establish * ORCL * 0 
26-FEB-2009 01:07:15 * service_update * ORCL * 0 

TNSLSNR for Solaris: Version 10.2.0.3.0 - Production on 26-FEB-2009 07:10:01 
Copyright (c) 1991, 2006, Oracle. All rights reserved. 

System parameter file is /oracle/ORCL/product/10.2/network/admin/listener.ora 
Log messages written to /oracle/ORCL/product/10.2/network/log/listener.log 
Trace information written to /oracle/ORCL/listener_trace/listener.trc 
Trace level is currently 16 

Started with pid=13285

/9520

27-FEB-2009 01:07:50 * (CONNECT_DATA=(SID=ORCL)(CID=(PROGRAM=perl)(HOST=CLIENT)(USER=SCOTT))) * (ADDRESS=(PROTOCOL=tcp)(HOST=SERVER)(PORT=39582)) * establish * ORCL * 0 
27-FEB-2009 01:07:52 * service_update * ORCL * 0 

TNSLSNR for Solaris: Version 10.2.0.3.0 - Production on 27-FEB-2009 07:07:53 
Copyright (c) 1991, 2006, Oracle. All rights reserved. 

System parameter file is /oracle/ORCL/product/10.2/network/admin/listener.ora 
Log messages written to /oracle/ORCL/product/10.2/network/log/listener_dlngsd01.log 
Trace information written to /oracle/ORCL/listener_trace/listener.trc 
Trace level is currently 16 

Started with pid=9520
Review the listener.log and determine what the last action/command reported?  Check for commands being run or issued against the TNS listener. 
The above example helps establish a pattern that, before each start up, the last activity is around 01:07:xx. After this nothing is seen untill the TNS listener is re-started.

2.Tracing the TNS listener.

A tail of the trace file would show the location of the last recorded activity for the TNS listener. Matching the time stamp we can see the last entries before the crash. After this listener the start up should be seen, if TNS listener tracing has been left active.
[27-FEB-2009 01:07:52:406] nsglhe: Event on cxd 0x1002f0e40. 
[27-FEB-2009 01:07:52:406] nsglrohe: entry 
[27-FEB-2009 01:07:52:406] nsevmute: entry 
[27-FEB-2009 01:07:52:406] nsevmute: cid=5 
[27-FEB-2009 01:07:52:406] nsevmute: normal exit 
[27-FEB-2009 01:07:52:406] nsglrohe: exit 
[27-FEB-2009 01:07:52:406] nsglhe: exit 
[27-FEB-2009 01:07:52:406] nsevwait: entry 
[27-FEB-2009 01:07:52:406] nsevwait: 4 registered connection(s) 
[27-FEB-2009 01:07:52:406] nsevwait: 0 pre-posted event(s) 
[27-FEB-2009 01:07:52:406] nsevwait: waiting for transport event (1 thru 5)..
The function calls can be searched on to confirm the last actions of the TNS listener.

3.Truss / Strace of TNS listener process.

Start the Operating system trace before problem (1:07:xx). The output shows the last information the process receives, in this case a SIGTERM
9162: sigaction(SIGUSR1, 0xFFFFFFFF7FFFE9F0, 0xFFFFFFFF7FFFEAE8) = 0 
9162: sigaction(SIGUSR2, 0xFFFFFFFF7FFFEAA0, 0xFFFFFFFF7FFFEB98) = 0 
9162: _exit(0) 
25560/1: pollsys(0x1002BB6B0, 6, 0x00000000, 0x00000000) (sleeping...) 
25560/1: Received signal #15, SIGTERM, in pollsys() [default] 
25560/1: siginfo: SIGTERM pid=9255 uid=0 
25560/1: pollsys(0x1002BB6B0, 6, 0x00000000, 0x00000000) Err#4 EINTR 
This is a another process / program removing the TNS Listener process.

Look for a job or process running around the same time period.  Specifically one that matches the time pattern when the listener's last activity is logged.   That could be a scheduled backup or software that could be probing the listener port, Cron job, etc.

A Windows TNS listener crash could be diagnosed using ADplus.  That output can help to diagnose the fault.
Note 763301.1 How To Take Crash And Hang Dump Using ADPlus Utility 
Note 357756.1 How to obtain usable stack on Windows 2000 server - ADPlus
4. Ensure to review operation system and third party software error logs for issues, commands, failures, etc being run around the same time as the problem is seen. For example, if Veritas is used, check VCS agent logs for any indication the listener was killed.  Proof of resource exhaustion may be required by running O/S watcher and analyzing the output.

5. Check for core dump files that match the timestamp of the crash.  Customers who experience a core dump should extract a stack file prior to contacting support by following Note 1812.1 TECH: Getting a Stack Trace from a CORE file

6. Run a second TNS listener and confirm if both primary and secondary fail at the same time or test with TNS listener on another port. Does the issue reproduce using any port?

7.Check operating system logs for errors that match the time of the crash.

8. Check for software such as Antivirus, download accelators, network probing that could be affecting the listener by checking the port.  Remove and/or disable any software and confirm if the problem reproduces.

9. RAC, Often reported on RAC is TNS listener crash.  Review of the listener log shows infact the command "stop":
No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.1.8.198)(PORT=1521)))
No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.1.8.195)(PORT=1521)))
No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
13-JUL-2010 08:02:21 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=NodeA)(USER=oracle))(COMMAND=stop)(ARGUMENTS=64)(SERVICE=LISTENER_NodeA)(VERSION=169870080)) * stop * 0
Checking the CRSD log file shows the VIP address going offline 
2010-07-13 08:01:00.588: [ CRSAPP][456] CheckResource error for NodeA.vip error code = 1
2010-07-13 08:01:00.675: [ CRSRES][456] In stateChanged, NodeA.vip target is ONLINE
2010-07-13 08:01:00.678: [ CRSRES][456] NodeA.vip on NodeA went OFFLINE unexpectedly
2010-07-13 08:01:00.679: [ CRSRES][456] StopResource: setting CLI values
2010-07-13 08:01:01.148: [ CRSRES][456] Attempting to stop `NodeA.vip` on member `NodeA`
2010-07-13 08:01:01.935: [ CRSRES][456] Stop of `NodeA.vip` on member `NodeA` succeeded.
2010-07-13 08:01:01.936: [ CRSRES][456] NodeA.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2010-07-13 08:01:01.959: [ CRSRES][456] NodeA.vip failed on NodeA relocating.
2010-07-13 08:01:04.529: [ CRSRES][456] StopResource: setting CLI values
VIP going offline, will stop the TNS listener, See the following:  Note: 985170.1 The listener stops or goes offline randomly due to VIP issue 
Another RAC example, you may see listener log reporting STOP command
No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN2)))
No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.25.1.61)(PORT=1521)))
Listener completed notification to CRS on stop
12-APR-2011 00:44:03 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=askasfsora01)(USER=grid))(COMMAND=stop)(ARGUMENTS=64)(SERVICE=LISTENER_SCAN2)(VERSION=186647040)) * stop * 0
Tue Apr 12 08:38:23 2011
And the CRSD log showing network resouce failure.
2011-04-12 00:43:08.954
[/oraclegrid/product/11.2.0/grid/bin/orarootagent.bin(635026)]CRS-5818:Aborted command 'check for resource: ora.drivers.acfs 1 1' for resource 'ora.drivers.acfs'. Details at (:CRSAGF00113:) {0:0:2} in /oraclegrid/product/11.2.0/grid/log/askasfsora01/agent/ohasd/orarootagent_root/orarootagent_root.log.
2011-04-12 00:43:09.046
[/oraclegrid/product/11.2.0/grid/bin/orarootagent.bin(635026)]CRS-5014:Agent "/oraclegrid/product/11.2.0/grid/bin/orarootagent.bin" timed out starting process "/oraclegrid/product/11.2.0/grid/bin/acfsload" for action "check": details at "(:CLSN00009:)" in "/oraclegrid/product/11.2.0/grid/log/askasfsora01/agent/ohasd/orarootagent_root/orarootagent_root.log"
2011-04-12 00:43:19.079
[/oraclegrid/product/11.2.0/grid/bin/orarootagent.bin(3358952)]CRS-5818:Aborted command 'check for resource: ora.net1.network askasfsora01 1' for resource 'ora.net1.network'. Details at (:CRSAGF00113:) {0:9:2} in /oraclegrid/product/11.2.0/grid/log/askasfsora01/agent/crsd/orarootagent_root/orarootagent_root.log.
2011-04-12 00:43:34.346
[/oraclegrid/product/11.2.0/grid/bin/scriptagent.bin(2527332)]CRS-5818:Aborted command 'check for resource: ora.oc4j 1 1' for resource 'ora.oc4j'. Details at (:CRSAGF00113:) {1:36840:2} in /oraclegrid/product/11.2.0/grid/log/askasfsora01/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2011-04-12 00:44:03.840
[/oraclegrid/product/11.2.0/grid/bin/oraagent.bin(893146)]CRS-5016:Process "/oraclegrid/product/11.2.0/grid/bin/lsnrctl" spawned by agent "/oraclegrid/product/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oraclegrid/product/11.2.0/grid/log/askasfsora01/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-04-12 00:44:03.906
[/oraclegrid/product/11.2.0/grid/bin/oraagent.bin(893146)]CRS-5016:Process "/oraclegrid/product/11.2.0/grid/bin/lsnrctl" spawned by agent "/oraclegrid/product/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oraclegrid/product/11.2.0/grid/log/askasfsora01/agent/crsd/oraagent_grid/oraagent_grid.log"
2011-04-12 00:44:04.205
RAC needs to be engaged to assist in either case.
Known Problems
Note 549932.1 Listener Crash with Core dump in 10G (10.2.0.3) and 11G (11.1.0.6)
Note 388017.1 Windows TNS Listener Crash with Faulting Module ORANL10.DLL / ORANL9.DLL 
Note 298697.1 32/64 Bit Listeners Crash When Opposite Architecture Database Registers.
Note 472548.1 Listener Crash After Startup
Note 290018.1 Listener Crashes After First TCP Connection
Note 283906.1 Listener Process Keeps Dying when Veritas Cluster Software is Being Used
Note 290690.1RAC Listener Crashes with TNS-12518 and TNS-12564
Note 405904.1 Listener Crash on Windows 2003 and/or ORA-07445 Exception encountered CORE Dump [ACCESS_VIOLATION] [unable_to_trans_pc]:
Note 985170.1 The listener on the RAC server stops or goes offline randomly


Comments

Post a Comment

Oracle DBA Information