Top 5 Database and/or Instance Performance Issues in RAC Environment
Information in this document applies to any platform.
Solutions
Background
When a user session COMMITs or ROLLBACKs, the session's redo information needs to be flushed by LGWR to the redo logfile. The user session waits on 'log file sync' while waiting for LGWR to post it back to confirm that all redo changes are safely on disk.
Example:
WAIT #0: nam='log file sync' ela= 977744 buffer#=754 p2=0 p3=0 obj#=114927 tim=1212384670107387
Parameters:
P1 = buffer#
P2 = Not used
P3 = Not used
obj# = object_id
All changes up to this buffer# (in the redo log buffer) must be flushed to disk and the writes confirmed to ensure that the transaction is committed, and will remain committed upon an instance crash.
A typical life cycle of 'log file sync' wait
1. A user sessions issues a commit or rollback and starts waiting on log file sync.
2. LGWR gather redo information to be written to redo log files, issues IO and queues BOC to an LMS process and posts LMS process.
3. LGWR waits for redo buffers to be flushed to disk and SCN to be ACK'd
4. Completion of IO and receiving ACK from LMS signal write complete. LGWR then posts foreground process to continue.
5. Foreground process wakes up and log file sync wait ends.
Important log file sync related statistics and events
redo write time - Total elapsed time of the write from the redo log buffer to the current redo log file in 10s of milliseconds.
redo writes - Total number of writes by LGWR to the redo log files. "redo blocks written" divided by this statistic equals the number of blocks per write.
log file parallel write - Time it takes for the I/Os to complete. Even though redo records are written in parallel, the parallel write is not complete until the last I/O is on disk.
redo write broadcast ack time - Total amount of the latency associated with broadcast on commit beyond the latency of the log write (in microseconds).
user commits - Number of user commits. When a user commits a transaction, the redo generated that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate.
user rollbacks - Number of times users manually issue the ROLLBACK statement or an error occurs during a user's transactions.
The script provided in Document 1064487.1 - Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql) collects useful information for troubleshooting log file sync issues.
Possible Causes
Solutions
Background (A)
A session may wait for this event when it is trying to get a mutex pin in Share mode but another session is holding the mutex pin on the same cursor object in exclusive. Frequently, waits for 'Cursor: pin S wait on X' is a symptom and not the cause. There may be underlying tuning requirements or known issues.
Possible Causes (A)
Solutions (A)
Symptoms (B)
Background (B)
A session waits for "cursor: pin S" when it wants a specific mutex in S (share) mode on a specific cursor and there is no concurrent X holder but it could not acquire that mutex immediately. This may seem a little strange as one might question why there should be any form of wait to get a mutex which has no session holding it in an incompatible mode. The reason for the wait is that in order to acquire the mutex in S mode (or release it) the session has to increment (or decrement) the mutex reference count and this requires an exclusive atomic update to the mutex structure itself. If there are concurrent sessions trying to make such an update to the mutex then only one session can actually increment (or decrement) the reference count at a time. A wait on "cursor: pin S" thus occurs if a session cannot make that atomic change immediately due to other concurrent requests.
Mutexes are local to the current instance in RAC environments.
Parameters:
P1 = idn
P2 = value
P3 = where (where|sleeps in 10.2)
idn is the mutex identifier value which matches to the HASH_VALUE of the SQL statement that we are waiting to get the mutex on. The SQL can usually be found using the IDN value in a query of the form:
SELECT sql_id, sql_text, version_count
FROM V$SQLAREA where HASH_VALUE=&IDN;
If the SQL_TEXT shown for the cursor is of the form "table_x_x_x_x" then this is a special internal cursor - see Document 1298471.1 for information about mapping such cursors to objects.
P1RAW is the same value in hexadecimal and it can be used to search in tracefiles for SQL matching to that hash value.
Possible Causes (B)
Solutions (B)
Symptoms (C)
Background (C)
The library cache mutex is acquired for similar purposes that the library cache latches were acquired in prior versions of Oracle. In 10g, mutexes were introduced for certain operations in the library cache. Starting with 11g, the library cache latches were replaced by mutexes. This wait event is present whenever a library cache mutex is held in exclusive mode by a session and other sessions need to wait for it to be released.
Individual Waits:
Parameters:
P1 = "idn" = Unique Mutex Identifier
P2 = "value" = Session ID holding the mutex
P3 = "where" = location in code (internal identifier) where mutex is being waited for
Systemwide Waits:
At a systemwide level, there are two views which will help diagnose this wait:
GV$MUTEX_SLEEP (or V$MUTEX_SLEEPS for non-RAC)
and GV$MUTEX_SLEEP_HISTORY (or V$MUTEX_SLEEP_HISTORY for non-RAC)
These views track the instance wide usage of mutexes since instance startup. Since these views show values that are total values since startup, they are most meaningful when you obtain the difference in values during a short time interval when there was problem. The easiest way to see this information is through an AWR or statspack report in the "Mutex Sleep Summary" section.
Possible Causes (C)
Solutions (C)
Background (A)
When a transaction inserting a row in an index has to wait for the end of an index block split being done by another transaction the session would wait on event enq: TX - index contention.
Possible causes (A)
Solutions (A)
Symptom (B)
Background (B)
A session waits on enq: TX - allocate ITL entry when it wants to lock a row in the block but one or more other sessions have rows locked in the same block, and there is no free ITL slot in the block. Usually, Oracle Database dynamically adds another ITL slot. This may not be possible if there is insufficient free space in the block to add an ITL.
Possible Causes (B)
Solutions (B)
Symptom (C)
Background (C)
A session waits on enq: TX - row lock contention, when it waits for a row level lock that is held by another session. This happens when one user has updated or deleted a row, but has not commited or rolled back yet, and another session wants to update or delete the same row.
Solution (C)
Possible Causes (A)
Solutions (A)
Symptoms (B)
Possible Causes (B)
Possible Solutions (B)
BUG:9591812 - INCORRECT WAIT EVENTS IN 11.2 ("CURSOR: MUTEX S" INSTEAD OF "CURSOR: MUTEX X")
NOTE:164768.1 - Troubleshooting: High CPU Utilization
NOTE:1511700.1 - ORA-00060 Single-Resource Deadlock Occurs on Autonomous Transaction
NOTE:1298015.1 - WAITEVENT: "cursor: pin S wait on X" Reference Note
NOTE:1310764.1 - WAITEVENT: "cursor: pin S" Reference Note
NOTE:1349387.1 - Troubleshooting 'cursor: pin S wait on X' waits.
NOTE:1020008.6 - SCRIPT: FULLY DECODED LOCKING
NOTE:102925.1 - Tracing sessions: waiting on an enqueue
NOTE:1064487.1 - Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)
NOTE:1356828.1 - FAQ: 'cursor: mutex ..' / 'cursor: pin ..' / 'library cache: mutex ..' Type Wait Events
NOTE:1357946.1 - Troubleshooting 'library cache: mutex X' Waits.
BUG:7199645 - QUERY_REWRITE_ENABLED TAKE LONG PARSE TIME AND CONSUME ALL SERVER MEMORY
BUG:7385253 - DBWR IS CONSUMING HIGH CPU
NOTE:361670.1 - Slow Performance with High CPU Usage on 64-bit Linux with Large SGA
NOTE:404991.1 - How to Tune Queries with High CPU Usage
NOTE:563566.1 - Troubleshooting gc block lost and Poor Network Performance in a RAC Environment
NOTE:34592.1 - WAITEVENT: "log file sync" Reference Note
NOTE:62143.1 - Troubleshooting: Tuning the Shared Pool and Tuning Library Cache Latch Contention
NOTE:727400.1 - WAITEVENT: "library cache: mutex X"
NOTE:756671.1 - Oracle Recommended Patches -- Oracle Database
NOTE:786507.1 - How to Determine the Blocking Session for Event: 'cursor: pin S wait on X'
NOTE:857576.1 - How to Reduce Waits on 'Log File Sync'
BUG:10411618 - ADD DIFFERENT WAIT SCHEMES FOR MUTEX WAITS
BUG:10636231 - HIGH VERSION COUNT FOR INSERT STATEMENTS WITH REASON INST_DRTLD_MISMATCH
BUG:4690571 - MEMORY LEAK USING BULK COLLECT WITHIN CURSOR OPENED USING NATIVE DYNAMIC SQL
BUG:6051972 - MEMORY LEAK IF CONNECTION TO DATABASE SERVER IS FREQUENTLY OPENED AND CLOSED
BUG:6356566 - STATSPACK.SNAP COSUMES ALMOST 100% CPU
BUG:6455161 - HIGHER "CONSISTENT GETS" AFTER TRUNCATE
BUG:7429070 - BIG PGA MEM ALLOC DURING PARSE TIME - KXS-HEAP-C
BUG:8031768 - ORA-04031 SHARED POOL "KKJ JOBQ WOR"
NOTE:179582.1 - How to Find TX Enqueue Contention in RAC or OPS
NOTE:873243.1 - Troubleshooting 'enq: TX - index contention' Waits in a RAC Environment.
BUG:10042937 - HIGH MEMORY GROUP IN GES_CACHE_RESS AND ORA-4031 ERRORS
BUG:10220046 - MEMORY LEAK OCCURS WITH NON-BLOCKING APP IN THE LATER VERSION 10.2.0.4
NOTE:62354.1 - Waits for 'Enq: Tx - Row Lock Contention' - Wait Scenario Examples
=================================================================
Note---> This informationmation taken from oracle metalink. all copy rights oracle only.
In this Document
Purpose |
Scope |
Details |
Issue #1: High counts of lost blocks(gc lost blocks, gc current/cr lost blocks) |
Issue #2: High log file sync waits |
Issue #3: High Waits on Mutexes |
Issue #4: High gc buffer busy, enq: TX -row lock contention, enq: TX - index contention, enq: TX - ITL allocate entry waits |
Issue #5: High CPU and memory consumption |
References |
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.1.0.2 to 12.1.0.1 [Release 10.1 to 12.1]Information in this document applies to any platform.
PURPOSE
The purpose of this document is to provide a summary of top database and/or instance performance issues and the possible solutions in a RAC environment.
Please note that Issue #3 (High Waits on Mutexes) and Issue #5 (High CPU and Memory Consumption) are generic database issues and not RAC-specific. However, they have been included in this document because they are one of the TOP issues, and can occur locally on an instance in a RAC environment.
Please note that Issue #3 (High Waits on Mutexes) and Issue #5 (High CPU and Memory Consumption) are generic database issues and not RAC-specific. However, they have been included in this document because they are one of the TOP issues, and can occur locally on an instance in a RAC environment.
SCOPE
DBAs
DETAILS
Issue #1: High counts of lost blocks(gc lost blocks, gc current/cr lost blocks)
Symptoms
I. AWR reports show high number of lost blocks.
II. netstat -s reports increasing number of packet reassambly failure, dropped packets.
II. netstat -s reports increasing number of packet reassambly failure, dropped packets.
Solutions
Use the following document to troubleshoot and resolve lost block issue. The document describes symptoms, possible causes and solutions.
Document 563566.1 - gc block lost diagnostics
Document 563566.1 - gc block lost diagnostics
Issue #2: High log file sync waits
Symptoms
I. AWR reports show that log file sync is one of the TOP 5 waits consistently.
II. Average log file sync is high ( > 20 ms).
III. Average log file parallel write is high ( > 10 ms).
III. Average redo write broadcast ack time or wait for scn ack is high ( > 10 ms).
IV. Average log file sync is low, but there are too many log file sync waits.
II. Average log file sync is high ( > 20 ms).
III. Average log file parallel write is high ( > 10 ms).
III. Average redo write broadcast ack time or wait for scn ack is high ( > 10 ms).
IV. Average log file sync is low, but there are too many log file sync waits.
Background
When a user session COMMITs or ROLLBACKs, the session's redo information needs to be flushed by LGWR to the redo logfile. The user session waits on 'log file sync' while waiting for LGWR to post it back to confirm that all redo changes are safely on disk.
Example:
WAIT #0: nam='log file sync' ela= 977744 buffer#=754 p2=0 p3=0 obj#=114927 tim=1212384670107387
Parameters:
P1 = buffer#
P2 = Not used
P3 = Not used
obj# = object_id
All changes up to this buffer# (in the redo log buffer) must be flushed to disk and the writes confirmed to ensure that the transaction is committed, and will remain committed upon an instance crash.
A typical life cycle of 'log file sync' wait
1. A user sessions issues a commit or rollback and starts waiting on log file sync.
2. LGWR gather redo information to be written to redo log files, issues IO and queues BOC to an LMS process and posts LMS process.
3. LGWR waits for redo buffers to be flushed to disk and SCN to be ACK'd
4. Completion of IO and receiving ACK from LMS signal write complete. LGWR then posts foreground process to continue.
5. Foreground process wakes up and log file sync wait ends.
Important log file sync related statistics and events
redo write time - Total elapsed time of the write from the redo log buffer to the current redo log file in 10s of milliseconds.
redo writes - Total number of writes by LGWR to the redo log files. "redo blocks written" divided by this statistic equals the number of blocks per write.
log file parallel write - Time it takes for the I/Os to complete. Even though redo records are written in parallel, the parallel write is not complete until the last I/O is on disk.
redo write broadcast ack time - Total amount of the latency associated with broadcast on commit beyond the latency of the log write (in microseconds).
user commits - Number of user commits. When a user commits a transaction, the redo generated that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate.
user rollbacks - Number of times users manually issue the ROLLBACK statement or an error occurs during a user's transactions.
The script provided in Document 1064487.1 - Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql) collects useful information for troubleshooting log file sync issues.
Possible Causes
I. Poor IO service time and throughput of the devices where redo logs reside.
II. Oracle bugs. Please review WAITEVENT: "log file sync" Reference (Document 34592.1) for known oracle bugs.
III. LMS not running in RT class.
IV. Scheduling delays with LGWR process.
V. Excessive number of commits.
VI. OS resource starvation.
II. Oracle bugs. Please review WAITEVENT: "log file sync" Reference (Document 34592.1) for known oracle bugs.
III. LMS not running in RT class.
IV. Scheduling delays with LGWR process.
V. Excessive number of commits.
VI. OS resource starvation.
Solutions
I. Move redo log files to disks with higher throughput and better response time if log file parallel write is consistently high ( > 10 ms). log file parallel write should ideally be within 1-2 ms range
II. Apply fixes for the known oracle bugs, which are applicable to your environment. The most effective way to get those fixes in is to apply the latest PSU patches. Document 756671.1 has more information on the latest PSUs.
III. Ensure that LMS processes are running in RT class. LMS processes run in RT class by default.
IV. Reduce number of commits by using batch commits instead of commiting after every DML operation.
V. See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options.
VI. See if COMMIT NOWAIT option can be used. Please refer to Document 857576.1 for more information.
II. Apply fixes for the known oracle bugs, which are applicable to your environment. The most effective way to get those fixes in is to apply the latest PSU patches. Document 756671.1 has more information on the latest PSUs.
III. Ensure that LMS processes are running in RT class. LMS processes run in RT class by default.
IV. Reduce number of commits by using batch commits instead of commiting after every DML operation.
V. See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options.
VI. See if COMMIT NOWAIT option can be used. Please refer to Document 857576.1 for more information.
@For Support Only: Renice LGWR to run at higher priority or run LGWR in RT class by adding LGWR to the parameter: _high_priority_processes='VKTM|LMS|LGWR". Consider doing this only if log file sync is high and scheduling delay of LGWR is found to be causing it. Be prepared to test it thoroughly.
Issue #3: High Waits on Mutexes
Mutexes are a lighter-weight and more granular concurrency mechanism than latches.The reason for obtaining a mutex is to ensure that certain operations are properly managed for concurrency. e.g., if one session is changing a data structure in memory, then other session must wait to acquire the mutex before it can make a similar change. The following are most common mutex related waits:
A. cursor: pin S wait on X
B. cursor: pin S
C. library cache: Mutex X
Symptoms (A)
A. cursor: pin S wait on X
B. cursor: pin S
C. library cache: Mutex X
Symptoms (A)
AWR reports show cursor: pin S wait on X as one of the top wait.
Background (A)
A session may wait for this event when it is trying to get a mutex pin in Share mode but another session is holding the mutex pin on the same cursor object in exclusive. Frequently, waits for 'Cursor: pin S wait on X' is a symptom and not the cause. There may be underlying tuning requirements or known issues.
Possible Causes (A)
Please review Troubleshooting 'cursor: pin S wait on X' waits (Document 1349387.1).
Solutions (A)
Please review Troubleshooting 'cursor: pin S wait on X' waits (Document 1349387.1).
Symptoms (B)
AWR reports show cursor: pin S as one of the top waits
Background (B)
A session waits for "cursor: pin S" when it wants a specific mutex in S (share) mode on a specific cursor and there is no concurrent X holder but it could not acquire that mutex immediately. This may seem a little strange as one might question why there should be any form of wait to get a mutex which has no session holding it in an incompatible mode. The reason for the wait is that in order to acquire the mutex in S mode (or release it) the session has to increment (or decrement) the mutex reference count and this requires an exclusive atomic update to the mutex structure itself. If there are concurrent sessions trying to make such an update to the mutex then only one session can actually increment (or decrement) the reference count at a time. A wait on "cursor: pin S" thus occurs if a session cannot make that atomic change immediately due to other concurrent requests.
Mutexes are local to the current instance in RAC environments.
Parameters:
P1 = idn
P2 = value
P3 = where (where|sleeps in 10.2)
idn is the mutex identifier value which matches to the HASH_VALUE of the SQL statement that we are waiting to get the mutex on. The SQL can usually be found using the IDN value in a query of the form:
SELECT sql_id, sql_text, version_count
FROM V$SQLAREA where HASH_VALUE=&IDN;
If the SQL_TEXT shown for the cursor is of the form "table_x_x_x_x" then this is a special internal cursor - see Document 1298471.1 for information about mapping such cursors to objects.
P1RAW is the same value in hexadecimal and it can be used to search in tracefiles for SQL matching to that hash value.
Possible Causes (B)
I. Heavy concurrency for a specific mutex, especially on systems with multiple CPUs.
II. Waits for very many different "idn" values when under load.
III. Oracle Bugs
Bug 10270888 - ORA-600[kgxEndExamine-Bad-State] / mutex waits after a self deadlock
Bug 9591812 - Wrong wait events in 11.2 ("cursor: mutex S" instead of "cursor: mutex X")
Bug 9499302 - Improve concurrent mutex request handling
Bug 7441165 - Prevent preemption while holding a mutex (fix only works on Solaris)
Bug 8575528 - Missing entries in V$MUTEX_SLEEP.location
II. Waits for very many different "idn" values when under load.
III. Oracle Bugs
Bug 10270888 - ORA-600[kgxEndExamine-Bad-State] / mutex waits after a self deadlock
Bug 9591812 - Wrong wait events in 11.2 ("cursor: mutex S" instead of "cursor: mutex X")
Bug 9499302 - Improve concurrent mutex request handling
Bug 7441165 - Prevent preemption while holding a mutex (fix only works on Solaris)
Bug 8575528 - Missing entries in V$MUTEX_SLEEP.location
Solutions (B)
I. Apply fix for Bug 10411618.
II. For any identified "hot" SQLs, one can reduce the concurrency on specific cursors by replacing the one SQL with some variants which are executed by different sessions. Please review WAITEVENT: cursor: pin S Reference (Document 1310764.1) for further details.
III. Apply fixes for other known oracle bugs. The most effective way to get the fixes in is to apply the latest PSU patches. Document 756671.1has more information on recommended patches.
II. For any identified "hot" SQLs, one can reduce the concurrency on specific cursors by replacing the one SQL with some variants which are executed by different sessions. Please review WAITEVENT: cursor: pin S Reference (Document 1310764.1) for further details.
III. Apply fixes for other known oracle bugs. The most effective way to get the fixes in is to apply the latest PSU patches. Document 756671.1has more information on recommended patches.
Symptoms (C)
AWR reports show library cache: Mutex X as one of the TOP waits.
Background (C)
The library cache mutex is acquired for similar purposes that the library cache latches were acquired in prior versions of Oracle. In 10g, mutexes were introduced for certain operations in the library cache. Starting with 11g, the library cache latches were replaced by mutexes. This wait event is present whenever a library cache mutex is held in exclusive mode by a session and other sessions need to wait for it to be released.
Individual Waits:
Parameters:
P1 = "idn" = Unique Mutex Identifier
P2 = "value" = Session ID holding the mutex
P3 = "where" = location in code (internal identifier) where mutex is being waited for
Systemwide Waits:
At a systemwide level, there are two views which will help diagnose this wait:
GV$MUTEX_SLEEP (or V$MUTEX_SLEEPS for non-RAC)
and GV$MUTEX_SLEEP_HISTORY (or V$MUTEX_SLEEP_HISTORY for non-RAC)
These views track the instance wide usage of mutexes since instance startup. Since these views show values that are total values since startup, they are most meaningful when you obtain the difference in values during a short time interval when there was problem. The easiest way to see this information is through an AWR or statspack report in the "Mutex Sleep Summary" section.
Possible Causes (C)
I. Frequent Hard Parses.
II. High Version Counts.
III. Invalidations and reloads.
IV. Oracle Bugs. Please review WAITEVENT: "library cache: mutex X" (Document 727400.1) for the
list of known oracle bugs.
II. High Version Counts.
III. Invalidations and reloads.
IV. Oracle Bugs. Please review WAITEVENT: "library cache: mutex X" (Document 727400.1) for the
list of known oracle bugs.
Solutions (C)
I. Reduce hard parsing, invalidations and reloads. Please review Troubleshooting: Tuning the Shared Pool and Tuning Library Cache Latch Contention (Document 62143.1) for more information.
II. Review Troubleshooting: High Version Count Issues (Document 296377.1) to resolve high version count issue.
III. Apply fixes for the known oracle bugs. The most effective way to get the fixes in is to apply the latest PSU patches. Document 756671.1has more information on recommended patches.
II. Review Troubleshooting: High Version Count Issues (Document 296377.1) to resolve high version count issue.
III. Apply fixes for the known oracle bugs. The most effective way to get the fixes in is to apply the latest PSU patches. Document 756671.1has more information on recommended patches.
Issue #4: High gc buffer busy, enq: TX -row lock contention, enq: TX - index contention, enq: TX - ITL allocate entry waits
A. enq: TX - Index Contention
B. enq: TX - ITL allocate entry waits
C. enq: TX - row lock contention
Symptoms (A)
B. enq: TX - ITL allocate entry waits
C. enq: TX - row lock contention
Symptoms (A)
I. AWR reports show high wait on enq: TX - index contention and enq: TX - row lock contention in mode 4.
II. AWR reports show high values for branch node splits, leaf node splits and leaf node 90-10 splits
III. AWR reports (Segments by Row Lock Waits) show high row lock waits for a particular segment
IV. AWR reports show other waits such as gc buffer busy waits on index branch or Leaf blocks, gc current block busy on Remote Undo Headers and gc current split.
Example:
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
enq: TX - index contention 29,870 1,238 41 9.52 Concurrency
Instance Activity Stats:
Statistic Total per Second per Trans
branch node splits 945 0.26 0.00
leaf node 90-10 splits 1,670 0.46 0.00
leaf node splits 35,603 9.85 0.05
Segments by Row Lock Waits:
Owner Tablespace Object Name Obj.Type Row Lock Waits % of Capture
ACSSPROD ACSS_IDX03 ACSS_ORDER_HEADER_PK INDEX 3,425 43.62
ACSSPROD ACSS_IDX03 ACSS_ORDER_HEADER_ST INDEX 883 11.25
ACSSPROD ACSS_IDX03 ACSS_ORDER_HEADER_DT INDEX 682 8.69
II. AWR reports show high values for branch node splits, leaf node splits and leaf node 90-10 splits
III. AWR reports (Segments by Row Lock Waits) show high row lock waits for a particular segment
IV. AWR reports show other waits such as gc buffer busy waits on index branch or Leaf blocks, gc current block busy on Remote Undo Headers and gc current split.
Example:
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
enq: TX - index contention 29,870 1,238 41 9.52 Concurrency
Instance Activity Stats:
Statistic Total per Second per Trans
branch node splits 945 0.26 0.00
leaf node 90-10 splits 1,670 0.46 0.00
leaf node splits 35,603 9.85 0.05
Segments by Row Lock Waits:
Owner Tablespace Object Name Obj.Type Row Lock Waits % of Capture
ACSSPROD ACSS_IDX03 ACSS_ORDER_HEADER_PK INDEX 3,425 43.62
ACSSPROD ACSS_IDX03 ACSS_ORDER_HEADER_ST INDEX 883 11.25
ACSSPROD ACSS_IDX03 ACSS_ORDER_HEADER_DT INDEX 682 8.69
Background (A)
When a transaction inserting a row in an index has to wait for the end of an index block split being done by another transaction the session would wait on event enq: TX - index contention.
Possible causes (A)
I. High number of concurrent inserts leading to excessive index block splits.
II. Right-growing index with key generated from a sequence.
II. Right-growing index with key generated from a sequence.
Solutions (A)
Solution is to reorganize the index in a way to avoid the contention or hot spots. The options are
I. Recreate the index as reverse key index
II. Hash partition the index
III. If index key is generated from a sequence, increase cache size of the sequence and make the sequence 'no order' if application supports it.
I. Recreate the index as reverse key index
II. Hash partition the index
III. If index key is generated from a sequence, increase cache size of the sequence and make the sequence 'no order' if application supports it.
Symptom (B)
AWR reports show high wait on enq: TX - allocate ITL entry and enq: TX - row lock contention in mode 4.
Background (B)
A session waits on enq: TX - allocate ITL entry when it wants to lock a row in the block but one or more other sessions have rows locked in the same block, and there is no free ITL slot in the block. Usually, Oracle Database dynamically adds another ITL slot. This may not be possible if there is insufficient free space in the block to add an ITL.
Possible Causes (B)
Low initrans setting relative to the number of concurrent transactions.
Solutions (B)
Find the segments those have high ITL waits from AWR reports or using the following SQL:
SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
FROM V$SEGMENT_STATISTICS
WHERE STATISTIC_NAME = 'ITL waits' AND VALUE > 0
ORDER BY VALUE;
Increase initrans value of the segments, which are seeing high ITL waits.
SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
FROM V$SEGMENT_STATISTICS
WHERE STATISTIC_NAME = 'ITL waits' AND VALUE > 0
ORDER BY VALUE;
Increase initrans value of the segments, which are seeing high ITL waits.
Symptom (C)
AWR reports show high wait on enq: TX - row lock contention in Exclusive mode (6).
Background (C)
A session waits on enq: TX - row lock contention, when it waits for a row level lock that is held by another session. This happens when one user has updated or deleted a row, but has not commited or rolled back yet, and another session wants to update or delete the same row.
Solution (C)
This is an application issue and application developer needs to be engaged to look into the SQL statements involved. The following documents might be helpful in drilling down further:
Document 102925.1 - Tracing sessions: waiting on an enqueue
Document 179582.1 - How to Find TX Enqueue Contention in RAC or OPS
Document 1020008.6 - SCRIPT: FULLY DECODED LOCKING
Document 62354.1 - TX Transaction locks - Example wait scenarios
Document 224305.1 -Autonomous Transaction can cause locking
Document 102925.1 - Tracing sessions: waiting on an enqueue
Document 179582.1 - How to Find TX Enqueue Contention in RAC or OPS
Document 1020008.6 - SCRIPT: FULLY DECODED LOCKING
Document 62354.1 - TX Transaction locks - Example wait scenarios
Document 224305.1 -Autonomous Transaction can cause locking
Issue #5: High CPU and memory consumption
A. High CPU Utilization
B. High Memory Utilization
Symptoms (A)
B. High Memory Utilization
Symptoms (A)
I. OS tools such as TOP, prstat, vmstat shows user CPU usage is very high, and top cpu consuming
processes are either oracle shadow or background processes.
II. AWR reports show top waits are one or more of the following:
latch free
cursor pin S wait on X or cursor pin S wait or library cache mutex X
latch: cache buffers chains
resmgr: cpu quantum
enq: RO - fast object reuse
DFS lock handle
III. AWR reports (SQLs ordered by buffer gets) show SQLs with very high buffer gets per execution
and cpu time.
IV. Process stack of the high cpu consuming process shows that the process is spinning.
processes are either oracle shadow or background processes.
II. AWR reports show top waits are one or more of the following:
latch free
cursor pin S wait on X or cursor pin S wait or library cache mutex X
latch: cache buffers chains
resmgr: cpu quantum
enq: RO - fast object reuse
DFS lock handle
III. AWR reports (SQLs ordered by buffer gets) show SQLs with very high buffer gets per execution
and cpu time.
IV. Process stack of the high cpu consuming process shows that the process is spinning.
Possible Causes (A)
I. Oracle bugs:
Bug 12431716 - Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU / GI PSU
Bug 8199533 - NUMA enabled by default can cause high CPU
Bug 9226905 - Excessive CPU usage / OERI:kkfdPaPrm from Parallel Query / High Version
count on PX_MISMATCH
Bug 7385253 - Slow Truncate / DBWR uses high CPU / CKPT blocks on RO enqueue
Bug 10326338 - High "resmgr:cpu quantum" but CPU has idle time
Bug 6455161 - Higher CPU / Higher "cache buffer chains" latch gets / Higher "consistent gets"
after truncate/Rebuild
II. Very large SGA on Linux x86-64 platform without the implementation of Hugepages.
III. Expensive SQL queries with sub-optimal execution plans.
IV. Runaway processes.
Bug 12431716 - Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU / GI PSU
Bug 8199533 - NUMA enabled by default can cause high CPU
Bug 9226905 - Excessive CPU usage / OERI:kkfdPaPrm from Parallel Query / High Version
count on PX_MISMATCH
Bug 7385253 - Slow Truncate / DBWR uses high CPU / CKPT blocks on RO enqueue
Bug 10326338 - High "resmgr:cpu quantum" but CPU has idle time
Bug 6455161 - Higher CPU / Higher "cache buffer chains" latch gets / Higher "consistent gets"
after truncate/Rebuild
II. Very large SGA on Linux x86-64 platform without the implementation of Hugepages.
III. Expensive SQL queries with sub-optimal execution plans.
IV. Runaway processes.
Solutions (A)
I. Apply fix for the bug(s) that you are encountering. Most effective way to get all those fixes in is to apply the latest PSU patches.Document 756671.1 has more information on the latest PSUs.
II. Implement hugepages. Please refer to Document 361670.1 - Slow Performance with High CPU Usage on 64-bit Linux with Large SGA for further explanation.
III. Tune the SQLs, which are incurring excessive buffer gets and cpu time. Please refer to Document 404991.1 - How to Tune Queries with High CPU Usage for more info.
II. Implement hugepages. Please refer to Document 361670.1 - Slow Performance with High CPU Usage on 64-bit Linux with Large SGA for further explanation.
III. Tune the SQLs, which are incurring excessive buffer gets and cpu time. Please refer to Document 404991.1 - How to Tune Queries with High CPU Usage for more info.
Symptoms (B)
I. OS tools such as ps, pmap show process is growing in memory. pmap shows the growth is in heap and/or stack area of the process. For example,
#pmap -x 26677
Address Kbytes RSS Anon Locked Mode Mapped File
00010000 496 480 - - r-x-- bash
0009A000 80 80 24 - rwx-- bash
000AE000 160 160 40 - rwx-- [ heap ]
FF100000 688 688 - - r-x-- libc.so.1
II. Session uga and/or pga memory of a oracle shadow process is growing.
select se.sid,n.name,max(se.value) maxmem
from v$sesstat se,
v$statname n
where n.statistic# = se.statistic#
and n.name in ('session pga memory','session pga memory max',
'session uga memory','session uga memory max')
group by n.name,se.sid
order by 3;
III. Number of open cursors for a session/process is growing monotonically.
#pmap -x 26677
Address Kbytes RSS Anon Locked Mode Mapped File
00010000 496 480 - - r-x-- bash
0009A000 80 80 24 - rwx-- bash
000AE000 160 160 40 - rwx-- [ heap ]
FF100000 688 688 - - r-x-- libc.so.1
II. Session uga and/or pga memory of a oracle shadow process is growing.
select se.sid,n.name,max(se.value) maxmem
from v$sesstat se,
v$statname n
where n.statistic# = se.statistic#
and n.name in ('session pga memory','session pga memory max',
'session uga memory','session uga memory max')
group by n.name,se.sid
order by 3;
III. Number of open cursors for a session/process is growing monotonically.
Possible Causes (B)
I. Oracle bugs:
Bug 9919654 - High resource / memory use optimizing SQL with UNION/set functions with
many branches
Bug 10042937 HIGH MEMORY GROUP IN GES_CACHE_RESS AND ORA-4031 ERRORS
Bug 7429070 BIG PGA MEM ALLOC DURING PARSE TIME - KXS-HEAP-C
Bug 8031768 - ORA-04031 SHARED POOL "KKJ JOBQ WOR"
Bug 10220046 - OCI memory leak using non-blocking connection for fetches
Bug 6356566 - Memory leak / high CPU selecting from V$SQL_PLAN
Bug 7199645 - Long parse time / high memory use from query rewrite
Bug 10636231 - High version count for INSERT .. RETURNING statements with
reason INST_DRTLD_MISMATCH
Bug 9412660 - PLSQL cursor leak / ORA-600[kglLockOwnersListDelete]
Bug 6051972 - Memory leak if connection to database is frequently opened and closed
Bug 4690571 - Memory leak using BULK COLLECT within cursor opened using native
dynamic sql
II. Cursor leaks caused by application not closing cursors explicitly.
III. SQLs with abnormally large hash join and/or sort operation.
Bug 9919654 - High resource / memory use optimizing SQL with UNION/set functions with
many branches
Bug 10042937 HIGH MEMORY GROUP IN GES_CACHE_RESS AND ORA-4031 ERRORS
Bug 7429070 BIG PGA MEM ALLOC DURING PARSE TIME - KXS-HEAP-C
Bug 8031768 - ORA-04031 SHARED POOL "KKJ JOBQ WOR"
Bug 10220046 - OCI memory leak using non-blocking connection for fetches
Bug 6356566 - Memory leak / high CPU selecting from V$SQL_PLAN
Bug 7199645 - Long parse time / high memory use from query rewrite
Bug 10636231 - High version count for INSERT .. RETURNING statements with
reason INST_DRTLD_MISMATCH
Bug 9412660 - PLSQL cursor leak / ORA-600[kglLockOwnersListDelete]
Bug 6051972 - Memory leak if connection to database is frequently opened and closed
Bug 4690571 - Memory leak using BULK COLLECT within cursor opened using native
dynamic sql
II. Cursor leaks caused by application not closing cursors explicitly.
III. SQLs with abnormally large hash join and/or sort operation.
Possible Solutions (B)
I. Apply fix for the applicable bug(s). The most effective way to get those fixes in is to apply the latest PSU patches. Document 756671.1 has more information on the latest PSUs.
II. Ensure cursors are explicitly closed by application.
III. Avoid very large hash join and/or sort operation.
II. Ensure cursors are explicitly closed by application.
III. Avoid very large hash join and/or sort operation.
REFERENCES
NOTE:1298471.1 - H$PSEUDO_CURSORBUG:9591812 - INCORRECT WAIT EVENTS IN 11.2 ("CURSOR: MUTEX S" INSTEAD OF "CURSOR: MUTEX X")
NOTE:164768.1 - Troubleshooting: High CPU Utilization
NOTE:1511700.1 - ORA-00060 Single-Resource Deadlock Occurs on Autonomous Transaction
NOTE:1298015.1 - WAITEVENT: "cursor: pin S wait on X" Reference Note
NOTE:1310764.1 - WAITEVENT: "cursor: pin S" Reference Note
NOTE:1349387.1 - Troubleshooting 'cursor: pin S wait on X' waits.
NOTE:1020008.6 - SCRIPT: FULLY DECODED LOCKING
NOTE:102925.1 - Tracing sessions: waiting on an enqueue
NOTE:1064487.1 - Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)
NOTE:1356828.1 - FAQ: 'cursor: mutex ..' / 'cursor: pin ..' / 'library cache: mutex ..' Type Wait Events
NOTE:1357946.1 - Troubleshooting 'library cache: mutex X' Waits.
BUG:7199645 - QUERY_REWRITE_ENABLED TAKE LONG PARSE TIME AND CONSUME ALL SERVER MEMORY
BUG:7385253 - DBWR IS CONSUMING HIGH CPU
NOTE:361670.1 - Slow Performance with High CPU Usage on 64-bit Linux with Large SGA
NOTE:404991.1 - How to Tune Queries with High CPU Usage
NOTE:563566.1 - Troubleshooting gc block lost and Poor Network Performance in a RAC Environment
NOTE:34592.1 - WAITEVENT: "log file sync" Reference Note
NOTE:62143.1 - Troubleshooting: Tuning the Shared Pool and Tuning Library Cache Latch Contention
NOTE:727400.1 - WAITEVENT: "library cache: mutex X"
NOTE:756671.1 - Oracle Recommended Patches -- Oracle Database
NOTE:786507.1 - How to Determine the Blocking Session for Event: 'cursor: pin S wait on X'
NOTE:857576.1 - How to Reduce Waits on 'Log File Sync'
BUG:10411618 - ADD DIFFERENT WAIT SCHEMES FOR MUTEX WAITS
BUG:10636231 - HIGH VERSION COUNT FOR INSERT STATEMENTS WITH REASON INST_DRTLD_MISMATCH
BUG:4690571 - MEMORY LEAK USING BULK COLLECT WITHIN CURSOR OPENED USING NATIVE DYNAMIC SQL
BUG:6051972 - MEMORY LEAK IF CONNECTION TO DATABASE SERVER IS FREQUENTLY OPENED AND CLOSED
BUG:6356566 - STATSPACK.SNAP COSUMES ALMOST 100% CPU
BUG:6455161 - HIGHER "CONSISTENT GETS" AFTER TRUNCATE
BUG:7429070 - BIG PGA MEM ALLOC DURING PARSE TIME - KXS-HEAP-C
BUG:8031768 - ORA-04031 SHARED POOL "KKJ JOBQ WOR"
NOTE:179582.1 - How to Find TX Enqueue Contention in RAC or OPS
NOTE:873243.1 - Troubleshooting 'enq: TX - index contention' Waits in a RAC Environment.
BUG:10042937 - HIGH MEMORY GROUP IN GES_CACHE_RESS AND ORA-4031 ERRORS
BUG:10220046 - MEMORY LEAK OCCURS WITH NON-BLOCKING APP IN THE LATER VERSION 10.2.0.4
NOTE:62354.1 - Waits for 'Enq: Tx - Row Lock Contention' - Wait Scenario Examples
=================================================================
Note---> This informationmation taken from oracle metalink. all copy rights oracle only.
I dont think its a good idea to publish copyrighted content which is available only for paid customers on a public website. If Oracle legal come across this, it can cause you trouble.
ReplyDelete