Troubleshooting 'latch: cache buffers chains' Wait Contention

Troubleshooting 'latch: cache buffers chains' Wait Contention



PURPOSE

This article describes how to troubleshoot issues where there are significant waits for  'latch: cache buffers chains'.

TROUBLESHOOTING STEPS

"latch: cache buffers chains" contention is typically encountered because SQL statements read more buffers than they need to and multiple sessions are waiting to read the same block.

If you have high contention, you need to look at the statements that perform the most buffer gets and then look at their access paths to determine whether these are performing as efficiently as you would like.

Typical solutions are:-
  • Look for SQL that accesses the blocks in question and determine if the repeated reads are necessary. This may be within a single session or across multiple sessions.
  • Check for suboptimal SQL (this is the most common cause of the events) - look at the execution plan for the SQL being run and try to reduce the gets per executions which will minimize the number of blocks being accessed and therefore reduce the chances of multiple sessions contending for the same block.
  • If you can identify a poor SQL and have identified a better plan, you can direct the optimizer to use this plan using the following article:
    Document 1400903.1 Directing Plans with Baselines/Profiles Using coe_load_sql_baseline.sql / coe_load_sql_profile.sql (shipped with SQLT)
     
Further information can be found in:
Document 1476044.1 Resolving Issues Where Waits for 'latch: cache buffers chains' Seen Due to Poorly Tuned SQL

Document 390374.1 Oracle Performance Diagnostic Guide (OPDG)
Document 163424.1 How To Identify a Hot Block Within The Database Buffer Cache.
Document 62172.1 Understanding and Tuning Buffer Cache and DBWR

Worked example:

Problem: Database is slow and 'latch: cache buffers chains' is high in the waits in AWR.
Start with Top 5 Waits:

Top 5 Timed Events                                      Avg    %Total
~~~~~~~~~~~~~~~~~~                                      wait   Call
Event                          Waits        Time (s)    (ms)   Time   Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
latch: cache buffers chains          74,642      35,421    475    6.1 Concurrenc
CPU time                                         11,422           2.0
log file sync                        34,890       1,748     50    0.3 Commit
latch free                            2,279         774    340    0.1 Other
db file parallel write               18,818         768     41    0.1 System I/O
-------------------------------------------------------------

High cache buffers chains latch indicates that there is likely to be something reading a lot of buffers. Typically the SQL with the most gets is likely to be that which is contending:

SQL ordered by Gets         DB/Inst:  Snaps: 1-2
-> Resources reported for PL/SQL code includes the resources used by all SQL
statements called by the code.
-> Total Buffer Gets:   265,126,882
-> Captured SQL account for   99.8% of Total


                            Gets                CPU      Elapsed
Buffer Gets    Executions   per Exec     %Total Time (s) Time (s)  SQL Id
-------------- ------------ ------------ ------ -------- --------- -------------
   256,763,367       19,052     13,477.0   96.8 ######## ######### a9nchgksux6x2
Module: JDBC Thin Client
SELECT * FROM SALES ....

     1,974,516      987,056          2.0    0.7    80.31    110.94 ct6xwvwg3w0bv
SELECT COUNT(*) FROM ORDERS ....

The Query with SQL_ID a9nchgksux6x2 is reading 100x more buffers than the 2nd most 'hungry' statement and CPU and Elapsed are off the 'scale' of the report.  This is a prime candidate for the cause of the CBC latch issues.

You can also link this information to the Top  Segments by Logical Reads:
Segments by Logical Reads           
-> Total Logical Reads:     265,126,882
-> Captured Segments account for   98.5% of Total

           Tablespace                      Subobject  Obj.       Logical
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
DMSUSER    USERS      SALES                           TABLE  212,206,208   80.04
DMSUSER    USERS      SALES_PK                        INDEX   44,369,264   16.74
DMSUSER    USERS      SYS_C0012345                    INDEX    1,982,592     .75
DMSUSER    USERS      ORDERS_PK                       INDEX      842,304     .32
DMSUSER    USERS      INVOICES                        TABLE      147,488     .06
          -------------------------------------------------------------

The top object read is SALES and the top SQL is a select from SALES which appears to correlate towards this being a potential problem select.

This SQL should be investigated to see if the Gets per Exec or the Executions figure per hour has changed in any way (comparison to previous reports would show this) and if so the reasons for that change investigated and resolved.

In this case the statement is reading > 10,000 buffers per execution and executing > 15,000 times
so both of these may need to be adjusted to get better performance.

Note: This is a simple example where there is a high likelihood that the 'biggest' query is the culprit but it is not always the 'Top' SQL that causes the problem. For example, contention may occur on a statement with a smaller total if it is only executed a small number of times so that  it may not appear as the top sql. It may still make millions of buffer gets, but will appear lower in the list because other sqls are performing many times, just not contending.

So, if the first SQL is not the culprit then look at the others.

REFERENCES

NOTE:62172.1 - Understanding and Tuning Buffer Cache and DBWR
NOTE:163424.1 - How To Identify a Hot Block Within The Database Buffer Cache.


===========================================================

How to Identify Hot Blocks Within the Database Buffer Cache that may be Associated with 'latch: cache buffers chains' Wait Contention



GOAL

Identify blocks which cause latch contention on the 'cache buffers chains' latch. 
NOTE: To solve a hot block, the application may need to be reviewed because the contention may be caused by inefficient SQL repeatedly accessing the same buffers unnecessarily.
For more details see:

Document 1342917.1 Troubleshooting 'latch: cache buffers chains' Wait Contention

SOLUTION

Possible hot blocks in the buffer cache normally can be identified by a high or rapid increasing wait count on the CACHE BUFFERS CHAINS latch. This latch is acquired when searching for data blocks cached in the buffer cache. The Buffer cache is implemented as a chain of blocks linked together under a hash value. When a buffer is requested the hash value of the chain that this buffer will be in is calculated and that chain is scanned. Each chain is protected by a latch to prevent it from being changed while it is scanned. Contention in this latch can be caused by very heavy access to a single block. This can require the application to be reviewed.

By examining the waits for this latch, information about the segment and the specific block can be obtained using the following queries.

Identify high contention latch addresses

First determine which latch addresses (ADDR) are interesting by examining the number of  sleeps for this latch. The higher the sleep count, the more processes are having to wait for that latch, so the higher the sleeps, the higher the contention. If we are investigating contention, the higher the sleeps the more interesting the latch addresses (ADDR) is. The following select returns latch addresses (ADDR) ordered by sleeps:
SQL> 
SELECT child# "cCHILD" ,
  addr "sADDR" ,
  gets "sGETS" ,
  misses "sMISSES" ,
  sleeps "sSLEEPS"
FROM v$latch_children
WHERE name = 'cache buffers chains'
ORDER BY 5,  1,  2,  3;

Run the query above a few times to establish the addresses (ADDR) that consistently has the highest sleeps.

Identify the blocks protected under that address

Once the addresses (ADDR) with the highest sleep count have been determined, these latch addresses can be used to get more detail about the blocks currently in the buffer cache protected by this latch. The following query identifies the blocks protected by a particular ADDR and should be run just after determining the ADDR with the highest sleep count:
SQL>  column segment_name format a35
SELECT  /*+ RULE */
  e.owner  || '.'  || e.segment_name segment_name,
  e.extent_id extent#,
  x.dbablk - e.block_id + 1 block#,
  x.tch,
  l.child#
FROM sys.v$latch_children l,
  sys.x$bh x,
  sys.dba_extents e
WHERE x.hladdr = '&ADDR'
AND e.file_id  = x.file#
AND x.hladdr   = l.addr
AND x.dbablk BETWEEN e.block_id AND e.block_id + e.blocks -1
ORDER BY x.tch DESC ;
Example output :
SEGMENT_NAME                     EXTENT#      BLOCK#       TCH    CHILD# 
-------------------------------- ------------ ------------ ------ ---------- 
SCOTT.EMP_PK                                5          474     17      7,668 
SCOTT.EMP                                   1          449      2      7,668

The TCH column identifies the number of times each block has been hit by a SQL statement, so you can use this information to identify a hot block. The higher the value of the TCH column, the more frequent the block is accessed by SQL statements.

The following query joins with DBA_OBJECTS to find the objects waiting, the misses, sleeps, etc:
SQLgt; 
WITH bh_lc AS
  (SELECT
    /*+ ORDERED */
    lc.addr,
    lc.child#,
    lc.gets,
    lc.misses,
    lc.immediate_gets,
    lc.immediate_misses,
    lc.spin_gets,
    lc.sleeps,
    bh.hladdr,
    bh.tch tch,
    bh.file#,
    bh.dbablk,
    bh.class,
    bh.state,
    bh.obj
  FROM x$kslld ld,
    v$session_wait sw,
    v$latch_children lc,
    x$bh bh
  WHERE lc.addr  =sw.p1raw
  AND sw.p2      = ld.indx
  AND ld.kslldnam='cache buffers chains'
  AND lower(sw.event) LIKE '%latch%'
  AND sw.state ='WAITING'
  AND bh.hladdr=lc.addr
  )
SELECT bh_lc.hladdr,
  bh_lc.tch,
  o.owner,
  o.object_name,
  o.object_type,
  bh_lc.child#,
  bh_lc.gets,
  bh_lc.misses,
  bh_lc.immediate_gets,
  bh_lc.immediate_misses,
  spin_gets,
  sleeps
FROM bh_lc,
  dba_objects o
WHERE bh_lc.obj = o.object_id(+)
UNION
SELECT bh_lc.hladdr,
  bh_lc.tch,
  o.owner,
  o.object_name,
  o.object_type,
  bh_lc.child#,
  bh_lc.gets,
  bh_lc.misses,
  bh_lc.immediate_gets,
  bh_lc.immediate_misses,
  spin_gets,
  sleeps
FROM bh_lc,
  dba_objects o
WHERE bh_lc.obj = o.data_object_id(+)
ORDER BY 1,2 DESC;

The output is something like following:
HLADDR           TCH   OWNER    OBJECT_NAME               OBJECT_TYP CHILD#  GETS   MISSES IMMEDIATE_GETS   IMMEDIATE_MISSES SPIN_GETS SLEEPS
---------------- ----- -------- ------------------------- ---------- ------- ------ ------ ---------------- ---------------- --------- ------
0000000621ED8B00   127 SABXFER  PCBDEMPRELEVEMENT         TABLE        80750 703326   1649             8007                0      1642      7
0000000621ED8B00   127                                                 80750 703326   1649             8007                0      1642      7
0000000621ED8B00    74 SABXFER  V_HIERARCHIEENTREPRISE_B  TABLE        80750 703326   1649             8007                0      1642      7
0000000621ED8B00     1 SABXFER  SACNTPR_PK                INDEX        80750 703326   1649             8007                0      1642      7
0000000621ED8B00     1 SABXFER  SAENCR2_1                 TABLE        80750 703326   1649             8007                0      1642      7
0000000621ED8B00     1                                                 80750 703326   1649             8007                0      1642      7
0000000621ED8B00     0                                                 80750 703326   1649             8007                0      1642      7

Reducing contention

In order to reduce contention for this object the following mechanisms can be put in place:
  1. Examine the application to see if the execution of certain DML and SELECT statements can be reorganized to eliminate contention on the object. This may be because individual SQL statements are reading more buffers than they need to or SQL affecting specific buffers does so at the same time. For more details see:
    Document 1342917.1 Troubleshooting 'latch: cache buffers chains' Wait Contention
  2. Decrease the buffer cache. If less buffers are stored then there is less change of them contending. This may only help in a small amount of cases and may just move the problem elsewhere.
  3. DBWR throughput may have a factor in this as well because this determines how quickly buffers are flushed to disk. If dirty buffers are present then processes requesting those buffers may need to find earlier version meaning that they hold latches longer increasing the chances of contention.  If using multiple DBWR's then increase the number of DBWR's. See:
    Document 62172.1 - Understanding and Tuning Buffer Cache and DBWR
  4. Increase the PCTFREE for the table storage parameters via ALTER TABLE or rebuild. This will result in less rows per block such that there is less chance of the same buffers being requested by processes.
  5. Consider implementing reverse key indexes (if range scans aren't commonly used against the segment)
=======================================================================

Resolving Issues Where Waits for 'latch: cache buffers chains' Seen Due to Poorly Tuned SQL


PURPOSE

Disclaimer: This note is written in the context that the Performance Guided Resolution tool was used and recommended this article. It may not make as much sense if read standalone or by other audiences.

TROUBLESHOOTING STEPS


Brief Definition:

Block headers are placed on the buffer cache in linked lists (cache buffer chains) which are accessed through a hash table. One or more hash chains are protected by one child of this latch. Processes need to get the child latch to scan for a buffer. This prevents the linked list from changing while scanning.
'latch: cache buffers chains' contention is typically encountered because SQL statements read more buffers than they need to and multiple sessions are waiting to read the same block.

Problem Confirmation:

AWR/statspack report top timed event shows significant percentage of database time spent waiting for 'latch: cache buffers chains'

Reducing Waits:

Contention on the cache buffers chains latch may occur when block headers are accessed very frequently (via logical reads). These are often referred to as Hot blocks. Look for SQLs that access the blocks in question and determine if the repeated reads are necessary. This may be within a single session or across multiple sessions. In order to identify the contending SQLs:
Check ASH data to find the SQLs most often waited for this latch.
Check SQL ordered by logical reads to find out the top SQLs performing extensive logical reads.
Check Segments by logical reads to find the segments upon which most of the logical reads occurred.
  • Inefficient execution plans may perform many logical reads while they visit many blocks. If this query is executed by many sessions concurrently (or other similar queries against the same blocks), then there will be contention on these blocks. Check for suboptimal SQL (this is the most common cause of the events) - look at the execution plan for the SQL being run and try to reduce the gets per executions which will minimize the number of blocks being accessed and therefore reduce the chances of multiple sessions contending for the same block. If you can identify a poor SQL and have identified a better plan, you can direct the optimizer to use this plan using the following article:
    Document 1400903.1 Directing Plans with Baselines/Profiles Using coe_load_sql_baseline.sql / coe_load_sql_profile.sql (shipped with SQLT)
     
  • When execution plans are optimal, then further checks are needed from your application side to validate whether the concurrency on blocks imposed via repeated execution of sqls are justified.
  • If you have assigned a very large buffer cache, then reducing the size may help, in very rare case.
  • Index leaf blocks may see contention due to key values that are increasing steadily (using a sequence) and concentrated in a leaf block on the "right-hand side" of the index. Look at using reverse key indexes if range scans aren't commonly used against the segment (remember that this change means that range scan won’t be used for this index).
  • Use hash partitioning to spread values across blocks.
  • Rebuilding table with more PCTFREE may distribute rows in multiple blocks. This will reduce the number of rows per block and hopefully, spread out contention for the blocks (at the expense of wasting space). This may cause some queries to run slower because they will need to access more blocks to obtain the same number of rows. 

Measuring Success:

Once you have applied the changes to resolve the issues you have found, compare the latest AWR to the AWR that led you here via Guided Resolution (that AWR becomes your baseline). Look at the percentage decrease total wait time for this event. If there are still issues, re- evaluate those and address them according to the specific symptom.









Note---> This informationmation taken from oracle metalink. all copy rights oracle only.

Comments

Post a Comment

Oracle DBA Information