CLUSTERING FACTOR DEMYSTIFIED PART – II

CLUSTERING FACTOR DEMYSTIFIED PART – II


In my last post, CLUSTERING FACTOR DEMYSTIFIED PART – I‘,  I had explained what is clustering factor, how and why does it affect the performance and had  demonstrated it with the help of an example. I had mentioned various methods to  resolve the problem of a high clustering factor.
Let’s explore the simplest one first  i.e. CTAS with order by:
Overview:
- Create another ‘intermediate’  table from ‘unorganized’ table using CTAS order by
- Create ‘reorganized’ table from ‘ intermediate’  table using CTAS
- Create index on id column of ‘reorganized’ table
- Trace  the same query and check that performance has improved
Implementation :
- Create another ‘intermediate’  table from ‘unorganized’ table using CTAS order by
SQL>create table intermediate as   select * from unorganized    order by id;
- Create ‘reorganized’ table from ‘ intermediate’  table using CTAS
SQL>create table reorganized as select * from intermediate;
drop table intermediate purge;
- Find out no. of blocks across which records of a key value are spread in the ‘reorganized’ table.
- Note that in ‘reorganized’  table,  records  are now clustered   i.e. rows for a key value are placed together in blocks rather than scattered across various blocks
SQL> select reorg.id,  reorg.cnt reorganized_block
         from ( select id,               count(distinct(dbms_rowid.ROWID_BLOCK_NUMBER(rowid))) cnt
               from reorganized 
                group by id) reorg
        order by id;
ID        REORGANIZED_BLOCKS
     ———- ——————–
         1                    5
         2                    6
         3                    6
         4                    6
         ………
        98                    5
        99                    5
       100                   6
-- Create index on id column of ‘reorganized’  table and gather statistics for the table
SQL>  create index reorganized_idx on reorganized(id);

exec dbms_stats.gather_table_stats(USER, 'reorganized', estimate_percent => 100, method_opt=> 'for all indexed columns size 254');
– Check the index statistics
– Note that the index on table ‘reorganized‘ has a clustering factor (488) which is equal to the no. of rows in the table i.e.  to fetch all the records for various key values using index,  blocks need not be switched unless all the records in the earlier block have been fetched
 SQL>set line 500<             col table_name for a15 >
           col index_name for a15 
          select blevel,  leaf_blocks, table_name, index_name, clustering_factor
         from user_indexes 
          where table_name like '%REORGANIZED%' 
         order by 1;
   BLEVEL    LEAF_BLOCKS       TABLE_NAME      INDEX_NAME      CLUSTERING_FACTOR
    ———- ———————– ——————- ——————      ————————–
         1           7                       REORGANIZED     REORGANIZED_IDX               488
- Trace  the same query on ‘reorganized’  table  and check that performance has improved
SQL> alter session set tracefile_identifier = 'cluster_factor'; 
           alter session set sql_trace=true; 
           select /*+ index(reorganized reorganized_idx) */ count(txt) from reorganized where id=id; 
           alter session set sql_trace=false;
– Find out the name of trace file generated
SQL> col trace_file for a100 >
           select  value trace_file from v$diag_info 
         where upper(name) like '%TRACE FILE%';
TRACE_FILE
—————————————————————————————————-
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_29116_cluster_factor.trc
– Run tkprof utility on the trace file generated 
$cd /u01/app/oracle/diag/rdbms/orcl/orcl/trace 
    tkprof orcl_ora_29116_cluster_factor.trc cluster_factor.out 
   vi cluster_factor.out
******************************************************************************
SQL ID: bmsnm1zh6audy
Plan Hash: 3677546845
select /*+ index(reorganized reorganized_idx) */ count(txt)
from
 reorganized where id=id
Rows     Row Source Operation
——-  —————————————————
      1  SORT AGGREGATE (cr=496 pr=328 pw=0 time=0 us)
   3400   TABLE ACCESS BY INDEX ROWID REORGANIZED (cr=496pr=328 pw=0 time=42046 us cost=496 size=3073600 card=3400)
   3400    INDEX FULL SCAN REORGANIZED_IDX (cr=8 pr=0 pw=0 time=9945 us cost=8 size=0 card=3400)(object id 75125)
******************************************************************************
Note that :
Total no. of I/Os performed against the index on ‘reorganized’ table = 8 (cr=8 in the INDEX FULL SCAN ORGANIZED_IDX row source)
Total I/O’s performed by the query = 496 (cr = 496 in the TABLE ACCESS BY INDEX ROWID ORGANIZED)
Hence , No. of I/O’s made against the table = 496 – 8 = 488 which is equal to the clustering factor of the index.
We can verify the improvement in performance  by using autotrace on a query against the table:
SQL>set autotrace traceonly explain 
       select /*+ index(reorganized reorganized_idx) */ count(txt)  
         from      reorganized where id=id; 
          set autotrace off
————————————————————————————————
| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
————————————————————————————————
|   0 | SELECT STATEMENT             |                 |     1 |   904 |   496   (0)| 00:00:06 |
|   1 |  SORT AGGREGATE              |                 |     1 |   904 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| REORGANIZED     |  3400 |  3001K|   496   (0)| 00:00:06 |
|*  3 |    INDEX FULL SCAN           | REORGANIZED_IDX |  3400 |       |     8   (0)| 00:00:01 |
————————————————————————————————
Note that there is a cost of 8 for using the index for the REORGANIZED table and index – about 8 I/O’s against the index i.e. the query will hit one root block (1)and the leaf blocks (7) .
Then the query will be doing 488 more I/Os against the table(= number of blocks in table), because the rows needed are all next to each other on a few database blocks, for a total cost of 496.
Hence.  it can be seen that rows in  a table can be physically resequenced by using CTAS with order by. Here I would like to point out that in this case, the table becomes unavailable while it is being recreated. Hence,  if availability of the table can’t be compromised, this method is not preferable.


Comments

  1. This has been copied verbatim from my blog at http://oracleinaction.com/cluster-factor-ii/. This is plagiarism.

    ReplyDelete

Post a Comment

Oracle DBA Information