CLUSTERING FACTOR DEMYSTIFIED PART – II
In my last post, CLUSTERING FACTOR DEMYSTIFIED PART – I‘, I had explained what is clustering factor, how and why does it affect the performance and had demonstrated it with the help of an example. I had mentioned various methods to resolve the problem of a high clustering factor.
Let’s explore the simplest one first i.e. CTAS with order by:
Overview:
- Create another ‘intermediate’ table from ‘unorganized’ table using CTAS order by
- Create ‘reorganized’ table from ‘ intermediate’ table using CTAS
- Create index on id column of ‘reorganized’ table
- Trace the same query and check that performance has improved
Implementation :
- Create another ‘intermediate’ table from ‘unorganized’ table using CTAS order by
SQL>create table intermediate as select * from unorganized order by id;
- Create ‘reorganized’ table from ‘ intermediate’ table using CTAS
SQL>create table reorganized as select * from intermediate;
drop table intermediate purge;
- Find out no. of blocks across which records of a key value are spread in the ‘reorganized’ table.
- Note that in ‘reorganized’ table, records are now clustered i.e. rows for a key value are placed together in blocks rather than scattered across various blocks
SQL> select reorg.id, reorg.cnt reorganized_block from ( select id, count(distinct(dbms_rowid.ROWID_BLOCK_NUMBER(rowid))) cnt from reorganized group by id) reorg order by id;
ID REORGANIZED_BLOCKS
———- ——————–
1 5
2 6
3 6
4 6
………
98 5
99 5
100 6
-- Create index on id column of ‘reorganized’ table and gather statistics for the table
SQL> create index reorganized_idx on reorganized(id); exec dbms_stats.gather_table_stats(USER, 'reorganized', estimate_percent => 100, method_opt=> 'for all indexed columns size 254');
– Check the index statistics
– Note that the index on table ‘reorganized‘ has a clustering factor (488) which is equal to the no. of rows in the table i.e. to fetch all the records for various key values using index, blocks need not be switched unless all the records in the earlier block have been fetched
SQL>set line 500< col table_name for a15 > col index_name for a15 select blevel, leaf_blocks, table_name, index_name, clustering_factor from user_indexes where table_name like '%REORGANIZED%' order by 1;
BLEVEL LEAF_BLOCKS TABLE_NAME INDEX_NAME CLUSTERING_FACTOR
———- ———————– ——————- —————— ————————–
1 7 REORGANIZED REORGANIZED_IDX 488
- Trace the same query on ‘reorganized’ table and check that performance has improved
SQL> alter session set tracefile_identifier = 'cluster_factor'; alter session set sql_trace=true; select /*+ index(reorganized reorganized_idx) */ count(txt) from reorganized where id=id; alter session set sql_trace=false;
– Find out the name of trace file generated
SQL> col trace_file for a100 > select value trace_file from v$diag_info where upper(name) like '%TRACE FILE%';
TRACE_FILE
—————————————————————————————————-
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_29116_cluster_factor.trc
– Run tkprof utility on the trace file generated
$cd /u01/app/oracle/diag/rdbms/orcl/orcl/trace tkprof orcl_ora_29116_cluster_factor.trc cluster_factor.out vi cluster_factor.out
******************************************************************************
SQL ID: bmsnm1zh6audy
Plan Hash: 3677546845
select /*+ index(reorganized reorganized_idx) */ count(txt)
from
reorganized where id=id
Rows Row Source Operation
——- —————————————————
1 SORT AGGREGATE (cr=496 pr=328 pw=0 time=0 us)
3400 TABLE ACCESS BY INDEX ROWID REORGANIZED (cr=496pr=328 pw=0 time=42046 us cost=496 size=3073600 card=3400)
3400 INDEX FULL SCAN REORGANIZED_IDX (cr=8 pr=0 pw=0 time=9945 us cost=8 size=0 card=3400)(object id 75125)
******************************************************************************
Note that :
Total no. of I/Os performed against the index on ‘reorganized’ table = 8 (cr=8 in the INDEX FULL SCAN ORGANIZED_IDX row source)
Total I/O’s performed by the query = 496 (cr = 496 in the TABLE ACCESS BY INDEX ROWID ORGANIZED)
Hence , No. of I/O’s made against the table = 496 – 8 = 488 which is equal to the clustering factor of the index.
We can verify the improvement in performance by using autotrace on a query against the table:
SQL>set autotrace traceonly explain select /*+ index(reorganized reorganized_idx) */ count(txt) from reorganized where id=id; set autotrace off
————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 904 | 496 (0)| 00:00:06 |
| 1 | SORT AGGREGATE | | 1 | 904 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| REORGANIZED | 3400 | 3001K| 496 (0)| 00:00:06 |
|* 3 | INDEX FULL SCAN | REORGANIZED_IDX | 3400 | | 8 (0)| 00:00:01 |
————————————————————————————————
Note that there is a cost of 8 for using the index for the REORGANIZED table and index – about 8 I/O’s against the index i.e. the query will hit one root block (1)and the leaf blocks (7) .
Then the query will be doing 488 more I/Os against the table(= number of blocks in table), because the rows needed are all next to each other on a few database blocks, for a total cost of 496.
Hence. it can be seen that rows in a table can be physically resequenced by using CTAS with order by. Here I would like to point out that in this case, the table becomes unavailable while it is being recreated. Hence, if availability of the table can’t be compromised, this method is not preferable.
This has been copied verbatim from my blog at http://oracleinaction.com/cluster-factor-ii/. This is plagiarism.
ReplyDelete