How to Move from ANALYZE to DBMS_STATS on Non-Partitioned Tables - Some Examples

How to Move from ANALYZE to DBMS_STATS on Non-Partitioned Tables - Some Examples


This article provides statistic gathering examples using DBMS_STATS and the legacy ANALYZE commands. Its purpose is to assist users of ANALYZE to move to DBMS_STATS for Non-partitioned tables.

Note: This note is a part of a series explaining how to move to DBMS_STATS from legacy statistic gathering routines such as ANALYZE and DBMS_UTILITY.
See the following article to view other notes in the series:
Note:237293.1 How to Move from ANALYZE to DBMS_STATS - Introduction
Scope and Application 

This article is aimed at anyone wishing to gather statistics for use by the Cost Based Optimizer (CBO). 

Gathering statistics on Non-Partitioned Tables.
Statistics for Non-Partitioned tables are stored in the following dictionary Views:
  • USER_TABLES
  • USER_TAB_COL_STATISTICS
  • USER_HISTOGRAMS
    This dictionary view is always populated at the same time that USER_TAB_COL_STATISTICS.
    To know how many buckets are collected, look at Num_Buckets column in USER_TAB_COL_STATISTICS.
  • USER_INDEXES if any
The Examples below provide typical scenarios using both ANALYZE and DBMS_STATS 

1. Gather Statistics on T1Tab, T1Tab columns (no histograms) and T1Tab indexes, computing the statistics. 

1
From  ANALYZE TABLE T1Tab COMPUTE STATISTICS;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE, method_opt => 'FOR ALL COLUMNS SIZE 1');

2. Gather Statistics on T1Tab, T1Tab columns (no histograms) and T1Tab indexes using a 15% sample. 

2
From  ANALYZE TABLE T1Tab ESTIMATE STATISTICS SAMPLE 15 PERCENT;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE, ESTIMATE_PERCENT => 15);

3. Gather Computed Statistics on T1Tab for the table but not for columns or indexes 

3
From  ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR TABLE;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => FALSE, METHOD_OPT => 'FOR COLUMNS');

4. Gather Computed Statistics on T1Tab for the table but not for columns. Gathers indexes statistics for index IDX_T1Tab 

4
From  ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR TABLE;
    + ANALYZE INDEX IDX_T1Tab COMPUTE STATISTICS;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE, METHOD_OPT => 'FOR COLUMNS');

5. Gather Statistics on T1Tab, T1Tab columns (high and low column values) and T1Tab indexes, computing the statistics. Later, gather column statistics (histograms - Default is SIZE 75) for all indexed columns. 

5
From  ANALYZE TABLE T1Tab COMPUTE STATISTICS;
      and then later to refresh only Indexed columns
    + ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE);
      and then later to refresh only Table and Indexed columns
    + exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => FALSE, METHOD_OPT => 'FOR ALL INDEXED COLUMNS');

6. Gather Statistics on T1Tab and column statistics for all indexed columns (but only recording high and low column values). Do not gather any index statistics. 

6
From  ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS size 1;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => FALSE, METHOD_OPT => 'FOR ALL INDEXED COLUMNS size 1');

7. Gather column statistics for all indexed columns (but only recording high and low column values). Do not gather any index statistics (Gathering of Table Statistics dependant on method used). 

7
From  ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR ALL INDEXED COLUMNS size 1;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => FALSE, METHOD_OPT => 'FOR ALL INDEXED COLUMNS size 1');
    + exec DBMS_STATS.DELETE_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE_COLUMNS => FALSE);
Note that with Non Partitioned Tables, the GRANULARITY parameter values of 'ALL'|'GLOBAL'|'DEFAULT' are equivalent since there are no other levels to gather global statistics for.
For more information on Global Statistics see: Note:236935.1
Detailed explanation for the examples 

1 - Case 1

Gather Statistics on T1Tab, T1Tab columns (high and low column values) and T1Tab indexes, computing the statistics. 

From  ANALYZE TABLE T1Tab COMPUTE STATISTICS;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE, method_opt => 'FOR ALL COLUMNS SIZE 1');


  • Populate USER_TABLES, USER_TAB_COL_STATISTICS, USER_INDEXES
  • The GLOBAL_STATS column is set to NO with the ANALYZE command and YES with the DBMS_STATS package. See (*)for details
  • ANALYZE sets the value of the SAMPLE_SIZE Column in USER_TAB_COL_STATISTICS to: ( number of Rows )
  • DBMS_STATS sets the value of the SAMPLE_SIZE Column in USER_TAB_COL_STATISTICS to: ( Number of Rows - Number of Nulls )


2 - Case 2

Gather Statistics on T1Tab, T1Tab columns and T1Tab indexes using a 15% sample. 

From  ANALYZE TABLE T1Tab ESTIMATE STATISTICS SAMPLE 15 PERCENT;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE, ESTIMATE_PERCENT => 15);


  • Populate USER_TABLES, USER_TAB_COL_STATISTICS, USER_INDEXES
  • The 4 Dictionary views show some differences in GLOBAL_STATS, SAMPLE_SIZE, NUM_ROWS and COLUMNS
    • Global_Stats is set to NO with ANALYZE command and YES with DBMS_STATS package. See (*)
      • ANALYZE sets the value of the SAMPLE_SIZE Column to 15 percent
      • DBMS_STATS sets the value of the SAMPLE_SIZE Column ro around 15 percent
      • The sampling model of DBMS_STATS uses some thresholds and can increase this Sample_Size to the real number of rows.
      • SAMPLE_SIZE Column can have different values in USER_TAB_COL_STATISTICS for the columns of the same table.
    • Regarding NUM_ROWS, the DBMS_STATS sampling algorithm has been refined in comparison to ANALYZE to make the sampling for estimated statistics truly random. The design goal is to make the sampling non-repeatable when statistics are estimated and also to minimise the chance of sampling a cluster of empty or full blocks and thus adversely affect the statistics. Since these statistics are an estimate, the difference in the reported value of NUM_ROWS and the actual number of rows is insignificant, and will not affect optimizer decisions. 


3 - Case 3

Gather Computed Statistics on T1Tab for the table but not for columns or indexes 

From  ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR TABLE;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => FALSE, METHOD_OPT => 'FOR COLUMNS');


  • See Note:67257.1 ANALYZE - Difference between FOR TABLE and standard
  • Populates only USER_TABLES
  • No Statistics value in USER_TAB_COL_STATISTICS or USER_INDEXES for table T1Tab
  • Using ANALYZE:
    • GLOBAL_STATS Column is set to NO.
    • Column value for AVG_ROW_LEN (in USER_TABLES) is gathered.

  • Using DBMS_STATS:
    • GLOBAL_STATS Column is set to YES. See (*)
    • Column value for AVG_ROW_LEN (in USER_TABLES) is :
      • set to the default value of 100 if there is not already a value there
      • Left as the old value if one already exists

      See (**)


4 - Case 4

Gather Computed Statistics on T1Tab for the table but not for columns. Gathers indexes statistics for index IDX_T1Tab 

From  ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR TABLE;
    + ANALYZE INDEX IDX_T1Tab COMPUTE STATISTICS;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE, METHOD_OPT => 'FOR COLUMNS');


  • Populates USER_INDEXES
  • ANALYZE sets the GLOBAL_STATS Column to NO
  • DBMS_STATS sets the GLOBAL_STATS Column to YES. See (*)
  • Column value for AVG_ROW_LEN (in USER_TABLES) have a different value depending on whether ANALYZE or DBMS_STATS is used (See Case 3)


5 - Case 5

Gather Statistics on T1Tab, T1Tab columns (no histograms) and T1Tab indexes, computing the statistics. Later, gather column statistics (histograms) for all indexed columns.

From  ANALYZE TABLE T1Tab COMPUTE STATISTICS;
      and then later to refresh only Indexed columns
    + ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR ALL INDEXED COLUMNS size 1;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE);
      and then later to refresh only Table and Indexed columns
    + exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => FALSE, METHOD_OPT => 'FOR ALL INDEXED COLUMNS size 1');
  • For the first Statement See Case 1
  • 2nd statement is needed to refresh only Columns statistics on T1Tab indexed columns
    • ANALYZE only collects statistics on Indexed Columns
      • GLOBAL_STATS is set to NO
      • USER_TABLES is not updated for table T1Tab
      • USER_TAB_COL_STATISTICS is updated for each indexed column
    • DBMS_STATS collects Indexed Column statistics and Table statistics are refreshed
      • GLOBAL_STATS is set to YES. See (*)
      • USER_TABLES is updated apart from AVG_ROW_LEN. See (**)
      • USER_TAB_COL_STATISTICS is updated for each indexed column


6 - Case 6

Gather Statistics on T1Tab and column statistics for all indexed columns (but only recording high and low column values).
Do not gather any index statistics.
 

From  ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS size 1;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => FALSE, METHOD_OPT => 'FOR ALL INDEXED COLUMNS size 1');


  • Gather Statistics on T1Tab, and T1Tab Indexed Columns
  • Only high and low values are recorded for the column statistics on the indexed columns. This is the same as gathering the default column statistics for those columns.
  • Do not Gather Statistics on T1Tab's unIndexed Columns and T1Tab's indexes
  • Populate USER_TABLES
  • Populate USER_TAB_COL_STATISTICS for the Indexed Columns.
  • USER_INDEXES remains unchanged for indexes on table T1Tab
  • using ANALYZE:
    • GLOBAL_STATS Column is set to NO.
    • AVG_ROW_LEN (in USER_TABLES) is gathered.
  • using DBMS_STATS:
    • GLOBAL_STATS Column is set to YES. See (*)
    • AVG_ROW_LEN (in USER_TABLES) is set to the default value of 100. See (**)


7 - Case 7

Gather column statistics for all indexed columns (but only recording high and low column values).
Do not gather any index statistics (Gathering of Table Statistics dependant on method used).
 

From  ANALYZE TABLE T1Tab COMPUTE STATISTICS FOR ALL INDEXED COLUMNS size 1;
to    exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => FALSE, METHOD_OPT => 'FOR ALL INDEXED COLUMNS size 1');
     + exec DBMS_STATS.DELETE_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE_COLUMNS => FALSE);


  • ANALYZE statement gathers statistics on indexed columns, without having any table statistics
    • There are no statistics in USER_TABLES for this table T1Tab
    • T1Tab not Indexed columns have no statistics in USER_TAB_COL_STATISTICS
  • DBMS_STATS collects T1Tab indexed Columns statistics AND Table statistics.
    • With DBMS_STATS it is impossible to gather column statistics without the table ones.
    • So to have the same behaviour, you have to delete the Table statistics without cascading on columns


(*)
In some Oracle versions, the GLOBAL_STATS column can have NO value when statistics have been gathered using the DBMS_STATS package generating "ANALYZE" commands for sampling activities instead of SQL statements.
For example, the GLOBAL_STATS Column is set to NO in USER_INDEXES when the database version is lower than 9.2.
(**)
The AVG_ROW_LEN Table Statistic (in USER_TABLES) is now an aggregation of AVG_COL_LEN Column Statistics (in USER_TAB_COL_STATISTIC).
All columns statistics have to be gathered [method_opt => FOR ALL COLUMNS ...] to generate the correct value in USER_TABLES, (otherwise the previous value is left or the default value of 100 is used if there is not already a value) - See Cases 3 and 6.

References



Note:237293.1 How to Move from ANALYZE to DBMS_STATS - Introduction 
Note:236935.1 Global statistics - DBMS_STATS versus ANALYZE 

Note:223065.1 Using DBMS_STATS.GATHER_TABLE_STATS With ESTIMATE_PERCENT Parameter Samples All Rows 
Note:67257.1 ANALYZE - Difference between FOR TABLE and standard 

Note:237901.1 Gathering Schema or Database Statistics Automatically - Examples
bug:2919423 DBMS_STATS TAKES AT LEAST TWICE AS LONG IN 9.2 AS IN 8.1.7.4.0
Upgrade to 9.2.0.5 to improve performance using DBMS_STATS package

============================================================

Comments