Oracle Support Master Note for Real Application Clusters (RAC), Oracle Clusterware and Oracle Grid Infrastructure (Doc ID 1096952.1)


Are you implementing Oracle RAC or Oracle Grid Infrastructure for the first time?   Are you an experienced RAC or Oracle Clusterware user who wants to stay current with the latest knowledge and best practices?   This blog is intended to give you insights into the quickest way to find information about RAC, Oracle Clusterware and Oracle Grid Infrastructure.  The contents of this blog replicate the material found in My Oracle Support Master Note #1096952.1 for Oracle RAC, Oracle Clusterware and Oracle Grid Infrastructure.

This Master Note is intended to provide an index and references to the most frequently used My Oracle Support Notes with respect to Oracle Real Application Clusters, Oracle Clusterware and Oracle Grid Infrastructure implementations. This Master Note is subdivided into categories to allow for easy access and reference to notes that are applicable to your area of interest, within the RAC, Clusterware and Grid Infrastructure spaces. This includes the following categories:
  • RAC Assurance RAC Starter Kits - Installation and Best Practices Information
  • Additional Notes on RAC Installation and Frequently Asked Questions
  • RAC and Clusterware Monitoring, Troubleshooting & Data Collection
    • Tools
    • Troubleshooting and Data Collection
    • My Oracle Support references
  • RAC and Clusterware Testing and Performance Tuning
  • RAC and Clusterware Maintenance and Patching References
    • General Maintenance References
    • Patching Best Practices and References
  • Upgrade and Platform Migration for RAC
  • RAC and Oracle Applications
  • RAC ASM Documentation
  • Maximum Availability Architecture References 

Scope and Application

Oracle Real Application Clusters (RAC) is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide a highly scalable and available database solution for all your business applications. Oracle RAC provides the foundation for enterprise grid computing.

Oracle Clusterware is portable cluster software that allows clustering of single servers so that they cooperate as a single system. Oracle Clusterware provides the required infrastructure for Oracle Real Application Clusters while also enabling high availability protection of any Oracle or third party application.

Oracle Automatic Storage Management (ASM) provides a virtualization layer between the database and storage. It treats multiple disks as a single disk group, and lets you dynamically add or remove disks while keeping databases online. ASM greatly simplifies the storage complexities often associated with an Oracle database while increasing availability and performance of the database.

With Oracle grid infrastructure 11g release 2 (11.2), Oracle Automatic Storage Management (Oracle ASM) and Oracle Clusterware are installed into a single home directory, which is referred to as the Grid Infrastructure home. The installation of the combined products is called Oracle Grid Infrastructure. However, Oracle Clusterware and Oracle Automatic Storage Management remain separate products.

This note applies to the following versions of these products:

Oracle Clusterware - Version 10.1.0.2 to 11.1.0.7
Oracle Grid Infrastructure - Version 11.2.0.x
Oracle Real Application Clusters - Version 10.1.0.2 to 11.2.0.x

Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure

RAC Assurance RAC Starter Kits - Installation and Best Practices

The goal of the Oracle Real Application Clusters (RAC) Starter Kits is to provide the latest information on generic and platform-specific best practices for implementing an Oracle cluster using Oracle Clusterware or Oracle Grid Infrastructure (11gR2) and Oracle Real Application Clusters. These documents are compiled and maintained based on Oracle Support's ongoing experience with its global RAC customer base. The process used to install, configure, and create an Oracle Cluster and an Oracle Real Application Clusters (RAC) database shares much in common across all supported operating system platforms. Despite these commonalities, many aspects of the deployment process are O/S specific. As such, there is a Generic Starter Kit, as well as a separate Starter Kit for each supported platform. These Starter Kits provide the latest information in terms of installation help, as well as best practices for ongoing/existing implementations.
  •  RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices (Generic)
  •  RAC Assurance Support Team RAC Starter Kit (Linux) 
  •  RAC Assurance Support Team RAC Starter Kit (Solaris) 
  •  RAC Assurance Support Team RAC Starter Kit (Windows) 
  •  RAC Assurance Support Team RAC Starter Kit (AIX) 
  •  RAC Assurance Support Team RAC Starter Kit (HP-UX) 
The Generic Starter Kit has attached documents with sample test plan outlines for system testing and basic load testing to validate that the infrastructure is set up correctly, as well as a sample project plan outline for your RAC implementation. In addition, best practices for storage, networking and other areas are included. The platform-specific Starter Kit notes complement this with step-by-step installation guides for each platform, as well as best practices for the specific platform in question.

Additional Notes on RAC and Clusterware Installation and Frequently Asked Questions


While the Starter Kits referenced above contain much of what you need to get started, additional notes and resources that are important to review, particularly for a new implementation, can be found in the following FAQs and other references: 
  •  RAC: Frequently Asked Questions
  •  CLUSTER VERIFICATION UTILITY FAQ 
  •  Using Oracle Clusterware with Vendor Clusterware FAQ 
  •  Oracle's Policy for Supporting RAC 10g with Symantec SFRAC 
  •  OCFS2 - Frequently Asked Questions 
  •  Cluster Interconnect in Oracle 10g and 11g
Key notes related to OS Installation requirements for a RAC or Oracle Clusterware environment
  •  Mount Options for Oracle files when used with NAS devices
  •  How to install Oracle Clusterware with shared storage on block devices 
  •  Configuring device-mapper for CRS/ASM 
  •  Oracle Database on AIX, HP-UX, Linux, Mac OS X, Solaris, Tru64 Operating Systems Installation and Configuration Requirements Quick Reference (8.0.5 to 11.1) 
Key notes introducing differences in Oracle 11gR2
  •  11gR2 Clusterware and Grid Home - What You Need to Know
  •  Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment 
  •  How to Troubleshoot Grid Infrastructure Startup Issues
  •  11gR2 Grid Infrastructure Single Client Access Name (SCAN) Explained

RAC and Clusterware Monitoring, Troubleshooting and Data Collection Notes


To effectively manage your RAC cluster, it is important to know how to monitor, troubleshoot and collect data, both for your own diagnosis, and also to provide supporting information when logging a service request. The following Notes will give you information on how to do both:


Tools for Monitoring RAC and Clusterware Environments 
  •  OS Watcher User Guide
  •  Procwatcher: Script to Monitor and Examine Oracle and CRS 
  •  Introducing Cluster Health Monitor 
  •  How to install Oracle Cluster Health Monitor (former IPD/OS) on Windows 
  •  Remote Diagnostics Agent (RDA) 
  •  11g Understanding Automatic Diagnostic Repository 
  •  Using Diagwait as a Diagnostic. 

References needed for Troubleshooting and Data Collection

  •  Data Gathering for Troubleshooting RAC Issues
  •  Data Gathering for Troubleshooting CRS issues 
  •  CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide 
  •  Crsd Ocssd Evmd Log Rotation Policy 
  •  gc lost blocks diagnostics 
  •  Troubleshoot ORA-29740 errors in a RAC Environment. 
  •  11g How to Unpack a Package in to ADR 
  •  11g Quick Steps to Package and Send Critical Error Diagnostic Information to Support 
  •  What Is The Difference Between RDA and Configuration Manager 
  •  External Support FTP site: Information Sheet 

RAC and Clusterware Testing and Performance Tuning

Much of RAC performance tuning is no different from "single instance" tuning. However, there are some additional areas to check in a RAC environment when tuning or troubleshooting performance related issues. The following notes will make you aware of some of the areas where performance tuning and diagnosis have a specific RAC flavor: 
  •  Oracle Performance Diagnostic Guide (OPDG)
  •  Tuning Inter-Instance Performance in RAC and OPS 
  •  Transactional Sequences in Applications in a RAC environment 
  •  RAC Hangs due to small cache size on SYS.AUDSES$ 
  •  gc lost blocks diagnostics 
  •  HugePages on Linux: What It Is... and What It Is Not... 
  •  Checklist for Performance Problems with Parallel Execution 
  •  Turning NUMA on can cause database hangs 

RAC and Clusterware Certification, Maintenance, and Patching References

Maintenance of a RAC environment is similar in many respects to maintaining a single-instance environment. However, with the Clusterware and Grid Infrastructure stacks, there are some additional tasks to be aware of regarding ongoing maintenance, as well as maintenance that may be brought on by changes to your environment. Below are some references speaking to the areas of general maintenance that may be needed or affect RAC environments in a particular way, as well as references to patching, which is a key part of an ongoing maintenance strategy.
Certification References for RAC and Clusterware Environments
  •  How To Check The Certification Matrix for Real Application Clusters
  •  Database 11g Release 2 Certification Highlights
  •  Oracle Clusterware - ASM - Database Version Compatibility

General Maintenance References for RAC and Clusterware Environments 
  •  How to ADD/REMOVE/REPLACE/MOVE Oracle Cluster Registry (OCR) and Voting Disk
  •  Modifying the VIP or VIP Hostname of a 10g or 11g Oracle Clusterware Node. 
  •  How to Change Interconnect/Public Interface IP or Subnet in Oracle Clusterware 
  •  Potential disk corruption when adding a node to a cluster
  •  VIP Failover Take Long Time After Network Cable Pulled 
  •  Implementing LIBUMEM for CRS on Solaris 64 with 3rd Party Clusterware 
  •  How to Password Protect the Listener 

Patching Best Practices and References for RAC and Oracle Clusterware 
As part of an overall maintenance strategy, it is critical that customers have a formal strategy to stay in front of known issues and bugs. To make it easier for customers to obtain and deploy fixes for known critical issues, please refer to the following list of references. 
  •  Database, FMW, Em Grid Control and OCS Software Error Correction Support Policy.
  •  Oracle 10g Release 2 (10.2) Support Status and Alerts 
  •  Oracle 11g Release 1 (11.1) Support Status and Alerts 
  •  Release Schedule of Current Database Releases. 
  •  Oracle Recommended Patches -- Oracle Database. 
  •  Introduction to Patch Set Updates (PSUs) 
  •  Oracle Announces First Patch Set Update For Oracle Database Release 10.2 
  •  10.2.0.X CRS Bundle Patch Information. 
  •  11.1.0.X CRS Bundle Patch Information. 
  •  11.2.0.X Grid Infrastructure Bundle Patch Information. 
  •  Applying one-off Oracle Clusterware patches in a mixed version home environment 
  • Note 844983.1 Apply CRS Bundle Patch Using opatch auto Option. 
  •  Online Patches (Hot Patching). 
  •  Critical Patch Update - Introduction to Database n-Apply CPUs. 
  •  11.2.0.X Grid Infrastructure PSU Known Issues

Upgrade and Platform Migration for RAC and Oracle Clusterware


When migrating a RAC cluster to a new platform or upgrading a RAC cluster from an older version of Oracle to the current version of Oracle, the following references will be helpful: 
  •  Oracle 10gR2 Upgrade Companion.
  •  Oracle 11gR1 Upgrade Companion. 
  •  Oracle 11gR2 Upgrade Companion. 
  •  Complete Checklist for Manual Upgrades to 11gR1. 
  •  How to Convert 10g Single-Instance database to 10g RAC using Manual Conversion procedure. 
  •  Cluster Ready Services (CRS) rolling upgrade 

References for running RAC with Applications

  •  Oracle Clusterware and Application Failover Management

RAC and Oracle E-Business

  •  Configuring Oracle Applications Release 11i with Oracle10g Release 2 Real Application Clusters and Automatic Storage Management.
  •  Using Load-Balancers with Oracle E-Business Suite Release 12 
  •  Implementing Load Balancing On Oracle E-Business Suite - Documentation For Specific Load Balancer Hardware 
  •  Advanced Configurations and Topologies for Enterprise Deployments of E-Business Suite 11i 
  •  MAA Roadmap for the E-Business Suite 
  •  Using Oracle 10gR2 RAC and Automatic Storage Management with Oracle E-Business Suite Release 12 

RAC ASM Documentation

Automatic Storage Management (ASM) is an evolution in file system and volume management functionality for Oracle database files. ASM further enhances automation and simplicity in storage management that is critical to the success of the Oracle grid architecture. ASM also improves file system scalability and performance, manageability and database availability for RAC environments. Use of ASM with RAC is an Oracle Best Practice. 
  •  Master Note for Automatic Storage Management (ASM)
  •  Master Note for Oracle Database Machine and Exadata Storage Server

MAA/Standby Documentation

Data Guard provides the management, monitoring, and automation software infrastructure to create and maintain one or more standby databases to protect Oracle data from failures, disasters, errors, and data corruptions. As users commit transactions at a primary database, Oracle generates redo records and writes them to a local online log file. 
  •  Master Note for Data Guard

Using My Oracle Support Effectively

  •  New Customers Start Here
  •  My Oracle Support - The Next Generation Support Platform
  •  My Oracle Support Configuration Management FAQ
  •  My Oracle Support Health Checks Catalog
  •  Working Effectively With Global Customer Support
  •  Escalating Service Requests with Oracle Support Services

 Keywords and search terms: RAC, ASM, CRS, Cluster, Clusterware, DB Machine, Exadata, GI, Grid, Infrastructure, OCR, Oracle, RAC, Votedsk, votedisk, 'Voting Disk', ohasd, ocssd, crsd, evmd, srvctl, crsctl, 'Grid Infrastructure', 'Real Application Clusters', HA, Scalability, ACFS, ADVM

*** - For reference to the content in this blog, and for the latest and most up to date information, refer to Note 1096952.1, Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure

=========================================================================================


In this Document
Purpose
Scope
Details
RAC Platform Specific Starter Kits and Best Practices
RAC Platform Generic Load Testing and System Test Plan Outline
ORAchk (Formerly RACcheck) - Oracle Configuration Audit Tool
Top 11 Things to do NOW to Stabilize your RAC Cluster Environment
Design Considerations
General Design Considerations
Networking Considerations
Storage Considerations (Including ASM)
Clusterware and Grid Infrastructure Configuration Considerations
Virtualization Considerations
Installation Considerations
Patching Considerations
Upgrade Considerations
Clusterware, ASM and Grid Infrastructure Upgrade Considerations
RDBMS Upgrade Considerations
Database Configuration Considerations for RAC
Database Initialization Parameter Considerations
Performance Tuning Considerations
Database Network Considerations
General Database Considerations
Diagnostics and Troubleshooting
General Considerations
Clusterware and Grid Infrastructure Diagnostics
RAC Database Diagnostics
Patching Diagnostics (OPatch)
References




APPLIES TO:

Oracle Database - Enterprise Edition - Version 10.2.0.1 to 12.1.0.2 [Release 10.2 to 12.1]
Information in this document applies to any platform.

PURPOSE

The goal of the Oracle Real Application Clusters (RAC) series of Best Practice and Starter Kit notes is to provide customers with quick knowledge transfer of generic and platform specific best practices for implementing, upgrading and maintaining an Oracle RAC system. This document is compiled and maintained based on Oracle's experience with its global RAC customer base.

This Starter Kit is not meant to replace or supplant the Oracle Documentation set, but rather, it is meant as a supplement to the same. It is imperative that the Oracle Documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this Starter Kit.

All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. It is thus imperative that any recommendations from this Starter Kit are thoroughly tested and validated using a testing environment that is a replica of the target production environment before being implemented in the production environment to ensure that there is no negative impact associated with the recommendations that are made.

SCOPE

This article applies to all new and existing RAC implementations as well as RAC upgrades.

DETAILS

RAC Platform Specific Starter Kits and Best Practices

While this note focuses on platform independent (generic) RAC Best Practices, the following notes contain detailed platform specific best practices including Step-By-Step installation cookbooks.
Document 811306.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
Document 811280.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)
Document 811271.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
Document 811293.1 RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
Document 811303.1 RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)

RAC Platform Generic Load Testing and System Test Plan Outline

A critical task of any successful implementation, particularly mission critical Maximum Availability environments, is testing.  For a RAC environment, testing should include both load generation and fault injection testing.  Load testing will allow for measurement of how the system reacts under heavy load while fault injection testing will help ensure that the system reacts as designed when those inevitable hardware and/or software failures occur.   The following documents will provide you with guidance in performing this crucial testing.
Click here for a White Paper on available RAC System Load Testing Tools
Click here for a platform generic RAC System Test Plan Outline for 10gR2 and 11gR1

Click here for a platform generic RAC System Test Plan Outline for 11gR2


These documents are to be used to validate your system setup and configuration, and also as a means to practice responses and establish procedures in case of certain types of failures.

ORAchk (Formerly RACcheck) - Oracle Configuration Audit Tool

ORAchk is a Configuration Audit tool designed to audit various important configuration settings within Real Application Clusters (RAC), Oracle Clusterware (CRS), Automatic Storage Management (ASM) and Grid Infrastructure environments. This utility is to be used to validate the Best Practices and Success Factors defined in the series of Oracle Real Application Clusters (RAC) Best Practice and Starter Kit notes which are maintained by the RAC Assurance development and support teams. At present ORAchk supports Linux (x86 and x86_64), Solaris SPARC and AIX (with the bash shell) platforms. Those customers running RAC on ORAchk supported platforms are strongly encouraged to utilize this tool to identify potential configuration issues that could impact the stability of the cluster.
Document 1268927.1 ORAchk - Oracle Configuration Audit Tool

Note:  Oracle is constantly generating and maintaining Best Practices and Success Factors from the global customer base.  As a result, the ORAchk utility is frequently updated with this information.  Therefore, it is recommended that you ensure you are using the latest version of ORAchk prior to execution.

Top 11 Things to do NOW to Stabilize your RAC Cluster Environment

As a proactive measure to prevent cluster instability due to commonly known issues, the Oracle RAC Proactive Support team has compiled a list of the top 11 issues that can impact the stability of a RAC cluster.  Though all of these recommendations are contained within the series of Best Practice and Starter Kit notes, we strongly recommend that the following note be reviewed, as these are key success factors.
Document 1344678.1 Top 11 Things to do NOW to Stabilize your RAC Cluster Environment 

Design Considerations

The following Design Considerations provide platform-independent guidance and best practice information around the infrastructure needed to support an Oracle RAC implementation.  This information not only pertains to new installations and upgrades but is also useful for those supporting existing RAC implementations.

General Design Considerations

  • To simplify the stack and simplify vendor interactions, Oracle recommends avoiding 3rd party clusterware, unless absolutely necessary.
  • Automatic Storage Management (ASM) is recommended for database storage. Additional information regarding ASM can be found in Document 1187723.1.
  • Check the certification matrix to ensure that your combination of product, version and platform is supported, and to understand any additional steps that are required for certain combinations. See Document 337737.1.
  • Check with the disk vendor that the number of nodes, OS version, RAC version, CRS version, network fabric, and patches are certified, as some storage/SAN vendors may require special certification for a certain number of nodes.
  • Plan and document capacity requirements. Work with your server vendor to produce a detailed capacity plan and system configuration, and use your normal capacity planning process to estimate the number of CPUs required to run the workload. Both SMP systems and RAC clusters have synchronization costs as the number of CPUs increases; SMPs normally scale well for a small number of CPUs, while RAC clusters normally scale better than SMPs for a large number of CPUs.
  • Eliminate any single points of failure in the architecture. Examples include (but are not limited to): cluster interconnect redundancy (NIC bonding, etc.), multiple access paths to storage using two or more HBAs or initiators and multipathing software, and disk mirroring/RAID.  Additional details are found in the subsequent sections.
  • Having a system test plan to help plan for and practice unplanned outages is crucial. This note has an attached sample System Test Plan Outline, to guide your system testing to help prepare for potential unplanned failures.
  • It is strongly advised that a production RAC instance does not share a node with a DEV, TEST, QA or TRAINING instance. These extra instances can often introduce unexpected performance changes into a production environment.  Along the same lines, it is highly recommended that testing environments mirror production environments as closely as possible.

Networking Considerations

  • For the private network, 10 Gigabit Ethernet is highly recommended; the minimum requirement is 1 Gigabit Ethernet.
  • Underscores must not be used in a host or domain name, according to RFC 952 (DoD Internet host table specification). The same applies to net, host, gateway, and domain names.
  • The default gateway must be on the same subnet as the VIPs (including SCAN VIPs) to prevent VIP start/stop/failover issues.  With 11gR2 this is detected and reported by the OUI; if the check is ignored, the VIPs will fail to start and the installation itself will fail.
  • It is recommended that the SCAN name (11gR2 and above) resolve via DNS to a minimum of 3 IP addresses round-robin regardless of the size of the cluster.  For additional information see the Understanding SCAN VIP white paper.
  • To avoid name resolution issues, ensure that the HOSTS files and DNS are furnished with both the VIP and public host names. The SCAN name must NOT be in the HOSTS file, because the HOSTS file can only represent a 1:1 host-to-IP mapping.
  • The network interfaces must have the same name on all nodes (e.g. eth1 -> eth1 in support of the VIP and eth2 -> eth2 in support of the private interconnect).
  • Network Interface Card (NIC) names must not contain a dot (".").
  • The use of Jumbo Frames on the private interconnect is a recommended best practice for enhanced performance of cache fusion operations.  Reference: Document 341788.1
  • Use non-routable network addresses for private interconnect; Class A: 10.0.0.0 to 10.255.255.255, Class B: 172.16.0.0 to 172.31.255.255, Class C: 192.168.0.0 to 192.168.255.255.  Refer to RFC1918  and Document 338924.1 for additional information.
  • Make sure network interfaces are configured correctly in terms of speed, duplex, etc. Various tools exist to monitor and test the network: ethtool, iperf, netperf, spray and tcp. See Document 563566.1.
  • To prevent the public network or the private interconnect network from becoming a single point of failure, Oracle highly recommends configuring a redundant set of public network interface cards (NICs) and private interconnect NICs on each cluster node. See Document 787420.1.  Starting with 11.2.0.2, Oracle Grid Infrastructure can provide redundancy and load balancing for the private interconnect (NOT the public network); this is the preferred method of NIC redundancy for full 11.2.0.2 stacks (an 11.2.0.2 Database must be used).  More information can be found in Document 1210883.1.  A query such as the one sketched after this list can be used to confirm which interconnect interfaces the database instances are actually using.

    NOTE: If using the 11.2.0.2 Redundant Interconnect/HAIP feature - at present it is REQUIRED that all interconnect interfaces be placed on separate subnets.  If the interfaces are all on the same subnet and the cable is pulled from the first NIC in the routing table, a rebootless restart or node reboot will occur.  See Document 1481481.1 for a technical description of this requirement.
     
  • For more predictable hardware discovery, place HBA and NIC cards in the same corresponding slot on each server in the grid.
  • The use of a switch (or redundant switches) is required for the private network (crossover cables are NOT supported).
  • Dedicated redundant switches are highly recommended for the private interconnect, because deploying the private interconnect on a shared switch (even when using a VLAN) may expose the interconnect links to congestion and instability in the larger IP network topology. If deploying the interconnect on a VLAN, there should be a 1:1 mapping of the VLAN to a non-routable subnet, and the interconnect should not span multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include Spanning Tree loops when the larger IP network topology changes, asymmetric routing that may cause packet flooding, and lack of fine-grained monitoring of the VLAN/port.  Reference Bug 9761210.
  • If deploying the cluster interconnect on a VLAN, review the considerations in the Oracle RAC and Clusterware Interconnect Virtual Local Area Networks (VLANs) white paper.
  • Consider using Infiniband on the interconnect for workloads that have high volume requirements.   Infiniband can also improve performance by lowering latency.  When Infiniband is in place the RDS protocol can be used to further reduce latency.  See Document 751343.1 for additional details.
  • Starting with 12.1.0.1, IPv6 is supported for the public network; IPv4 must be used for the private network.  Please see the Oracle Database IPv6 State of Direction white paper for details.
  • For Grid Infrastructure 11.2.0.2, multicast traffic must be allowed on the private network for the 230.0.1.0 subnet.  Patch 9974223 (included in GI PSU 11.2.0.2.1 and above) for Oracle Grid Infrastructure 11.2.0.2 enables multicasting on the 224.0.0.251 multicast address on the private network.  Multicast must be allowed on the private network for one of these two addresses (assuming the patch has been applied).  Additional information, as well as a program to test multicast functionality, is provided in Document 1212703.1.
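
As a quick sanity check of the interconnect configuration described above, the database can report which interfaces its instances are actually using for cache fusion traffic. The following SQL*Plus sketch (run as SYSDBA against any open RAC database instance; the views are standard, but the output varies by version and by whether HAIP or the CLUSTER_INTERCONNECTS parameter is in use) is one way to confirm this:

    -- Interconnect interfaces each instance is actually using; SOURCE shows
    -- whether the value came from the OCR/oifcfg, the CLUSTER_INTERCONNECTS
    -- parameter, or the operating system
    SELECT inst_id, name, ip_address, is_public, source
      FROM gv$cluster_interconnects
     ORDER BY inst_id, name;

    -- All interconnects known to each instance, including currently unused ones
    SELECT inst_id, name, ip_address, is_public
      FROM gv$configured_interconnects
     ORDER BY inst_id, name;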

Storage Considerations (Including ASM)

  • Implement multiple access paths to the storage array using two or more HBAs or initiators, with multipathing software over these HBAs. Where possible, use the pseudo devices (multi-path I/O) as the diskstring for ASM. Examples are: EMC PowerPath, Veritas DMP, Sun Traffic Manager, Hitachi HDLM, IBM SDDPC, Linux 2.6 Device Mapper. This is useful for I/O load balancing and failover. Reference: Document 294869.1 and the Multipathing Best Practices white paper
  • Ensure correct mount options for NFS disks when RAC is used with NFS. The documented mount options are detailed in Document 359515.1 for each platform.
  • ASM is the current and future direction for Oracle Database storage.  Accordingly, it is a highly recommended best practice that ASM be used (as opposed to a clustered file system) within a RAC environment.  ASM is required for data file storage when using Oracle RAC Standard Edition.
  • Ensure that ASM Disk Discovery times are optimal by customizing the ASM Diskstring, See Document 1608549.1 Improving ASM Disk Discovery (ASM_DISKSTRING) Time Best Practices.
  • It is recommended to maintain no more than 2 ASM disk groups, one for database area and one for flash recovery area, on separate physical disks. RAID storage array LUNs can be used as ASM disks to minimize the number of LUNs presented to the OS.
  • A minimum of 4 LUNs that are identical in size and performance per ASM diskgroup (each LUN in a separate RAID group) should be used to ensure optimal performance.
  • Create external redundancy disk groups when using high-end storage arrays. High-end storage arrays generally provide hardware RAID protection. Use Oracle ASM mirroring redundancy when not using hardware RAID, or when you need host-based volume management functionality, such as mirroring across storage systems. You can use Oracle ASM mirroring in configurations when mirroring between geographically-separated sites (extended clusters).
  • For 11g, Automatic Memory Management (AMM) is enabled by default on an ASM instance, even when the MEMORY_TARGET parameter is not explicitly set. The default value used for MEMORY_TARGET is generally not sufficient and should be increased to 1536MB.  If you are experiencing ORA-4031 errors within your ASM Instance(s), adjustments to the AMM settings may be necessary.  Review Document 1370925.1 (ASM Instances Are Reporting ORA-04031 Errors) for guidance on proper settings to avoid shared pool exhaustion.
  • For 10g, increase ASM instance SGA parameter size allocations from their default values:
    SHARED_POOL_SIZE=128M - In addition, the SHARED_POOL_SIZE should be increased by 500KB for every disk group after the first five disk groups.
    LARGE_POOL_SIZE=12M
    DB_CACHE_SIZE=64M
    PROCESSES=150
  • In 11g environments the ASM PROCESSES parameter may need to be modified when multiple databases are sharing the same ASM instance. While the rule of thumb is 150 processes, the formula that can be used to determine the appropriate value is:
    ASM processes = 50 x (<number of DB instances> + 1, for the first 10 DB instances) + 10 x (<number of additional DB instances beyond 10>)
  • Choose a hardware RAID stripe size that is a power of 2 and less than or equal to the size of the Oracle ASM allocation unit.
  • ORA-15196 (ASM block corruption) can occur if LUNs larger than 2TB are presented to an ASM diskgroup. As a result of the fix, ORA-15099 will be raised if a disk larger than 2TB is specified. This is irrespective of the presence of ASMLib. Workaround: do not add disks larger than 2TB to a diskgroup. Reference: Document 6453944.8
  • On some platforms, repeated warnings about AIO limits may be seen in the alert log:
    "WARNING:Oracle process running out of OS kernel I/O resources." Apply Patch 6687381, available on many platforms. This issue affects 10.2.0.3, 10.2.0.4, and 11.1.0.6. It is fixed in 11.1.0.7. Document 6687381.8
  • The occurrence of Bug 5100163 (possible metadata corruption) has been identified during ASM upgrades from release 10.2 to release 11.1 or 11.2; this bug can only occur for ASM diskgroups with an AU size > 1 MB (created before the ASM upgrade is performed). This bug is not encountered with new diskgroups created directly on release 11.1 or 11.2.
    • In order to prevent any occurrence of Bug 5100163, a public alert has been generated and is visible through My Oracle Support. Reference: Document 1145365.1 Alert: Querying v$asm_file Gives ORA-15196 After ASM Was Upgraded From 10gR2 To 11gR2.  In short, you should run an "alter diskgroup check all repair" to validate and repair any upgraded diskgroups, as sketched below.
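
To illustrate the diskgroup validation step referenced above, here is a minimal SQL*Plus sketch run against the ASM instance as SYSASM. The diskgroup name DATA is only an example, and CHECK ALL can be I/O intensive, so run it in a maintenance window:

    -- Review existing diskgroups, their redundancy type and AU size
    SELECT name, type, allocation_unit_size/1048576 AS au_mb, total_mb, free_mb
      FROM v$asm_diskgroup;

    -- Validate (and repair where possible) diskgroup metadata, for example
    -- after an ASM upgrade as described in Document 1145365.1
    ALTER DISKGROUP data CHECK ALL REPAIR;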

Clusterware and Grid Infrastructure Configuration Considerations

  • For versions prior to 11gR2 it is recommended that Voting Disks be stored on RAW or Block Devices (depending on the OS and Oracle version) and that Oracle-supplied redundancy be used regardless of the underlying storage configuration.   Two OCRs are recommended.  Voting Disks should be maintained in odd numbers, making the minimum number 3.  Odd numbers of Voting Disks are recommended because losing 1/2 or more of your voting disks will cause nodes to be evicted from the cluster, or to evict themselves from the cluster.
  • With Grid Infrastructure 11gR2, RAW (and block) devices have been deprecated, making ASM the recommended method of storing the OCR and Voting Disks.  When storing the OCR and Voting Disk within ASM in 11gR2 and higher, it is recommended to maintain a separate diskgroup for the OCR and Voting Disk (not the same diskgroup which stores database related files).  If you are utilizing external redundancy (see the ASM considerations above for details on diskgroup redundancy) for your disk groups, this means you will have 1 Voting Disk and 1 OCR. It is recommended to use Normal or High redundancy if external redundancy is not in use.  The sketch after this list shows one way to confirm, from the ASM instance, which diskgroup currently holds the voting files.
  • With 11.2.0.2 Oracle Grid Infrastructure can provide redundancy and load balancing for the private interconnect (NOT the public network), this is the preferred method of NIC redundancy for full 11.2.0.2 stacks (11.2.0.2 Database must be used). More information can be found in Document 1210883.1.
  • For versions 10gR2 and 11gR1, it is a best practice on all platforms to set the CSS diagwait parameter to 13 in order to provide time for dumping diagnostics in case of node evictions. Setting the diagwait above 13 is NOT recommended without explicit instruction from Support. This setting is no longer required in Oracle Clusterware 11g Release 2. Reference Document 559365.1 for more details on diagwait.
  • DO NOT set the ORA_CRS_HOME environment variable (on all platforms). Setting this variable will cause problems for various Oracle components, and it is never necessary for CRS programs because they all have wrapper scripts.
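
For 11gR2 and later, where the OCR and voting files live in ASM, the placement normally checked with crsctl and ocrcheck can also be confirmed from the ASM side. A minimal sketch, assuming a SQL*Plus connection to the ASM instance:

    -- Disk groups whose disks currently hold voting files (VOTING_FILES = Y)
    SELECT name, type, voting_files
      FROM v$asm_diskgroup;

    -- Individual ASM disks flagged as containing a voting file
    SELECT g.name AS diskgroup, d.path, d.voting_file
      FROM v$asm_disk d
      JOIN v$asm_diskgroup g ON g.group_number = d.group_number
     WHERE d.voting_file = 'Y';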

Virtualization Considerations

  • Oracle Clusterware, Grid Infrastructure and RAC are supported on specific Virtualization technologies (e.g. Oracle VM) with specific platform, version and patch requirements.  When deploying RAC in a Virtualized environment it is essential that the support requirements documented in the RAC Virtualization Matrix are met to ensure a successful and supported deployment.

Installation Considerations

  • Execute root.sh/rootupgrade.sh as the "real" root user, not sudo. When switching to the root user to execute rootupgrade.sh, "su -" or "su - root" provides the full root environment, while sudo, pbrun, "su root" or "su" or similar facilities don't. It is recommended to execute root.sh/rootupgrade.sh with the full root environment.  See Document 1210883.1, Document 1235944.1, Document 1259874.1 and Document 1315203.1 for additional details.
  • It is recommended that local file systems on local disks be used for installation of the Oracle RAC software, to allow for rolling patches and to avoid a single point of failure, among other reasons.  See the Oracle Homes in an Oracle Real Application Clusters Environment white paper for additional information.  Do note that 11gR2 Grid Infrastructure is not supported on clustered file systems; see Section 2.5.4 of Oracle Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Linux.
  • Check Cluster Prerequisites Using cluvfy (Cluster Verification Utility). Use cluvfy at all stages prior to and during installation of Oracle software. When installing a pre-11gR2 release it is crucial to download the latest version of cluvfy from OTN. Document 339939.1 and Document 316817.1 contain more relevant information on this topic.
  • When performing pre-11gR2 installations, it is recommended to patch the Clusterware Home to the desired level before performing any RDBMS or ASM home install.  For example, install Clusterware 10.2.0.1 and patch it to 10.2.0.4 before installing the 10.2.0.1 RDBMS.
  • In pre-11gR2 environments, install ASM in a separate ORACLE_HOME from the database for maintenance and availability reasons (eg., to independently patch and upgrade).  For ease of upgrades to 11gR2, the ASM software owner should be kept the same as the Clusterware software owner.
  • Starting with 11gR2, all Patchsets are fully installable releases.  For example, to install 11.2.0.2 (11gR2 Patchset 1) you will install directly from the 11.2.0.2 opposed to installing 11.2.0.1 and patching to 11.2.0.2.  With 11gR2 Grid Infrastructure, all Patchsets are out-of-place upgrades.  With 11gR2 RDBMS you can perform either an out-of-place or in-place upgrade, with out-of-place being the recommended method.  More information can be found in Document 1189783.1.
  • If you are installing Oracle Clusterware as a user that is a member of multiple operating system groups, the installer installs files on all nodes of the cluster with group ownership set to that of the user's current active or primary group.  Therefore, ensure that the first group listed in the file /etc/group is the current active group, OR invoke the Oracle Clusterware installation with the following additional command line option to force the installer to use the proper group when setting group ownership on all files:  runInstaller s_usergroup=current_active_group (Bug 4433140)

Patching Considerations

This section is targeted towards developing a proactive patching strategy for new and existing implementations. For new implementations, it is strongly recommended that the latest available Patchset and applicable Patch Set Update (PSU) for your platform be applied at the outset of your testing. In cases where the latest version of the RDBMS cannot be used because of lags in internal or 3rd party application certification or due to other limitations, it is still supported to have the CRS Home and ASM (or Grid Infrastructure) Homes running at a later patch level than the RDBMS Home. As a best practice (with some exceptions, see the Note in the references section below), Oracle Support recommends that the following be true:
  • The Clusterware (or Grid Infrastructure) home MUST be at a patch level or version that is greater than or equal to the patch level or version of the RDBMS Home (to the 4th dot in a given release). For pre-11.2, the Clusterware must be at a patch level or version that is greater than or equal to the patch level or version of the ASM and RDBMS homes (to the 4th dot in a given release).
  • Before patching the database, ASM or Clusterware homes using OPatch, check the available space on the filesystem and use Document 550522.1 to estimate how much space will be needed and how to handle the situation if the filesystem should fill up during the patching process.
  • Document 557934.1 provides a basic overview of patching Oracle Clusterware in a pre-11gR2 environment and clarifies how the Oracle Clusterware components are updated through patching.
  • If patching Grid Infrastructure from 11.2.0.1 to 11.2.0.2, it is essential that Document 1312225.1 - "Things to Consider Before Upgrading to Grid Infrastructure 11.2.0.2" be reviewed.  This document states all of the prerequisites and procedures that MUST be followed to ensure a successful upgrade to 11.2.0.2.
  • Develop a proactive patching strategy to stay ahead of the latest known issues. Keep current with the latest Patch Set Updates (as documented in Document 850471.1) and be aware of the most current recommended patches (as documented in Document 756671.1). Plan for periodic (for example, quarterly) maintenance windows to keep current with the latest recommended PSUs and patches. The sketch after this list shows one way to review the PSU/CPU actions that have been recorded in the database itself.
  • With 11.2.0.3, GI was enhanced to utilize broadcast or multicast (on the 230.0.1.0 or 224.0.0.251 addresses) to bootstrap. However, the 11.2.0.3.5 GI PSU introduces a new issue which effectively disables the broadcast functionality (Bug 16547309).  Do note that most networks support multicast on the 224.0.0.251 multicast address without any special configuration, therefore the odds of this being an issue for 11.2.0.3.5 - 11.2.0.3.7 and 12.1.0.1 are greatly reduced.  See Document 1212703.1 and Document 1564555.1 for additional information, available patches and corrective action.
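
As a small illustration of tracking what has already been applied, the PSU and CPU actions that have been registered in the database (via catbundle) can be reviewed with a query such as the following. Note that this reflects only database-side registration; the OPatch inventory (opatch lsinventory) remains the authoritative record for each home:

    -- PSU/CPU bundle actions recorded in the database by catbundle
    SELECT action_time, action, version, comments
      FROM dba_registry_history
     ORDER BY action_time;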

Upgrade Considerations

This section is broken into two sub-sections.  The first covers Clusterware, ASM and Grid Infrastructure upgrades, and the second covers RDBMS upgrades.

Clusterware, ASM and Grid Infrastructure Upgrade Considerations

  • For upgrades to 11.2.0.3 and above, utilize the ORAchk Upgrade Readiness Assessment to assist in pre-upgrade requirements planning and post-upgrade validation.  See the Upgrade Readiness Tab of Document 1268927.2 ORAchk for additional details.
  • Execute root.sh/rootupgrade.sh as the "real" root user, not sudo. When switching to the root user to execute rootupgrade.sh, "su -" or "su - root" provides the full root environment, while sudo, pbrun, "su root" or "su" or similar facilities don't. It is recommended to execute root.sh/rootupgrade.sh with the full root environment.  See Document 1210883.1, Document 1235944.1, Document 1259874.1 and Document 1315203.1 for additional details.
  • Oracle Clusterware and Oracle ASM upgrades to Grid Infrastructure are always out-of-place upgrades. With 11g release 2 (11.2), you cannot perform an in-place upgrade of Oracle Clusterware and Oracle ASM to existing homes.
  • If the existing Oracle Clusterware home is a shared home, note that you can use a non-shared home for the Oracle Grid Infrastructure for a Cluster.
  • Prior to beginning an upgrade of or to Grid Infrastructure, it is essential that the following is reviewed (depending on target version):
    Document 1312225.1 - Things to Consider Before Upgrading to Grid Infrastructure 11.2.0.2
    Document 1363369.1 - Things to Consider Before Upgrading to 11.2.0.3/11.2.0.4 Grid Infrastructure/ASM
  A few key points to take away from the above documents are as follows (details are found in the note):
  • Validate the health of the existing Clusterware and ASM (or Grid Infrastructure) configuration
  • Ensure all prerequisite patches are applied to the existing Clusterware/ASM/Grid Infrastructure Homes for example: 
    • When upgrading from GI 11.2.0.1 to 11.2.0.2, Patch 9706490 must be applied to the GI 11.2.0.1 home prior to attempting the upgrade.
  • For upgrades to 11.2.0.2, validate multicast functionality on the private interconnect network
  • Patch 11.2.0.2 upgrades to the latest GI PSU prior to executing rootupgrade.sh (or root.sh), instructions for doing so are found in Document 1312225.1.
  • To upgrade 10gR2 Clusterware to 11g, you must start with a minimum version of 10.2.0.3, as stated in the Oracle Database Upgrade Guide 11gR1 (http://download.oracle.com/docs/cd/B28359_01/server.111/b28300/upgrade.htm#BABCEGBC); the same is also stated in the 11gR2 GI platform-specific documentation.  This document states the following:

    Note: A new prerequisite check has been added to ensure that Oracle Clusterware release 10.2.0.x is at release 10.2.0.3 (or higher), before you attempt to upgrade it to Oracle Clusterware 11g release 1 (11.1). If this check fails, then you are instructed to apply Oracle Clusterware patch set release 10.2.0.3.0 or later to your existing release before it can be upgraded. All other upgrade paths and fresh install cycles are unaffected by this prerequisite check.
  • Use rolling upgrades where appropriate for Oracle Clusterware (CRS) Document 338706.1.  For detailed upgrade assistance, refer to the appropriate Upgrade Companion for your release:  Document 466181.1 10g Upgrade Companion and Document 601807.1 Oracle 11gR1 Upgrade Companion.  With 11gR2, the upgrade of the Clusterware itself will be rolling (old stack MUST be up on all nodes), ASM upgrades will be rolling for ASM 11.1 and above.  Pre-11.1 versions of ASM are NOT rolling.
  • If there are plans to run a pre-11gR2 databases within an 11gR2 Grid Infrastructure environment, review Document 948456.1 : Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment.
  • The 11.2.0.2 HAIP feature will NOT provide NIC redundancy or load balancing for pre-11.2.0.2 databases, if there are plans to run a pre-11.2.0.2 database on 11.2.0.2 Grid Infrastructure you must use a 3rd party NIC redundancy solution as you would have done in pre-11.2.0.2 releases.

RDBMS Upgrade Considerations

  • Upgrading from Oracle Database 10g to 11g: What to expect from the Optimizer:
    http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-sql-plan-management-11gr2-133099.pdf
  •  For those upgrading a database who require minimal downtime, consider using a transient logical standby; refer to Document 949322.1: Oracle11g Data Guard: Database Rolling Upgrade Shell Script.

Database Configuration Considerations for RAC

Database Initialization Parameter Considerations

  • Observe best practices with regard to large SGAs for RAC databases using very large SGA (e.g. 100GB) per instance. Document 1619155.1 : Best Practices and Recommendations for RAC databases using very large SGA (e.g. 100 GB)
  • Set PRE_PAGE_SGA=false. If set to true, it can significantly increase the time required to establish database connections; if clients complain that connections to the database are very slow, check this parameter, since setting it to false avoids mapping the whole SGA at process startup and thus saves connection time. (A sketch showing how several of the parameters in this list might be set is included after the list.)
  • Be sure to monitor the number of active parallel execution servers and calculate the average value to be applied for PARALLEL_MIN_SERVERS. This can be done by:
Select * from v$pq_sysstat;
Then get/save the value for the row "Servers Highwater"
  • Tune PARALLEL_MAX_SERVERS to your hardware. Start with (2 * 2 threads * CPU_COUNT) = 4 x CPU count, and repeat the test with higher values using test data.
  • Consider setting FAST_START_PARALLEL_ROLLBACK. This parameter determines how many processes are used for transaction recovery, which is done after redo application. Optimizing transaction recovery is important to ensure an efficient workload after an unplanned failure. As long as the system is not CPU bound, setting this to a value of HIGH is a best practice. This causes Oracle to use four times the CPU count (4 X cpu_count) parallel processes for transaction recovery. The default for this parameter is LOW, or two times the CPU count (2 X cpu_count).
  • Set FAST_START_MTTR_TARGET to a non-zero value in seconds. Crash recovery will complete within this desired time frame.
  • In 10g and 11g databases, init parameter ACTIVE_INSTANCE_COUNT should no longer be set. This is because the RACG layer doesn't take this parameter into account. As an alternative, you should create a service with one preferred instance.
  • For versions prior to 11gR2, increase PARALLEL_EXECUTION_MESSAGE_SIZE from default (normally 2048) to 8192. This can be set higher for data warehousing based systems where there is a lot of data transferred through PQ.  In version 11gR2, the default for PARALLEL_EXECUTION_MESSAGE_SIZE is 16K, which should prove sufficient in most cases.
  • Set OPTIMIZER_DYNAMIC_SAMPLING = 1 or simply analyze your objects because 10g Dynamic sampling can generate extra CR buffers during execution of SQL statements.
  • For all operating systems, in Oracle releases 10.2.0.3 through 11.1.0.6, the LMON process can become hung and Instance Membership Recovery (IMR) cannot evict hung instances. This is fixed in 11.1.0.7. This can be corrected in 10gR2 via Bug 5259835; the fix is included in the 10.2.0.4.1 and later PSUs, in 10.2.0.5, and as a one-off patch for 10.2.0.3 on some platforms. For 11.1.0.6, the following settings would need to be configured:
    _lm_rcvr_hang_check_frequency = 20
    _lm_rcvr_hang_allow_time = 70
    _lm_rcvr_hang_kill = true
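
To illustrate how several of the parameters discussed in this list might be applied cluster-wide in an spfile-managed RAC database, here is a minimal SQL*Plus sketch; the values are examples only and must be validated against your own workload, version and platform before use:

    -- Run as SYSDBA against one instance; SID='*' applies the value cluster-wide.
    -- PRE_PAGE_SGA and PARALLEL_EXECUTION_MESSAGE_SIZE are static parameters.
    ALTER SYSTEM SET pre_page_sga = FALSE SCOPE=SPFILE SID='*';
    -- HIGH uses 4 x CPU_COUNT processes for transaction recovery
    ALTER SYSTEM SET fast_start_parallel_rollback = 'HIGH' SCOPE=BOTH SID='*';
    -- Target crash recovery time in seconds
    ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE=BOTH SID='*';
    -- Pre-11gR2 only; the 11gR2 default is already 16K
    ALTER SYSTEM SET parallel_execution_message_size = 8192 SCOPE=SPFILE SID='*';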

Performance Tuning Considerations

  • In any database system, RAC or single instance, the most significant performance gains are usually obtained from traditional application tuning techniques. The benefits of those techniques are even more remarkable in a RAC database.
  • Remove unselective indexes. In RAC environments, unselective index blocks may be subject to inter-instance contention, increasing the frequency of cache transfers for indexes belonging to INSERT intensive tables.
  • To avoid the performance impact of "checkpoint not complete" conditions and frequent log switches, it is recommended that a minimum of 3 redo log groups per thread are created and that the size of the redo logs allows for log switches to occur every 15 - 30 minutes.  See Document 147468.1 for details.
  • Use Automatic Segment Space Management (ASSM). ASSM tablespaces automate freelist management and remove the requirement/ability to specify PCTUSED, FREELISTS and FREELIST GROUPS storage parameters for individual tables and indexes created in these tablespaces.  See Document 180608.1 for additional details.
  • Increasing sequence caches for insert-intensive applications improves instance affinity to index keys deriving their values from sequences.  Increase the cache for application sequences and some system sequences for better performance; use a large cache value, perhaps 10,000 or more (see the sketch after this list). Additionally, use of the NOORDER attribute (the default) is most effective, but note that it does not guarantee that sequence numbers are generated in order of request.
    Note: this recommendation is not suitable for OLAP S$ objects. If those are reorganized, the OLAP AW (which contains all multidimensional data) becomes corrupted, as they strictly depend on the order of the values.
  • The default cache setting for the SYS.AUDSES$ sequence is 20, which is too low for a RAC system where logins can occur concurrently from multiple nodes.  Refer to Document 395314.1.   This issue is fixed in 10.2.0.3 and above.
  • Increase the retention period for AWR data from 7 days to at least one business cycle. Use the awrinfo.sql script to estimate the amount of information that will be stored in the AWR and to size it accordingly.
  • The recommendation for sizing the buffer cache in a RAC environment is similar to a single node where the same tuning procedures would apply. Although actual memory requirements will depend on application workload, the general recommendation is approximately a 10% increase in buffer cache to accommodate scale up requirements (example: 1 node – 1000 users, 2 nodes - 2000 users).
  • Tune Data Guard to avoid cluster related waits. Improperly tuned Data Guard settings can cause high LOG FILE SYNC WAIT and GLOBAL CACHE LOG FLUSH TIME. Reference: http://www.oracle.com/technetwork/database/features/availability/maa-wp-10gr2-dataguardnetworkbestpr-134557.pdf and http://www.oracle.com/technetwork/database/features/availability/maa-wp-10gr2-recoverybestpractices-131010.pdf
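
A minimal SQL*Plus sketch of the sequence cache and AWR retention recommendations above. APP_OWNER.ORDER_SEQ is a hypothetical application sequence, and the cache size, retention and interval values are examples to be tuned for your own workload:

    -- Enlarge the cache on a hypothetical insert-intensive application sequence
    ALTER SEQUENCE app_owner.order_seq CACHE 10000 NOORDER;

    -- Increase the SYS.AUDSES$ cache on releases prior to 10.2.0.3 (Document 395314.1)
    ALTER SEQUENCE sys.audses$ CACHE 10000;

    -- Retain AWR data for 35 days, with snapshots every 30 minutes
    BEGIN
      DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
        retention => 35 * 24 * 60,
        interval  => 30);
    END;
    /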

Database Network Considerations

  • Configure Oracle Net Services load balancing properly to distribute connections. Load balancing should be used in combination with Workload Management to provide the highest availability. The CLB_GOAL attribute of workload services should be configured appropriately depending upon application requirements. Different workloads might require different load balancing goals. Use separate services for each workload with different CLB_GOAL.
  • For pre-11.2 listeners, ensure that the IPC entry exists and is listed first in the listener.ora address list for ALL RAC listeners. Failure to do so can adversely impact the amount of time it takes the VIP to fail over if the public network interface should fail. Refer to Document 403743.1 for details.
  • For pre-11.2 implementation, it is recommended that the listener is run from the ASM home (assuming ASM is run from a dedicated ORACLE_HOME). This allows the registration of multiple databases on a node with the listener without being tied to a specific database home.
  • For 11gR2 the listener will be running from the Grid Infrastructure home by default. The best practice is to NOT modify this configuration.
  • For 11gR2 environments, the TNS_ADMIN environment variable should NOT be set for the Grid Infrastructure environment, because it can hinder the ability of the oraagent to maintain the listener configuration files. TNS_ADMIN can also impact the ability of the Clusterware to start, stop and monitor the listeners.
  • It is highly recommended to utilize SCAN for your 11.2 databases (for 11.2 client connectivity) as it simplifies client connectivity and eliminates the need to modify database connect strings when the cluster grows and/or shrinks. SCAN can also be integrated with pre-11.2 clients following Document 1058646.1. More information on SCAN can be found in Document 887522.1.
  • Listeners should implement remote listener cross-registration to support workload redirection and load balancing, allowing service connection requests to be distributed across a RAC cluster.  With this configuration each listener is aware of which instances are handling a particular service and how much load is currently on a given database instance. This enables clients to load balance connection requests among the listeners and allows workloads to be managed at the service level.  See Document 728787.1 - How to Set Up Cross-Registration in RAC.
  • For pre-11gR2, the LOCAL_LISTENER parameter should point to the VIP of the listener local to the node on which a particular database instance is running.  For 11gR2 the LOCAL_LISTENER parameter should be left at its default, which allows the oraagent to properly set the parameter in accordance with the listener configuration stored in the OCR. A sketch of the related registration and service settings follows this list.
  • Generally speaking only 1 listener per RAC node (SCAN excluded) is necessary to handle connections for all databases running on a given cluster.  Should there be some need or requirement for multiple listeners on each cluster node it is important that these additional listeners are registered with the clusterware to ensure HA and proper management of the additional listeners.
  • It is recommended that a database service be created to handle application connections leaving the default database service for administration purposes. This will provide administrators with the flexibility to implement features such as server side TAF, FAN, FCF and Runtime Connection Load Balancing in addition to the ability for administrators to easily control which instances applications are able to connect to. For more information on services see Chapter 5 of Oracle Real Application Clusters Administration and Deployment Guide 11g Release 2 (11.2)
  • Take advantage of connection oriented availability features such as Transparent Application Failover (TAF), Fast Application Notification (FAN), Fast Connection Failover (FCF), Runtime Connection Load Balancing (RCLB).  For more details see the Automatic Workload Management with Oracle Real Application Clusters white paper.
  • Use Class of Secure Transport (COST) to Restrict Instance Registration thus preventing unauthorized registrations with database listeners.  This will ensure you are in compliance with the issues published in Oracle Security Alert CVE-2012-1675.  Instructions for implementing COST Listener Registration can be found in Document 1340831.1.
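
The following sketch illustrates the cross-registration and service recommendations above. The names rac-scan.example.com, node1-vip, RACDB1 and OLTP_SVC are placeholders; on a clustered (RAC) database, srvctl add service is the preferred way to create and manage services, and 11gR2 maintains LOCAL_LISTENER and REMOTE_LISTENER automatically through the oraagent:

    -- Pre-11gR2 style: each instance registers with its local VIP listener...
    ALTER SYSTEM SET local_listener =
      '(ADDRESS=(PROTOCOL=TCP)(HOST=node1-vip)(PORT=1521))' SID='RACDB1';
    -- ...and with the listeners on the other nodes (shown here as an 11.2
    -- SCAN-style address; pre-11.2 a tnsnames.ora alias is typically used)
    ALTER SYSTEM SET remote_listener = 'rac-scan.example.com:1521' SID='*';

    -- Create a dedicated application service with a connection load balancing goal
    BEGIN
      DBMS_SERVICE.CREATE_SERVICE(
        service_name => 'OLTP_SVC',
        network_name => 'OLTP_SVC',
        goal         => DBMS_SERVICE.GOAL_THROUGHPUT,
        clb_goal     => DBMS_SERVICE.CLB_GOAL_SHORT);
    END;
    /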

General Database Considerations

  • In 10gR2 and above the LMS process is intended to run in the real-time scheduling class. In some cases this has been prevented by incorrect ownership or permissions on the oradism executable, which is stored in the $ORACLE_HOME/bin directory (a quick check is sketched after this list). See Document 602419.1 for more details.
  • The number of database instances running on a given cluster node should be limited such that the total number of real-time LMS processes on the node does not exceed the recommended maximum of #CPUs - 1. Too many real-time processes can overload a cluster node, resulting in poor performance and instance/node evictions.  In addition, the number of LMS processes for a given database (the gcs_server_processes parameter) should NOT be modified from the default without guidance from Oracle Support. See Document 558185.1 for details.
  • For versions prior to 11gR2 (where NUMA is disabled by default), ensure the NUMA (Non Uniform Memory Architecture) feature is turned OFF unless explicitly required and tested, as there have been issues reported with NUMA enabled.  Refer to Document 759565.1 for more details.
  • Use SRVCTL from the RDBMS home from which the database runs to register and manage database resources.  Resource registration must be performed by the OS user who owns the RDBMS software.
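As a quick sanity check of the LMS and oradism items above, the following illustrative commands (assuming a Linux node; adjust the ps options for other platforms) show whether LMS is running in the real-time scheduling class and whether oradism has the expected root-owned setuid permissions:

    # Show the scheduling class and real-time priority of the LMS processes;
    # a class of RR indicates real time (see Document 558185.1).
    ps -eo pid,class,rtprio,comm | grep ora_lms

    # oradism should be owned by root with the setuid bit set; if not, LMS
    # cannot be promoted to real time (see Document 602419.1).
    ls -l $ORACLE_HOME/bin/oradism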

Diagnostics and Troubleshooting

General Considerations

  • Utilize TFA to simplify diagnostic data collections.  Trace File Analyzer Collector (TFA) is a diagnostic collection utility that simplifies diagnostic data collection on Oracle Clusterware/Grid Infrastructure, RAC and single-instance database systems.  TFA is similar to the diagcollection utility packaged with Oracle Clusterware in that it collects and packages diagnostic data; however, TFA is MUCH more powerful than diagcollection given its ability to centralize and automate the collection of diagnostic information (an example collection command is sketched after this list).  TFA ships with 11.2.0.4 and 12.1.0.2 Grid Infrastructure and is also available standalone via Document 1513912.1.  TFA provides the following key benefits:
    • Increase Productivity with Reduction in time and effort in diagnostic collection
      • Diagnostic collection is performed with a single command executed from a single node
      • Diagnostics collected include log and trace files for Grid Infrastructure/Oracle Clusterware, ASM, RAC and the OS
      • Output is consolidated to a single node for ease of upload
    • Reduce diagnostic data uploads to Support by a factor of 10x or more in most cases
      • Diagnostic files are isolated and trimmed to only include data for incident time
    • Reduce SR Resolution times
      • Proper diagnostic collection for incidents ensures the data set is complete, minimizing the need for Support to request additional information
  • Review Document 166650.1 Working Effectively with Global Customer Support.  This provides an understanding of the support processes, procedures and resources available to you as an Oracle Support customer.
  • Install and run OSWatcher (OSW) proactively for OS resource utilization diagnosability. OSW is a collection of UNIX shell scripts that collect and archive operating system and network metrics to aid in diagnosing performance issues. It is designed to run continuously and writes the metrics to ASCII files saved to an archive directory; the amount of archived data kept and the collection frequency are set by parameters supplied when starting OSW. It is highly recommended that OSW be installed and run continuously on ALL cluster nodes, at all times (a startup example is included in the sketch after this list). See Document 301137.1. In a RAC environment each node must write its output files to its own archive directory; combining the output files under one archive directory (for example on shared storage) is not supported and causes the OSWg tool to crash. Shared storage itself is fine, provided each node uses a separate archive directory.
  • On the Windows, Solaris and Linux platforms, Cluster Health Monitor (CHM) can be used to track OS resource consumption and collect and analyze data cluster-wide. CHM is installed automatically with Grid Infrastructure 11.2.0.3 and above for all platforms except HPUX.  Independent downloads of CHM are available for Linux (if not running 11.2.0.2 GI).  If you are running 11.2.0.2 GI on Solaris or Linux or 11.2.0.3 for all other platforms you must NOT install the independent download of CHM.  For more information on CHM see Document 1328466.1.
  • Install and utilize the tools in the Support Tools bundle (TFA, ORAchk, OSWatcher, Procwatcher, SQLT, ORATOP, RDA) to decrease the number of reactive issues and improve efficiency when dealing with reactive issues.  The RAC and DB Support Tools Bundle is available within Document 1594347.1.
  • Use the cluster deinstall (clusterdeconfig) tool to remove a CRS installation, if needed. The clusterdeconfig tool removes and deconfigures all of the software and shared files associated with an Oracle Clusterware or Oracle RAC database installation, across all of the nodes in a cluster. Reference: http://www.oracle.com/technology/products/database/clustering/index.html
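The following illustrative commands show a typical TFA collection for an incident window and an OSWatcher startup; the timestamps, interval, retention and compression utility are example values only (see Documents 1513912.1 and 301137.1 for the full syntax):

    # TFA: collect and trim diagnostics from all nodes for an incident window;
    # run on one node as root (or a TFA-authorized user).
    tfactl diagcollect -from "Jul/01/2014 09:00:00" -to "Jul/01/2014 11:00:00"

    # OSWatcher: 30-second snapshots, keep 48 hours of archive, compress with
    # gzip; run on EVERY node, each writing to its own node-local archive.
    ./startOSWbb.sh 30 48 gzip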

Clusterware and Grid Infrastructure Diagnostics

  • When opening an SR with Oracle Support for issues with the Clusterware/Grid Infrastructure, it is recommended that the output of the diagcollection utility be uploaded as outlined in Document 330358.1 (a sample invocation is sketched after the references below).  If the issue is eviction related, OSWatcher or CHM data should also be uploaded (see above for OSWatcher and CHM information).  Providing this data at the time of SR creation can decrease SR turnaround time.  Additional information on diagnostics gathering for Grid Infrastructure and Clusterware issues can be found at:
    Data Gathering for Troubleshooting CRS Issues Document 289690.1
  • The following notes are often of value when troubleshooting Clusterware and Grid Infrastructure related issues:
    RAC: Frequently Asked Questions Document 220970.1
    11gR2 Clusterware and Grid Home - What You Need to Know Document 1053147.1
    Troubleshooting 11.2 Clusterware Node Evictions (Reboots) Document 1050693.1
    Troubleshooting 10g and 11.1 Clusterware Reboots Document 265769.1
    Data Gathering for Troubleshooting CRS Issues Document 289690.1

    Note: 
    Additional information can be found in the Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure Document 1096952.1.
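A typical diagcollection invocation for a Clusterware/Grid Infrastructure SR is sketched below; the Grid Infrastructure home location is a placeholder, and Document 330358.1 should be consulted for the options appropriate to your version:

    # Run as root on each affected node; the collection archives are written
    # to the current working directory by default.
    $GRID_HOME/bin/diagcollection.sh --collect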

RAC Database Diagnostics

  • When opening an SR with Oracle Support for issues with a RAC database, be sure to review Document 289690.1 to ensure the proper diagnostic information is gathered and provided at the time of SR creation (an ADR packaging sketch follows the references below).  Providing this information up front can decrease SR turnaround time.
  • The following notes are often of value when troubleshooting RAC related database issues:
    RAC: Frequently Asked Questions Document 220970.1
    GC Lost Blocks Diagnostics Document 563566.1
    Troubleshoot ORA-29740 errors in a RAC Environment Document 219361.1
    11g How to Unpack a Package in to ADR Document 745960.1
    11g Quick Steps to Package and Send Critical Error Diagnostic Information to Support Document 443529.1
    Data Gathering for Troubleshooting RAC Issues Document 289690.1

    Note:  Additional information can be found in the Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure Document 1096952.1.
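The following is an illustrative ADRCI session for packaging a critical error for upload to Support (per Document 443529.1); the ADR homepath, problem ID and target directory are assumptions for illustration only:

    # Package incident and trace files for problem 1 and generate a zip in /tmp.
    adrci
    adrci> set homepath diag/rdbms/orcl/ORCL1
    adrci> ips create package problem 1 correlate all
    adrci> ips generate package 1 in /tmp
    adrci> exit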

Patching Diagnostics (OPatch)

  • The following notes are often of value when troubleshooting OPatch related issues:
    Top OPATCH/PATCH Questions/Issues in Oracle Clusterware (Grid Infrastructure or CRS) and RAC Environment Document 1339140.1

    Note: Additional information on OPatch can be found in the Master Note for OPatch Document 293369.1

Database - RAC/Scalability Community
To discuss this topic further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Database - RAC/Scalability Community

REFERENCES

NOTE:1344678.1 - Top 11 Things to do NOW to Stabilize your RAC Cluster Environment
NOTE:745960.1 - 11g How to Unpack a Package in to ADR
NOTE:359515.1 - Mount Options for Oracle files when used with NFS on NAS devices
NOTE:395314.1 - RAC Hangs due to small cache size on SYS.AUDSES$
NOTE:787420.1 - Cluster Interconnect in Oracle 10g and 11gR1 RAC
NOTE:949322.1 - Oracle11g Data Guard: Database Rolling Upgrade Shell Script
NOTE:294869.1 - Oracle ASM and Multi-Pathing Technologies
NOTE:6453944.8 - Bug 6453944 - ORA-15196 with ASM disks larger than 2TB
NOTE:338706.1 - Oracle Clusterware (CRS or GI) Rolling Upgrades
BUG:5259835 - INSTANCE EVICTION NOT HAPPENED BECAUSE LMON IS NOT RESPONSIVE
BUG:4433140 - INSTALLING CRS 10.2.0.1 050609: WRONG GROUP ID

NOTE:289690.1 - Data Collection for Troubleshooting Oracle Clusterware (CRS or GI) And Real Application Cluster (RAC) Issues
NOTE:301137.1 - OSWatcher (Includes: [Video])
NOTE:6687381.8 - Bug 6687381 - "WARNING: Oracle process running out of OS kernel I/O resources" messages
NOTE:751343.1 - Oracle Clusterware and RAC Support for RDS Over Infiniband
NOTE:563566.1 - Troubleshooting gc block lost and Poor Network Performance in a RAC Environment
NOTE:811293.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
NOTE:1096952.1 - Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure
NOTE:1054902.1 - How to Validate Network and Name Resolution Setup for the Clusterware and RAC
NOTE:1210883.1 - Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip
NOTE:811303.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)
NOTE:1312225.1 - Things to Consider Before Upgrading to 11.2.0.2 Grid Infrastructure/ASM
NOTE:1050693.1 - Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
NOTE:337737.1 - Oracle Clusterware (CRS/GI) - ASM - Database Version Compatibility
NOTE:602419.1 - LMS not running in RT (real time) mode in 10.2.0.3 RAC database
NOTE:1339140.1 - FAQ: OPatch/Patch Questions/Issues for Oracle Clusterware (Grid Infrastructure or CRS) and RAC Environments
NOTE:1189783.1 - Important Changes to Oracle Database Patch Sets Starting With 11.2.0.2
NOTE:147468.1 - Checkpoint Tuning and Troubleshooting Guide
NOTE:1328466.1 - Cluster Health Monitor (CHM) FAQ
NOTE:316817.1 - Cluster Verification Utility (CLUVFY) FAQ
NOTE:728787.1 - How to Set Up Cross-Registration in RAC
NOTE:887522.1 - Grid Infrastructure Single Client Access Name (SCAN) Explained
NOTE:557934.1 - Oracle Clusterware: Patch installation
NOTE:756671.1 - Oracle Recommended Patches -- Oracle Database
NOTE:759565.1 - Oracle NUMA Usage Recommendation
NOTE:601807.1 - Oracle 11gR1 Upgrade Companion
NOTE:219361.1 - Troubleshooting ORA-29740 in a RAC Environment
NOTE:403743.1 - Pre-11.2: VIP Failover Take Long Time After Network Cable Pulled
NOTE:220970.1 - RAC: Frequently Asked Questions
NOTE:265769.1 - Troubleshooting 10g and 11.1 Clusterware Reboots
NOTE:550522.1 - How To Avoid Disk Full Issues Because OPatch Backups Take Big Amount Of Disk Space.
NOTE:948456.1 - Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment
NOTE:166650.1 - Working Effectively With Oracle Support - Best Practices
BUG:9761210 - NETWORK CHECK FOR IPMP BASED ON TAGGED VLAN FAILS FROM TIME TO TIME
NOTE:850471.1 - Oracle Announces First Patch Set Update For Oracle Database Release 10.2
NOTE:338924.1 - CLUVFY Fails With Error: Could not find a suitable set of interfaces for VIPs or Private Interconnect
NOTE:1145365.1 - Alert: Querying V$ASM_FILE Gives ORA-15196 After ASM Was Upgraded From 10gR2 To 11gR2 with an AU size > 1M
NOTE:785351.1 - Oracle 11gR2 Upgrade Companion
NOTE:811271.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
NOTE:811306.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
NOTE:466181.1 - Oracle 10g Upgrade Companion
NOTE:443529.1 - Database 11g: Quick Steps to Package and Send Critical Error Diagnostic Information to Support [Video]
NOTE:341788.1 - Recommendation for the Real Application Cluster Interconnect and Jumbo Frames
NOTE:558185.1 - LMS and Real Time Priority in Oracle RAC 10g and 11g
NOTE:339939.1 - Running Cluster Verification Utility to Diagnose Install Problems
NOTE:1363369.1 - Things to Consider Before Upgrading to 11.2.0.3/11.2.0.4 Grid Infrastructure/ASM
NOTE:559365.1 - Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions
NOTE:330358.1 - Oracle Clusterware 10gR2/ 11gR1/ 11gR2/ 12cR1 Diagnostic Collection Guide
NOTE:1268927.1 - ORAchk - Health Checks for the Oracle Stack
NOTE:265633.1 - ASM Technical Best Practices For 10g and 11gR1 Release
NOTE:1212703.1 - Grid Infrastructure Startup During Patching, Install or Upgrade May Fail Due to Multicasting Requirement
NOTE:1053147.1 - 11gR2 Clusterware and Grid Home - What You Need to Know
NOTE:1058646.1 - How to integrate a 10g/11gR1 RAC database with 11gR2 clusterware (SCAN)
NOTE:811280.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)

=======================================================================


