RAC Networking Considerations

For the private network, 10 Gigabit Ethernet is highly recommended; the minimum requirement is 1 Gigabit Ethernet.

Underscores must not be used in a host or domain name, according to RFC 952 (DoD Internet Host Table Specification). The same applies to Net, Host, Gateway, and Domain names.
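As a quick pre-installation check, host names can be validated against the RFC 952 rules (as relaxed by RFC 1123 to allow a leading digit): each dot-separated label uses only letters, digits and hyphens, does not start or end with a hyphen, and is at most 63 characters long. The following Python function is a minimal sketch of that check, not an Oracle-supplied tool; the example names are hypothetical.

```python
import re

# One dot-separated label: letters, digits and hyphens only, no leading or
# trailing hyphen, 1 to 63 characters. Underscores are rejected.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Return True if every label of `name` is a legal host name label."""
    if name.endswith("."):  # tolerate a fully qualified trailing dot
        name = name[:-1]
    if not name:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

For example, is_valid_hostname("racnode1-vip.example.com") returns True, while is_valid_hostname("rac_node1") returns False because of the underscore.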

The VIPs and SCAN VIPs must be on the same subnet as the public interface. For additional information see the Understanding SCAN VIP white paper.

The default gateway must be on the same subnet as the VIPs (including the SCAN VIPs) to prevent VIP start, stop, and failover issues. With 11gR2 the OUI detects and reports this condition; if the check is ignored, the VIPs will fail to start and the installation itself will fail.
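The subnet rules above (VIPs, SCAN VIPs and the default gateway all on the public interface's subnet) can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the addresses are hypothetical examples taken from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges, and in practice you would substitute the values from your own network plan.

```python
import ipaddress


def same_subnet_as_public(public_cidr: str, *addresses: str) -> dict:
    """Map each address to True if it lies in the public interface's subnet.

    public_cidr is the public interface address with its prefix length,
    for example "192.0.2.11/24".
    """
    network = ipaddress.ip_interface(public_cidr).network
    return {addr: ipaddress.ip_address(addr) in network for addr in addresses}


# Hypothetical plan: a node VIP, one SCAN VIP and a misplaced default gateway.
report = same_subnet_as_public("192.0.2.11/24", "192.0.2.21", "192.0.2.31", "198.51.100.1")
for addr, ok in report.items():
    print(f"{addr}: {'ok' if ok else 'NOT on the public subnet'}")
```

Any address reported as not on the public subnet would cause the VIP problems described above.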

It is recommended that the SCAN name (11gR2 and above) resolve via DNS, round-robin, to a minimum of three IP addresses, regardless of the size of the cluster. For additional information see the Understanding SCAN VIP white paper.
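A minimal sketch for confirming that the SCAN name resolves to at least three distinct addresses, using only Python's standard `socket` module. It counts the IPv4 addresses returned by the system resolver; it does not verify that DNS actually rotates them round-robin.

```python
import socket


def resolve_ipv4(name: str) -> set:
    """Return the distinct IPv4 addresses that `name` resolves to."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def scan_has_enough_addresses(addresses: set, minimum: int = 3) -> bool:
    """True if at least `minimum` distinct addresses were returned."""
    return len(addresses) >= minimum
```

Usage: scan_has_enough_addresses(resolve_ipv4("rac-scan.example.com")), where rac-scan.example.com is a hypothetical stand-in for your SCAN name.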

To avoid name resolution issues, ensure that both the HOSTS files and DNS contain the VIP and public host names. The SCAN name must NOT be in the HOSTS file, because a HOSTS file can only represent a 1:1 host-to-IP mapping.
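The HOSTS file rules can be checked by parsing the file's text. The sketch below assumes the usual /etc/hosts layout (an IP address followed by one or more names, with # starting a comment); the host names used with it are hypothetical, and DNS would still need to be checked separately.

```python
def parse_hosts(text: str) -> dict:
    """Map each name or alias in hosts-file text to the IP addresses listed for it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), []).append(ip)
    return mapping


def check_hosts(text: str, required_names: list, scan_name: str):
    """Return (names missing from the file, whether the SCAN name is present)."""
    entries = parse_hosts(text)
    missing = [n for n in required_names if n.lower() not in entries]
    return missing, scan_name.lower() in entries
```

For example, check_hosts(open("/etc/hosts").read(), ["racnode1", "racnode1-vip"], "rac-scan") should return an empty missing list and False for the SCAN name.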

The network interfaces must have the same name on all nodes (e.g. eth1 -> eth1 in support of the VIP and eth2 -> eth2 in support of the private interconnect).
Network interface card (NIC) names must not contain a period (".").
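Interface naming can be checked once each node's interface assignments have been collected (for example from ip link or oifcfg getif output, gathered separately). The sketch assumes a hypothetical dictionary layout of node name to {role: interface name}; the node and interface names are illustrative.

```python
def check_interface_names(nodes: dict) -> list:
    """nodes maps node name -> {role: interface name}; return problems found."""
    problems = []
    roles = sorted({role for ifaces in nodes.values() for role in ifaces})
    for role in roles:
        names = {node: ifaces.get(role) for node, ifaces in nodes.items()}
        # The same role must use the same interface name on every node.
        if len(set(names.values())) > 1:
            problems.append(f"{role}: interface names differ across nodes: {names}")
        # Interface names must not contain a period.
        for node, name in names.items():
            if name and "." in name:
                problems.append(f"{role} on {node}: {name!r} contains '.'")
    return problems
```

An empty list means both rules are satisfied for the roles supplied.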

Using Jumbo Frames on the private interconnect is a recommended best practice for enhanced performance of Cache Fusion operations. Reference: Document 341788.1

Use non-routable network addresses for the private interconnect: Class A: 10.0.0.0 to 10.255.255.255; Class B: 172.16.0.0 to 172.31.255.255; Class C: 192.168.0.0 to 192.168.255.255. Refer to RFC 1918 and Document 338924.1 for additional information.
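A minimal sketch for checking that a planned interconnect subnet lies entirely inside one of the three RFC 1918 blocks listed above. It tests the blocks explicitly rather than relying on the ipaddress module's is_private property, which also covers other reserved ranges such as loopback and link-local addresses.

```python
import ipaddress

# The three RFC 1918 private blocks listed above.
RFC1918_BLOCKS = tuple(
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_rfc1918(cidr: str) -> bool:
    """Return True if the whole subnet `cidr` lies inside one RFC 1918 block."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.version != 4:  # RFC 1918 only defines IPv4 ranges
        return False
    return any(net.subnet_of(block) for block in RFC1918_BLOCKS)
```

For example, 192.168.10.0/24 passes, while 172.32.0.0/24 fails because it falls just outside 172.16.0.0/12.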

Make sure network interfaces are configured correctly in terms of speed, duplex, and so on. Various tools exist to monitor and test the network: ethtool, iperf, netperf, spray and tcp. See Document 563566.1.
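On Linux, the speed, duplex and MTU settings that ethtool reports are also exposed under /sys/class/net/<interface>/, which makes them easy to collect from each node and compare. The sketch below assumes Linux sysfs; the interface name is illustrative and the mismatched() comparison helper is hypothetical. Jumbo Frames usually means an MTU of 9000 on every interconnect interface and switch port (an assumption here; see Document 341788.1 for the supported settings), so a differing mtu value between nodes is worth investigating.

```python
from pathlib import Path
from typing import Dict, Optional


def read_link_settings(iface: str, sysfs: str = "/sys/class/net") -> Dict[str, Optional[str]]:
    """Read speed (Mb/s), duplex and MTU for `iface` from Linux sysfs."""
    base = Path(sysfs) / iface
    settings: Dict[str, Optional[str]] = {}
    for attr in ("speed", "duplex", "mtu"):
        try:
            settings[attr] = (base / attr).read_text().strip()
        except OSError:  # missing file, or speed unreadable while the link is down
            settings[attr] = None
    return settings


def mismatched(settings_by_node: Dict[str, Dict[str, Optional[str]]]) -> list:
    """Return the sorted names of settings that differ between nodes."""
    attrs = {a for s in settings_by_node.values() for a in s}
    return sorted(
        a for a in attrs
        if len({s.get(a) for s in settings_by_node.values()}) > 1
    )
```

Run read_link_settings("eth2") on each node, then mismatched({"racnode1": s1, "racnode2": s2}) should return an empty list.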

To keep the public network and the private interconnect network from being a single point of failure, Oracle highly recommends configuring a redundant set of public network interface cards (NICs) and private interconnect NICs on each cluster node (Document 787420.1). Starting with 11.2.0.2, Oracle Grid Infrastructure can provide redundancy and load balancing for the private interconnect (NOT the public network); this is the preferred method of NIC redundancy for full 11.2.0.2 stacks (an 11.2.0.2 database must be used). More information can be found in Document 1210883.1.

NOTE: If you use the 11.2.0.2 Redundant Interconnect/HAIP feature, it is at present REQUIRED that all interconnect interfaces be placed on separate subnets. If the interfaces are all on the same subnet and the cable is pulled from the first NIC in the routing table, a rebootless restart or node reboot will occur. See Document 1481481.1 for a technical description of this requirement.
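To confirm the separate-subnet requirement before relying on HAIP, the subnets of the interconnect interfaces can be compared pairwise. A minimal sketch using Python's ipaddress module; the interface names and addresses are hypothetical.

```python
import ipaddress
from itertools import combinations


def overlapping_interconnects(interfaces: dict) -> list:
    """interfaces maps NIC name -> "address/prefix"; return NIC pairs whose subnets overlap."""
    nets = {nic: ipaddress.ip_interface(cidr).network for nic, cidr in interfaces.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]
```

An empty result means every interconnect interface is on its own subnet; any returned pair violates the requirement above.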

For more predictable hardware discovery, place HBA and NIC cards in the same corresponding slot on each server in the Grid.

The use of a switch (or redundant switches) is required for the private network (crossover cables are NOT supported).

Dedicated redundant switches are highly recommended for the private interconnect, because deploying the private interconnect on a shared switch (even when using a VLAN) may expose the interconnect links to congestion and instability in the larger IP network topology. If deploying the interconnect on a VLAN, there should be a 1:1 mapping of VLAN to non-routable subnet, and the VLAN should not span multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include Spanning Tree loops when the larger IP network topology changes, asymmetric routing that may cause packet flooding, and a lack of fine-grained monitoring of the VLAN/port. Reference Bug 9761210.

If deploying the cluster interconnect on a VLAN, review the considerations in the Oracle RAC and Clusterware Interconnect Virtual Local Area Networks (VLANs) white paper.

Consider using InfiniBand on the interconnect for workloads that have high volume requirements. InfiniBand can also improve performance by lowering latency. When InfiniBand is in place, the RDS protocol can be used to further reduce latency. See Document 751343.1 for additional details.

In 12.1.0.1, IPv6 is supported for the public network, but IPv4 must be used for the private network. Starting with 12.2.0.1, IPv6 is fully supported for both the public and private interfaces. Please see the Oracle Database IPv6 State of Direction white paper for details.

For Grid Infrastructure 11.2.0.2, multicast traffic must be allowed on the private network for the 230.0.1.0 subnet. Patch 9974223 (included in GI PSU 11.2.0.2.1 and above) for Oracle Grid Infrastructure 11.2.0.2 enables multicasting on the 224.0.0.251 multicast address on the private network. Multicast must be allowed on the private network for one of these two addresses (assuming the patch has been applied). Additional information, as well as a program to test multicast functionality, is provided in Document 1212703.1.

Top 5 Issues That Cause Node Reboots or Evictions or Unexpected Recycle of CRS

(Doc ID 1367153.1)

Issue #1: The Node rebooted, but the log files do not show any error or cause.

Issue #2: The node rebooted because it was evicted due to missing network heartbeats.

Issue #3: The node rebooted after a problem with storage.

The ocssd.log file shows that the node rebooted because it cannot access a majority of voting disks.

Solution: Fix the problem with the voting disk. Make sure that the voting disks are available and accessible by the oracle or grid user, or by whichever user owns the CRS or GI home. If the voting disk is not in ASM, use “dd if= of=/dev/null bs=1024 count=10240” to test its accessibility.
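The dd command above reads 10240 blocks of 1024 bytes (10 MiB) from the voting disk and discards them. A rough Python equivalent is sketched below for cases where scripting the check is more convenient; run it as the CRS/GI owner so that it exercises the same permissions. The voting disk path is left for you to supply, as in the original command.

```python
def read_test(path: str, block_size: int = 1024, count: int = 10240) -> int:
    """Read up to count * block_size bytes from `path`, discarding the data.

    Mirrors dd bs=1024 count=10240 of=/dev/null: returns the number of bytes
    read, and raises OSError (e.g. PermissionError) if the disk cannot be
    opened or read by the current user.
    """
    total = 0
    with open(path, "rb", buffering=0) as dev:  # unbuffered raw reads
        for _ in range(count):
            chunk = dev.read(block_size)
            if not chunk:  # end of file or device
                break
            total += len(chunk)
    return total
```

A return value of 10485760 for a voting disk of at least 10 MiB means the full read succeeded.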

Issue #4: The node is rebooted after an ASM or database instance hang or eviction.

The ocssd.log of surviving node shows a member kill request escalated to node kill request.

Cause: Starting with 11.1, if a database or ASM instance cannot be evicted at the database level, CRS gets involved and tries to kill the problem instance. This is a member kill request. If CRS cannot kill the problem instance, CRS reboots the node, because the member kill request is escalated to a node kill request.

Solution: Find out why the ASM or database instance could not be evicted at the database level (LMON-, LMD-, and LMS-initiated eviction). One common cause is that the instance was hanging and not responding to the remote instance's request to die. Another cause is that one or more instance processes cannot be killed, for example because a process is in uninterruptible I/O sleep.
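On Linux, a process in uninterruptible sleep shows state D in /proc/<pid>/stat (the same state ps reports in its STAT column). Listing such processes on the affected node is one way to spot instance processes that cannot be killed. This is a minimal diagnostic sketch, not an Oracle tool, and it does not explain why a process is blocked; the process name in the test data is hypothetical.

```python
from pathlib import Path


def processes_in_state(state: str = "D", proc: str = "/proc") -> list:
    """Return sorted (pid, command) pairs whose /proc/<pid>/stat state matches."""
    found = []
    for stat in Path(proc).glob("[0-9]*/stat"):
        try:
            data = stat.read_text()
        except OSError:  # the process exited while we were scanning
            continue
        # Layout is "pid (comm) state ..."; comm may itself contain spaces or
        # ")", so split on the LAST closing parenthesis.
        head, _, rest = data.rpartition(")")
        pid_str, _, comm = head.partition(" (")
        fields = rest.split()
        if fields and fields[0] == state:
            found.append((int(pid_str), comm))
    return sorted(found)
```

Calling processes_in_state() on the node and looking for ora_* entries is a reasonable first step before digging into the hang itself.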

Issue #5: The CRS recycled automatically, but node did not reboot

Cause: Starting with 11.2.0.2, if CRS needs to reboot the node for any of the reasons listed here, CRS first tries to recycle itself before rebooting the node. Only when it cannot recycle itself successfully does CRS reboot the node to force the recycle.
Solution: Check which of the node reboot reasons listed here applies, and follow the corresponding solution.

RAC Log File Locations

If you are using Oracle RAC (no matter how many nodes you have), you need to know where the log files are located.

The Cluster Ready Services Daemon (crsd) Log Files

Log files for the CRSD process (crsd) can be found in the following directory:

                 CRS home/log/hostname/crsd

Oracle Cluster Registry (OCR) Log Files

The Oracle Cluster Registry (OCR) records log information in the following location:

                CRS home/log/hostname/client

Cluster Synchronization Services (CSS) Log Files

You can find the CSS information that OCSSD generates in log files in the following location:

                CRS home/log/hostname/cssd

Event Manager (EVM) Log Files

Event Manager (EVM) information generated by evmd is recorded in log files in the following location:

                CRS home/log/hostname/evmd

RACG Log Files

The Oracle RAC high availability trace files are located in the following two locations:

CRS home/log/hostname/racg

$ORACLE_HOME/log/hostname/racg

Core files are in the sub-directories of the log directories. Each RACG executable has a sub-directory assigned exclusively for that executable. The name of the RACG executable sub-directory is the same as the name of the executable.

The following list summarizes the log file locations:

Oracle Clusterware log files

Cluster Ready Services Daemon (crsd) Log Files:
$CRS_HOME/log/hostname/crsd

Cluster Synchronization Services (CSS):
$CRS_HOME/log/hostname/cssd

Event Manager (EVM) information generated by evmd:
$CRS_HOME/log/hostname/evmd

Oracle RAC RACG:
$CRS_HOME/log/hostname/racg
$ORACLE_HOME/log/hostname/racg

Oracle RAC 11g Release 2 log files

Clusterware alert log:
$GRID_HOME/log/<host>/alert<host>.log

Disk Monitor daemon:
$GRID_HOME/log/<host>/diskmon

OCRDUMP, OCRCHECK, OCRCONFIG, CRSCTL:
$GRID_HOME/log/<host>/client

Cluster Time Synchronization Service:
$GRID_HOME/log/<host>/ctssd

Grid Interprocess Communication daemon:
$GRID_HOME/log/<host>/gipcd

Oracle High Availability Services daemon:
$GRID_HOME/log/<host>/ohasd

Cluster Ready Services daemon:
$GRID_HOME/log/<host>/crsd

Grid Plug and Play daemon:
$GRID_HOME/log/<host>/gpnpd

Multicast Domain Name Service daemon:
$GRID_HOME/log/<host>/mdnsd

Event Manager daemon:
$GRID_HOME/log/<host>/evmd

RAC RACG (only used if pre-11.1 database is installed):
$GRID_HOME/log/<host>/racg

Cluster Synchronization Service daemon:
$GRID_HOME/log/<host>/cssd

Server Manager:
$GRID_HOME/log/<host>/srvm

HA Service Daemon Agent:
$GRID_HOME/log/<host>/agent/ohasd/oraagent_oracle11

HA Service Daemon CSS Agent:
$GRID_HOME/log/<host>/agent/ohasd/oracssdagent_root

HA Service Daemon ocssd Monitor Agent:
$GRID_HOME/log/<host>/agent/ohasd/oracssdmonitor_root

HA Service Daemon Oracle Root Agent:
$GRID_HOME/log/<host>/agent/ohasd/orarootagent_root

CRS Daemon Oracle Agent:
$GRID_HOME/log/<host>/agent/crsd/oraagent_oracle11

CRS Daemon Oracle Root Agent:
$GRID_HOME/log/<host>/agent/crsd/orarootagent_root

Grid Naming Service daemon:
$GRID_HOME/log/<host>/gnsd
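Because every 11gR2 daemon location above follows the pattern $GRID_HOME/log/<host>/<component>, the paths can be generated rather than typed. A minimal sketch; the Grid home path in the test is hypothetical, local_grid_log_paths() assumes a GRID_HOME environment variable is set and that <host> is the node's short host name, and the agent subdirectories are not covered.

```python
import os
import socket
from pathlib import Path

# Component subdirectories under $GRID_HOME/log/<host>, from the 11gR2 list above.
COMPONENTS = (
    "crsd", "cssd", "evmd", "ohasd", "ctssd", "gipcd", "gpnpd",
    "mdnsd", "diskmon", "client", "srvm", "gnsd", "racg",
)


def grid_log_paths(grid_home: str, host: str) -> dict:
    """Return {component: path} plus the clusterware alert log under key "alert"."""
    base = Path(grid_home) / "log" / host
    paths = {name: base / name for name in COMPONENTS}
    paths["alert"] = base / f"alert{host}.log"
    return paths


def local_grid_log_paths() -> dict:
    """grid_log_paths for this node, reading GRID_HOME from the environment."""
    host = socket.gethostname().split(".")[0]  # assumption: short host name
    return grid_log_paths(os.environ["GRID_HOME"], host)
```

Filtering the result with Path.exists() shows which of these directories are actually present on a given node.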

Directories for RAC

Filesystems
/oracle – binaries for the database software, 50GB to 100GB (enough for the current binaries as well as an upgrade or download if necessary)
/oracle_crs – binaries for the Grid Infrastructure, 50GB to 100GB (enough for the current binaries as well as an upgrade or download if necessary)
/ora01 – at least 100GB for each server in the cluster, mounted on all servers
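A quick way to confirm the sizes above on each server is Python's shutil.disk_usage, which reports total, used and free bytes for the file system containing a path. The sketch compares total capacity with a minimum, treats 1 GB as 1024**3 bytes (an assumption), and uses the lower end of each range above as the default minimum.

```python
import shutil

GB = 1024 ** 3  # assumption: sizes above are binary gigabytes

# Minimum sizes taken from the lower end of each range listed above.
REQUIRED_GB = {"/oracle": 50, "/oracle_crs": 50, "/ora01": 100}


def check_capacity(required_gb: dict = REQUIRED_GB) -> dict:
    """Return {mount: (total size in GB, meets minimum)} for each mount point.

    Raises FileNotFoundError if a mount point does not exist on this server.
    """
    report = {}
    for mount, minimum in required_gb.items():
        total_gb = shutil.disk_usage(mount).total / GB
        report[mount] = (round(total_gb, 1), total_gb >= minimum)
    return report
```

Running check_capacity() on every node should report True for each mount point.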

These are the GoldenGate file systems:
/ggate
/ggate2
/ggtrail
/ggtrail02

These are ACFS-based mounts:
/oracle_homes
/acfs_test_mount
/acfs_test2_mount

ASM trace directory:
/oracle_crs/crs/diag/asm/+asm/+ASM*/trace

Listener trace directory:
/oracle_crs/crs/diag/tnslsnr/*/listener/trace
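The two trace directories above use shell-style wildcards, so Python's glob module can expand them directly. A minimal sketch that lists the most recently modified files across those directories, for example when looking for the latest ASM or listener trace after an incident.

```python
import glob
import os

# Trace directories listed above; the wildcards match any ASM instance or host.
TRACE_DIRS = (
    "/oracle_crs/crs/diag/asm/+asm/+ASM*/trace",
    "/oracle_crs/crs/diag/tnslsnr/*/listener/trace",
)


def newest_trace_files(dir_patterns=TRACE_DIRS, limit: int = 5) -> list:
    """Return up to `limit` files from the matching directories, newest first."""
    files = [
        path
        for pattern in dir_patterns
        for path in glob.glob(os.path.join(pattern, "*"))
        if os.path.isfile(path)
    ]
    files.sort(key=os.path.getmtime, reverse=True)
    return files[:limit]
```

Calling newest_trace_files() with no arguments searches both directories on the local server.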