Top 5 Issues That Cause Node Reboots or Evictions or Unexpected Recycle of CRS

(Doc ID 1367153.1)

Issue #1: The node rebooted, but the log files do not show any error or cause.

Issue #2: The node rebooted because it was evicted due to missing network heartbeats.

Issue #3: The node rebooted after a problem with storage.

The ocssd.log file shows that the node rebooted because it could not access a majority of the voting disks.

Solution: Fix the problem with the voting disks. Make sure the voting disks are available and accessible by the user who owns CRS or the GI home (oracle, grid, or whichever user performed the installation). If a voting disk is not in ASM, use “dd if=<voting disk path> of=/dev/null bs=1024 count=10240” to test its accessibility.
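As a rough sketch, assuming the Grid Infrastructure owner is grid and the voting disk sits on a raw or block device (substitute the actual path reported by crsctl), the check could look like this:

    # List the configured voting disks
    crsctl query css votedisk

    # Verify that the GI owner can actually read a voting disk that is outside ASM
    # (/dev/raw/raw1 is only a placeholder for the real voting disk path)
    su - grid -c "dd if=/dev/raw/raw1 of=/dev/null bs=1024 count=10240"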

Issue #4: The node rebooted after an ASM or database instance hang or eviction.

The ocssd.log of the surviving node shows a member kill request escalated to a node kill request.

Cause: Starting with 11.1, if a database or ASM instance cannot be evicted at the database level, CRS gets involved and tries to kill the problem instance. This is a member kill request. If CRS cannot kill the problem instance, the member kill request is escalated to a node kill request and CRS reboots the node.

Solution: Find out why the ASM or database instance could not be evicted at the database level (LMON-, LMD-, or LMS-initiated eviction). One common cause is that the instance was hanging and did not respond to the remote instance’s request to die. Another is that one or more instance processes could not be killed; a typical example is a process stuck in uninterruptible I/O sleep, as in the check below.
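A minimal check for such processes might look like this (the ora_/asm_ filter simply assumes the usual Oracle background process naming):

    # Processes in uninterruptible sleep show state "D" in the ps STAT column
    ps -eo pid,stat,cmd | awk '$2 ~ /^D/' | grep -E "ora_|asm_"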

Issue #5: CRS recycled itself automatically, but the node did not reboot.

Cause: Starting with 11.2.0.2, if CRS needs to reboot the node for any of the reasons listed here, it first tries to recycle (restart) itself before rebooting the node. Only when it cannot recycle itself successfully does CRS reboot the node to force the recycle.
Solution: Check which of the node reboot reasons listed above applies and follow the corresponding solution.
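To confirm that only the clusterware stack was recycled while the node stayed up, a quick sanity check might be the following (assuming an 11.2 Grid Infrastructure home in $GRID_HOME and a per-host log directory named after the short hostname):

    # A large uptime means the node itself did not reboot; only CRS was recycled
    uptime

    # Review the clusterware alert log entries around the time of the incident
    tail -200 $GRID_HOME/log/$(hostname -s)/alert$(hostname -s).log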

RAC Log File Locations

If you are using Oracle RAC (it doesn’t matter how many nodes you have), you need to know where the log files are located.

The Cluster Ready Services Daemon (crsd) Log Files

Log files for the CRSD process (crsd) can be found in the following directory:

                 CRS_HOME/log/hostname/crsd

Oracle Cluster Registry (OCR) Log Files

The Oracle Cluster Registry (OCR) records log information in the following location:

                CRS_HOME/log/hostname/client

Cluster Synchronization Services (CSS) Log Files

You can find CSS information that the OCSSD generates in log files in the following location:

                CRS_HOME/log/hostname/cssd

Event Manager (EVM) Log Files

Event Manager (EVM) information generated by evmd is recorded in log files in the following location:

                CRS_HOME/log/hostname/evmd

RACG Log Files

The Oracle RAC high availability trace files are located in the following two locations:

CRS_HOME/log/hostname/racg

$ORACLE_HOME/log/hostname/racg

Core files are in the sub-directories of the log directories. Each RACG executable has a sub-directory assigned exclusively for that executable. The name of the RACG executable sub-directory is the same as the name of the executable.
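For example (the seven-day window and the use of the database home are only illustrative), recent core files can be located with:

    # Find core files written in the last 7 days under the RACG log tree
    find $ORACLE_HOME/log/$(hostname -s)/racg -type f -name "core*" -mtime -7 -ls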

The lists below summarize the log file locations:

Oracle Clusterware log files

Cluster Ready Services Daemon (crsd) Log Files:
$CRS_HOME/log/hostname/crsd

Cluster Synchronization Services (CSS):
$CRS_HOME/log/hostname/cssd

Event Manager (EVM) information generated by evmd:
$CRS_HOME/log/hostname/evmd

Oracle RAC RACG:
$CRS_HOME/log/hostname/racg
$ORACLE_HOME/log/hostname/racg

Oracle RAC 11g Release 2 log files

Clusterware alert log:
$GRID_HOME/log/<host>/alert<host>.log

Disk Monitor daemon:
$GRID_HOME/log/<host>/diskmon

OCRDUMP, OCRCHECK, OCRCONFIG, CRSCTL:
$GRID_HOME/log/<host>/client

Cluster Time Synchronization Service:
$GRID_HOME/log/<host>/ctssd

Grid Interprocess Communication daemon:
$GRID_HOME/log/<host>/gipcd

Oracle High Availability Services daemon:
$GRID_HOME/log/<host>/ohasd

Cluster Ready Services daemon:
$GRID_HOME/log/<host>/crsd

Grid Plug and Play daemon:
$GRID_HOME/log/<host>/gpnpd

Multicast Domain Name Service daemon:
$GRID_HOME/log/<host>/mdnsd

Event Manager daemon:
$GRID_HOME/log/<host>/evmd

RAC RACG (only used if pre-11.1 database is installed):
$GRID_HOME/log/<host>/racg

Cluster Synchronization Service daemon:
$GRID_HOME/log/<host>/cssd

Server Manager:
$GRID_HOME/log/<host>/srvm

HA Service Daemon Agent:
$GRID_HOME/log/<host>/agent/ohasd/oraagent_oracle11

HA Service Daemon CSS Agent:
$GRID_HOME/log/<host>/agent/ohasd/oracssdagent_root

HA Service Daemon ocssd Monitor Agent:
$GRID_HOME/log/<host>/agent/ohasd/oracssdmonitor_root

HA Service Daemon Oracle Root Agent:
$GRID_HOME/log/<host>/agent/ohasd/orarootagent_root

CRS Daemon Oracle Agent:
$GRID_HOME/log/<host>/agent/crsd/oraagent_oracle11

CRS Daemon Oracle Root Agent:
$GRID_HOME/log/<host>/agent/crsd/orarootagent_root

Grid Naming Service daemon:
$GRID_HOME/log/<host>/gnsd
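Rather than memorizing every directory above, it is often quicker to ask the filesystem which logs changed most recently. A rough sketch, assuming $GRID_HOME is set and the per-host log directory uses the short hostname:

    # Show log files under the GI log tree that were modified in the last hour
    find $GRID_HOME/log/$(hostname -s) -type f -name "*.log" -mmin -60 -exec ls -ltr {} +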

Directories for RAC

Filesystems
/oracle – binaries for the database software, 50GB – 100GB (enough for the current binaries as well as an upgrade or a new download if necessary)
/oracle_crs – binaries for the Grid Infrastructure, 50GB – 100GB (enough for the current binaries as well as an upgrade or a new download if necessary)
/ora01 – at least 100GB for each server in the cluster, mounted on all servers

These are GoldenGate-related mounts:
/ggate
/ggate2
/ggtrail
/ggtrail02

These are ACFS-based mounts:
/oracle_homes
/acfs_test_mount
/acfs_test2_mount

These are the ADR trace directories for the ASM instance and the listener under the Grid Infrastructure home:

/oracle_crs/crs/diag/asm/+asm/+ASM*/trace

/oracle_crs/crs/diag/tnslsnr/*/listener/trace
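As an aside, the same trace locations can be browsed with adrci; a sketch assuming the ADR base shown above:

    # List the ADR homes (ASM instance, listener) under this ADR base
    adrci exec="set base /oracle_crs/crs; show homes"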