Top 5 Issues That Cause Node Reboots or Evictions or Unexpected Recycle of CRS

(Doc ID 1367153.1)

Issue #1: The node rebooted, but the log files do not show any error or cause.

Issue #2: The node rebooted because it was evicted due to missing network heartbeats.

Issue #3: The node rebooted after a problem with storage.

The ocssd.log file shows that the node rebooted because it could not access a majority of the voting disks.

Solution: Fix the problem with the voting disk. Make sure that the voting disks are available and accessible by the user who owns the CRS or GI home (for example, oracle or grid). If the voting disk is not in ASM, use "dd if= of=/dev/null bs=1024 count=10240", supplying the voting disk path as the if= argument, to test accessibility.
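The dd read test above can also be scripted. The following is a minimal sketch in Python, not an Oracle-supplied tool: the function name can_read_device is hypothetical, and the defaults simply mirror the bs=1024 count=10240 values from the dd command. It should be run as the user who owns the CRS or GI home, because access is being tested for that user.

```python
def can_read_device(path, block_size=1024, count=10240):
    """Read up to `count` blocks of `block_size` bytes from `path`.

    Returns (ok, bytes_read, error_message). `ok` is False only when
    opening or reading raises an OS-level error.
    """
    bytes_read = 0
    try:
        # buffering=0 issues unbuffered reads, similar in spirit to dd.
        with open(path, "rb", buffering=0) as dev:
            for _ in range(count):
                chunk = dev.read(block_size)
                if not chunk:  # end of file or device reached early
                    break
                bytes_read += len(chunk)
    except OSError as exc:
        return False, bytes_read, str(exc)
    return True, bytes_read, ""
```

A result of (False, ...) with a permission or I/O error message points to the same accessibility problem that dd would report.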

Issue #4: The node rebooted after an ASM or database instance hang or eviction.

The ocssd.log of the surviving node shows a member kill request escalated to a node kill request.

Cause: Starting with 11.1, if a database or ASM instance cannot be evicted at the database level, CRS gets involved and tries to kill the problem instance. This is a member kill request. If CRS cannot kill the problem instance, the member kill request is escalated to a node kill request and CRS reboots the node.
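The escalation order described above can be summarized as a small decision function. This is only an illustrative model of the sequence in the text, not CRS code; the names Outcome and eviction_outcome are made up for this sketch.

```python
from enum import Enum


class Outcome(Enum):
    DB_LEVEL_EVICTION = "instance evicted at the database level"
    MEMBER_KILL = "CRS killed the problem instance (member kill)"
    NODE_KILL = "CRS rebooted the node (node kill)"


def eviction_outcome(db_eviction_succeeded, member_kill_succeeded):
    """Model the escalation: database level, then member kill, then node kill."""
    if db_eviction_succeeded:
        return Outcome.DB_LEVEL_EVICTION
    if member_kill_succeeded:
        return Outcome.MEMBER_KILL
    # Neither the database-level eviction nor the member kill worked,
    # so the request is escalated and the node is rebooted.
    return Outcome.NODE_KILL
```

The node reboot in this issue corresponds to the last branch, which is why the investigation starts with why the first two steps failed.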

Solution: Find out why the ASM or database instance could not be evicted at the database level (an eviction initiated by lmon, lmd, and lms). One common cause is that the instance was hung and did not respond to the remote instance's request to die. Another cause is that one or more instance processes could not be killed, for example because a process was in uninterruptible I/O sleep.
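To check for processes stuck in uninterruptible sleep, one option on Linux is to look for state "D" in /proc/<pid>/stat. The sketch below assumes a Linux /proc filesystem; the function names are hypothetical, and it lists every process in that state, not only Oracle processes.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren].strip())
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for processes whose state is 'D'."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to inspect
    found = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry is unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

A process that stays in the returned list across repeated calls cannot be killed until its I/O completes, which matches the second cause described above.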

Issue #5: CRS recycled automatically, but the node did not reboot.

Cause: Starting with 11.2.0.2, if CRS needs to reboot the node for any of the reasons listed here, CRS first tries to recycle itself before rebooting the node. Only when it cannot recycle itself successfully does CRS reboot the node to force the recycle.

Solution: Check which of the reasons for node reboot listed here applies and follow the solution listed for it.
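The version-dependent behavior can be expressed as a short sketch. It is an illustrative model only: the function name crs_response and the tuple encoding of versions are assumptions, and the pre-11.2.0.2 branch (reboot without a recycle attempt) is inferred from the wording above, not stated explicitly.

```python
def crs_response(version, recycle_succeeded):
    """Return "recycle" or "reboot" for a condition that calls for a node reboot.

    `version` is a tuple such as (11, 2, 0, 2).
    """
    if version >= (11, 2, 0, 2):
        # CRS tries to restart itself first and reboots only if that fails.
        return "recycle" if recycle_succeeded else "reboot"
    # Inferred: earlier versions reboot the node directly.
    return "reboot"
```

For example, crs_response((11, 2, 0, 3), True) yields "recycle", which is the situation this issue describes: CRS restarted but the node stayed up.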
