All posts by mrculp

.kshrc and .bashrc for persistent settings

To make set -o vi persistent across sessions, set it in your .kshrc or .bashrc file.

# .kshrc

# Source global definitions
if [ -f /etc/kshrc ]; then
. /etc/kshrc
fi

# use emacs editing mode by default
# set -o emacs
set -o vi

# User specific aliases and functions

For the bash folks, here is the .bashrc file:

# .bashrc

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

set -o vi

ASM Setup

BEFORE running the installer, you need to use oracleasm utilities to create the disks.

You may need to run this as root, but I’m not 100% sure.

Root will probably need to create the directory /dev/oracleasm and give ownership to grid in group oraasm, or something like that. It is VERY important that the disks are owned by the Grid user and group you designate to run ASM. You CANNOT do this as a side step while OUI is running. You must close it, create the disks, and then restart OUI.

When you get to the ASM setup, make sure the disk search string is a pattern that matches where the disks are.  ASM will only be able to use disks whose device nodes are owned by the correct user.  ASM disks MUST be raw (character, not block devices) and the file permissions must be 0600 (meaning read/write by owner and NOBODY else can read or write, with no set or sticky bits).
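
To double-check ownership and permissions before restarting OUI, something like the following could be used. It is only a sketch: check_asm_disk is a made-up helper name, the owner, group, and device path in the usage comment are placeholders for whatever you chose, and it relies on GNU stat (stat -c), so it assumes Linux.

```shell
#!/bin/sh
# Sketch: compare a device's owner, group and mode against expected values.
# check_asm_disk is a hypothetical helper, not part of the oracleasm tools.
check_asm_disk() {
    dev=$1 want_owner=$2 want_group=$3 want_mode=$4
    # GNU stat: %U = owner name, %G = group name, %a = octal permission bits
    actual=$(stat -c '%U %G %a' "$dev") || return 2
    if [ "$actual" = "$want_owner $want_group $want_mode" ]; then
        echo "OK   $dev ($actual)"
    else
        echo "BAD  $dev ($actual), expected $want_owner $want_group $want_mode"
        return 1
    fi
}

# Example call (placeholder path, user and group):
#   check_asm_disk /dev/oracleasm/disks/DATA1 grid oraasm 600
```

Because %a prints the full octal mode, a device with a set-id or sticky bit would show up as a four-digit value and be reported as BAD.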

RAC Networking Considerations

Networking Considerations

For the private network, 10 Gigabit Ethernet is highly recommended; the minimum requirement is 1 Gigabit Ethernet.

Underscores must not be used in a host or domain name, according to RFC 952 (DoD Internet host table specification). The same applies to net, host, gateway, or domain names.
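
A quick way to screen candidate node names for underscores and other disallowed characters is sketched below. valid_hostname is a hypothetical helper, and it only checks the character set (letters, digits, hyphen, period), not the other naming rules in the RFC.

```shell
#!/bin/sh
# Sketch: succeed only if the name is non-empty and made up solely of
# letters, digits, hyphens and periods. An underscore makes it fail.
valid_hostname() {
    case $1 in
        ''|*[!A-Za-z0-9.-]*) return 1 ;;   # empty, or contains a disallowed character
        *) return 0 ;;
    esac
}

# Example names are made up for illustration.
for name in rac-node1.example.com rac_node1; do
    if valid_hostname "$name"; then
        echo "$name: ok"
    else
        echo "$name: not allowed"
    fi
done
```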

The VIPs and SCAN VIPs must be on the same subnet as the public interface. For additional information see the Understanding SCAN VIP white paper.

The default gateway must be on the same subnet as the VIPs (including SCAN VIPs) to prevent VIP start/stop/failover issues. With 11gR2, this is detected and reported by the OUI; if the check is ignored, the VIPs will fail to start and the installation itself will fail.
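
The same-subnet rules for the public interface, VIPs, and default gateway can be sanity-checked with a little shell arithmetic. This is a minimal sketch for IPv4 only: ip_to_int and same_subnet are hypothetical helper names, the addresses and prefix length are invented, and the input is assumed to be well-formed dotted-quad notation without leading zeros.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet ADDR1 ADDR2 PREFIXLEN
# Succeeds when both addresses share the same network part.
same_subnet() {
    a=$(ip_to_int "$1"); b=$(ip_to_int "$2"); prefix=$3
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Made-up example: public IP, VIP and gateway on a /24 network.
public=192.168.10.21; vip=192.168.10.121; gateway=192.168.10.1
if same_subnet "$public" "$vip" 24 && same_subnet "$vip" "$gateway" 24; then
    echo "public, VIP and gateway share a subnet"
else
    echo "subnet mismatch"
fi
```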

It is recommended that the SCAN name (11gR2 and above) resolve via DNS to a minimum of 3 IP addresses round-robin regardless of the size of the cluster. For additional information see the Understanding SCAN VIP white paper.

To avoid name resolution issues, ensure that the HOSTS files and DNS are furnished with both VIP and public host names. SCAN must NOT be in the HOSTS file, because the HOSTS file can only represent a 1:1 host-to-IP mapping.

The network interfaces must have the same name on all nodes (e.g., eth1 -> eth1 in support of the VIP and eth2 -> eth2 in support of the private interconnect).
Network Interface Card (NIC) names must not contain “.”

Using jumbo frames for the private interconnect is a recommended best practice for enhanced performance of Cache Fusion operations. Reference: Document 341788.1

Use non-routable network addresses for private interconnect; Class A: 10.0.0.0 to 10.255.255.255, Class B: 172.16.0.0 to 172.31.255.255, Class C: 192.168.0.0 to 192.168.255.255. Refer to RFC1918 and Document 338924.1 for additional information.
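
The three non-routable ranges above can be tested for with a simple pattern match. A rough sketch: is_rfc1918 is a hypothetical helper, and it assumes its argument is already a valid dotted-quad IPv4 address (it does not validate the octets).

```shell
#!/bin/sh
# Sketch: succeed if the address falls in one of the RFC 1918 private ranges.
is_rfc1918() {
    case $1 in
        10.*)      return 0 ;;                 # 10.0.0.0 - 10.255.255.255
        192.168.*) return 0 ;;                 # 192.168.0.0 - 192.168.255.255
        172.*)                                 # 172.16.0.0 - 172.31.255.255
            second=${1#172.}; second=${second%%.*}
            if [ "$second" -ge 16 ] && [ "$second" -le 31 ]; then
                return 0
            fi
            return 1 ;;
        *)         return 1 ;;
    esac
}

# Example addresses are made up for illustration.
for ip in 10.10.0.5 172.20.1.1 172.32.1.1 192.168.56.10; do
    if is_rfc1918 "$ip"; then echo "$ip: private"; else echo "$ip: routable"; fi
done
```
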
Make sure network interfaces are configured correctly in terms of speed, duplex, etc. Various tools exist to monitor and test network: ethtool, iperf, netperf, spray and tcp. See Document 563566.1.

To prevent the public network or the private interconnect network from being a single point of failure, Oracle highly recommends configuring a redundant set of public network interface cards (NICs) and private interconnect NICs on each cluster node (see Document 787420.1). Starting with 11.2.0.2, Oracle Grid Infrastructure can provide redundancy and load balancing for the private interconnect (NOT the public network); this is the preferred method of NIC redundancy for full 11.2.0.2 stacks (11.2.0.2 Database must be used). More information can be found in Document 1210883.1.

NOTE: If using the 11.2.0.2 Redundant Interconnect/HAIP feature, it is at present REQUIRED that all interconnect interfaces be placed on separate subnets. If the interfaces are all on the same subnet and the cable is pulled from the first NIC in the routing table, a rebootless restart or node reboot will occur. See Document 1481481.1 for a technical description of this requirement.

For more predictable hardware discovery, place HBA and NIC cards in the same corresponding slot on each server in the Grid.

The use of a switch (or redundant switches) is required for the private network (crossover cables are NOT supported).

Dedicated redundant switches are highly recommended for the private interconnect, because deploying the private interconnect on a shared switch (even when using a VLAN) may expose the interconnect links to congestion and instability in the larger IP network topology. If deploying the interconnect on a VLAN, there should be a 1:1 mapping of VLAN to non-routable subnet, and the VLAN should not span multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include Spanning Tree loops when the larger IP network topology changes, asymmetric routing that may cause packet flooding, and lack of fine-grained monitoring of the VLAN/port. Reference Bug 9761210.

If deploying the cluster interconnect on a VLAN, review the considerations in the Oracle RAC and Clusterware Interconnect Virtual Local Area Networks (VLANs) white paper.

Consider using Infiniband on the interconnect for workloads that have high volume requirements. Infiniband can also improve performance by lowering latency. When Infiniband is in place the RDS protocol can be used to further reduce latency. See Document 751343.1 for additional details.

In 12.1.0.1, IPv6 is supported for the public network, but IPv4 must be used for the private network. Starting with 12.2.0.1, IPv6 is fully supported for both the public and private interfaces. Please see the Oracle Database IPv6 State of Direction white paper for details.

For Grid Infrastructure 11.2.0.2, multicast traffic must be allowed on the private network for the 230.0.1.0 subnet. Patch 9974223 (included in GI PSU 11.2.0.2.1 and above) for Oracle Grid Infrastructure 11.2.0.2 enables multicasting on the 224.0.0.251 multicast address on the private network. Multicast must be allowed on the private network for one of these two addresses (assuming the patch has been applied). Additional information, as well as a program to test multicast functionality, is provided in Document 1212703.1.

Oracle RAC Network Requirements

Executive Overview

This paper addresses Oracle’s interconnect requirement as ‘private’ and ‘separate’ and evaluates this requirement in terms of latency and bandwidth in a shared Ethernet network environment. A shared Ethernet network, in an Oracle Clusterware interconnect context, is a network where a switch, network interface or network segment is configured to handle network traffic that is unrelated to the interconnect traffic. This unrelated traffic may be private interconnect traffic from consolidated databases or consolidated virtual environments, public traffic from adjacent Local Area Network (LAN) segments, storage traffic, backup and replication traffic or any other network traffic unrelated to the cluster interconnect. A shared Ethernet switch network is usually partitioned for broadcast isolation using Virtual Local Area Networks (VLANs). Partitioning may be configured at the port level on the switch (most common configuration) for tagged or untagged VLANs depending on the topology. Partitioning may also occur on the host network adapter using tagged VLANs. A shared Ethernet network implies shared switch and NIC resources and, as such, is potentially subject to increased contention, performance degradation and diminished availability.

This paper draws focus to shared Ethernet switch, shared NIC VLAN configuration and deployment practices intended to optimize for Oracle Clusterware interconnect performance and availability. Oracle Real Application Clusters (RAC) interconnect latency and bandwidth baselines are described in generic terms as guidelines and are not intended to apply to Oracle Engineered Systems. These baselines should be considered when deploying the Oracle Clusterware interconnect in a shared Ethernet network topology. The target audience is anyone with an interest in Oracle Clusterware interconnect network deployment requirements, specifically Architects, DBAs, System Administrators and Network Engineers. This paper is not intended to provide specific VLAN configuration guidance but will assist in network architectural design based on the requirements of the Oracle Clusterware interconnect, variable workload and the supporting network components.

 

Oracle has recommendations for the networking components of RAC.

Note:
UDP is the default interface protocol for Oracle RAC and Oracle Clusterware.
You must use a switch for the interconnect.
Oracle recommends that you use a dedicated switch.
Oracle does not support token-rings or crossover cables for the interconnect.

Network Requirements
Each node must have at least two network adapters: one for the public network interface and the other for the private network interface or interconnect.

In addition, the interface names associated with the network adapters for each network must be the same on all nodes.

For the public network, each network adapter must support TCP/IP.

For the private network, the interconnect must support UDP (TCP for Windows) using high-speed network adapters and switches that support TCP/IP.

Gigabit Ethernet (minimum) or an equivalent is recommended.
Note: For a more complete list of supported protocols, see MetaLink Note: 278132.1.

Before starting the installation, each node requires an IP address and an associated host name registered in the DNS or the /etc/hosts file for each public network interface.

One unused virtual IP address and an associated VIP name registered in the DNS or the /etc/hosts file that you configure for the primary public network interface are needed for each node.

The virtual IP address must be in the same subnet as the associated public interface.

After installation, you can configure clients to use the VIP name or IP address. If a node fails, its virtual IP address fails over to another node. For the private IP address and optional host name for each private interface, Oracle recommends that you use private network IP addresses for these interfaces, for example, 10.*.*.* or 192.168.*.*. You can use the /etc/hosts file on each node to associate private host names with private IP addresses.

The interconnect network is a critical component in a RAC environment. It’s very important to minimize its latency, increase its throughput and avoid loss of UDP packets.

There’s more than a heartbeat

Sometimes, when we work with sysadmins or network admins, we find they are expecting a “classic” interconnect network for deploying RAC, one that is just used to exchange configuration and keepalive information between nodes. With Oracle RAC, we must treat the interconnect like an I/O system, because it is the component Cache Fusion uses to send data blocks from one cluster node to any other without first writing them to disk for the other instance to read a current value. So, we can see high network traffic made up of 8 KiB blocks (if we use the default block size for our database).

Oracle specifications

Some years ago, Oracle said it was necessary to use a dedicated physical network, separate from the public network, as the interconnect. Many customers did not use dedicated hardware for this; instead, the interconnect was integrated into the consolidated LAN infrastructure, using specific VLANs for these networks. In 2012, Oracle published a white paper called Oracle Real Application Clusters (RAC) and Oracle Clusterware Interconnect Virtual Local Area Networks (VLANs) Deployment Considerations. This document discusses the concepts of private and separate.

In summary, we should keep in mind:

We can use tagged or untagged VLANs; dedicated hardware is not necessarily required.

RAC servers should be connected with OSI layer 2 adjacency, within the same broadcast domain and with single-hop communication.
Disabling or restricting STP (Spanning Tree Protocol) is very important for avoiding traffic suspension that could result in a split brain.
Enable pruning or private VLANs, so multicast and broadcast traffic will never be propagated beyond the access layer.

Additionally, from 11.2.0.2 onward, we need to enable multicast traffic for network 230.0.1.0, or, from 11.2.0.2.1, for network 224.0.0.251. We can find further information in the MOS Doc Grid Infrastructure Startup During Patching, Install or Upgrade May Fail Due to Multicasting Requirement (Doc ID 1212703.1). That note also provides a tool for checking multicast traffic in a network.

The interconnect in an Oracle RAC environment is the backbone of your cluster. A highly performing, reliable interconnect is a crucial ingredient in making Cache Fusion perform well. Remember that the assumption in most cases is that a read from another node’s memory via the interconnect is much faster than a read from disk—except perhaps for Solid State Disks (SSDs). The interconnect is used to transfer data and messages among the instances. An Oracle RAC cluster requires a high-bandwidth solution with low latency wherever possible. If you find that the performance of the interconnect is subpar, your Oracle RAC cluster performance will also most likely be subpar. In the case of subpar performance in an Oracle RAC environment, the interconnect configuration, including both hardware and software, should be one of the first areas you investigate.

UDP buffers

You want the fastest possible network to be used for the interconnect. To maximize your speed and efficiency on the interconnect, you should ensure that the User Datagram Protocol (UDP) buffers are set to the correct values. On Linux, you can check this via the following command:

sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default net.core.wmem_default
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
net.core.rmem_default = 262144
net.core.wmem_default = 262144

Alternatively, you can read the associated values directly from the respective files in the directory /proc/sys/net/core. These values can be increased via the following sysctl commands:

sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=1048576
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
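
To spot a node whose buffers are smaller than the values in this example, a small check along these lines could be run on each node. It is a sketch only: check_udp_buffers is a hypothetical helper, the thresholds are simply the four numbers from the example, and the optional directory argument exists just so the function can be pointed somewhere other than /proc/sys/net/core.

```shell
#!/bin/sh
# Sketch: report every net.core buffer setting that is below the example value.
# Returns 0 if all four are at least as large, 1 if any is smaller,
# 2 if a value could not be read.
check_udp_buffers() {
    dir=${1:-/proc/sys/net/core}
    status=0
    for pair in rmem_max=4194304 wmem_max=1048576 \
                rmem_default=262144 wmem_default=262144; do
        name=${pair%%=*}; want=${pair#*=}
        have=$(cat "$dir/$name") || return 2
        if [ "$have" -lt "$want" ]; then
            echo "net.core.$name is $have, below $want"
            status=1
        fi
    done
    return $status
}

# Usage on a live system:
#   check_udp_buffers && echo "UDP buffers look fine"
```

Keep in mind that sysctl -w only changes the running kernel; to keep the settings across reboots, the same keys normally also go into /etc/sysctl.conf.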


 

The numbers in this example are the recommended values for Oracle RAC on Linux and are more than sufficient for the majority of configurations. Nevertheless, let’s talk about some background of the UDP buffers. The values determined by rmem_max and wmem_max are on a “per-socket” basis. So if you set rmem_max to 4MB, and you have 400 processes running, each with a socket open for communications in the interconnect, then each of these 400 processes could potentially use 4MB, meaning that the total memory usage could be 1.6GB just for this UDP buffer space. However, this is only “potential” usage. So if rmem_default is set to 1MB and rmem_max is set to 4MB, you know for sure that at least 400MB will be allocated (1MB per socket). Anything more than that will be allocated only as needed, up to the max value. So the total memory usage depends on the rmem_default, rmem_max, the number of open sockets, and the variable piece of how much buffer space each process is actually using. This is an unknown—but it could depend on the network latency or other characteristics of how well the network is performing and how much network load there is altogether. To get the total number of Oracle-related open UDP sockets, you can execute this command:

 

netstat -anp --udp | grep ora | wc -l
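
The per-socket arithmetic described above (400 sockets, 1MB default, 4MB max) can be reproduced with a tiny helper. udp_buffer_mb is a hypothetical name, and the figures passed in are just the ones from the paragraph, not measurements.

```shell
#!/bin/sh
# udp_buffer_mb SOCKETS DEFAULT_MB MAX_MB
# Prints the guaranteed minimum (sockets x default) and the potential
# maximum (sockets x max) UDP buffer memory, both in MB.
udp_buffer_mb() {
    sockets=$1 default_mb=$2 max_mb=$3
    echo "min=$(( sockets * default_mb ))MB max=$(( sockets * max_mb ))MB"
}

udp_buffer_mb 400 1 4    # prints: min=400MB max=1600MB
```

On a real node, the first argument could be replaced with the socket count returned by the netstat command.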

 

NOTE

Our assumption here is that the UDP is being used for the interconnect. Although that will be true in the vast majority of cases, there are some exceptions. For example, on Windows, TCP is used for Cache Fusion traffic. When InfiniBand is in use (more details on InfiniBand are provided later in the section “Interconnect Hardware”), the Reliable Datagram Sockets (RDS) protocol may be used to enhance the speed of Cache Fusion traffic. However, any other proprietary interconnect protocols are strongly discouraged, so starting with Oracle Database 11g, your primary choices are UDP, TCP (Windows), or RDS (with InfiniBand).

Jumbo frames

Another option to increase the performance of your interconnect is the use of jumbo frames. When you use Ethernet, a variable frame size of 46–1500 bytes is the transfer unit used between all Ethernet participants. The upper bound of 1500 bytes is the MTU (Maximum Transmission Unit). Jumbo frames allow the Ethernet frame to exceed the MTU of 1500 bytes up to a maximum of 9000 bytes (on most platforms, though platforms will vary). In Oracle RAC, the setting of DB_BLOCK_SIZE multiplied by DB_FILE_MULTIBLOCK_READ_COUNT determines the maximum size of a message for the global cache, and PARALLEL_EXECUTION_MESSAGE_SIZE determines the maximum size of a message used in Parallel Query. These message sizes can range from 2K to 64K or more, and hence will be fragmented more heavily with a lower/default MTU. Increasing the frame size (by enabling jumbo frames) can improve the performance of the interconnect by reducing the fragmentation when shipping large amounts of data across that wire. A note of caution is in order, however: not all hardware supports jumbo frames. Therefore, due to differences in specific server and network hardware requirements, jumbo frames must be thoroughly tested before implementation in a production environment.
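
To get a feel for how much fragmentation jumbo frames avoid, the number of IP fragments per message can be estimated as below. This is a back-of-the-envelope sketch, assuming UDP over IPv4 with a 20-byte IP header (no options) and an 8-byte UDP header; fragments is a hypothetical helper and ignores any Oracle-level message headers.

```shell
#!/bin/sh
# fragments MESSAGE_BYTES MTU
# Estimates how many IPv4 fragments one UDP message of the given size needs.
fragments() {
    msg=$1 mtu=$2
    datagram=$(( msg + 8 ))               # payload plus 8-byte UDP header
    per_frag=$(( (mtu - 20) / 8 * 8 ))    # room after the 20-byte IP header,
                                          # rounded down to a multiple of 8
    echo $(( (datagram + per_frag - 1) / per_frag ))   # round up
}

fragments 8192 1500    # 8K block, standard MTU: prints 6
fragments 8192 9000    # 8K block, jumbo frames: prints 1
```

So under these assumptions a single 8K block that would be split into six fragments at the default MTU fits into one frame with a 9000-byte MTU.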

Interconnect hardware

In addition to the tuning options, you have the opportunity to implement faster hardware such as InfiniBand or 10 Gigabit Ethernet (10 GigE). InfiniBand is available and supported with two options. Reliable Datagram Sockets (RDS) protocol is the preferred option, because it offers up to 30 times the bandwidth advantage and 30 times the latency reduction over Gigabit Ethernet. IP over InfiniBand (IPoIB) is the other option, which does not do as well as RDS, since it uses the standard UDP or TCP, but it does still provide much better bandwidth and much lower latency than Gigabit Ethernet.

Another option to increase the throughput of your interconnect is the implementation of 10 GigE technology, which represents the next level of Ethernet. Although it is becoming increasingly common, note that 10 GigE does require specific certification on a platform-by-platform basis, and as of the writing of this book, it was not yet certified on all platforms. Check with Oracle Support to resolve any certification questions that you may have on your platform.