Category Archives: Uncategorized

Hugepages, Memory and Oracle

Configuring Hugepages For Oracle on Linux

Huge Pages Article

NOTE: I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.

RAM is managed in 4k pages in 64-bit Linux. When memory sizes were limited, and systems with more than 16G RAM were rare, this was not as much of an issue. However, as systems get more memory, and the increasing demand on performance for memory increased and become less manageable. Hugepages can make managing the large amounts of memory available in modern servers much less CPU intensive. In particular, with the number of memory pages reduced by typically three orders of magnitude, the chance that a particular page pointer will be available in the processor cache goes up dramatically.

First some caveats on using hugepages: Hugepages are not swappable, thus Oracle SGA memory must either be all hugepages are no hugepages. If you allocate hugepages for Oracle, and don’t allocate enough for the entire SGA, Oracle will not use any hugepage memory. If there is not enough non-hugepage memory, your database will not start. Finally, enabling hugepages will require a server restart, so if you do not have the ability to restart your server, do not attempt to enable hugepages.

Oracle Metalink note 1134002.1 says explicitly that AMM (MEMORY_TARGET/MEMORY_MAX_TARGET) is incompatible with hugepages. However, I have found at least one blog that says that AMM is compatible with hugepages when using the USE_LARGE_PAGES parameter in 11g (where AMM is available). Until further confirmation is found, I do not recommend trying to combine hugepages with MEMORY_TARGET/MEMORY_MAX_TARGET.

There are both Oracle database settings and Linux OS settings that must be adjusted in order to enable hugepages. The Linux and oracle settings of concern are below:

Linux OS settings:

/etc/sysctl.conf:

vm.nr_hugepages
kernel.shmmax
kernel.shmall

/etc/security/limits.conf:

oracle soft memlock
oracle hard memlock

Oracle Database spfile/init.ora:

SGA_TARGET = Size of the SGA for use currently
SGA_MAX_SIZE = Size the SGA *could* be increased to without restarting the server
MEMORY_TARGET = These parameters should not be used with hugepages
MEMORY_MAX_TARGET = These parameters should not be used with hugepages

USE_LARGE_PAGES

First, calculate the Linux OS settings. Kernel.shmmax should be set to the size of the largest SGA_TARGET on the server plus 1G, to account for other processes. For a single instance with 180G RAM, that would be 181G.

Kernel.shmall should be set to the sum of the SGA_TARGET values divided by the pagesize. Use ‘getconf pagesize’ command to get the page size. Units are bytes. The standard pagesize on Linux x86_64 is 4096, or 4k.

Oracle soft memlock and oracle hard memlock should be set to slightly less than the total memory on the server, I chose 230G. Units are kbytes, so the number is 230000000. This is the total amount of memory Oracle is allowed to lock.

Now for the hugepage setting itself: vm.nr_hugepages is the total number of hugepages to be allocated on the system. The number of hugepages required can be determined by finding the maximum amount of SGA memory expected to be used by the system (the SGA_MAX_SIZE value normally, or the sum of them on a server with multiple instances) and dividing it by the size of the hugepages, 2048k, or 2M on Linux. To account for Oracle process overhead, add five more hugepages . So, if we want to allow 180G of hugepages, we would use this equation: (180*1024*1024/2048)+5. This gives us 92165 hugepages for 180G. Note: I took a shortcut in this calculation, by using memory in MEG rather than the full page size. To calculate the number in the way I initial described, the equation would be: (180*1024*1024*1024)/(2048*1024).

In order to allow the Oracle database to use up to 180G for the SGA_TARGET/SGA_MAX_SIZE, below are the settings we would use for the OS:


/etc/security/limits.conf

oracle soft memlock 230000000
 oracle hard memlock 230000000

/etc/sysctl.conf

vm.nr_hugepages = 92165

kernel.shmmax = 193273528320+1g = 194347270144

kernel.shmall = 47448064

In the Oracle database there is a new setting in 11gR2. This is USE_LARGE_PAGES, with possible values of ‘true’, ‘only’, and ‘false’. True is the default and current behavior, ‘False’ means never use hugepages, use only small pages. ‘Only’ forces the database to use hugepages. If insufficient pages are available the instance will not start. Regardless of this setting, it must use either all hugepages or all smallpages. According to some blogs, using this setting is what allows the MEMORY_MAX_TARGET and MEMORY_TARGET to be used with hugepages. As I noted above, I have not verified this with a Metalink note as yet.

Next, set SGA_TARGET and SGA_MAX_SIZE to the desired size. I generally recommend setting both to the same size. Oracle recommends explicitly setting the MEMORY_TARGET and MEMORY_MAX_TARGET to 0 when enabling hugepages. So these are the values in the spfile that we change:

USE_LARGE_PAGES=only

SGA_TARGET=180G
SGA_MAX_SIZE=180G
MEMORY_MAX_TARGET=0
MEMORY_TARGET=0

In order to verify that hugepages are being used, run this command:

cat /proc/meminfo | grep Huge

It will show HugePages_Total, HugePages_Free, and HugePages_Rsvd. The HugePages_Rsvd value is the number of hugepages that are in use.

Note that this example uses Linux hugepage size of 2M (2048k).
On Itanium systems the hugepage size is 256M.

These instructions show you how to successfully implement huge pages in Linux. Note that everything would be the same for Oracle 10gR2, with the exception that the USE_LARGE_PAGES parameter is unavailable.

1.Reasons for Using Hugepages 1.Use hugepages if OLTP or ERP. Full stop.
2.Use hugepages if DW/BI with large numbers of dedicated connections or a large SGA. Full stop.
3.Use hugepages if you don’t like the amount of memory page tables are costing you (/proc/meminfo). Full stop.

2.SGA Memory Management Models 1.AMM does not support hugepages. Full stop.
2.ASMM supports hugepages.

3.Instance Type 1.ASM uses AMM by default. ASM instances do not need hugepages. Full stop.
2.All non-ASM instances should be considered candidate for hugepages. See 1.1->1.3 above.

4.Configuration 1.Limits (multiple layers) 1./etc/security/limits.conf establishes limits for hugepages for processes. Note, setting these values does not pre-allocate any resources.
2.Ulimit also establishes hugepages limits for processes.

5.Allocation 1./etc/sysctl.conf vm.nr_hugepages allocates memory to the hugepages pool.

6.Sizing 1.Read MOS 401749.1 for information on tools available to aid in the configuration of vm/nr_hugepages

To make the point of how urgently Oracle DBAs need to qualify their situation against list items 1.1 through 1.3 above, please consider the following quote from an internal email I received. The email is real and the screen output came from a real customer system. Yes, 120+ gigabytes of memory wasted in page tables. Fact is often stranger than fiction!

And here is an example of kernel pagetables usage, with a 24GB SGA, and 6000+ connections .. with no hugepages in use .

# grep PageT /proc/meminfo

PageTables: 123,731,372 kB

In part I of my recent blog series on Linux hugepages and modern Oracle releases I closed the post by saying that future installments would materialize if I found any pitfalls. I don’t like to blog about bugs, but in cases where there is little material on the matter provided elsewhere I think it adds value. First, however, I’d like to offer links to parts I and II in the series:
• Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – I.
• Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – II.

The pitfall I’d like to bring to readers’ attention is a situation that can arise in the case where the Oracle Database 11g Release 2 11.2.0.2 parameter USE_LARGE_PAGES is set to “only” thus forcing the instance to either successfully allocate all shared memory from the hugepages pool or fail to boot. As I pointed out in parts I and II this is a great feature. However, after an instance is booted it stands to reason that other processes (e.g., Oracle instances) may in fact use hugepages thus drawing down the amount of free hugepages. In fact, it stands to reason that other uses of hugepages could totally deplete the hugepages pool.

So what happens to a running instance that successfully allocated its shared memory from the hugepages pool and hugepages are later externally drawn down? The answer is nothing. An instance can plod along just fine after instance startup even if hugepages continue to get drawn down to the point of total depletion. But is that the end of the story?

What Goes Up, Must (be able to) Come Down
OK, so for anyone that finds themselves in a situation where an instance is up and happy but HugePages_Free is zero the following is what to expect:

12345678910111213141516171819202122

$ sqlplus ‘/ as sysdba’ SQL*Plus: Release 11.2.0.2.0 Production on Wed Sep 29 17:32:32 2010 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to an idle instance. SQL>SQL> HOST grep -i huge /proc/meminfoHugePages_Total: 4663HugePages_Free: 0HugePages_Rsvd: 10Hugepagesize: 2048 kB SQL> shutdown immediateORA-01034: ORACLE not availableORA-27102: out of memoryLinux-x86_64 Error: 12: Cannot allocate memoryAdditional information: 1Additional information: 6422533SQL>

Pay particular attention to the fact that sqlplus is telling us that it is attached to an idle instance! I assure you, this is erroneous. The instance is indeed up.

Yes, this is bug 10159556 (I filed it for what it is worth). The solution is to have ample hugepages as opposed to precisely enough. Note, in another shell a privileged user can dynamically allocate more hugepages (even a single hugepage) and the instance will be then able to be shutdown cleanly. As an aside, an instance in this situation can be shutdown with abort. I don’t aim to insinuate that this is some sort of zombie instance that will not go away.

JDESC.sql

/*
Created: Version 1: 01/30/2012 Jed S. Walker
Updated: Version 2: 03/20/2006 Jed S. Walker

Notes :
does not work on Oracle < 9i due to MView check
To do:
does not handle "" table names
Usage:
@jdesc user.table
@jdesc table
*/

set serveroutput on size 1000000

set verify off
set feedback off

set pages 50 linesize 120
set long 10000

col owner format a20
col object_name format a35

declare
n_exist        number;
v_param1       varchar2(200);
v_seploc       number;
v_schema_name  user_tables.table_name%type;
v_table_name   user_tables.table_name%type;
v_longmsg      varchar2(4000);
v_type_atts    varchar2(500);
cursor curs_syns (owner_val varchar2, tabname_val varchar2)
is select owner, synonym_name
     from all_synonyms
    where owner = owner_val
      and table_name = tabname_val
    order by owner, synonym_name;
cursor curs_tcols (owner_val varchar2, tabname_val varchar2)
is select column_name, data_type,
data_length, data_precision, data_scale,
decode(nullable,'Y','NULL','NOT NULL') nullable,
data_default
from all_tab_columns
where owner = owner_val
and table_name = tabname_val
order by column_id;
v_lenprec  varchar2(20);
cursor curs_cons (owner_val varchar2, tabname_val varchar2)
is select constraint_name,
decode(constraint_type, 'P', 'Primary', 'U', 'Unique',
'C', 'Check', 'R', 'References', constraint_type)
constraint_type,
search_condition, r_constraint_name,
decode(delete_rule, 'NO ACTION', '', delete_rule) delete_rule,
decode(constraint_type, 'P', 1, 'U', 2, 'R', 3, 'C', 4, 5) sortorder
from all_constraints
where owner = owner_val
and table_name = tabname_val
order by sortorder, constraint_name;
v_fk_table_name   user_tables.table_name%type;
v_temp            varchar2(2000);
cursor curs_ccollist (owner_val varchar2, consname_val varchar2)
is select column_name
from all_cons_columns
where owner = owner_val
and constraint_name = consname_val
order by position;
v_criteria   varchar2(2000);
cursor curs_inds (owner_val varchar2, tabname_val varchar2)
is select index_name, index_type, uniqueness
from all_indexes
where owner = owner_val
and table_name = tabname_val
order by uniqueness desc, index_name;
cursor curs_icollist (owner_val varchar2, indname_val varchar2)
is select column_name
from all_ind_columns
where index_owner = owner_val
and index_name = indname_val
order by column_position;
v_columns   varchar2(2000);
cursor curs_trigs (owner_val varchar2, tabname_val varchar2)
is select trigger_name, trigger_type, triggering_event
from all_triggers
where owner = owner_val
and table_name = tabname_val
order by trigger_type, triggering_event, trigger_name;
begin

-- get parameters and determine if this is local schema or other schema
v_param1:='&amp;1';
v_seploc:=instr(v_param1,'.');
if v_seploc = 0 then
v_schema_name:=upper(SYS_CONTEXT('USERENV','CURRENT_SCHEMA'));
v_table_name:=upper(v_param1);
else
v_schema_name:=upper(substr(v_param1,1,v_seploc-1));
v_table_name:=upper(substr(v_param1,v_seploc+1,length(v_param1)-v_seploc));
end if;

-- check for table existence or view
select count(1) into n_exist
from all_tables
where owner = v_schema_name
and table_name = v_table_name;
if n_exist = 0 then
select count(1) into n_exist
from all_views
where owner = v_schema_name
and view_name = v_table_name;
if n_exist = 0 then
dbms_output.put_line('Table ' || v_schema_name || '.' || v_table_name || ' does not exist');
else
dbms_output.put_line(chr(10) || v_schema_name || '.' || v_table_name || ' is a view:' || chr(10));
select text into v_longmsg from all_views where owner = v_schema_name and view_name = v_table_name;
dbms_output.put_line(v_longmsg);
end if;
else

-- show table name and type attributes
dbms_output.put_line(chr(10) || v_schema_name || '.' || v_table_name || chr(10));
-- show table type attributes
select decode(temporary,'Y','YES','N','NO') into v_temp
from all_tables
where owner = v_schema_name
and table_name = v_table_name;
dbms_output.put_line('Temporary Table =&gt; ' || v_temp);
select decode(iot_type,'IOT','YES','NO') into v_temp
from all_tables
where owner = v_schema_name
and table_name = v_table_name;
dbms_output.put_line('Index Organized Table (IOT) =&gt; ' || v_temp);
select partitioned into v_temp
from all_tables
where owner = v_schema_name
and table_name = v_table_name;
dbms_output.put_line('Partitioned =&gt; ' || v_temp);
select decode(cluster_name,null,'NO','YES') into v_temp
from all_tables
where owner = v_schema_name
and table_name = v_table_name;
dbms_output.put_line('Clustered Table =&gt; ' || v_temp);
select decode(compression,'ENABLED','YES','NO') into v_temp
from all_tables
where owner = v_schema_name
and table_name = v_table_name;
dbms_output.put_line('Compressed Table =&gt; ' || v_temp);
select decode(count(1),0,'NO','YES') into v_temp
from all_base_table_mviews
where owner = v_schema_name
and master = v_table_name;
dbms_output.put_line('Materialized Views =&gt; ' || v_temp);
select decode(count(1),0,'NO','YES') into v_temp
from all_policies
where object_owner = v_schema_name
and object_name = v_table_name;
dbms_output.put_line('Fine Grained Access Control =&gt; ' || v_temp);

-- show synonyms
dbms_output.put_line(chr(10));
n_exist:=0;
for rec_syn in curs_syns (v_schema_name,v_table_name) loop
dbms_output.put_line('Synonym: ' || rec_syn.owner || ' : ' || rec_syn.synonym_name);
n_exist:=n_exist+1;
end loop;
if n_exist = 0 then
dbms_output.put_line('There are no synonyms for this table.');
end if;

-- list columns, data type, null?, default
dbms_output.put_line(chr(10) || rpad('COLUMN',32) || chr(9) ||
rpad('TYPE',14) || chr(9) ||
rpad('NULLS',10) || chr(9) ||
'DEFAULT');
dbms_output.put_line(rpad('------',32) || chr(9) ||
rpad('----',14) || chr(9) ||
rpad('-----',10) || chr(9) ||
'-------');
for rec_col in curs_tcols (v_schema_name,v_table_name) loop
if rec_col.data_type in ('CHAR','VARCHAR2','NCHAR','NVARCHAR2') then
v_lenprec:='(' || rec_col.data_length || ')';
elsif rec_col.data_type in ('NUMBER') then
v_lenprec:='(' || rec_col.data_precision || ',' || rec_col.data_scale || ')';
else
v_lenprec:='';
end if;
dbms_output.put_line(rpad(rec_col.column_name,32) || chr(9) ||
rpad(rec_col.data_type || v_lenprec,14) || chr(9) ||
rpad(rec_col.nullable,10) || chr(9) ||
rec_col.data_default);
end loop;

-- constraints
dbms_output.put_line(chr(10) || rpad('CONSTRAINT',32) || chr(9) ||
rpad('TYPE',14) || chr(9) ||
'CRITERIA');
dbms_output.put_line(rpad('----------',32) || chr(9) ||
rpad('----',14) || chr(9) ||
'--------');
n_exist:=0;
for rec_con in curs_cons (v_schema_name,v_table_name) loop
if rec_con.constraint_type in ('Primary','Unique') then
v_criteria:='(';
for rec_collist in curs_ccollist(v_schema_name,rec_con.constraint_name) loop
if v_criteria != '(' then
v_criteria:=v_criteria||',';
end if;
v_criteria:=v_criteria || rec_collist.column_name;
end loop;
v_criteria:=v_criteria || ')';
elsif rec_con.constraint_type = 'References' then
--v_criteria:=rec_con.r_constraint_name;
select table_name into v_fk_table_name
from all_constraints
where owner = v_schema_name
and constraint_name = rec_con.r_constraint_name;
v_temp:='(';
for rec_collist in curs_ccollist(v_schema_name,rec_con.r_constraint_name) loop
if v_temp != '(' then
v_temp:=v_temp||',';
end if;
v_temp:=v_temp || rec_collist.column_name;
end loop;
v_temp:=v_temp || ')';
v_criteria:=v_fk_table_name || v_temp || ' ' || rec_con.delete_rule;
elsif rec_con.constraint_type = 'Check' then
v_criteria:=rec_con.search_condition;
if length(v_criteria) &gt; (250-32-14) then
v_criteria:=substr(v_criteria,1,(250-32-14)) || '...';
end if;
else
v_criteria:='Unknown';
end if;
n_exist:=n_exist+1;
dbms_output.put_line(rpad(rec_con.constraint_name,32) || chr(9) ||
rpad(rec_con.constraint_type,14) || chr(9) ||
v_criteria);
end loop;
if n_exist = 0 then
dbms_output.put_line('&lt;none&gt;');
end if;

-- indexes
dbms_output.put_line(chr(10) || rpad('INDEX',32) || chr(9) ||
rpad('TYPE',20) || chr(9) ||
'COLUMNS');
dbms_output.put_line(rpad('-----',32) || chr(9) ||
rpad('----',20) || chr(9) ||
'-------');
n_exist:=0;
for rec_ind in curs_inds (v_schema_name,v_table_name) loop
v_columns:='(';
for rec_collist in curs_icollist (v_schema_name,rec_ind.index_name) loop
if v_columns != '(' then
v_columns:=v_columns||',';
end if;
v_columns:=v_columns || rec_collist.column_name;
end loop;
v_columns:=v_columns || ')';
dbms_output.put_line(rpad(rec_ind.index_name,32) || chr(9) ||
rpad(rec_ind.uniqueness || ':' || rec_ind.index_type,20) || chr(9) ||
v_columns);
n_exist:=n_exist+1;
end loop;
if n_exist = 0 then
dbms_output.put_line('&lt;none&gt;');
end if;

-- triggers
dbms_output.put_line(chr(10) || rpad('TRIGGER',32) || chr(9) ||
rpad('TYPE',20) || chr(9) ||
rpad('EVENT',20) );
dbms_output.put_line(rpad('-----',32) || chr(9) ||
rpad('----',20) || chr(9) ||
rpad('------',20) );
n_exist:=0;
for rec_trig in curs_trigs (v_schema_name,v_table_name) loop
dbms_output.put_line(rpad(rec_trig.trigger_name,32) || chr(9) ||
rpad(rec_trig.trigger_type,20) || chr(9) ||
rpad(rec_trig.triggering_event,20) );
n_exist:=n_exist+1;
end loop;
if n_exist = 0 then
dbms_output.put_line('&lt;none&gt;');
end if;

-- some space before prompt
dbms_output.put_line(chr(10));

end if; -- check for table
end;
/

set feedback on

Creating a TAR of existing Oracle home

Clone DB OracleBase article

This article is a test

When doing any type of upgrade/patch I’m often asked about backing up the existing binaries and how to save the customer from minimal downtime. One of the ways this can be achieved is by simply making another oracle home tar copied from the existing home. This now gives several options that allow flexibity in many ways.

# tar -cvf /tmp/u01.tar /u01
First parameter is the location of the tar file, second paramater is the directory where the oracle home is located

# gzip /tmp/u01.tar

You can then gzip it to make it smaller/easier to move it

Once moved to the destination server

# gunzip /tmp/u01.tar.gz
# cd /
# tar -xvf /tmp/u01.tar

Root Configuration Scripts

Run the root configutation scripts, generated as part of the original installation, on the destination server as the “root” user.

/u01/app/oraInventory/orainstRoot.sh
/u01/app/oracle/product/11.2.0/db_1/root.sh

Modify Config Files

If you have not prepared the “.bash_profile” as part of the destination server setup, copy the file from the source to the destination server.

# scp /home/oracle/.bash_profile oracle@192.168.2.136:/home/oracle/.bash_profile

Edit the “.bash_profile” file on the destination server, giving it the correct value for the ORACLE_HOSTNAME environment variable.

Amend any hostname or IP references in the “listener.ora” and “tnsnames.ora” files in the “$ORACLE_HOME/network/admin” directory.

Edit the “/etc/oratab” making sure all instances are referenced correctly. In my case I need to add the following entry.

DB11G:/u01/app/oracle/product/11.2.0/db_1:Y