Category Archives: Hugepages

Memory and 11gr2 and Linux

For large SGA sizes, HugePages can give substantial benefits in virtual memory management. Without HugePages, the memory of the SGA is divided into 4K pages, which have to be managed by the Linux kernel.

Using HugePages, the page size is increased to 2MB on x86-64 (other architectures use other sizes, for example 256MB on Itanium), thereby reducing the total number of pages the kernel has to manage and therefore the amount of memory required to hold the page table.
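To make the reduction concrete, here is a minimal sketch that counts the pages needed to map an SGA at each page size. The helper name pages_needed and the 32G SGA are assumptions chosen for illustration; they do not come from the article.

```shell
# Hypothetical helper (not from the original article): number of pages
# needed to map a memory region, rounding up to whole pages.
pages_needed() {
    local region_bytes=$1 page_bytes=$2
    echo $(( (region_bytes + page_bytes - 1) / page_bytes ))
}

# Example: an assumed 32G SGA mapped with 4K pages versus 2MB HugePages.
sga_bytes=$(( 32 * 1024 * 1024 * 1024 ))
echo "4K pages:  $(pages_needed "$sga_bytes" 4096)"
echo "2MB pages: $(pages_needed "$sga_bytes" $(( 2 * 1024 * 1024 )))"
```

For the assumed 32G SGA this is 8388608 pages at 4K against 16384 pages at 2MB.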

In addition to these changes, the memory associated with HugePages does not get swapped out, which forces the SGA to stay resident. The savings in memory and resources of page management make HugePages a great addition for Oracle 11g systems running on x86-64 architectures.

Note. Automatic Memory Management (AMM) is not compatible with HugePages, so apart from ASM instances and small unimportant databases, you will probably have no need for AMM on a real system. Instead, Automatic Shared Memory Management and Automatic PGA Management should be used as they are compatible with HugePages.

The following command shows the current HugePages usage. The default HugePage size is 2MB on Oracle Linux 5.x, and by default no HugePages are defined, so HugePages_Total reports 0 until you configure them.

grep HugePages /proc/meminfo


Oracle 11.2.0.3 and hugepages

If you fail to configure enough hugepages on your Linux box, your instance may silently revert to 4k memory pages, resulting in excessive paging/swapping.

Since 11.2.0.2 there is a parameter, use_large_pages, that controls hugepages allocation behavior. The parameter is not hidden, but the public documentation says nothing about it; the only sources are some notes on Metalink (USE_LARGE_PAGES To Enable HugePages In 11.2 [ID 1392497.1]) and blog posts from Oracle guys.

This parameter can have three distinct values and defaults to use_large_pages=true.

In 11.2.0.2 the default value preserves the old behavior: if there are fewer hugepages than the total allocated SGA needs, Oracle will write a warning message to alert.log and go on with normal pages.

With use_large_pages=only Oracle will check during startup whether there are enough preallocated large pages and, if there aren’t, will refuse to start up with a message like

Specified value of sga_max_size is too small, bumping to 94220845056
****************** Large Pages Information *****************
Parameter use_large_pages = ONLY
Large Pages unused system wide = 43000 (84 GB) (alloc incr 256 MB)
Large Pages configured system wide = 43000 (84 GB)
Large Page size = 2048 KB
ERROR:
  Failed to allocate shared global region with large pages, unix errno = 12.
  Aborting Instance startup.
  ORA-27137: unable to allocate Large Pages to create a shared memory segment
ACTION:
  Total Shared Global Region size is 88 GB. Increase the number of
  unused large pages to atleast 44932 (88 GB) to allocate 100% Shared Global
  Region with Large Pages.
***********************************************************

With use_large_pages=false Oracle obviously will not use any of the large pages even if there are enough of them.

In 11.2.0.3 the default behavior has changed: now, with use_large_pages=true and fewer hugepages than the SGA needs, Oracle will allocate part of the SGA with hugepages and the rest with normal 4k pages. In alert.log it will look like

Specified value of sga_max_size is too small, bumping to 94220845056
****************** Large Pages Information *****************
Total Shared Global Region in Large Pages = 84 GB (95%)
Large Pages used by this instance: 42881 (84 GB)
Large Pages unused system wide = 119 (238 MB) (alloc incr 256 MB)
Large Pages configured system wide = 43000 (84 GB)
Large Page size = 2048 KB
RECOMMENDATION:
  Total Shared Global Region size is 88 GB. For optimal performance,
  prior to the next instance restart increase the number
  of unused Large Pages by atleast 1929 2048 KB Large Pages (3858 MB)
  system wide to get 100% of the Shared
  Global Region allocated with Large pages
***********************************************************
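A rough pre-start check along the same lines can be sketched in a few lines of shell. The function name hugepages_shortfall is hypothetical and this is not an Oracle tool; it only divides the SGA size by the hugepage size. The instance's own figures in the alert.log excerpts above are slightly higher (1929 more pages, 44932 in total), so treat the result as a lower bound.

```shell
# Hypothetical pre-start check (names are illustrative, not an Oracle tool).
# Prints how many more free hugepages are needed to hold the whole SGA;
# prints 0 when the free pool is already large enough.
hugepages_shortfall() {
    local sga_bytes=$1 free_pages=$2 hugepage_kb=$3
    local page_bytes=$(( hugepage_kb * 1024 ))
    local needed=$(( (sga_bytes + page_bytes - 1) / page_bytes ))
    if [ "$needed" -gt "$free_pages" ]; then
        echo $(( needed - free_pages ))
    else
        echo 0
    fi
}

# Figures from the alert.log excerpts: SGA bumped to 94220845056 bytes,
# 43000 unused large pages of 2048 KB each.
hugepages_shortfall 94220845056 43000 2048
```

With those figures the sketch prints 1928, one page short of Oracle's own recommendation, which is why some extra margin is a good idea.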

So there is one trap less, but I hope Oracle will be certified for RHEL6 very soon, so we’ll have a transparent hugepages feature that will make this nice new parameter obsolete.

Hugepages, Memory and Oracle

Configuring Hugepages For Oracle on Linux

Huge Pages Article

NOTE: I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.

RAM is managed in 4k pages in 64-bit Linux. When memory sizes were limited, and systems with more than 16G RAM were rare, this was not as much of an issue. However, as systems get more memory and the performance demands on that memory increase, the number of 4k pages has grown and become less manageable. Hugepages can make managing the large amounts of memory available in modern servers much less CPU intensive. In particular, with the number of memory pages reduced by a factor of 512 on x86_64 (2M pages instead of 4k pages, close to three orders of magnitude), the chance that a particular page translation will be available in the processor's translation cache (the TLB) goes up dramatically.

First some caveats on using hugepages: Hugepages are not swappable. In releases before 11.2.0.3, Oracle SGA memory must be either all hugepages or no hugepages: if you allocate hugepages for Oracle and don’t allocate enough for the entire SGA, Oracle will not use any hugepage memory. If there is then not enough non-hugepage memory, your database will not start. Finally, enabling hugepages will require a server restart, so if you do not have the ability to restart your server, do not attempt to enable hugepages.

Oracle Metalink note 1134002.1 says explicitly that AMM (MEMORY_TARGET/MEMORY_MAX_TARGET) is incompatible with hugepages. However, I have found at least one blog that says that AMM is compatible with hugepages when using the USE_LARGE_PAGES parameter in 11g (where AMM is available). Until further confirmation is found, I do not recommend trying to combine hugepages with MEMORY_TARGET/MEMORY_MAX_TARGET.

Both Oracle database settings and Linux OS settings must be adjusted in order to enable hugepages. The Linux and Oracle settings of concern are below:

Linux OS settings:

/etc/sysctl.conf:

vm.nr_hugepages
kernel.shmmax
kernel.shmall

/etc/security/limits.conf:

oracle soft memlock
oracle hard memlock

Oracle Database spfile/init.ora:

SGA_TARGET = Size of the SGA for use currently
SGA_MAX_SIZE = Size the SGA *could* be increased to without restarting the server
MEMORY_TARGET = These parameters should not be used with hugepages
MEMORY_MAX_TARGET = These parameters should not be used with hugepages

USE_LARGE_PAGES

First, calculate the Linux OS settings. kernel.shmmax should be set to the size of the largest SGA_TARGET on the server plus 1G, to account for other processes. For a single instance with a 180G SGA, that would be 181G.

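kernel.shmall should be set to the sum of the SGA_TARGET values divided by the page size. Use the ‘getconf PAGESIZE’ command to get the page size, which is reported in bytes. The standard page size on Linux x86_64 is 4096, or 4k. The example settings further down divide the kernel.shmmax value, which includes the extra 1G, by the page size.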
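The two kernel values can be derived with a short sketch like the one below. The helper names shmmax_bytes and shmall_pages are hypothetical. The shmall figure is computed from the shmmax value (SGA plus 1G) divided by the page size, because that is what reproduces the number used in the example settings later in this post.

```shell
# Hypothetical helpers; the 1G allowance and the 4096-byte page size are the
# values used in this article, not universal constants.
GIB=$(( 1024 * 1024 * 1024 ))

# kernel.shmmax: largest SGA_TARGET on the server (in G) plus 1G, in bytes.
shmmax_bytes() {
    local largest_sga_gb=$1
    echo $(( (largest_sga_gb + 1) * GIB ))
}

# kernel.shmall: a byte count expressed in pages of page_bytes each.
shmall_pages() {
    local bytes=$1 page_bytes=$2
    echo $(( bytes / page_bytes ))
}

shmmax=$(shmmax_bytes 180)
echo "kernel.shmmax = $shmmax"
echo "kernel.shmall = $(shmall_pages "$shmmax" 4096)"
```

On a live system the literal 4096 could be replaced with the output of getconf PAGESIZE.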

The oracle soft memlock and oracle hard memlock limits should be set to slightly less than the total memory on the server; I chose roughly 230G. Units are kbytes, so the number is 230000000. This is the total amount of memory Oracle is allowed to lock.
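Whether the limit actually took effect can be checked from the oracle user's shell, since ulimit -l reports the locked-memory limit in kbytes (or "unlimited"). The following is a minimal sketch; the function name memlock_allows_sga is hypothetical and the 180G SGA is the example size used in this post.

```shell
# Hypothetical check: is a memlock limit (in kbytes, as printed by
# `ulimit -l`) large enough for an SGA of sga_gb gigabytes?
memlock_allows_sga() {
    local limit_kb=$1 sga_gb=$2
    [ "$limit_kb" = "unlimited" ] && return 0
    [ "$limit_kb" -ge $(( sga_gb * 1024 * 1024 )) ]
}

# Run as the oracle user after logging in again so the new limits apply.
if memlock_allows_sga "$(ulimit -l)" 180; then
    echo "memlock limit is large enough for a 180G SGA"
else
    echo "memlock limit is too small for a 180G SGA"
fi
```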

Now for the hugepage setting itself: vm.nr_hugepages is the total number of hugepages to be allocated on the system. The number of hugepages required can be determined by finding the maximum amount of SGA memory expected to be used by the system (the SGA_MAX_SIZE value normally, or the sum of them on a server with multiple instances) and dividing it by the size of the hugepages, 2048k, or 2M on Linux. To account for Oracle process overhead, add five more hugepages. So, if we want to allow 180G of hugepages, we would use this equation: (180*1024*1024/2048)+5. This gives us 92165 hugepages for 180G. Note: I took a shortcut in this calculation by expressing both sizes in kbytes rather than bytes. To calculate the number in the way I initially described, the equation would be: (180*1024*1024*1024)/(2048*1024), plus the same five pages.
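The same arithmetic as a small shell sketch. The function name nr_hugepages_for_sga is hypothetical, and the default of 2048 kbytes per hugepage is an assumption that holds for Linux x86_64 only.

```shell
# Hypothetical helper reproducing the article's arithmetic:
# SGA size in kbytes / hugepage size in kbytes, plus 5 pages of overhead.
nr_hugepages_for_sga() {
    local sga_gb=$1 hugepage_kb=${2:-2048}
    echo $(( sga_gb * 1024 * 1024 / hugepage_kb + 5 ))
}

echo "vm.nr_hugepages = $(nr_hugepages_for_sga 180)"
```

Passing the hugepage size as a second argument (the Hugepagesize value from /proc/meminfo) keeps the calculation valid on platforms where it is not 2048 kB.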

In order to allow the Oracle database to use up to 180G for the SGA_TARGET/SGA_MAX_SIZE, below are the settings we would use for the OS:


/etc/security/limits.conf

oracle soft memlock 230000000
oracle hard memlock 230000000

/etc/sysctl.conf

vm.nr_hugepages = 92165
# 180G (193273528320 bytes) + 1G
kernel.shmmax = 194347270144
# kernel.shmmax / 4096
kernel.shmall = 47448064

In the Oracle database there is a new setting in 11gR2: USE_LARGE_PAGES, with possible values of ‘true’, ‘only’, and ‘false’. ‘True’ is the default and matches the previous behavior. ‘False’ means never use hugepages, use only small pages. ‘Only’ forces the database to use hugepages; if insufficient pages are available, the instance will not start. Regardless of this setting, the instance must use either all hugepages or all small pages (this changed in 11.2.0.3, where ‘true’ allows a mixed allocation). According to some blogs, using this setting is what allows MEMORY_MAX_TARGET and MEMORY_TARGET to be used with hugepages. As I noted above, I have not verified this with a Metalink note as yet.

Next, set SGA_TARGET and SGA_MAX_SIZE to the desired size. I generally recommend setting both to the same size. Oracle recommends explicitly setting the MEMORY_TARGET and MEMORY_MAX_TARGET to 0 when enabling hugepages. So these are the values in the spfile that we change:

USE_LARGE_PAGES=only

SGA_TARGET=180G
SGA_MAX_SIZE=180G
MEMORY_MAX_TARGET=0
MEMORY_TARGET=0

In order to verify that hugepages are being used, run this command:

cat /proc/meminfo | grep Huge

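It will show HugePages_Total, HugePages_Free, and HugePages_Rsvd. HugePages_Rsvd is the number of hugepages that have been reserved but not yet allocated; the number of hugepages actually allocated is HugePages_Total minus HugePages_Free.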
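A small sketch that turns these counters into a one-line summary. The function name hugepage_summary is hypothetical; it derives the allocated pages as HugePages_Total minus HugePages_Free and reports HugePages_Rsvd separately.

```shell
# Hypothetical helper: summarise hugepage counters from /proc/meminfo-style
# text on standard input.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^HugePages_Rsvd:/  { rsvd  = $2 }
        END {
            printf "total=%d free=%d reserved=%d allocated=%d\n",
                   total, free, rsvd, total - free
        }'
}

# On a live system:  hugepage_summary < /proc/meminfo
```

After the instance has started, a non-zero allocated or reserved figure indicates that the SGA is coming from the hugepages pool.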

Note that this example uses Linux hugepage size of 2M (2048k).
On Itanium systems the hugepage size is 256M.

These instructions show you how to successfully implement huge pages in Linux. Note that everything would be the same for Oracle 10gR2, with the exception that the USE_LARGE_PAGES parameter is unavailable.

1. Reasons for Using Hugepages
   1. Use hugepages if OLTP or ERP. Full stop.
   2. Use hugepages if DW/BI with large numbers of dedicated connections or a large SGA. Full stop.
   3. Use hugepages if you don’t like the amount of memory page tables are costing you (/proc/meminfo). Full stop.

2. SGA Memory Management Models
   1. AMM does not support hugepages. Full stop.
   2. ASMM supports hugepages.

3. Instance Type
   1. ASM uses AMM by default. ASM instances do not need hugepages. Full stop.
   2. All non-ASM instances should be considered candidate for hugepages. See 1.1->1.3 above.

4. Configuration
   1. Limits (multiple layers)
      1. /etc/security/limits.conf establishes limits for hugepages for processes. Note, setting these values does not pre-allocate any resources.
      2. Ulimit also establishes hugepages limits for processes.

5. Allocation
   1. /etc/sysctl.conf vm.nr_hugepages allocates memory to the hugepages pool.

6. Sizing
   1. Read MOS 401749.1 for information on tools available to aid in the configuration of vm/nr_hugepages
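One way to approach the sizing step is to measure the shared memory segments of the running instances. The sketch below is not the tool from the MOS note; the function name hugepages_for_segments is hypothetical, and it assumes the Linux ipcs -m layout in which column 2 is the shmid and column 5 is the segment size in bytes. It counts every listed segment, Oracle or not, and rounds each one up to whole hugepages.

```shell
# Hypothetical sizing helper, not the script from the MOS note: reads
# `ipcs -m` output on standard input and prints the number of hugepages
# needed to hold every listed shared memory segment.
hugepages_for_segments() {
    local hugepage_kb=$1
    awk -v hp_bytes="$(( hugepage_kb * 1024 ))" '
        # Data rows have a numeric shmid in column 2 and the size in column 5.
        $2 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            pages += int(($5 + hp_bytes - 1) / hp_bytes)
        }
        END { printf "%d\n", pages }'
}

# With all instances running:  ipcs -m | hugepages_for_segments 2048
```

The result is only meaningful while all the instances that should use hugepages are up, and some margin should be added on top of it.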

To make the point of how urgently Oracle DBAs need to qualify their situation against list items 1.1 through 1.3 above, please consider the following quote from an internal email I received. The email is real and the screen output came from a real customer system. Yes, 120+ gigabytes of memory wasted in page tables. Fact is often stranger than fiction!

And here is an example of kernel pagetables usage, with a 24GB SGA and 6000+ connections, with no hugepages in use.

# grep PageT /proc/meminfo

PageTables: 123,731,372 kB
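A back-of-the-envelope estimate shows why the number gets that large. The sketch below assumes 8 bytes per page table entry (x86-64), assumes every process maps the entire SGA, and ignores the upper page-table levels, so it is an upper bound rather than a prediction. The function name pagetable_kb is hypothetical.

```shell
# Hypothetical upper-bound estimate of last-level page table memory, in kB,
# when nproc processes each map the whole SGA. Assumes 8 bytes per page
# table entry (x86-64) and ignores the upper page-table levels.
pagetable_kb() {
    local sga_bytes=$1 page_bytes=$2 nproc=$3
    echo $(( sga_bytes / page_bytes * 8 * nproc / 1024 ))
}

sga_bytes=$(( 24 * 1024 * 1024 * 1024 ))
echo "4k pages: $(pagetable_kb "$sga_bytes" 4096 6000) kB"
echo "2M pages: $(pagetable_kb "$sga_bytes" 2097152 6000) kB"
```

For a 24GB SGA and 6000 connections the 4k bound is 294912000 kB, so the reported 123,731,372 kB is what you would expect if each connection has touched a good part of the SGA. The same bound with 2M pages is 576000 kB.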

In part I of my recent blog series on Linux hugepages and modern Oracle releases I closed the post by saying that future installments would materialize if I found any pitfalls. I don’t like to blog about bugs, but in cases where there is little material on the matter provided elsewhere I think it adds value. First, however, I’d like to offer links to parts I and II in the series:
• Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – I.
• Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – II.

The pitfall I’d like to bring to readers’ attention is a situation that can arise in the case where the Oracle Database 11g Release 2 11.2.0.2 parameter USE_LARGE_PAGES is set to “only” thus forcing the instance to either successfully allocate all shared memory from the hugepages pool or fail to boot. As I pointed out in parts I and II this is a great feature. However, after an instance is booted it stands to reason that other processes (e.g., Oracle instances) may in fact use hugepages thus drawing down the amount of free hugepages. In fact, it stands to reason that other uses of hugepages could totally deplete the hugepages pool.

So what happens to a running instance that successfully allocated its shared memory from the hugepages pool and hugepages are later externally drawn down? The answer is nothing. An instance can plod along just fine after instance startup even if hugepages continue to get drawn down to the point of total depletion. But is that the end of the story?

What Goes Up, Must (be able to) Come Down
OK, so for anyone that finds themselves in a situation where an instance is up and happy but HugePages_Free is zero the following is what to expect:


$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.2.0 Production on Wed Sep 29 17:32:32 2010

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Connected to an idle instance.

SQL>
SQL> HOST grep -i huge /proc/meminfo
HugePages_Total: 4663
HugePages_Free: 0
HugePages_Rsvd: 10
Hugepagesize: 2048 kB

SQL> shutdown immediate
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 6422533
SQL>

Pay particular attention to the fact that sqlplus is telling us that it is attached to an idle instance! I assure you, this is erroneous. The instance is indeed up.

Yes, this is bug 10159556 (I filed it, for what it is worth). The solution is to have ample hugepages as opposed to precisely enough. Note, in another shell a privileged user can dynamically allocate more hugepages (even a single hugepage) and the instance can then be shut down cleanly. As an aside, an instance in this situation can be shut down with abort. I don’t aim to insinuate that this is some sort of zombie instance that will not go away.
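A simple headroom check can catch this situation before a shutdown is attempted. This is a sketch only: the function name hugepages_headroom_ok and the threshold of one free page are assumptions, and the sample counters are the ones from the session above.

```shell
# Hypothetical monitoring check: succeed only if at least min_free hugepages
# are free, reading /proc/meminfo-style text from standard input.
hugepages_headroom_ok() {
    local min_free=$1
    awk -v min="$min_free" '
        /^HugePages_Free:/ { free = $2; seen = 1 }
        END { exit !(seen && free + 0 >= min + 0) }'
}

# Example with the counters from the session above (HugePages_Free is 0):
if printf 'HugePages_Total: 4663\nHugePages_Free: 0\n' | hugepages_headroom_ok 1; then
    echo "hugepage headroom OK"
else
    echo "no free hugepages: grow the pool before shutting down"
fi
```

On a live system the input would be /proc/meminfo. When the check fails, root can grow the pool without a reboot by raising vm.nr_hugepages (for example with sysctl -w), provided the kernel can still find enough contiguous free memory.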