Configuring Hugepages For Oracle on Linux
NOTE: I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.
RAM is managed in 4k pages on 64-bit Linux. When memory sizes were limited and systems with more than 16G of RAM were rare, this was not much of an issue. As servers gained more memory, however, the page tables needed to manage it in 4k pages grew, and that management became increasingly CPU intensive. Hugepages make managing the large amounts of memory in modern servers much cheaper. In particular, with the number of memory pages reduced by typically three orders of magnitude, the chance that a particular page mapping will be available in the processor cache goes up dramatically.
First, some caveats on using hugepages. Hugepages are not swappable, so the Oracle SGA must be either all hugepages or no hugepages. If you allocate hugepages for Oracle but do not allocate enough for the entire SGA, Oracle will not use any hugepage memory; and if there is then not enough non-hugepage memory, your database will not start. Finally, enabling hugepages will require a server restart, so if you do not have the ability to restart your server, do not attempt to enable hugepages.
Oracle Metalink note 1134002.1 says explicitly that AMM (MEMORY_TARGET/MEMORY_MAX_TARGET) is incompatible with hugepages. However, I have found at least one blog that says that AMM is compatible with hugepages when using the USE_LARGE_PAGES parameter in 11g (where AMM is available). Until further confirmation is found, I do not recommend trying to combine hugepages with MEMORY_TARGET/MEMORY_MAX_TARGET.
Both Oracle database settings and Linux OS settings must be adjusted in order to enable hugepages. The Linux and Oracle settings of concern are below:
Linux OS settings:
kernel.shmmax
kernel.shmall
vm.nr_hugepages
oracle soft memlock
oracle hard memlock
Oracle Database spfile/init.ora:
SGA_TARGET = size of the SGA currently in use
SGA_MAX_SIZE = size the SGA *could* be increased to without restarting the instance
MEMORY_TARGET = should not be used with hugepages
MEMORY_MAX_TARGET = should not be used with hugepages
First, calculate the Linux OS settings. kernel.shmmax should be set to the size of the largest SGA_TARGET on the server plus 1G, to account for other processes. For a single instance with a 180G SGA, that would be 181G.
kernel.shmall should be set to the sum of the SGA_TARGET values divided by the page size; note that its units are pages, not bytes. Use the 'getconf PAGE_SIZE' command to get the page size in bytes. The standard page size on Linux x86_64 is 4096, or 4k.
The oracle soft memlock and oracle hard memlock limits should be set to slightly less than the total memory on the server; I chose 230G. Units are kbytes, so the value is 230000000. This is the total amount of memory the oracle user is allowed to lock.
Now for the hugepage setting itself: vm.nr_hugepages is the total number of hugepages to be allocated on the system. Determine the number of hugepages required by taking the maximum amount of SGA memory the system is expected to use (normally the SGA_MAX_SIZE value, or the sum of the SGA_MAX_SIZE values on a server with multiple instances) and dividing it by the hugepage size, which is 2048k (2M) on Linux. To account for Oracle process overhead, add five more hugepages. So, to allow 180G of hugepages, we would use this equation: (180*1024*1024/2048)+5. This gives us 92165 hugepages for 180G. Note that I took a shortcut in this calculation by expressing memory in megabytes rather than bytes; calculated the way I initially described, the equation would be (180*1024*1024*1024)/(2048*1024), plus the same five pages of overhead.
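The arithmetic above can be sketched as a small shell calculation. This assumes the example's single instance with a 180G SGA and the standard x86_64 sizes (2M hugepages, 4k pages); substitute your own SGA total.

```shell
# Hugepage sizing sketch for the 180G example above.
SGA_GB=180
HUGEPAGE_KB=2048   # hugepage size on Linux x86_64
PAGE_SIZE=4096     # result of 'getconf PAGE_SIZE'

# vm.nr_hugepages: SGA in kbytes divided by hugepage size, plus 5 for overhead
NR_HUGEPAGES=$(( SGA_GB * 1024 * 1024 / HUGEPAGE_KB + 5 ))

# kernel.shmmax: SGA in bytes plus 1G for other processes
SHMMAX=$(( (SGA_GB + 1) * 1024 * 1024 * 1024 ))

# kernel.shmall: shmmax expressed in 4k pages
SHMALL=$(( SHMMAX / PAGE_SIZE ))

echo "vm.nr_hugepages = $NR_HUGEPAGES"
echo "kernel.shmmax = $SHMMAX"
echo "kernel.shmall = $SHMALL"
```

Running this reproduces the values used later in the article: 92165 hugepages, shmmax of 194347270144, and shmall of 47448064.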
In order to allow the Oracle database to use up to 180G for the SGA_TARGET/SGA_MAX_SIZE, below are the settings we would use for the OS:
oracle soft memlock 230000000
oracle hard memlock 230000000
vm.nr_hugepages = 92165
kernel.shmmax = 194347270144 (180G = 193273528320, plus 1G)
kernel.shmall = 47448064
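A sketch of where those settings typically live and how to activate them; the file locations are the usual Linux defaults, so verify them on your distribution:

```shell
# /etc/security/limits.conf  (units: kbytes)
#   oracle soft memlock 230000000
#   oracle hard memlock 230000000
#
# /etc/sysctl.conf
#   vm.nr_hugepages = 92165
#   kernel.shmmax = 194347270144
#   kernel.shmall = 47448064

# As root, load the sysctl settings (a reboot remains the reliable way
# to obtain a large contiguous hugepage pool):
sysctl -p

# The memlock limits take effect at the oracle user's next login;
# verify with:
su - oracle -c 'ulimit -l'
```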
In the Oracle database there is a new parameter in 11gR2: USE_LARGE_PAGES, with possible values of 'true', 'only', and 'false'. 'True' is the default and preserves the existing behavior; 'false' means never use hugepages, only small pages; 'only' forces the database to use hugepages, and if insufficient pages are available the instance will not start. Regardless of this setting, the instance must use either all hugepages or all small pages. According to some blogs, this parameter is what allows MEMORY_TARGET and MEMORY_MAX_TARGET to be used with hugepages. As I noted above, I have not yet verified this with a Metalink note.
Next, set SGA_TARGET and SGA_MAX_SIZE to the desired size; I generally recommend setting both to the same value. Oracle recommends explicitly setting MEMORY_TARGET and MEMORY_MAX_TARGET to 0 when enabling hugepages. These are the values we change in the spfile.
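A minimal sketch of those spfile changes, using the article's example 180G SGA (the sizes are illustrative, not a recommendation for your system; the instance must be restarted for them to take effect):

```shell
# Hedged example: apply the spfile changes described above as SYSDBA.
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET sga_target=180G SCOPE=SPFILE;
ALTER SYSTEM SET sga_max_size=180G SCOPE=SPFILE;
ALTER SYSTEM SET memory_target=0 SCOPE=SPFILE;
ALTER SYSTEM SET memory_max_target=0 SCOPE=SPFILE;
-- 11gR2 and later only:
ALTER SYSTEM SET use_large_pages='only' SCOPE=SPFILE;
EOF
```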
In order to verify that hugepages are being used, run this command:
cat /proc/meminfo | grep Huge
It will show HugePages_Total, HugePages_Free, and HugePages_Rsvd. HugePages_Rsvd is the number of hugepages that have been reserved but not yet faulted in; the number of hugepages actually allocated is HugePages_Total minus HugePages_Free.
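That arithmetic can be scripted. The field names below are the real /proc/meminfo ones, but the sample values are fabricated for illustration; on a live system you would pipe /proc/meminfo itself into the awk command.

```shell
# Compute hugepages in use (Total - Free) from /proc/meminfo-style input.
# Sample values are made up for this sketch.
meminfo='HugePages_Total:   92165
HugePages_Free:    10000
HugePages_Rsvd:      512
Hugepagesize:       2048 kB'

in_use=$(printf '%s\n' "$meminfo" | awk '
  /HugePages_Total/ { total = $2 }
  /HugePages_Free/  { free  = $2 }
  END { print total - free }')

echo "hugepages in use: $in_use"
```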
Note that this example uses Linux hugepage size of 2M (2048k).
On Itanium systems the hugepage size is 256M.
These instructions show you how to successfully implement huge pages in Linux. Note that everything would be the same for Oracle 10gR2, with the exception that the USE_LARGE_PAGES parameter is unavailable.
1. Reasons for Using Hugepages
   1. Use hugepages if OLTP or ERP. Full stop.
   2. Use hugepages if DW/BI with large numbers of dedicated connections or a large SGA. Full stop.
   3. Use hugepages if you don't like the amount of memory page tables are costing you (/proc/meminfo). Full stop.
2. SGA Memory Management Models
   1. AMM does not support hugepages. Full stop.
   2. ASMM supports hugepages.
3. Instance Type
   1. ASM uses AMM by default. ASM instances do not need hugepages. Full stop.
   2. All non-ASM instances should be considered candidates for hugepages. See 1.1-1.3 above.
4. Configuration
   1. Limits (multiple layers)
      1. /etc/security/limits.conf establishes limits for hugepages for processes. Note that setting these values does not pre-allocate any resources.
      2. ulimit also establishes hugepage limits for processes.
5. Allocation
   1. /etc/sysctl.conf vm.nr_hugepages allocates memory to the hugepages pool.
6. Sizing
   1. Read MOS 401749.1 for information on tools available to aid in the configuration of vm.nr_hugepages.
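The MOS 401749.1 script works by summing the sizes of the running shared memory segments and recommending a vm.nr_hugepages value. A simplified sketch of the same idea follows; the segment sizes are fabricated sample data (on a live system they would come from ipcs -m), and this is not the MOS script itself.

```shell
# Recommend vm.nr_hugepages from a list of shared memory segment sizes
# in bytes, in the spirit of MOS 401749.1. Sample data: a 24G and a 4G SGA.
HPG_SZ_KB=2048   # hugepage size on Linux x86_64
SEGMENT_BYTES="25769803776 4294967296"

pages=0
for seg in $SEGMENT_BYTES; do
    # Round each segment up to a whole number of hugepages
    pages=$(( pages + (seg + HPG_SZ_KB * 1024 - 1) / (HPG_SZ_KB * 1024) ))
done

echo "Recommended: vm.nr_hugepages = $pages"
```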
To make the point of how urgently Oracle DBAs need to qualify their situation against list items 1.1 through 1.3 above, consider the following quote from an internal email I received. The email is real, and the screen output came from a real customer system. Yes, more than 120 gigabytes of memory wasted on page tables. Fact is often stranger than fiction!
And here is an example of kernel page table usage with a 24GB SGA and 6000+ connections, with no hugepages in use:
# grep PageT /proc/meminfo
PageTables: 123,731,372 kB
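A rough back-of-the-envelope check (my own arithmetic, not from the email) makes that number plausible: each process that maps the full 24GB SGA in 4k pages needs an 8-byte page table entry per page, about 48 MB of page tables per process, so 6000 connections could consume roughly 280 GB in the worst case. The observed 123 GB is well within that bound, since not every process maps the whole SGA.

```shell
# Worst-case page table cost of a 24G SGA mapped in 4k pages by
# 6000 processes (x86_64 PTEs are 8 bytes each).
SGA_BYTES=$(( 24 * 1024 * 1024 * 1024 ))
PAGE_SIZE=4096
PTE_BYTES=8
CONNECTIONS=6000

per_process_mb=$(( SGA_BYTES / PAGE_SIZE * PTE_BYTES / 1024 / 1024 ))
total_gb=$(( per_process_mb * CONNECTIONS / 1024 ))

echo "page tables per process: ${per_process_mb} MB"
echo "worst case for ${CONNECTIONS} connections: ${total_gb} GB"
```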
In part I of my recent blog series on Linux hugepages and modern Oracle releases I closed the post by saying that future installments would materialize if I found any pitfalls. I don’t like to blog about bugs, but in cases where there is little material on the matter provided elsewhere I think it adds value. First, however, I’d like to offer links to parts I and II in the series:
• Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – I.
• Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – II.
The pitfall I'd like to bring to readers' attention is a situation that can arise when the Oracle Database 11g Release 2 (11.2.0.2) parameter USE_LARGE_PAGES is set to "only", forcing the instance either to allocate all of its shared memory from the hugepages pool or to fail to boot. As I pointed out in parts I and II, this is a great feature. However, after an instance is booted, it stands to reason that other processes (e.g., other Oracle instances) may in fact use hugepages, drawing down the number of free hugepages. In fact, other uses of hugepages could totally deplete the hugepages pool.
So what happens to a running instance that successfully allocated its shared memory from the hugepages pool and hugepages are later externally drawn down? The answer is nothing. An instance can plod along just fine after instance startup even if hugepages continue to get drawn down to the point of total depletion. But is that the end of the story?
What Goes Up, Must (be able to) Come Down
OK, so for anyone that finds themselves in a situation where an instance is up and happy but HugePages_Free is zero the following is what to expect:
$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.2.0 Production on Wed Sep 29 17:32:32 2010

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> HOST grep -i huge /proc/meminfo
HugePages_Total:  4663
HugePages_Free:      0
HugePages_Rsvd:     10
Hugepagesize:     2048 kB

SQL> shutdown immediate
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 6422533
SQL>
Pay particular attention to the fact that sqlplus is telling us that it is attached to an idle instance! I assure you, this is erroneous. The instance is indeed up.
Yes, this is bug 10159556 (I filed it, for what it is worth). The solution is to have ample hugepages, as opposed to precisely enough. Note that in another shell a privileged user can dynamically allocate more hugepages (even a single hugepage), and the instance can then be shut down cleanly. As an aside, an instance in this situation can be shut down with shutdown abort; I don't mean to insinuate that this is some sort of zombie instance that will not go away.
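A sketch of that workaround, assuming the default 2M hugepage size; run it as root in another shell while the stuck instance is waiting to shut down:

```shell
# Grow the hugepage pool by a single page so the depleted instance
# can complete a clean shutdown.
sysctl -w vm.nr_hugepages=$(( $(cat /proc/sys/vm/nr_hugepages) + 1 ))
```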