ORA-27102 - but not the usual cause - see if you can guess what the issue was before you get to the end...



We recently attempted to patch an 11.2.0.3 database with a 10G SGA to 11.2.0.4. The 11.2.0.4 software was copied from another server using the clone.pl script I've mentioned in a previous blog post. This all installed and linked fine.

However, when we attempted to start the instance at 11.2.0.4, we just got the error

ORA-27102: out of memory

Strange, as this had just started fine with 11.2.0.3. Also strange is the absence of any other message after ORA-27102 (it's normally followed by a Unix error message of some description).

There appeared to be loads of free memory. We knew the kernel settings were fine as the database had started OK a few minutes before on 11.2.0.3.

Anyway, we reduced the SGA just to get the patch done and decided to come back to the memory issue later.

After the patch (which ran with no issues), we still couldn't get the database to start with a 10G SGA, so in the short term we released it back to the developers while we investigated further.

We decided to create a dummy instance on 11.2.0.4 to see what would happen.

I created an init file with the following content:

*.db_name='RICH'
*.sga_target=4G
*.diagnostic_dest=/tmp


About as minimal as it gets, just to see what would happen (diagnostic_dest is only set to make the trace files easy to find).

export ORACLE_SID=RICH
sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri Jan 10 15:51:43 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORA-27102: out of memory
SQL> exit
Disconnected


So this new instance won't start either... hmmm.

Let's go back to basics and check the kernel settings:


# cat /proc/sys/kernel/shmmax
4294967295
#  cat /proc/sys/kernel/shmmni
4096
# cat /proc/sys/kernel/shmall
24903680
# getconf PAGE_SIZE
4096


So this tells us that the max size of any one shared memory segment is ~4GB (shmmax).
The max number of shared memory segments is 4096 (shmmni).
The total size of all shared memory on the system is ~100GB (shmall * PAGE_SIZE, as shmall is counted in pages, not bytes).

This is actually nicely summarised by

# ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 4194303
max total shared memory (kbytes) = 99614720
min seg size (bytes) = 1
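As a sanity check on the arithmetic, here's a small Python sketch (it hard-codes the values read above rather than reading /proc live) that converts the raw kernel values into the units `ipcs -lm` reports. The key point is that shmall is a page count, so it has to be multiplied by PAGE_SIZE.

```python
# Values copied from the /proc/sys/kernel reads and getconf above; on a live
# Linux box you could read /proc/sys/kernel/shmmax etc. instead.


def shm_limits(shmmax, shmmni, shmall, page_size):
    """Return the limits in the same units that `ipcs -lm` prints."""
    return {
        "max number of segments": shmmni,
        # shmmax is in bytes; ipcs shows it as whole kbytes
        "max seg size (kbytes)": shmmax // 1024,
        # shmall is counted in pages, not bytes
        "max total shared memory (kbytes)": shmall * page_size // 1024,
    }


limits = shm_limits(shmmax=4294967295, shmmni=4096, shmall=24903680, page_size=4096)
for name, value in limits.items():
    print(f"{name} = {value}")
```

The three printed figures line up with the `ipcs -lm` output above, which confirms the kernel values are being read the way we think.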


There are other databases on the server, so let's check whether they are grabbing too much of anything:

# ipcs -a

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 1713668097 oracle     640        4096       0
0x00000000 1713700866 oracle     640        4096       0
0x79566774 1713733635 oracle     640        4096       0
0x00000000 1695481860 oracle     640        4096       0
0x00000000 1695514629 oracle     640        4096       0
0x32c7bb04 1695547398 oracle     640        4096       0
0x00000000 1717075975 oracle     640        4096       0
0x00000000 1054670856 oracle     640        67108864   133
0x00000000 1054703625 oracle     640        4261412864 133
0x00000000 1054736394 oracle     640        4261412864 133
0x00000000 1054769163 oracle     640        2147483648 133
0x79505a24 1054801932 oracle     640        2097152    133
0x00000000 1717108749 oracle     640        4096       0


So only 13 segments, but what is odd here is that a lot of them are only 4096 bytes and have an nattch value (the number of processes attached to the segment, basically a count from v$process in the database) of 0.
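To make the pattern easier to see, here's a small Python sketch (my own helper, not anything Oracle ships) that parses the `ipcs -m` rows above and splits them by nattch. The table is pasted in as a string; capturing it live via subprocess is only shown in a comment.

```python
# Copy of the `ipcs -m` rows shown above; on a live box you could capture them
# with subprocess.run(["ipcs", "-m"], capture_output=True, text=True).stdout
IPCS_OUTPUT = """\
key        shmid      owner      perms      bytes      nattch     status
0x00000000 1713668097 oracle     640        4096       0
0x00000000 1713700866 oracle     640        4096       0
0x79566774 1713733635 oracle     640        4096       0
0x00000000 1695481860 oracle     640        4096       0
0x00000000 1695514629 oracle     640        4096       0
0x32c7bb04 1695547398 oracle     640        4096       0
0x00000000 1717075975 oracle     640        4096       0
0x00000000 1054670856 oracle     640        67108864   133
0x00000000 1054703625 oracle     640        4261412864 133
0x00000000 1054736394 oracle     640        4261412864 133
0x00000000 1054769163 oracle     640        2147483648 133
0x79505a24 1054801932 oracle     640        2097152    133
0x00000000 1717108749 oracle     640        4096       0
"""


def parse_ipcs(text):
    """Return one dict per segment row of `ipcs -m` output."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hex key; skip the header and blank lines
        if not fields or not fields[0].startswith("0x"):
            continue
        segments.append(
            {"shmid": int(fields[1]), "bytes": int(fields[4]), "nattch": int(fields[5])}
        )
    return segments


segs = parse_ipcs(IPCS_OUTPUT)
attached = [s for s in segs if s["nattch"] > 0]
unattached = [s for s in segs if s["nattch"] == 0]

print(len(segs), "segments")
print(len(unattached), "with nattch=0, sizes:", sorted({s["bytes"] for s in unattached}))
print(sum(s["bytes"] for s in attached) / 2**30, "GiB in attached segments")
```

On this output it reports 13 segments, 8 of them 4096 bytes with nattch 0, and just over 10GiB in the attached segments, each of which stays under shmmax.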

This led me on a wild goose chase for a while about changes to do with memory_target, huge pages and /dev/shm. Suffice to say that if memory_target is active then you can't use ipcs to find shared memory info: you can find id values, but these relate to multiple 'files' under /dev/shm of either 4MB or 16MB, depending on how big the memory_target is.

As a simple test I set memory_target for 'RICH' to 2GB and sga_target to 1GB to see what would happen. The database now started fine. There's still nothing but an id in ipcs -m, but if I look under /dev/shm I now see a load of files:

ls -l /dev/shm/*RICH*
-rw-r----- 1 oracle oinstall 16777216 2014-01-10 16:03 /dev/shm/ora_RICH_1717075975_0
-rw-r----- 1 oracle oinstall 16777216 2014-01-10 16:03 /dev/shm/ora_RICH_1717108749_0
-rw-r----- 1 oracle oinstall 16777216 2014-01-10 16:03 /dev/shm/ora_RICH_1717108749_1
-rw-r----- 1 oracle oinstall 16777216 2014-01-10 16:03 /dev/shm/ora_RICH_1717108749_10

etc

There are 123 of these files, each of 16MB.

123*16MB is ~2GB, so the maths works out. The number after the SID in the file name seems to correspond to the shmid in the ipcs output.
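The name-to-shmid mapping is easy to check with a short script. This is a Python sketch that assumes the ora_<SID>_<shmid>_<n> naming seen in the listing above (the pattern is inferred from these file names, not taken from any Oracle documentation); it totals the file sizes per shmid.

```python
import os
import re
from collections import defaultdict

# Naming pattern inferred from the listing above: ora_<SID>_<shmid>_<n>
SHM_FILE = re.compile(r"^ora_(?P<sid>.+)_(?P<shmid>\d+)_(?P<n>\d+)$")


def group_by_shmid(entries, sid):
    """entries: iterable of (file name, size in bytes). Returns {shmid: total bytes} for one SID."""
    totals = defaultdict(int)
    for name, size in entries:
        m = SHM_FILE.match(name)
        if m and m.group("sid") == sid:
            totals[int(m.group("shmid"))] += size
    return dict(totals)


def list_dev_shm(path="/dev/shm"):
    """Yield (name, size) for regular files under /dev/shm on a live Linux box."""
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file():
                yield entry.name, entry.stat().st_size


# The four files shown above (16MB = 16777216 bytes each)
sample = [
    ("ora_RICH_1717075975_0", 16777216),
    ("ora_RICH_1717108749_0", 16777216),
    ("ora_RICH_1717108749_1", 16777216),
    ("ora_RICH_1717108749_10", 16777216),
]
print(group_by_shmid(sample, "RICH"))

# 123 files of 16MB each
print(123 * 16777216 / 2**30, "GiB")
```

On a live system, `group_by_shmid(list_dev_shm(), "RICH")` would give the same per-shmid totals for the whole directory.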

I digress.....

Anyway, regardless of this setting, the problem still occurred in my simple test where memory_target was not set.

So what on earth is going on here...

Let's check free memory on the system:

# cat /proc/meminfo
MemTotal:       99193552 kB
MemFree:        45503724 kB
Buffers:         5527680 kB
Cached:         33473352 kB
SwapCached:        44216 kB
Active:         41455976 kB
Inactive:        6489096 kB
Active(anon):   11506404 kB
Inactive(anon):  1256284 kB
Active(file):   29949572 kB
Inactive(file):  5232812 kB
Unevictable:        5496 kB
Mlocked:            5496 kB
SwapTotal:      99614704 kB
SwapFree:       99264084 kB
Dirty:               980 kB
Writeback:             0 kB
AnonPages:       8912060 kB
Mapped:          2146520 kB
Shmem:           3827500 kB
Slab:            4015360 kB
SReclaimable:    3756244 kB
SUnreclaim:       259116 kB
KernelStack:       22048 kB
PageTables:       395628 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    149211480 kB
Committed_AS:   27106932 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      336440 kB
VmallocChunk:   34308450724 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        6384 kB
DirectMap2M:     2080768 kB
DirectMap1G:    98566144 kB



# free
             total       used       free     shared    buffers     cached
Mem:      99193552   53746748   45446804          0    5527684   37230268
-/+ buffers/cache:   10988796   88204756
Swap:     99614704     350620   99264084


These both seem to concur that there is ~45GB of memory free.
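If you'd rather check this from a script than eyeball it, here's a minimal Python sketch that parses /proc/meminfo, using a few of the lines above as sample input. Adding Buffers and Cached to MemFree is only a rough approximation of what could be freed up (similar in spirit to the -/+ buffers/cache line of free), so treat it as a sanity check rather than an exact figure.

```python
# A few lines from the /proc/meminfo output above; on a live system use
# open("/proc/meminfo").read() instead.
SAMPLE = """\
MemTotal:       99193552 kB
MemFree:        45503724 kB
Buffers:         5527680 kB
Cached:         33473352 kB
Shmem:           3827500 kB
HugePages_Total:       0
Hugepagesize:       2048 kB
"""


def parse_meminfo(text):
    """Map each field to its integer value (kB, or a page count for HugePages_*)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        info[key.strip()] = int(rest.split()[0])
    return info


info = parse_meminfo(SAMPLE)
kb_per_gib = 1024 * 1024
free_gib = info["MemFree"] / kb_per_gib
# Rough "free once buffers/cache are dropped" figure
reclaimable_gib = (info["MemFree"] + info["Buffers"] + info["Cached"]) / kb_per_gib
print(f"MemFree: {free_gib:.1f} GiB, free + buffers + cache: {reclaimable_gib:.1f} GiB")
print("HugePages configured:", info["HugePages_Total"])
```

Note that the ~45GB figure is in (roughly) decimal units; in GiB it comes out a little lower, but either way there's plenty free.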

Maybe the alert log has something:


Fri Jan 10 15:51:50 2014
Starting ORACLE instance (normal)


Even less than I expected...

So to summarise:

1. We can't start the database due to 'out of memory'
2. Nothing in the alert log
3. Kernel settings are big enough
4. Plenty of free memory

hmmm....

Let's strace sqlplus to see if we get any more info about where the out of memory error is being thrown.


# strace -f -o output.txt sqlplus / as sysdba
[ Process PID=3928 runs in 32 bit mode. ]

SQL*Plus: Release 11.2.0.4.0 Production on Fri Jan 10 16:25:55 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount
[ Process PID=4176 runs in 64 bit mode. ]
[ Process PID=3929 runs in 32 bit mode. ]
[ Process PID=4176 runs in 64 bit mode. ]
[ Process PID=3929 runs in 32 bit mode. ]
[ Process PID=4176 runs in 64 bit mode. ]
[ Process PID=3929 runs in 32 bit mode. ]
ORA-27102: out of memory
SQL>


Whoa, hold on: what was that message before we even ran the startup command? That shouldn't be there. In fact there is no 64 bit mentioned in the header at all, and now we get multiple 32/64 bit messages.
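Those bracketed lines are strace's own notices rather than Oracle output. As far as I understand it, strace prints one when the process it's reporting on runs in a different personality (32- or 64-bit) from the one it last reported, so it's worth tallying them per PID. A quick Python sketch doing that on the lines above (modes_by_pid is just a hypothetical helper name):

```python
import re
from collections import defaultdict

# strace's personality notice, as it appears in the session above
NOTICE = re.compile(r"\[ Process PID=(\d+) runs in (\d+) bit mode\. \]")

TRANSCRIPT = """\
[ Process PID=3928 runs in 32 bit mode. ]
SQL> startup nomount
[ Process PID=4176 runs in 64 bit mode. ]
[ Process PID=3929 runs in 32 bit mode. ]
[ Process PID=4176 runs in 64 bit mode. ]
[ Process PID=3929 runs in 32 bit mode. ]
[ Process PID=4176 runs in 64 bit mode. ]
[ Process PID=3929 runs in 32 bit mode. ]
ORA-27102: out of memory
"""


def modes_by_pid(text):
    """Return {pid: set of word sizes strace reported for it}."""
    modes = defaultdict(set)
    for pid, bits in NOTICE.findall(text):
        modes[int(pid)].add(int(bits))
    return dict(modes)


print(modes_by_pid(TRANSCRIPT))
```

Each PID only ever shows one mode, and the very first one (before startup was even typed) is 32-bit, which points at sqlplus itself being a 32-bit binary. The repeated lines look like strace flipping between a 32-bit and a 64-bit process rather than anything changing mode.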

I think we have our cause.......

# file $ORACLE_HOME/bin/oracle
/oracle/11.2.0.4.DB/bin/oracle: setuid setgid ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.4, dynamically linked (uses shared libs), not stripped
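`file` is the quickest check, but if you want to script it across several Oracle homes, the 32/64-bit answer lives in byte 4 (EI_CLASS) of the ELF header: 1 means 32-bit, 2 means 64-bit. A minimal Python sketch (elf_bits is a hypothetical helper, not part of any Oracle tooling):

```python
import sys

ELF_MAGIC = b"\x7fELF"


def elf_bits_from_header(header):
    """Return 32 or 64 from the first bytes of an ELF file (EI_CLASS is byte 4)."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]
    if ei_class == 1:  # ELFCLASS32
        return 32
    if ei_class == 2:  # ELFCLASS64
        return 64
    raise ValueError(f"unknown ELF class {ei_class}")


def elf_bits(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return elf_bits_from_header(f.read(5))


if __name__ == "__main__":
    # e.g. python3 elf_bits.py "$ORACLE_HOME/bin/oracle"
    for path in sys.argv[1:]:
        print(path, elf_bits(path), "bit")
```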


We've cloned a 32-bit home rather than a 64-bit one, and patched 11.2.0.3 64-bit to 11.2.0.4 32-bit.

A 32-bit process cannot address more than about 3 to 4GB, so a 10GB (or even a 4GB) SGA was never going to fit. That's the issue!
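The arithmetic is simple enough to sketch. This Python snippet only checks the necessary condition (the SGA must be smaller than the whole 2^bits address space); in practice a 32-bit process also needs room for code, libraries, stack and PGA, so the usable limit is lower still. The 4G test SGA is exactly 2^32 bytes, so it had no chance either.

```python
GiB = 2**30


def address_space(bits):
    """Total virtual address space of a process with the given pointer size."""
    return 2**bits


def sga_could_fit(sga_bytes, bits):
    # Necessary, not sufficient: code, shared libraries, stack and PGA also
    # have to fit in the same address space
    return sga_bytes < address_space(bits)


for label, sga in [
    ("10G production SGA", 10 * GiB),
    ("4G sga_target test", 4 * GiB),
    ("2G memory_target test", 2 * GiB),
]:
    print(f"{label}: 32-bit {sga_could_fit(sga, 32)}, 64-bit {sga_could_fit(sga, 64)}")
```

The 2GB memory_target test passing this check is consistent with it starting fine on the 32-bit binary.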

So now all we need to do is clone the 64-bit 11.2.0.4 home and do a quick 32-to-64-bit switch (which from memory is mainly utlirp.sql followed by utlrp.sql, to invalidate and then recompile the PL/SQL; I'll check Metalink to confirm).

This was quite a weird issue to resolve, but it proved useful anyway, as we realised we really need to look into memory_target, huge pages and ASMM/AMM to get the best out of the system. And we need to remember that ipcs isn't going to be showing us everything any more.

This problem could not happen in 12c....

Why, you ask?

12c is 64-bit only, so I guess in the future this won't cause any problems. Well, at least until 128-bit arrives anyway...


1 comment:

  1. Damn, I hit the same issue

    big thx for explanation
