ASM audit file overload



I posted a few weeks ago about adding a Cloud Control metric extension to monitor inodes. Inodes came up again today when a filesystem ran out of them (I know, I know, the metric extension should have been set up and checking before that happened).

Anyway, the immediate problem was quickly fixed, but the underlying cause still needs addressing, so that's what I'll talk about here.

So the problem we had was this:

df -i .
Filesystem                    Inodes   IUsed IFree IUse% Mounted on
/dev/mapper/rootvg-lv_oracle 2752512 2752512     0  100% /oracle
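
For a scripted version of that check, something along these lines might do as a starting point. It is only a sketch: the function names and the threshold argument are made up for illustration, and it assumes a df that accepts -i and -P together (GNU df does), with -P keeping long device names like the one above on a single line.

```shell
# Extract the IUse% figure (without the % sign) from "df -iP" output on stdin.
parse_inode_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report inode usage for the filesystem holding path $1 against threshold $2.
# Returns 0 when below the threshold, 1 when at or above it, 3 when unknown.
check_inodes() {
    pct=$(df -iP "$1" | parse_inode_use_pct)
    case "$pct" in
        '' | *[!0-9]*) echo "UNKNOWN: no inode figure for $1"; return 3 ;;
    esac
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: inode usage for $1 is ${pct}%"
        return 1
    fi
    echo "OK: inode usage for $1 is ${pct}%"
}
```

Calling check_inodes /oracle 90 would then warn once 90% of the inodes are in use; the 90 is an arbitrary example value.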


A quick check reveals that 99% of the files are in this directory:

/oracle/product/12.1.0/grid/rdbms/audit 
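
One way to do that kind of quick check is to count files per directory. The sketch below is not necessarily what I ran at the time; top_inode_dirs is a hypothetical helper name, and it assumes no file names contain newlines.

```shell
# Print the directories under $1 that hold the most files, busiest first.
# $2 is how many directories to show (default 5).
# -xdev keeps find on one filesystem, matching what df -i reports.
top_inode_dirs() {
    find "$1" -xdev -type f 2>/dev/null |
        sed 's|/[^/]*$||' |                                # strip the file name, keep the directory
        awk '{ c[$0]++ } END { for (d in c) print c[d], d }' |
        sort -rn |
        head -n "${2:-5}"
}
```

Running top_inode_dirs /oracle 3 prints a count and a path per line for the three busiest directories. Counting in awk rather than piping every path through sort keeps it workable with millions of files.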

This directory contains over 2 million tiny files, all with content similar to this:

Audit file /oracle/product/12.1.0/grid/rdbms/audit/+ASM_ora_4912_20141014124951016241143795.aud
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option
ORACLE_HOME = /oracle/product/12.1.0/grid
System name:    Linux
Node name:      server
Release:        3.0.93-0.5-default
Version:        #1 SMP Tue Aug 27 08:17:02 UTC 2013 (925d406)
Machine:        x86_64
Instance name: +ASM
Redo thread mounted by this instance: 0 <none>
Oracle process number: 7
Unix process pid: 4912, image: oracle@server (TNS V1-V3)

Tue Oct 14 12:49:51 2014 +01:00
LENGTH : '144'
ACTION :[7] 'CONNECT'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSASM'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[0] ''


This will be very familiar to most DBAs: it's the audit record written any time privileged access is used, in this case SYSASM.

Looking more closely, one of these files is generated every second! Initially I wondered whether Cloud Control was going crazy, but I discounted that: it's not going to be trying to connect every second. The only real candidate then was the Oracle Restart software that monitors the ASM instance; something configured there must be checking far too often.

A quick hunt around reveals that the problem is the check interval applied to the ASM resource. It's set to 1 second, which is a crazy amount of checking, as shown below:
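
To put a number on the rate, counting the files modified in the last few minutes is enough. This is a rough sketch with a made-up function name; -maxdepth and -mmin are find extensions that GNU find supports but POSIX does not require.

```shell
# Count the regular files directly inside directory $1 that were
# modified less than $2 minutes ago.
recent_file_count() {
    find "$1" -maxdepth 1 -type f -mmin "-$2" | wc -l | tr -d ' '
}
```

At one file per second, recent_file_count /oracle/product/12.1.0/grid/rdbms/audit 5 should come back with a figure somewhere around 300.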

 crsctl stat res ora.asm -p |grep ^CHECK_INTERVAL
CHECK_INTERVAL=1


So let's change that to something more sensible:

crsctl modify resource ora.asm -attr "CHECK_INTERVAL=60"


And check that it is set OK:

 crsctl stat res ora.asm -p |grep ^CHECK_INTERVAL
CHECK_INTERVAL=60

That change is picked up straight away, so it's going to log a lot less now, but we still need to clean up.

Annoyingly there seems to be nothing built in to deal with this (other than logging to syslog and dealing with it that way, which in our setup with outsourced providers is too painful to contemplate). So we are reduced to old-school cron jobs.
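
To get a feel for how much less, here is some back-of-the-envelope arithmetic as a sketch. It assumes exactly one audit file per check, which matches the one-per-second rate seen earlier but is an assumption all the same, and the function names are invented for illustration.

```shell
# Audit files written per day for a CHECK_INTERVAL of $1 seconds,
# assuming one file per check.
files_per_day() {
    echo $(( 86400 / $1 ))
}

# Whole days until $1 free inodes are used up with a CHECK_INTERVAL
# of $2 seconds (integer division, so the result is rounded down).
days_to_fill() {
    echo $(( $1 / (86400 / $2) ))
}

files_per_day 1              # 86400
files_per_day 60             # 1440
days_to_fill 2752512 1       # 31
days_to_fill 2752512 60      # 1911
```

So on these assumptions the 2,752,512 inodes on /oracle last about a month at a 1 second interval, and over five years at 60 seconds even with no cleanup at all.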

So I run this as a one-off (ending the command with + rather than \; so that find hands the files to rm in batches instead of starting one rm per file, which matters with 2 million of them):

 /usr/bin/find /oracle/product/12.1.0/grid/rdbms/audit -type f -mtime +14 -exec rm -f {} +

and schedule the same thing in crontab every day at 12:00:

# Below line is to housekeep ASM audit files generated by CRS checking processes
00 12 * * * /usr/bin/find /oracle/product/12.1.0/grid/rdbms/audit -type f -mtime +14 -exec rm -f {} +
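
If you'd rather have something a bit more defensive than a bare find in crontab, the same idea can be wrapped in a small function. This is a sketch, not what is deployed here: purge_audit_files and the DRY_RUN switch are invented for illustration, and unlike the command above it only touches *.aud files in the top level of the directory, which is my assumption about what is safe to delete.

```shell
# Remove *.aud files older than $2 days from directory $1.
# Set DRY_RUN=1 to list the candidates instead of deleting them.
purge_audit_files() {
    dir=$1
    days=$2
    # Fail early with a clear message if the directory is missing.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    if [ "${DRY_RUN:-0}" = 1 ]; then
        find "$dir" -maxdepth 1 -type f -name '*.aud' -mtime "+$days" -print
    else
        # "{} +" passes many files to each rm call rather than one at a time.
        find "$dir" -maxdepth 1 -type f -name '*.aud' -mtime "+$days" -exec rm -f {} +
    fi
}
```

A first run with DRY_RUN=1 purge_audit_files /oracle/product/12.1.0/grid/rdbms/audit 14 shows what would go before anything is actually removed.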



So problem fixed. Now I just need to roll it out on all the single-instance ASM Oracle Restart servers...





7 comments:

  1. It seems the syntax might have changed with 12.1.0.2.

    crsctl modify resource ora.asm -attr "CHECK_INTERVAL=15"
    CRS-4995: The command 'Modify resource' is invalid in crsctl. Use srvctl for this command.

    crsctl modify resource ora.asm -init -attr "CHECK_INTERVAL=15"

    1. # crsctl modify resource ora.asm -attr "CHECK_INTERVAL=60"
      CRS-4995: The command 'Modify resource' is invalid in crsctl. Use srvctl for this command.

      # crsctl modify resource ora.asm -attr "CHECK_INTERVAL=60" -unsupported
      # crsctl stat res ora.asm -p |grep ^CHECK_INTERVAL
      CHECK_INTERVAL=60


  2. I have changed the CHECK_INTERVAL to 60 but still seem to get an audit log a second. Do I need to restart?

    1. Been a while since I did it, but I don't think a restart was required.

      If it's still not working then I guess try a restart and see if that resolves it.

      Cheers,
      Rich

  4. /u01/app/12.1.0.2/grid/bin/crsctl modify resource ora.orcl.db -attr AUTO_START=always


    I get this error:
    CRS-4995: The command 'Modify resource' is invalid in crsctl. Use srvctl for this command

    Note: I'm running an 11gR2 RAC database on 12.1 Grid Infrastructure.
    I want the database to start automatically.
