We've recently been hit a couple of times with systems that haven't been able to write a file to a filesystem. Normally of course this is just due to a lack of space, but in certain cases a filesystem can run out of inodes (essentially a pointer to a file for the OS). This tends to happen when many small files (and I'm talking tens of thousands of files or more here) are all in the same filesystem - audit/trace files can often be the cause of this, particularly if you have jobs running as sysdba that log on to the database multiple times, creating a small audit file each time. These can of course be cleaned up automatically (for example), but there are still cases where this can become an issue.
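As an aside, a simple cron job can keep the audit directory under control - a minimal sketch, assuming your audit files land in the usual adump location (the path, pattern and retention here are just placeholders, adjust to suit your setup):

# purge .aud files older than 30 days, run daily at 01:00
0 1 * * * /usr/bin/find /oracle/admin/MYDB/adump -type f -name '*.aud' -mtime +30 -delete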
The number of inodes in a filesystem is defined when the filesystem is created - and can only be altered at that point. If you are storing large database files in a filesystem then the inode count can be reduced to give you more usable disk space. Anyway, I digress - this wasn't meant to be a general chat about inodes; it's to illustrate how to monitor them with cloud control.
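For example, on ext4 the inode count is controlled by the bytes-per-inode ratio at mkfs time - an illustrative sketch only (the device name is made up, and the right ratio depends entirely on your file sizes):

# one inode per 1MB of space instead of the ext4 default of one per 16KB
/sbin/mkfs.ext4 -i 1048576 /dev/mapper/datavg-oradata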
Now, out of the box there are some inode-related metrics, but nothing that actually seems to check their usage - filesystem space is of course checked, but nothing for inodes.
To check the current usage on a system you can simply run df -i rather than df -k:
oracle@server:/oracle/home/oracle> df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/rootvg-root
655360 6850 648510 2% /
udev 6699830 4745 6695085 1% /dev
/dev/cciss/c0d0p1 32128 42 32086 1% /boot
/dev/mapper/rootvg-home
131072 4320 126752 4% /home
/dev/mapper/rootvg-opt
327680 96826 230854 30% /opt
So you can see that of the 327,680 inodes in /opt, 96,826 (30%) are used - if we created another 230,854 files we'd use them all up.
So what we want to do is somehow check the inode usage using a metric extension - easy, right? Well yes it is, once you get the config of the metric extension right - which is easy when you know how, but it took me a little bit of messing around to get right. Hopefully my example will make things clear.
First up, before we create the metric we need to come up with a Linux command line that gives us just the very basic information we need - the filesystem name and its inode usage. The default df -i output is not friendly at all (especially if you have long LV names), so I had to get a little creative to get the output I wanted. The command I eventually came up with is this:
/bin/df -Pil |/bin/grep -iv Mounted |/bin/awk '{OFS = "|"}{gsub("%","")}{print $6,$5}'
Nice huh... (well, I thought it was quite neat). Here is a quick summary of how it ended up like this:
1) Full paths to commands must be used
2) df -Pil - P gives us POSIX output, which stops long device names wrapping onto multiple lines (as you can see in the df -i output above), i is for inodes of course, and l only includes 'local' filesystems
3) remove the header row with the grep -v
4) Now the awk script (don't you just love awk.....). OFS defines the output field separator (a pipe, |, in our case), gsub does a global replace of % with nothing, and the print just gives us column 6 (the filesystem name) and column 5 (the inode usage)
So the output of that looks like this
oracle@server:/oracle/home/oracle> /bin/df -Pil |/bin/grep -iv Mounted |/bin/awk '{OFS = "|"}{gsub("%","")}{print $6,$5}'
/|2
/dev|1
/boot|1
/home|4
/opt|30
So now we have a command that gives us the output we want - now we just have to wrap a metric extension round it.
Here is how to do that:
1) First up we go to Enterprise->Monitoring->Metric Extensions and fill in some basic details
2) Now this is the tricky screen that took me ages to get right..... so pay attention, 007. The key bit here is to make sure the command is just set to ksh (or whatever your favourite shell is). The script argument is specified using the %scriptsDir% syntax to refer to the file we create further down the same screen - cloud control will automatically deploy this file along with the metric. The content of this file is shown in screenshot 3 (and sketched out in plain text after this list)
3) So here we fill in the command we created earlier and give the file a name - which must obviously be the same name we refer to in the script argument
4) Now we just define two output columns to match what we expect from the command output
5) output column 1
6) output column 2
7) Now select credentials - we just use the defaults here, which are the ones the agent is using anyway for all the other checks
8) Now we test it by picking a host and clicking "run test" - no prizes for guessing that one
9) Now we get the summary screen
10) And there we go - created!
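For reference, since the screenshots can be fiddly to follow, here's a plain-text sketch of the values behind them (the script file name inode_check.sh is just my choice - call it whatever you like, as long as the script argument and file name match):

Command: /bin/ksh
Script argument: %scriptsDir%/inode_check.sh

Content of inode_check.sh:
/bin/df -Pil |/bin/grep -iv Mounted |/bin/awk '{OFS = "|"}{gsub("%","")}{print $6,$5}'

Output column 1: the filesystem name (a string, set as the key column)
Output column 2: the inode usage percentage (a number - this is the data column to hang your warning/critical thresholds off)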
There are a couple of steps needed to then make this 'usable', which are mentioned in an earlier example of a metric extension I blogged about here (see about half way down, after the extension is created). It can then be deployed to whatever hosts you see fit - I've again done this for my estate using a monitoring template associated with an admin group, so anything I add automatically gets the same monitoring as everything else in that group.
The example above can be used for any host command you care to come up with - it's surprisingly easy once you figure out the 'pay attention' screen.
I've even uploaded the export of the metric extension should you wish to download and use it (as always with anything you download from random people on the internet - check it out before you use it......). This can be imported from the main metric extensions screen. (It was written in 12.1.0.4 but I guess it should import into older versions......)
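As another aside, if you'd rather script the import than click through the GUI, EM CLI has verbs for metric extensions - a sketch from memory, so do verify the exact syntax with emcli help import_metric_extension on your version (the file and metric names here are just placeholders):

emcli import_metric_extension -file_name=inode_check.xml -rename_as='ME$INODE_CHECK'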
Cloud control again shows how incredibly powerful it is!