An unexpected sequel (the template of doom) ...



In an unexpected follow-up to the hit post AUTOMATICALLY ASSIGNING MONITORING TEMPLATES TO DATABASES IN CLOUD CONTROL, I've something of a warning about properly testing templates and checking what they are actually going to do...

We've recently had a couple of cases where tablespaces ran out of space (DBA 101, I know). I put this down to an agent glitch, but when it happened a third time I started to dig deeper and found a terrible mistake in the whole template thing I'd done...

Now it seems I'd made the assumption (after doing much reading on the subject too, which makes this all the worse) that when a template is applied via the dynamic admin groups, anything you don't specify is 'left alone'. Well, it turns out this isn't the case.

If you apply a template directly (ignoring any special clever groups) then you are asked the question below


So the default is to apply only the changes: anything I don't specify is just left alone. OK...

Now, as this seems to be the default when you apply templates in this way, I kind of assumed that it was the same for administration groups. Unless I missed a screen (or missed the option on the screen), there was no choice in the matter when this was set up (my memory could be playing tricks here, of course).

But it would seem that they don't behave as I had expected. What actually happens is that when the template is applied, all of the metrics continue to be collected (unless explicitly disabled in the template) - however, and this is a big HOWEVER, any thresholds associated with any of the other database metrics are just nulled out! So any alerting based on anything other than what you explicitly set in the template is removed!
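
To make the difference between the two behaviours concrete, here is a toy Python model of them. It is only a sketch of what I observed, not Cloud Control's code or API: the function names, the dictionary layout and the threshold values are all made up for illustration.

```python
from typing import Dict, Optional, Tuple

# (warning, critical); None means "no threshold set", so no alert can fire.
Thresholds = Tuple[Optional[float], Optional[float]]
Settings = Dict[str, Thresholds]


def apply_directly(target: Settings, template: Settings) -> Settings:
    """Hypothetical model of the direct-apply default:
    template entries win, everything else is left alone."""
    merged = dict(target)
    merged.update(template)
    return merged


def apply_via_admin_group(target: Settings, template: Settings) -> Settings:
    """Hypothetical model of the behaviour seen with administration groups:
    every metric stays in place (still collected), but any threshold
    that is not in the template is nulled out."""
    result: Settings = {metric: (None, None) for metric in target}
    result.update(template)
    return result


# Illustrative values only, not taken from a real system.
database: Settings = {
    "Tablespace Space Used (%)": (85.0, 97.0),
    "Archive Area Used (%)": (80.0, None),
}
template: Settings = {"Archive Area Used (%)": (70.0, 90.0)}

merged = apply_directly(database, template)
replaced = apply_via_admin_group(database, template)

print(merged["Tablespace Space Used (%)"])    # threshold survives
print(replaced["Tablespace Space Used (%)"])  # threshold is gone
```

In the second case the tablespace metric is still in the dictionary, which mirrors the nasty part: the data keeps arriving, there is just nothing left to compare it against.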

This can be seen when we go and look at the metrics screen for a database without the auto-assigned template and for one with it:




As you can see, all the thresholds are missing. However, if you just casually browse the All Metrics screen and pull back values, all the information is there - it is just completely ignored by the alerting, as no thresholds are set!
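
The same side-by-side comparison can be sketched in Python. This assumes you have somehow exported the metric settings of two databases into plain dictionaries (how you get them out of Cloud Control is not covered here); `lost_thresholds` and the sample values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (warning, critical); None means "no threshold set".
Thresholds = Tuple[Optional[float], Optional[float]]


def has_threshold(thresholds: Thresholds) -> bool:
    """True if at least one of warning/critical is set."""
    return any(value is not None for value in thresholds)


def lost_thresholds(reference: Dict[str, Thresholds],
                    affected: Dict[str, Thresholds]) -> List[str]:
    """Metrics that can alert on the reference database
    but have no threshold at all on the affected one."""
    return sorted(
        metric
        for metric, thresholds in reference.items()
        if has_threshold(thresholds)
        and not has_threshold(affected.get(metric, (None, None)))
    )


# Illustrative values only.
untouched_db = {
    "Tablespace Space Used (%)": (85.0, 97.0),
    "Archive Area Used (%)": (80.0, None),
    "Wait Time (%)": (None, None),
}
templated_db = {
    "Tablespace Space Used (%)": (None, None),
    "Archive Area Used (%)": (70.0, 90.0),
    "Wait Time (%)": (None, None),
}

silent = lost_thresholds(untouched_db, templated_db)
print(silent)
```

Anything in that list is a metric that is still being collected but can no longer raise an alert.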

So anyway, after this shock revelation, all I had to do was copy the metrics I was actually interested in alerting on from an unchanged database into the template, and then reapply the template to all the databases.
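
As a rough sketch of that fix (again a toy model in Python, with a made-up `complete_template` helper and made-up values, not anything Cloud Control provides), the idea is to fill the gaps in the template from an unchanged database without overriding anything the template already sets:

```python
from typing import Dict, Iterable, Optional, Tuple

# (warning, critical); None means "no threshold set".
Thresholds = Tuple[Optional[float], Optional[float]]
Settings = Dict[str, Thresholds]


def complete_template(template: Settings,
                      reference: Settings,
                      wanted: Iterable[str]) -> Settings:
    """Return a copy of the template with the wanted metrics added
    from the reference database. Existing template entries are kept."""
    completed = dict(template)
    for metric in wanted:
        if metric not in completed and metric in reference:
            completed[metric] = reference[metric]
    return completed


# Illustrative values only.
template = {"Archive Area Used (%)": (70.0, 90.0)}
unchanged_db = {
    "Tablespace Space Used (%)": (85.0, 97.0),
    "Archive Area Used (%)": (80.0, None),
}
metrics_i_care_about = ["Tablespace Space Used (%)", "Archive Area Used (%)"]

fixed_template = complete_template(template, unchanged_db, metrics_i_care_about)
print(fixed_template)
```

The explicit `wanted` list is deliberate: since everything missing from the template ends up without a threshold, it is worth writing down exactly which metrics you expect to alert on.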

And that fixed it - and resulted in about 60 emails about tablespaces running out of space...

Now I'm not sure if there should be an option to choose how the templates are applied or at least some warning about this - it just seems a little dangerous to me.....
