Stopping alerts from cloud control whilst doing maintenance



Previously, when doing any kind of operating system patching, we never bothered dealing with the alerts generated by cloud control - we'd just delete them after the event, as we knew what the cause was and it wasn't really affecting anyone else.

However, we've recently hooked our 'request tracker' (RT) tool into cloud control. When events are generated in cloud control they send an email, that email results in the creation of a ticket in RT, and we use RT as our incident/request tracking system.

If you've not seen RT you should definitely check it out - it allows you to create tickets from a 'mailbox'. So many companies work this way, with email requests being sent into a 'group' mailbox to get processed - this tool auto-generates tickets and will track all responses to the emails - it's really very impressive. The website doesn't look like much but it's a fantastic tool - anyway, enough about that (I'm not on commission, by the way...)

So now we had a reason to stop the alerts being generated - otherwise we'd get lots of tickets.

So how do we stop the shutdown of 10 databases, a listener and an agent from generating all these alerts?

Easy - use blackouts.

This feature has been around for a long time; I've just never really paid much attention to it before. It's very simple to use and does exactly what you want. It can be done from the command line on the agent (emctl), from the GUI or from emcli.

For my simple case of wanting to block everything from a certain host I just used emctl on the host itself; if you wanted to black out loads of hosts/databases across multiple systems (for whatever reason) then the GUI is probably a better choice.

Anyway - on to what actually had to be done.

To set the blackout:

# emctl start blackout SLESPATCH -nodeLevel
Oracle Enterprise Manager Cloud Control 12c Release 3
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Blackout SLESPATCH added successfully
EMD reload completed successfully


This creates a blackout called 'SLESPATCH' (the name is just a label). The -nodeLevel switch blacks out everything on that node.
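If you're scripting the patching, it's worth refusing to carry on unless the blackout actually got set. A minimal sketch, assuming emctl is on the PATH (or named in the EMCTL variable, which is a hypothetical convention of this sketch, not something emctl reads) and that success is reported with the "Blackout ... added successfully" line shown above:

```shell
#!/bin/sh
# Sketch: start a node-level blackout and fail unless emctl reports success.
# ASSUMPTION: EMCTL is a hypothetical override variable; point it at the
# agent's emctl if it isn't on the PATH.
EMCTL=${EMCTL:-emctl}

start_node_blackout() {
  if [ -z "$1" ]; then
    echo "usage: start_node_blackout BLACKOUT_NAME" >&2
    return 2
  fi
  bo_name=$1
  # -nodeLevel blacks out everything on this node
  bo_out=$("$EMCTL" start blackout "$bo_name" -nodeLevel 2>&1)
  bo_rc=$?
  printf '%s\n' "$bo_out"
  # Require both a zero exit code and the success message emctl prints
  case $bo_out in
    *"Blackout $bo_name added successfully"*) [ "$bo_rc" -eq 0 ] ;;
    *) return 1 ;;
  esac
}
```

Usage: `start_node_blackout SLESPATCH || { echo "blackout not set - not patching" >&2; exit 1; }`. Checking the message as well as the exit code is just a belt-and-braces choice.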

To then see the status of the blackout:

# emctl status blackout
Oracle Enterprise Manager Cloud Control 12c Release 3
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.

Blackoutname = SLESPATCH
Targets = (server:host,)
Time = ({2014-01-29|17:25:25,|} )
Expired = False


Here you can see the blackout name, a (truncated) list of the targets that are blacked out, when it started and whether it has expired.
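If a script needs to know whether the blackout is still in force, the status output is easy enough to parse. A rough sketch, assuming the "Blackoutname = ..." / "Expired = ..." layout shown above with one such entry per blackout; blackout_active is a made-up helper name, not part of emctl:

```shell
#!/bin/sh
# Sketch: succeed (return 0) if the named blackout is listed in
# `emctl status blackout` output on stdin and is not expired.
# ASSUMPTION: the "Blackoutname = ..." / "Expired = ..." layout from the post,
# one entry per blackout; blackout_active is a hypothetical name.
blackout_active() {
  awk -v want="$1" '
    $1 == "Blackoutname" { current = $3 }                       # start of an entry
    $1 == "Expired" && current == want && $3 == "False" { found = 1 }
    END { exit !found }                                         # 0 = active
  '
}
```

Usage: `emctl status blackout | blackout_active SLESPATCH && echo "SLESPATCH is still active"`.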

Once the maintenance work is complete, stop the blackout:

#  emctl stop blackout SLESPATCH
Oracle Enterprise Manager Cloud Control 12c Release 3
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Blackout SLESPATCH stopped successfully
EMD reload completed successfully


It's as simple as that - no alerts are generated during the blackout.
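To make sure the blackout doesn't get left on by accident, you can wrap the start and stop around the maintenance itself. A minimal sketch; with_node_blackout and the EMCTL variable are hypothetical names, not part of emctl:

```shell
#!/bin/sh
# Sketch: run a maintenance command inside a node-level blackout and always
# stop the blackout afterwards, even if the maintenance command fails.
# ASSUMPTION: with_node_blackout and EMCTL are hypothetical names.
EMCTL=${EMCTL:-emctl}

with_node_blackout() {
  if [ "$#" -lt 2 ] || [ -z "$1" ]; then
    echo "usage: with_node_blackout BLACKOUT_NAME COMMAND [ARGS...]" >&2
    return 2
  fi
  wb_name=$1
  shift
  # Don't start the maintenance if the blackout couldn't be set
  "$EMCTL" start blackout "$wb_name" -nodeLevel || return 1
  wb_rc=0
  "$@" || wb_rc=$?                  # run the maintenance, remember its status
  "$EMCTL" stop blackout "$wb_name"
  return "$wb_rc"                   # report how the maintenance went
}
```

Usage: `with_node_blackout SLESPATCH ./my_patch_steps.sh`, where the script name is only a placeholder. Stopping the blackout even on failure is a deliberate choice so alerts aren't silently suppressed indefinitely. If the patching reboots the host, this script won't be around to stop the blackout, so run the stop command by hand afterwards.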

7 comments:

  1. Useful one, thanks.

  2. As usual Rich - you manage to put together these incredibly informative EM-related blog posts almost instantaneously. Great Job!!

    Just wondering if there's any way to clear alerts by running SQL queries or jobs against the repository in 12c? I haven't tried emcli yet, but I'd prefer to clear them all in one go rather than log into each server and clear them manually using emcli :) The Incidents link in the console didn't help me much, and increasing the job_queue_processes parameter has given relief to a few people, but mine is already set to 1000 and it still fails. Kind of a mess.

    Step 4: Summary of operation
    0 out of 378 updates were successful.
    364 out of 378 updates had errors.
    14 out of 378 updates were skipped.

    Thanks,
    -Revanth.

  3. Hi Revanth,
    Not 100% sure what you are asking. I think you want to remove multiple incidents from cloud control in one go?

    I've never done it, but it looks like emcli has an option to do this - run this and you'll see some usage details:

    emcli help delete_incident_record

    It seems it's limited to 20 at a time, but you could just create something to spool a whole load of emcli commands?

    Was the Step 4 output you show from trying to clear multiple incidents from the GUI, which then seems to have failed?

    Cheers,
    Rich

  4. Hi Rich - I'm so sorry for not being clearer in my last post.

    I'm just trying to figure out if there's any way to clear outstanding warning and critical alerts using the cloud control repository.

    In the good old 11g days, I used to log in as sysman, query mgmt_current_severity and then use em_severity.delete_current_severity to remove the desired alert. But it looks to me like delete_current_severity is not working in 12c. I know we can clear them using the EMCLI command, but I'm especially interested in doing this job from the cloud control repository. If there's any solution in 12c, please let me know. Old habits really do die hard :)

    The Step 4 operation was attempted from the Incident Manager page in 12c, to manually clear a bulk of alerts in one go, but unfortunately I couldn't get them off the incident charts :(

  5. Hi again,
    Well, that proc still exists in 12c (I never used it in earlier versions personally) - does it throw an error when you try to use it?

    It looks like the only documented way is to use the GUI or emcli - both, I'm sure, are just calling PL/SQL procs in the background though. The clear_stateless_alerts command in emcli is probably just calling EM_SEVERITY.CLEAR_STATELESS_ALERTS - have you tried using that instead?

    I'd be wary of just trying it though without some kind of OK from support (or at least a good backup... :-))

    Cheers,
    Rich

  6. I'm getting an "emctl: command not found" error. How can I resolve it?

  7. Hi,
    Your environment can't be set up correctly - ORACLE_HOME etc. must point to the cloud control (agent) installation, and your PATH must include its bin directory.

    Cheers,
    Rich

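Following up on the idea in comment 3 of spooling a whole load of emcli commands: a rough sketch that batches a list of incident numbers into delete_incident_record calls of at most 20 each. The option name in INCIDENT_OPT is an assumption (check it against `emcli help delete_incident_record`), and the commands are printed for review rather than executed:

```shell
#!/bin/sh
# Sketch: read incident numbers (one per line) on stdin and print one
# emcli delete_incident_record command per batch of up to 20 incidents,
# so the output can be spooled to a file, reviewed, then run.
# ASSUMPTION: the option name in INCIDENT_OPT; confirm it with
#   emcli help delete_incident_record
INCIDENT_OPT=${INCIDENT_OPT:-"-incident_number_list"}
BATCH_SIZE=${BATCH_SIZE:-20}

spool_incident_deletes() {
  awk -v opt="$INCIDENT_OPT" -v size="$BATCH_SIZE" '
    function emit() { print "emcli delete_incident_record " opt "=\"" batch "\"" }
    NF == 0 { next }                                 # ignore blank lines
    {
      batch = (n % size == 0) ? $1 : batch "," $1   # start or extend a batch
      n++
      if (n % size == 0) emit()                      # batch is full
    }
    END { if (n % size != 0) emit() }                # flush the last partial batch
  '
}
```

Usage: `spool_incident_deletes < incident_ids.txt > delete_incidents.sh` (both file names are placeholders), then read through the generated file before running it.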