Clearing alarm 'x' when alarm 'y' is received

1. Clearing alarm 'x' when alarm 'y' is received

Stiofan Conlon
Posted Jun 28, 2019 08:25 AM

Hi all,

Another noob question inbound!

I am monitoring a Linux service with logmon (checking the exit code of a command).

If the service fails, the alarm triggers. I would like to be able to clear this alert when the service recovers.

Could anyone point me in the right direction?

Thanks for your time and help!
2. RE: Clearing alarm 'x' when alarm 'y' is received
Best Answer

David Michel
Posted Jun 28, 2019 09:14 AM

For something like that the processes probe may be a better fit.
https://docops.ca.com/ca-unified-infrastructure-management-probes/ga/en/alphabetical-probe-articles/processes-process-monitoring

------------------------------
Support Engineer
Broadcom
------------------------------

3. RE: Clearing alarm 'x' when alarm 'y' is received

David Michel
Posted Jun 28, 2019 09:14 AM

Oh and this page shows the compatibility between probes and OS versions.
https://docops.ca.com/ca-unified-infrastructure-management/9-0-2/en/files/490068425/537402493/6/1561451411753/Platform_Support_Availability_current.pdf

------------------------------
Support Engineer
Broadcom
------------------------------

4. RE: Clearing alarm 'x' when alarm 'y' is received

Stiofan Conlon
Posted Jun 28, 2019 09:26 AM

Hi David!

Thanks for your quick response,

I have the processes probe running as well, but I was running logmon to verify that the service is not only up but also working as expected.

Is it possible to use the auto-operator to close out the alert generated by logmon? Say, if it returns a '0' on the next run, or if the server is rebooted?

Thanks again!

5. RE: Clearing alarm 'x' when alarm 'y' is received

Garin Walsh
Posted Jun 28, 2019 11:58 AM

You need to script this (suggest using Lua) in an AO.

You need to find the alarm you want to close with something like:

-- robot and supp are variables holding the robot name and the suppression key
alarm1 = alarm.list("where", "robot = '" .. robot .. "' and supp_key = '" .. supp .. "'")

You'd adjust the where criteria to match what you need - here I already knew the supp_key for the alarm I was looking for.

alarm1 then holds a list of the matching alarms. Identify the one you need to close (called a here) and then

action.close(a.nimid)

will close it.

6. RE: Clearing alarm 'x' when alarm 'y' is received

Christian McHugh
Posted Jun 28, 2019 12:11 PM

So we can have the logmon probe run a verification command every 5 minutes and check the exit code. If the exit code is not 0, it generates an alert. But when the service later recovers and the verification command succeeds with an exit code of 0, it requires custom scripting to close out the alert?
Since this is how Nagios and Sensu operate, I'd expect this to be a fairly normal feature. Is it just not supported by the logmon probe when running commands, or is there a better probe to use for this?

7. RE: Clearing alarm 'x' when alarm 'y' is received

Garin Walsh
Posted Jul 01, 2019 11:01 AM

My apologies - I understood the question to be that you wanted to detect the problem with logmon but clear the issue with the processes probe. For something like that you need scripting because you are crossing the boundary between probes.

If you are doing all the testing with logmon then, generally speaking, all you need is a logmon watcher that returns a clear and has the same suppression id as the watcher that created the alarm. For the return code checking, I believe the clear happens automatically when you get a zero return code.

From my experience though, I'd suggest that you avoid the return code checking and instead have the script return some filterable text value (OK or FAIL, for instance). If nothing else, it makes debugging the whole thing easier.
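
To illustrate the text-value approach, here is a minimal shell sketch rather than a tested logmon setup. The function name check_status, the script name and the service name myservice are hypothetical. It assumes logmon runs the script as a command and that its watchers match on the printed OK or FAIL text.

```shell
#!/bin/sh
# Hypothetical wrapper: turns a command's exit code into one line of
# filterable text, so watchers can match on text instead of the exit code.

# check_status CMD [ARGS...]
# Runs CMD, discards its output, and prints "OK" when it exits with 0
# or "FAIL rc=<code>" otherwise.
check_status() {
    if "$@" >/dev/null 2>&1; then
        echo "OK"
    else
        rc=$?
        echo "FAIL rc=$rc"
    fi
}

# Run the check only when a command is supplied, for example:
#   ./check_service.sh systemctl is-active --quiet myservice
if [ "$#" -gt 0 ]; then
    check_status "$@"
fi
```

With output like this, one watcher could match FAIL* to raise the alarm and a second watcher could match OK with level clear and the same suppression id to close it. Printing the return code alongside FAIL keeps the debugging information that exit-code checking would otherwise hide.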

8. RE: Clearing alarm 'x' when alarm 'y' is received

Sam Green
Posted Jul 09, 2019 11:02 AM

If I'm thinking correctly, you need a heartbeat monitor. We monitor whether text is written to a log file: if it is, all is good. If no text is written, we alert.

Here's a profile which will alert you to no activity within a log file, and clear itself when activity is detected:

   <MS27 - DEV2>
      active = yes
      interval = 10 min
      scanfile = /home/rs.log
      fileencoding =
      scanmode = updates
      alarm = yes
      qos = no
      message = no
      subject =
      user =
      reccur_directory = no
      reccur_directory_level = 10
      resetFile = no
      initialfileptr = 2
      resumefileptr = 4
      command_timeout_active = no
      command_timeout =
      command_severity = 2
      command_timeout_alarm = 0
      alarmFOpenFail = no
      clearFOpenFailRestart = no
      monitor_exit_code = No
      max_alarm_sev = 5
      max_alarms =
      max_alarm_msg =
      password =
      <watchers>
         <Heartbeat>
            active = yes
            match = *
            level = minor
            subsystemid = 1.1
            message = Heartbeat outage detected
            i18n_token =
            restrict =
            expect = yes
            abort = no
            sendclear = no
            count = no
            separator =
            suppid = DEV2
            source =
            target =
            qos =
            runcommandonmatch = no
            alarm_on_first_match = yes
            commandexecutable =
            commandarguments =
            pattern_threshold_severity = information
            pattern_threshold_message =
            timeout = 1
            pattern_threshold =
            expect_message = Heartbeat detected - DEV2
            expect_level = information
            regexfromexternalfile = no
            patternfilepath =
            token =
            variable_threshold =
            variable_threshold_message =
            variable_threshold_severity = information
            variable_threshold_supp =
         </Heartbeat>
         <Heartbeat Clear>
            active = yes
            match = *
            level = clear
            subsystemid = 1.1
            message = Heartbeat detected
            i18n_token =
            restrict =
            expect = no
            abort = no
            sendclear = no
            count = no
            separator =
            suppid = DEV2
            source =
            target =
            qos =
            runcommandonmatch = no
            alarm_on_first_match = yes
            commandexecutable =
            commandarguments =
            pattern_threshold_severity = information
            pattern_threshold_message =
            timeout = 1
            pattern_threshold =
            expect_message =
            expect_level =
            regexfromexternalfile = no
            patternfilepath =
            token =
            variable_threshold =
            variable_threshold_message =
            variable_threshold_severity = information
            variable_threshold_supp =
         </Heartbeat Clear>
      </watchers>
   </MS27 - DEV2>
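
A heartbeat profile like this only works if something appends to the scanned file while the service is healthy. Below is a minimal sketch of such a writer. The function name write_heartbeat, the script name, the health-check command and the schedule are assumptions for illustration; only the /home/rs.log path comes from the profile.

```shell
#!/bin/sh
# Hypothetical heartbeat writer for a file scanned with scanmode = updates.

# write_heartbeat LOGFILE CMD [ARGS...]
# Appends one timestamped line to LOGFILE only when CMD exits with 0.
# When CMD fails, nothing is written, so the file stops growing.
write_heartbeat() {
    logfile=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf '%s heartbeat ok\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
    fi
}

# Run only when a log file and a check command are supplied, for example:
#   ./heartbeat.sh /home/rs.log systemctl is-active --quiet myservice
if [ "$#" -ge 2 ]; then
    write_heartbeat "$@"
fi
```

The writer would need to run more often than the profile's 10 min interval, for example every 5 minutes from cron, so that a healthy service always produces at least one new line per scan.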

------------------------------
CA - UIM administrator
------------------------------
