DX Unified Infrastructure Management

  • 1.  Multi level logmon alerts (Clear , Warn Critical)

    Posted Oct 15, 2010 09:54 PM

    Greetings,

    Basically, if x is less than 10, clear

    if x is between 10 and 20, warn

    if x is greater than 20, critical
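The three bands above can be written as one mutually exclusive check, so that any reading lands in exactly one severity. This is only a sketch in Python, not logmon syntax; the function name `severity` is hypothetical, and treating 10 and 20 themselves as "warn" is an assumption about what "between" means here.

```python
def severity(x):
    """Map a reading to exactly one severity band."""
    if x < 10:
        return "clear"
    if x <= 20:
        # Reached only when x >= 10, so this is the "between 10 and 20" band.
        return "warning"
    # Anything left is greater than 20.
    return "critical"


if __name__ == "__main__":
    for reading in (5, 10, 20, 25, 26):
        print(reading, severity(reading))
```

Because the checks are chained, a reading of 25 or 26 can only ever produce "critical", never a warning first.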

     

    The way I approached it was to set everything up with a variable (note: I am describing this the NMS way for now):

    Var source > 10

    Var source < 10

    Var source <= 20

     

    Result of the first check (x = 25):

    Critical: sends email/alarm

     

    Second check (x = 26):

    Sends a warning for 26

    Sends an alarm for 26

     

    The problem is that we have now sent two pages for the same alarm. Our paging system pages for majors and criticals only; everything else goes out as email. But because the second check cleared the critical with the warning and then upgraded the warning back to a critical, a second page went out. See the problem?

     

    I was wondering how the "between" case is normally solved.

     

    Thoughts?



  • 2.  Re: Multi level logmon alerts (Clear , Warn Critical)

    Posted Oct 16, 2010 02:07 AM

    Why did the second check send a warning if x was over 20?

     

    Could you paste or attach the relevant part of logmon.cfg?

     

    -Keith



  • 3.  Re: Multi level logmon alerts (Clear , Warn Critical)

    Posted Oct 18, 2010 03:05 PM

    The idea is that we have a warning and a critical, so we are really looking for a way to have a middle value with a "between" statement.

     

    Here is a snippet. Per our security regs, I had to replace some values with *** to meet requirements.

    ----

    <XXXstatus>
          active = yes
          interval = 10 min
          scanfile = ./***/XXXX.bash
          scanmode = command
          alarm = yes
          qos = yes
          message = no
          max_alarms =
          max_alarm_msg =
          password =
          subject =
          user =
          <watchers>
             <***_RCV-Clear>
                active = yes
                match = /PARAMVALUE;***_RCV;/
                level = clear
                subsystemid =
                message = $XXXtext
                restrict =
                expect = no
                abort = no
                sendclear = no
                count = no
                separator = ;
                suppid =
                source =
                target =
                qos =
                runcommandonmatch = no
                commandexecutable =
                commandarguments =
                <variables>
                   <***_RCV>
                      definition = 3-3
                      operator = gt
                      threshold = 15
                      qosactive = yes
                      qosname = <Default>
                      qostarget =
                   </***_RCV>
                   <XXXtext>
                      definition = 4-
                      operator = gt
                      threshold =
                      qosactive =
                      qosname =
                      qostarget =
                   </XXXtext>
                </variables>
             </***_RCV-Clear>
             <***_RCV-Warn>
                active = yes
                match = /PARAMVALUE;***_RCV;/
                level = warning
                subsystemid =
                message = $XXXttext
                restrict =
                expect = no
                abort = no
                sendclear = no
                count = no
                separator = ;
                suppid =
                source =
                target =
                qos =
                runcommandonmatch = no
                commandexecutable =
                commandarguments =
                <variables>
                   <***_RCV>
                      definition = 3-3
                      operator = le
                      threshold = 15
                      qosactive =
                      qosname =
                      qostarget =
                   </***_RCV>
                   <XXXttext>
                      definition = 4-
                      operator = gt
                      threshold =
                      qosactive =
                      qosname =
                      qostarget =
                   </XXXttext>
                </variables>
             </***_RCV-Warn>
             <***_RCV-Crit>
                active = yes
                match = /PARAMVALUE;***_RCV;/
                level = critical
                subsystemid =
                message = No msgs from the ASP for $***_RCV minutes
                restrict =
                expect = no
                abort = no
                sendclear = no
                count = no
                separator = ;
                suppid =
                source =
                target =
                qos =
                runcommandonmatch = no
                commandexecutable =
                commandarguments =
                <variables>
                   <***_RCV>
                      definition = 3-3
                      operator = le
                      threshold = 30
                      qosactive =
                      qosname =
                      qostarget =
                   </***_RCV>
                   <XXXttext>
                      definition = 4-
                      operator = gt
                      threshold =
                      qosactive =
                      qosname =
                      qostarget =
                   </XXXttext>
                </variables>
             </***_RCV-Crit>
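To see why these three watchers can send more than one alarm for a single line, here is a small Python sketch of the thresholds as posted (gt 15, le 15, le 30). It assumes that a watcher fires when its variable is NOT "as expected", that is, when `value <operator> threshold` is false; that reading is taken from the description of variables later in this thread and is an assumption about logmon, not a statement of how the probe is implemented. The names `WATCHERS` and `fired_levels` are hypothetical.

```python
import operator

# Comparison operators used in the posted config.
OPS = {"gt": operator.gt, "le": operator.le}

# (level, operator, threshold) for the numeric variable of each watcher,
# in the order the watchers appear in the snippet.
WATCHERS = [
    ("clear", "gt", 15),
    ("warning", "le", 15),
    ("critical", "le", 30),
]


def fired_levels(value):
    """Return every level whose variable is not as expected (assumed alarm rule)."""
    fired = []
    for level, op_name, threshold in WATCHERS:
        as_expected = OPS[op_name](value, threshold)
        if not as_expected:
            fired.append(level)
    return fired


if __name__ == "__main__":
    for value in (10, 20, 35):
        print(value, fired_levels(value))
```

Under that assumption, a value of 35 violates both the warning and the critical expectation, so both watchers fire on the same line, which would match the warning-then-critical sequence described in the first post.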



  • 4.  Re: Multi level logmon alerts (Clear , Warn Critical)
    Best Answer

    Posted Oct 18, 2010 04:41 PM

    If you could use a regex to restrict each watcher to match only the messages that are under or over the threshold values, you can enable the "abort on match" option to allow only one watcher to match each line. That will prevent more than one alarm message from being sent for each log line. You cannot enable that option with your watchers as is because the regex is the same for all watchers. But more complex regexes would probably do the trick.

     

    Matching ranges of numbers with regexes is a little complicated, but it should be feasible. For example, I think this would work if you wanted to go low to high:

     

    - CLEAR: /PARAMVALUE;(\d|1[0-5]);/

    - WARNING: /PARAMVALUE;([1-2]\d|30);/

    - CRITICAL: /PARAMVALUE;(\d+)/

     

    Or if you wanted to go high to low:

     

    - CRITICAL: /PARAMVALUE;(3[1-9]|[4-9]\d|\d\d\d+)/

    - WARNING: /PARAMVALUE;(1[6-9]|[23]\d);/

    - CLEAR: /PARAMVALUE;(\d+);/

     

    Hopefully those are right, or at least close... :)

     

    If you were to go this route, you can use the match expression option when defining the variable that includes the number. You could continue to leave the variable threshold configured, although that should not be necessary if the regexes are right.
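One way to sanity-check range regexes like these before putting them into logmon.cfg is to run them against sample lines in Python, stopping at the first watcher that matches to mimic "abort on match". This is a sketch under assumptions: the sample line format `PARAMVALUE;<number>;` is hypothetical (the real lines carry more fields), Python's `re` syntax is assumed to behave like the probe's for these simple patterns, and the WARNING pattern in the high-to-low list is written with `|` between its two alternatives.

```python
import re

# Watchers in evaluation order; the first pattern that matches wins.
LOW_TO_HIGH = [
    ("clear", re.compile(r"PARAMVALUE;(\d|1[0-5]);")),
    ("warning", re.compile(r"PARAMVALUE;([1-2]\d|30);")),
    ("critical", re.compile(r"PARAMVALUE;(\d+)")),
]
HIGH_TO_LOW = [
    ("critical", re.compile(r"PARAMVALUE;(3[1-9]|[4-9]\d|\d\d\d+)")),
    ("warning", re.compile(r"PARAMVALUE;(1[6-9]|[23]\d);")),
    ("clear", re.compile(r"PARAMVALUE;(\d+);")),
]


def classify(line, watchers):
    """Return the level of the first watcher whose regex matches, else None."""
    for level, pattern in watchers:
        if pattern.search(line):
            return level
    return None


def sample_line(value):
    """Build a hypothetical log line carrying the given number."""
    return f"PARAMVALUE;{value};"


if __name__ == "__main__":
    for value in (0, 9, 10, 15, 16, 29, 30, 31, 45, 100):
        line = sample_line(value)
        print(value, classify(line, LOW_TO_HIGH), classify(line, HIGH_TO_LOW))
```

Both orderings should agree on every value: up to 15 is clear, 16 through 30 is warning, and anything above 30 is critical. If they disagree for some input, one of the patterns needs another look.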

     

    -Keith



  • 5.  Re: Multi level logmon alerts (Clear , Warn Critical)

    Posted Oct 18, 2010 09:27 PM

    Excellent! Thank you



  • 6.  Re: Multi level logmon alerts (Clear , Warn Critical)

    Posted Apr 23, 2012 06:23 PM

    Hello,

     

    I'm having an issue with getting the "abort on match" to work as expected.  First of all, a few questions regarding the "abort on match":

     

    1. Is it considered a "match" if only the match expression is true?  Or are variables also taken into account here?  I know that for an alarm to be triggered, not only does the match expression have to be true, but at least one of the defined variables must also not be as expected.

     

    2. Does using URL mode change the behavior of "abort on match"?

     

    Basically, I have a profile using URL mode, with 3 watcher rules defined in it.  The first looks for a specific XML tag and result.  If found, we know that everything is "UP" and it sends a clear alarm.  The second looks for the same XML tag with a different result and grabs the rest of the text as a regex variable.  Within that, I set 3 variables that look for specific services, and I am "expecting" to see each one.  If I don't see one of them, the watcher sends a critical alarm saying it is "DOWN".  I have set the Advanced setting "abort on match" for this particular watcher rule, because the final watcher rule looks for the same XML tag and result as the "down" rule but does not look for any variables.  So, as long as it sees the XML tag, it sends a warning alarm saying it is "DEGRADED".  The issue is that I cannot get logmon to abort on the second watcher rule.  The log says it should be aborting, but it continues evaluating the next watcher rule.  Here is the logmon.cfg for clarification, followed by an excerpt from the log.

     

       <e2e>
          active = yes
          interval = 1 min
          scanfile = http://***/endtoendtest.txt
          scanmode = url
          alarm = yes
          qos = no
          message = no
          max_alarms =
          max_alarm_msg =
          password =
          timeout = 15
          retries = 2
          url_alarm_sev = 5
          urluser = ***
          urlpass = ***
          challengeresponse = yes
          proxyhost =
          proxyport =
          proxyuser =
          SslSettings = 0
          subject =
          user =
          <watchers>
             <UP>
                active = yes
                match = /<GotAllExpectedResponses>yes<\/GotAllExpectedResponses>/
                level = clear
                subsystemid =
                message = e2e is UP - TEST
                i18n_token =
                restrict =
                expect = no
                abort = no
                sendclear = no
                count = no
                separator =
                suppid = logmon/e2e
                source = e2e
                target =
                qos =
                runcommandonmatch = no
                commandexecutable =
                commandarguments =
                expect_message =
                expect_level =
             </UP>
             <DOWN>
                active = yes
                match = /<GotAllExpectedResponses>no<\/GotAllExpectedResponses>(.*)/
                level = minor
                subsystemid =
                message = e2e is DOWN - TEST
                i18n_token =
                restrict =
                expect = no
                abort = yes
                sendclear = no
                count = no
                separator =
                suppid = logmon/e2e
                source = e2e
                target =
                qos =
                runcommandonmatch = no
                commandexecutable =
                commandarguments =
                expect_message =
                expect_level =
                <variables>
                   <Parser>
                      definition = $1
                      operator = re
                      threshold = /<Service>Parser<\/Service>/
                      qosactive =
                      qosname =
                      qostarget =
                   </Parser>
                   <EServices>
                      definition = $1
                      operator = re
                      threshold = /<Service>EServices<\/Service>/
                      qosactive =
                      qosname =
                      qostarget =
                   </EServices>
                   <Inserter>
                      definition = $1
                      operator = re
                      threshold = /<Service>Inserter<\/Service>/
                      qosactive =
                      qosname =
                      qostarget =
                   </Inserter>
                </variables>
             </DOWN>
             <DEGRADED>
                active = yes
                match = /<GotAllExpectedResponses>no<\/GotAllExpectedResponses>/
                level = warning
                subsystemid =
                message = e2e is DEGRADED - TEST
                i18n_token =
                restrict =
                expect = no
                abort = no
                sendclear = no
                count = no
                separator =
                suppid = logmon/e2e
                source = e2e
                target =
                qos =
                runcommandonmatch = no
                commandexecutable =
                commandarguments =
                expect_message =
                expect_level =
             </DEGRADED>
          </watchers>
       </e2e>
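For reference, the behavior this profile is aiming for can be sketched outside logmon as a single decision that yields exactly one state per scan. This is Python, not probe configuration; it reuses the match expressions and service names from the config above, while the function name `evaluate`, the use of `re.DOTALL` so that `(.*)` spans multiple lines, and the sample bodies are all assumptions made for illustration.

```python
import re

UP = re.compile(r"<GotAllExpectedResponses>yes</GotAllExpectedResponses>")
# DOTALL is an assumption so that (.*) captures the rest of a multi-line body.
NOT_UP = re.compile(
    r"<GotAllExpectedResponses>no</GotAllExpectedResponses>(.*)", re.DOTALL
)
SERVICES = ("Parser", "EServices", "Inserter")


def evaluate(body):
    """Return "UP", "DOWN", "DEGRADED", or None for one fetched page body."""
    if UP.search(body):
        return "UP"
    match = NOT_UP.search(body)
    if match is None:
        return None
    rest = match.group(1)
    missing = [
        name for name in SERVICES
        if not re.search(rf"<Service>{name}</Service>", rest)
    ]
    # A missing expected service means DOWN; otherwise the page is only DEGRADED.
    return "DOWN" if missing else "DEGRADED"


if __name__ == "__main__":
    # Hypothetical page body with one expected service absent.
    body = (
        "<GotAllExpectedResponses>no</GotAllExpectedResponses>\n"
        "<Service>Parser</Service>\n<Service>EServices</Service>\n"
    )
    print(evaluate(body))
```

The point of the sketch is the ordering: DOWN is decided first and DEGRADED is only the fallback, which is the effect "abort on match" on the DOWN watcher is meant to produce.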

     

    Logfile:

     

    Apr 23 10:42:51:647 [2404] logmon: [e2e] FORMAT START [default] - matches
    Apr 23 10:42:51:647 [2404] logmon: [e2e] FORMAT LINES [default] - matches
    Apr 23 10:42:51:647 [2404] logmon: [e2e] NO MATCH [UP] offset now -1
    Apr 23 10:42:51:647 [2404] logmon: [e2e] MATCH [DOWN] on line 0
    Apr 23 10:42:51:647 [2404] logmon: [e2e] e2e.DOWN: Alarm Message, severity=3, sid=1.1, msg='e2e is DOWN - TEST'
    Apr 23 10:42:51:647 [2404] logmon: abort [DOWN] on match!!
    Apr 23 10:42:51:647 [2404] logmon: [e2e] MATCH [DEGRADED] on line 0
    Apr 23 10:42:51:647 [2404] logmon: [e2e] sldmbe2e.DEGRADED: Alarm Message, severity=2, sid=1.1, msg='e2e is DEGRADED - TEST'
    Apr 23 10:42:51:647 [2404] logmon: [e2e] NO MATCH [DEGRADED] offset now -1
    Apr 23 10:42:51:647 [2404] logmon: [e2e] used 40 ms scanning 1792 bytes

     

    Any help is appreciated.  Thanks!

     

    Karen



  • 7.  Re: Multi level logmon alerts (Clear , Warn Critical)

    Posted Apr 24, 2012 06:16 PM

    Karen,

     

    I tested abort on match using a simple logfile and only two watcher rules (probe versions 3.03 and 3.12), and it worked as expected. This is a troubleshooting step you may have already performed. Looking at what you provided, I believe you have hit a bug, perhaps with the URL scan mode or, more likely, something to do with variables.

     

    Good Luck



  • 8.  Re: Multi level logmon alerts (Clear , Warn Critical)

    Posted Apr 24, 2012 06:47 PM

    Thanks for the response.  I agree.  I'm starting to think that it is either due to being in URL mode or the variables are throwing it off.  I'm going to try "max alarms" on the profile, set to "1", since all I really want is to receive one alarm from the profile each time it runs.  I haven't had an opportunity to test this yet (I'm dependent on someone else for testing), but as soon as I do, I'll post the results.

     

    Perhaps I should open up a Nimsoft case regarding this potential bug. 

     

    Thanks again!



  • 9.  Re: Multi level logmon alerts (Clear , Warn Critical)

    Posted Apr 24, 2012 07:09 PM

    Seems to me you could just copy/paste your post; that should give support enough to go on.