DX Unified Infrastructure Management

  • 1.  Ever wanted to detect and act on an eventstorm from logmon, ntevl or syslog ?

    Posted Dec 11, 2009 04:36 PM

    I spoke to our support guys the other day. They brought up an issue that many customers want to solve. In short:

          "...send an alarm or take an action if N alerts occur within M minutes."

    I solved this with a single Auto Operator (AO) profile and a single script. Every 5 minutes, the AO profile scans all open alarms coming from the 'ntevl' probe (this can be extended to 'ntevl|logmon' etc.) with a suppression count (suppcount) greater than 3, and runs the script for each matching alarm.
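
    The rule itself is a sliding-window count: keep the events whose time falls inside the last M minutes and compare the count with N. The sketch below shows that idea in plain Lua, outside NAS, assuming the event times are available as a list of Unix timestamps in seconds. countInWindow and stormDetected are hypothetical helper names; the script further down does the equivalent counting with SQL against the NAS transaction log instead.

```lua
-- Sketch of the "N alerts within M minutes" rule in plain Lua (no NAS calls).
-- Assumption: event times are Unix timestamps in seconds.
-- countInWindow and stormDetected are hypothetical helper names.

-- Count the timestamps that fall inside the window [now - windowSec, now].
function countInWindow(timestamps, now, windowSec)
  local cutoff = now - windowSec
  local n = 0
  for _, t in ipairs(timestamps) do
    if t >= cutoff and t <= now then
      n = n + 1
    end
  end
  return n
end

-- True when more than `threshold` events happened in the last `minutes` minutes.
function stormDetected(timestamps, now, minutes, threshold)
  return countInWindow(timestamps, now, minutes * 60) > threshold
end

-- Usage: seven events, six of them within the last 15 minutes.
local now = 100000
local events = { now - 3600, now - 800, now - 600, now - 400, now - 200, now - 100, now }
print(countInWindow(events, now, 15 * 60))  --> 6
print(stormDetected(events, now, 15, 5))    --> true
```

    Counting against a cutoff (rather than resetting a counter every M minutes) is what makes the window "moving": a burst that straddles two fixed intervals is still detected.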

    NAS 3.31 (coming soon) allows AO profiles to pass arguments to the script, but the solution already works with the current NAS: without an argument, the script falls back to a 15-minute window.
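
    The script below splits the "num unit" argument with split. If you want to reject a malformed argument before it reaches the SQL, a possible approach is to validate it up front. This is a sketch only: parseWindowArg is a hypothetical helper that uses the standard Lua string.match instead of split, and it mirrors the script's defaults (15 minutes) and its accepted units.

```lua
-- Sketch: parse an AO script argument of the form "num unit", e.g. "15 minutes".
-- parseWindowArg is a hypothetical helper; defaults and units mirror
-- numSuppAlarmsLast (15 minutes; minute(s), hour(s) or day(s)).
local validUnits = {
  minute = true, minutes = true,
  hour = true, hours = true,
  day = true, days = true,
}

function parseWindowArg(arg)
  if arg == nil or arg == "" then
    return 15, "minutes"   -- same default as the script
  end
  -- Capture a positive integer, whitespace, then a word of letters.
  local num, unit = string.match(arg, "^%s*(%d+)%s+(%a+)%s*$")
  if num == nil or not validUnits[unit] then
    error("parseWindowArg: expected '<num> <unit>', got '" .. arg .. "'")
  end
  return tonumber(num), unit
end

print(parseWindowArg("15 minutes"))  --> 15	minutes
print(parseWindowArg("1 hour"))      --> 1	hour
```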

    If the alarm has more than 5 suppressed transactions within that window, the script escalates it to MAJOR severity and, as an example of a follow-up action, generates a secondary alarm.

    Enjoy,
    Carstein

    ----8<------8<------8<------8<------8<------8<------8<------8<------8<------8<--
    --
    -- Function to scan the transaction-log for the number of suppressions in a moving time window
    --
    -- Examples:
    --  local nid = "UN29351917-74961"
    --  printf("num. of suppressed transactions last (default: 15 'minutes'): %d", numSuppAlarmsLast(nid))
    --  printf("num. of suppressed transactions last 5  (default: 'minutes'): %d", numSuppAlarmsLast(nid,5) )
    -- 
    --  printf("num. of suppressed transactions last 15 min   : %d", numSuppAlarmsLast(nid,15,"minute") )
    --  printf("num. of suppressed transactions last hour     : %d", numSuppAlarmsLast(nid,1, "hour") )
    --  printf("num. of suppressed transactions last day      : %d", numSuppAlarmsLast(nid,1, "day") )
     
    function numSuppAlarmsLast(nimid,num,unit)
      
       if nimid==nil then error ("numSuppAlarmsLast: no nimid!") end
       if num==nil then num=15 end
       if unit==nil then unit="minutes" end

       if unit~="minute" and unit~="hour" and unit~="day" and unit~="minutes" and unit~="hours" and unit~="days" then
          error ("numSuppAlarmsLast: unit must be one of minute(s), hour(s) or day(s)!")
       end
       local sql = "SELECT COUNT(type) as nsupp FROM NAS_TRANSACTION_LOG WHERE nimid='"..nimid.."' AND type = 2 AND time >= datetime('now','localtime','-"..num.." "..unit.."')"
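       local al  = alarm.query (sql)
       -- alarm.query is assumed to return a list of rows; the COUNT is in the first row
       if al == nil or al[1] == nil then return 0 end
       return al[1].nsupp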
    end

    if SCRIPT_ARGUMENT == nil then
       SCRIPT_ARGUMENT = "15 minutes"
    end

    -- NAS 3.31 supports AO arguments; expect the argument in the form: num unit
    -- e.g. 15 minutes
    args = split(SCRIPT_ARGUMENT)

    -- Get the current alarm-record
    a = alarm.get()
    if a == nil then error ("Missing current alarm-record!") end

    -- args[1] is the number, args[2] the unit (e.g. "15", "minutes")
    n = numSuppAlarmsLast(a.nimid,args[1],args[2])

    printf("%s has %d suppressed transactions in the last %s %s", a.nimid, n, args[1], args[2])
    if n>5 then
       action.escalate (NIML_MAJOR,a.nimid)
       nimbus.alarm    (NIML_MAJOR,"Check the logs at '"..a.hostname.."'",a.nimid)
    end






  • 2.  RE: Ever wanted to detect and act on an eventstorm from logmon, ntevl or syslog ?

    Broadcom Employee
    Posted Aug 11, 2019 09:28 PM
    See the Time Over Threshold Event Rule:
    https://docops.ca.com/ca-unified-infrastructure-management-probes/ga/en/how-to-articles/the-time-over-threshold-event-rule


  • 3.  RE: Ever wanted to detect and act on an eventstorm from logmon, ntevl or syslog ?

    Broadcom Employee
    Posted Aug 12, 2019 07:43 AM
    The one problem with this is performance.
    Because NAS is single-threaded, all other alarm processing halts while a script runs.
    If you have thousands of alarms to scan through and correlate, and you do this every minute or so, it could significantly slow down your alarm processing.
    If you use this approach, test it in a lab first to make sure it does not have a negative impact.
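
    A rough estimate can show whether a profile is worth worrying about before the lab test. This is a sketch under two stated assumptions: the AO profile runs the script once per matching alarm, and script runs are serialized, so their times add up. blockedSeconds and blockedFraction are hypothetical helpers, and the numbers in the example are illustrative, not measurements.

```lua
-- Back-of-envelope estimate of how long NAS is busy in the script per AO scan.
-- Assumptions: one script run per matching alarm, runs are serialized.
-- blockedSeconds/blockedFraction are hypothetical; inputs are illustrative.
function blockedSeconds(matchingAlarms, msPerRun)
  return matchingAlarms * msPerRun / 1000
end

-- Fraction of each scan interval spent inside the script.
function blockedFraction(matchingAlarms, msPerRun, intervalSec)
  return blockedSeconds(matchingAlarms, msPerRun) / intervalSec
end

-- Example: 2000 matching alarms, 15 ms per run, scanned every 60 seconds.
print(blockedSeconds(2000, 15))        -- 30 seconds of blocked processing
print(blockedFraction(2000, 15, 60))   --> 0.5
```

    A fraction approaching 1 means the script alone would occupy NAS for the whole interval; measure the real per-run time in the lab and plug it in.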

    ------------------------------
    Gene Howard
    Principal Support Engineer
    Broadcom
    ------------------------------



  • 4.  RE: Ever wanted to detect and act on an eventstorm from logmon, ntevl or syslog ?

    Broadcom Employee
    Posted Aug 12, 2019 09:06 PM
    Hi,

    You can also enable the NAS storm protection functionality:

    https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=34276