I spoke to our support guys the other day. They brought up an issue that many customers want to resolve. In short and simple language:
"... send an alarm or perform an action if N alerts occurred within M minutes."
I solved this with a single AO profile and a single script. Every 5 minutes, the AO profile scans all open alarms coming from the 'ntevl' probe (this can be extended to 'ntevl|logmon' etc.) with suppcount > 3.
NAS 3.31 (coming soon) allows AO profiles to pass arguments to the script, but the solution also works in the current NAS, where the script falls back to a default window of 15 minutes.
When more than 5 suppressions fall inside the window, the script escalates the matching alarm to MAJOR severity and generates a secondary alarm (as an example).
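To make the "N alerts in M minutes" rule concrete outside the NAS, here is a minimal plain-Lua sketch of the moving-window count. The names countInWindow and UNIT_SECONDS and the sample timestamps are made up for illustration and are not part of the NAS API; in the real script the count comes from the NAS transaction log rather than from an in-memory table.

```lua
-- Hypothetical lookup table (not part of the NAS API): seconds per unit.
local UNIT_SECONDS = {
  minute = 60,    minutes = 60,
  hour   = 3600,  hours   = 3600,
  day    = 86400, days    = 86400,
}

-- Count the timestamps (epoch seconds) that fall inside the last
-- `num` `unit`s, measured back from `now`.
function countInWindow(timestamps, now, num, unit)
  num  = num  or 15
  unit = unit or "minutes"
  local secs = UNIT_SECONDS[unit]
  if secs == nil then
    error("countInWindow: unit is one of minute(s), hour(s) or day(s)!")
  end
  local cutoff = now - num * secs
  local n = 0
  for _, t in ipairs(timestamps) do
    if t >= cutoff and t <= now then n = n + 1 end
  end
  return n
end

-- Made-up sample: suppression events 1, 4, 9, 20 and 70 minutes ago.
local now = 100000
local events = { now - 60, now - 240, now - 540, now - 1200, now - 4200 }

print(countInWindow(events, now, 5, "minutes"))  --> 2
print(countInWindow(events, now))                --> 3 (default: 15 minutes)
print(countInWindow(events, now, 1, "hour"))     --> 4
```

The window moves with `now`, so an event that counted on one scan drops out once it is older than the cutoff. That is the same effect the script gets by asking the database for rows newer than "now minus num units".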
Enjoy,
Carstein
----8<------8<------8<------8<------8<------8<------8<------8<------8<------8<--
--
-- Function to scan the transaction-log for the number of suppressions in a moving time window
--
-- Examples:
-- local nid = "UN29351917-74961"
-- printf("num. of suppressed transactions last (default: 15 'minutes'): %d", numSuppAlarmsLast(nid))
-- printf("num. of suppressed transactions last 5 (default: 'minutes'): %d", numSuppAlarmsLast(nid,5) )
--
-- printf("num. of suppressed transactions last 15 min : %d", numSuppAlarmsLast(nid,15,"minute") )
-- printf("num. of suppressed transactions last hour : %d", numSuppAlarmsLast(nid,1, "hour") )
-- printf("num. of suppressed transactions last day : %d", numSuppAlarmsLast(nid,1, "day") )
function numSuppAlarmsLast(nimid,num,unit)
   if nimid==nil then error ("numSuppAlarmsLast: no nimid!") end
   if num==nil then num=15 end
   if unit==nil then unit="minutes" end
   if unit~="minute" and unit~="hour" and unit~="day" and unit~="minutes" and unit~="hours" and unit~="days" then
      error ("numSuppAlarmsLast: unit is one of minute(s), hour(s) or day(s)!")
   end
   -- type = 2 selects the suppression transactions; the time filter is the moving window
   local sql = "SELECT COUNT(type) as nsupp FROM NAS_TRANSACTION_LOG WHERE nimid='"..nimid.."' AND type = 2 AND time >= datetime('now','localtime','-"..num.." "..unit.."')"
   local al = alarm.query (sql)
   -- The result is expected to be a list of rows; fall back to the table itself if it is a single row
   local row = al[1] or al
   return row.nsupp
end
if SCRIPT_ARGUMENT == nil then
   SCRIPT_ARGUMENT = "15 minutes"
end
-- NAS 3.31 supports AO arguments, expect argument on the form: num unit
-- e.g. 15 minutes
local args = split(SCRIPT_ARGUMENT)
-- Get the current alarm-record
local a = alarm.get()
if a == nil then error ("Missing current alarm-record!") end
-- args[1] is the number, args[2] is the unit
local n = numSuppAlarmsLast(a.nimid,args[1],args[2])
printf("%s has %d suppressed transactions the last %s %s", a.nimid, n, args[1], args[2])
if n>5 then
   -- Escalate the matching alarm and raise a secondary alarm
   action.escalate (NIML_MAJOR,a.nimid)
   nimbus.alarm (NIML_MAJOR,"Check the logs at '"..a.hostname.."'",a.nimid)
end
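
The script above trusts that the AO argument really has the form "num unit". As a possible hardening step, the argument could be checked before it ends up in the SQL string. The following is only a sketch in plain Lua: parseWindow and VALID_UNITS are hypothetical names, not NAS functions, and the pattern assumes a whole number followed by one of the units the script already accepts.

```lua
-- Hypothetical set of accepted units, mirroring the check in numSuppAlarmsLast.
local VALID_UNITS = {
  minute = true, minutes = true,
  hour   = true, hours   = true,
  day    = true, days    = true,
}

-- Hypothetical helper: turn an argument such as "15 minutes" into a
-- number and a unit, or raise an error if the format is not recognised.
function parseWindow(arg)
  if arg == nil or arg == "" then arg = "15 minutes" end
  -- One run of digits, whitespace, one run of letters; nothing else allowed.
  local num, unit = string.match(arg, "^%s*(%d+)%s+(%a+)%s*$")
  if num == nil then
    error("parseWindow: expected '<num> <unit>', got '" .. arg .. "'")
  end
  if not VALID_UNITS[unit] then
    error("parseWindow: unit is one of minute(s), hour(s) or day(s)!")
  end
  return tonumber(num), unit
end

local num, unit = parseWindow("15 minutes")
print(num, unit)
```

In the script, the result could replace the split call, for example: local num, unit = parseWindow(SCRIPT_ARGUMENT). Because only digits and letters get through, a malformed argument fails early with a readable message instead of producing a broken query.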