Some devices become very insistent when an environmental condition such as high temperature is present, and send traps every 10 seconds. We recently had this problem and it flooded our Spectrum system. Even worse, it occurred during the database backup interval, when Archive Manager was shut down, so events really piled up.
What I would like is a way in EventDisp to allow 'n' alarms to be generated, then generate a single alarm that says 'suppressing xyz alarm', and generate no more alarms until no events have been seen for 's' seconds. The latter is almost exactly an EventRateWindowRule, but I'm not sure how to tie the rules together.
Example for 'n'=2 and a 2-minute window ('s'=120):
10:00:00 event -> alarm FAN BAD
10:00:10 event -> alarm FAN BAD
10:00:20 event -> alarm SUPPRESS FAN BAD ERRORS FOR 2 MINUTES
<no more events>
10:03:00 alarm -> (Clear) FAN BAD ERRORS HAVE STOPPED
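The behavior in the example above can be sketched in plain Python. This is only an illustration of the logic, not EventDisp syntax or a Spectrum API; the class name `AlarmSuppressor`, its methods, and the alarm texts are hypothetical, and it assumes something calls `on_tick` periodically so the quiet period can be detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmSuppressor:
    """Allow the first n alarms, then suppress until quiet for s seconds."""
    n: int                              # alarms allowed before suppression
    s: float                            # quiet period in seconds before clearing
    name: str = "FAN BAD"               # hypothetical alarm text
    count: int = 0                      # events seen in the current burst
    last_event: Optional[float] = None  # time of the most recent event

    def on_event(self, t: float) -> Optional[str]:
        """Handle one trap at time t (seconds); return alarm text or None."""
        self.count += 1
        self.last_event = t
        if self.count <= self.n:
            return self.name
        if self.count == self.n + 1:
            return f"SUPPRESSING {self.name}"
        return None  # already suppressing: drop the event silently

    def on_tick(self, t: float) -> Optional[str]:
        """Periodic check; returns a clear once the burst has been quiet for s seconds."""
        if self.last_event is None or t - self.last_event < self.s:
            return None
        was_suppressing = self.count > self.n
        self.count = 0
        self.last_event = None
        return f"(Clear) {self.name} HAS STOPPED" if was_suppressing else None


sup = AlarmSuppressor(n=2, s=120)
alarms = [sup.on_event(t) for t in (0, 10, 20, 30)]
# alarms == ["FAN BAD", "FAN BAD", "SUPPRESSING FAN BAD", None]
```

One design choice in this sketch: the event counter also resets after 's' quiet seconds when suppression never started, so a later isolated event is treated as a new burst.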
Do you get both the threshold breached and reset traps, or just threshold exceeded traps for temperature? If it is just the threshold breach, Spectrum will not generate multiple alarms; the events only update the existing alarm, unless a unique alarm is being generated for each of these events.
For the event counter rule to work, both the breached and reset events need to occur repeatedly until their count reaches some number, at which point a new threshold breached alarm is generated.
ex: alarm fan bad
alarm fan good
Both occur n times, then the new threshold event is generated with an alarm.
Thanks for your reply.
Unfortunately, what we were seeing was continuous fan alarms, once every 10 seconds. There was a temperature problem which wasn't resolved for multiple hours and we had multiple devices sending these alerts every 10 seconds with no resets.
SEC (Simple Event Correlator) has an easy way to handle this through the use of a context variable: one rule would generate an alarm only if a context (too_many_fan_bad) wasn't set. Another rule would look for a given number of FAN BAD events occurring within a set window of time; if found, it would set the context with a particular time to live. When the problem stopped, the context would be deleted and a new message could be sent.
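The context idea can be mimicked in a few lines of Python. This is a rough sketch of the mechanism only, not SEC rule syntax and not anything Spectrum provides; `ContextStore`, `run`, and the parameters `thresh`, `window`, and `ttl` are hypothetical names. It assumes each new event refreshes the context's time to live, so the context only expires once the traps stop.

```python
from typing import Dict, List


class ContextStore:
    """Named flags with a time to live, loosely modeled on SEC contexts."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def set(self, name: str, now: float, ttl: float) -> None:
        self._expires_at[name] = now + ttl

    def active(self, name: str, now: float) -> bool:
        return name in self._expires_at and now < self._expires_at[name]

    def reap(self, now: float) -> List[str]:
        """Delete contexts whose lifetime has ended and return their names."""
        dead = [n for n, exp in self._expires_at.items() if now >= exp]
        for n in dead:
            del self._expires_at[n]
        return dead


def run(times: List[float], until: float, thresh: int,
        window: float, ttl: float) -> List[str]:
    """Replay event timestamps (seconds) and return the messages produced."""
    ctx, recent, out = ContextStore(), [], []
    for t in times:
        out += ["FAN BAD ERRORS HAVE STOPPED" for _ in ctx.reap(t)]
        recent = [r for r in recent if t - r < window] + [t]
        if ctx.active("too_many_fan_bad", t):
            ctx.set("too_many_fan_bad", t, ttl)  # still noisy: extend lifetime
        elif len(recent) > thresh:
            ctx.set("too_many_fan_bad", t, ttl)  # too many events in the window
            out.append("SUPPRESS FAN BAD")
        else:
            out.append("FAN BAD")
    out += ["FAN BAD ERRORS HAVE STOPPED" for _ in ctx.reap(until)]
    return out


messages = run([0, 10, 20, 30], until=200, thresh=2, window=60, ttl=120)
# messages == ["FAN BAD", "FAN BAD", "SUPPRESS FAN BAD", "FAN BAD ERRORS HAVE STOPPED"]
```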
Thanks for using the CA Community! It looks like your question has been answered so we are closing this thread out. If you have additional questions, please feel free to contact us again.