I'm a newbie to Nimsoft and have been tasked with a deployment in my company.
We have a mix of AS400 and Linux hosts. All works well except that we occasionally get flooded with alerts since the probes do not have flap detection.
AS400 - the sysstat probe is configured to alert when processing > 80%. Checks are performed every 60s. The sysstat probe has no option for samples, so I cannot configure it to alert only after, say, 5 valid samples.
Linux - the processes probe is configured to monitor certain java processes. There is an option to alert when the process is down. Checks are performed every 60s. We have a nightly cron that restarts the java process, which takes over a minute to start up. We then get an alert that the process is down, and it clears in the next minute. There is no option for valid samples on the process being down.
Unless I am missing something, the only solution is to use the auto operator with a rule that marks these messages as invisible, plus a trigger that launches a script to count the alerts and then send out a custom alert? Is there another way of doing this?
How you handle those depends largely on how you use Nimsoft. In our case we mostly handle alarms in our ticket system, so we create tickets only for alarms that have been active for 5 minutes. In that setup it doesn't matter much if we get alarms with a short lifespan.
If you need to see an alarm in the console immediately when it occurs (which is the usual case), you'd have to go for NAS rules. You probably don't need scripting, as AO profiles have count detection as well.
With the processes probe, if you have regular maintenance restarts, you could build an exclude schedule in the probe itself and use it for the profile of the process that gets restarted.
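To illustrate the "only ticket alarms that have been active for 5 minutes" idea, here is a minimal Python sketch. It is not Nimsoft code and does not use any Nimsoft API; the `Alarm` class, its fields, and `needs_ticket` are hypothetical names made up for the example. It only shows the age check that filters out short-lived alarms.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alarm:
    # Hypothetical alarm record; the field names are assumptions for this sketch.
    alarm_id: str
    opened_at: datetime
    cleared_at: Optional[datetime] = None


def needs_ticket(alarm: Alarm, now: datetime,
                 min_age: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if the alarm is still open and at least min_age old."""
    if alarm.cleared_at is not None:
        # Already cleared (e.g. the process came back up), so no ticket.
        return False
    return now - alarm.opened_at >= min_age
```

With this kind of check, an alarm that opens during the nightly restart and clears a minute later never reaches the ticket system.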
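The exclude schedule itself is set up in the probe configuration, not in code, but the logic behind it is simple. A rough Python sketch, assuming a made-up restart window (the actual cron time is whatever your job uses); `in_exclude_window` is a hypothetical helper, not part of the probe:

```python
from datetime import time


def in_exclude_window(t: time, start: time, end: time) -> bool:
    """Return True if t falls in [start, end), including windows that wrap midnight."""
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, e.g. 23:55 to 00:10.
    return t >= start or t < end


# Assumed example window: restart at 02:00, allow 10 minutes for startup.
RESTART_START = time(2, 0)
RESTART_END = time(2, 10)


def should_alert(process_down: bool, t: time) -> bool:
    """Suppress 'process down' alerts while inside the maintenance window."""
    return process_down and not in_exclude_window(t, RESTART_START, RESTART_END)
```

The window should be a bit longer than the slowest startup you expect, otherwise the check right after the window can still catch the process down.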
I agree with jonhcw, it really depends on how you implemented alarm escalation at your company.
For example, we have a 24x7 NOC that monitors IM/UMP directly. They're only able to see visible Major and Critical alarms. So for situations like the ones you're describing, we have a NAS rule that turns the alarms invisible until the count reaches X, then another rule that changes the alarm back to visible so the Operators can see and respond to it. I tried to attach a couple of screenshots that show a basic example.
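In case the screenshots don't come through, the logic of that rule pair looks roughly like the Python sketch below. This is only an illustration of the count threshold, not NAS configuration or a NAS script; `AlarmState`, `on_alarm`, and `on_clear` are hypothetical names, and the threshold of 5 is just an assumed value for X.

```python
from dataclasses import dataclass


@dataclass
class AlarmState:
    # Hypothetical per-alarm state: how often it repeated and whether operators see it.
    count: int = 0
    visible: bool = False


def on_alarm(state: AlarmState, threshold: int = 5) -> AlarmState:
    """Count a repeated alarm; make it visible once the count reaches the threshold."""
    state.count += 1
    if state.count >= threshold:
        state.visible = True
    return state


def on_clear(state: AlarmState) -> AlarmState:
    """A clear resets the count and hides the alarm again."""
    state.count = 0
    state.visible = False
    return state
```

With 60s checks and a threshold of 5, a process that is down for one or two samples during a restart never becomes visible, while a process that stays down shows up after about 5 minutes.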