One of our users is asking for an interesting set of logic for processing BGP alarms. We already have thousands of standing alarms in our environment, and we are trying to cut that down to help people focus on the real problems. One of our big pain points with alarms is BGP backward transition traps. What my user would like us to do is:
Generate one alarm when a router reports a BGP backward transition.
Increase the Occurrence count as more are generated.
Keep track of the peer router identified in the BGP backward transition alarm.
As traps are received indicating that a peer has returned to Established, remove that peer from the list and decrease the occurrence count.
When the occurrence count transitions from 1 to 0, clear the alarm.
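The steps above can be sketched in a few lines of Python. This is only an illustration of the requested logic, not anything the product provides: the class name `BgpPeerAlarm`, its methods, and the peer addresses are all hypothetical, and it assumes the peer address has already been extracted from the trap.

```python
class BgpPeerAlarm:
    """Hypothetical model of one alarm per router that tracks which peers are down."""

    def __init__(self, router):
        self.router = router
        self.down_peers = set()   # peers that reported a backward transition
        self.active = False       # True while the single alarm is raised

    @property
    def occurrence_count(self):
        # Derived from the peer list, so it can go down as well as up.
        return len(self.down_peers)

    def backward_transition(self, peer):
        # Raise the alarm on the first peer, then just grow the list.
        self.down_peers.add(peer)
        self.active = True

    def established(self, peer):
        # Drop the recovered peer; clear the alarm when the list is empty.
        self.down_peers.discard(peer)
        if not self.down_peers:
            self.active = False
```

Because the peers are kept in a set, a peer that flaps repeatedly is counted once. That is a design choice in this sketch: the count here means "peers currently down", not "traps received".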
Maintaining a list of peer routers is not that difficult. I was looking to use Event Procedures to maintain an additional alarm attribute where we would keep the current peer router list.
However, the Occurrence count is where I'm stuck. I know that when a second alarm of the same type comes in that is not unique, it causes the count to increase. However, I don't know how to decrease the occurrence count. Is that even possible?
I'm assuming you already tried event discriminators for creating unique alarms. This way you only have one alarm up/alarm clear event. I'm not familiar with all the varbinds you get in the BGP alarms, but you might be able to do it this way.
I don't think occurrence can decrease.
You are right, the occurrence count only increases: it goes up each time the same event type occurs with the same set of varbind discriminator values (if any exist), until the alarm is cleared. That is its only purpose: it captures the total number of times the event occurred.
The requirement you are describing is something different. If we started decreasing the occurrence count, we would lose the actual total count.
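To make the difference concrete, here is a toy model of a count that only goes up until the alarm clears, next to a separate set that tracks current state. It is a sketch under assumptions, not the product's implementation: `OccurrenceCounter`, the event name string, and the discriminator value `router-a` are made up for illustration.

```python
from collections import defaultdict


class OccurrenceCounter:
    """Hypothetical counter keyed by event type plus discriminator values."""

    def __init__(self):
        self._counts = defaultdict(int)

    def record(self, event_type, discriminators=()):
        # Same event type and same discriminator values: bump the same count.
        key = (event_type, tuple(discriminators))
        self._counts[key] += 1
        return self._counts[key]

    def clear(self, event_type, discriminators=()):
        # Clearing the alarm is the only thing that resets the count.
        self._counts.pop((event_type, tuple(discriminators)), None)


counter = OccurrenceCounter()
down_peers = set()
for peer in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    total = counter.record("bgpBackwardTransition", ["router-a"])
    down_peers.add(peer)

down_peers.discard("10.0.0.2")  # one peer returns to Established
# total is still 3 (traps seen); len(down_peers) is now 2 (peers still down)
```

The two numbers answer different questions, which is why folding "peers still down" into the occurrence count would throw away the total.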
Thanks for confirming. I thought that we could only increase the occurrence count with each instance of the trap hitting us.
The problem with making each alarm unique based on the trap variable is that we get a large number of these messages and we already have an extremely large number of alarms in our environment. So we are looking at ways to make smarter alarms, not more alarms.
I have a way to deal with this. Instead of modifying the occurrence count, we will maintain a list that contains each instance, and the device owner will know that when the alarm clears, the list clears.
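A minimal sketch of that list-keeping, assuming the list is stored as a comma-separated string in a single custom alarm attribute. The helper names `add_peer` and `remove_peer` are hypothetical, and the real update would be done in an Event Procedure rather than in Python.

```python
def add_peer(attr, peer):
    """Return the attribute text with peer appended, unless already listed."""
    peers = [p for p in attr.split(",") if p]
    if peer not in peers:
        peers.append(peer)
    return ",".join(peers)


def remove_peer(attr, peer):
    """Return the attribute text with peer removed; empty string if none remain."""
    return ",".join(p for p in attr.split(",") if p and p != peer)
```

An empty string after `remove_peer` means no peers are listed, which is the point at which the alarm itself can be cleared.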