DX NetOps

Is it time for a policy-based handling of events in Spectrum?

    Posted 08-18-2015 07:48 AM

    Hi Community,

     

    After working with SPECTRUM for so many years and doing all kinds of magic with event rules and procedures, I'm kind of stuck on one issue.

    I can't find an easy way to tell Spectrum to handle a specific event in different ways, depending on our policy for that situation.

    Sounds a little confusing? Let me give an example:

     

    The small business customer

    We have a Cisco router at a small company office that provides internet access for our customer. If the device reports a fan failure (via SNMP trap), our internal policy tells us to raise a Major alarm, which is taken care of during regular business hours.

     

    The large enterprise customer

    Another customer, same kind of device (or ModelType), but this time it is a very important enterprise customer. In this case, if the device reports a fan failure (again via SNMP trap), we want to raise a Critical alarm so that NOC employees send out field-service staff right away to fix the issue.

     

    So basically SPECTRUM should handle the event based on a policy. Sure, I could use some internal attribute on the model and an event condition to tell SPECTRUM: "IF event 0x0 occurs AND attribute 0x1 == 1000 THEN create alarm 0x2 with severity Critical, ELSE IF event 0x0 occurs AND attribute 0x1 == 2000 THEN create alarm 0x2 with severity Major". But imagine you need to do this for a large number of events, such as BGP session alarms, power supply failures or port down events. Every event needs its own set of conditions for every policy, and you would end up in a mess.
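    To make the attribute-based workaround concrete, here is a minimal Python sketch of the logic such an event condition would encode. It is not Spectrum syntax; the codes 0x0, 0x1 and 0x2 are just the placeholders from the paragraph above, and handle_event is a made-up name.

```python
# Placeholder codes from the post, not real Spectrum event/attribute/alarm IDs.
FAN_FAILURE_EVENT = 0x0
POLICY_ATTRIBUTE = 0x1
FAN_FAILURE_ALARM = 0x2


def handle_event(event_code, model_attrs):
    """Return (alarm_code, severity) for an event, or None if no rule matches.

    model_attrs is a dict of attribute ID -> value, standing in for the
    attributes stored on the model that raised the event.
    """
    if event_code == FAN_FAILURE_EVENT:
        tier = model_attrs.get(POLICY_ATTRIBUTE)
        if tier == 1000:
            return (FAN_FAILURE_ALARM, "critical")
        elif tier == 2000:
            return (FAN_FAILURE_ALARM, "major")
    # Every further event (BGP session, power supply, port down, ...)
    # would need its own copy of this if/elif ladder.
    return None
```

    With E events and P ways to treat each of them, this approach needs roughly E x P hand-maintained branches, which is the mess described above.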

     

    "Policy-based event handling" could be a possible solution, similar to the way CA PC uses monitoring/threshold profiles. You would bundle event rules into an event policy and assign that policy to models. A model can be assigned to zero or more event policies, and each policy has a priority that decides which one wins if their event rules conflict. The policies get assigned via Global Collections. Think of the existing policies, but for events instead of attributes.
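    A rough Python sketch of how such a policy lookup could behave, just to illustrate the idea. Nothing here is an existing Spectrum feature: EventPolicy, resolve_severity, the model names, and the convention that the higher priority number wins are all assumptions made for the example, and the members set is only a stand-in for Global Collection membership.

```python
from dataclasses import dataclass, field


@dataclass
class EventPolicy:
    name: str
    priority: int   # assumption: the higher number wins on conflict
    rules: dict     # event name -> alarm severity
    members: set = field(default_factory=set)  # stand-in for a Global Collection


def resolve_severity(model, event, policies, default=None):
    """Pick the severity from the highest-priority policy that applies.

    A policy applies if the model is one of its members and it contains
    a rule for the event. Returns default if no policy applies.
    """
    candidates = [p for p in policies if model in p.members and event in p.rules]
    if not candidates:
        return default
    return max(candidates, key=lambda p: p.priority).rules[event]


# Hypothetical data: the enterprise router is a member of both policies.
smb = EventPolicy("small-business", 10, {"fan_failure": "major"},
                  {"router-smb-01", "router-ent-01"})
ent = EventPolicy("enterprise", 20, {"fan_failure": "critical"},
                  {"router-ent-01"})
policies = [smb, ent]
```

    Here resolve_severity("router-smb-01", "fan_failure", policies) gives "major", while the same event on "router-ent-01" gives "critical" because the enterprise policy has the higher priority. Adding a new event means adding one rule per policy instead of touching every event condition.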

     

    Has anyone out there run into this issue? Has anyone found a solution that still works if you have 50 landscapes and 100k devices with millions of events, and you need 10 different ways to handle a fan failure?

    I hope I can start a discussion about this. Maybe we can come up with an "idea" that I can sum up for CA and that gets supported by a lot of people from the community.

     

    Thanks, and I'm looking forward to getting some feedback.

     

    Regards,

    Jan