DX NetOps

  • 1.  Alert rule for Critical Devices

    Posted Mar 19, 2023 01:37 PM

    Hi Community,

    We have some critical devices in our environment, so any alarm raised on those devices should be triggered as a Critical alarm. Is this possible in Spectrum?



    ------------------------------
    Regards,

    Eshwar
    ------------------------------


  • 2.  RE: Alert rule for Critical Devices

    Posted Mar 20, 2023 06:35 AM

    You need to be more specific about the request. This is available by default in SPECTRUM: all sorts of alarms are created, based on various conditions of devices/interfaces/applications, conditions that are triggered either by SNMP traps or by internal checks and mechanisms.

    Normally, alarms are treated the same for all devices. So if an alarm is Major by default, it will be Major no matter which device it is asserted on. This can be customized to some degree, but it will not work for all alarm types.

    Each device has an attribute called Criticality. This attribute defines the way the system computes impact analysis.

    Here is some info on that. It would only apply when contact with the device is lost, so it should not apply to all alarms.

    You can try to create some Services and define custom Service Policies that would trigger Critical Service Alarms whenever certain types of alarms occur on that specific device.



    ------------------------------
    Cătălin Fărcășanu
    Senior Consultant
    SolvIT Networks
    ------------------------------



  • 3.  RE: Alert rule for Critical Devices

    Posted Mar 20, 2023 11:32 AM

    Hi Catalin,

    I understand that Spectrum can trigger Critical, Major and Minor alarms, but my concern is that we have a number of Critical, High and Medium devices, and we need the alarm severity to follow the device's priority, irrespective of the alarm type. For example, if the device is Critical we need Critical alarms, if the device is High we need Major alarms, and if the device is Medium we need Minor alarms.

    You mentioned the Criticality attribute: is it only applicable to loss of contact with the device, and not to other alarms?



    ------------------------------
    Regards,

    Eshwar
    ------------------------------



  • 4.  RE: Alert rule for Critical Devices

    Posted Mar 22, 2023 10:40 AM

    Hi Eshwar,

    It's possible, but I think the deciding point will be how big your managed network is and how many alarms you get. You could even isolate only the alarms that your operations team would have to address immediately. For example, say you are managing routers and you get BGP alarms. By default a BGP alarm is Major severity. If you want to change that to Critical because the router is a critical router for your company, then you would have to create a rule for the events that create that probable cause and make a new event with the same PCause but Critical severity. As you can imagine, this can turn into an administrative nightmare, but it is not impossible.

    Another approach can be through SANM if you have it. It will require scripting and maintaining a "lookup" list of inventory that will need special treatment. Make SANM run that script that will validate the device and modify the alarm via CLI to Critical, Major, whatever. Depending on how many alarms you get, it can take up resources and put strain on your server. But you can throttle that with Global Collections and selective alarms again.
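
    A minimal sketch of the "lookup" step described above, assuming the list is kept as a plain mapping from device name to tier. The device names, `DEVICE_TIERS`, `TIER_TO_SEVERITY` and `target_severity` are all hypothetical, and the sketch only decides which severity a device should get; it does not call SANM or the Spectrum CLI.

```python
# Hypothetical lookup list: device name -> business tier.
# In practice this might be loaded from a file maintained by the NOC.
DEVICE_TIERS = {
    "core-rtr-01": "Critical",
    "dist-sw-07": "High",
    "access-sw-22": "Medium",
}

# Tier -> desired alarm severity, following the mapping asked for in this thread.
TIER_TO_SEVERITY = {
    "Critical": "Critical",
    "High": "Major",
    "Medium": "Minor",
}


def target_severity(device_name, current_severity):
    """Return the severity an alarm on this device should carry.

    Devices that are not in the lookup list keep their current severity.
    """
    tier = DEVICE_TIERS.get(device_name)
    if tier is None:
        return current_severity
    return TIER_TO_SEVERITY[tier]


if __name__ == "__main__":
    print(target_severity("core-rtr-01", "Major"))
    print(target_severity("unlisted-device", "Major"))
```

    Keeping the list separate from the decision logic means the inventory can change without touching the script.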

    Another approach could be Policy Manager. There is an AlarmMgmt model type, and you could modify the Condition based on membership in a Global Collection. I am not sure this one will work, and I don't have a lab at this time to try it. But go for it; there might be more options in Policy Manager.

    Of the three options I feel the SANM approach might be better and maybe a little easier.  Hope this helps and give it a try. Let me know what you do since I might need to do that one day. 




  • 5.  RE: Alert rule for Critical Devices

    Posted Mar 22, 2023 04:22 PM

    At this moment I would say that Service Manager is the only way to go, with a custom Service Profile. That might work. You cannot have one alarm as Critical for one device and Major for another device.

    As Cesar says, event configuration can be a way of doing it. I did something similar for a set of devices, but only for one alarm, not for all alarms. For all alarms it is not manageable. Cesar says it's a nightmare; I can tell you for sure that it cannot be done. You cannot maintain it across upgrades either. Alarms are added with each new version of the software, so you'd have to review all alarm types. This is not feasible, so I would not recommend it.

    Depending on the number of alarms, modifying alarms via a SANM + AlarmNotifier instance could be a way. However, you cannot change the criticality level of an alarm from Minor to Major using CLI. You can modify attributes of alarms, but the criticality level of the alarm is defined in the Event Configuration. You also have to consider that this is single-threaded, and if the number of alarms exceeds a certain value, you might have delays induced by the processing of alarms. Generally, it would be better to handle this type of action using EventProcedures, based on memory-loaded attributes, but this requires event configuration, and thus managing a parallel set of events/alarms for all alarms that are created. It's a no brainer.
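
    To get a rough feel for the single-threaded delay mentioned above, here is a back-of-the-envelope sketch. The 0.5 seconds per alarm is purely an assumed figure for illustration, not a measured Spectrum value, and `worst_case_delay_seconds` is a hypothetical helper.

```python
def worst_case_delay_seconds(queued_alarms, seconds_per_alarm):
    """Delay seen by the last alarm when one worker handles alarms one at a time."""
    return queued_alarms * seconds_per_alarm


if __name__ == "__main__":
    # Assumed numbers: 5000 queued alarms, 0.5 s of script time per alarm.
    delay = worst_case_delay_seconds(5000, 0.5)
    print(f"{delay / 60:.1f} minutes")  # prints "41.7 minutes"
```

    Under these assumptions the last alarm in a burst would wait well over half an hour before it is processed, which is why limiting the set of alarms handled this way matters.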

    I would try to understand better what the customer wants and try to steer him in the right direction. It's not always good to answer yes to every customer request. You might take a step back and try a different approach that would provide similar results. Why would someone want to have all alarms for a certain device as Critical only? This defeats the purpose of having different values for criticality. All alarms would be the same, none more important than another, so the NOC cannot really tell what they should address first. That's the reason I recommended using Service Manager. You can define various levels of criticality for all Services and decide, based on service alarm impact, which alarms have to be addressed first.

    They don't seem to understand very well the concepts of Root Cause Analysis and Impact Analysis. Spectrum should be used to reduce the number of alarms, not as a notification server. Basically this is what setting only one alarm criticality per customer does: it sends one type of notification per customer. You're better off sending an email with CRITICAL in the title for all alarms on a certain customer device, no matter what is displayed in the OneClick interface.

    How do you manage 5000+ active alarms? 



    ------------------------------
    Cătălin Fărcășanu
    Senior Consultant
    SolvIT Networks
    ------------------------------



  • 6.  RE: Alert rule for Critical Devices

    Posted Mar 23, 2023 09:33 AM

    As an alternative to changing alarm criticality, which as others have pointed out would be a lot of effort to set up, maintain, etc., you could consider using the Criticality attribute and filtering on it in the Alarm View. That way you could set up a filter that will only show alarms for 'Critical Devices'.

    You would need to set your critical devices to have a specific value for Criticality, say 20.

    The Critical Alarm filter will still show all alarms as Minor, Major or Critical, but will only show alarms for devices with a Criticality of 20. As with any filter, this could be set up to filter out specific alarms that are not of interest.
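
    The effect of such a filter can be sketched outside Spectrum as follows. This is only an illustration of the logic: the `Alarm` record, its field names and the sample data are hypothetical, and the real filter is configured in the Alarm View rather than in code.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    device: str
    severity: str            # unchanged: Minor, Major or Critical
    device_criticality: int  # value of the device's Criticality attribute


def critical_device_alarms(alarms, criticality=20):
    """Keep only alarms whose device has the given Criticality value."""
    return [a for a in alarms if a.device_criticality == criticality]


if __name__ == "__main__":
    alarms = [
        Alarm("core-rtr-01", "Major", 20),
        Alarm("access-sw-22", "Critical", 1),
        Alarm("core-rtr-01", "Minor", 20),
    ]
    for alarm in critical_device_alarms(alarms):
        print(alarm.device, alarm.severity)
```

    Note that the severities are left untouched; only the set of alarms shown changes.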

    One other suggestion is that if you have a Service Desk tool integrated, such as ServiceNow, you could do your device/site priority setting in there.

    Hope that helps.




  • 7.  RE: Alert rule for Critical Devices

    Posted Mar 27, 2023 06:57 AM

    Thank you for your response.

    That's great news, we have some options to tweak the alarms. Let me try them and I will come back.



    ------------------------------
    Regards,

    Eshwar
    ------------------------------