

Bad Link x Device not responding to polls - Correlation

  • 1.  Bad Link x Device not responding to polls - Correlation

    Posted Feb 04, 2020 07:57 AM
    Edited by Marcelo Zacchi Feb 04, 2020 07:58 AM
    Fellow Spectrum admins,

    In the situation below, shouldn't I expect Spectrum to have suppressed the "Bad Link" alarm?

    Best regards,


    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------


  • 2.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 04, 2020 09:24 AM
    GeneratePortStatusAlarms 0x12a54 --> Generates a "Bad Link Detected" alarm if the port is polled.

    Check whether the "PollPortStatus" attribute is enabled at both the port level and the device level, and also check the
    GeneratePortStatusAlarms attribute at the port level.

    ------------------------------
    Thank you.
    Rajashekar
    ------------------------------



  • 3.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 04, 2020 09:45 AM
    Hi Rajashekar,

    I have a policy that sets this attribute to Yes if the port is connected to another model in Spectrum. I do want to monitor that port, but if the device adjacent to it is down, I would like the "Bad Link" alarm to be suppressed by the "Device not responding" one. Does that make sense?

    Regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------



  • 4.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 04, 2020 10:24 AM
    I think it is the other way around: the suppressed device-down alarm sits under the Bad Link alarm.

    ------------------------------
    Senior Consultant
    SolvIT Networks
    ------------------------------



  • 5.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 04, 2020 11:10 AM
    Edited by Rajashekar Allala Feb 04, 2020 12:21 PM
    From the screenshot, I see that the Bad Link Detected alarm comes from a custom event based on event 0x00010d11. It seems fault isolation is broken.

    https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/spectrum/10-4/managing-network/event-configuration/event-and-alarm-customization.html

    ------------------------------
    Thank you.
    Rajashekar
    ------------------------------



  • 6.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 04, 2020 12:51 PM
    Edited by Rajashekar Allala Feb 04, 2020 12:52 PM
    Please verify the settings below as well. They control port fault correlation.

    Go to VNM model --> Fault isolation  --> Port Fault Correlation --> All connected ports

    Under Live Pipes --> Suppress Linked Port Alarms --> True

    See the documentation for more information:

    https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/spectrum/10-4/managing-network/modeling-and-managing-your-it-infrastructure/fault-management/port-fault-correlation.html

    ------------------------------
    Thank you.
    Rajashekar
    ------------------------------



  • 7.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 05, 2020 09:39 AM
    Hi Rajashekar,

    Thanks for providing this info. The event was modified to raise another one containing the interface name in the alarm title. I will see whether I can alter it in some other way.
    Thankfully, this is the only event from the list at that URL that I have modified.
    The VNM settings are consistent with what I have today.

    Best regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------



  • 8.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 05, 2020 02:51 AM
    Hi Marcelo
    The purpose of live links is to alert you when redundant links are down while a device is still responding to polls. Without this feature you could be blissfully ignorant of a meshed link or a load-balanced link being down.
    There is no purpose in setting live links on a simple spoke connection. By definition, if you have live links enabled on the connection and polling to the spoke device fails, you will get a port alarm, as that is the closest fault to a device in an up state.
    I would remove all live links except those used to monitor redundant/load-balanced/trunk links.
    Regards
    Stephen.






  • 9.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 05, 2020 09:42 AM
    Hi Stephen,

    Agreed, that makes perfect sense, but since, as per my current policy, all live-links are monitored, I expected the link alarm to suppress the device down alarm or the other way around.

    Best regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------



  • 10.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 05, 2020 10:01 AM
    Hi Marcelo
    That's sort of my point: using a policy to set all links to live links isn't something I would personally recommend. I can't think of an automation/policy rule to stop/remove the live links, or change the alerting behaviour, for just the spoke connections.
    Regards
    Stephen.






  • 11.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 06, 2020 06:13 AM
    Hi Stephen,

    I can't either, but I will certainly spend some time considering that.
    One question, though: let's say I have a device that is connected to another 5 devices, and all the links are live links.
    If it goes down, I should expect, using the current logic, to have 6 alarms (1 device down + 5 bad links), right?
    At the same time, if I put that single device in maintenance beforehand, I will not get any alarms. So it seems to me that Spectrum uses one suppression logic for the device not responding to polls and another for bad link alarms when one of the peers is in maintenance.
    It could be that, as @Rajashekar Allala suggested, this is related to the fact that I have modified the 0x10d11 alarm to add the interface name to its title.
    I will try to simulate this condition in the lab and see what happens.

    Best regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------



  • 12.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 06, 2020 07:20 AM
    Hi Marcelo
    Interesting. Yes, if I remember correctly, what you say is correct, i.e. if all the live links to a device go down, then the device itself is the root cause alarm. This must be included in the root cause algorithm. If you think about it, this is similar to how a fanout behaves.
    I had a thought or two about how you might be able to work around this.
    How about setting an unused attribute to a value that signifies the device is an edge device with a single connection? These could be collected in a GC and the GC used in your live links policy. What do you think?
    Regards
    Stephen.








  • 13.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 06, 2020 07:22 AM
    Hi Marcelo
    Sorry, to be clear - use the attribute to exclude edge devices from the GC.
    Regards
    Stephen





  • 14.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 06, 2020 10:08 AM
    Hi Stephen,

    Yes, I understood you and it is a good idea. I found out that the first number in the "NeighborList" attribute gives the number of neighbours the device has:

    0 neighbor: blank
    1 neighbor: 1.0.0.0.134.55.0.50
    1 neighbor: 1.0.0.0.82.219.0.20
    1 neighbor: 1.0.0.0.167.0.0.20
    2 neighbors: 2.0.0.0.225.25.0.80.164.26.0.80
    2 neighbors: 2.0.0.0.170.21.0.70.172.21.0.70

    I won't have to use a custom attribute for that. Next steps would be to create a GC with the devices with only 1 neighbor and another one with the interfaces of these devices. Then exclude these interfaces from the live-link GC. Does that make sense to you?
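
As a rough illustration of the observation above, the neighbour count could be read from an exported NeighborList value as sketched here. This is only a sketch: it assumes the attribute has already been exported as a dotted string and that the first field is the neighbour count, as the samples in this post suggest. The function names and device names are hypothetical, and nothing here calls a Spectrum API.

```python
# Sketch only: assumes NeighborList was exported as a dotted string and that
# its first dotted field is the neighbour count (as observed in this thread).

def neighbor_count(neighbor_list: str) -> int:
    """Return the count encoded in the first field, or 0 for a blank value."""
    value = neighbor_list.strip()
    if not value:
        return 0
    return int(value.split(".", 1)[0])


def single_neighbor_devices(devices: dict[str, str]) -> list[str]:
    """Names of devices whose NeighborList reports exactly one neighbour."""
    return sorted(name for name, raw in devices.items() if neighbor_count(raw) == 1)


# Hypothetical device names; the values are the samples quoted in the post.
samples = {
    "edge-a": "1.0.0.0.134.55.0.50",
    "edge-b": "1.0.0.0.82.219.0.20",
    "core-1": "2.0.0.0.225.25.0.80.164.26.0.80",
    "lab-x": "",
}
spokes = single_neighbor_devices(samples)
```

The devices in `spokes` would be the candidates for the "only 1 neighbor" GC. Whether the first field is always the count would need to be confirmed against the product documentation.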

    I'm done for the day, but will work on that tomorrow and keep this thread updated.

    Best regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------



  • 15.  RE: Bad Link x Device not responding to polls - Correlation
    Best Answer

    Posted Feb 06, 2020 11:21 AM
    Hi Marcelo
    That sounds promising. The only complication I can think of is that it will still leave the uplink port on the upstream device, say the core/distribution switch, set to live links. The port alarm would therefore still be raised on the ‘near end’ port, suppressing the ‘far end’ device alarm. Agh! lol.
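
The complication can be seen in a toy model. The sketch below is not Spectrum's logic or API. The topology, device names, and port names are all hypothetical, and it only assumes that the interfaces of single-neighbour devices are dropped from the live-link set while every other connected port stays in it.

```python
from collections import defaultdict
from typing import NamedTuple


class Port(NamedTuple):
    device: str
    name: str


# Hypothetical topology: a core/distribution triangle plus one spoke (edge-a).
links = [
    (Port("core-1", "Gi1/0/1"), Port("core-2", "Gi1/0/1")),
    (Port("core-1", "Gi1/0/2"), Port("dist-1", "Gi0/1")),
    (Port("core-2", "Gi1/0/2"), Port("dist-1", "Gi0/2")),
    (Port("core-1", "Gi1/0/3"), Port("edge-a", "Gi0/1")),
]


def spoke_devices(links):
    """Devices that have exactly one distinct neighbour."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a.device].add(b.device)
        neighbours[b.device].add(a.device)
    return {dev for dev, peers in neighbours.items() if len(peers) == 1}


def near_end_ports_left_live(links):
    """Ports that stay live links although their peer sits on a spoke device."""
    spokes = spoke_devices(links)
    left = set()
    for a, b in links:
        for local, remote in ((a, b), (b, a)):
            if local.device not in spokes and remote.device in spokes:
                left.add(local)
    return left


remaining = near_end_ports_left_live(links)
```

In this model `remaining` holds the core-1 port facing edge-a, which is the ‘near end’ port that would also have to be excluded from the live-link GC.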
    Regards
    Stephen







  • 16.  RE: Bad Link x Device not responding to polls - Correlation

    Posted Feb 07, 2020 07:22 AM
    Hi Stephen,

    Good point. Back to the drawing board :)

    Best regards,

    ------------------------------
    Marcelo Zacchi
    CA Spectrum consultant
    Nets Denmark
    ------------------------------