Brocade Fibre Channel Networking Community

Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger.

  • 1.  Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger.

    Posted 03-11-2020 06:55 AM
    We have a small environment with two hosts running in an active/passive setup connected via two Brocade Model 6505 switches to a 3PAR 8200.

    Last night we got the Subject error message on both Switches and the host lost connectivity to its disks.

    Can anyone explain to me what could have caused this?

    What can we do to prevent this from happening again?


  • 2.  RE: Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger.

    Posted 03-11-2020 01:20 PM
    Stefan,
    It looks like lost credits were detected on port 0. This can be caused by physical layer issues. You should check the error counters on that port and find the root cause of the lost credits. If you have a Fabric Vision license installed, you can also use MAPS to fence the port automatically in cases like this, which makes it easier for the host's multipathing driver to move the load to the healthy link.


    ------------------------------
    Senior Systems Engineer
    Broadcom - Brocade Storage Networking
    ------------------------------



  • 3.  RE: Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger.

    Posted 03-12-2020 03:20 AM
    Hi Thomas,

    When you say physical layer I assume you refer to SFP and cables. I do not observe major error counts on the switches especially not at P0.

    Why would it also happen on both switches at exactly the same time?

    It's almost like both switches reset as the RH Linux host lost connectivity to the disk.

    Stefan


  • 4.  RE: Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger.

    Posted 03-12-2020 05:05 AM
    @StefanDB007  If it's not related to cabling/SFPs, then troubleshooting why the link resets occurred requires more data. It would be best to open a support ticket to get the supportsaves analyzed.

    ------------------------------
    Senior Systems Engineer
    Broadcom - Brocade Storage Networking
    ------------------------------



  • 5.  RE: Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger.

    Posted 03-12-2020 06:42 AM

    I do not observe major error counts on the switches especially not at P0.

    The qualifier 'major' in your sentence suggests that there are still some errors. It doesn't necessarily take a lot of them to result in link resets (LRs).

    Why would it also happen on both switches at exactly the same time?

    Because what you're seeing on the switches is a symptom of another problem, not the cause. For both to register issues at exactly the same time would point to an issue with the end device.

    Last night we got the Subject error message on both Switches and the host lost connectivity to its disks.

    I think you have the order of events kind of the wrong way round. What you've experienced is a timeout condition between your switches and whatever resides on port 0. The switches have then issued a link reset in an attempt to re-establish connectivity.

    At this point, I would be looking for why the device on port 0 had stopped communicating and caused the switch ports to link reset, not the other way round.




  • 6.  RE: Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger.

    Posted 03-12-2020 07:18 AM
    Hi Calvin,

    We look after a rather big environment with a SAN consisting of two fabrics, each with 7 directors and about 3000 nodes. We often (due to bad cabling) see devices with ITW counts in excess of 1,000,000.

    So to put things in context, the above I would describe as major.

    The issue is seen on a little "island" environment with two 6505 switches, a 3PAR and two hosts.

    Unfortunately I reset the stats and I have no recollection of prior error count. Also that count was for a 6 month period.

    Nevertheless, P0 on both 6505s is connected to the 3PAR. I will monitor events and investigate what happened on the 3PAR.

    Last question, and excuse me for asking this, but can we be sure that S0,P0 is indeed port 0 and not, for instance, the ASIC?

    Thanks for your input.
    Stefan


  • 7.  RE: Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger.

    Posted 03-12-2020 10:04 AM

    We look after a rather big environment with a SAN consisting of two fabrics, each with 7 directors and about 3000 nodes. We often (due to bad cabling) see devices with ITW counts in excess of 1,000,000.

    So to put things in context, the above I would describe as major.

    Yeah, I'd call that major too :) The point I was trying to make is that you don't necessarily need 'major' to experience an outage. Just because you're not seeing 1,000,000 errors in this instance doesn't discount the possibility that you still have/had a problem.

    Unfortunately I reset the stats and I have no recollection of prior error count. Also that count was for a 6 month period.

    If you've got access to a BNA instance, you should be able to graph historical errors on the associated ports.

    Nevertheless, P0 on both 6505s is connected to the 3PAR. I will monitor events and investigate what happened on the 3PAR.

    If you haven't done so already, I would raise a case with 3PAR. The switches didn't receive R_RDYs on either of the connected ports for a period that exceeded E_D_TOV. Getting low-level analysis from the array side as to why that might have happened is key.
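
    To illustrate the credit timeout described above, here is a toy model in Python. It is only a sketch under stated assumptions, not how Fabric OS implements it: the function name, the event format and the 2000 ms E_D_TOV value (a commonly quoted default) are all assumptions made for illustration. Each frame sent consumes one buffer-to-buffer credit, each R_RDY returns one, and a reset is flagged if the port sits at zero credits for E_D_TOV.

```python
from typing import Iterable, Optional, Tuple

E_D_TOV_MS = 2000  # assumption: commonly quoted default, check your fabric settings


def first_credit_timeout(
    events: Iterable[Tuple[int, str]],
    bb_credit: int,
    end_ms: int,
    e_d_tov_ms: int = E_D_TOV_MS,
) -> Optional[int]:
    """Return the time (ms) at which credits have been zero for e_d_tov_ms.

    events: (time_ms, kind) tuples sorted by time, kind is "frame" or "r_rdy".
    end_ms: end of the observation window.
    Returns None if no timeout occurs inside the window.
    """
    credits = bb_credit
    zero_since: Optional[int] = None
    for time_ms, kind in events:
        # Did the starvation timer expire before this event arrived?
        if zero_since is not None and time_ms - zero_since >= e_d_tov_ms:
            return zero_since + e_d_tov_ms
        if kind == "frame":
            if credits == 0:
                continue  # no credit available, the frame has to wait
            credits -= 1
            if credits == 0:
                zero_since = time_ms  # start of credit starvation
        elif kind == "r_rdy":
            credits = min(credits + 1, bb_credit)
            zero_since = None  # a credit came back, timer stops
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    # Still starved at the end of the window?
    if zero_since is not None and end_ms - zero_since >= e_d_tov_ms:
        return zero_since + e_d_tov_ms
    return None


# Device stops returning R_RDYs after t=500 ms.
stalled = [(0, "frame"), (1, "frame"), (500, "r_rdy"), (600, "frame")]
# Device returns a credit for every frame.
healthy = [(0, "frame"), (1, "r_rdy"), (2, "frame"), (3, "r_rdy")]

print(first_credit_timeout(stalled, bb_credit=2, end_ms=3000))
print(first_credit_timeout(healthy, bb_credit=2, end_ms=5000))
```

    In the stalled case the port runs out of credits and never gets one back, which is the condition that leads the switch to issue a link reset. The cause still has to be found on the device side.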

    Last question, and excuse me for asking this, but can we be sure that S0,P0 is indeed port 0 and not, for instance, the ASIC?

    S0,P0 is short for 'Slot 0, Port Index 0'. As these are 6505s, slot 0 is effectively the switch and the port index is the same as the port number (i.e., 0). For a port on a director, you might see something like S12,P122.
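
    If you want to pull these fields out of the log line in a script, a minimal Python sketch could look like the following. The regular expression is derived only from the message in the subject line. The function and field names are made up for illustration, and treating the number in parentheses as a blade port number is an assumption.

```python
import re
from typing import NamedTuple, Optional


class LinkResetEvent(NamedTuple):
    slot: int          # S<n>
    port_index: int    # P<n>
    blade_port: int    # number in parentheses (assumed meaning)
    vc_no: int
    credits_lost: int


_LINK_RESET = re.compile(
    r"Link Reset on Port S(?P<slot>\d+),P(?P<index>\d+)\((?P<blade>\d+)\)\s+"
    r"vc_no=(?P<vc>\d+)\s+crd\(s\)lost=(?P<lost>\d+)"
)


def parse_link_reset(message: str) -> Optional[LinkResetEvent]:
    """Extract slot, port index, VC and lost credits, or None if no match."""
    m = _LINK_RESET.search(message)
    if m is None:
        return None
    return LinkResetEvent(
        slot=int(m["slot"]),
        port_index=int(m["index"]),
        blade_port=int(m["blade"]),
        vc_no=int(m["vc"]),
        credits_lost=int(m["lost"]),
    )


event = parse_link_reset(
    "Link Reset on Port S0,P0(0) vc_no=0 crd(s)lost=80 auto trigger."
)
print(event)
```

    For the message in this thread that gives slot 0, port index 0, VC 0 and 80 credits lost.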