Brocade Fibre Channel Networking Community


creditrecovmode settings and blade faulting

  • 1.  creditrecovmode settings and blade faulting

    Posted 08-03-2015 02:31 AM



    I'm looking for some general guidelines and recommendations about the "creditrecovmode" settings for Brocade directors (credit recovery of backend ports).


    According to the documentation, the recovery behaviour depends on whether you use the option "--cfg onLrOnly" or the option "--cfg onLrThresh":


    • onLrOnly:
      When it detects credit loss, it performs a link reset.
      If the link reset fails to recover the port, the port reinitializes.
      If the port fails to reinitialize, the port is faulted.
      If a port is faulted and there are no more online backend ports in the trunk, the core blade is faulted.
      (Note that the port blade will always be faulted.)
    • onLrThresh:
      Recovery is attempted through repeated link resets and a count of the link resets is kept.
      If more link resets occur per hour than the configured threshold value (set with the -lrthreshold option), the blade is faulted (RAS Cx-1018).
      (Note that regardless of whether the link reset occurs on the port blade or on the core blade, the port blade is always faulted.)
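
    To make the two escalation paths above concrete, here is a minimal Python sketch of how I read them. It is not FOS code: the function names, the `BackendPort` fields, and the sliding one-hour window are assumptions for illustration only (the documentation excerpt does not say whether the hour is a sliding or a fixed window). The notes about which blade ends up faulted are deliberately left out, because the excerpt does not spell out exactly when they apply.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BackendPort:
    """Hypothetical model of a backend port that has lost credit."""
    name: str
    link_reset_recovers: bool = True   # does a link reset clear the credit loss?
    reinit_recovers: bool = True       # does a port reinitialization clear it?


def on_lr_only(port, other_online_ports_in_trunk):
    """Actions taken for one credit-loss event under onLrOnly (as quoted above)."""
    actions = ["link reset"]
    if port.link_reset_recovers:
        return actions
    actions.append("port reinitialize")
    if port.reinit_recovers:
        return actions
    actions.append("port faulted")
    if other_online_ports_in_trunk == 0:
        actions.append("core blade faulted")
    return actions


def on_lr_thresh(reset_times_s, threshold, window_s=3600):
    """Return the time (seconds) at which the blade would be faulted, or None.

    Assumes a sliding window: the blade is faulted as soon as MORE than
    `threshold` link resets fall within any `window_s` seconds.
    """
    recent = deque()
    for t in reset_times_s:
        recent.append(t)
        # Drop link resets that are older than the window.
        while t - recent[0] >= window_s:
            recent.popleft()
        if len(recent) > threshold:
            return t
    return None
```

    For example, with a threshold of 2, three link resets within twenty minutes would fault the blade in this model, while three link resets spread over more than two hours would not.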


    What exactly is meant by "the port blade is faulted" or "the core blade is faulted"?


    Does this mean that:

    • The blade will be marked as faulty, but will continue operations

    OR does this mean that:

    • FOS will disable the port blade or core blade completely (so none of the ports on the blade are usable anymore)?


    I am a bit cautious about implementing this setting. I wouldn't want an entire blade to be disabled by FOS without manual action from a SAN admin, even if credit loss has been detected.


    Anyone with experience in this matter?


  • 2.  Re: creditrecovmode settings and blade faulting

    Posted 08-04-2015 09:24 AM

    "The blade is faulted" really means that the blade will stop operating. Yes, that sounds a bit scary, but I've never seen it happen. I just double-checked my company's knowledge base and found only a handful of cases (7 or 8) that mention the respective error codes, and those are worldwide numbers. I didn't dig into these cases, so it is quite possible that not all of them involve a blade that was actually faulted; some may simply describe what would happen.
    My own take on this setting: I always implement it with "onLrOnly", in line with the Brocade (and our tech support) best practices:

    From what I understand, a lost credit should mean that the "rdy" primitive was lost. To me, if you are losing such small things over the internal ports, there's a high risk that you are already transmitting garbage due to some low-level issue, either an ASIC failure or a bad copper contact. It is really better to turn off the faulty component. In the first case it needs to be replaced; in the latter case it has to be reseated.
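
    To illustrate why a lost "rdy" matters, here is a small Python sketch of credit-based flow control. It is a simplified model under my own assumptions, not how the ASIC implements it: the class and function names are made up, and frames are sent one at a time, whereas a real link keeps several frames in flight.

```python
class Transmitter:
    """Hypothetical sender that may only transmit while it holds credits."""

    def __init__(self, credits):
        self.credits = credits

    def can_send(self):
        return self.credits > 0

    def send_frame(self):
        if self.credits == 0:
            raise RuntimeError("no credits left, transmitter is stalled")
        self.credits -= 1          # each frame sent consumes one credit

    def receive_rdy(self):
        self.credits += 1          # each ready primitive returns one credit


def run(initial_credits, frames, lost_rdy_on):
    """Send up to `frames` frames; the ready primitive for every frame index
    in `lost_rdy_on` is lost on the way back.

    Returns (frames actually sent, credits remaining at the end).
    """
    tx = Transmitter(initial_credits)
    sent = 0
    for i in range(frames):
        if not tx.can_send():
            break                  # link is stuck until credits are recovered
        tx.send_frame()
        sent += 1
        if i not in lost_rdy_on:
            tx.receive_rdy()
    return sent, tx.credits
```

    In this model every lost "rdy" permanently lowers the available credit by one, and once the count reaches zero the sender stalls until something (such as a link reset) restores the credits.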

    Anyway, your SAN design should be "dual redundant", and then your host multipathing software should find other paths to transfer the data.