I'm looking for some general guidelines and recommendations about the "creditrecovmode" settings for Brocade directors (credit recovery of backend ports).
According to the documentation, these are the action mechanisms, depending on whether you use option "--cfg onLrOnly" or option "--cfg onLrThresh":
What exactly is meant by "the port blade is faulted" or "the core blade is faulted"?
Does this mean that:
OR does this mean that:
I am a bit cautious about implementing this setting. I wouldn't want an entire blade to be disabled by FOS without manual action from a SAN admin, even if credit loss has been detected.
Anyone with experience in this matter?
"the blade is faulted" really means that the blade will stop operating. yes, that sounds a bit scary. but i've never seen this happened. and now i double checked my companys knowledge base to see that there are only a very few (7 or 8) cases that mention the respective error codes (think of it - that's worldwide numbers). moreover, i didn't dig into these cases, so it is fairly possible that not all of them are talking about the blade was really faulted, but some of them may just explain what happens when etc... etc...my own reception of this setting - i always implement it with "onlronly" according to the brocade (and our tech support) best practices:
From what I understand, a lost credit should mean that the "RDY" primitive was lost. To me, if you are losing such small things over the internal ports, there's a high risk that you are already transmitting garbage due to some low-level issue: either an ASIC failure or a bad copper contact. It really is better to turn off the faulty component. In the first case it needs to be replaced; in the latter case it has to be reseated.
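To illustrate why a lost "RDY" matters, here is a minimal sketch of credit-based flow control in Python. It is only a toy model, not how FOS or the ASICs actually work: the class `CreditedLink`, the function `simulate` and the parameter `lost_rdy_every` are hypothetical names made up for this example. The sender may only transmit while it holds a credit, each returned RDY gives one credit back, and a link reset restores the full credit count.

```python
class CreditedLink:
    """Toy model of a sender's view of a credit-based link (hypothetical)."""

    def __init__(self, credits):
        self.negotiated = credits   # credits agreed on at link initialization
        self.available = credits    # credits the sender currently holds

    def send_frame(self):
        """Consume one credit; return False if the sender is stalled."""
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def receive_rdy(self):
        """The receiver returned a credit (an RDY primitive arrived)."""
        self.available = min(self.available + 1, self.negotiated)

    def link_reset(self):
        """A link reset restores the full negotiated credit count."""
        self.available = self.negotiated


def simulate(credits, frames, lost_rdy_every):
    """Try to send `frames` frames; every `lost_rdy_every`-th RDY is lost.

    Returns the link and the number of frames sent before it stalled.
    """
    link = CreditedLink(credits)
    sent = 0
    for i in range(1, frames + 1):
        if not link.send_frame():
            break                   # no credits left: the link is stuck
        sent += 1
        if i % lost_rdy_every != 0:
            link.receive_rdy()      # normal case: credit comes back
        # otherwise the RDY is lost and the credit never returns
    return link, sent


link, sent = simulate(credits=4, frames=100, lost_rdy_every=10)
print(sent, link.available)         # frames sent before stalling, credits left
link.link_reset()
print(link.available)               # credits after the reset
```

In this model each lost RDY removes one credit for good, so the link slowly loses throughput and finally stalls until a link reset recovers the credits. The reset does not fix whatever is corrupting the primitives in the first place, which is the point made above.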
Anyway, your SAN design should be "dual redundant", and then your host multipathing software should find other paths to transfer the data.