I have two ISLs on my switch in a trunk. The telco provider sometimes fails the circuit over on one ISL, changing its distance from 45km to 58km or back again.
When they do this the whole trunk hangs and we lose IO down the ISLs. I know a trunk is sensitive to distance and that its member links must be very close in length.
I would like to know how I can stop the trunk from crashing when the circuits fail over. At the moment we have had to disable trunking to mitigate this, but it is not ideal.
It sounds like you have a SPOF (the telco) in the design over which you do not have any control.
If the circuit failover does not happen on both ISLs at once, you are momentarily trying to trunk two ISLs which are incompatible distance-wise.
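To put a rough number on that mismatch, here is a minimal sketch of the one-way delay difference between the two circuits. It assumes light travels through fibre at about 5 microseconds per km; the function names are made up for illustration, and real circuits add telco equipment latency on top of this.

```python
# Rough propagation-delay skew between two ISLs of different length.
# Assumption: about 5 microseconds of one-way delay per km of fibre.
US_PER_KM = 5.0


def one_way_delay_us(distance_km: float) -> float:
    """Approximate one-way propagation delay in microseconds."""
    return distance_km * US_PER_KM


def skew_us(distance_a_km: float, distance_b_km: float) -> float:
    """Absolute difference in one-way delay between two links."""
    return abs(one_way_delay_us(distance_a_km) - one_way_delay_us(distance_b_km))


if __name__ == "__main__":
    # The 13 km difference between the 45 km and 58 km circuits
    print(f"skew: {skew_us(45, 58)} us")
```

Compare the result with whatever skew tolerance your switch vendor documents for trunk members.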
If the circuit failover happens simultaneously on both ISLs, this will hurt the fabric, as it segments, goes through a principal switch selection, then rejoins, and another election takes place.
If you have no control over when a circuit failover happens and which ISLs are impacted, I would plan an alternative in which the fabric will not segment, whether that will be dark fiber, FCIP or a second telco provider, and disable trunking (as you already did).
I have no control over the circuit failover. We have asked the telco to pin the one fabric to the 45km circuit and the other to the 58km circuit.
Is there no way to get the trunking to dissolve automatically when the difference in distance changes? Cheers, Rick.
Maybe I misunderstand you, but the trunk does dissolve if the distance changes (for sure if a short offline/online action takes place).
If trunking is required in your environment you could expand to 4 ISLs per fabric.
Two ISLs would run over the 45km circuit, the other two over the 58km one.
This will result in:
-more bandwidth between the switches
-more resilience to link failover (may also depend on your telco's setup)
-a better fan in/out ratio
It might be too expensive to arrange; that said, not being able to work (or worse) costs money as well.
Usually, when a telco system does protection/failover switching, it takes less than 50ms, so you only see a short loss of sync and you lose a few bytes, R_RDYs, etc.
The trunk deskew values (and all other settings) will not be updated, and buffer credits might be lost.
You have to tell the telco guys to configure (if possible) their gear to pass a 'loss of light' on to your switch ports.
Then, if you do not have the loss_TOV setting enabled, the ports will bounce and you have a stable config again.
Of course you should set the buffer credits for the longer link.
Be aware: if these are the only links between your switches, this might lead to a fabric separation...
Hope this helps
The major issue is that, irrespective of the LOS_TOV setting, there will most likely always be a discrepancy in distance, and the worst part is that it is out of the customer's control. This is simply unacceptable.
Even with the los_tov setting disabled, the link will still bounce, causing frame drops. When the distance changes, the trunk will dissolve into two separate ISLs only once it has detected a loss of light.
The best way forward is to remove the trunk option from these two links. This way there will be no discrepancy in deskew values, simply because deskew is irrelevant without trunking, and the los_tov setting can be enabled. So even when there is a sync timeout of nearly 100ms the link will remain up, and the only difference will be a change in distance. If you take the average frame size into account and size the buffers for the longest distance, you can still maintain optimum performance irrespective of which distance the telco provides.
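As a back-of-the-envelope illustration of sizing buffers for the longer circuit, the sketch below estimates how many buffer-to-buffer credits keep a link full. It is an approximation under stated assumptions, not the switch's own calculation: about 5 microseconds per km of fibre, 8b/10b encoding (10 line bits per byte, so 1/2/4/8G links only), and a link speed and average frame size that you supply, since neither is given in this thread. The function name is hypothetical.

```python
import math

US_PER_KM = 5.0  # assumed one-way fibre delay, microseconds per km

# Line rates in Gbaud for the 8b/10b-encoded Fibre Channel speeds
LINE_RATE_GBAUD = {1: 1.0625, 2: 2.125, 4: 4.25, 8: 8.5}


def credits_needed(distance_km: float, speed_gbps: int,
                   avg_frame_bytes: int = 2148) -> int:
    """Estimate the BB credits needed to keep the link full.

    A credit is only returned after the frame has crossed the link and
    the R_RDY has come back, so the sender needs enough credits to
    cover one full round trip worth of frames.
    """
    round_trip_us = 2 * distance_km * US_PER_KM
    # Time to serialise one frame: 10 line bits per byte with 8b/10b
    frame_time_us = avg_frame_bytes * 10 / (LINE_RATE_GBAUD[speed_gbps] * 1000)
    return math.ceil(round_trip_us / frame_time_us)


if __name__ == "__main__":
    # Example only: 4G is an assumed speed, not taken from the thread
    for km in (45, 58):
        print(km, "km:", credits_needed(km, 4), "credits (full-size frames)")
    # Smaller average frames need more credits for the same distance
    print("58 km, 1024-byte frames:", credits_needed(58, 4, 1024))
```

Sizing for 58km covers the 45km case as well; use your vendor's own long-distance tooling for the value you actually configure.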
Only when the sync loss lasts longer than 100ms will the link drop, causing a re-routing update to be propagated through the fabric.
So one lesson to learn from this is to NEVER EVER use trunks on links you do not control unless there are hardcoded rules and regulations in the contract.