Hi,
We have recently had an issue with our SAN which has highlighted another issue with vCenter.
The background is that some sort of event occurs once every two weeks on our SAN which causes dead paths to appear in vCenter. Exactly what this event is has not yet been determined and is probably not relevant to this forum. In brief, it appears that the LUNs are not presented for a split second once every two weeks, and that this causes the dead paths in vCenter.
What this has highlighted, however, is that the dead paths in vCenter remain dead, i.e. they do not reconnect at any point. Rescanning the HBAs does not reconnect them either.
To give a little more detail, we have 2 vmHBAs which connect to 2 fibre channel switches (vmHBA1 to switch 1 and vmHBA2 to switch 2).
The two switches then each have a connection to two storage processors (switch 1 to SP1, switch 1 to SP2, switch 2 to SP1 and switch 2 to SP2).
This means that each host has 4 paths to each LUN.
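To make the path count concrete, here is a minimal sketch that enumerates the paths implied by the cabling described above. The names are just the labels used in this post, not identifiers read from any host or switch, and the function name is made up for illustration.

```python
# Cabling as described in this post: each HBA goes to one switch,
# and each switch has a connection to both storage processors.
hba_to_switch = {"vmHBA1": "switch 1", "vmHBA2": "switch 2"}
switch_to_sps = {"switch 1": ["SP1", "SP2"], "switch 2": ["SP1", "SP2"]}


def paths_per_lun(hba_to_switch, switch_to_sps):
    """List every HBA -> switch -> SP route a host can use to reach a LUN."""
    paths = []
    for hba, switch in hba_to_switch.items():
        for sp in switch_to_sps[switch]:
            paths.append((hba, switch, sp))
    return paths


paths = paths_per_lun(hba_to_switch, switch_to_sps)
for path in paths:
    print(" -> ".join(path))
print(len(paths), "paths per LUN")
```

Two HBAs, each able to reach two storage processors through its own switch, gives the 4 paths per LUN mentioned above.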
Once every two weeks, after the SAN event, 8 paths show as dead in vCenter. If I right-click on one of the vmHBAs and click 'Rescan', the dead paths are removed from the list of paths but are not reconnected (i.e. there are 8 fewer paths in the list).
So far the only method I have found to reconnect the paths is to reboot each host, one at a time. Obviously the main thing here is that whatever is happening on the SAN every two weeks should be addressed. However, this has highlighted that when a dead path appears, it does not reconnect.
Is anyone aware of either a means to reconnect dead paths without having to reboot every host, or preferably a setting in vCenter that will reconnect the dead paths automatically? My concern is that should a path go dead for whatever reason, say in the middle of the night, I would rather it reconnect automatically than remain in a 'dead' state, in which our infrastructure overall is left with less redundancy until someone A) notices the dead path and B) reboots every host.
If anyone else has any experience of this, any information would be much appreciated.
vCenter Server version 4.0.0 Build 208111
ESX 4.0
Update on 28/11/2011 - One thing I don't think I made clear in my original post: when we get these dead paths, all of the hosts can still access all LUNs, because multipathing gives us redundancy. The concern is simply that when a path goes dead we have lost redundancy and are operating in an at-risk state.
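To illustrate what I mean by an at-risk state, here is a rough sketch of the check I have in mind: compare the number of active paths per LUN against the 4 we expect. It is not wired up to vCenter or to any ESX command; the LUN names, the state strings and the sample data are all made up for illustration.

```python
EXPECTED_PATHS_PER_LUN = 4  # 2 HBAs x 2 storage processors, as described above


def luns_at_risk(path_states, expected=EXPECTED_PATHS_PER_LUN):
    """Return {lun: active_path_count} for LUNs with fewer active paths than expected.

    path_states maps a LUN name to the list of states of the paths still listed
    for it. A dead path that a rescan has removed is simply missing from the list.
    """
    at_risk = {}
    for lun, states in path_states.items():
        # Hypothetical state strings: anything starting with "active" counts as usable.
        active = sum(1 for state in states if state.lower().startswith("active"))
        if active < expected:
            at_risk[lun] = active
    return at_risk


# Made-up sample data, not taken from our environment.
sample = {
    "LUN_A": ["active", "active", "dead", "dead"],      # dead paths still listed
    "LUN_B": ["active", "active", "active", "active"],  # fully redundant
    "LUN_C": ["active", "active"],                      # dead paths removed by a rescan
}

print(luns_at_risk(sample))  # {'LUN_A': 2, 'LUN_C': 2}
```

In both the LUN_A and LUN_C cases the LUN is still reachable, which matches what we see, but it has only half of its paths left.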