We are in the middle of replacing the aging switches that connect our NFS SAN environment to our ESXi 8 hosts. We have made several attempts without success. The most recent attempt came very close, but we ultimately had to back it out.
Here is a high-level overview of our process:
- Shut down the VMs
- Disable HA in vCenter
- Edit the switch port networking to add the new VMNICs and remove the old VMNICs
- Remove the old SAN DAC cables from the old switch and connect the SAN units to the new switches with DAC cables. Note: during this period, the NFS volumes are still listed on the ESXi hosts, but no volume data or size information appears.
- Test basic connectivity to and from the hosts and the SAN; all tests pass as expected.
- Check the NFS volumes; they initially appear to be present and intact.
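A basic ping sends small packets by default, so it can succeed even when full-size frames on the storage path are being dropped (for example, if jumbo frames are enabled on the hosts and SAN but not on every switch port). We have not confirmed that this applies to our setup. The following is a minimal, hypothetical Python helper that builds the `vmkping` commands for both a plain ping and a full-size ping with the don't-fragment bit set (`-d`, with `-s` as the payload size). The VMkernel interface name `vmk1`, the target IPs, and the 9000-byte MTU are placeholders, not values from our environment.

```python
# Hypothetical pre-flight helper: prints vmkping commands to run on each ESXi host.
# "vmk1", the 192.0.2.x targets and the MTU of 9000 are placeholder assumptions.

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def df_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    if mtu < IPV4_HEADER + ICMP_HEADER:
        raise ValueError(f"MTU {mtu} is too small")
    return mtu - IPV4_HEADER - ICMP_HEADER


def vmkping_commands(vmk: str, targets: list[str], mtu: int) -> list[str]:
    """Return one plain ping and one full-size, don't-fragment ping per target."""
    size = df_payload(mtu)
    cmds = []
    for ip in targets:
        cmds.append(f"vmkping -I {vmk} {ip}")               # basic reachability
        cmds.append(f"vmkping -I {vmk} -d -s {size} {ip}")  # full-size frame, no fragmentation
    return cmds


if __name__ == "__main__":
    for cmd in vmkping_commands("vmk1", ["192.0.2.10", "192.0.2.11"], 9000):
        print(cmd)
```

With an MTU of 9000 the full-size payload works out to 9000 - 20 - 8 = 8972 bytes; with a standard 1500-byte MTU it would be 1472.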
At this point, we assume we can start bringing VMs back online. When we try to power on a single VM, it hangs and never finishes powering on. When we check the NFS volumes again, some are available, but others show a size of "0", and we cannot browse their files from the ESXi management interface on the individual hosts. We end up backing out the cutover and reverting to the original configuration. Systems recover fully within 30 seconds of reconnecting.
We do not place the hosts in maintenance mode or shut them down at any point during this process, and we do not take the NFS volumes offline either. We are wondering whether we should add one of these steps, since it might force the connections to come up cleanly.
We are also struggling with the configuration of the new switches, a pair of Dell S5248 units with VLT configured between them. We are considering removing VLT and running them as two independent switches.
Any thoughts or ideas on this process?