VMware vSphere

  • 1.  Best Way to Handle a Network Interruption

    Posted Jun 29, 2023 04:19 PM

    Our network team is cutting over the data center to a new network. Core distribution and top-of-rack switches are being replaced. What is the best way to prepare our ESXi clusters for this outage? We have two requirements: 1. We do not want to power down the VMs. There are over 800 VMs, and shutting them down and starting them up again after the cutover would take hours. 2. We don't want the cluster to lose access to vSAN.

    I know it's not possible to put a host into maintenance mode while maintaining power to the VMs. Is it possible to put a DS into maintenance mode? Can the network on each host be paused or disconnected while keeping VMs powered up?

    I figured I would ask these questions before testing some of the scenarios in a lab. Please let me know if you've had experience with a similar task. Thank you in advance.



  • 2.  RE: Best Way to Handle a Network Interruption

    Posted Jun 29, 2023 09:00 PM

    Is it possible to put a DS into maintenance mode? This will just evacuate the DS.

    Can the network on each host be paused or disconnected while keeping VMs powered up? You can disconnect it, but this will cause issues and outages.

    I have been in this situation, and after this kind of activity I ended up fixing a lot of read-only VMs, running recoveries, and so on.

    This needs to be evaluated carefully.

    What type of outage does the network team expect? Are they bringing in a new switch and cutting over to it?

    What storage do you have? Is it NFS? vSAN is sensitive to network interruptions, so get the complete picture: figure out what happens during the maintenance activity and what the impact will be.



  • 3.  RE: Best Way to Handle a Network Interruption

    Posted Jun 29, 2023 09:05 PM

    Hi,

    TL;DR: A live vSAN environment with a total connectivity loss for an extended period of time will almost certainly run into problems. Putting datastores in maintenance mode will take them completely out of service and stop all I/O on them.

    vSAN handles connectivity loss according to this documentation: Network Connectivity Is Lost in the vSAN Cluster (vmware.com)

    All hosts will be isolated, which will most likely result in I/O errors for the VMs. This will most likely lead to reboots of the VMs and could cause some data inconsistency at the OS level.

    The best way to handle a situation like yours would be to replace the network infrastructure step by step, in parts, so that some connectivity stays alive throughout. If you have over 800 VMs of such importance, why is there no network redundancy in your deployment?

     



  • 4.  RE: Best Way to Handle a Network Interruption

    Posted Jun 29, 2023 09:31 PM

    We do have network redundancy across the entire infrastructure. The way it was explained to me, both fabrics of the network provide LACP bonds between them, and Cisco will not permit cutting over one side and then the other, because LACP has to be configured on the same model of switches. They are bringing in all new equipment, and some parts of the network are being virtualized. Therefore the network team is cutting over all at once after the hardware is in place.

    It looks like the safest bet would be to shut down the VMs and power off the hosts. When everything comes back up, the new switches will learn the MAC addresses of each physical and virtual device, which should make things a lot cleaner.
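
    For sizing the shutdown window, a rough back-of-the-envelope sketch in Python may help. It does not talk to vCenter at all; it only does the arithmetic for shutting VMs down in batches. The function name, the batch size of 50, and the 5 minutes per batch are illustrative assumptions, not measured values from this environment.

```python
import math


def estimate_shutdown(total_vms, batch_size, minutes_per_batch):
    """Return (number_of_batches, total_minutes) for a batched shutdown.

    All inputs are planning assumptions; measure real values in a lab first.
    """
    if total_vms < 0 or batch_size <= 0 or minutes_per_batch < 0:
        raise ValueError("total_vms and minutes_per_batch must be >= 0, batch_size > 0")
    # A partially filled last batch still costs a full batch interval.
    batches = math.ceil(total_vms / batch_size)
    return batches, batches * minutes_per_batch


if __name__ == "__main__":
    # Hypothetical numbers: 800 VMs, 50 per batch, 5 minutes per batch.
    batches, minutes = estimate_shutdown(800, 50, 5)
    print(f"{batches} batches, about {minutes} minutes")
```

    The same estimate applies in reverse for power-on, although boot order and application dependencies usually make startup slower than shutdown.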