VMware vSphere

  • 1.  ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing

    Posted Jul 23, 2024 09:43 AM

    Greetings All,

    I'm looking for some inspiration from the collective minds here. I have two datacenters, one running VMs on 6.7.0, 17700523 (the source DC) and the other running 8.0.2, 22380479 (the destination DC). Both have HDS G370 storage systems behind them. Both DCs are run from the same vCenter 8.0.3, 24022515.

    Both systems share the same VLAN for vMotion on a dedicated vmkernel adapter. This VLAN is trunked between the DCs. vmkping from hosts in one DC gets through to the other with no issues, and vice versa. Ping times are around 1.5 ms. nc connect tests on port 8000 also work both ways with no issues over the vMotion vmkernel adapter. MTU is 1500 across the board, so no jumbo frames.
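    For anyone wanting to repeat the port 8000 connect test without nc, here is a minimal Python sketch. It is only an illustration: the function name `can_connect` and its parameters are made up for this example, and binding to a source IP is used here as a stand-in for "go out via the vMotion vmkernel adapter", which is an assumption about how the host selects the interface.

```python
import socket


def can_connect(host, port, source_ip=None, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    source_ip (optional) binds the local end of the connection to a
    specific address, e.g. the IP of the vMotion vmkernel adapter.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        # create_connection performs the full TCP handshake or raises OSError
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks
        return False
```

    A call would look like `can_connect("192.0.2.10", 8000, source_ip="192.0.2.20")`, where both addresses are placeholders from the documentation range. A successful handshake only shows the port is reachable; it says nothing about sustained throughput.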

    The problem occurs when I try to vMotion a VM (compute and storage) from one DC to the other. The migration kicks off, stalls at 22% and then fails after about 5 minutes. The error seen in the VM's vmware.log file is shown below.

    Watching the destination file system during the move, I see files being copied over (hlog, vswp file, various logs), and then they are all cleared out when the process fails.

    Any thoughts on what might be causing the issues here?

    Thanks,

    Gary

    2024-07-23T06:00:42.270Z| worker-2195212| I125: SVMotion: Enter Phase 8
    2024-07-23T06:00:42.271Z| worker-2195212| I125: Disk/File copy started for /vmfs/volumes/5747daae-8c118858-5659-0000878429f0/GM-Test04/GM-Test04.vmdk.
    2024-07-23T06:01:02.282Z| vmx| W115: SVMotion: scsi0:0: Disk transfer rate slow: 0 kB/s over the last 10.01 seconds, copied total 512 MB at 26201 kB/s.
    2024-07-23T06:01:02.354Z| vmx| W115: Mirror: scsi0:0: Failed to copy disk: Timeout
    2024-07-23T06:01:02.354Z| worker-2195212| W115: SVMotionMirroredModeThreadDiskCopy: Found internal error when woken up on diskCopySemaphore. Aborting storage vmotion.
    2024-07-23T06:01:02.354Z| worker-2195212| W115: SVMotionCopyThread: disk copy failed. Canceling Storage vMotion.
    2024-07-23T06:01:02.354Z| worker-2195212| I125: SVMotionCopyThread: Waiting for SVMotion Bitmap thread to complete before issuing a stun during migration failure cleanup.
    2024-07-23T06:01:02.355Z| worker-2195212| I125: SVMotion: FailureCleanup thread completes.
    2024-07-23T06:01:02.355Z| vmx| I125: SVMotion: Worker thread performing SVMotionCopyThreadDone exited.
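    The log excerpt above can be picked apart with a short script. This is a rough sketch only: the line layout (timestamp, thread, level code, message separated by `|`) is inferred from the excerpt itself, the helper names are made up, and 1 MB is assumed to mean 1024 kB.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt: "<UTC timestamp>| <thread>| <level code>: <message>"
LINE_RE = re.compile(r"^\s*(\S+)\|\s*(\S+)\|\s*([A-Z]\d+):\s*(.*)$")


def parse_line(line):
    """Split one vmware.log line into (timestamp, thread, level, message)."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts, thread, level, message = m.groups()
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ"), thread, level, message


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the next
    line containing end_marker, or None if either is missing."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _, _, message = parsed
        if start is None and start_marker in message:
            start = ts
        elif start is not None and end_marker in message:
            return (ts - start).total_seconds()
    return None


SAMPLE = [
    "2024-07-23T06:00:42.271Z| worker-2195212| I125: Disk/File copy started for /vmfs/volumes/5747daae-8c118858-5659-0000878429f0/GM-Test04/GM-Test04.vmdk.",
    "2024-07-23T06:01:02.282Z| vmx| W115: SVMotion: scsi0:0: Disk transfer rate slow: 0 kB/s over the last 10.01 seconds, copied total 512 MB at 26201 kB/s.",
    "2024-07-23T06:01:02.354Z| vmx| W115: Mirror: scsi0:0: Failed to copy disk: Timeout",
]

elapsed = seconds_between(SAMPLE, "Disk/File copy started", "Failed to copy disk")
print(f"copy ran for {elapsed:.3f} s before the timeout")

# Time implied by the logged average rate (assumes 1 MB = 1024 kB)
implied = 512 * 1024 / 26201
print(f"512 MB at 26201 kB/s corresponds to {implied:.2f} s")
```

    Both figures come out at roughly 20 seconds. If the logged average covers the whole copy, that would be consistent with the 512 MB moving in about the first 10 seconds and nothing at all in the 10.01 seconds before the timeout, though this is an interpretation of the log rather than something it states.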



  • 2.  RE: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing

    Broadcom Employee
    Posted Jul 23, 2024 07:07 PM

    Hello Gary,

    Always 22%?
    I would check both systems' vmkernel.log for disk timeouts.

    There could be an unrecoverable block on either the read side or the write side.
    Check for NIC timeouts as well.




  • 3.  RE: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing

    Posted Jul 23, 2024 08:52 PM

    Morning,

    It goes straight to 22% - see detailed timings:

    Migrate start @ 23:17:46

    22% immediate

    24% @ 23:26:36

    29% @ 23:26:45

    32% @ 23:26:51

    Failed @ 23:28:13
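    To make the gaps in the timings above easier to see, they can be converted to elapsed seconds. A small sketch, assuming all timestamps fall on the same day (no midnight rollover); the labels and function name are just for illustration.

```python
from datetime import datetime

# Timestamps as posted above
MILESTONES = [
    ("start, 22% immediately", "23:17:46"),
    ("24%", "23:26:36"),
    ("29%", "23:26:45"),
    ("32%", "23:26:51"),
    ("failed", "23:28:13"),
]


def elapsed_seconds(milestones):
    """Seconds from the first milestone to each milestone (same-day times)."""
    times = [datetime.strptime(t, "%H:%M:%S") for _, t in milestones]
    return [int((t - times[0]).total_seconds()) for t in times]


for (label, _), secs in zip(MILESTONES, elapsed_seconds(MILESTONES)):
    print(f"{label:<24} +{secs // 60}m{secs % 60:02d}s")
```

    By this arithmetic the task sits at 22% for 8 minutes 50 seconds, moves from 24% to 32% in 15 seconds, and fails 10 minutes 27 seconds after the start.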

    Log files from the source and destination ESXi hosts and the VM attached.




  • 4.  RE: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing

    Broadcom Employee
    Posted Jul 24, 2024 01:27 PM

    Hello Gary,

    You can try this:

    https://knowledge.broadcom.com/external/article/344928/shared-nothing-vmotion-fails-for-large-v.html

    But I think you may have a bad disk in /vmfs/volumes/668f26b6-ef44f8d1-51b0-0025b5a2000c
    Try checking the disk health:
    https://knowledge.broadcom.com/external/article/313033/esxi-smart-health-monitoring-for-hard-dr.html




  • 5.  RE: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing

    Posted Jul 24, 2024 07:33 PM

    Hi Joseph,

    Thanks for the suggestions. The first one doesn't really apply. The VM being tested is small (less than 20 GB) and should migrate within 10 minutes or so. It's all happening over high-speed links.

    As for the second one, the disks I'm using are presented over FC from a SAN, so SMART data is not available to the ESXi hosts. The SAN is not reporting any disk errors, and it should drop a failed disk and activate a hot spare until the failed unit is replaced. ESXi should not be aware of any of this.

    Thanks for the suggestions. Our local vendor has put in a support request to VMware now so we'll see if that comes up with any bright ideas.

    Gary




  • 6.  RE: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing

    Posted Jul 29, 2024 12:34 AM

    Hi Gary,

    Check the hardware / VMware Tools versions on the source VMs. They might not be supported by vSphere v8.

    Regards, George.




  • 7.  RE: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing

    Posted Jul 29, 2024 06:14 PM

    Hi George,

    Do you have a reference to check this against? I haven't seen anything that mentions cross compatibility for vMotion actions.

    Thanks,

    Gary




  • 8.  RE: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing

    Posted Jul 30, 2024 02:19 AM

    Greetings all,

    Bit of an update for those following along at home... Cold migrations are working now!! Yaaaaay!!

    Turns out it was an issue in the network where a particular device seemed to be ignoring a /8 route to the source ESX management network. Inserting a more specific /24 route fixed that issue. Not sure why the /8 wasn't doing what it was supposed to do but there you go.
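    For readers wondering why adding a /24 changes anything when a /8 already covers the same addresses: routers normally pick the most specific (longest-prefix) matching route. The sketch below illustrates that rule only. The prefixes, next-hop names and destination address are made up, and it does not explain why the device was ignoring the /8 in the first place.

```python
import ipaddress


def best_route(routes, destination):
    """Return (prefix, next_hop) of the longest-prefix match, or None.

    routes is a list of (prefix, next_hop) pairs, e.g. ("10.0.0.0/8", "gw-a").
    """
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    # The most specific route is the one with the longest prefix length
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop


# Hypothetical routing table: a broad /8 plus a more specific /24
ROUTES = [
    ("10.0.0.0/8", "gw-a"),
    ("10.20.30.0/24", "gw-b"),
]

print(best_route(ROUTES, "10.20.30.15"))  # inside both prefixes
print(best_route(ROUTES, "10.99.1.1"))    # only inside the /8
```

    With both routes present, an address inside the /24 follows the /24 entry, while other addresses in the /8 still follow the /8 entry.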

    Now just the hot migrations to sort out. They are failing much faster now with an error like:

    Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout. vMotion migration [173670679:4961902957415192578] timed out while waiting for disk 0's queue count to drop below the maximum limit of 32768 blocks. This could indicate either network or storage problems preventing proper block transfer.

    Not overly helpful but at least I can move some VMs around, even if I have to shut them down.

    Thanks all for your suggestions,

    Gary