Bit of an update for those following along at home... Cold migrations are working now!! Yaaaaay!!
Turns out it was an issue in the network where a particular device seemed to be ignoring a /8 route to the source ESX management network. Inserting a more specific /24 route fixed that issue. Not sure why the /8 wasn't doing what it was supposed to do but there you go.
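For anyone curious why the /24 takes over once it is added: a device doing standard longest-prefix matching prefers the most specific route that covers the destination. This does not explain why the /8 was being ignored in the first place. A rough sketch in Python, where the addresses and the pick_route name are made up for illustration and are not our real networks:

```python
import ipaddress

def pick_route(destination, routes):
    """Return the matching route with the longest prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(r) for r in routes
               if dest in ipaddress.ip_network(r)]
    # The most specific match is the one with the largest prefix length.
    return max(matches, key=lambda n: n.prefixlen, default=None)

# Hypothetical routing table, for illustration only.
routes = ["10.0.0.0/8", "10.20.30.0/24"]
print(pick_route("10.20.30.15", routes))  # 10.20.30.0/24
print(pick_route("10.99.1.1", routes))    # 10.0.0.0/8
```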
Now just the hot migrations to sort out. They are failing much faster now with an error like:
Not overly helpful but at least I can move some VMs around, even if I have to shut them down.
Original Message:
Sent: Jul 29, 2024 06:14 PM
From: Gary MacMinn
Subject: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing
Hi George,
Do you have a reference to check this against? I haven't seen anything that mentions cross compatibility for vMotion actions.
Thanks,
Gary
Original Message:
Sent: Jul 29, 2024 12:34 AM
From: gparker
Subject: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing
Hi Gary,
Check the hardware / VMware Tools versions on the source VMs. They might not be supported by vSphere v8.
Regards, George.
Original Message:
Sent: Jul 24, 2024 07:32 PM
From: Gary MacMinn
Subject: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing
Hi Joseph,
Thanks for the suggestions. The first one doesn't really apply. The VM being tested is small (less than 20G) and should migrate within 10 minutes or so. It's all happening over high speed links.
On the second one: the disks I'm using are presented via FC from a SAN, so SMART data is not available to the ESXi hosts. The SAN is not reporting any disk errors, and it should drop a dud disk out and activate a hot spare until the failed unit is replaced. ESXi itself should not be aware of any of this.
Thanks for the suggestions. Our local vendor has put in a support request to VMware now so we'll see if that comes up with any bright ideas.
Gary
Original Message:
Sent: Jul 24, 2024 01:26 PM
From: Joseph
Subject: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing
Hello Gary,
You can try this:
https://knowledge.broadcom.com/external/article/344928/shared-nothing-vmotion-fails-for-large-v.html
But I think you may have a bad disk in /vmfs/volumes/668f26b6-ef44f8d1-51b0-0025b5a2000c
Try checking the disk health:
https://knowledge.broadcom.com/external/article/313033/esxi-smart-health-monitoring-for-hard-dr.html
Original Message:
Sent: Jul 23, 2024 08:52 PM
From: Gary MacMinn
Subject: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing
Morning,
It goes straight to 22% - see detailed timings:
Migrate start @ 23:17:46
22% immediate
24% @ 23:26:36
29% @ 23:26:45
32% @ 23:26:51
Failed @ 23:28:13
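To put numbers on the gaps, here is a small Python sketch that works out the elapsed time between each step. The timestamps are copied from the list above; the elapsed_seconds helper is just a name picked for this example, and it assumes all times fall on the same day.

```python
from datetime import datetime

# Timestamps copied from the list above.
events = [
    ("start / 22%", "23:17:46"),
    ("24%", "23:26:36"),
    ("29%", "23:26:45"),
    ("32%", "23:26:51"),
    ("failed", "23:28:13"),
]

def elapsed_seconds(events):
    """Return (label, seconds since previous event, seconds since start)."""
    times = [datetime.strptime(t, "%H:%M:%S") for _, t in events]
    rows = []
    for i, (label, _) in enumerate(events):
        step = (times[i] - times[i - 1]).total_seconds() if i else 0.0
        total = (times[i] - times[0]).total_seconds()
        rows.append((label, int(step), int(total)))
    return rows

for label, step, total in elapsed_seconds(events):
    print(f"{label:12} +{step:>4}s  (total {total}s)")
```

That works out to about 8m50s sitting at 22%, and about 10m27s from start to failure.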
Log files from the source and destination ESXi hosts and the VM attached.
Original Message:
Sent: Jul 23, 2024 04:36 PM
From: Joseph Infelise
Subject: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing
Hello Gary,
Always 22%?
I would check both systems' vmkernel.log for disk timeouts.
There could be an unrecoverable block on either the read side or the write side.
Check for NIC timeouts as well.
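A minimal sketch of that check in Python, assuming you have copied vmkernel.log off each host. The keywords are generic guesses, not an official list of ESXi messages, and find_timeouts is a name made up for this example.

```python
import re
import sys

# Generic keywords only; these are assumptions, not an official list of
# ESXi log messages. Adjust to whatever shows up in your logs.
PATTERN = re.compile(r"timeout|timed out", re.IGNORECASE)

def find_timeouts(lines):
    """Yield (line_number, text) for lines that mention a timeout."""
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            yield number, line.rstrip("\n")

if __name__ == "__main__":
    # Pass the path to a copy of vmkernel.log as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            for number, text in find_timeouts(handle):
                print(f"{number}: {text}")
```

Run it against the log from both the source and the destination host and compare the timestamps of any hits with the time the migration stalled.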
Original Message:
Sent: Jul 23, 2024 03:19 AM
From: Gary MacMinn
Subject: ESXi 6.7 to ESXi 8.0.2 Compute & Storage vMotion Failing
Greetings All,
I'm looking for some inspiration from the collective minds here. I have two datacenters, one running VMs on 6.7.0, 17700523 (the source DC) and the other running 8.0.2, 22380479 (the destination DC). Both have an HDS G370 storage system behind them. Both DCs are run from the same vCenter 8.0.3, 24022515.
Both systems share the same VLAN for vMotion on a dedicated vmkernel adapter. This VLAN is trunked between the DCs. vmkping from hosts in one DC gets through to the other with no issues, and vice versa. Ping times are around 1.5ms. nc connect tests on port 8000 also work both ways with no issues over the vMotion kernel adapter. MTU is 1500 across the board, so no jumbo frames.
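For reference, the port 8000 connect test can be approximated in Python as below. This is a sketch only: tcp_port_open is a made-up name, the address in the comment is a placeholder, and unlike a test run over the vMotion kernel adapter on the host, it connects from whatever interface the OS routes through.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

# Example call (placeholder address, not a real host):
# tcp_port_open("192.0.2.10", 8000)
```

A True result only shows that the TCP handshake completed; it says nothing about sustained throughput once data starts moving.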
The problem occurs when I try to vMotion a VM (compute and storage) from one DC to the other. The migration kicks off, stalls at 22% and then fails after about 5 minutes. The error seen in the VM's vmware.log file is shown below.
Watching the destination file system during the move, I see files being copied over (hlog, vswp file, various logs), and then they are all cleared out when the process fails.
Any thoughts on what might be causing the issues here?
Thanks,
Gary
2024-07-23T06:00:42.270Z| worker-2195212| I125: SVMotion: Enter Phase 8
2024-07-23T06:00:42.271Z| worker-2195212| I125: Disk/File copy started for /vmfs/volumes/5747daae-8c118858-5659-0000878429f0/GM-Test04/GM-Test04.vmdk.
2024-07-23T06:01:02.282Z| vmx| W115: SVMotion: scsi0:0: Disk transfer rate slow: 0 kB/s over the last 10.01 seconds, copied total 512 MB at 26201 kB/s.
2024-07-23T06:01:02.354Z| vmx| W115: Mirror: scsi0:0: Failed to copy disk: Timeout
2024-07-23T06:01:02.354Z| worker-2195212| W115: SVMotionMirroredModeThreadDiskCopy: Found internal error when woken up on diskCopySemaphore. Aborting storage vmotion.
2024-07-23T06:01:02.354Z| worker-2195212| W115: SVMotionCopyThread: disk copy failed. Canceling Storage vMotion.
2024-07-23T06:01:02.354Z| worker-2195212| I125: SVMotionCopyThread: Waiting for SVMotion Bitmap thread to complete before issuing a stun during migration failure cleanup.
2024-07-23T06:01:02.355Z| worker-2195212| I125: SVMotion: FailureCleanup thread completes.
2024-07-23T06:01:02.355Z| vmx| I125: SVMotion: Worker thread performing SVMotionCopyThreadDone exited.
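As a sanity check on the "Disk transfer rate slow" warning above, here is a small Python sketch that pulls the numbers out of that line and works out how long the copy had been running. The regex is written against this one log line only, parse_slow_rate is a name made up for this example, and the 1 MB = 1024 kB conversion is an assumption.

```python
import re

LINE = ("2024-07-23T06:01:02.282Z| vmx| W115: SVMotion: scsi0:0: Disk transfer "
        "rate slow: 0 kB/s over the last 10.01 seconds, copied total 512 MB "
        "at 26201 kB/s.")

RATE_RE = re.compile(
    r"Disk transfer rate slow: (?P<recent>\d+) kB/s over the last "
    r"(?P<window>[\d.]+) seconds, copied total (?P<total_mb>\d+) MB "
    r"at (?P<avg>\d+) kB/s"
)

def parse_slow_rate(line):
    """Return the figures from a 'rate slow' warning, or None if no match."""
    m = RATE_RE.search(line)
    if m is None:
        return None
    avg = int(m["avg"])
    total_kb = int(m["total_mb"]) * 1024  # assumes 1 MB = 1024 kB
    return {
        "recent_kbps": int(m["recent"]),
        "window_s": float(m["window"]),
        "total_mb": int(m["total_mb"]),
        "avg_kbps": avg,
        # Time implied by the total copied and the average rate.
        "implied_copy_s": total_kb / avg if avg else None,
    }

info = parse_slow_rate(LINE)
print(info["implied_copy_s"])
```

The implied copy time comes out at about 20 seconds, which matches the gap between "Disk/File copy started" (06:00:42.271) and the warning (06:01:02.282). With 0 kB/s over the last 10.01 seconds, that suggests, though it does not prove, that all 512 MB moved in the first 10 seconds or so and then the transfer stalled completely.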