  • 1.  vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 24, 2021 08:52 AM

    Cluster:

    VCSA 6.7 U2, ESXi 6.7 U2, 4 nodes.

    All-flash vSAN, a single disk group per host.

    Today we added one additional flash disk to each disk group to extend capacity. After several hours of rebalancing everything looked good, and vSAN Health was all green.

    Then we upgraded VCSA from 6.7 U2 to 6.7 U3, which succeeded.

    Then we upgraded ESXi from 6.7 U2 to 6.7 U3. As we don't have DRS licenses, we did the upgrade manually:

    1. host1: we manually moved all VMs to another host and put the host into maintenance mode with "Ensure data accessibility", then patched it with Update Manager.

       Patching succeeded. host1 came back online, exited maintenance mode, and rebalanced.

    2. host2: same procedure as host1. Succeeded.

    3. host3: when we move VMs to other hosts, the migration always fails at 72%, as shown in the picture.

     

    Are there any clues for this situation? Thanks.

     

    vmotion.jpg



  • 2.  RE: vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 24, 2021 12:04 PM

    Please check whether the VM's vmware.log contains further details about the error.

    André



  • 3.  RE: vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 24, 2021 02:03 PM

    Thanks for your reply.

     

    Attached are that VM's vmware.log files. I don't know how to read them yet, but I'm trying to understand the logs.

    Attachment(s)

    vmware-3.log   306 KB
    vmware-5.log   129 KB
    vmware-4.log   129 KB
    vmware-6.log   129 KB
    vmware.log   129 KB
    vmware-2.log   273 KB
    vmware-7.log   129 KB


  • 4.  RE: vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 25, 2021 11:01 AM

    Hi,

    This seems to be caused by a CPU feature mismatch between the hosts.

    2021-10-23T15:00:25.602Z| vmx| I125: [msg.checkpoint.migration.failedReceive] Failed to receive migration.
    2021-10-23T15:00:25.602Z| vmx| I125: [msg.vpmc.unavailcounters] A performance counter used by the guest is not available on the host CPU.
    2021-10-23T15:00:25.602Z| vmx| I125: Msg_Post: Error
    2021-10-23T15:00:25.602Z| vmx| I125: [msg.vpmc.unavailcounters] A performance counter used by the guest is not available on the host CPU.
    2021-10-23T15:00:25.602Z| vmx| I125: [msg.checkpoint.migration.failedReceive] Failed to receive migration.
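    For anyone who finds these logs hard to read, here is a minimal Python sketch that pulls the bracketed message IDs out of vmware.log text and counts them. The regular expression and the function name summarize_messages are my own illustration, not a VMware tool, and the sample lines are taken from the excerpt above.

```python
import re
from collections import Counter

# Matches bracketed message IDs such as [msg.vpmc.unavailcounters]
# and captures the text that follows them on the same line.
MSG_ID = re.compile(r"\[(msg\.[A-Za-z0-9_.]+)\]\s*(.*)")


def summarize_messages(log_text):
    """Count each (message id, message text) pair found in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MSG_ID.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


# Sample lines copied from the vmware.log excerpt in this thread.
sample = """\
2021-10-23T15:00:25.602Z| vmx| I125: [msg.checkpoint.migration.failedReceive] Failed to receive migration.
2021-10-23T15:00:25.602Z| vmx| I125: Msg_Post: Error
2021-10-23T15:00:25.602Z| vmx| I125: [msg.vpmc.unavailcounters] A performance counter used by the guest is not available on the host CPU.
2021-10-23T15:00:25.602Z| vmx| I125: [msg.checkpoint.migration.failedReceive] Failed to receive migration.
"""

for (msg_id, text), n in summarize_messages(sample).most_common():
    print(f"{n}x {msg_id}: {text}")
```

    To run it against a real log, read the file into a string first and pass that to summarize_messages instead of the sample.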
     

     



  • 5.  RE: vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 25, 2021 11:15 AM

    Yes, I noticed that, but I still have no idea where the problem is.



  • 6.  RE: vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 25, 2021 11:21 AM

    It looks like this KB: https://kb.vmware.com/s/article/81191, but I'm not sure.

     



  • 7.  RE: vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 25, 2021 03:03 PM

    Is your cluster using identical CPUs on each node?

    If so, are some of your nodes on a different vSphere Version than others?

    Your VM does use vpmc.enable=true, so the KB might describe the reason for your problem.
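    If you want to check other VMs for the same setting, a rough Python sketch like the one below can scan the text of a .vmx file for vpmc.enable. It assumes the usual key = "value" line layout of .vmx files. The helper names and the sample content are made up for illustration.

```python
def vmx_setting(vmx_text, key):
    """Return the value of `key` from .vmx text, or None if it is not set."""
    for line in vmx_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            # Values are normally wrapped in double quotes.
            return value.strip().strip('"')
    return None


def vpmc_enabled(vmx_text):
    """True if the .vmx text enables virtual CPU performance counters."""
    value = vmx_setting(vmx_text, "vpmc.enable")
    return value is not None and value.upper() == "TRUE"


# Hypothetical .vmx fragment, not taken from the VM in this thread.
sample_vmx = '''
numvcpus = "4"
vpmc.enable = "TRUE"
'''

print(vpmc_enabled(sample_vmx))
```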



  • 8.  RE: vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 25, 2021 04:22 PM

    host1, host2, host3:

    name: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz    codename: Skylake EP/EN/EX

    host4:

    name: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz   codename: Cascade Lake

     

    host1, host2   6.7.0 build-17700523

    host3, host4  6.7.0 build-16075168
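    A small Python sketch of how the mixed patch level can be spotted: it groups the hosts by build number, using the version strings listed above. The function name group_by_build is only illustrative, and the data is typed in by hand rather than queried from vCenter.

```python
from collections import defaultdict

# Version strings as reported in this post (entered by hand).
hosts = {
    "host1": "6.7.0 build-17700523",
    "host2": "6.7.0 build-17700523",
    "host3": "6.7.0 build-16075168",
    "host4": "6.7.0 build-16075168",
}


def group_by_build(host_versions):
    """Map each build number to the list of hosts running it."""
    groups = defaultdict(list)
    for host, version in host_versions.items():
        build = version.rsplit("build-", 1)[-1]
        groups[build].append(host)
    return dict(groups)


groups = group_by_build(hosts)
for build, names in groups.items():
    print(f"build {build}: {', '.join(names)}")
if len(groups) > 1:
    print("Mixed builds in the cluster")
```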

     

    The KB mentions the CPUs below. Do they exactly match the CPUs my cluster is using? I don't know how to match mine against them.

    [1] Intel® Xeon® Processor E3 v5 and v6 Family (code name Skylake, Kaby Lake)
        Intel® Xeon® D (code name Skylake-D)
        Intel® Xeon® Scalable Processor and 6th, 7th, and 8th Generation Intel® Core™ i7 and i5 (code name Skylake, Kaby Lake, Coffee Lake and Whiskey Lake)

     

    Thanks



  • 9.  RE: vMotion always fails 72%, Message: Failed to receive migration

    Posted Oct 26, 2021 08:01 AM

    Hi,

    If I understand you correctly, the current situation looks like this:

    • Host 1 & 2
      • Skylake CPUs, updated to ESXi 6.7 P05 (2021/03/18)
    • Host 3
      • Skylake CPUs, still on ESXi 6.7 P02 (2020/04/28)
    • Host 4
      • Cascade Lake CPUs, still on ESXi 6.7 P02 (2020/04/28)

    I might be wrong, but I assume you don't have EVC enabled on your cluster.

    If I'm right, you are definitely affected by the KB article you mentioned, as the VMs running on Host 3 can't be migrated to any other host.

    Hosts 1 & 2 are now on ESXi 6.7 P05, which would prevent the migration, and Host 4 is equipped with Cascade Lake CPUs, which already include the microcode update.

    So you should shut down those VMs and run a cold migration.