Since upgrading our two main hosts to ESXi 7.0 Update 3c, a couple of VMs have been freezing. One of them freezes every few hours.
vSphere reports "The CPU has been disabled by the guest operating system. Power off or reset the virtual" at the time of the lockup. The VM needs a reset to continue.
One VM on shared storage has had this happen once since the upgrade. The other one, on local SSD, is suffering repeatedly. I've moved the repeat offender to shared storage to see if it happens again.
Both VMs are hardware version 13. I tried upgrading the repeat offender to 15, but it made no difference.
All my VMs use NVMe storage mode. Only these two have shown this issue, and never before the u3c upgrade.
Both VMs logged this error in syslog about 15 minutes before the lockups:
Feb 23 00:46:07 mon kernel: [36834.916147] nvme nvme0: I/O 213 QID 1 timeout, aborting
Feb 23 00:46:07 mon kernel: [36834.916287] nvme nvme0: Abort status: 0x0
Feb 23 00:46:37 mon kernel: [36865.123128] nvme nvme0: I/O 213 QID 1 timeout, reset controller
Feb 23 00:46:37 mon kernel: [36865.165409] nvme nvme0: 15/0/0 default/read/poll queues
Feb 23 02:34:22 mon kernel: [43329.673158] nvme nvme0: I/O 100 QID 4 timeout, aborting
Feb 23 02:34:22 mon kernel: [43329.673393] nvme nvme0: Abort status: 0x0
Feb 23 02:34:52 mon kernel: [43359.880150] nvme nvme0: I/O 100 QID 4 timeout, reset controller
Feb 23 02:34:52 mon kernel: [43359.926041] nvme nvme0: 15/0/0 default/read/poll queues
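
In case it helps anyone correlate these messages with the lockup times, here is a minimal sketch of a filter that pulls the NVMe timeout events out of syslog. It assumes the exact line format shown above; the function name `find_nvme_timeouts` and the regex are my own illustration, not part of any VMware or Ubuntu tool.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines like:
#   Feb 23 00:46:07 mon kernel: [36834.916147] nvme nvme0: I/O 213 QID 1 timeout, aborting
# The pattern is an assumption based on the excerpt above, not an official format spec.
NVME_TIMEOUT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s+"
    r"\[\s*\d+\.\d+\]\s+nvme\s+(?P<dev>nvme\d+):\s+"
    r"I/O\s+(?P<tag>\d+)\s+QID\s+(?P<qid>\d+)\s+timeout,\s+(?P<action>.+)$"
)

Event = Tuple[str, str, int, int, str]


def find_nvme_timeouts(lines: Iterable[str]) -> List[Event]:
    """Return (timestamp, device, I/O tag, queue ID, action) for each timeout line."""
    events: List[Event] = []
    for line in lines:
        match = NVME_TIMEOUT.match(line.strip())
        if match:
            events.append((
                match["stamp"],
                match["dev"],
                int(match["tag"]),
                int(match["qid"]),
                match["action"],
            ))
    return events
```

To use it, pass any iterable of log lines, for example an open file handle on `/var/log/syslog`. Each returned tuple can then be compared against the time vSphere reports for the freeze.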
Versions:
- Hypervisor: VMware ESXi, 7.0.3, 19193900
- Model: PowerEdge R640
- Processor Type: Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz
- Ubuntu 20.04 LTS
- Linux Kernel 5.4.0-100-generic x86_64
Many thanks in advance!