Random CPU Ready spikes across an entire estate usually indicate a systemic resource-contention issue rather than a host-specific fault. In VMware or similar virtualized environments, this often happens when multiple VMs compete for the same physical CPU cycles, especially if vCPU allocations are oversized compared to the underlying pCPU capacity. It can also be triggered by DRS migration delays, noisy-neighbor workloads, or misconfigured CPU reservations/limits that cause scheduling stalls during peak operations.
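To put a number on the oversizing point, here is a minimal Python sketch of a vCPU-to-physical-core ratio for one host. The function name and the sample figures are hypothetical, not taken from any VMware tool or from the original poster's environment, and the "acceptable" ratio depends heavily on workload.

```python
def vcpu_to_core_ratio(vm_vcpus, physical_cores):
    """Return total assigned vCPUs divided by physical cores on a host.

    vm_vcpus: list of vCPU counts, one entry per powered-on VM (hypothetical input).
    physical_cores: physical core count, not counting hyper-threads.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vm_vcpus) / physical_cores


# Made-up example: four VMs on a 16-core host.
ratio = vcpu_to_core_ratio([4, 8, 2, 2], 16)
print(f"vCPU:pCore ratio = {ratio:.2f}")
```

A ratio at or below 1.0 means vCPU count alone is unlikely to explain ready time, which points toward limits, power management, or scheduling placement instead.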
If the spikes occur consistently across clusters, it's worth checking whether recent changes were made to HA/DRS settings, host power management profiles, or BIOS configurations such as C-states and hyper-threading. Reviewing firmware consistency and ensuring that all hosts match in microcode and CPU generation can also rule out unpredictable scheduling behavior. Capturing esxtop or performance logs during spike windows will help confirm whether the ready time (%RDY) is accompanied by co-stop (%CSTP) or limit-induced delay (%MLMTD), and whether it coincides with storage queue depth, storage latency, or VM bursts. Addressing these root causes typically stabilizes CPU scheduling and reduces estate-wide performance anomalies.
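It also helps to convert the reported ready time into a percentage before judging severity. The sketch below uses the usual conversion for vCenter summation values (ready milliseconds divided by the sample interval). It assumes the 400 ms figure comes from the real-time chart, which samples every 20 seconds; the original post does not say which chart was used, and the function name is just for illustration.

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU Ready summation value (ms) to a percentage.

    ready_ms: ready time reported for one sample interval.
    interval_s: chart sample interval in seconds (20 for real-time charts,
                300 for the past-day rollup); assumed, not stated in the post.
    vcpus: divide by the VM's vCPU count to get a per-vCPU average.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000) * 100 / vcpus


# 400 ms of ready time within a 20-second real-time sample.
print(f"{ready_percent(400):.1f}%")  # prints 2.0%
```

On that assumption, a 400 ms spike is about 2% ready for the sample, so whether it matters depends on how latency-sensitive the affected workload is.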
-------------------------------------------
Original Message:
Sent: Oct 09, 2024 05:20 AM
From: Paulk99
Subject: Estate Wide random CPU Ready Spikes
Hi All,
Need to throw this one open a little, as we are running out of options / ideas and will shortly be opening a ticket.
We are randomly seeing CPU ready spikes of 400 ms when the estate is otherwise totally inactive (in this case it's a carbon copy of prod, where the issue is also seen).
Running ESXi 7.0 U3 (Dell version)
Affecting Windows and Linux builds.
Spike will occur at a random interval on a random VM
No VMs have a ridiculous quantity of CPUs assigned.
Each VM has a Reservation and Limit equal to the MHz of the vCPUs assigned. (Customer requirement)
Hosts are not overprovisioned [measured as total MHz available from the pCPUs vs MHz assigned to vCPUs] (Customer requirement)
A total of 20% of the overall pCPU MHz has been reserved for host overhead.
VMware Tools and VM hardware versions are out of date; planning an uplift to evaluate the impact.
ESXi Host CPU \ Memory Utilisation is on tickover only.
Anyone have any thoughts?
P