VMware vSphere

  • 1.  Estate Wide random CPU Ready Spikes

    Posted Oct 09, 2024 12:22 PM

    Hi All,

    I need to throw this one open a little, as we are running out of options and ideas and will shortly be opening a ticket.

    We are randomly seeing CPU ready spikes of 400 ms when the estate is otherwise totally inactive. (In this case it's a carbon copy of prod, where the issue is also seen.)
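
    To put the 400 ms figure in context, here is a minimal sketch of the usual conversion from a CPU ready summation value to a percentage. It assumes the value was read from the real-time performance chart, which samples every 20 seconds; the post does not say which chart was used, and on the past-day chart (300-second samples) the same value would be a much smaller percentage. The function name and the per-vCPU averaging are illustrative choices, not something taken from the thread.

```python
# Convert a CPU ready summation value (milliseconds) into a percentage
# of the chart's sample interval.
# Assumption: 20 s is the real-time chart interval; 300 s is the past-day interval.
REALTIME_INTERVAL_S = 20
PAST_DAY_INTERVAL_S = 300


def ready_percent(ready_ms, interval_s=REALTIME_INTERVAL_S, vcpus=1):
    """Ready time as a percentage of the interval, averaged per vCPU."""
    interval_ms = interval_s * 1000
    return ready_ms * 100 / (interval_ms * vcpus)


if __name__ == "__main__":
    # 400 ms of ready time in one 20-second real-time sample
    print(ready_percent(400))
    # The same 400 ms read from a 300-second past-day sample
    print(ready_percent(400, interval_s=PAST_DAY_INTERVAL_S))
```

    Under that assumption, 400 ms in a 20-second sample works out to 2% ready for a single-vCPU VM, and less per vCPU on a larger VM.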

    Running ESXi 7.0 U3 (Dell version).

    Affecting Windows and Linux builds.

    A spike occurs at a random interval on a random VM.

    No VMs have a ridiculous quantity of CPUs assigned.

    Each VM has a Reservation and Limit equal to the MHz of the vCPUs assigned. (Customer requirement)

    Hosts are not overprovisioned, measured as the total MHz available from the pCPUs versus the MHz assigned to vCPUs. (Customer requirement)

    A total of 20% of the overall pCPU MHz has been reserved for host overhead.
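
    The sizing rules above can be sketched as a simple headroom check. This is only an illustration of the arithmetic as described: the core count, clock speed, and VM sizes are hypothetical, and the function name is made up for the example.

```python
# Headroom check for the sizing rules described in the post:
#   - each VM reserves (and is limited to) vCPUs x per-core MHz
#   - 20% of the host's total pCPU MHz is held back for host overhead
# All figures below are hypothetical, for illustration only.


def host_headroom_mhz(cores, core_mhz, vm_vcpus, overhead_percent=20):
    """Return unreserved MHz left on the host after VM reservations."""
    total_mhz = cores * core_mhz
    usable_mhz = total_mhz * (100 - overhead_percent) // 100
    reserved_mhz = sum(vcpus * core_mhz for vcpus in vm_vcpus)
    return usable_mhz - reserved_mhz


if __name__ == "__main__":
    # Hypothetical host: 32 cores at 2600 MHz, four VMs with 4, 4, 8 and 2 vCPUs
    headroom = host_headroom_mhz(32, 2600, [4, 4, 8, 2])
    print(headroom)
    print("overprovisioned" if headroom < 0 else "within budget")
```

    A negative result would mean the reservations exceed the usable 80% of the host, which is the condition the customer requirement is meant to rule out.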

    VMware Tools and VM hardware versions are out of date; we are planning an uplift to evaluate the impact.

    ESXi host CPU and memory utilisation is at tickover only.

    Any thoughts, anyone?

    P



  • 2.  RE: Estate Wide random CPU Ready Spikes

    Posted Oct 09, 2024 06:08 PM

    Check four things:

    1. IPMI/SMI Interrupts: Similar to the IPMI-related issues found in the ESXi knowledge base, it's possible that IPMI interrupts are causing unexpected CPU activity. To mitigate this, you can decrease the polling rate for the IPMI module or adjust the IPMI driver's module settings, as suggested in the references.

    2. Hardware-specific Issues: If AMD EPYC CPUs are in use, certain workload characteristics could be causing these spikes due to architectural behavior, as described in the references. The solution might involve disabling specific CPU accounting features or applying updates that address how processor cycles are managed.

    3. VM Configuration: Updating VMware Tools and the VM hardware version could help alleviate some of the performance irregularities, as outdated components can sometimes lead to inefficiencies or bugs.

    4. Advanced Kernel Settings: Adjusting advanced kernel settings such as NHCC in systems using AMD CPUs could stabilize frequency-related accounting issues.

    References:



  • 3.  RE: Estate Wide random CPU Ready Spikes

    Posted Oct 11, 2024 12:40 PM

    Hi there.

    Unfortunately no AMD.

    Do you have a link for the IPMI / SMI interrupt issue you mention? I could not locate an article.

    Where are the IPMI / SMI driver changes you reference made? It looks like this could be within the guest OS, or within ESXi as you seem to indicate.

    Kr

    P




  • 4.  RE: Estate Wide random CPU Ready Spikes

    Posted Oct 11, 2024 12:51 PM

    I don't have it handy, Paul, but I got the references from DeepQuery, and so can you.