ESXi


VM performance troubleshooting

  • 1.  VM performance troubleshooting

    Posted Jul 15, 2014 05:07 PM

    Hi All,

    I'm interested in the CPU %RDY figures I'm getting from a VM. It sits between 15% and 25%, mostly towards the lower end, and I'm finding it hard to work out whether that is good or bad. It's an 8-core machine, so am I right in saying that's roughly 3% or less on each vCPU? I've read that around 2.5% is nothing to worry about, but to contradict that I have a chart saying it should be under 10%!

    People are asking me to throw more cores at it, but I really want to find a better way of increasing the performance. It's running video encoding, so it's CPU intensive. Windows is reporting 75% CPU and vSphere performance is showing around 50% CPU usage. I think we're likely to just make the problem worse by giving it more cores.

    The host it's on is only just over a 1:2 pCPU to vCPU ratio; we have a fairly high number of VMs that are large in terms of CPU and memory.

    Options are:

    Increase CPU cores

    Reduce CPU cores

    Use resource shares?

    Group larger and smaller VMs on separate hosts

    Thanks



  • 2.  RE: VM performance troubleshooting

    Posted Jul 15, 2014 06:25 PM

    That's a lot of ready time; your vCPUs are spending it waiting to be scheduled on a physical CPU. I always start low on processor count unless I'm proven absolutely wrong. Sometimes less is more when it comes to procs.

    Reduce your CPU count, give it high resource shares, and see if that helps. You are right in thinking that upping the cores will not help performance. I can understand how video encoding would be intensive, but that ready count raises red flags.



  • 3.  RE: VM performance troubleshooting

    Posted Jul 15, 2014 11:51 PM

    You are right, %RDY is cumulative across all assigned vCPUs, so about 3% per vCPU. While in esxtop, press e and type in the group ID (GID) of the VM; this expands the VM's worlds so you can see the exact %RDY for each vCPU. Video encoding is always resource intensive, so your best option is to scale out. Before you make any further changes, I would suggest doing a proper resource analysis for this VM during a high-usage period: monitor at both the hypervisor level and inside the guest, and try to keep the VM within a single NUMA node.



  • 4.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 12:48 PM

    Scaling out isn't really an option, as we need this one machine to deal with all the feeds. I definitely think we have an issue with oversized VMs. The problem is that throwing some cores at it usually fixes the blipping, but I think it's really just masking the performance issue. I'm sure we could achieve the same result with fewer cores. If I expand the group ID of the VM, I can see that %RDY is not spread equally across all the cores.

    I've posted two esxtop readings, one for the group of the problem VM and one for all of the VMs. This host has 4 VMs with 8 to 10 vCPUs each; no surprise that 3 of them have the highest %RDY. The one showing 40% RDY usually sits around 25-30%.

    I'm going to migrate the problem VM to another host with lower CPU usage first and monitor the %RDY; then I'll investigate reducing the cores and maybe using shares.



  • 5.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 12:56 PM


  • 6.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 01:17 PM

    Have a look at this article; it's very well written. I use this setting in my Lync environment and it has worked out well for me. http://www.datacenterdan.com/blog/vsphere-55-bpperformance09-latency-sensitive-apps - I've never had to go above 4 cores per VM, and there are about 14 of them just for Lync, serving 500+ users. You must be on 5.5, however.



  • 7.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 01:45 PM

    Looking at your esxtop output, your ESXi host is overcommitted; you are maxed out. The host is well over 100% CPU utilisation: 115% of host CPU over the last 15 minutes, based on esxtop.

    You don't have enough CPU resource available on this particular host to drive the workload, so moving the VM to another host may help. I would also check CPU utilisation across the cluster. If your workload is CPU intensive, be very careful with overcommitment ratios: ideally 1:1 for best performance, or 2:1 for reasonable performance.
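
    As a rough illustration of the ratio check described above, here is a minimal Python sketch. The function name, the host size (12 physical cores, 24 logical with Hyper-Threading) and the list of VM sizes are illustrative assumptions, not figures taken from the posted esxtop output.

```python
def overcommit_ratio(vm_vcpus, cores):
    """Return the vCPU:core ratio for a host (hypothetical helper)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return sum(vm_vcpus) / cores


# Assumed inventory: vCPU count of each powered-on VM on one host.
vm_vcpus = [10, 8, 8, 8, 4, 2]
physical_cores = 12   # assumption: 2 sockets x 6 cores
logical_cores = 24    # assumption: Hyper-Threading enabled

print(f"vCPU:pCore ratio   {overcommit_ratio(vm_vcpus, physical_cores):.2f}:1")
print(f"vCPU:logical ratio {overcommit_ratio(vm_vcpus, logical_cores):.2f}:1")
```

    Measuring against physical cores gives the more conservative number, which is the one that matters if the VMs are all busy at the same time.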



  • 8.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 01:54 PM

    I didn't even look at those logs.. yeah, your hosts are pegged.



  • 9.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 02:30 PM

    I'm new to esxtop; where are you seeing the 115% of host CPU? That's not what I'm seeing in the vCenter performance charts; none of the hosts are that high.



  • 10.  RE: VM performance troubleshooting
    Best Answer

    Posted Jul 16, 2014 01:47 PM

    As mentioned before, ready time is the sum across all vCPUs.

    So if you have 40% ready time on a 10 vCPU VM, each vCPU had to wait on average around 4% of the interval to be scheduled on a processor. The 5% and 10% rules apply PER vCPU and not to the total. So it's important not to misinterpret the values you are looking at.

    The interval is also important.

    An excellent post that I often use as a reference can be found here:

    http://vmtoday.com/2013/01/cpu-ready-revisted-quick-reference-charts/
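
    The per-vCPU arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, and the 5% and 10% cut-offs are just the rules of thumb mentioned in this thread, not official limits.

```python
def per_vcpu_ready(total_ready_pct, num_vcpus):
    """Average %RDY per vCPU from the VM-level (summed) esxtop %RDY."""
    if num_vcpus <= 0:
        raise ValueError("num_vcpus must be positive")
    return total_ready_pct / num_vcpus


def classify(per_vcpu_pct, warn=5.0, bad=10.0):
    """Label a per-vCPU value using the thread's 5% / 10% rules of thumb."""
    if per_vcpu_pct >= bad:
        return "problem"
    if per_vcpu_pct >= warn:
        return "watch"
    return "ok"


# 40% summed ready time on a 10 vCPU VM
print(per_vcpu_ready(40.0, 10))   # 4.0
# The original question: 15% to 25% on an 8 vCPU VM
print(per_vcpu_ready(15.0, 8))    # 1.875
print(per_vcpu_ready(25.0, 8))    # 3.125
print(classify(per_vcpu_ready(25.0, 8)))  # ok
```

    This is only an average; as noted elsewhere in the thread, ready time is not necessarily spread evenly across the vCPUs, so the expanded per-world view in esxtop is still worth checking.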

    What kind of hardware do you have? Intel-based with Hyper-Threading, or AMD?

    I did some testing with VMs using 100% CPU on Intel machines with HT. What I noticed was that as soon as I passed the 16 physical core boundary (the ESX host had 16 physical cores / 32 logical cores with HT), the ready times went up, and the more VMs I put on the host, the more the CPU became a bottleneck. This only applies if all the VMs want to use the CPU cores at the same time :). In our normal environment we sometimes run at a ratio of 1:4 without high ready times, because most machines don't use that much CPU on a daily basis.

    Try to map the logical cores/physical cores against the number of vCPUs.

    Do you use any affinity rules? Depending on how they are set, they can negatively or positively impact your ready times.



  • 11.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 02:36 PM

    We use Intel HT, 2 x 6 core, 24 logical.

    What I've seen is that even 2.5% RDY is not good for machines that need low latency. Is it worth keeping the larger machines together on a certain host and separating them from the smaller 1 and 2 core machines? Or is the pCPU to vCPU ratio all that matters?



  • 12.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 02:45 PM

    It's about scheduling; you only have 2 x 6 cores on which CPU instructions can execute. So if the workload really is low-latency and needs the CPU performance, then as asvfk mentioned you would get the best performance by keeping the ratio at 1:1 (physical cores).

    A good explanation can also be found here:

    Hyper-Threading Gotcha with Virtual Machine vCPU Sizing | Wahl Network



  • 13.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 04:00 PM

    Could you shed some light on why the performance tab in vCenter for that host showed at most 75% average usage, while the esxtop figure was 1.15?



  • 14.  RE: VM performance troubleshooting

    Posted Jul 16, 2014 05:32 PM

    I keep this pinned above my desk. Invaluable overview of esxtop  :  http://www.running-system.com/wp-content/uploads/2012/08/esxtop_english_v11.pdf

    Regarding vCenter perf metrics vs. esxtop output, Frank Pedersen has some good stuff here: http://www.vfrank.org/2011/01/31/cpu-ready-1000-ms-equals-5/
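
    The conversion that article's title refers to (1000 ms of ready time equals 5%) can be sketched as follows. This is a minimal sketch: the function name is hypothetical, and the 20-second default assumes the vCenter real-time chart interval; other chart ranges use longer sampling intervals and need a different value.

```python
def ready_ms_to_percent(ready_ms, interval_s=20.0):
    """Convert a vCenter CPU ready summation (ms) to a percentage.

    interval_s is the chart's sampling interval in seconds; 20 s is
    assumed here for the real-time chart.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


print(ready_ms_to_percent(1000))   # 5.0
print(ready_ms_to_percent(500))    # 2.5
```

    The same ready value in milliseconds means something very different on a chart with a longer interval, which is why the interval has to be known before comparing vCenter numbers with esxtop's %RDY.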

    There is also a fling called visualesxtop that I use. Find it here: https://labs.vmware.com/flings/visualesxtop

    Good Luck! Let us know how it goes for you!