VMware vSphere

  • 1.  Using NUMA effectively

    Posted Feb 24, 2015 03:20 PM

    Hello,

    I have a NUMA-enabled dual-processor DELL server (each processor has 8 physical cores). I have created a VM on it with NUMA affinity set to Node 0. On this VM I am running an application that is NUMA-agnostic, i.e., it is not NUMA-aware. With this VM running only on a single NUMA node, I was expecting some improved performance compared to running without setting the NUMA affinity (The reason being that the memory allocation would have been on the local node). However, I do not observe any difference in performance between the two cases. Am I doing something wrong?

    Would really appreciate any pointers on this.

    Thanks



  • 2.  RE: Using NUMA effectively

    Posted Feb 24, 2015 05:03 PM

    I was expecting some improved performance compared to running without setting the NUMA affinity (The reason being that the memory allocation would have been on the local node)

    By default, ESXi already tries to allocate memory from the local NUMA node on which the VM is scheduled. There is no need to forcibly pin a VM to a particular NUMA node.

    You can verify this in (r)esxtop in the memory view (enable fields for NUMA stats with f->g). This will show you the VM's NUMA home node (NHN) as well as the amount of local and remote memory (NLMEM, NRMEM).
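
    As a rough illustration of how to read those two counters, the sketch below computes the share of a VM's memory that is local to its home node. The function name and the sample values are hypothetical; only NLMEM and NRMEM come from the esxtop memory view, and the values are assumed to be in the same unit.

```python
def local_memory_percent(nlmem: float, nrmem: float) -> float:
    """Return the percentage of a VM's memory that is on its NUMA home node.

    nlmem: local memory as shown in the NLMEM column
    nrmem: remote memory as shown in the NRMEM column
    Both values must use the same unit.
    """
    total = nlmem + nrmem
    if total == 0:
        # No memory accounted for yet (e.g. the VM was just powered on).
        return 0.0
    return 100.0 * nlmem / total


if __name__ == "__main__":
    # Hypothetical readings, not measurements from this thread.
    print(local_memory_percent(nlmem=75, nrmem=25))
```

    A value close to 100 would suggest the scheduler is already keeping the VM's memory local, in which case manual NUMA affinity has little left to improve.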

    Note that the NUMA placement used to be a bit wonky on earlier ESXi 5.x builds, where some VMs often ended up with a lot of remote memory even though there was enough free local memory. But this has been fixed in all recent 5.x releases.

    Also, even if memory is located on the remote node, I wouldn't expect really big performance differences for most applications. Sure, the difference will be notable with synthetic benchmarks like a memory throughput test, but synthetic benchmarks like these tell little about actual application performance.



  • 3.  RE: Using NUMA effectively

    Posted Feb 25, 2015 08:39 AM

    Thanks for the reply. How about the observations below:

    • If I turn off NUMA on the host (effectively, Node Interleaving is turned ON), I get better performance for my application.
    • The measurement of performance is the amount of work that I can do with a set of CPU/RAM resources.
    • With NUMA turned off, I see a gain of at least 25%.

    So I am a little confused as to what the role of NUMA is here. It's only leading to performance degradation.



  • 4.  RE: Using NUMA effectively

    Posted Feb 25, 2015 11:59 AM

    That does sound odd. The general recommendation has always been to leave Node Interleaving disabled, see:

    http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf

    http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/

    Does your VM fit into one NUMA node? I.e., how many and what kind of sockets/physical CPUs/threads do you have, how many vCPUs are assigned to the VM, how much physical memory do you have per NUMA node, and how much memory is assigned to the VM?



  • 5.  RE: Using NUMA effectively

    Posted Feb 25, 2015 12:08 PM

    There are 2 NUMA nodes on my machine, with 8 physical cores per node. Each NUMA node has 16 GB of memory (total RAM: 2 x 8 GB on Node 0 and 2 x 8 GB on Node 1). HT is disabled on the hardware.

    The VM spec is as follows :

    1 virtual socket with 8 vCPUs and 12 GB RAM, running Windows 2012 R2 Standard.

    My application has a large memory footprint - it easily consumes 6 GB of memory.
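
    To make the "does the VM fit into one NUMA node" question concrete, here is a minimal sketch of that check using the figures from this post. The class and function names are hypothetical, and the check is deliberately simplified: it ignores memory the hypervisor keeps for itself and any other VMs sharing the node.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    cores: int        # physical cores on the node
    memory_gb: float  # physical memory attached to the node


@dataclass
class VmSpec:
    vcpus: int
    memory_gb: float


def fits_in_one_node(vm: VmSpec, node: NumaNode) -> bool:
    """Simplified check: the VM fits if both its vCPUs and its memory
    are no larger than what a single node provides."""
    return vm.vcpus <= node.cores and vm.memory_gb <= node.memory_gb


if __name__ == "__main__":
    node = NumaNode(cores=8, memory_gb=16)  # one node of the host described above
    vm = VmSpec(vcpus=8, memory_gb=12)      # the VM described above
    print(fits_in_one_node(vm, node))       # prints True
```

    By this simplified measure the VM fits into a single node, although it uses every core of that node and three quarters of its memory.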



  • 6.  RE: Using NUMA effectively

    Posted Feb 26, 2015 02:10 PM

    Hi there,

    This is an interesting issue. There are quite a few variables that could play into this: the VM may be draining the local memory of its own NUMA node (let's say node 0), part of which is also reserved for the VMkernel's use (see https://vmxp.wordpress.com/2014/12/09/how-much-memory-should-be-free-for-vmkernel/ ). Or vNUMA may not be doing its job "so well" (it gets turned on automatically once you have 1 socket / 8 cores configured). This would correlate with the "merged" memory performing better.

    Two things I'd like to see if you have the time for benchmarking:

    • Set the VM to have 2 sockets / 4 cores each - this way the VM's workload gets split evenly between the two CPUs.
    • Enable hyper-threading to help the hypervisor with CPU co-scheduling, and try both your current (1 socket w/ 8 cores/socket) and the forced (2 sockets w/ 4 cores/socket) scenario. See "HyperThreading: What is it and does it benefit ESXi?" on VMXP.


  • 7.  RE: Using NUMA effectively

    Posted Feb 26, 2015 03:10 PM

    Thanks for the reply. I'll try out the suggested tests. I have already tried "8 sockets with 1 core/socket", though - this gives me the same results as "1 socket with 8 cores/socket".



  • 8.  RE: Using NUMA effectively

    Posted Feb 26, 2015 04:26 PM

    vNUMA is only enabled if you configure a VM with more than 8 vCPUs, i.e., at least 9. Just 8 does not enable it, so using 8 sockets/1 core, 1 socket/8 cores or 2 sockets/4 cores doesn't make a difference.

    This activation of vNUMA in a VM is governed by the numa.vcpu.min advanced parameter, which has 9 as its default value, see:

    https://pubs.vmware.com/vsphere-51/index.jsp#com.vmware.vsphere.resmgmt.doc/GUID-3E956FB5-8ACB-42C3-B068-664989C3FF44.html#GUID-3E956FB5-8ACB-42C3-B068-664989C3FF44

    numa.vcpu.min: Minimum number of virtual CPUs in a virtual machine that are required in order to generate a virtual NUMA topology. Default value: 9

    Try setting the parameter manually to 8 for the VM and run your test again on a NUMA-enabled physical host.
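
    The threshold rule described above can be sketched as follows. This is a simplified model, not ESXi's actual logic: it only compares the vCPU count against numa.vcpu.min and ignores every other condition the hypervisor may apply. The function name is hypothetical.

```python
DEFAULT_NUMA_VCPU_MIN = 9  # default value of numa.vcpu.min quoted above


def vnuma_exposed(vcpus: int, numa_vcpu_min: int = DEFAULT_NUMA_VCPU_MIN) -> bool:
    """Simplified model: a virtual NUMA topology is generated only when the
    VM has at least numa_vcpu_min vCPUs. The sockets/cores split is not an
    input, which is why it does not change the outcome in this model."""
    return vcpus >= numa_vcpu_min


if __name__ == "__main__":
    print(vnuma_exposed(8))                   # 8 vCPUs with the default threshold
    print(vnuma_exposed(8, numa_vcpu_min=8))  # same VM after lowering the threshold
```

    In this model the 8-vCPU VM gets no virtual NUMA topology with the default threshold and gets one once the threshold is lowered to 8, which is the change suggested above.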

    Also, you should generally not configure cores per socket and should use only sockets, unless a core count is needed for licensing or similar reasons. ESXi will select an appropriate vNUMA topology based on the host the VM is started on. See this article for more details:
    http://blogs.vmware.com/vsphere/2013/10/does-corespersocket-affect-performance.html

    I agree with enabling HT on the servers as well.



  • 9.  RE: Using NUMA effectively

    Posted Feb 27, 2015 09:36 AM

    I am planning to do this, but I am waiting for a new server for it - something like a DELL M620 2-socket system. I will share the results soon.

    I am very eager to solve this problem. Thanks for the suggestions.



  • 10.  RE: Using NUMA effectively

    Posted Mar 12, 2015 11:23 AM

    I have tried all the suggested permutations and combinations, and NUMA never worked for me. I always got better performance when NUMA was off.