1 - For example, if I have an HP DL380 server with 2 physical CPUs (32 cores total), can I say we have 2 NUMA nodes?
Yes, you have 2 NUMA nodes: node 0, which consists of CPU0 and its associated memory banks, and node 1, which consists of CPU1 and its own memory banks.
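As a rough illustration (not ESXi's actual numbering — check `numactl --hardware` in a Linux guest or `esxtop` on the host to see the real layout), the 32 cores split evenly across the two nodes, so each core's local node can be sketched like this:

```python
# Toy sketch of the DL380 example: 2 NUMA nodes, 16 cores each.
# Real core-to-node numbering varies by BIOS/firmware; verify on your host.
CORES_PER_NODE = 16  # 32 cores / 2 sockets in this example

def core_to_node(core: int) -> int:
    """Return the NUMA node whose memory banks are local to this core."""
    return core // CORES_PER_NODE

print(core_to_node(3))   # a core on CPU0 -> node 0
print(core_to_node(20))  # a core on CPU1 -> node 1
```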
2 - I understand that the local memory bank gives the best performance and that the remote bank is only used when the local memory bank is full. Is this correct?
ESXi will try to use local memory to back memory pages whenever possible, but placement also depends on how the OS/application accesses memory. For most general application workloads, however, remote memory access does not cause a noticeable performance degradation.
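The intuition in the question can be sketched as a toy allocator (purely illustrative — this is not ESXi's actual placement algorithm, which also considers access patterns and can migrate pages): pages are backed locally while the local node has free capacity, and spill to the remote node once it is exhausted:

```python
# Toy page-placement sketch: prefer the local node, spill to remote when full.
# Policy and numbers are illustrative only, not ESXi's real memory scheduler.
def place_pages(n_pages: int, local_free: int) -> list[str]:
    """Return where each requested page lands ('local' or 'remote')."""
    placements = []
    for _ in range(n_pages):
        if local_free > 0:
            placements.append("local")
            local_free -= 1
        else:
            placements.append("remote")
    return placements

print(place_pages(4, local_free=2))  # ['local', 'local', 'remote', 'remote']
```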
3 - With an HP DL380 with 2 physical CPUs, if I want to create a VM with the best performance, should I create it with 2 CPU sockets and 2 cores per socket (4 vCPUs), or with 4 sockets and 1 core per socket (4 vCPUs)?
First of all, this example is irrelevant to vNUMA because the VM has 8 or fewer vCPUs (unless you edit the numa.vcpu.min parameter; see Advanced Virtual NUMA Attributes).
The VM will see 2 sockets with 2 cores each, or 4 sockets with one core each, but not in any NUMA topology. It will just see a single flat pool of memory across all sockets, because the virtual hardware is not being presented with NUMA capabilities (think: installing an OS on an old Pentium 4 computer).
Secondly, the socket/core configuration was never intended to configure NUMA sizes; it exists only for licensing purposes. For vNUMA you should configure sockets only, which lets ESXi autosize an appropriate NUMA topology to present to the guest (again, only if you have 9 or more vCPUs). Check this article for details:
Does corespersocket Affect Performance? - VMware vSphere Blog - VMware Blogs
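To tie this back to the 4-vCPU examples in the question, here is a minimal sketch of the numa.vcpu.min threshold (default 9, per the advanced attributes mentioned above), showing why neither socket layout exposes a NUMA topology to the guest:

```python
# Sketch of the vNUMA exposure rule: a guest only sees a virtual NUMA
# topology when the VM has numa.vcpu.min (default 9) or more vCPUs.
NUMA_VCPU_MIN = 9  # ESXi default; can be lowered via numa.vcpu.min

def vnuma_exposed(sockets: int, cores_per_socket: int,
                  threshold: int = NUMA_VCPU_MIN) -> bool:
    """True if the VM's vCPU count reaches the vNUMA threshold."""
    return sockets * cores_per_socket >= threshold

print(vnuma_exposed(2, 2))  # 4 vCPUs -> False, guest sees flat memory
print(vnuma_exposed(4, 1))  # 4 vCPUs -> False, same result either way
print(vnuma_exposed(2, 8))  # 16 vCPUs -> True, vNUMA is presented
```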