I have had the best performance when the NUMA topology of the guest matches the NUMA topology of the host or, where that isn't possible, with one core per socket (so an 8-core VM would have 8 sockets, even though this is an unusual configuration to find in actual hardware).
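As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name `guest_topology` and its parameters are hypothetical, and it assumes "matching the host" means using the same number of sockets as the host with the vCPUs split evenly across them. That is only one reading of the advice, so check it against what your hypervisor actually exposes.

```python
def guest_topology(vcpus, host_sockets, host_cores_per_socket):
    """Return (sockets, cores_per_socket) for a guest VM.

    Assumption: mirroring the host means using the same socket count as
    the host, with the vCPUs divided evenly and fitting within a host
    socket's core count. If that is not possible, fall back to one core
    per socket.
    """
    if vcpus <= 0 or host_sockets <= 0 or host_cores_per_socket <= 0:
        raise ValueError("all arguments must be positive integers")

    per_socket, remainder = divmod(vcpus, host_sockets)
    if remainder == 0 and 1 <= per_socket <= host_cores_per_socket:
        # The guest can mirror the host's socket layout.
        return host_sockets, per_socket

    # Fallback: one core per socket, e.g. an 8-vCPU VM gets 8 sockets.
    return vcpus, 1


if __name__ == "__main__":
    # Hypothetical host: 2 sockets, 10 cores per socket.
    print(guest_topology(8, 2, 10))
    print(guest_topology(7, 2, 10))
```

On the hypothetical 2-socket, 10-core host, an 8-vCPU guest would get 2 sockets of 4 cores, while a 7-vCPU guest cannot be split evenly and falls back to 7 sockets of 1 core each.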
If you are able, I strongly recommend watching Frank Denneman's excellent presentation "60 Minutes of NUMA" from VMworld 2020 (and, I believe, 2019 as well). It's deep, dense, and technical, but he lays it all out for you.