So you're saying the only indication of throttling you have is that the guest numbers don't match the host numbers? Well, they rarely ever will. The reason is that the guest and the host use different methods to come up with their measurements.
For instance, most OSes (Linux and Windows) use a method often described as a watchdog timer to determine CPU usage. The OS starts a low-priority thread on the CPU(s) and measures how long that thread takes to complete. That timing becomes the measurement of CPU usage. The logic is that the low-priority thread won't complete until all higher-priority threads have had their turn, so in theory this is a viable ballpark measurement. It works the same way regardless of the hardware, which is exactly why it's used: it's hardware agnostic.
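If it helps to picture it, here's a rough Python sketch of that probe-thread idea. This is my own illustration, not actual OS code; a real kernel runs the probe at the lowest scheduler priority, which plain Python can't easily set.

```python
import threading
import time

def fixed_work(iterations=2_000_000):
    # A fixed busy-loop; its wall-clock time stretches when the CPU is contended.
    x = 0
    for _ in range(iterations):
        x += 1
    return x

# Calibrate once on an (ideally) idle system.
t0 = time.monotonic()
fixed_work()
baseline = time.monotonic() - t0

def contention_ratio():
    # Run the probe and compare its elapsed time to the idle baseline.
    # ~1.0 means the CPU was mostly free; higher means contention.
    probe = threading.Thread(target=fixed_work)
    t0 = time.monotonic()
    probe.start()
    probe.join()
    return (time.monotonic() - t0) / baseline

print(f"contention ratio: {contention_ratio():.2f}")
```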
So, you have a VM that is reporting high CPU usage because its timer threads are taking a while to return. Take into account that the vCPUs are scheduled across a finite number of pCPUs, and that other VM workloads compete for those same pCPUs, and you will often see a guest report higher CPU usage than what the host reports for that same VM. In this case, the host is actually correct: it knows about the scheduling and can take it into account, along with every other VM running on it. The guest has no knowledge of any of this, so it is blind and thinks things are more utilized than they really are.
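To put illustrative numbers on that gap (made up for the example, not from any real host):

```python
# One vCPU over a 1000 ms sample window; all numbers are hypothetical.
window_ms = 1000.0
run_ms    = 400.0  # time the vCPU actually executed on a pCPU
ready_ms  = 300.0  # time the vCPU was runnable but waiting for a pCPU

# The guest can't see ready time; its watchdog thread just "took longer",
# so ready time gets lumped in with busy time.
guest_reported = (run_ms + ready_ms) / window_ms * 100

# The host does the scheduling, so it knows how long the vCPU really ran.
host_reported = run_ms / window_ms * 100

print(f"guest sees ~{guest_reported:.0f}% busy, host sees {host_reported:.0f}% busy")
# guest sees ~70% busy, host sees 40% busy
```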
Now, there's a whole slew of other factors here: %RDY, IOWAIT, CO-STOP, etc. One thing to keep in mind is that, for the most part, the vCPUs of the VM are used to process data that on physical hardware would be handled by a storage controller and NIC, and that work shows up in the IOWAIT measurement. If IOWAIT is high, the VM is waiting on the vCPU to process IO from either the storage or the network stack, which makes the watchdog threads take much longer to complete. The VM then thinks its CPUs are heavily utilized, when in fact that's not the case: you have an IO bottleneck somewhere. If the storage doesn't show high latency, this is often something hammering the network stack, like a long-running SQL query or a large single-threaded data transfer.
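If you're in a Linux guest, you can sanity-check this yourself from /proc/stat. Here's a quick sketch; the field layout is standard Linux, and note that the "steal" field is the guest-visible hint that the hypervisor is holding your vCPU back:

```python
import time

def cpu_times():
    # Parse the aggregate "cpu" line from /proc/stat.
    # Fields: user nice system idle iowait irq softirq steal ...
    with open("/proc/stat") as f:
        return [int(v) for v in f.readline().split()[1:]]

before = cpu_times()
time.sleep(5)
after = cpu_times()

delta = [b - a for a, b in zip(before, after)]
total = sum(delta)

iowait = delta[4] / total * 100  # time spent waiting on IO completion
steal = delta[7] / total * 100 if len(delta) > 7 else 0.0  # vCPU held off the pCPU

print(f"iowait: {iowait:.1f}%  steal: {steal:.1f}%")
```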
Now, one may think to just throw more vCPUs at the problem, but that only makes the situation worse, not just for the VM in question but for every VM on the host. This is why the term "right sizing" is so heavily stressed: you have to properly size the VM's resources to the actual workload and then observe. Too often, VMs are given resources just because, or "because the vendor says so," and then people wonder why this situation occurs.
Also, hyperthreading is not your friend here, because despite popular belief a hyperthread is not a full added core; in reality it buys you at most around a 50% performance increase, depending on the workload. So having 8 cores and 16 threads does not equal having 16 cores. Sometimes you get lucky, but most times you will see your VM report high CPU utilization that isn't actually true.
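Back-of-the-envelope, using that ~50% figure (the real gain varies by workload):

```python
physical_cores = 8
ht_gain = 0.5  # assume a hyperthread adds at most ~50% of a core's throughput

logical_threads = physical_cores * 2              # what the OS and scheduler see
effective_cores = physical_cores * (1 + ht_gain)  # what you can actually extract

print(f"{logical_threads} threads ~= {effective_cores:.0f} cores of real capacity,"
      f" not {logical_threads}")
```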
You need to look at the VM's counters on the host for %RDY, CO-STOP (%CSTP in esxtop), and IOWAIT. That's a good start toward determining what is going on with your VM. Also, do not override vNUMA by changing the default cores-per-socket setting from 1. Leave that alone unless you have some software that still thinks it's a great idea to license per socket; you don't gain any real benefit from messing with that setting, and it can cause more harm than good. And disable Hot-Add for both memory and CPU, another performance killer.
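One gotcha with %RDY: the underlying counter (cpu.ready.summation in the vSphere performance charts) is milliseconds of accumulated ready time over the sample interval, so you have to convert it to a percentage yourself. A quick sketch, assuming the real-time chart's 20-second interval:

```python
def ready_percent(ready_summation_ms, interval_s=20, vcpus=1):
    # cpu.ready.summation is milliseconds of ready time accumulated over
    # the sample interval; 20 s is the real-time chart's interval.
    return ready_summation_ms / (interval_s * 1000 * vcpus) * 100

# e.g. 1600 ms of ready time in a 20 s sample on a 4-vCPU VM:
print(f"%RDY per vCPU: {ready_percent(1600, vcpus=4):.1f}%")  # -> 2.0%
```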
Lastly, right size the VMs. Do you actually need 8 vCPUs assigned to that VM? Most folks think that if the CPU utilization of a VM is over 50%, more CPUs need to be added. That's absolutely wrong: in a VM, if you normally run between 70% and 80%, you are right sized for sure. Measure this by taking 1-minute samples over 90 days and then using only the 95th percentile; you don't care about spikes, only plateaus. I ran a very large infrastructure on that basic principle, and not only did things perform better with fewer vCPUs, several million $$$'s in equipment purchases were avoided. It works.
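Here's a minimal sketch of that sampling approach in Python, using a nearest-rank 95th percentile and the 70-80% target band from above (the thresholds and verdict strings are just my illustration):

```python
def right_size_verdict(samples, low=70, high=80):
    # samples: 1-minute CPU utilization percentages (90 days ~= 129,600 points).
    # Nearest-rank 95th percentile, so spikes are ignored and plateaus decide.
    s = sorted(samples)
    p95 = s[int(0.95 * (len(s) - 1))]
    if p95 < low:
        return p95, "oversized: consider removing vCPUs"
    if p95 > high:
        return p95, "undersized: check IO and scheduling before adding vCPUs"
    return p95, "right sized"

# Toy data: note the single 95% spike doesn't move the verdict.
p95, verdict = right_size_verdict([35, 40, 42, 38, 95, 41, 39, 37, 44, 36])
print(f"p95 = {p95}%: {verdict}")  # p95 = 44%: oversized: consider removing vCPUs
```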
Also, not every VM is made the same, even if the same software is installed on each one. Small variations in the workload, how the workload is used, and how the IO stack is exercised will cause massive variations in how each one behaves. You must treat each VM as its own container for fine-grained tuning; t-shirt sizes are a great starting point but not the end of the conversation.
Look up VM right sizing on this forum and you will find a lot of good discussions, possibly even some of my own past ones, that go into far more depth than I have here.