Slow VDI is often caused by overcommitment of vCPUs on the ESX hosts that run the VDI VMs.
Metrics to look at in VMware:
When CPU_READY stays above 10% per vCPU in a VM, it has a major impact on that VM's performance.
The VMware probe can alert on the number of VMs with "High_CPU_READY".
This is an ESX host metric, so it doesn't require collecting metrics from each VM.
To find out which VM has the problem, collect CPU_READY from each VM and alert if it is > 10%.
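As a rough illustration of the per-VM check above, here is a minimal Python sketch. It assumes the raw counter arrives as a CPU ready summation in milliseconds per sample interval (20 seconds for real-time data, 300 seconds for 5-minute rollups) and that the VM-level value is summed across vCPUs. The function names and the sample VM names are made up for the example and are not part of the probe.

```python
def cpu_ready_percent(ready_ms, interval_s, num_vcpus=1):
    """Convert a CPU ready summation (ms) into an average percent per vCPU.

    Assumption: ready_ms is the total ready time for the VM over one
    sample interval of interval_s seconds, summed across all vCPUs.
    """
    if interval_s <= 0 or num_vcpus <= 0:
        raise ValueError("interval_s and num_vcpus must be positive")
    return 100.0 * ready_ms / (interval_s * 1000.0 * num_vcpus)


def vms_over_threshold(samples, threshold_pct=10.0):
    """Return {vm_name: ready_pct} for VMs above the threshold.

    samples maps a VM name to (ready_ms, interval_s, num_vcpus).
    """
    flagged = {}
    for name, (ready_ms, interval_s, num_vcpus) in samples.items():
        pct = cpu_ready_percent(ready_ms, interval_s, num_vcpus)
        if pct > threshold_pct:
            flagged[name] = pct
    return flagged


if __name__ == "__main__":
    # Hypothetical real-time samples (20-second interval).
    demo = {
        "vdi-01": (8000, 20, 2),
        "vdi-02": (1000, 20, 2),
    }
    for vm, pct in vms_over_threshold(demo).items():
        print(f"{vm}: CPU ready {pct:.1f}% per vCPU")
```

Dividing by the vCPU count matters: a 4-vCPU VM that shows 20% in total is averaging 5% per vCPU, which is below the 10% line.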
If Broadcom development adjusted the VMware probe to use the "Chunk-Size" fix, which was written in 2017, it could massively improve performance when pulling data from vCenter.
Principal Services Consultant | Enterprise Studio
HCL Technologies Ltd.
404-617-3023 | email@example.com | Lookout Mountain, GA
www.hcltech.com | www.ca.com/services
There is an older product called CA Capacity Management that can collect 2500+ VDI metrics from vCenter via the VMware API once per night.
It pulls the metrics from the vCenter history tables.
The tables store metrics at different intervals, and each interval is retained for a fixed period:
5-minute data for 24 hours
30-minute data for 7 days
2-hour data for a month (vCenter default)
1-day data for a year (vCenter default)
Because the values are averages computed by vCenter, the metrics are reliable.
Capacity planning is not real-time monitoring like UIM, so the newest metrics are from yesterday, up to midnight.
Jobs run after midnight to pull in data from the prior 24 hour period.
Normally, we pull in the 30-minute data to get an idea of where the issues are, and then pull in the 5-minute data to drill down on specific ESX hosts.
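The two-pass approach can be sketched roughly as below. This is only an illustration of the idea, not how CA Capacity Management does it: the function name, the 10% threshold applied at host level, and the host names are assumptions for the example.

```python
def hosts_to_drill_down(ready_30min, threshold_pct=10.0, top_n=3):
    """Pick the hosts worth pulling 5-minute data for.

    ready_30min maps a host name to a list of CPU ready percentages
    taken from the 30-minute data. Hosts are ranked by their peak value,
    and only hosts whose peak exceeds threshold_pct are returned.
    """
    # Peak value per host; hosts with no samples are skipped.
    peaks = {host: max(vals) for host, vals in ready_30min.items() if vals}
    hot = [(host, peak) for host, peak in peaks.items() if peak > threshold_pct]
    hot.sort(key=lambda item: item[1], reverse=True)
    return [host for host, _ in hot[:top_n]]


if __name__ == "__main__":
    # Hypothetical 30-minute samples per ESX host.
    coarse = {
        "esx01": [2.0, 12.0, 5.0],
        "esx02": [1.0, 3.0],
        "esx03": [15.0, 20.0],
    }
    print(hosts_to_drill_down(coarse))
```

Ranking on the peak rather than the average keeps short bursts visible, which is usually what users notice as a slow desktop.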
The bad news is that this product uses an Oracle database, and that makes it an expensive tool.
Below is what a Top CPU Ready by Cluster report looks like. (Note that in my example the vCPUs are overcommitted at only 2.5x.)
- Many customers overcommit VDI at 4x or 5x, and that overcommit rate causes slow VDI response times.
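For reference, the overcommit figure is just allocated vCPUs divided by physical cores on the host or cluster. A minimal sketch follows, assuming physical cores are counted rather than hyperthreads; the function name and the numbers are only an example.

```python
def vcpu_overcommit_ratio(vcpus_per_vm, physical_cores):
    """Return allocated vCPUs divided by physical cores.

    vcpus_per_vm is a list with the vCPU count of each powered-on VM.
    Assumption: physical_cores counts cores, not hyperthreads.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores


if __name__ == "__main__":
    # Hypothetical host: 40 VDI VMs with 2 vCPUs each on 32 cores.
    ratio = vcpu_overcommit_ratio([2] * 40, 32)
    print(f"vCPU overcommit: {ratio:.1f}x")
```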
Below is a link to the documentation,