One could probably write a book on the answer to this.
But let's start with the simple question - what's the definition of CPU usage %?
And now, how do you know that the Microsoft engineer who developed the performance counter defined it the exact same way?
And how do you know that the VMware engineer agrees with either you or the MS engineer?
And what exactly is "CPU usage"? Does it include steal? How about accounting for cycles that can't happen because of contention between hyperthreads? How about IO wait?
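To make the definition problem concrete, here is a minimal sketch. The tick counts are made up, and the field names are only loosely modeled on what Linux exposes in /proc/stat; neither cdm nor the vmware probe is claimed to compute usage this way. The point is just that the same raw counters give different "CPU usage %" values depending on which fields you count as busy.

```python
# Hypothetical tick counts over one sampling interval (made-up numbers,
# field names loosely modeled on Linux /proc/stat).
ticks = {"user": 300, "system": 100, "iowait": 50, "steal": 50, "idle": 500}


def usage_pct(ticks, busy_fields):
    """Percent of all ticks that fall into the fields counted as 'busy'."""
    total = sum(ticks.values())
    busy = sum(ticks[field] for field in busy_fields)
    return 100.0 * busy / total


# Three plausible definitions of "busy" applied to the same counters.
definitions = {
    "user + system": ("user", "system"),
    "plus iowait": ("user", "system", "iowait"),
    "plus iowait and steal": ("user", "system", "iowait", "steal"),
}

results = {name: usage_pct(ticks, fields) for name, fields in definitions.items()}
for name, pct in results.items():
    print(f"{name:>22}: {pct:.1f}%")
```

With these invented numbers the three definitions report 40%, 45% and 50% for the exact same interval.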
That aside, assuming that you all agree on what "CPU usage %" means, it's an average, right? At any given instant, a CPU is always running at 100% of its clock speed, right? The concept of idle time divides that 100% into two groups: idle and not-idle. But in order to measure the split between two things, you have to do it across a period of time. So who defines that period? Maybe the MS engineer and the VMware engineer disagree on the length of the period across which to sample usage. And generally it's a rolling average, but who enforces that everyone who measures CPU has to start at the exact same time, or use the same number of sample points?
It gets real messy.
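A small sketch of the sampling problem, using an invented busy/idle trace (1 = busy, 0 = idle, one sample per tick). Nothing here reflects how either probe actually samples; it only shows that window length and start time alone change the reported number.

```python
# Made-up busy/idle trace: 1 = busy, 0 = idle, one sample per tick.
trace = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0]


def window_usage(trace, start, length):
    """Average usage % over `length` ticks beginning at index `start`."""
    window = trace[start:start + length]
    return 100.0 * sum(window) / len(window)


# Same window length, different start times.
early = window_usage(trace, 0, 4)    # covers only busy ticks
shifted = window_usage(trace, 2, 4)  # covers two busy and two idle ticks

# Same start time, different window lengths.
medium = window_usage(trace, 0, 8)
long_run = window_usage(trace, 0, 12)

print(early, shifted, medium, round(long_run, 1))
```

Two observers looking at the same CPU can honestly report 100% and 50% just by starting their windows two ticks apart.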
So take a step back and ask yourself whether the difference matters.
When I looked into this, my thought process was essentially that I care how the guest is performing. When considering only the guest, I don't care at all what VMware says. So CDM was the choice as it was measuring the resources that the end user was experiencing directly.
That's not to say that I ignored VMware entirely, but I only paid attention to the VMware-specific things that weren't already better measured elsewhere. Things like CPU Ready are huge in monitoring the ability of a VM guest to run.
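If you want CPU Ready as a percentage, here is a sketch of the commonly cited conversion from the summation value (milliseconds) that vCenter reports. The 20-second real-time interval and the sample value are assumptions for illustration; check the interval your own collection actually uses, and note that the function name is mine, not part of any probe or API.

```python
def cpu_ready_pct(ready_ms, interval_s=20.0):
    """Convert a CPU Ready summation (ms) to a percentage of the interval.

    interval_s defaults to 20 seconds, the usual vCenter real-time sample
    interval (an assumption; other rollups use longer intervals).
    """
    return 100.0 * ready_ms / (interval_s * 1000.0)


# Hypothetical sample: 1000 ms of ready time in a 20 s interval.
sample = cpu_ready_pct(1000)
print(f"{sample:.1f}%")
```

The summation usually covers all vCPUs of the VM, so dividing by the vCPU count gives a rough per-vCPU figure.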
Original Message:
Sent: 02-26-2021 02:24 PM
From: Miller Grisepe Echagarreta Parra
Subject: Difference between cdm and vmware to have cpu, mem and disk metrics
Thanks to you both; your comments are very helpful.
Part of my question is about being able to reconcile the cpu and memory metrics of cdm and vmware. When we compare them with each other, why is there a difference of more than 1%? If it's the same component ("cpu") and the same metric ("cpu_usage"), shouldn't the value of the metric be the same? Why is this happening?
Original Message:
Sent: 02-26-2021 07:11 AM
From: Jay Wink
Subject: Difference between cdm and vmware to have cpu, mem and disk metrics
Hi Miller,
There is a big difference between VMware and cdm.
The VMware probe provides simple metrics like CPU Total, Memory Usage, etc.
The cdm probe gets Per CPU and Total along with Proc Queue Length and the 1, 5, and 15 min Load Avg. On the Memory side we break down Swap vs Physical and Total Memory. For the Disk you have a few different options for how to alarm and gather the data on it, % vs MB. Plus you can gather the partition size. You can also hide unwanted partitions.
The best practice is to monitor inside out and outside in. Meaning an OS is an OS regardless of where it is installed: VMware, Azure, AWS, Nutanix, etc. There is a ton of rich data on the OS to help with MTTR. So monitoring with probes is essential for mature OS monitoring.
On the VMware side, the OS isn't aware (outside the drivers) that it is running on an ESX Host. The ESX Host controls the CPU, Memory, Disk and NIC and has its own schedulers, with priorities, and with bottleneck metrics that are exposed via the API. The OS has no idea why it is running slow if there is a bottleneck or a limit being set. So getting both sides is the only way to quickly isolate root cause.
Hope this helps!
Jay Wink
------------------------------
Solution Engineer - AIOps
Broadcom
Original Message:
Sent: 02-25-2021 08:13 PM
From: Miller Grisepe Echagarreta Parra
Subject: Difference between cdm and vmware to have cpu, mem and disk metrics
Hi everyone.
What is the difference between cdm and vmware for cpu, mem and disk metrics? I see, for example, that the value of the cdm metric (qos_cpu_usage) is not the same as (qos_vmware_cpu_usage).
What is the perspective of each one? Should I switch from cdm monitoring to vmware, or can I complement one with the other?
What do you think ?