DX Unified Infrastructure Management

  • 1.  Difference between cdm and vmware to have cpu, mem and disk metrics

    Posted Feb 25, 2021 08:14 PM
    Hi everyone.

    What is the difference between the cdm and vmware probes when collecting CPU, memory and disk metrics? I see, for example, that the value of the cdm metric (qos_cpu_usage) is not the same as the vmware one (qos_vmware_cpu_usage).
    What is the perspective of each one? Should I switch from cdm monitoring to vmware, or can one complement the other?

    What do you think?


  • 2.  RE: Difference between cdm and vmware to have cpu, mem and disk metrics

    Posted Feb 25, 2021 08:26 PM
    CDM is a probe that runs local queries from the agent to obtain the values and then sends them to the hub; these queries use the native tools of the operating system. The VMWARE probe, by contrast, connects to vSphere or vCenter and obtains the values for CPU, memory, datastores, VMware Tools, power status, etc. through the VMware API.

    As for the QOS_ names, they depend on the probe. For example, CPU_USAGE is a QoS shared between CDM and VMWARE, but other QoS names may not be. You can look up the QoS names via SQL with a simple query: select * from s_qos_data;
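
    To see which probe produces which QoS name, the query above can be narrowed down. The following is only a sketch: it uses an in-memory SQLite table as a stand-in for the real UIM database, and the column names (qos, probe, source) and sample rows are assumptions for illustration, so check them against your own s_qos_data table first.

```python
import sqlite3

# In-memory stand-in for the UIM database; the real one is not SQLite.
# Column names and rows below are assumptions made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s_qos_data (qos TEXT, probe TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO s_qos_data VALUES (?, ?, ?)",
    [
        ("QOS_CPU_USAGE", "cdm", "server01"),
        ("QOS_CPU_USAGE", "vmware", "server01"),
        ("QOS_DISK_USAGE_PERC", "cdm", "server01"),
        ("QOS_CPU_USAGE", "cdm", "server02"),
    ],
)

# List each QoS name once per probe that reports it.
rows = conn.execute(
    "SELECT DISTINCT qos, probe FROM s_qos_data ORDER BY qos, probe"
).fetchall()

for qos, probe in rows:
    print(f"{qos:22s} {probe}")
```

    A QoS name that shows up with more than one probe is one where the two collection methods can be compared side by side.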

    Regards.

    ------------------------------
    Jesús Glc
    KIO Networks
    #KIONetworks is powerfull
    ------------------------------



  • 3.  RE: Difference between cdm and vmware to have cpu, mem and disk metrics

    Broadcom Employee
    Posted Feb 26, 2021 07:11 AM
    Hi Miller,

    There is a big difference between VMware and cdm.

    The VMware probe provides simple metrics like CPU Total, Memory Usage etc.

    The cdm probe gets Per CPU and Total along with Proc Queue Length / 1, 5, and 15 min Load Avg.  On the Memory side we break down Swap vs Physical and Total Memory.  For the Disk you have a few different options for how to alarm on and gather the data, % vs MB.  Plus you can gather the partition size, and you can hide unwanted partitions.

    The best practice is to monitor inside out and outside in.  Meaning an OS is an OS regardless of where it is installed: VMware, Azure, AWS, Nutanix etc.  There is a ton of rich data on the OS to help with MTTR.  So monitoring with probes is essential for mature OS monitoring.

    On the VMware side, the OS isn't aware (outside the drivers) that it is running on an ESX Host.  The ESX Host controls the CPU, Memory, Disk and NIC and has its own schedulers, with priorities and bottleneck metrics that are exposed via the API.  The OS has no idea why it is running slow if there is a bottleneck or limits are being set.  So getting both sides is the only way to quickly isolate root cause.

    Hope this helps!
    Jay Wink

    ------------------------------
    Solution Engineer - AIOps
    Broadcom
    ------------------------------



  • 4.  RE: Difference between cdm and vmware to have cpu, mem and disk metrics

    Posted Feb 26, 2021 02:24 PM
    Thanks to both of you, your comments are very helpful.
    Part of my question is about being able to reconcile the CPU and memory metrics of cdm and vmware. If we compare them with each other, why is there a difference of more than 1%? If it's the same component ("cpu") and the same metric ("cpu_usage"), shouldn't the value of the metric be the same? Why is this happening?




  • 5.  RE: Difference between cdm and vmware to have cpu, mem and disk metrics
    Best Answer

    Posted Feb 26, 2021 02:58 PM
    One could probably write a book on the answer to this.

    But let's start with the simple question - what's the definition of CPU usage %?

    And now, how do you know that the Microsoft engineer who developed the performance counter defined it the exact same way?

    And how do you know that the VMware engineer agrees with either you or the MS engineer?

    And what exactly is "CPU usage"? Does it include steal? How about accounting for cycles that can't happen because of conflicts resulting from contention in hyperthreading? How about IO wait?
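
    To make that concrete, here is a small sketch of how the same raw counters give different "CPU usage %" values depending on what you count as busy. The tick counts are made up for illustration, with field names in the style of Linux /proc/stat; this is not how either probe actually computes its value.

```python
# Hypothetical tick counts for one sampling interval, in the style of
# Linux /proc/stat fields. The numbers are made up for illustration.
ticks = {
    "user": 520,
    "system": 130,
    "iowait": 90,
    "steal": 60,
    "idle": 200,
}

def usage_pct(ticks, busy_fields):
    """Percent of all ticks spent in the fields counted as 'busy'."""
    total = sum(ticks.values())
    busy = sum(ticks[name] for name in busy_fields)
    return 100.0 * busy / total

# Three plausible definitions of "CPU usage".
definitions = {
    "user + system only": ("user", "system"),
    "plus iowait": ("user", "system", "iowait"),
    "plus iowait and steal": ("user", "system", "iowait", "steal"),
}

results = {label: usage_pct(ticks, fields) for label, fields in definitions.items()}
for label, pct in results.items():
    print(f"{label:24s} {pct:5.1f}%")
```

    With these numbers the three definitions give 65%, 74% and 80% from the exact same interval.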

    That aside, assuming that you all agree on what "CPU usage %" means, it's an average, right? In the instant, a CPU is always running at 100% of its clock speed, right? And the concept of Idle time divides this 100% running into two groups: Idle and not-Idle. But in order to measure the split between two things, you have to do it across a period of time. So who defines that period? Maybe the MS eng and the VM eng disagree on the length of the period across which to sample usage. And generally it's a rolling average, but who enforced that everyone who measures CPU has to start at the exact same time? Or use the same number of sample points?
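
    A rough sketch of the sampling problem: two observers look at the same busy/idle signal but average it over different window lengths and start at different times. The signal is synthetic and the 60 s and 20 s windows and the 7 s offset are arbitrary assumptions, not the actual intervals of either probe.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# One entry per second: 1 = busy, 0 = idle. Synthetic data, not a real trace.
signal = [1 if random.random() < 0.4 else 0 for _ in range(600)]

def averages(signal, window, offset=0):
    """Average busy % over consecutive windows, starting at `offset` seconds."""
    out = []
    for start in range(offset, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        out.append(100.0 * sum(chunk) / window)
    return out

# Observer A: 60 s windows starting at second 0.
guest_view = averages(signal, window=60)
# Observer B: 20 s windows starting 7 s later.
host_view = averages(signal, window=20, offset=7)

print("60 s windows:", [round(v, 1) for v in guest_view[:3]])
print("20 s windows:", [round(v, 1) for v in host_view[:3]])
```

    Both lists describe the same CPU, but the individual data points will generally not line up, even before any disagreement about the definition of usage.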

    It gets real messy.

    So take a step back and ask yourself whether the difference matters.

    When I looked into this, my thought process was essentially that I care how the guest is performing. When considering only the guest, I don't care at all what VMware says. So CDM was the choice as it was measuring the resources that the end user was experiencing directly.

    That's not to say that I ignored VMware entirely, but I only paid attention to the VMware-specific things that weren't already better measured elsewhere. Something like CPU Ready is huge in monitoring the ability of a VM guest to run.
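
    For reference, vCenter reports CPU Ready as a summation in milliseconds per sampling interval, and the commonly used conversion to a percentage is ready time divided by the interval length. A minimal sketch, assuming the 20 second real-time interval as the default; the function name is just for illustration.

```python
def cpu_ready_pct(ready_ms, interval_s=20):
    """Convert a CPU Ready summation (ms) into a percentage of the interval.

    interval_s is the sampling interval in seconds; 20 s is the
    real-time chart interval and is assumed here as the default.
    """
    return 100.0 * ready_ms / (interval_s * 1000.0)

# 1000 ms of ready time within a 20 s sample
print(cpu_ready_pct(1000))  # prints 5.0
```

    For longer rollup intervals, pass the matching interval length in seconds instead of the default.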