DX Unified Infrastructure Management

  • 1.  UIM's Lack of Citrix Monitoring with VDI Health Status

    Posted Jul 11, 2019 05:03 PM
    So we've been getting slammed for our lack of visibility into our clients' Citrix environments. We started really making use of the 3 main probes for Citrix monitoring (xenapp, xendesktop and xenserver), at least where we can set them up, given the horrible book of pre-reqs each probe needs before it will work.
    The PVS probe doesn't even work with the latest version, so we haven't been able to monitor that environment for almost a year now because the client is on the latest version, which the probe doesn't support. (Defect DE399302, pvs probe connection problems)

    But what is hurting us the most is the lack of visibility into the VDIs from the end user's perspective. Are there any probes that can monitor each VDI's overall health and responsiveness? No, we're not going to use the vmware probe to monitor each and every VDI instance, as that would just blow up the number of metrics the probe already collects.

    Does UIM provide anything that can give us this visibility into the VDIs themselves? And I don't mean a test that opens 1 VDI session, takes a time check, and logs out. I mean something that checks all the active VDIs and can alert if they are slow, non-responsive, etc. Clients are calling in saying their VDI is slow and then the guessing game begins...

    We need this for both Citrix's and VMware's versions of VDI. There is 0 visibility into VMware Horizon VDI. Is there still nothing that can monitor this environment?

    #vdi
    #cauim9.0.2
    #lackofmetrics

    ------------------------------
    Daniel Blanco
    Enterprise Tools Architect
    Alphaserve Technologies
    ------------------------------


  • 2.  RE: UIM's Lack of Citrix Monitoring with VDI Health Status

    Posted Jul 11, 2019 06:02 PM
    Yes, the last PVS update is from 2016, so not much hope there.
    Excuse my ignorance regarding VDI monitoring, but it seems to me that metrics like these would be of some help:
    QOS_XENDESKTOP_DESKTOP_CPU, Integer, CPU usage of the desktop
    QOS_XENDESKTOP_DESKTOP_MEMORY, Integer, Memory Available in KB
    QOS_XENDESKTOP_DESKTOP_PROCESSOR_QUEUE_LENGTH, Integer, System Processor Queue Length

    For xendesktop there is version 4.24 (beta), which:
    Added support for monitoring of Citrix Virtual Apps and Desktops (formerly XenApp & XenDesktop) 7.15 and 7.17 versions.
    Provides additional configuration and recommendations for collection of metrics from individual desktops and enabling support for monitoring large-scale deployments.

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 3.  RE: UIM's Lack of Citrix Monitoring with VDI Health Status

    Posted Jul 12, 2019 12:40 PM

    Slow VDI is often caused by overcommitment of vCPUs on the ESX hosts that run the VDI VMs.

    Metrics to look at in VMware:

    • CPU_READY
    • CO_STOPS
    • CPU_BUSY (may be unable to reach 100% utilization when CPU_READY is extremely high)

    When CPU_READY remains above 10% per vCPU in a VM, it has a major impact on the performance of the VM.

    The vmware probe can alert on the number of VMs with "High_CPU_READY". This is an ESX host metric, so it doesn't require metrics from each VM.

    To know which VM has problems, collect CPU_READY from each VM and alert if it is > 10%.
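
    As a rough illustration of the 10% rule above, here is a minimal Python sketch. It assumes you have the raw vCenter CPU ready summation for each VM in milliseconds (aggregated across its vCPUs), the sample interval in seconds, and the vCPU count. If your collector already reports CPU_READY as a percentage, skip the conversion. The function names and sample data are hypothetical and not part of the vmware probe.

```python
def cpu_ready_percent(ready_ms, interval_s, vcpus=1):
    """Convert a CPU ready summation (ms) into percent per vCPU."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Share of the interval that each vCPU spent waiting, on average.
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


def vms_over_threshold(samples, threshold_pct=10.0):
    """Return (vm_name, ready_pct) pairs above the threshold, worst first."""
    flagged = []
    for name, (ready_ms, interval_s, vcpus) in samples.items():
        pct = cpu_ready_percent(ready_ms, interval_s, vcpus)
        if pct > threshold_pct:
            flagged.append((name, pct))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical 20-second samples: (ready_ms, interval_s, vcpus)
samples = {
    "vdi-001": (1000, 20, 2),  # 2.5% per vCPU
    "vdi-002": (6000, 20, 2),  # 15% per vCPU
    "vdi-003": (2000, 20, 1),  # exactly 10%, not above the threshold
}
print(vms_over_threshold(samples))
```

    Dividing by the vCPU count matters: a 4-vCPU VM can show a large raw summation while each vCPU is still well under 10%.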

     

    If development at Broadcom adjusted the vmware probe to use the "Chunk-Size" fix, which was written in 2017, it could massively improve performance when pulling data from vCenter.

     

    David

     

     

    David DuPre'

    Principal Services Consultant  |  Enterprise Studio

    HCL Technologies Ltd.

    404-617-3023  |  david.dupre@hcl.com  | Lookout Mountain, GA

    www.hcltech.com  | www.ca.com/services

     






  • 4.  RE: UIM's Lack of Citrix Monitoring with VDI Health Status

    Posted Jul 16, 2019 01:50 PM
    Thanks Dave(s).
    But using the xendesktop probe to query all the VDIs in a large-scale deployment is useless. In the case above, the client has 2500+ VDIs, and the xendesktop probe, when configured to monitor them, takes anywhere from 1 to 3 hours to complete just 1 poll cycle. Not very helpful at all. I was trying this out with the 4.24 beta of the probe. The probe became unresponsive while it was querying the VDIs, and I could not open it in the Admin Console because the ppm probe could not communicate with it (I was getting Java heap memory errors in the ppm log). (Case 20027459)

    I saw the "High Scale Deployment" option for this probe, but that brings a whole new level of complexity into the picture. We would have to add possibly 3 additional dedicated hubs whose only job is collecting the metrics from the VDIs, each of which would now act as a robot reporting up to a hub. That would also be 3k worth of robots from a licensing standpoint. No, this is not an option.

    The vmware probe is looking like the best option, but it won't work great either because you can only have 1 template for everything.
    We can't say "for these VMs use this template, and for those VMs use that template" in cases where we just want to collect QoS. We don't want to alert on any of the VDIs, just collect basic stats (cpu/mem...).

    ------------------------------
    Daniel Blanco
    Enterprise Tools Architect
    Alphaserve Technologies
    ------------------------------



  • 5.  RE: UIM's Lack of Citrix Monitoring with VDI Health Status

    Posted Jul 16, 2019 03:13 PM

    Daniel,

       There is an old product called CA Capacity Management that can collect metrics for 2500+ VDIs from vCenter via the VMware API once per night.

     

    It pulls in the metrics from the vCenter history tables.

    There are tables with metrics stored at different intervals, and the data goes back a fixed period:

    • 5-minute data for 24 hours
    • 30-minute data for 7 days
    • 2-hour data for a month?
    • 1-day data for a year?

     

    Because the values are averages computed by vCenter, the metrics are good.

    Capacity planning is not real-time monitoring like UIM, so the newest metrics are for yesterday up to midnight.

    Jobs run after midnight to pull in data from the prior 24-hour period.

    Normally, we pull in the 30-minute data to get an idea of where the issues are and then pull in the 5-minute data to drill down on specific ESX hosts.
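
    The two-pass approach described above could be sketched roughly like this in Python. It assumes the CPU ready values have already been exported as percentages per host. The data, the function names, and the reuse of the 10% figure as a host-level cutoff are illustrative assumptions, not how CA Capacity Management does it.

```python
from statistics import mean


def hosts_to_drill_into(ready_30min, threshold_pct=10.0):
    """Pass 1: hosts whose average 30-minute CPU ready % is above the cutoff."""
    return sorted(
        host
        for host, values in ready_30min.items()
        if values and mean(values) > threshold_pct
    )


def hot_intervals(ready_5min, threshold_pct=10.0):
    """Pass 2: (sample_index, value) for 5-minute samples above the cutoff."""
    return [(i, v) for i, v in enumerate(ready_5min) if v > threshold_pct]


# Hypothetical exports: host -> list of CPU ready percentages
coarse = {"esx01": [3.0, 4.0, 5.0], "esx02": [9.0, 14.0, 16.0]}
fine = {"esx02": [8.0, 12.0, 21.0, 9.5]}

for host in hosts_to_drill_into(coarse):
    print(host, hot_intervals(fine.get(host, [])))
```

    The point of the coarse pass is that you only need to load the 5-minute data for the handful of hosts that look bad, instead of for every host.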

    The bad news is that this product uses an Oracle database, and that makes it an expensive tool.


    Below is what a Top CPU Ready by Cluster report looks like. (Note that my example has vCPUs overcommitted at only 2.5x.)

       - Many customers overcommit VDIs at 4x or 5x, and this overcommit rate causes slow VDI response times.

    Summary of cluster config
    List of VMs showing CPUReady, AVG_CPU_UTIL, MaxUtil, vCPU_count, vRam, HostIP
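
    For reference, the overcommit figures mentioned above (2.5x, 4x, 5x) are just allocated vCPUs divided by physical cores. A minimal Python sketch, assuming you count physical cores rather than hyperthreads and treating 4x as the warning level only because of the remark above (it is not a vendor limit). The names and numbers are hypothetical.

```python
def vcpu_overcommit_ratio(vm_vcpus, physical_cores):
    """Total vCPUs allocated to VMs divided by physical cores in the cluster."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vm_vcpus) / physical_cores


def is_heavily_overcommitted(ratio, limit=4.0):
    """True when the ratio reaches the (assumed) warning level."""
    return ratio >= limit


# Hypothetical cluster: 40 VDI VMs with 2 vCPUs each on 32 physical cores
cluster_vcpus = [2] * 40
ratio = vcpu_overcommit_ratio(cluster_vcpus, 32)
print(ratio, is_heavily_overcommitted(ratio))
```

    Here 80 vCPUs on 32 cores gives 2.5x, the same ratio as in my example report.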

    Below is a link to the documentation,

    https://docops.ca.com/ca-capacity-management/2-9-3/en

    David



    ------------------------------
    Principal Services Consultant
    ------------------------------