ESXi

  • 1.  Latency on datastore

    Posted Feb 23, 2024 09:11 PM

    Hi, I have a lab setup for testing some VMs.

    All licenses are in order. The hardware is a single node, a Dell T7910 with a Dell LSI MegaRAID 9341-8i, with one SSD pool and one HDD pool.

    For no apparent reason, I see disk latency climb without limit; the last time it reached 40 seconds.

    At this moment nobody is using the server or its VMs, yet latency is 720 ms for some VMs; no tasks are running and none are scheduled.
    I've used some tools in a Windows VM to generate I/O stress on the SSD pool, with tests for 40k IOPS on the SSD pool, a queue depth of 16, and a 4k block size.
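
    As a rough sanity check on those figures, Little's law ties outstanding I/Os, throughput, and average latency together (latency = queue depth / IOPS). The sketch below is only arithmetic on the numbers in this post. It assumes the 40k IOPS figure was actually sustained at queue depth 16, and the helper names are made up for illustration.

```shell
#!/bin/sh
# Little's law: average latency (ms) = outstanding I/Os / IOPS * 1000.
# expected_latency_ms is a hypothetical helper, not an ESXi tool.
expected_latency_ms() {
    qdepth="$1"
    iops="$2"
    awk -v q="$qdepth" -v i="$iops" 'BEGIN { printf "%.3f\n", q / i * 1000 }'
}

# Throughput in MiB/s for a given IOPS rate and block size in KiB.
throughput_mib_s() {
    iops="$1"
    block_kib="$2"
    awk -v i="$iops" -v b="$block_kib" 'BEGIN { printf "%.2f\n", i * b / 1024 }'
}

expected_latency_ms 16 40000   # queue depth 16 at 40k IOPS -> 0.400
throughput_mib_s 40000 4       # 40k IOPS of 4 KiB blocks   -> 156.25
```

    Under those assumptions the stress test implies well under a millisecond per I/O, so a 720 ms reading on an idle host is more than a thousand times higher than what the pool shows under load.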

    The ESXi version is 8.0 Update 1.

    Can someone help me debug this?



  • 2.  RE: Latency on datastore

    Posted Feb 23, 2024 10:56 PM

    Can you please share esxtop screenshots of the adapter view (press d), the device view (press u), and the VM view (press v)?

    PS: I'm looking for the DAVG, KAVG, and GAVG values.

    Also share the host task list with counts:

    vim-cmd vimsvc/task_list |wc -l 

    vim-cmd vimsvc/task_list 

     



  • 3.  RE: Latency on datastore

    Posted Feb 24, 2024 08:42 PM

    Hi!

    I've attached a screenshot taken just now, during a VM deploy from a template, to be sure that I can reproduce the problem.

    Do you see anything strange?

    Thanks!



  • 4.  RE: Latency on datastore

    Posted Feb 24, 2024 12:05 PM

    Here is a similar latency thread (but on HP hardware):

    https://communities.vmware.com/t5/vSphere-Storage-Discussions/Disk-Latency-Issues/td-p/2535575

    There is a lot of discussion in that thread, but it came down to the storage configuration (in that case, RAID 6). How is your MegaRAID configured? That thread could help.



  • 5.  RE: Latency on datastore

    Posted Feb 24, 2024 08:44 PM

    Hi! The MegaRAID is configured with a single SSD pool in RAID 5; there is no spare, and all disks are healthy. I've also double-checked the virtual volume, and neither the controller nor the BIOS has logged any errors, so I think the virtual volume and all SSDs are OK and the configuration is valid.

    FYI, I forgot to mention that the HDD pool is not behind the MegaRAID controller; it is attached only to the SATA ports on the motherboard.

    thanks!



  • 6.  RE: Latency on datastore

    Posted Feb 26, 2024 12:32 AM

    Can you please confirm the controller driver version?

    https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=35201&vcl=true

    ESXi 8.0 U1: lsi_mr3 version 7.724.03.00-1vmw

     

    You can run the commands below:

    # Driver module and version for each storage adapter
    for a in $(esxcfg-scsidevs -a | awk '{print $2}'); do
      echo $a
      vmkload_mod -s $a | grep -i version
    done | awk '!a[$0]++'
    # Compatibility guide search link for each adapter, built from its PCI IDs
    for a in $(esxcfg-scsidevs -a | awk '{print $1}'); do
      vmkchdev -l | grep $a | awk -F" |:" '{print "http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID="$4"&DID="$5"&SVID="$6"&SSID="$7"&details=1"}'
    done | awk '!a[$0]++'
    # ESXi version and build
    vmware -vl

    PS: Just checking, is the Dell server compatible with running ESXi 8.x?

    "https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=35201&vcl=true"

    == 

    According to the screenshot, the storage device is maxed out at 100% usage and the host is queuing I/O. That is why latency is very high on DAVG. Because the host is re-sending commands, it adds latency of its own, which shows up in KAVG, and the total shows up in GAVG.
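
    To make the relationship between the three counters concrete, here is a minimal sketch that adds device and kernel latency and reports which side dominates. The helper name is hypothetical, the 25 ms and 2 ms cut-offs are common rule-of-thumb values rather than hard limits, and the 700/20 input is an invented example, not a reading from the screenshot.

```shell
#!/bin/sh
# classify_latency DAVG KAVG   (both in ms, as read from esxtop)
# Hypothetical helper; the thresholds are rule-of-thumb assumptions.
classify_latency() {
    awk -v davg="$1" -v kavg="$2" 'BEGIN {
        gavg = davg + kavg                 # guest latency = device + kernel
        if (davg > 25 && kavg > 2) verdict = "device+kernel"
        else if (davg > 25)        verdict = "device"
        else if (kavg > 2)         verdict = "kernel"
        else                       verdict = "ok"
        printf "GAVG=%.1f verdict=%s\n", gavg, verdict
    }'
}

classify_latency 700 20    # invented values for illustration
```

    A "device" verdict points at the controller, array, or disks, while a "kernel" verdict points at queuing inside the host.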

    -- 

    Regarding the host task list, please disable the FCD disk query on your vCenter if you are not using VMware Tanzu or Kubernetes.

    If you are not planning to use Kubernetes/Tanzu in the near future, you can disable Catalog Sync (and the log messages it generates).
    - To do this, please make a copy of the file /usr/lib/vmware-vpx/sps/conf/vslm.properties
    - Then edit the original and add the following line at the end:
     
    vslm.disablePeriodicSync = Y
     
    - Save the file, then restart the vmware-sps service with:
    # vmon-cli -r sps
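
    The steps above could be scripted roughly as follows. This is a sketch, not a tested procedure: the function name is made up, it assumes a POSIX shell on the vCenter appliance, and it keeps the backup as a ".bak" file next to the original.

```shell
#!/bin/sh
# disable_periodic_sync FILE
# Keeps a one-time backup of FILE, then appends the setting if it is missing.
disable_periodic_sync() {
    conf="$1"
    [ -f "$conf" ] || { echo "not found: $conf" >&2; return 1; }
    # Do not overwrite an existing backup on a second run.
    if [ ! -e "$conf.bak" ]; then
        cp -p "$conf" "$conf.bak" || return 1
    fi
    if ! grep -q '^vslm\.disablePeriodicSync' "$conf"; then
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$conf")" ]; then
            echo >> "$conf"
        fi
        echo 'vslm.disablePeriodicSync = Y' >> "$conf"
    fi
}

# On the vCenter appliance this would be used as (not executed here):
#   disable_periodic_sync /usr/lib/vmware-vpx/sps/conf/vslm.properties
#   vmon-cli -r sps
```

    Running it twice leaves a single copy of the line and keeps the first backup untouched.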
     
     

     

     



  • 7.  RE: Latency on datastore

    Posted Mar 02, 2024 09:07 PM

    Hi, sorry for the late reply.

    I've run some tests, including changing the controller and doing a fresh VMware installation (version 7).

    At the moment, the command you sent me gives this output:

    vmw_ahci
    Version: 2.0.9-1vmw.702.0.0.17867351
    lsi_msgpt3
    Version: 17.00.10.00-2vmw.702.0.0.17867351
    lsi_mr3
    Version: 7.716.03.00-1vmw.702.0.0.17867351
    http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=8086&DID=8d02&SVID=1028&SSID=0619&details=1
    http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=1000&DID=0097&SVID=1028&SSID=0619&details=1
    http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=1000&DID=005f&SVID=1000&SSID=9341&details=1
    VMware ESXi 7.0.2 build-18538813
    VMware ESXi 7.0 Update 2
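
    To pull one driver's version out of output like the above for comparison with the compatibility guide, something like the following could work. The function name is made up, and it assumes the output keeps the "module name, then Version: line" layout shown here. Note that the 7.724.03.00-1vmw figure quoted earlier in the thread was for ESXi 8.0 U1, while this host now runs 7.0 U2, so the two numbers are not directly comparable.

```shell
#!/bin/sh
# driver_version MODULE   (reads the command output on stdin)
# Prints the version from the "Version:" line that follows MODULE.
driver_version() {
    awk -v mod="$1" '
        prev == mod && $1 == "Version:" { print $2; exit }
        { prev = $1 }
    '
}

driver_version lsi_mr3 <<'EOF'
vmw_ahci
Version: 2.0.9-1vmw.702.0.0.17867351
lsi_msgpt3
Version: 17.00.10.00-2vmw.702.0.0.17867351
lsi_mr3
Version: 7.716.03.00-1vmw.702.0.0.17867351
EOF
```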

    Changing the controller had no effect, and neither did changing the VMware version. I'm currently working on a Docker Swarm test infrastructure, and it is impossible to carry out any kind of task.

    The PCI slot is PCI Express 3.0 x16.

    I don't know where to start looking for solutions or similar problems. Look at that fantastic screenshot.

    thanks!



  • 8.  RE: Latency on datastore

    Posted Feb 25, 2024 02:25 AM

    To make sure that I can reproduce the issue, I've attached a screenshot taken just now, during a virtual machine deploy from a template.