VMware vSphere

 View Only
Expand all | Collapse all

ESXI 6.7 very slow disk performance (extrem high latency)

  • 1.  ESXI 6.7 very slow disk performance (extrem high latency)

    Posted May 02, 2021 07:15 PM

    Since the last update, the hard drive is extremely slow (6.7.0 Update 3 (Build 17700523)).
    At the beginning the latency is still ok, then the values go extremely high.

    ozzproject_0-1619982840613.png

     

    See screenshot (light blue line!). Yesterday I had values up to 354000 ms.
    I have already tried to deactivate the wmw-ahci driver, because it has already caused such problems.
    Unfortunately, then my hard drive is no longer recognized.
    Therefore I searched for alternative ahci drivers, unfortunately without success.
    So I downgraded the AHCI driver to the previous version (VMW_bootbank_vmw-ahci_2.0.5-2vmw.670.3.132.17167734).
    Still no better.

    Any Suggestions?



  • 2.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Broadcom Employee
    Posted May 04, 2021 10:01 AM

    very hard to say what the cause is I would advise looking at the vmkernel logs  to see if you are seeing errors

    also you can also look at esxtop and focus on storage performance metrics to understand any device latency or or kernel latency on the ESXi 

     

     



  • 3.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Posted May 04, 2021 12:17 PM

    Thanks for your help, paudieo.

    I checked the log and found this error almost every 20 minutes. What does it mean?

    2021-05-04T11:04:43.067Z cpu0:2097183)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a40dee600, 2098995) to dev "mpx.vmhba33:C0:T0:L0" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
    2021-05-04T11:04:43.067Z cpu0:2097183)ScsiDeviceIO: 3483: Cmd(0x459a40dee600) 0x85, CmdSN 0x1416 from world 2098995 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T11:04:43.067Z cpu2:2097179)ScsiDeviceIO: 3483: Cmd(0x459a40dee600) 0x85, CmdSN 0x1417 from world 2098995 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T11:04:43.123Z cpu0:2097181)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a40dee600, 2098995) to dev "mpx.vmhba33:C0:T0:L1" on path "vmhba33:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
    2021-05-04T11:04:43.123Z cpu0:2097181)ScsiDeviceIO: 3483: Cmd(0x459a40dee600) 0x85, CmdSN 0x1418 from world 2098995 to dev "mpx.vmhba33:C0:T0:L1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T11:04:43.124Z cpu0:2097181)ScsiDeviceIO: 3483: Cmd(0x459a40dee600) 0x85, CmdSN 0x1419 from world 2098995 to dev "mpx.vmhba33:C0:T0:L1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T11:04:43.128Z cpu0:2097183)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a40dee600, 2098995) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
    2021-05-04T11:04:43.128Z cpu0:2097183)ScsiDeviceIO: 3483: Cmd(0x459a40dee600) 0x85, CmdSN 0x141a from world 2098995 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T11:04:43.128Z cpu0:2097183)ScsiDeviceIO: 3483: Cmd(0x459a40dee600) 0x85, CmdSN 0x141b from world 2098995 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T11:05:28.030Z cpu7:2097640)NMP: nmp_ResetDeviceLogThrottling:3586: last error status from device mpx.vmhba33:C0:T0:L1 repeated 1 times
    2021-05-04T11:05:28.030Z cpu7:2097640)NMP: nmp_ResetDeviceLogThrottling:3586: last error status from device mpx.vmhba33:C0:T0:L0 repeated 1 times
    2021-05-04T11:05:28.030Z cpu7:2097640)NMP: nmp_ResetDeviceLogThrottling:3586: last error status from device mpx.vmhba32:C0:T0:L0 repeated 1 times
    2021-05-04T11:07:28.028Z cpu7:2097695)DVFilter: 6068: Checking disconnected filters for timeouts
    2021-05-04T11:17:28.017Z cpu5:2097695)DVFilter: 6068: Checking disconnected filters for timeouts
    2021-05-04T11:27:28.006Z cpu5:2097695)DVFilter: 6068: Checking disconnected filters for timeouts

     

    I also found another error regarding the other hard disk in the /var/log/vmkwarning.log file.

    The error occurs very often and seems to occur more and more frequently and also much longer.

    2021-05-01T17:07:56.340Z cpu2:2097181)WARNING: ScsiDeviceIO: 1564: Device t10.ATA_____KingFast________________________________AA000000000000000612 performance has deteriorated. I/O latency increased from average value of 797 microseconds
    2021-05-01T17:07:56.340Z cpu2:2097181)WARNING: to 16610 microseconds.



  • 4.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Posted May 04, 2021 12:41 PM


    Thank you for your help. I have found some entries that occur often lately and also longer. What exactly does it mean?

    2021-05-04T12:34:45.079Z cpu7:2097181)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a40bd4c80, 2098995) to dev "mpx.vmhba33:C0:T0:L0" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
    2021-05-04T12:34:45.079Z cpu7:2097181)ScsiDeviceIO: 3483: Cmd(0x459a40bd4c80) 0x85, CmdSN 0x144c from world 2098995 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T12:34:45.079Z cpu7:2097181)ScsiDeviceIO: 3483: Cmd(0x459a40bd4c80) 0x85, CmdSN 0x144d from world 2098995 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T12:34:45.092Z cpu7:2097181)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a40bd4c80, 2098995) to dev "mpx.vmhba33:C0:T0:L1" on path "vmhba33:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
    2021-05-04T12:34:45.092Z cpu7:2097181)ScsiDeviceIO: 3483: Cmd(0x459a40bd4c80) 0x85, CmdSN 0x144e from world 2098995 to dev "mpx.vmhba33:C0:T0:L1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T12:34:45.092Z cpu7:2097181)ScsiDeviceIO: 3483: Cmd(0x459a40bd4c80) 0x85, CmdSN 0x144f from world 2098995 to dev "mpx.vmhba33:C0:T0:L1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T12:34:45.096Z cpu7:2097181)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a40bd4c80, 2098995) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
    2021-05-04T12:34:45.096Z cpu7:2097181)ScsiDeviceIO: 3483: Cmd(0x459a40bd4c80) 0x85, CmdSN 0x1450 from world 2098995 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T12:34:45.096Z cpu7:2097181)ScsiDeviceIO: 3483: Cmd(0x459a40bd4c80) 0x85, CmdSN 0x1451 from world 2098995 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
    2021-05-04T12:35:27.933Z cpu7:2097640)NMP: nmp_ResetDeviceLogThrottling:3586: last error status from device mpx.vmhba33:C0:T0:L1 repeated 1 times
    2021-05-04T12:35:27.933Z cpu7:2097640)NMP: nmp_ResetDeviceLogThrottling:3586: last error status from device mpx.vmhba33:C0:T0:L0 repeated 1 times
    2021-05-04T12:35:27.933Z cpu7:2097640)NMP: nmp_ResetDeviceLogThrottling:3586: last error status from device mpx.vmhba32:C0:T0:L0 repeated 1 times


    I also found another error regarding the other hard disk in the /var/log/vmkwarning.log file.
    The error occurs very often and seems to occur more and more frequently and also much longer.


    2021-01-29T17:26:48.021Z cpu0:2097738)WARNING: ScsiDeviceIO: 1564: Device t10.ATA_____ST2000LM0152D2E8174__________________________________WDZ1BAXV performance has deteriorated. I/O latency increased from average value of 14400
    2021-01-29T17:26:48.021Z cpu0:2097738)WARNING: microseconds to 433749 microseconds.



  • 5.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Broadcom Employee
    Posted May 04, 2021 01:53 PM

    2021-05-04T12:34:45.096Z cpu7:2097181)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a40bd4c80, 2098995) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE

    means ESXi is sending a SCSI cmd 0x85 (which means its checking for if its capabile of supporting ATA PASS-THROUGH cmds)  to the device mpx.vmhba32:C0:T0:L0 , (this could be a USB / CDROM or other such virtual device)

    The device, did not like the command and in return responded back with  check condition D0x2, saying its an illegal command  sense code 0x5

    In summary ESXi is probing the device and the device says I can't support that command

    nothing to worry about this is common log entry on esxi

    see KB https://kb.vmware.com/s/article/289902 about interpreting more SCSI codes

     

    However this is not good

    2021-01-29T17:26:48.021Z cpu0:2097738)WARNING: ScsiDeviceIO: 1564: Device t10.ATA_____ST2000LM0152D2E8174__________________________________WDZ1BAXV performance has deteriorated. I/O latency increased from average value of 14400
    2021-01-29T17:26:48.021Z cpu0:2097738)WARNING: microseconds to 433749 microseconds.

     

    This means the reported device latency from this device went from 14ms to a whopping 433milliseconds 

    again this is just confirming what we know

    What kind of device is this t10.ATA_____ST2000LM0152D2E8174__________________________________WDZ1BAXV ?

     

     

     

     



  • 6.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Posted May 04, 2021 02:12 PM

    The disk ST2000LM015-2E81 is an 2000GB Seagate BarraCuda ST2000LM015 128MB 2.5" (6.4cm) SATA 6Gb/s. This run quite good in the past.



  • 7.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Broadcom Employee
    Posted May 04, 2021 03:14 PM

    I see magnetic disk then

    ok so there should be a SCSI controller managing the device like vmhba0 or vmhba1, to confirm if what type of scsi controller you're using

    You mentioned earlier you changed the ahci driver. 

    you should take a look at esxtop as well to find out what workload you are generating and the latency you're getting

    https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-2F960309-58FA-4828-941C-79584B153B0F.html

     



  • 8.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Posted May 04, 2021 03:55 PM

    I can not find a scsi controler. There only this 3 ones.

    vmhba0
    * is listed as Sunrise Point-LP AHCI Controller --> driver vmw_ahci

    vmhba32 and vmhba33
    * listed as USB Storage Controller --> driver vmkusb


    vmhba0 is listed as ahci. During the last update of ESXi 6.7 the ahci driver was changed. The symptoms were the same as with the update within ESXXi 6.5, where the driver also had a problem. That was the reason to replace the driver with the previous version.
    without success.
    Currently I use the current ESXi 6.7 version with the included vib.


    The machine on this hard disk unfortunately crashed and I have to rebuild it. But this was not possible so far because of the latency. It is a mysql server with 20 database.
    The current esxtop are not meaningful.

     

    DEVICE PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD LOAD CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
    mpx.vmhba32:C0:T0:L0 - 1 - 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    mpx.vmhba33:C0:T0:L0 - 1 - 0 0 0 0.00 9.14 5.83 3.30 0.52 0.07 4.77 0.41 5.18 0.39
    mpx.vmhba33:C0:T0:L1 - 1 - 0 0 0 0.00 101.66 0.00 101.66 0.00 1.00 0.14 0.09 0.23 0.08
    t10.ATA_____KingFast________________ - 31 - 0 0 0 0.00 0.39 0.19 0.19 0.00 0.00 0.03 0.02 0.05 0.00
    t10.ATA_____ST2000LM0152D2E8174_____ - 31 - 5 0 16 0.16 45.10 13.22 31.88 0.12 4.10 183.70 0.01 186.71 0.00

     



  • 9.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Broadcom Employee
    Posted May 05, 2021 07:01 PM

    Hi

    Looking at the HCL 

    doing a search for Sunrise Point-LP

    https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=42130&deviceCategory=io&details=1&keyword=Sunrise%20Point-LP&page=2&display_interval=10&sortColumn=Partner&sortOrder=Asc

    Reading the footnotes for 6.7 U3 there was a patch see https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-202103001.htm

    see if it's applicable

    from the esxtop output there does not seem to be much IO activity to really judge, so its not really useful

    DEVICE PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD LOAD CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
    t10.ATA_____ST2000LM0152D2E8174_____ - 31 - 5 0 16 0.16 45.10 13.22 31.88 0.12 4.10 183.70 0.01 186.71 0.00

    however from what I see there seems to be minimal latency introduced by the ESXi itself (KAVG/cmd) while DAVG/cmd shows quite high latency (186.70)

     

    I suggest is to test the performance on 6.5  on  gather the performance data (see KB https://kb.vmware.com/s/article/653
    Then repeat on 6.7 U3 and gather more perf data to compare between the two versions (with the patch) 
    If you have a support contract , you could engage Global Services to see if they are aware of any issues



  • 10.  RE: ESXI 6.7 very slow disk performance (extrem high latency)

    Posted May 06, 2021 02:29 PM

    Hi paudieo,
    thank you very much for the help.
    I have already tried the patch...
    Unfortunately also without success.

    I wasted too much time in the search in the meantime.
    At the beginning I had Windows in suspicion, because here some updates make problems, but the Linux machine had no abnormalities. Later, my second Esxi has become so slow that I have taken it out of service.
    The other machine still runs very slow with the described problems. This is the second time I have had trouble with the drivers on Esxi. It was the same with version 6.5 U2 and now again.
    I am more than disappointed.
    Therefore, I have switched to ProXmox and move all the machines.