ESXi

  • 1.  Repeated Datastore Disconnection Every 23 minutes

    Posted Nov 13, 2019 10:32 PM

    We are troubleshooting an issue where our VMs are hanging for 5-10 seconds every time our datastores disconnect and reconnect.

    The load on the server is minimal, with only light I/O. The disconnections happen every 23 minutes regardless of how much I/O there is.

    There are three datastores. If all guest VMs are powered off except for one VM that has VMDK files on only ONE datastore, the controller still shows large I/O spikes, although it may not fully reset all datastores. During business hours, or at night during backups when there are constant read/write operations, the datastores always reset and come back. As a result, we are receiving application errors in one of our databases.

    Dell PowerEdge R540 (no cluster)

    • PERC H730P Adapter (Embedded)
    • Firmware: 25.5.6.0009 (Latest)
    • 4 x 2TB 7.2k RAID 6
    • 2 x 200GB SSD RAID 1
    • 2 x 600GB 15k RAID 1
    • All volumes have Read-Ahead and Write-Back enabled

    ESXi 6.7.0 Update 3 Build-14320388 (A00)

    • No snapshots in place
    • Driver version 7.708.07.00-3vmw (original driver)
    • Driver version 7.710.07.00-1OEM.670.0.0.8169922 (*)
    • VMFS3.UseATSForHBOnVMFS5 is set to default (1). We tried setting value to (0) with no improvement.

    * This driver is supported only on Dell PowerEdge Servers R6525, C6525, R6515 and R7515

    * After contacting Dell, they recommended installing the above driver as a test, since the changelog indicated it addressed my symptoms. However, there was no improvement.

    * SCGCQ02033302 Resolves issue in which non-RAID drives may not be listed during OS installation or in vSphere.

    * SCGCQ02189085 Fixed an issue that could cause an IO timeout and controller reset under certain workloads.

    As mentioned above, this happens EVERY 23 minutes.

    If anyone is able to offer assistance I would be forever grateful.



  • 2.  RE: Repeated Datastore Disconnection Every 23 minutes

    Posted Nov 14, 2019 01:51 PM

    As a test, I would disable write-back cache and see if it makes any difference. Your overall latency may be higher, but maybe there's something in that controller's microcode that is generating this.



  • 3.  RE: Repeated Datastore Disconnection Every 23 minutes

    Posted Nov 16, 2019 12:31 AM

    Thanks for the reply. We didn't try changing the writeback policy.

    Changing firmware and driver versions in the 2019 time frame didn't seem to make any difference.

    Astoundingly, the solution was to downgrade the firmware of the iDRAC on our server to a version from December 2018. Apparently, the iDRAC controller polls the storage controller on a schedule and that was causing the datastore disconnections.

    Dell supposedly will have an updated iDRAC firmware out in December 2019 that should fix this issue.
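
For anyone trying to confirm the same pattern, a fixed interval between disconnects is the clue that points at a scheduled poller rather than at I/O load. Below is a minimal sketch of how one might check for it, assuming you have already pulled the disconnect timestamps out of your logs. The function names and the sample timestamps are hypothetical and made up for illustration; they are not taken from the OP's logs.

```python
from datetime import datetime
from statistics import median


def intervals_minutes(timestamps):
    """Return the gaps, in minutes, between consecutive event timestamps."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


def looks_periodic(timestamps, tolerance_minutes=1.0):
    """Return (is_periodic, period) where period is the median gap in minutes.

    Events count as periodic when every gap is within tolerance_minutes
    of the median gap. At least three events are needed to decide.
    """
    gaps = intervals_minutes(timestamps)
    if len(gaps) < 2:
        return False, None
    period = median(gaps)
    periodic = all(abs(g - period) <= tolerance_minutes for g in gaps)
    return periodic, period


# Hypothetical disconnect times (ISO 8601), invented for this example.
events = [
    "2019-11-13T09:00:05",
    "2019-11-13T09:23:04",
    "2019-11-13T09:46:06",
    "2019-11-13T10:09:05",
]

periodic, period = looks_periodic(events)
print(periodic, round(period))  # True 23
```

If the gaps stay constant while the workload changes, that suggests something running on a timer, which matches what the iDRAC polling turned out to be in this thread.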



  • 4.  RE: Repeated Datastore Disconnection Every 23 minutes

    Posted Apr 17, 2021 09:28 AM

    Hi Iansol,

    We have this same issue. Did you ever upgrade the iDRAC firmware, and if so, has the issue stayed away? Or are you still on the 2018 version?

    Thanks



  • 5.  RE: Repeated Datastore Disconnection Every 23 minutes

    Posted Apr 17, 2021 04:17 PM

    Hello.
    Your case is very unusual; the iDRAC is for management only.
    Who updated the firmware, you or Dell support?

    Integrated Remote Access Controller (iDRAC) Service Module is an optional lightweight software application that can be installed on Dell servers of the 12th generation or higher with iDRAC7. It complements the iDRAC interfaces: Graphical User Interface (GUI), Remote Access Controller Administration (RACADM), CLI and Web Services Management (WSMAN) with additional monitoring data.

     



  • 6.  RE: Repeated Datastore Disconnection Every 23 minutes

    Posted Apr 18, 2021 07:14 AM

    Hi,

    Our iDRAC firmware was the same version as the OP's. We had exactly the same issue, with the VMware logs showing datastore disconnects every 23 minutes. Yesterday we updated the firmware to the latest version and now there are no disconnects. The same applies to the Windows logs: before, we had delayed-write errors and virtual machines freezing, which have now gone.

    I'm thankful the OP mentioned 23 minutes in their post; otherwise I would never have found my way here.

    Thanks



  • 7.  RE: Repeated Datastore Disconnection Every 23 minutes

    Posted Apr 19, 2021 05:08 PM

    We still seem to have an issue, but it is a different error now. It is still causing the same problem.

    Device naa.Xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx performance has deteriorated. I/O latency increased from average value of 7354 microseconds to 235438 microseconds

    Device naa.Xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx performance has improved. I/O latency reduced from 46157 microseconds to 14558 microseconds

    This appears to have been occurring since OpenManage was installed, but the error has been different since the iDRAC update 2 days ago.

    Before, the dropout lasted 3-5 seconds and caused bigger issues, but we still had to reboot VMs today.
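
The latency warnings quoted above are easier to compare once the numbers are pulled out and converted to milliseconds. Here is a rough sketch that parses messages of that shape. The regular expression is based only on the two lines quoted in this post, so treat it as an assumption rather than a complete description of the log format; `parse_latency_event` is a hypothetical helper name.

```python
import re

# Pattern built from the two messages quoted in this thread; other
# variants of the message may exist and would not match.
LATENCY_RE = re.compile(
    r"Device (?P<device>\S+) performance has (?P<state>deteriorated|improved)\. "
    r"I/O latency (?:increased|reduced) from (?:average value of )?"
    r"(?P<before>\d+) microseconds to (?P<after>\d+) microseconds"
)


def parse_latency_event(line):
    """Return a dict describing one latency message, or None if no match."""
    m = LATENCY_RE.search(line)
    if m is None:
        return None
    before_us = int(m.group("before"))
    after_us = int(m.group("after"))
    return {
        "device": m.group("device"),
        "state": m.group("state"),
        "before_ms": before_us / 1000,  # microseconds to milliseconds
        "after_ms": after_us / 1000,
        "ratio": after_us / before_us,  # > 1 means latency went up
    }


lines = [
    "Device naa.Xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx performance has deteriorated. "
    "I/O latency increased from average value of 7354 microseconds to 235438 microseconds",
    "Device naa.Xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx performance has improved. "
    "I/O latency reduced from 46157 microseconds to 14558 microseconds",
]

events = [parse_latency_event(line) for line in lines]
for event in events:
    print(event["state"], event["before_ms"], "ms ->", event["after_ms"], "ms")
```

Read this way, the first message is a jump from roughly 7 ms to roughly 235 ms, about a 32x increase.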