We are troubleshooting an issue where our VMs are hanging for 5-10 seconds every time our datastores disconnect and reconnect.
Load on the server is minimal, with only light I/O. The disconnections happen every 23 minutes regardless of how much I/O there is.
There are three datastores. If all guest VMs are powered off except for one VM whose VMDK files all sit on ONE datastore, the controller still shows large I/O spikes, although it may not fully reset all datastores. During business hours, or at night during backups, when there are constant read/write operations, the datastores always reset and come back. One of our databases is throwing application errors as a result.
Dell PowerEdge R540 (no cluster)
- PERC H730P Adapter (Embedded)
- Firmware: 25.5.6.0009 (Latest)
- 4 x 2TB 7.2k RAID 6
- 2 x 200GB SSD RAID 1
- 2 x 600GB 15k RAID 1
- All volumes have Read-Ahead and Write-Back enabled
ESXi 6.7.0 Update 3 Build-14320388 (A00)
- No snapshots in place
- Driver version 7.708.07.00-3vmw (original driver)
- Driver version 7.710.07.00-1OEM.670.0.0.8169922 (*)
- VMFS3.UseATSForHBOnVMFS5 is at its default (1). We tried setting it to 0, with no improvement.
* This driver is supported only on Dell PowerEdge Servers R6525, C6525, R6515 and R7515
* After we contacted Dell, they recommended installing the above driver as a test, because its changelog (entries below) appeared to address our symptoms. It made no improvement, however.
* SCGCQ02033302 Resolves issue in which non-RAID drives may not be listed during OS installation or in vSphere.
* SCGCQ02189085 Fixed an issue that could cause an IO timeout and controller reset under certain workloads.
As mentioned above, this happens EVERY 23 minutes.
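To back up the 23-minute figure with log data, something like the following could be run against a copy of vmkernel.log. This is only a rough sketch: `reset_intervals` is a name made up for this post, the match string "controller reset" is a placeholder for whatever message the driver actually logs, and the sample lines are invented for illustration, not taken from our host. It assumes each log line starts with an ISO 8601 UTC timestamp.

```python
from datetime import datetime, timedelta


def reset_intervals(lines, keyword):
    """Return the gaps between consecutive log lines that contain `keyword`."""
    stamps = []
    for line in lines:
        if keyword not in line:
            continue
        # Assumption: the line begins with a timestamp like 2020-01-15T10:23:05.120Z
        raw = line.split(" ", 1)[0].rstrip("Z")
        try:
            stamps.append(datetime.fromisoformat(raw))
        except ValueError:
            continue  # skip lines without a parseable leading timestamp
    stamps.sort()
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Invented sample lines; the message text is a placeholder, not real driver output.
sample = [
    "2020-01-15T10:00:05.120Z cpu4:2097619)PLACEHOLDER: controller reset",
    "2020-01-15T10:10:00.000Z cpu2:2097700)unrelated message",
    "2020-01-15T10:23:05.120Z cpu4:2097619)PLACEHOLDER: controller reset",
    "2020-01-15T10:46:06.120Z cpu4:2097619)PLACEHOLDER: controller reset",
]

gaps = reset_intervals(sample, "controller reset")
for gap in gaps:
    print(f"{gap.total_seconds() / 60:.1f} minutes since previous reset")
```

If the gaps come out at a near-constant 23 minutes regardless of workload, that would point toward something timer-driven rather than load-driven.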
If anyone is able to offer assistance I would be forever grateful.