I have suspected storage latency issues in our VMware environment for a while and have been working on ways to fix the problem. The core issue is that ESXi hosts disconnect from the vCenter Server and get greyed out. When this happens, we can't do much with the host or the VMs on it except shut them down and start them on a different ESXi host.
As a result, I've been monitoring things with esxtop while I work on remediation and try to understand the bottlenecks. Previously, DAVG figures were spiking above 25 ms. Apparently anything above 25 ms is bad and indicates storage latency.
However, today when I was monitoring one of the hosts, I came across some crazy figures for DAVG, and to me they look bad.
Now, I understand that I can't just look at DAVG and need to look at the bigger picture.
Can someone please look at the attachment and let me know how bad this is? Are spikes like this in DAVG normal, or should it ideally not get this high even for a spike?
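To make the "spike vs. sustained" question a bit more concrete, here is a rough Python sketch of how DAVG samples could be summarized once they have been collected (for example from esxtop batch mode). The function name `summarize_davg`, the `sustained_run` parameter, and the sample values are all made up for illustration, and the 25 ms threshold is just the rule of thumb mentioned above, not an official limit.

```python
from statistics import median

# Rule-of-thumb threshold from this post (assumption, not an official limit).
DAVG_THRESHOLD_MS = 25.0


def summarize_davg(samples_ms, threshold_ms=DAVG_THRESHOLD_MS, sustained_run=3):
    """Summarize DAVG samples and separate isolated spikes from sustained latency.

    sustained_run is a hypothetical cut-off: this many consecutive samples
    above the threshold is treated as sustained latency rather than a spike.
    """
    over = [s > threshold_ms for s in samples_ms]

    # Find the longest run of consecutive samples above the threshold.
    longest = current = 0
    for flag in over:
        current = current + 1 if flag else 0
        longest = max(longest, current)

    return {
        "max_ms": max(samples_ms),
        "median_ms": median(samples_ms),
        "samples_over": sum(over),
        "longest_run_over": longest,
        "sustained": longest >= sustained_run,
    }


# Made-up values purely for illustration, not real measurements.
spiky = [2.1, 3.0, 180.0, 2.5, 2.8, 3.1]
sustained = [30.0, 42.0, 55.0, 61.0, 28.0, 3.0]

print(summarize_davg(spiky))
print(summarize_davg(sustained))
```

The idea is that a single very high sample with a low median looks quite different from several consecutive samples above the threshold, even though both show a large maximum.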
More about the environment below.
VMware ESXi, 7.0.3, 20842708
vCenter 7.0.3 21477706
Dell PowerEdge hardware
Dell Storage Center SCSI storage array
iSCSI is used to connect to the storage array via the software iSCSI adapter
Backups run for most of the day, so there is no definite window when backups are or aren't running
We also have Zerto, which is used for replication
Finally, we also have Live Volumes at the storage array layer
We're looking at things like the queue depth on the iSCSI vmk port and making sure Round Robin is used for path selection. I should mention that all firmware and drivers were updated recently; this has been double-checked, including compatibility.
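For the Round Robin check, one way to audit every device at once is to save the output of `esxcli storage nmp device list` and scan it for the path selection policy. Below is a minimal Python sketch of that idea. The helper names (`parse_nmp_devices`, `non_round_robin`) are hypothetical, the device IDs and names in `SAMPLE` are invented, and the parser assumes the usual layout of an unindented device ID followed by indented "Key: Value" lines, so it may need adjusting for real output.

```python
def parse_nmp_devices(text):
    """Map each device ID to its 'Key: Value' fields.

    Assumes esxcli-style layout: an unindented device ID line followed
    by indented 'Key: Value' lines belonging to that device.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line starts a new device section.
            current = line.strip()
            devices[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only, since values may contain colons.
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def non_round_robin(devices):
    """Return the device IDs whose path selection policy is not VMW_PSP_RR."""
    return sorted(
        dev for dev, fields in devices.items()
        if fields.get("Path Selection Policy") != "VMW_PSP_RR"
    )


# Invented sample text for illustration only, not output from a real host.
SAMPLE = """\
naa.exampledevice0001
   Device Display Name: Example Disk A
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_RR
   Working Paths: vmhba64:C0:T0:L1, vmhba64:C1:T0:L1
naa.exampledevice0002
   Device Display Name: Example Disk B
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_MRU
   Working Paths: vmhba64:C0:T1:L2
"""

devices = parse_nmp_devices(SAMPLE)
print(non_round_robin(devices))
```

Any device listed by `non_round_robin` would be one to look at more closely before ruling out multipathing as a factor.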