VMware vSphere


ESX connectivity issues, could Anti-virus updates be the culprit?

  • 1.  ESX connectivity issues, could Anti-virus updates be the culprit?

    Posted Jan 09, 2014 12:28 PM

    Hi,

    I have two ESXi hosts connected via iSCSI to an HP MSA2312 with an extra enclosure.

    I have 21 VMs running across five 1 TB datastores, all of which have free space available.

    I have now had the same issue occur twice, and I am trying to get to the bottom of it.

    Back in October (on a Saturday evening), I noticed that my Veeam backups had started failing. When I logged on to vCenter, I saw that many of my VMs had become almost unresponsive and hideously slow.

    Looking in the host events, I noticed these errors:

    Successfully restored access to volume 51a5c107-ef6df18e-c975-68b599cd612c (Rep2) following connectivity issues. | info | 22/12/2013 08:55:09 | 10.0.0.211
    Successfully restored access to volume 4d2b3c89-54366d62-04f2-68b599cd612c (Data1) following connectivity issues. | info | 22/12/2013 08:55:09 | 10.0.0.211
    Successfully restored access to volume 4d2b3cdc-69c88ccc-911f-68b599cd612c (General1) following connectivity issues. | info | 22/12/2013 08:55:09 | 10.0.0.211
    Lost access to volume 4d2b3cdc-69c88ccc-911f-68b599cd612c (General1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. | info | 22/12/2013 08:55:09 | 10.0.0.211
    Lost access to volume 51a5c107-ef6df18e-c975-68b599cd612c (Rep2) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. | info | 22/12/2013 08:55:09 | 10.0.0.211
    Lost access to volume 4d2b3c89-54366d62-04f2-68b599cd612c (Data1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. | info | 22/12/2013 08:55:09 | 10.0.0.211
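    When an event list like the one above gets long, it can help to pair each "Lost access" event with the matching "Successfully restored" event per datastore to see how long each outage lasted. The sketch below is a minimal illustration, assuming the events have been copied out as (message, timestamp) pairs and that the list is shown newest first, as in the excerpt above; the function name pair_outages is hypothetical and not part of any VMware tool.

```python
import re
from datetime import datetime

# Events copied from the host's event list above (assumed to be listed newest first).
EVENTS = [
    ("Successfully restored access to volume 51a5c107-ef6df18e-c975-68b599cd612c (Rep2) following connectivity issues.", "22/12/2013 08:55:09"),
    ("Successfully restored access to volume 4d2b3c89-54366d62-04f2-68b599cd612c (Data1) following connectivity issues.", "22/12/2013 08:55:09"),
    ("Successfully restored access to volume 4d2b3cdc-69c88ccc-911f-68b599cd612c (General1) following connectivity issues.", "22/12/2013 08:55:09"),
    ("Lost access to volume 4d2b3cdc-69c88ccc-911f-68b599cd612c (General1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.", "22/12/2013 08:55:09"),
    ("Lost access to volume 51a5c107-ef6df18e-c975-68b599cd612c (Rep2) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.", "22/12/2013 08:55:09"),
    ("Lost access to volume 4d2b3c89-54366d62-04f2-68b599cd612c (Data1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.", "22/12/2013 08:55:09"),
]

# Captures the event kind, the volume UUID and the datastore name in parentheses.
PATTERN = re.compile(r"^(Lost|Successfully restored) access to volume (\S+) \(([^)]+)\)")


def pair_outages(events):
    """Pair each 'Lost access' event with the next 'restored' event for the same datastore.

    Returns a list of (datastore_name, lost_at, restored_at, seconds) tuples.
    """
    open_outages = {}  # datastore name -> time access was first lost
    outages = []
    for message, stamp in reversed(events):  # walk oldest first
        match = PATTERN.match(message)
        if not match:
            continue
        kind, _uuid, name = match.groups()
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        if kind == "Lost":
            # Keep the earliest loss if several arrive before a restore.
            open_outages.setdefault(name, when)
        elif name in open_outages:
            lost_at = open_outages.pop(name)
            outages.append((name, lost_at, when, (when - lost_at).total_seconds()))
    return outages


for name, lost_at, restored_at, seconds in pair_outages(EVENTS):
    print(f"{name}: lost {lost_at}, restored {restored_at} ({seconds:.0f} s)")
```

    With these six events, each datastore pairs up with a duration of 0 seconds, because every loss and restore in this excerpt carries the same timestamp; the events log only has one-second resolution.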

    My other host seemed OK, so to bring the unresponsive VMs back to life I had to shut them down (this took over an hour for some of them) and then vMotion them to the unaffected host.

    After I rebooted the affected host, all seemed well again.

    Just before Christmas a few weeks back (again on a Saturday night), the same thing happened, but this time on the other host. I got the same event errors, and this time I couldn't actually shut some of the VMs down; they just weren't responding.

    A reboot of the affected host again sorted the problem.

    The ONLY thing I have noticed that occurred around this time was a McAfee update on all machines running the software: a pop-up said that McAfee had updated its software and the machine needed a reboot.

    Is it likely that the antivirus update running on all machines at the same time could cause an issue like this?

    Thanks.



  • 2.  RE: ESX connectivity issues, could Anti-virus updates be the culprit?

    Posted Jan 09, 2014 06:39 PM

    I just recovered from these same events in a client's vSphere 5.5 environment. Every second, for hours on end, those "lost access to volume" events would show up on all hosts. The datastores themselves never went offline, the VMs never showed as inaccessible, and no events related to dropped connections appeared in the NetApp or UCS logs. The VMs also experienced the issues you described. The first thing I noticed was that the snapshots Veeam created were not getting deleted. Some had grown fairly large (one was 31 GB and over a month old); others were smaller and newer, but still several GB. Manually deleting these snapshots restored those VMs for a while, but many Veeam jobs still weren't deleting their snapshots.
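    A flood of "lost access to volume" events like the one described above is easy to confirm once the events have been exported: bucket them per host per minute and flag any bucket above a threshold. The sketch below assumes the events are already parsed into (timestamp, host) pairs, one per "Lost access" event; the function name flag_event_floods, the host names esx01/esx02, the threshold of 30 per minute and the input data are all hypothetical, made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def flag_event_floods(events, threshold=30):
    """Count 'lost access' events per (host, minute); return buckets at or above threshold.

    events: iterable of (datetime, host) tuples, one per 'Lost access to volume' event.
    The threshold of 30 per minute is an arbitrary, hypothetical cut-off.
    """
    per_minute = Counter(
        (host, when.replace(second=0, microsecond=0)) for when, host in events
    )
    return {bucket: count for bucket, count in per_minute.items() if count >= threshold}


# Synthetic input (hypothetical): one event per second for three minutes on one host,
# plus a single isolated event on a second host.
start = datetime(2014, 1, 1, 9, 0, 0)
synthetic = [(start + timedelta(seconds=s), "esx01") for s in range(180)]
synthetic.append((start, "esx02"))

for (host, minute), count in sorted(flag_event_floods(synthetic).items()):
    print(f"{host} {minute:%Y-%m-%d %H:%M}: {count} lost-access events")
```

    Here the three minutes on esx01 are each flagged with 60 events, while the single event on esx02 stays below the threshold. Bucketing per minute rather than per second keeps the output short when the flood lasts for hours.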

    Long story short, I learned that ESXi 5.5 was not supported with the version of code running on the NetApp (8.1.2). I downgraded to ESXi 5.1 and all of the above problems went away immediately: Veeam snapshots were deleted after backup jobs just as they were supposed to be, VM performance was better than ever (they'd been running this way for several months), and those datastore errors never returned.

    Check for incompatibilities. You might be surprised.

    Cheers,

    Mike

    http://VirtuallyMikeBrown.com

    https://twitter.com/VirtuallyMikeB

    http://LinkedIn.com/in/michaelbbrown