VMware vSphere

  • 1.  Long delay between host not responding and moving VMs

    Posted Nov 06, 2009 06:22 PM

    We had an issue with one of our ESX 4 servers last night where it appeared to panic and became unresponsive. According to the cluster event log, it took exactly 30 minutes (to the second, in fact) from the time the host was marked as not responding to when HA kicked in and moved the VMs to another host in our cluster. What settings should I look into to find out why it took so long? And is there any significance to the round number of minutes? We have our VM monitoring sensitivity set a little low, but that should only account for 2-3 minutes.

    HA settings for the cluster are as follows: host monitoring enabled, VMs allowed to power on even if they violate availability constraints, high restart priority for all VMs, VM monitoring enabled, VM monitoring sensitivity set to low.

    Thanks for any ideas you might have.



  • 2.  RE: Long delay between host not responding and moving VMs

    Posted Nov 06, 2009 07:59 PM

    Just a thought: is it possible that the host was still responding to ping, so the HA nodes could still communicate for a period of time after the host became unresponsive in VC but before the kernel panic? I've seen this happen. The guests could be up and running just fine, so VM monitoring wouldn't necessarily reboot them, even though they aren't responding to ping. At the same time, the host is still on the network responding to the other HA nodes, so HA isn't going to do anything either.



  • 3.  RE: Long delay between host not responding and moving VMs

    Posted Nov 06, 2009 08:17 PM

    Unfortunately the issue happened off hours and by the time we had somebody start looking into it, the ESX server had rebooted and the VMs were back online on different hosts.

    If I read this right, you're implying that the host could start to exhibit problems, but so long as the HA agent is sending and receiving information, HA won't do anything? If that's the case, is there anything that can be done to prevent this type of situation, such as moving VMs when VC detects a host as disconnected? I can't seem to find good documentation on HA advanced options that might help.



  • 4.  RE: Long delay between host not responding and moving VMs

    Posted Nov 06, 2009 08:32 PM

    Here is a good link for HA advanced options: http://www.yellow-bricks.com/2008/10/06/update-ha-advanced-options-2/. Did you check the HA logs (/var/log/vmware/aam) for any errors?



  • 5.  RE: Long delay between host not responding and moving VMs

    Posted Nov 06, 2009 08:33 PM

    Remember that vCenter is only part of HA for the initial configuration. After that, HA is no longer dependent on vCenter. As stated, HA is heartbeat-based, and as long as the service console is online no HA event will be triggered. A host showing as not responding or disconnected in vCenter is not an indication of an HA event.

    You can check the Tasks & Events tab in vCenter, which may give you some information about an HA event, but I would check the ESX hosts first, since they will give you more information.

    HA agent logs: /var/log/vmware/aam

    Configuration files: /etc/opt/vmware/aam

    Also, here's a great blog, if you haven't seen it:

    http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
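    For anyone who wants a quick first pass over the HA agent logs mentioned above, here is a minimal sketch in Python 3. It assumes you have copied the contents of /var/log/vmware/aam to a machine with Python 3 installed. The keyword filter (error, fail, timeout) is only a guess at what is worth looking at, since the agent's log format isn't described in this thread, and the function name find_suspect_lines is made up for illustration.

```python
import os
import re

# Assumption: a case-insensitive keyword match is a reasonable first filter.
# The real HA agent log format is not described in this thread.
PATTERN = re.compile(r"error|fail|timeout", re.IGNORECASE)


def find_suspect_lines(log_dir, pattern=PATTERN):
    """Return (path, line_number, text) for every matching line under log_dir."""
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                # errors="replace" keeps the scan going on odd bytes in a log.
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if pattern.search(line):
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                # Skip files that cannot be opened (permissions, sockets, etc.).
                continue
    return hits


if __name__ == "__main__":
    # Point this at a copy of the log directory if not running on the host.
    for path, number, text in find_suspect_lines("/var/log/vmware/aam"):
        print(f"{path}:{number}: {text}")
```

    This only narrows down where to start reading; lining the hits up against the time the host was marked as not responding is still a manual job.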



  • 6.  RE: Long delay between host not responding and moving VMs

    Posted Nov 06, 2009 08:37 PM

    That's correct. HA has come a long way in the past couple of years, but unless I missed something (which is possible), if the host is still responding on the network, HA won't do anything even if the host has stopped responding in Virtual Center. In vSphere you might be able to use the built-in alarms to kick off an Orchestrator script to relocate the VMs when a host becomes unresponsive, but that might get a little tricky as well. It's more likely to cause additional problems, since a host might go unresponsive due to a heavy workload and then come back in a few seconds or minutes.
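    To make the "comes back in a few seconds or minutes" concern concrete, here is a rough sketch of the kind of grace-period check any alarm-driven script would need before relocating anything. It is plain Python to show the logic only, not Orchestrator code and not a VMware API. The function name should_act, the sample format, and the 300-second grace period are all made up for illustration.

```python
def should_act(samples, grace_seconds):
    """Decide whether a host has been down long enough to justify action.

    samples: list of (timestamp_seconds, responding) tuples, oldest first.
    Returns True only if the host has been continuously unresponsive for at
    least grace_seconds, measured up to the most recent sample.
    """
    if not samples:
        return False

    latest_time, latest_ok = samples[-1]
    if latest_ok:
        # The host is currently responding, so there is nothing to do.
        return False

    # Walk backwards to find where the current outage started.
    down_since = latest_time
    for timestamp, responding in reversed(samples):
        if responding:
            break
        down_since = timestamp

    return latest_time - down_since >= grace_seconds


if __name__ == "__main__":
    # Hypothetical samples: the host blips, recovers, then blips again.
    flapping = [(0, True), (30, False), (60, False), (90, True), (120, False)]
    print(should_act(flapping, grace_seconds=300))
```

    A short blip never reaches the grace period, so nothing gets moved. The trade-off is that a real failure is also acted on later by the same amount of time.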

    The HA documentation is a bit fuzzy. I know there are a few good VMworld slide decks out there with useful information on the advanced settings, in addition to the link that chilow posted. I also remember recently seeing a white paper that listed all the different advanced options, as well as this: http://kb.vmware.com/kb/1006421

    Brian