VMware vSphere


power outages, some VMs did not restart, any ideas why??

  • 1.  power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 08:30 PM

    We recently had a power outage, during which all our UPSes ran down completely. :smileyhappy:

    All host servers finally got restarted, and most of the VMs came back on -- those which had had their restart priority set to High within HA DRS settings within the HA DRS cluster (not per-host).

    A few which had only had the cluster 'default' of Medium did not restart and had to be restarted from the vCenter client.

    These VMs have since had their priority changed to 'High.'

    What, if anything, would prevent VMs from restarting when power is restored and the SAN and hosts have restarted??

    How can I prevent them from not restarting again??

    Does vMotioning VMs mess up any of the VMs' restart priorities etc.??

    Thank you, Tom



  • 2.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 08:36 PM

    You probably had an issue where you violated availability constraints, and the option in that case is to leave the VMs powered off. This would be my initial thought.

    Can you check your admission control settings on the cluster? Right-click on the cluster > Edit Settings > VMware HA > Admission Control.



  • 3.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 08:43 PM

    That could be it...it was set to 'Enable,' with 25% of cluster resources reserved for failover spare capacity as opposed to one (1) host.

    I hadn't yet figured out a good percentage for only 3 hosts where during normal circumstances, nothing is even close to being overcommitted.

    I could change it to 'Disabled'...??? or use one (1) host instead of 25% -- what do you suggest?? with 3 hosts and current failover capacity for cpu and memory presently in the high 90's??

    One of the power outage issues is that one host lost one of its two power supplies (since replaced) and that could have affected vCenter's calculation of available resources etc.

    Thank you, Tom

    P.S. nice avatar!! :smileyhappy: I should put one...



  • 4.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 08:50 PM

    For us, we disable admission control, but that's for us.

    If you had disabled it, all your VMs would have started in this case. I can't make that call for you though.



  • 5.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 09:01 PM

    I think for now I will disable admission control...easier on the non-VMware admins.

    Do any caveats exist about doing this??

    Potential dangers or pitfalls etc.??

    Thank you, Tom



  • 6.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 09:10 PM

    Make sure you have enough ports on your vSwitches (Virtual Machine Port Group). With admission control disabled, you could be put into a situation where your VMs don't run so well because of a lack of resources available to them. However, if you lose an entire cluster of ESX hosts with the exception of 1 or 2, you have bigger problems than just HA.

    We have 8 nodes per cluster and we can lose 2 of the 8 and still have enough resources to run our guests quite well.
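
    The "lose 2 of 8 and still have enough resources" check can be sketched roughly as below. This is only a back-of-the-envelope model, not how HA itself calculates anything: it looks at memory alone, sums capacity across hosts, and ignores CPU, overhead and overcommit. The function name and all host/VM figures are hypothetical.

```python
def survives_host_failures(host_mem_gb, vm_mem_gb, failures):
    """Return True if the VMs' total memory still fits in the cluster
    after losing `failures` hosts (pessimistically, the largest ones)."""
    if failures >= len(host_mem_gb):
        return False
    # Worst case: assume the biggest hosts are the ones that fail.
    remaining = sorted(host_mem_gb)[: len(host_mem_gb) - failures]
    return sum(vm_mem_gb) <= sum(remaining)


# Hypothetical figures: eight 64 GB hosts and ninety 4 GB VMs (360 GB).
hosts = [64] * 8
vms = [4] * 90

print(survives_host_failures(hosts, vms, 2))  # 6 hosts left = 384 GB
print(survives_host_failures(hosts, vms, 3))  # 5 hosts left = 320 GB
```

    With admission control disabled nothing enforces this headroom for you, so it is worth rerunning the numbers whenever VMs are added or resized.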



  • 7.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 09:23 PM

    If you're referring to a vSwitch upon which production VMs run (not sc/vmkernel or backend iSCSI), I set this particular vSwitch to 120 ports, with the backend iSCSI vSwitch to 56 ports and the SC/vmKernel vSwitch to its default 24...

    This cluster is 3 hosts, I think maybe one of the issues was that 1 VM had 5 GB allocated to it (since moved down to 3) and this might have caused issues, though this VM is not on the "High" list of VMs to be restarted.

    2 hosts can easily run all our important VMs, for sure, I may go back to the 1-host setting instead of 25% since this is actually more resources than 25%...

    It was an unusual power outage, usually we have never had HA etc. issues with power outages. This week's weather will remain hot/muggy etc. for a few more days, we can have power outages in this weather, National Grid is not the best of power companies...

    Thank you, Tom
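
    The "1 host is more than 25%" point is simple arithmetic for three equal hosts, sketched below. Note the assumption: this compares raw capacity only. As far as I understand, the host-failures policy in vSphere 4 works from slot sizes, so the amount it actually reserves can differ. The host sizes are hypothetical.

```python
def one_host_fraction(host_capacities):
    """Fraction of total cluster capacity held by the largest host."""
    return max(host_capacities) / sum(host_capacities)


# Hypothetical: three identical hosts with 32 GB of RAM each.
hosts = [32, 32, 32]
reserved_pct = 25  # the percentage-based setting mentioned in the thread

one_host_pct = one_host_fraction(hosts) * 100
print(f"one host = {one_host_pct:.1f}% of the cluster, "
      f"percentage policy reserves {reserved_pct}%")
```

    So on a 3-host cluster of equal hosts, a 25% reservation is less than one full host's worth of capacity.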



  • 8.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 09:28 PM

    "If you're referring to a vSwitch upon which production VMs run (not sc/vmkernel or backend iSCSI), I set this particular vSwitch to 120 ports"

    That is the one indeed. Usually vSwitch1.

    There are some advanced options you can set in HA while leaving admission control enabled, but to me that is tricking the configured failover capacity that HA sets, even though it is quite conservative. However, you must be running vCenter/ESX 4.

    You can read more below

    http://www.yellow-bricks.com/vmware-high-availability-deepdiv/

    http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_availability.pdf

    Good Luck.



  • 9.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 09:33 PM

    Thank you...I've read the HA Deepdive several times and am still working to understand it all.

    For the short term I'm returning to the 1-host default though I already set das.failuredetectiontime to 60000 and Host Isolation to Shutdown and all hosts are on 4.x U2 so we should not have any split brain issues.

    Thank you, Tom
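
    For anyone reading along: das.failuredetectiontime is given in milliseconds, so 60000 means a 60-second detection window. A tiny sketch of the two settings mentioned above, just to make the units explicit. The dictionary and the "host_isolation_response" key are only an illustration (that setting lives in the cluster's HA options in the client, not in an advanced option of that name).

```python
# Settings as described in the post; this dict is purely illustrative.
ha_settings = {
    "das.failuredetectiontime": 60000,      # milliseconds
    "host_isolation_response": "Shutdown",  # hypothetical key for the UI setting
}


def detection_seconds(settings):
    """Convert the failure detection time from milliseconds to seconds."""
    return settings["das.failuredetectiontime"] / 1000


print(detection_seconds(ha_settings))  # 60.0
```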



  • 10.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 08:46 PM

    I recall having some issues with machine auto-restart priorities and whether or not to actually power them on automatically at all related to vMotioning VMs around in a 3 node cluster test environment. I was thinking it was something I did (this was way back) until you mentioned it here.

    The problem I had was centered on leaving only one ESX host running 24hrs a day and turning the other two off when not in use.

    The remaining ESX host ran the AD controller and vCenter VMs, as well as a few others. When I went to shut down, they would automatically move to the last running host. I do recall times when my VM start priorities disappeared once they had been migrated to a different ESX host. Since I was powering hosts on and off each day I just thought it was related to an "unstable" environment.

    I didn't leave the same machine running each time, it could be any of the 3. I had DRS and HA running with fully automatic power and vMotion(ing) enabled so it was random as to which server at the end of the day actually had control of the AD and/or vCenter.

    That's my experience...

    Hugh



  • 11.  RE: power outages, some VMs did not restart, any ideas why??

    Posted Jul 06, 2010 08:48 PM

    I have read it's best practice to always keep one's vCenter (VM), as well as any AD servers (VM), on the same vSphere host, which I had done.

    P'raps I should use 'Disable' for admission control...???

    Thank you, Tom