Hi,
Just wanted to share something that might help someone get a handle on HA's intricacies...
I've always liked to analyze and think about distributed algos, and HA is a nice piece of work.
Clusters in general face a dilemma when a partition occurs: stay alive or die. Living in two separate worlds,
neither side can tell whether the other is dead or alive... so the vSphere 5 way of handling it is interesting!
Allowing more than one master seems like a good option, as long as you know the management view can be wrong for as long as the partition lasts.
That's better than having the VMs down.
But I have had a hard time understanding isolation. I was under the (wrong) impression that the isolation response determined
how HA would behave at both ends, i.e., on the isolated slave and on the master.
(It doesn't really matter if the master itself is the isolated host: once it is, another master is elected.)
My idea was that if the isolation response was to power the VM off, the master would bring it back up.
And conversely, if the isolation response was to leave it powered on, the master would not touch that VM.
That is not so. The isolation response only governs what the isolated host does. The (new or acting) master will try to restart the protected VM no matter what,
and it is the VM's file lock that keeps the VM from running in both places. This holds even if the isolated host has a way to tell the master that it is indeed isolated
and still "responsible" for the VM. I had not expected that...
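To make the point concrete, here is a toy Python model of the behavior described above. It is not VMware code or any VMware API: the `Datastore` class, the functions, and the response strings ("leave_powered_on", "power_off", "shut_down") are hypothetical, and it assumes the isolated host can still reach the shared storage so its lock stays held.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Datastore:
    # Hypothetical stand-in for shared storage: VM name -> host holding its file lock.
    locks: Dict[str, str] = field(default_factory=dict)

    def acquire(self, vm: str, host: str) -> bool:
        holder = self.locks.get(vm)
        if holder is None or holder == host:
            self.locks[vm] = host
            return True
        return False  # someone else still holds the lock

    def release(self, vm: str, host: str) -> None:
        if self.locks.get(vm) == host:
            del self.locks[vm]


def isolated_host_reacts(ds: Datastore, vm: str, host: str, response: str) -> str:
    # Only the isolated host consults the isolation response.
    if response in ("power_off", "shut_down"):
        ds.release(vm, host)  # VM goes down, lock is freed
        return "powered off"
    return "still running"  # "leave powered on": lock stays held


def master_tries_restart(ds: Datastore, vm: str, new_host: str) -> str:
    # The master attempts the restart regardless of the isolation response;
    # the lock alone decides whether it succeeds.
    if ds.acquire(vm, new_host):
        return f"restarted on {new_host}"
    return "restart failed: VM still locked by the isolated host"


if __name__ == "__main__":
    for response in ("leave_powered_on", "power_off"):
        ds = Datastore()
        ds.acquire("vm1", "esx-a")  # vm1 starts out running on esx-a
        print(response, "->",
              isolated_host_reacts(ds, "vm1", "esx-a", response), "|",
              master_tries_restart(ds, "vm1", "esx-b"))
```

In both runs the master tries the restart; only the lock state differs, which is why the VM never ends up running on both hosts.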
-Carlos
P.S.
Another nifty thing I found while playing with this:
Even if you don't mark a vmnic for "management", it will still answer management traffic. In fact, you can manage an isolated host with the vSphere Client,
provided you sit in the same segment as the alternative NIC. Cool.
I was able to keep an NFS-based datastore alive (second NIC) while an iSCSI datastore went down (first NIC) on a Workstation-based VDC.