ESXi

  • 1.  Geographically dispersed HA cluster

    Posted Oct 01, 2011 06:27 PM

    We're currently investigating a DR solution for our site. The first thing we looked at was a solution like SRM: mirror the SAN to the other location and let SRM handle the failover of the VMs. In this situation we have 2 different subnets, so SRM is needed to automate the re-IP of the VMs.

    But with the option of a LAN extension we've found a couple of extra options.

    One of the options we like the most is a split cluster like the picture below. The reason we like it most is that we can split our DTAP environment: DTA on the second site and P on the first site.

    http://www.van-lieshout.com/wp-content/uploads/2009/11/111509_1554_Geographica5.jpg

    Due to line capacity we can only replicate asynchronously, so there is a small delay. Is it possible to use this design, and what extra design considerations do we have to take care of?



  • 2.  RE: Geographically dispersed HA cluster

    Posted Oct 01, 2011 09:02 PM

    This is possible with a few different storage solutions out there (EMC VPLEX, HP LeftHand/P4000, possibly Compellent LiveVolume), but they all need synchronous-level response times and throughput...



  • 3.  RE: Geographically dispersed HA cluster

    Posted Oct 02, 2011 03:28 AM

    Whocarez wrote:

    Due to line capacity we can only replicate asynchronously, so there is a small delay. Is it possible to use this design, and what extra design considerations do we have to take care of?

    This is the clincher.

    If you don't have the line capacity, resiliency, and sufficiently low latency for synchronous replication, then your sites also don't have the characteristics needed to form a proper VMware HA cluster across sites, because synchronous mirroring is basically a requirement in this scenario.
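
    To make the latency requirement concrete, here is a rough back-of-the-envelope sketch. It assumes only that a synchronously mirrored write is acknowledged after the remote side confirms it, so every write pays at least one inter-site round trip. The function names and the figures (1 ms local write, 10 ms round trip) are hypothetical, not measurements from any real array.

```python
def sync_write_latency_ms(local_write_ms: float, inter_site_rtt_ms: float) -> float:
    """Latency of one synchronously mirrored write.

    The write is acknowledged only after the remote copy confirms it,
    so at least one inter-site round trip is added to the local write time.
    """
    return local_write_ms + inter_site_rtt_ms


def max_serial_writes_per_s(write_latency_ms: float) -> float:
    """Upper bound on back-to-back (queue depth 1) writes per second."""
    return 1000.0 / write_latency_ms


# Hypothetical figures: 1 ms local write, 10 ms round trip between sites.
latency = sync_write_latency_ms(1.0, 10.0)
rate = max_serial_writes_per_s(latency)
print(f"{latency:.1f} ms per write, about {rate:.0f} serial writes/s")
```

    Real storage pipelines queue and batch writes, so treat this as a lower bound on added latency rather than a sizing tool.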

    Even if you do have synchronous mirroring and LAN extension, HA alone is not a DR strategy.

    There are other challenging issues in designing such a scenario, such as how to make sure the loss of one link between datacenters doesn't result in an HA failover that causes a "split brain" situation, with respect to both the network and the VMs.



  • 4.  RE: Geographically dispersed HA cluster

    Posted Oct 02, 2011 09:37 AM

    If I'm understanding correctly, the following requirements have to be met first:

    If I want an active/active geo-dispersed cluster with HA enabled, I need enough line capacity for real-time synchronous replication.

    In the current situation, I can split the cluster: use HA within each site but not across both sites (affinity settings).

    If the primary site fails, I should be able to manually recover the VMs on the secondary site, knowing that there is data loss.

    I'm aware of the split-brain problem; it could be solved with VMware Heartbeat, if I'm correct.
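
    The data loss accepted in the manual-recovery case above can be estimated roughly. The sketch below assumes a steady write rate and a known replication lag. Both helper names and all numbers are hypothetical and only illustrate the arithmetic.

```python
def data_at_risk_mb(write_rate_mb_s: float, replication_lag_s: float) -> float:
    """Data written at the primary site but not yet copied to the secondary."""
    return write_rate_mb_s * replication_lag_s


def backlog_growth_mb_s(write_rate_mb_s: float, link_mbit_s: float) -> float:
    """How fast the unreplicated backlog grows when writes outpace the link.

    Returns 0.0 when the link can keep up with the write rate.
    """
    link_mb_s = link_mbit_s / 8.0  # megabits per second -> megabytes per second
    return max(0.0, write_rate_mb_s - link_mb_s)


# Hypothetical figures: 5 MB/s of writes, 60 s behind, 100 Mbit/s link.
print(data_at_risk_mb(5.0, 60.0), "MB would be lost if the primary failed now")
print(backlog_growth_mb_s(20.0, 100.0), "MB/s backlog growth during a 20 MB/s burst")
```

    If the backlog keeps growing during normal load, the lag (and therefore the data loss on failover) is unbounded, which is worth checking before settling on the asynchronous design.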



  • 5.  RE: Geographically dispersed HA cluster

    Posted Oct 02, 2011 01:36 PM

    Whocarez wrote:

    In the current situation, I can split the cluster: use HA within each site but not across both sites (affinity settings).

    If the primary site fails, I should be able to manually recover the VMs on the secondary site, knowing that there is data loss.

    The HA agent in ESXi 5 uses a heartbeat on the datastores themselves. If your VMFS datastore is synchronously mirrored so that a write can be made on either side, then it can possibly be configured to "look like" the same datastore to all the hosts, so that any write operation to either side is propagated to the opposite side before the write is committed.

    In fact, you really need not only synchronous mirroring of the files but also mirroring of file locks, so that a VM cannot accidentally be started on both sides simultaneously, either by HA or by a human.

    However, if the datastore is only asynchronously mirrored, the datastore will "look different" to different hosts. Different hosts will see different filesystem contents, and for HA purposes, the hosts care about that.

    I'm aware of the split-brain problem; it could be solved with VMware Heartbeat, if I'm correct.

    VMware vCenter Heartbeat is a product for keeping vCenter Server itself available (failing it over to a standby); it doesn't help with VMware HA.

    Other than forcing failover to be a manual operation (in which case, why try to use HA?), the only real ways to prevent split brain in an HA scenario are to have assuredly redundant communication links that you can guarantee will never fail together, or to have a "third site" with some kind of resource, so that you have an odd number of "clustered systems" (sites that can fail) with independent communication paths.
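
    The "third site" idea can be sketched as a tie-breaker that sides with at most one site during a partition. This is a minimal illustration only: the `Witness` class and `may_run_vms` function are hypothetical names, not a VMware or storage-vendor API, and the first-come-first-served rule is an assumption.

```python
from typing import Optional


class Witness:
    """Third-site tie-breaker: gives its single vote to at most one site."""

    def __init__(self) -> None:
        self.granted_to: Optional[str] = None

    def request_vote(self, site: str) -> bool:
        # First site to ask during a partition wins; later requests
        # from the other site are refused.
        if self.granted_to is None:
            self.granted_to = site
        return self.granted_to == site


def may_run_vms(site: str, sees_peer: bool, sees_witness: bool, witness: Witness) -> bool:
    """Decide whether a site may keep running (or take over) the shared VMs."""
    if sees_peer:
        return True   # no partition visible from this site: normal operation
    if not sees_witness:
        return False  # fully isolated: 1 vote out of 3, so stand down
    return witness.request_vote(site)  # 2 of 3 only if the witness sides with us


# Inter-site link is down, but both sites can still reach the witness.
w = Witness()
print("A:", may_run_vms("A", sees_peer=False, sees_witness=True, witness=w))
print("B:", may_run_vms("B", sees_peer=False, sees_witness=True, witness=w))
```

    The point of the odd vote is that, whichever links fail, no more than one side can ever collect a majority, so a VM cannot end up running on both sites.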



  • 6.  RE: Geographically dispersed HA cluster

    Posted Oct 02, 2011 06:40 PM

    The only real ways to prevent split brain in an HA scenario are to have assuredly redundant communication links that you can guarantee will never fail together, or to have a "third site" with some kind of resource, so that you have an odd number of "clustered systems" (sites that can fail) with independent communication paths.

    Indeed, this is how VPLEX solves the problem, with a 'witness' machine (or VM) running at some third site.