vSAN1

  • 1.  Question re VM availability whilst Host is in maintenance mode

    Posted Jun 15, 2017 03:34 PM

    I have the following configuration for a VM in a cluster …

    In the scenario below, ESXA has been placed in maintenance mode without a full data evacuation. The storage policy is FTT=1 with a stripe width of 1.

    ESXA contains Replica(A)

    ESXB contains Replica(B)

    ESXC contains Witness(W)

    If I am restarting the Host containing Replica(A), and a couple of minutes into the restart a disk in ESXB that contains Replica(B) fails, the VMDK will go offline as only the witness is available and the Windows OS would probably blue screen.

    If the disk failure was permanent, i.e. the disk needed replacing, what would happen? Replica(A) is offline because its Host is being restarted, and Replica(B) has just been lost on the failed disk. Once Replica(A) was back online after the restart, would the VMDK become available? Witness(W) and Replica(A) would both be present, but Replica(A) would be out of date, so would vSAN allow the disk to be available? Or would the cluster wait 60 minutes until a resync was performed? If that's the case, I'm assuming it would resync Replica(A), as Replica(B) no longer exists, so there could be the potential for data loss?

    If the disk failure was temporary, i.e. the disk was accidentally pulled out, but ESXA came back online before it was put back in, what would happen then? Would the cluster wait 60 minutes to see if Replica(B) became available, and once it was available, make the VMDK active and resume IO?

    Hope that’s clear !!  If you can provide any info that would be great !!



  • 2.  RE: Question re VM availability whilst Host is in maintenance mode
    Best Answer

    Posted Jun 15, 2017 04:54 PM

    Hello tekhie

    Yes, the vmdk Object would become unavailable if the majority of its components were not accessible, e.g. DataComponent1 is Absent because ESXA is in the vSAN decom state during the reboot, and DataComponent2 is permanently gone with the dead disk. The Witness-component is still Active, but regardless of whether the one surviving component is a Witness-component or a Data-component, 1/3 Active does not satisfy the quorum.
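
    The majority rule above can be sketched in a few lines of Python. This is only an illustrative model, not vSAN code: the `Component` class, the `has_quorum` function and the assumption of exactly one vote per component are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    state: str      # illustrative labels: "Active", "Absent" or "Degraded"
    votes: int = 1  # assumption: one vote per component


def has_quorum(components):
    """Return True if strictly more than half of all votes are Active."""
    total = sum(c.votes for c in components)
    active = sum(c.votes for c in components if c.state == "Active")
    return 2 * active > total


# The scenario from this thread: ESXA rebooting, disk on ESXB dead.
during_reboot = [
    Component("DataComponent1", "Absent"),
    Component("DataComponent2", "Degraded"),
    Component("Witness", "Active"),
]
# ESXA is back, the failed disk is still gone.
after_reboot = [
    Component("DataComponent1", "Active"),
    Component("DataComponent2", "Degraded"),
    Component("Witness", "Active"),
]

print(has_quorum(during_reboot))  # False: 1 of 3 votes
print(has_quorum(after_reboot))   # True: 2 of 3 votes
```

    Note that this only models the vote count. Having quorum does not by itself say whether the surviving data component holds current data.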

    Provided DataComponent1 was healthy when the host was put in MM, it will be marked Active once the host is back and the vmdk Object will become available again. vSAN would then proceed to rebuild the missing components from the failed disk (provided there is space on the remaining disks on that host, or on other hosts in a 4+ node cluster).

    However, if data was written to the VM and changes were made to the vmdk Object (and thus to DataComponent2) while DataComponent1 was Absent, then after DataComponent2 is lost and DataComponent1 comes back, unfortunately vSAN will not be able to use the data on DataComponent1 as it is not current. This is by design, for the fairly obvious reason of relying on the most current state of the data. (On a side note, in the most recent versions, 6.2 U3 and 6.6, there *may* be situations like this where VMware GSS can use tools to revert to using an older data-component.)
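
    To make the difference between the two outcomes concrete, here is a rough Python sketch. It is a toy model under stated assumptions, not how vSAN tracks staleness internally: the `Replica` type, the `last_write_seq` counter and the `object_available` function are hypothetical names invented for the example.

```python
from typing import NamedTuple


class Replica(NamedTuple):
    name: str
    active: bool
    last_write_seq: int  # hypothetical counter: last write this replica saw


def object_available(replicas, witness_active, latest_seq):
    """Available only with a vote majority AND an active, current data copy."""
    votes_total = len(replicas) + 1  # assumption: one vote each, plus one witness
    votes_active = sum(1 for r in replicas if r.active) + (1 if witness_active else 0)
    quorum = 2 * votes_active > votes_total
    current_copy = any(r.active and r.last_write_seq == latest_seq for r in replicas)
    return quorum and current_copy


# Case 1: nothing was written while ESXA was rebooting.
no_writes = [Replica("DataComponent1", True, 10), Replica("DataComponent2", False, 10)]
print(object_available(no_writes, witness_active=True, latest_seq=10))  # True

# Case 2: writes landed on DataComponent2 only, then its disk died.
stale = [Replica("DataComponent1", True, 10), Replica("DataComponent2", False, 12)]
print(object_available(stale, witness_active=True, latest_seq=12))  # False
```

    In both cases two of three votes are present once ESXA returns. The difference is whether the surviving data component saw the latest write.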

    Edit: Did further research and edited accordingly

    Bob
