vSAN1

  • 1.  vSAN Stretched cluster Preferred Fault Domain set to Null

    Posted Jun 03, 2020 08:31 PM

    Hi, we have been dealing with some issues in our vSAN Stretched cluster: some VMs disappeared from the inventory and from the vSAN Datastore. When we took a closer look, the used space on the datastore hadn't changed. Weird.

    After some troubleshooting we managed to see that the cluster's Preferred Fault Domain was set to Null. We resolved the issue by disconnecting the cluster Master's vSAN VMkernel interface and rebooting all the hosts one by one.

    My question is: Why would this happen? What are possible causes for that to happen?



  • 2.  RE: vSAN Stretched cluster Preferred Fault Domain set to Null

    Posted Jun 03, 2020 08:47 PM

    Hello Lucas,

    "some issues in our vSAN Stretched cluster: some VMs disappeared from the inventory and from the vSAN Datastore."

    So my first question would be: what exactly do you mean by "disappeared"? Do you mean unavailable/inaccessible, or permanently gone? The only legitimate times I have seen such things (i.e. excluding something or someone deleting stuff) are when people have data Objects stored as FTT=0 (almost exclusively unknowingly) and lose a Disk/Disk-Group. Aside from this, it should be clear what happened to the data.

    I would strongly advise opening a Support Request with vSAN GSS if this is not well understood already.

    "Why would this happen? What are possible causes for that to happen?"

    I would start by checking whether you have leftover stale CMMDS entries from replaced Witness(es). This can be checked from the Master/Backup node (the output *should* be the same on both) with:

    # cmmds-tool find -t PREFERRED_FAULT_DOMAIN

    # cmmds-tool find -t HOSTNAME
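
Since the output is expected to match between the Master and Backup nodes, one way to check that is to save the output of each command on both nodes and compare the saved text. The following is only a minimal sketch: the function names are hypothetical, it assumes nothing about the cmmds-tool output format beyond it being line-oriented text, and it has not been validated against a real cluster.

```python
def normalize(capture):
    """Return the set of non-empty, whitespace-trimmed lines in a saved capture."""
    return {line.strip() for line in capture.splitlines() if line.strip()}


def compare_captures(master, backup):
    """Compare two saved command outputs, ignoring line order and blank lines.

    Returns a pair of sorted lists:
    (lines seen only in the master capture, lines seen only in the backup capture).
    Two empty lists mean the captures contain the same set of lines.
    """
    master_lines = normalize(master)
    backup_lines = normalize(backup)
    return sorted(master_lines - backup_lines), sorted(backup_lines - master_lines)
```

For example, after saving the PREFERRED_FAULT_DOMAIN output from each node to a file, read both files and pass their contents to compare_captures; any line returned is present on one node but not the other and is worth a closer look. Because the comparison is done on sets of lines, it will not notice a line that is duplicated on one node only.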

    Other than that, potentially there was some other issue with CMMDS. I can only really think of one (which is actually a long knock-on effect of issues causing /scratch to be unavailable), as issues in this area are exceedingly rare (which they SHOULD be, as this service plays a critical role).

    Bob



  • 3.  RE: vSAN Stretched cluster Preferred Fault Domain set to Null

    Posted Jun 03, 2020 08:54 PM

    Thanks Bob. What I mean by "disappeared" is that the files were no longer mapped in the vSAN file system until the master was rebooted.

    I agree with you, really strange scenario. VMware is still looking for a root cause.



  • 4.  RE: vSAN Stretched cluster Preferred Fault Domain set to Null

    Posted Jun 09, 2020 10:16 AM

    Which version of vSAN are you running?



  • 5.  RE: vSAN Stretched cluster Preferred Fault Domain set to Null

    Posted Jun 09, 2020 01:25 PM

    Hi, Duncan. Really enjoyed the vSAN 6.7 Deep Dive.

    This is a cluster deployed with VCF 3.8.1, now upgraded to 3.9.1.

    vSAN version: 6.7

    ESXi builds: 15160138

    vCenter + external PSC build: 15976728



  • 6.  RE: vSAN Stretched cluster Preferred Fault Domain set to Null

    Posted Jun 10, 2020 07:06 AM

    Are you running any scripts against vSAN APIs? I have seen issues before where people were running scripts against our APIs and would populate fields which did not need to be populated. Other than that, this doesn't ring a bell unfortunately.

    Did you file an SR?



  • 7.  RE: vSAN Stretched cluster Preferred Fault Domain set to Null

    Posted Jun 10, 2020 01:29 PM

    Duncan, we opened an SR and it was solved by disabling vSAN traffic on the master node's VMK and rebooting the hosts one by one. I'm not aware of any scripts running, but I will definitely take a deeper look into that.

    Thanks!