vSAN1

  • 1.  2-Host ROBO Cluster with vSAN Cluster Partition error

    Posted Mar 20, 2018 07:00 PM

    Been wrestling with this for weeks now.

    I have a vSAN 6.6 2-host cluster with external witness.

    When I run the configuration check, everything shows green except for the "vSAN cluster partition error" as seen below:

    I see that the witness host is running on partition 1 and my two vsan hosts are running on partition 2.  Is this the cause of the failure?

    I cannot seem to troubleshoot this successfully.  Any help would be greatly appreciated.  All other tests pass in the Configuration Test.



  • 2.  RE: 2-Host ROBO Cluster with vSAN Cluster Partition error

    Posted Mar 20, 2018 07:30 PM

    Hi GatorMania93

    Did you configure the static routes from your 2-node ROBO data site to the witness site using esxcfg-route -a commands?

    Regards,



  • 3.  RE: 2-Host ROBO Cluster with vSAN Cluster Partition error

    Posted Mar 20, 2018 07:51 PM

    Currently I have all components running in the same VLAN.



  • 4.  RE: 2-Host ROBO Cluster with vSAN Cluster Partition error

    Posted Mar 20, 2018 08:03 PM

    You should have a stretched L2 network for the VSAN network in your 2-node ROBO site, and another VLAN for VSAN traffic at the witness site.

    Static routes should be configured between VSAN vmkernel ports in the ROBO site and the VSAN vmkernel port in the witness site.
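As a rough sketch of what such routes might look like, the helper below only prints esxcfg-route -a commands rather than running them. The function name route_cmd and all addresses are hypothetical placeholders, not values from this thread; substitute your own vSAN networks and gateways.

```shell
#!/bin/sh
# Dry-run helper: prints the static-route command instead of executing it.
# route_cmd and every address below are illustrative assumptions.
route_cmd() {
    # $1 = remote vSAN network in CIDR form
    # $2 = local gateway that can reach that network
    echo "esxcfg-route -a $1 $2"
}

# On each data node: route towards the witness site's vSAN network.
route_cmd 192.168.110.0/24 192.168.15.1
# On the witness: route back towards the data site's vSAN network.
route_cmd 192.168.15.0/24 192.168.110.1
```

Routes are needed in both directions, so the witness needs a matching route back to the data site. This only applies when the witness sits in a different subnet from the data nodes.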

    This is a network requirement for VSAN ROBO configuration. Please see below:

    Network Design for Stretched Clusters

    Regards,



  • 5.  RE: 2-Host ROBO Cluster with vSAN Cluster Partition error

    Posted Mar 20, 2018 08:05 PM

    Hello GatorMania93,

    Welcome to Communities! Some useful info on participating here:

    https://communities.vmware.com/docs/DOC-12286

    "Been wrestling with this for weeks now."

    Sorry to hear that, what have you tried/checked so far?

    "I see that the witness host is running on partition 1 and my two vsan hosts are running on partition 2.  Is this the cause of the failure?"

    Cluster members need to be able to communicate with one another and should never be network partitioned.

    This cluster is on 6.6 so going to assume Unicast.

    Check the cluster config on all three nodes to ensure they are all *trying* to be part of the same cluster and all show Unicast Mode Enabled: true:

    # esxcli vsan cluster get
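To make the comparison across nodes easier, a small filter along these lines could reduce the output to the fields that matter for a partition check. This is a sketch that assumes the output contains lines labelled "Sub-Cluster UUID:", "Sub-Cluster Member Count:" and "Unicast Mode Enabled:"; the function name cluster_summary is made up for illustration.

```shell
#!/bin/sh
# Reduce `esxcli vsan cluster get` output (read from stdin) to the fields
# worth comparing between nodes. The field labels are assumed; adjust the
# pattern if your build prints them differently.
cluster_summary() {
    grep -E 'Sub-Cluster UUID:|Sub-Cluster Member Count:|Unicast Mode Enabled:' |
        sed 's/^[[:space:]]*//'
}

# Usage on an ESXi host (not executed here):
#   esxcli vsan cluster get | cluster_summary
```

Run it on both data nodes and the witness. The Sub-Cluster UUID should match everywhere, and in a healthy 2-node cluster with a witness the member count would normally be 3 on every node. A lower count on some nodes is consistent with a partition.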

    Check the unicastagent lists on each node:

    # esxcli vsan cluster unicastagent list

    Each node should have the 2 other nodes in its list (don't worry if the witness shows as 0000 for UUID, just look at the IP; the entries should also state whether they have Unicast enabled).

    If these are all good, then check the network connectivity from the vSAN-enabled vmk on each host to the IP of the vmk on the others:

    Get the IP of the vSAN interface on each node:

    # esxcfg-vmknic -l

    Confirm how this is configured (in case you have multiple interfaces or Witness Traffic Separation in use):

    # esxcli vsan network list
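Since the question here is which vmk carries which traffic type on each host, one way to make that output quicker to read is to collapse it into interface/traffic-type pairs. The sketch below assumes each interface block contains "VmkNic Name:" and "Traffic Type:" lines; vsan_tags is a hypothetical helper name.

```shell
#!/bin/sh
# Print "vmkN traffic-type" pairs from `esxcli vsan network list` output
# supplied on stdin. Assumes each interface block has a "VmkNic Name:" line
# followed later by a "Traffic Type:" line.
vsan_tags() {
    awk -F': ' '
        /VmkNic Name:/  { nic = $2 }
        /Traffic Type:/ { print nic, $2 }
    '
}

# Usage on an ESXi host (not executed here):
#   esxcli vsan network list | vsan_tags
```

Comparing the pairs from each host against the intended design makes a mis-tagged interface stand out.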

    Ping the other interfaces from data-nodes to Witness:

    # vmkping -I vmk# <Other_nodes_vsan_IP>

    Check this in BOTH directions.
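With several peers to test, the ping step can be wrapped in a small loop. The following is only a sketch: check_peers and the PING_CMD override are assumptions added for illustration, and the interface name and addresses in the usage comment are placeholders.

```shell
#!/bin/sh
# Ping each peer vSAN IP from a given vmk interface and report the result.
# PING_CMD is a hypothetical override hook so the loop can be tried off-host;
# on ESXi it defaults to vmkping.
PING_CMD=${PING_CMD:-vmkping}

check_peers() {
    vmk=$1; shift
    failed=0
    for ip in "$@"; do
        if $PING_CMD -I "$vmk" -c 3 "$ip" >/dev/null 2>&1; then
            echo "OK   $vmk -> $ip"
        else
            echo "FAIL $vmk -> $ip"
            failed=1
        fi
    done
    return $failed
}

# Usage on an ESXi host (interface and addresses are placeholders):
#   check_peers vmk1 192.168.15.12 192.168.110.20
```

vmkping only tests from the host it runs on, so covering both directions means running this on each data node and on the witness, each time with that host's own vSAN or witness-tagged vmk.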

    If this fails, start looking at your network configuration and gateways. Other issues such as busted vmk interfaces occur only rarely, but removing and reconfiguring the interface on the Witness might be an approach.

    FYI Witness appliances are very simple to redeploy in 6.6 and there is an in-built check for basic network configuration etc. when adding this to a node.

    Bob



  • 6.  RE: 2-Host ROBO Cluster with vSAN Cluster Partition error

    Posted Mar 20, 2018 09:14 PM

    Thanks Bob.

    I simply enabled vSAN on my cluster, was able to set up the storage successfully, and passed all of the checks. Then, when I enabled stretched cluster and set up my two fault domains, I was able to add the witness with no errors either.  So, I'm not sure why this is happening.  Here are a few screenshots:

    From Witness:

    From Host 1:

    From Host 2:

    I've tried adding 3 different witness hosts, all of which were installed from scratch.

    I'll get to work on trying some of your troubleshooting tips



  • 7.  RE: 2-Host ROBO Cluster with vSAN Cluster Partition error

    Posted Mar 20, 2018 09:26 PM

    Hello GatorMania93,

    Are you installing these Witnesses with the same ESXi build as the data-nodes?

    How are your interfaces configured?

    Are these all on the same L2 network in same subnets?

    If there is no communication between nodes over these interfaces, try untagging the existing vsan and/or witness interfaces. Then create a new interface on the Witness in the same subnet as the vSAN-enabled vmk on the data-nodes, tag it for vsan traffic only, not witness (-T=vsan), and see whether they can communicate.

    Edit: Yes, I am fully aware this is not how a 2-node DirectConnect should ideally be configured, but this is a test to find where the issue is.

    Bob



  • 8.  RE: 2-Host ROBO Cluster with vSAN Cluster Partition error

    Posted Mar 20, 2018 09:35 PM

    Looks like that might be my issue, Bob.  I can ping the VMKernel NICs between hosts, but cannot ping the Witness from either host (or vice versa).

    I should also mention that the VMkernel NICs are directly connected between both hosts.  However, I thought that route could be established by running

    esxcli vsan network ipv4 add -i vmk0 -T=witness   on each host, where vmk0 is my management interface.



  • 9.  RE: 2-Host ROBO Cluster with vSAN Cluster Partition error
    Best Answer

    Posted Mar 21, 2018 04:39 PM

    I found what I did wrong here.

    TheBobkin

    When I ran esxcli vsan network ipv4 add -i vmk0 -T=witness on each of my two remote hosts, for some reason I also ran it on my witness host, whose witness traffic runs over vmk1.  Running esxcli vsan network ipv4 add -i vmk1 -T=witness on my witness host resolved the error.

    Thanks Bob for pointing me in the right direction with those ping tests.