vSAN1

  • 1.  VSAN Issues

    Posted Jun 27, 2018 08:27 PM

    Good Afternoon All,

    As of yesterday, I am experiencing some issues with my VSAN. An unexpected power outage knocked one of the hosts in my stretched datacenter offline. After that, for whatever reason, I was unable to regain network connectivity until I restored the network settings from the ESXi splash screen. Now I can remote in, but of course I lost all VMkernels, NICs, etc. This would be no issue to rebuild, except that my VCenter server was hosted on this particular host and this host is no longer connected to the VSAN. After rebuilding each of the VMkernels and the NICs, the host will not re-enter the VSAN and I am not able to power on VCenter. I have attempted to register the VCenter server on another host, but to no avail. For whatever reason, upon power up the newly registered VCenter server has lost all configurations, to include IP address, and I am no longer able to remote in through a browser.

    I have verified network connectivity to each of the IP addresses for management, VSAN and VMotion to each of the hosts in my datacenter (3 hosts, 1 witness). Unfortunately, the one host in question is the only one that cannot access the VSAN and to make matters worse, it was the host where VCenter VM is located.

    What can I do to solve this issue? I have exhausted my troubleshooting steps and would incredibly appreciate any guidance. Thank you very much.

    Don



  • 2.  RE: VSAN Issues

    Posted Jun 27, 2018 11:54 PM

    Hello Don,

    "For whatever reason, upon power up the newly registered VCenter server has lost all configurations, to include IP address"

    vCSA or Windows-based? Lost *all* configurations or just no network connectivity?

    Are you using vDS here? There can be issues with vCenter not having its previously assigned port available. There are workarounds for this, such as 'borrowing' a NIC on a host to make a vSS and using that (or, if you want to get esoteric about it, stealing the port of another VM is technically feasible).

    "(3 hosts, 1 witness)"

    As in a 2-node + 1 witness or some form of lop-sided stretched cluster?

    Is all the vSAN data currently healthy/accessible?

    Run on any clustered node:

    # cmmds-tool find -f python | grep CONFIG_STATUS -B 4 -A 6 | grep 'uuid\|content' | grep -o 'state\\\":\ [0-9]*' | sort | uniq -c
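
    The one-liner above prints one line per object state with a count in front (for example `45 state\": 7`). As a rough sketch for reading that output, the helper below totals the counts. It assumes state 7 means healthy, which is how this command's output is commonly read; every other state code is only flagged, not interpreted. `summarize_states` is a hypothetical name, not a vSAN tool.

```shell
# Hypothetical helper: summarise "<count> state\": <n>" lines read from stdin.
# Assumption: state 7 = healthy; any other state is flagged for a closer look.
summarize_states() {
    awk '
        { count = $1; state = $NF }            # first field: count, last: state code
        state == 7 { healthy += count; next }  # tally healthy objects
        {
            other += count
            printf "state %s: %d object(s) need attention\n", state, count
        }
        END { printf "healthy (state 7): %d, other: %d\n", healthy, other }
    '
}

# Usage on a clustered node: append "| summarize_states" to the cmmds-tool
# pipeline shown above (after "uniq -c").
```

    If everything is reported under state 7 the data is most likely fine and the problem is confined to cluster membership of the one host.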

    "Unfortunately, the one host in question is the only one that cannot access the VSAN and to make matters worse, it was the host where VCenter VM is located."

    Do you mean to say one node cannot join the cluster? Test the vSAN network connection inter-node:

    # vmkping -I vmk<vSANenabled> <IPofOtherNodesvSANvmk>
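
    To repeat that test against every peer in one go, something like the following could be used. It is only a sketch: `check_vsan_peers` is a hypothetical name, and the interface and addresses in the usage line are placeholders to replace with your own vSAN vmk and the other nodes' vSAN IPs.

```shell
# Hypothetical helper: ping each peer's vSAN IP from the given vmk interface.
# Usage (placeholders): check_vsan_peers vmk2 192.0.2.11 192.0.2.12
check_vsan_peers() {
    vmk="$1"; shift
    failed=0
    for ip in "$@"; do
        # -I selects the outgoing VMkernel interface, -c limits the ping count
        if vmkping -I "$vmk" -c 3 "$ip" >/dev/null 2>&1; then
            echo "OK   $ip via $vmk"
        else
            echo "FAIL $ip via $vmk"
            failed=1
        fi
    done
    return "$failed"   # non-zero if any peer was unreachable
}
```

    Run it from the problem host and again from one of the healthy hosts, since the failure can be in one direction only.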

    If this functions then you need to start looking at the ports that are used for inter-node communication, which depend on what build you are using: 5.5-6.5 use Multicast on ports 12345 and 23451 (UDP), while 6.6-6.7 use Unicast on port 12321 (UDP). If you are on 6.6/6.7, also check the unicastagent lists on each host: you did apparently rework the IPs of the hosts, and the lists won't have changes pushed down by vCenter while it is down.

    https://blogs.vmware.com/vsphere/2014/09/virtual-san-networking-guidelines-multicast.html

    https://kb.vmware.com/s/article/2150303
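
    For the 6.6/6.7 unicast case, the checks can be done from the ESXi shell without vCenter. The sketch below lists the read-only commands as comments and adds a small dry-run helper that only prints an `unicastagent add` command for review. `print_unicast_add` is a hypothetical name, and the flag layout follows the form given in the KB linked above as best I recall it, so verify it against the KB before running anything on a host.

```shell
# Read-only checks to run on each node first (left as comments so this
# sketch can be pasted anywhere without touching a host):
#   esxcli vsan cluster get                 # membership and local node UUID
#   esxcli vsan network list                # which vmk carries vSAN traffic
#   esxcli vsan cluster unicastagent list   # peers this node expects to reach

# Hypothetical dry-run helper: print (do not run) the add command for one peer.
# Arguments: <peer node UUID> <peer vSAN IP> [node|witness]
# Assumption: port 12321 and "-U true", as in the unicast description above.
print_unicast_add() {
    uuid="$1"; ip="$2"; type="${3:-node}"
    echo "esxcli vsan cluster unicastagent add -t $type -u $uuid -U true -a $ip -p 12321"
}
```

    Compare each host's unicastagent list against the UUIDs and vSAN IPs reported by the other hosts. A stale or missing entry for the rebuilt host is the thing to look for.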

    "What can I do to solve this issue? I have exhausted my troubleshooting steps and would incredibly appreciate any guidance. Thank you very much."

    If you have a support contract and/or this is a production workload I would advise opening a Support Request with us at VMware GSS.

    Bob



  • 3.  RE: VSAN Issues

    Posted Jun 28, 2018 12:44 AM

    Apologies for any issues understanding my babble. This is relatively new technology to me and I am learning as much as possible.

    The vCenter Server is VCSA, and when I say lost all configurations, I mean network configurations from what I can tell. Because I have no way of logging in to check the VM, I am left to guess how much has been lost. Because it was originally hosted on the host with all the problems, I have tried registering it to a new host, both by copy and by move. Neither has produced the IP configurations that were once associated with the VCSA VM. I am using vDS, but the vDS is not present on the server that is experiencing all these issues. There is a vDS on all other hosts.

    There was a 2 node and 1 witness setup, but for testing purposes, we had added a third host to the cluster. I cannot remember how that worked with the fault domains, but it was operating just fine for what we needed it to do. The vSAN data is completely fine and accessible, but only from the other 2 hosts, not from the one in question. On that one host, I can select the vSAN Datastore and browse, but no folders or files appear. Also, all VMs registered to that host show "invalid".

    Lastly, I have no way of determining whether this host can rejoin the cluster because I cannot log into VCenter. This is the one roadblock I continue to run into. I would be able to migrate VMs or re-add hosts to the cluster, but VCenter is currently inaccessible because of the one host's issues.

    Again, thank you very much for the help. I will certainly submit a ticket if that can help me alleviate the issue.