vSAN1

  • 1.  vSAN recovery after vCenter failure

    Posted Jun 18, 2023 04:30 PM

    Recently we had vCenter issues and one of our team members decided to rebuild vCenter. In the process, it seems vSAN was lost as well. We now see only UUID paths with an "inaccessible" suffix instead of VMs. From what some people have told me, the data on the hosts' drives should not be lost (we did NOT reclaim or destroy any disks), but we are not sure how to proceed at this point to recover vSAN.

    I would appreciate it if somebody has recommendations or steps to suggest for how we can reinstate vSAN.

    Currently there is a new vCenter, with the hosts added outside of any cluster.

    Thanks.

     



  • 2.  RE: vSAN recovery after vCenter failure

    Posted Jun 18, 2023 05:43 PM

    Ideally you should open a P1 Support Request with VMware Global Support to assist with this issue.

     

    Should this not be possible, then step one is to validate that all nodes can communicate with one another. If you can, please share the following info retrieved from any one node in the cluster (feel free to obfuscate anything necessary) and we can tell you how to check this:

    # esxcli vsan cluster get

    # esxcli vsan cluster unicastagent list

    # esxcli vsan network list

    # esxcfg-vmknic -l



  • 3.  RE: vSAN recovery after vCenter failure

    Posted Jun 19, 2023 11:29 PM

    Thanks TheBobkin,

    After some reading and consultation, I plan to proceed with "Moving a vSAN cluster from one vCenter Server to another" (2151610).

    If you see an issue with this, let me know.

    I just created a cluster with only the vSAN service enabled; the storage policy and the distributed switch that were used in the old vCenter remain to be created.

    There are 4 hosts outside the new cluster.

    I had to manually create the unicast agents ("Configuring vSAN Unicast networking from the command line" (2150303)).
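
    For anyone repeating the manual unicast step, here is a minimal sketch of how the add commands could be generated for one host. It is a dry run that only prints the commands. The flag set (-t node, -u, -U true, -a, -p 12321) follows my reading of KB 2150303, so please check it against `esxcli vsan cluster unicastagent add --help` on your build. The function name and the sample UUIDs/IPs are made up for illustration.

```shell
# Print (do not run) one "unicastagent add" command per peer node.
# stdin: one "<node-uuid> <vsan-ip>" pair per line (hypothetical input format)
# $1:    this host's own UUID ("Local Node UUID" in `esxcli vsan cluster get`)
gen_unicast_add_cmds() {
    local_uuid="$1"
    while read -r uuid ip; do
        # Skip blank lines and comment lines
        [ -z "$uuid" ] && continue
        case "$uuid" in \#*) continue ;; esac
        # A host's unicast agent list must not contain the host itself
        [ "$uuid" = "$local_uuid" ] && continue
        echo "esxcli vsan cluster unicastagent add -t node -u $uuid -U true -a $ip -p 12321"
    done
}

# Example (hypothetical nodes; 192.0.2.x is a documentation address range):
#   printf '%s\n' 'uuid-a 192.0.2.1' 'uuid-b 192.0.2.2' | gen_unicast_add_cmds uuid-a
# prints a single add command, for uuid-b only.
```

    Review the printed lines, then paste them on the host whose UUID was passed as the argument, and repeat with each host's own UUID.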

    Here is the requested info:

    esxcli vsan cluster get
    Cluster Information
    Enabled: true
    Current Local Time: 2023-06-19T13:00:07Z
    Local Node UUID: 62ec0382-0cc4-dc68-47b7-9440c9232ca6
    Local Node Type: NORMAL
    Local Node State: MASTER
    Local Node Health State: HEALTHY
    Sub-Cluster Master UUID: 62ec0382-0cc4-dc68-47b7-9440c9232ca6
    Sub-Cluster Backup UUID:
    Sub-Cluster UUID: 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    Sub-Cluster Membership Entry Revision: 4
    Sub-Cluster Member Count: 1
    Sub-Cluster Member UUIDs: 62ec0382-0cc4-dc68-47b7-9440c9232ca6
    Sub-Cluster Member HostNames: esxi-01.test.can
    Sub-Cluster Membership UUID: 66b58c64-2646-80c0-b63e-9440c9232ca5
    Unicast Mode Enabled: true
    Maintenance Mode State: OFF
    Config Generation: c6cdaacf-13ab-45a9-9c01-26c7e4aee807 3 2023-06-16T19:06:34.532
    Mode: REGULAR

    esxcli vsan cluster unicastagent list

    host1
    NodeUuid IsWitness Supports Unicast IP Address Port Iface Name Cert Thumbprint SubClusterUuid
    ------------------------------------ --------- ---------------- ---------- ----- ---------- ----------------------------------------------------------- --------------
    62ec0380-20d4-7f98-1b36-9440c9232ccf 0 true x.x.0.2 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    62ec0377-4a0e-a27e-d53a-9440c9232bb8 0 true x.x.0.3 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    62ec0390-30b4-82f2-a4ab-9440c9232dde 0 true x.x.0.4 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7

    host2
    NodeUuid IsWitness Supports Unicast IP Address Port Iface Name Cert Thumbprint SubClusterUuid
    ------------------------------------ --------- ---------------- ---------- ----- ---------- ----------------------------------------------------------- --------------
    62ec0366-804a-65aa-a56d-9440c9232aa2 0 true x.x.0.1 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    62ec0377-4a0e-a27e-d53a-9440c9232bb8 0 true x.x.0.3 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    62ec0390-30b4-82f2-a4ab-9440c9232dde 0 true x.x.0.4 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7

    host3
    NodeUuid IsWitness Supports Unicast IP Address Port Iface Name Cert Thumbprint SubClusterUuid
    ------------------------------------ --------- ---------------- ---------- ----- ---------- ----------------------------------------------------------- --------------
    62ec0366-804a-65aa-a56d-9440c9232aa2 0 true x.x.0.1 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    62ec0380-20d4-7f98-1b36-9440c9232ccf 0 true x.x.0.2 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    62ec0390-30b4-82f2-a4ab-9440c9232dde 0 true x.x.0.4 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7

    host4
    NodeUuid IsWitness Supports Unicast IP Address Port Iface Name Cert Thumbprint SubClusterUuid
    ------------------------------------ --------- ---------------- ---------- ----- ---------- ----------------------------------------------------------- --------------
    62ec0366-804a-65aa-a56d-9440c9232aa2 0 true x.x.0.1 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    62ec0380-20d4-7f98-1b36-9440c9232ccf 0 true x.x.0.2 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7
    62ec0377-4a0e-a27e-d53a-9440c9232bb8 0 true x.x.0.3 12321 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 52dd8b23-8e02-8940-3b11-a55ffbc236b7

    [root@esxi-01:~] esxcli vsan network list
    Interface
    VmkNic Name: vmk2
    IP Protocol: IP
    Interface UUID: 52ab6dcd-df8c-fad7-4619-ec0313ea1558
    Agent Group Multicast Address: 224.2.3.4
    Agent Group IPv6 Multicast Address: ff19::2:3:4
    Agent Group Multicast Port: 23451
    Master Group Multicast Address: 224.1.2.3
    Master Group IPv6 Multicast Address: ff19::1:2:3
    Master Group Multicast Port: 12345
    Host Unicast Channel Bound Port: 12321
    Data-in-Transit Encryption Key Exchange Port: 0
    Multicast TTL: 5
    Traffic Type: vsan

    [root@esxi-01:~] esxcfg-vmknic -l
    Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
    vmk0 Management Network IPv4 x.x.1.101 255.255.248.0 x.x..7.255 00:40:c9:23:2c:a5 1500 65535 true STATIC defaultTcpipStack
    vmk0 Management Network IPv6 xxxxxxxxxxxxxx:fe23:2ca5 64 00:40:c9:23:2c:a5 1500 65535 true STATIC, PREFERRED defaultTcpipStack
    vmk1 3 IPv4 x.x..10.1 255.255.255.0 x.x.10.255 11:50:56:65:16:4f 1500 65535 true STATIC defaultTcpipStack
    vmk1 3 IPv6 xxxxxxxxxxxxxx:fe65:164f 64 11:50:56:65:16:4f 1500 65535 true STATIC, PREFERRED defaultTcpipStack
    vmk2 8 IPv4 x.x.0.1 255.255.255.0 x.x..0.255 11:50:56:65:e9:e8 1500 65535 true STATIC defaultTcpipStack
    vmk2 8 IPv6 xxxxxxxxxxxxxx:fe65:e9e8 64 11:50:56:65:e9:e8 1500 65535 true STATIC, PREFERRED defaultTcpipStack

     

    [root@esxi-03:~] esxcfg-vmknic -l
    Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
    vmk0 Management Network IPv4 x.x.1.103 255.255.248.0 x.x..7.255 00:40:c9:23:2c:cf 1500 65535 true STATIC defaultTcpipStack
    vmk0 Management Network IPv6 xxxxxxxxxxxxxx:fe23:2ccf 64 00:40:c9:23:2c:cf 1500 65535 true STATIC, PREFERRED defaultTcpipStack
    vmk1 1 IPv4 x.x.10.2 255.255.255.0 x.x..10.255 11:50:56:62:92:41 1500 65535 true STATIC defaultTcpipStack
    vmk1 1 IPv6 xxxxxxxxxxxxxx:fe62:9241 64 11:50:56:62:92:41 1500 65535 true STATIC, PREFERRED defaultTcpipStack
    vmk2 10 IPv4 x.x.0.2 255.255.255.0 x.x..0.255 11:50:56:6c:e1:62 1500 65535 true STATIC defaultTcpipStack
    vmk2 10 IPv6 xxxxxxxxxxxxxx:fe6c:e162 64 11:50:56:6c:e1:62 1500 65535 true STATIC, PREFERRED defaultTcpipStack


    [root@esxi-05:~] esxcfg-vmknic -l
    Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
    vmk0 Management Network IPv4 x.x.1.105 255.255.248.0 x.x..7.255 00:40:c9:23:2b:b7 1500 65535 true STATIC defaultTcpipStack
    vmk0 Management Network IPv6 xxxxxxxxxxxxxx:fe23:2bb7 64 00:40:c9:23:2b:b7 1500 65535 true STATIC, PREFERRED defaultTcpipStack
    vmk1 0 IPv4 x.x.10.3 255.255.255.0 x.x..10.255 11:50:56:66:57:75 1500 65535 true STATIC defaultTcpipStack
    vmk1 0 IPv6 xxxxxxxxxxxxxx:fe66:5775 64 11:50:56:66:57:75 1500 65535 true STATIC, PREFERRED defaultTcpipStack
    vmk2 11 IPv4 x.x.0.3 255.255.255.0 x.x.0.255 11:50:56:66:a1:16 1500 65535 true STATIC defaultTcpipStack
    vmk2 11 IPv6 xxxxxxxxxxxxxx:fe66:a116 64 11:50:56:66:a1:16 1500 65535 true STATIC, PREFERRED defaultTcpipStack


    [root@esxi-07:~] esxcfg-vmknic -l
    Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
    vmk0 Management Network IPv4 x.x..1.107 255.255.248.0 x.x..7.255 11:40:c9:23:28:d8 1500 65535 true STATIC defaultTcpipStack
    vmk0 Management Network IPv6 xxxxxxxxxxxxxx:fe23:28d8 64 11:40:c9:23:28:d8 1500 65535 true STATIC, PREFERRED defaultTcpipStack
    vmk1 2 IPv4 x.x.10.4 255.255.255.0 x.x..10.255 11:50:56:6d:c4:5d 1500 65535 true STATIC defaultTcpipStack
    vmk1 2 IPv6 xxxxxxxxxxxxxx:fe6d:c45d 64 11:50:56:6d:c4:5d 1500 65535 true STATIC, PREFERRED defaultTcpipStack
    vmk2 9 IPv4 x.x.0.4 255.255.255.0 x.x..0.255 11:50:56:65:15:1e 1500 65535 true STATIC defaultTcpipStack
    vmk2 9 IPv6 xxxxxxxxxxxxxx:fe65:151e 64 11:50:56:65:15:1e 1500 65535 true STATIC, PREFERRED defaultTcpipStack



  • 4.  RE: vSAN recovery after vCenter failure

    Posted Jun 21, 2023 09:46 AM

     

    "Sub-Cluster Member Count: 1" - the cluster is not formed.

    So, test vmkping between the nodes (this assumes they all have vmk2 set for vSAN traffic; either way, you can just run it from 'esxi-01'):

    # for i in `localcli vsan cluster unicastagent list | grep true | awk '{ print $4}'`; do echo "pinging $i 10 times"; echo; vmkping -I vmk2 $i -s 1472 -d -c 10 -i 0.2; echo; echo "**************************"; done

     

    The above should show successful pings to all nodes. If it does not, then you have a networking configuration issue; this could be a wrong VLAN, a switch issue, etc.
    If you cannot figure out and fix this, then an alternative would be to temporarily configure vSAN traffic on another usable, pingable vmk network and then reconfigure the unicastagent lists on all nodes to use these new addresses (or get the new vC to do this).

     

    If the vmkpings show no issues, then validate that you see communication between the nodes via the following (it should show traffic in both directions). If it does not show traffic for UDP 12321, then the cluster won't get formed and you likely have something blocking traffic, e.g. a firewall (either physical, appliance, or host rules):

    # tcpdump-uw -i vmk2 -n -s0 -t udp port 12321
    # tcpdump-uw -i vmk2 -n -s0 -t tcp port 2233

     



  • 5.  RE: vSAN recovery after vCenter failure

    Posted Jun 26, 2023 10:28 PM

    Actually, the provided info was outdated: in the meantime we re-created the unicast agents, and the member count now shows 4 (which is the correct number of hosts involved). vmkping is successful among the hosts.
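
    A quick way to confirm the member count on every host is to pull the "Sub-Cluster Member Count" line out of the `esxcli vsan cluster get` output and compare it with the number of hosts you expect. The following is only a sketch; the function names are hypothetical, and it assumes the output keeps the "key: value" layout shown earlier in this thread.

```shell
# Read `esxcli vsan cluster get` output on stdin and print the member count.
vsan_member_count() {
    awk -F': *' '/^[[:space:]]*Sub-Cluster Member Count:/ {
        gsub(/[[:space:]]/, "", $2)   # drop any stray whitespace around the number
        print $2
        exit
    }'
}

# Succeed only if the reported member count equals the expected host count ($1).
vsan_cluster_formed() {
    expected="$1"
    count=$(vsan_member_count)
    [ -n "$count" ] && [ "$count" -eq "$expected" ]
}

# Example, run on each host (4 is the host count in this thread):
#   esxcli vsan cluster get | vsan_cluster_formed 4 && echo "cluster formed"
```

    A count of 1 on a host, as in the earlier output, means that host is partitioned on its own.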

    Current state:

    esxcli vsan debug object health summary get

    reduced-availability-with-no-rebuild 38
    healthy 45

    (The above is the relevant info; all other items are "0".)
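
    To track this over time, the summary can be reduced to just the non-healthy, non-zero rows. This is a rough sketch with a hypothetical function name; it assumes each data row starts with the one-word status name and ends with the object count, as in the two rows quoted above.

```shell
# Read `esxcli vsan debug object health summary get` output on stdin and
# print every status other than "healthy" that has a non-zero object count,
# followed by a total line.
vsan_unhealthy_objects() {
    awk '$NF ~ /^[0-9]+$/ && $NF > 0 && $1 != "healthy" {
             print $1, $NF
             total += $NF
         }
         END { print "total-not-healthy", total + 0 }'
}

# Example:
#   esxcli vsan debug object health summary get | vsan_unhealthy_objects
```

    Header and separator lines are skipped because their last field is not a plain number.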

    We were able to get the vSAN objects accessible via the datastore browser.

    At the moment we have an issue with the applied policies not matching the vSAN datastore, so we cannot power on VMs (cannot create the virtual memory file), but we can create e.g. a folder via the datastore browser. No matter which policy we test for compatibility, nothing comes up as compatible, even though we re-created the policy from the esxcli vsan debug object list | less output.

    We figured out that we can make backups via ovftool, so we do not want to lose the opportunity to make .OVA backups. That is the current stage.

    Once we have tested an .OVA restore in another environment, we will proceed with the procedure for migrating to the new vCenter.

    I will update this post with progress/outcome in case somebody finds it useful.

    Regards,

     



  • 6.  RE: vSAN recovery after vCenter failure

    Posted Jul 26, 2023 01:00 PM

    The problem was successfully resolved, and in case somebody ends up in the same boat, here is a summary:

    Summary:

    • New vCenter deployed
    • All hosts were added to the new vCenter (but not to a cluster)
    • After I added unicast agents on each host, the vSAN cluster formed (with a warning that the hosts belong to vSAN while being outside a cluster)
    • Some objects were healthy and some were reduced-availability-with-no-rebuild
    • I was able to see the data in the vSAN datastore browser
    • The ESXi hosts could still see the vDS from the previous vCenter, but it was not manageable under the new vCenter
    • Could not power on VMs due to the storage policy mismatch (VM startup could not create the swap file)
    • I was able to use ovftool to export the VMs as .OVA, just in case
    • Followed the article on moving a vSAN cluster to another vCenter (2151610)
    • Created a cluster with only the vSAN service enabled
    • Created a new vDS to best match the lost vDS and migrated all hosts to it
    • Recreated the storage policy found in the output from the existing vSAN
    • Hosts were already in vCenter, and vmkping worked among them
    • Disabled updates from vCenter on each host
    • Started dragging and dropping hosts into the new cluster while monitoring the cluster's member count, which did not change
    • Let it sync for 30 min
    • Applied the storage policy created earlier
    • Approx. 30-60 min later, the objects had resynced and all became healthy
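
    The summary does not spell out how updates from vCenter were disabled. Presumably it refers to the advanced setting /VSAN/IgnoreClusterMemberListUpdates from the migration article (2151610): set it to 1 on every host before moving hosts, and back to 0 once all hosts are in the new cluster. A small sketch that only builds the command string so it can be reviewed before it is run on a host (the helper name is hypothetical):

```shell
# Build the esxcfg-advcfg command that controls whether a host ignores
# vSAN cluster member list updates pushed by vCenter.
# $1: "on"  -> ignore updates (before moving hosts)
#     "off" -> accept updates again (after all hosts are in the new cluster)
ignore_updates_cmd() {
    case "$1" in
        on)  val=1 ;;
        off) val=0 ;;
        *)   echo "usage: ignore_updates_cmd on|off" >&2; return 2 ;;
    esac
    echo "esxcfg-advcfg -s $val /VSAN/IgnoreClusterMemberListUpdates"
}

# Example: print the command, then run it by hand on each host.
#   ignore_updates_cmd on
```

    Remember to set the value back to 0 on every host afterwards, otherwise the hosts will keep ignoring membership changes from the new vCenter.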