vSAN1

  • 1.  VSAN failure after power failure

    Posted Jul 17, 2024 12:24 PM

    Hello,

    I have had a VSAN failure and I'm almost certain I know why, but I'm not sure how to fix it.

    After a power failure, all three of my ESXi 8 hosts crashed. My vCenter resided on my vSAN, so after power was restored I powered on all three nodes. However, vCenter did not come back up and none of the hosts could browse the vSAN datastore.

    CAUSE:
    Laziness. I had configured a VMkernel adapter on each host for vSAN traffic, but I neglected to reserve the IPs for these adapters. All were configured to use DHCP, so after the crash the hosts did not come back with the correct IPs. This seems simple enough to fix, but I can't work out which IP each host was using prior to the crash. Any ideas how I can figure out what IPs the hosts should be using? Here are the results of these commands run on each host:

    esxcli vsan cluster unicastagent list
    esxcli vsan network list
    esxcfg-vmknic -l


    One thing jumps out at me: the output of esxcli vsan cluster unicastagent list is not consistent across all three nodes, as if one node expects a different IP address than the others. I'm still confident that I can fix this by re-assigning each node the vSAN IP address it was using prior to the crash, but how do I figure out what that was?



    Blue

    [root@blue:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name  Cert Thumbprint                                                                                  SubClusterUuid
    ------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------------------------------------------  --------------
    64f8e301-42b9-9a23-96dc-48210b3f450b          0              true  192.168.0.175  12321              7E:94:91:29:48:B3:B2:78:C2:22:D4:60:5C:CE:1A:3D:39:B8:8E:8B:21:C5:87:22:C6:A7:CF:18:2F:59:85:7B  523fee61-ca75-9436-d5b8-0e248d50647a
    64f8ed26-acd7-5c09-3e1e-48210b506c23          0              true  192.168.0.172  12321              47:FC:90:C8:EE:DF:AA:36:31:C1:3A:34:CF:98:D8:23:F1:2D:00:9F:F5:31:73:ED:41:C8:B0:3F:63:61:47:D4  523fee61-ca75-9436-d5b8-0e248d50647a
    6087f705-b8d8-e534-1b48-000acd39de18          0              true  192.168.0.189  12321              45:8A:19:4B:BF:69:C4:BC:C0:C8:C3:E8:0E:E2:CF:CA:86:25:99:25:BD:05:71:B9:D2:08:E7:11:36:D6:FA:36  523fee61-ca75-9436-d5b8-0e248d50647a
    [root@blue:~] esxcli vsan network list
    Interface
       VmkNic Name: vmk1
       IP Protocol: IP
       Interface UUID: 5267eb50-a169-1d2d-c4a5-34767ee13233
       Agent Group Multicast Address: 224.2.3.4
       Agent Group IPv6 Multicast Address: ff19::2:3:4
       Agent Group Multicast Port: 23451
       Master Group Multicast Address: 224.1.2.3
       Master Group IPv6 Multicast Address: ff19::1:2:3
       Master Group Multicast Port: 12345
       Host Unicast Channel Bound Port: 12321
       Data-in-Transit Encryption Key Exchange Port: 0
       Multicast TTL: 5
       Traffic Type: vsan
    [root@blue:~] esxcfg-vmknic -l
    Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type                NetStack
    vmk0       Management Network                      IPv4      192.168.0.118                           255.255.255.0   192.168.0.255   48:21:0b:50:5b:bb 1500    65535     true    DHCP                defaultTcpipStack
    vmk0       Management Network                      IPv6      fe80::4a21:bff:fe50:5bbb                64                              48:21:0b:50:5b:bb 1500    65535     true    STATIC, PREFERRED   defaultTcpipStack
    vmk1       VSAN                                    IPv4      192.168.0.134                           255.255.255.0   192.168.0.255   00:50:56:64:c7:73 1500    65535     true    DHCP                defaultTcpipStack
    vmk1       VSAN                                    IPv6      fe80::250:56ff:fe64:c773                64                              00:50:56:64:c7:73 1500    65535     true    STATIC, PREFERRED   defaultTcpipStack
    [root@blue:~]

    Grey

    [root@grey:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name  Cert Thumbprint                                                                                  SubClusterUuid
    ------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------------------------------------------  --------------
    64f8e854-6bb4-5163-ba94-48210b505bbb          0              true  192.168.0.134  12321              3B:A1:8B:01:8B:AC:5E:F8:C1:06:4B:56:C0:FC:F0:54:A1:75:1B:C3:E0:68:AD:F2:2B:8E:68:77:76:52:27:F3  523fee61-ca75-9436-d5b8-0e248d50647a
    64f8ed26-acd7-5c09-3e1e-48210b506c23          0              true  192.168.0.172  12321              47:FC:90:C8:EE:DF:AA:36:31:C1:3A:34:CF:98:D8:23:F1:2D:00:9F:F5:31:73:ED:41:C8:B0:3F:63:61:47:D4  523fee61-ca75-9436-d5b8-0e248d50647a
    6087f705-b8d8-e534-1b48-000acd39de18          0              true  192.168.0.189  12321              45:8A:19:4B:BF:69:C4:BC:C0:C8:C3:E8:0E:E2:CF:CA:86:25:99:25:BD:05:71:B9:D2:08:E7:11:36:D6:FA:36  523fee61-ca75-9436-d5b8-0e248d50647a
    [root@grey:~] esxcli vsan network list
    Interface
       VmkNic Name: vmk1
       IP Protocol: IP
       Interface UUID: 52116264-919d-ac76-1b10-6ff98017bfca
       Agent Group Multicast Address: 224.2.3.4
       Agent Group IPv6 Multicast Address: ff19::2:3:4
       Agent Group Multicast Port: 23451
       Master Group Multicast Address: 224.1.2.3
       Master Group IPv6 Multicast Address: ff19::1:2:3
       Master Group Multicast Port: 12345
       Host Unicast Channel Bound Port: 12321
       Data-in-Transit Encryption Key Exchange Port: 0
       Multicast TTL: 5
       Traffic Type: vsan
    [root@grey:~] esxcfg-vmknic -l
    Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type                NetStack
    vmk0       Management Network                      IPv4      192.168.0.148                           255.255.255.0   192.168.0.255   48:21:0b:3f:45:0b 1500    65535     true    DHCP                defaultTcpipStack
    vmk0       Management Network                      IPv6      fe80::4a21:bff:fe3f:450b                64                              48:21:0b:3f:45:0b 1500    65535     true    STATIC, PREFERRED   defaultTcpipStack
    vmk1       VSAN                                    IPv4      192.168.0.176                           255.255.255.0   192.168.0.255   00:50:56:6b:bf:a1 1500    65535     true    DHCP                defaultTcpipStack
    vmk1       VSAN 

    Pink
    [root@pink:~] esxcli vsan cluster unicastagent list
    NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name  Cert Thumbprint                                                                                  SubClusterUuid
    ------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------------------------------------------  --------------
    64f8e301-42b9-9a23-96dc-48210b3f450b          0              true  192.168.0.175  12321              7E:94:91:29:48:B3:B2:78:C2:22:D4:60:5C:CE:1A:3D:39:B8:8E:8B:21:C5:87:22:C6:A7:CF:18:2F:59:85:7B  523fee61-ca75-9436-d5b8-0e248d50647a
    64f8e854-6bb4-5163-ba94-48210b505bbb          0              true  192.168.0.134  12321              3B:A1:8B:01:8B:AC:5E:F8:C1:06:4B:56:C0:FC:F0:54:A1:75:1B:C3:E0:68:AD:F2:2B:8E:68:77:76:52:27:F3  523fee61-ca75-9436-d5b8-0e248d50647a
    6087f705-b8d8-e534-1b48-000acd39de18          0              true  192.168.0.189  12321              45:8A:19:4B:BF:69:C4:BC:C0:C8:C3:E8:0E:E2:CF:CA:86:25:99:25:BD:05:71:B9:D2:08:E7:11:36:D6:FA:36  523fee61-ca75-9436-d5b8-0e248d50647a
    [root@pink:~] esxcli vsan network list
    Interface
       VmkNic Name: vmk1
       IP Protocol: IP
       Interface UUID: 52270570-82b7-25d3-0231-32593c02885b
       Agent Group Multicast Address: 224.2.3.4
       Agent Group IPv6 Multicast Address: ff19::2:3:4
       Agent Group Multicast Port: 23451
       Master Group Multicast Address: 224.1.2.3
       Master Group IPv6 Multicast Address: ff19::1:2:3
       Master Group Multicast Port: 12345
       Host Unicast Channel Bound Port: 12321
       Data-in-Transit Encryption Key Exchange Port: 0
       Multicast TTL: 5
       Traffic Type: vsan
    [root@pink:~] esxcfg-vmknic -l
    Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type                NetStack
    vmk0       Management Network                      IPv4      192.168.0.154                           255.255.255.0   192.168.0.255   48:21:0b:50:6c:23 1500    65535     true    DHCP                defaultTcpipStack
    vmk0       Management Network                      IPv6      fe80::4a21:bff:fe50:6c23                64                              48:21:0b:50:6c:23 1500    65535     true    STATIC, PREFERRED   defaultTcpipStack
    vmk1       VSAN                                    IPv4      192.168.0.173                           255.255.255.0   192.168.0.255   00:50:56:68:5b:3c 1500    65535     true    DHCP                defaultTcpipStack
    vmk1       VSAN                                    IPv6      fe80::250:56ff:fe68:5b3c                64                              00:50:56:68:5b:3c 1500    65535     true    STATIC, PREFERRED   defaultTcpipStack



  • 2.  RE: VSAN failure after power failure

    Posted Jul 18, 2024 12:47 AM

    in my experience each host looks for the others by DNS name - do you have a local DNS server available that they can use to resolve hostnames?  if not, you can work around it from the cli

    first things first, set the IP addresses to static

    then ensure they can talk to each other without a DNS service, e.g.
    esxcli network ip host add --hostname xxxxxxxx --ip 10.1.1.1

    esxcli network ip host add --hostname yyyyyyyy --ip 10.1.1.2

    esxcli network ip host add --hostname zzzzzzzz --ip 10.1.1.3

    repeat this on every node, adding an entry for every node, so they can all resolve one another without a DNS server available.  i'd add the vCenter name as well

    then i'd reboot them all

    once they're online they should see each other and mount the vSAN storage. from there you can jump on the web admin of each host, work out which one was hosting the vCenter server, and start that VM

    once vcenter is online you should be up and running :-)  

    hope this helps
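
    For the "set the IP addresses to static" step, here is a rough sketch rather than a tested procedure. It only builds the command string so you can review it before pasting it into the ESXi shell. The helper name static_ip_command is made up for illustration, the address and netmask are the vmk1 values from blue's esxcfg-vmknic -l output in the first post, and it assumes the esxcli network ip interface ipv4 set flags shown here (check them with --help on your build).

```python
import ipaddress


def static_ip_command(vmk, ip, netmask):
    """Build (not run) the esxcli command that pins a VMkernel adapter to a static IPv4 address.

    Hypothetical helper for illustration only.
    """
    # Raises ValueError if the address or netmask is malformed.
    ipaddress.IPv4Interface(f"{ip}/{netmask}")
    return f"esxcli network ip interface ipv4 set -i {vmk} -I {ip} -N {netmask} -t static"


if __name__ == "__main__":
    # vmk1 on blue, values taken from the esxcfg-vmknic -l output in the first post.
    print(static_ip_command("vmk1", "192.168.0.134", "255.255.255.0"))
```

    Whichever address you pin, it probably also needs to be excluded from (or reserved in) the DHCP scope so nothing else gets leased the same IP.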




  • 3.  RE: VSAN failure after power failure
    Best Answer

    Posted Jul 19, 2024 04:09 AM

    Hello,

    Each node's unicastagent list shows the UUID and vSAN IP of the other 3 nodes in the cluster (but not the local node), e.g. the node with UUID 64f8e854-6bb4-5163-ba94-48210b505bbb should have its vSAN vmk configured with IP 192.168.0.134.

    Run this on each node to determine that node's UUID, then use the UUIDs to correlate each host with the IPs in the unicastagent list data:

    # cmmds-tool whoami
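
    The correlation can also be cross-checked offline. The sketch below is only an illustration: it relies on the rule above that a host never lists itself, so the UUID that the other hosts know about but that is absent from a host's own list is presumably its own. The data is copied from the outputs in the first post; the function name expected_vsan_ips is made up, and cmmds-tool whoami remains the authoritative check.

```python
# Peer entries (NodeUuid -> vSAN IP) copied from each host's
# `esxcli vsan cluster unicastagent list` output in the first post.
PEER_LISTS = {
    "blue": {
        "64f8e301-42b9-9a23-96dc-48210b3f450b": "192.168.0.175",
        "64f8ed26-acd7-5c09-3e1e-48210b506c23": "192.168.0.172",
        "6087f705-b8d8-e534-1b48-000acd39de18": "192.168.0.189",
    },
    "grey": {
        "64f8e854-6bb4-5163-ba94-48210b505bbb": "192.168.0.134",
        "64f8ed26-acd7-5c09-3e1e-48210b506c23": "192.168.0.172",
        "6087f705-b8d8-e534-1b48-000acd39de18": "192.168.0.189",
    },
    "pink": {
        "64f8e301-42b9-9a23-96dc-48210b3f450b": "192.168.0.175",
        "64f8e854-6bb4-5163-ba94-48210b505bbb": "192.168.0.134",
        "6087f705-b8d8-e534-1b48-000acd39de18": "192.168.0.189",
    },
}

# vmk1 IPv4 addresses currently reported by `esxcfg-vmknic -l`.
CURRENT_VMK1 = {
    "blue": "192.168.0.134",
    "grey": "192.168.0.176",
    "pink": "192.168.0.173",
}


def expected_vsan_ips(peer_lists):
    """Return {host: (local_uuid, expected_ip)} using the 'missing UUID' rule."""
    # Everything any host believes about the cluster: UUID -> IP.
    # (In this data the lists agree wherever they overlap.)
    known = {}
    for peers in peer_lists.values():
        known.update(peers)

    result = {}
    for host, peers in peer_lists.items():
        # UUIDs known to the other hosts but absent from this host's own list.
        missing = set(known) - set(peers)
        if len(missing) == 1:  # skip hosts where the rule is ambiguous
            uuid = missing.pop()
            result[host] = (uuid, known[uuid])
    return result


if __name__ == "__main__":
    for host, (uuid, ip) in sorted(expected_vsan_ips(PEER_LISTS).items()):
        status = "ok" if CURRENT_VMK1[host] == ip else "MISMATCH"
        print(f"{host}: uuid={uuid} expected={ip} current={CURRENT_VMK1[host]} {status}")
```

    With the posted data this suggests blue still has the address its peers expect, while grey and pink came back one address higher than the entries recorded for them (.176 vs .175 and .173 vs .172).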




  • 5.  RE: VSAN failure after power failure

    Posted Jul 19, 2024 04:09 AM

    I fixed it myself by rebuilding the unicast agents on each node. It was a bit tricky, but it worked:

    Configuring vSAN Unicast networking from the command line (broadcom.com)
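
    The exact commands used were not posted, so the following is only a guess at what rebuilding one host's entries might look like, not the steps actually run. It prints the commands for review instead of executing them. It assumes the unicastagent remove/add flags as they appear in the linked KB article (verify with esxcli vsan cluster unicastagent add --help), and it assumes grey and pink keep their new DHCP addresses. The UUID-to-host mapping is inferred from the lists in the first post and should be confirmed with cmmds-tool whoami. The helper name rebuild_commands is made up for illustration.

```python
def rebuild_commands(stale_ip, node_uuid, new_ip, port=12321):
    """Return the ESXi shell commands (as strings) that replace one stale peer entry.

    Hypothetical helper; nothing is executed here.
    """
    return [
        f"esxcli vsan cluster unicastagent remove -a {stale_ip}",
        f"esxcli vsan cluster unicastagent add -t node -u {node_uuid} -U true -a {new_ip} -p {port}",
    ]


# Entries to fix on host "blue": (stale IP, peer UUID, peer's current vmk1 IP).
# Assumption: 64f8e301... is grey and 64f8ed26... is pink.
CHANGES_ON_BLUE = [
    ("192.168.0.175", "64f8e301-42b9-9a23-96dc-48210b3f450b", "192.168.0.176"),
    ("192.168.0.172", "64f8ed26-acd7-5c09-3e1e-48210b506c23", "192.168.0.173"),
]

if __name__ == "__main__":
    for stale_ip, node_uuid, new_ip in CHANGES_ON_BLUE:
        for command in rebuild_commands(stale_ip, node_uuid, new_ip):
            print(command)
```

    The same idea would be repeated on grey and pink with their own peer lists; a host never gets an entry for itself.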