vSphere Availability

  • 1.  Host isolation although uplink is redundant

    Posted Sep 13, 2011 01:33 PM

    Dear community

    The environment:

    - ESXi 4.1 U1 HA-DRS Cluster (3 Hosts)

    - vCenter is running in a VM within this cluster

    - iSCSI LUNs as datastores

    - Redundant vSwitches: e.g. vmnic0 -> SwitchA, vmnic1 -> SwitchB

    - Switches: Cisco 2960G, no channeling/trunking possible

    Then SwitchA went down due to a power failure. Since vSwitch0 consists of two vmnics (vmnic0, vmnic1) in an active/active configuration, I assumed that ESXi would transparently drop the failed link and continue on vmnic1 alone, with almost no packet loss.

    Reality looked different. Due to the switch failure, the log showed:

    "Lost uplink redundancy on virtual Switch "vSwitch0". Physical NIC vmnic0 is down. Affected portgroups: ..."

    Shortly after I got these messages:

    "Node esxi1 has stopped receiving heartbeats from Primary node esxi2 1/9. Declaring node as unresponsive."

    "user esxi1 VMware HA Agent Isolated, Notifying VPXA"

    Due to the isolation, all VMs were shut down according to the HA configuration, which is expected.

    So the failover did not work as expected and all three hosts were isolated. Because all three are set up the same way, the behaviour was the same on every host.

    Management Network "Failover and Load Balancing" Parameters:

    Load Balancing:           Port ID

    Network Failure Detection: Link status only

    Notify Switches:          Yes

    Failback:                     Yes

    Active Adapters:          vmnic0,vmnic1

    Standby Adapters:         None

    Unused Adapters:          None

    Do you have any idea what could be wrong?


  • 2.  RE: Host isolation although uplink is redundant

    Posted Sep 13, 2011 09:20 PM

    How are the physical ports configured? If Spanning Tree is enabled (and the port is not set to spanning-tree portfast), it can take up to ~45 seconds for the link to start forwarding on the other switch/port. Depending on the HA failure detection time (default: 15 seconds), this could cause HA to trigger.

    When you take a look at the physical port configuration, also make sure the ports are set to "switchport mode access" to allow multiple MAC addresses to register on this port.

    If you are working with VLANs, you may use "spanning-tree portfast trunk" and "switchport mode trunk".


  • 3.  RE: Host isolation although uplink is redundant

    Posted Sep 14, 2011 07:22 AM


    Thank you for your hints. A switchport config looks like this:

    interface GigabitEthernet0/6
    description description esxi1 vmnic0
    switchport trunk native vlan 99
    switchport trunk allowed vlan 1,100,172
    switchport mode trunk

    So there are actually VLANs configured (VLAN 1 as management).

    Regarding spanning tree, the current port status is

    Interface           Role Sts Cost      Prio.Nbr Type
    ------------------- ---- --- --------- -------- --------------------------------
    Gi0/6               Desg FWD 4         128.6    P2p

    on both ports (on both switches). Do I have to change it to portfast?

    Kind regards

  • 4.  RE: Host isolation although uplink is redundant

    Posted Sep 14, 2011 05:38 PM

    There are a couple of switch port settings that you should look at. "spanning-tree portfast trunk" is definitely one of them; without it, you need to configure a longer failure detection time (see http://kb.vmware.com/kb/1006421).

    For a sample configuration, see http://kb.vmware.com/kb/1004074
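
    As a rough sketch only, assuming the trunk port shown in post 3 and an IOS release on the 2960G that accepts the command, adding PortFast to that port might look like the following. The interface number, native VLAN and allowed VLANs are copied from that post; nothing here has been verified against the actual switches.

    ```
    ! Assumption: Gi0/6 is the host-facing trunk port from post 3
    interface GigabitEthernet0/6
     switchport trunk native vlan 99
     switchport trunk allowed vlan 1,100,172
     switchport mode trunk
     ! Skip the STP listening/learning delay on this trunk port
     spanning-tree portfast trunk
    ```

    The same line would be needed on every port that connects to an ESXi uplink, on both switches. It should only go on ports facing hosts, never on links between switches, because PortFast on a switch-to-switch link can create a temporary loop.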


  • 5.  RE: Host isolation although uplink is redundant

    Posted Sep 15, 2011 06:11 AM

    Yes, you will need to set it to portfast or portfast trunk to avoid things like this happening.