vSphere vNetwork

  • 1.  Issue with network using multiple nics

    Posted Sep 24, 2010 01:42 PM

    Hi All

    Hopefully one of you guys can offer some advice. I'll lay out a bit of the background info.

    I am running vSphere on a blade setup. An individual ESX host has NICs going to two separate physical switches. These NICs are set up as a team (on the same vSwitch) in vSphere to provide connectivity for the virtual machines. Management traffic and vMotion use separate NICs (I'm not concerned with these at the moment). The load balancing method in use is route based on the originating virtual port ID.

    Now, as I understand it, vSphere 'attaches' a given VM to one of those NICs at any one time. When I reload one of the physical switches (the one that one of the NICs is connected to, thereby simulating a switch failure), the VMs all begin to use the remaining functioning NIC. This works OK, without any loss of traffic (I'm using ping to test connectivity to the VMs). Likewise, when I vMotion, at most one ping is dropped. All as expected so far.
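
    A minimal sketch of the pinning behaviour described above, assuming a simplified model in which each virtual port is mapped to an uplink by taking its port ID modulo the number of active uplinks. The modulo rule, the port IDs, and the names `vmnic0`/`vmnic1` are illustrative assumptions, not details taken from this setup:

```python
def pick_uplink(port_id, active_uplinks):
    """Pin a virtual port to one uplink (simplified model: port ID modulo team size)."""
    if not active_uplinks:
        raise RuntimeError("no active uplinks left in the team")
    return active_uplinks[port_id % len(active_uplinks)]


def placement(port_ids, active_uplinks):
    """Map every virtual port ID to the uplink it would be pinned to."""
    return {pid: pick_uplink(pid, active_uplinks) for pid in port_ids}


ports = [0, 1, 2, 3]                                # hypothetical virtual port IDs
both = placement(ports, ["vmnic0", "vmnic1"])       # both physical switches up
after_failure = placement(ports, ["vmnic0"])        # switch behind vmnic1 reloaded

# Ports whose uplink changed because of the failure
moved = [p for p in ports if both[p] != after_failure[p]]

print(both)           # {0: 'vmnic0', 1: 'vmnic1', 2: 'vmnic0', 3: 'vmnic1'}
print(after_failure)  # {0: 'vmnic0', 1: 'vmnic0', 2: 'vmnic0', 3: 'vmnic0'}
print(moved)          # [1, 3]
```

    In this model only the ports that were pinned to the failed uplink move. The others keep their uplink, so the physical switches have nothing to relearn for them.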

    The problem I have is that when the switch I've reloaded (to simulate a failure) comes back, making all NICs in the team available again, I get around 8-12 dropped ping packets. I'm guessing this is because the switches (ARP) are relearning the location of the VMs' MAC addresses as vSphere rebalances the VMs now that all the NICs are available again. Should it take so long to do so? I'm concerned that if I want to do any switch maintenance, the VMs will drop traffic. I'm guessing there is something I can do to change this behaviour?

    Thanks for any help in advance. Let me know if I've been unclear anywhere.

    Thanks

    E



  • 2.  RE: Issue with network using multiple nics

    Posted Sep 24, 2010 03:11 PM

    Actually, I'm not aware of any re-load-balancing with standard vSwitches, except where you have configured the NICs on the vSwitch or port groups as active/standby.

    One issue where you will see some delay in connecting is spanning tree. You should configure the switches for "spanning-tree portfast" to avoid a possible delay. However, I think that may not be your issue, since the failover works without interruption.
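
    As a rough sanity check on the spanning tree theory, the arithmetic can be sketched as below. It assumes the classic 802.1D default forward delay of 15 seconds for each of the listening and learning states, and a ping interval of one second. Neither value is confirmed for the switches in this thread:

```python
FORWARD_DELAY_S = 15   # assumed 802.1D default forward delay per state
PING_INTERVAL_S = 1    # assumed default ping interval


def expected_lost_pings(outage_s, interval_s=PING_INTERVAL_S):
    """Approximate number of pings lost during an outage of outage_s seconds."""
    return int(outage_s // interval_s)


# Without portfast, a port spends one forward delay in listening and one in learning
stp_outage_s = 2 * FORWARD_DELAY_S

print(expected_lost_pings(stp_outage_s))  # 30
```

    Under those assumptions a port without portfast would drop roughly 30 pings, noticeably more than the 8-12 reported in the first post. Timers can be tuned and rapid spanning tree variants converge faster, so treat this as a rough check only.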

    see http://www.vmware.com/files/pdf/virtual_networking_concepts.pdf

    André



  • 3.  RE: Issue with network using multiple nics

    Posted Sep 25, 2010 03:13 PM

    Hello,

    Outside of portfast settings, I am not sure there is much you can do. You can disable 'failback' on the vSwitch so that when the pNIC once more has connectivity to the pSwitch, the VMs do not fail back to the previously failed pNIC. However, I personally want that behavior...
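
    The effect of the failback setting can be sketched with a toy model. It assumes ports are pinned by port ID modulo the number of active uplinks, and that with failback disabled the recovered pNIC is simply left out of the active set. The function names, port IDs, and `vmnic` names are hypothetical:

```python
def active_after_recovery(team, surviving, recovered, failback):
    """Return the uplinks carrying VM traffic once `recovered` has link again."""
    if failback:
        # The recovered pNIC rejoins at once, in its original team position
        return [nic for nic in team if nic in surviving or nic == recovered]
    # The recovered pNIC is held in reserve; VMs stay where they are
    return list(surviving)


def ports_that_move(port_ids, before, after):
    """Ports whose pinned uplink differs between two active-uplink lists."""
    return [p for p in port_ids
            if before[p % len(before)] != after[p % len(after)]]


team = ["vmnic0", "vmnic1"]
surviving = ["vmnic0"]          # uplinks in use while vmnic1's switch was down
ports = [0, 1, 2, 3]

with_failback = active_after_recovery(team, surviving, "vmnic1", failback=True)
without_failback = active_after_recovery(team, surviving, "vmnic1", failback=False)

moved_with_failback = ports_that_move(ports, surviving, with_failback)
moved_without_failback = ports_that_move(ports, surviving, without_failback)

print(moved_with_failback)     # [1, 3]
print(moved_without_failback)  # []
```

    With failback enabled, some ports move back the moment the link returns, which is when the physical switches have to relearn where those MAC addresses live. With it disabled nothing moves, at the cost of all traffic staying on the surviving pNIC.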

    Last time I saw something like this it was a STP issue and there were at least 2 pNICs going to each vSwitch.


    Best regards,
    Edward L. Haletky VMware Communities User Moderator, VMware vExpert 2009, 2010

    Now Available: 'VMware vSphere(TM) and Virtual Infrastructure Security'

    Also available 'VMWare ESX Server in the Enterprise'

    Blogging: The Virtualization Practice | Blue Gears | TechTarget | Network World

    Podcast: Virtualization Security Round Table Podcast | Twitter: Texiwll



  • 4.  RE: Issue with network using multiple nics

    Posted Sep 28, 2010 10:10 AM

    Thanks for your comments, guys. Regarding spanning tree, I have the spanning-tree portfast trunk command on each of the links going into the vSphere servers, so I don't think spanning tree is causing the issue. It seems more like an issue with the switches learning the location of the VMs' MAC addresses. I guess disabling failback may resolve the issue, but then I guess everything could end up running through one NIC, as I'm using the originating port ID load balancing.

    Do you think this is the correct method given the topology I explained? The NIC team is split across different physical switches. Am I right in saying that if I wanted to use the other load balancing methods, the team would have to go back to the same physical switch? I'm guessing it's because the team is going to different physical switches that this issue is occurring; however, it was done for redundancy reasons.

    Is this a recommended way of handling the network when using a blade infrastructure, or are there any other recommendations?

    Thanks again,

    E