VMware vSphere


nic link down but nic shows status lights (esxi)

  • 1.  nic link down but nic shows status lights (esxi)

    Posted Nov 29, 2011 10:53 PM

    Hello all, I have a bit of a weird problem and I'm not having much luck with the search option.

    Let me describe how I got to the point I am at with this:

    We have about 29 VMs running on our vmware esxi server, all of them more or less for internal testing purposes. All of these VMs use 2 network ports on our vmware device, but are separated using virtual switches and vlans. One of the network ports connects to a dumb switch and the other connects to a cisco switch with trunking on the interface.

    I created a brand new vm and a new vswitch to test bridging mode in a product we are using. I modified an older vm to connect to this new vswitch, and then connected the new device to this new vswitch, our public vlan, and our internal network. The idea was that the bridge would allow the older vm to continue to talk to the public vlan as normal, but the bridged device would work transparently in the middle.

    Once I finished this configuration and started the bridge device, the network of our entire building ceased to function, including my connection to the vmware server. Once I realized that this had happened, I walked over to the physical piece of hardware and unplugged it entirely from the network. The rest of the building then went back to its happy network-purring self.

    Unfortunately, I have usb passthrough turned on, and was unable to connect to the terminal on the esxi server. Instead, I simply pushed the power button and let the server shut down.

    Once it came back up, the nic that was previously attached to the trunking port of the cisco (vmnic0) had stopped working, and it still does not work. Once I managed to get into the server on the other interface, I was able to issue an "esxcfg-nics -l", which shows the device as down.

    On the back of the server, I can see that vmnic0 has a yellow light and a green light. The functioning vmnic1 has two green lights. I swapped network cables between the two interfaces once to see if anything would change; it did not.

    Does anyone have any ideas?

    Thanks,

    -Jacob



  • 2.  RE: nic link down but nic shows status lights (esxi)

    Posted Nov 30, 2011 05:21 AM

    The network outage after you created a bridge inside a VM was most likely caused by a Layer 2 loop being formed in some way, with broadcasts quickly consuming all available bandwidth. Bridging inside VMs can be dangerous for this very reason. If possible, it is much better to do IP routing inside a VM which needs to be attached to several networks/VLANs.

    The issue of the NIC having link but not working could be related to this. Do you know if Spanning Tree was changed or configured on the physical switches after this event? The link to your host might be "up", but logically disabled through Spanning Tree by the physical switch.



  • 3.  RE: nic link down but nic shows status lights (esxi)

    Posted Nov 30, 2011 08:35 AM

    Totally agree with Rickard..  Can you access this switch?

    If so try to see if the port is in an err-disabled state..

    /Rubeck



  • 4.  RE: nic link down but nic shows status lights (esxi)

    Posted Nov 30, 2011 05:39 PM

    Thank you guys for the responses.

    I do have access to the switch, but I'm not very familiar with the concept of the spanning tree or how to modify it within cisco's operating system.

    However, I don't think the problem is in the switch. I've tried plugging that ethernet port, vmnic0, into a dumb switch just to test if the port would come up - it did not. Also, I have that same port on the cisco now plugged into the other port on our vmware server (vmnic1), and it's working fine. This to me points to either a hardware problem or something within esxi itself. Given that the circumstances around the event were software related, I think the most likely explanation is that something in esxi has disabled the nic - and it's certainly not obvious how or why. Dmesg does not show any errors, and neither does syslog. Not sure where else to look.

    I realize that bridging inside of the vmware server is not a best case scenario - in our company, we actually consider bridging to not be a best case scenario at all. Unfortunately, we are short of hardware to do this test on a physical device, or I would not have tried it within the vm in the first place. I'm also not sure how I could have created a broadcast loop, but that theory does seem plausible since the entire building's network died. There must be something connected between the networks I was bridging which I did not realize.

    Any other ideas on the source of the dead nic? Is it physically possible the nic itself was overloaded in some way and was actually "fried" by this experiment? I've never heard of it happening that way, but I don't have the physical hardware knowledge to know if it's even possible, though intuition tells me it's not.

    Thanks again,

    Jacob



  • 5.  RE: nic link down but nic shows status lights (esxi)

    Posted Nov 30, 2011 10:41 PM

    I would say that it is very unlikely that the network card was damaged by the high traffic load. It is interesting that you have tested the card on another physical switch and it does not work there either.

    When connected to another physical switch, does the switch port seem to get link? (Light up)

    Could you do a print of the output from esxcfg-nics -l and also esxcfg-vswitch -l ?

    Do you know if anything has been set on the VMNIC, like special speed or duplex settings?



  • 6.  RE: nic link down but nic shows status lights (esxi)

    Posted Nov 30, 2011 10:54 PM

    The switch port lights up the same no matter where it is plugged in. I have tested three different places - the cisco switch, a dumb switch, and directly to my laptop. In all three cases there is one green light and one orange light. Also, the green light (I assume the activity light) blinks very quickly and constantly. For reference, the working nic (vmnic1) has two green lights.

    There are not any special speed or duplex settings that have been set, save one or two commands, that did not appear to make any difference, which I tried after the failure. I don't personally know of any other parameters that could have been set.

    ~ # esxcfg-nics -l
    Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description                  
    vmnic0  0000:02:00.00 e1000e      Down 0Mbps     Half   00:24:8c:57:aa:56 1500   Intel Corporation 82574L Gigabit Network Connection
    vmnic1  0000:03:00.00 e1000e      Up   1000Mbps  Full   00:24:8c:57:aa:87 1500   Intel Corporation 82574L Gigabit Network Connection
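For anyone who wants to check link state from a script rather than by eye, a listing like the one above can be parsed with a few lines of Python. This is only a sketch: the function name `parse_esxcfg_nics` is made up for illustration, and it assumes the column layout shown in this post (eight whitespace-separated fields followed by a free-text description), which may differ between ESXi releases.

```python
# Sample text copied from the esxcfg-nics -l output in this post.
SAMPLE_NICS = """\
Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description
vmnic0  0000:02:00.00 e1000e      Down 0Mbps     Half   00:24:8c:57:aa:56 1500   Intel Corporation 82574L Gigabit Network Connection
vmnic1  0000:03:00.00 e1000e      Up   1000Mbps  Full   00:24:8c:57:aa:87 1500   Intel Corporation 82574L Gigabit Network Connection
"""


def parse_esxcfg_nics(output):
    """Parse `esxcfg-nics -l` text into a list of dicts, one per NIC row."""
    nics = []
    for line in output.splitlines():
        # Split into at most 9 parts so the description keeps its spaces.
        fields = line.split(None, 8)
        # Skip blank lines, shell prompts and the header row.
        if len(fields) < 8 or not fields[0].startswith("vmnic"):
            continue
        name, pci, driver, link, speed, duplex, mac, mtu = fields[:8]
        nics.append({
            "name": name,
            "pci": pci,
            "driver": driver,
            "link": link,
            "speed": speed,
            "duplex": duplex,
            "mac": mac,
            "mtu": int(mtu),
            "description": fields[8].strip() if len(fields) > 8 else "",
        })
    return nics


def down_nics(output):
    """Return the names of NICs whose Link column is not 'Up'."""
    return [nic["name"] for nic in parse_esxcfg_nics(output) if nic["link"] != "Up"]


if __name__ == "__main__":
    print(down_nics(SAMPLE_NICS))
```

Run against the listing in this post, the helper reports only vmnic0 as down, which matches what the table shows.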

    Please note that I have changed the config displayed by esxcfg-vswitch since the problem occurred. I had to change things around to get the server to work again so that we could continue using the VMs.

    ~ # esxcfg-vswitch -l
    Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks  
    vSwitch0         64          22          64                1500    vmnic1   

      PortGroup Name        VLAN ID  Used Ports  Uplinks  
      192.168.0.0/16        7        1           vmnic1   
      Public Vlan           3        3           vmnic1   
      Nacs Testing          6        2           vmnic1   
      172.17.0.0/16 Network  2        12          vmnic1   
      Management Vlan2      2        1           vmnic1   
      Management Network    2        1           vmnic1   

    Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks  
    vSwitch1         64          1           64                1500             

      PortGroup Name        VLAN ID  Used Ports  Uplinks  
      OLD 192.168.0.0/22 Network OLD  0        0                    

    Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks  
    vSwitch2         128         2           128               1500             

      PortGroup Name        VLAN ID  Used Ports  Uplinks  
      Bridge Testing Network  0        1                 

    changes - the 19.168.0.0/26 network was previously NOT a vlan, and was connected to a dumb switch via vSwitch2 on vmnic1. vSwitch0 was connected to vmnic0

    Thank you once again for taking the time to examine my/our issue. It really does have me stumped.

    -Jacob



  • 7.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 03, 2011 10:29 PM

    Jacob0B wrote:

    There are not any special speed or duplex settings that have been set, save one or two commands, that did not appear to make any difference, which I tried after the failure.

    Just to be sure that there is no misconfiguration in the speed/duplex settings, could you make sure that vmnic0 is set to AUTO in the vSphere Client?

    Do you have access to the Cisco switch? When you have the vmnic0 cable attached, could you log in to the switch and run:

    show interface (name of the interface, like GigabitEthernet0/1)

    It would be interesting to see what the physical switch reports from its point of view.



  • 8.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 05, 2011 07:04 PM

    Opening vSphere client, I couldn't see any way to assign speed settings. So, I got on the box and executed:

    # esxcfg-nics -a vmnic0

    No errors were reported, but also nothing changed.

    From the cisco switch (I borrowed a connection from an old testbox of mine, so you can ignore the description):

    DevSwitch#show interfaces gi0/15
    GigabitEthernet0/15 is up, line protocol is up (connected)
      Hardware is Gigabit Ethernet, address is 64ae.0c6c.080f (bia 64ae.0c6c.080f)
      Description: Jacob's testbox (PatchPort 6)
      MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
         reliability 255/255, txload 1/255, rxload 1/255
      Encapsulation ARPA, loopback not set
      Keepalive set (10 sec)
      Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
      input flow-control is off, output flow-control is unsupported
      ARP type: ARPA, ARP Timeout 04:00:00
      Last input never, output 00:00:00, output hang never
      Last clearing of "show interface" counters never
      Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
      Queueing strategy: fifo
      Output queue: 0/0 (size/max)
      5 minute input rate 0 bits/sec, 0 packets/sec
      5 minute output rate 10000 bits/sec, 12 packets/sec
         0 packets input, 0 bytes, 0 no buffer
         Received 0 broadcasts (0 multicasts)
         0 runts, 0 giants, 0 throttles
         0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
         0 watchdog, 0 multicast, 0 pause input
         0 input packets with dribble condition detected
         1706 packets output, 146545 bytes, 0 underruns
         0 output errors, 0 collisions, 1 interface resets
         0 babbles, 0 late collision, 0 deferred
         0 lost carrier, 0 no carrier, 0 PAUSE output
         0 output buffer failures, 0 output buffers swapped out

    the line:

    0 input packets with dribble condition detected

    Is interesting, but I'm not sure what it means or how to correct it.

    -Jacob

    EDIT: After doing my own research on the dribble condition, that is a counter, not a status. Therefore, having it say zero is completely normal.



  • 9.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 05, 2011 07:28 PM

    "Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX"

    Could you try to hardcode at Gbit speed on the pSwitch ports as a test..? The speed negotiation looks screwed, IMO..

    /Rubeck



  • 10.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 05, 2011 09:48 PM

    Kim Rubeck wrote:

    Could you try to hardcode at Gbit speed on the pSwitch ports as a test..? The speed negotiation looks screwed, IMO..

    How would I go about doing this and then undoing this?

    I unfortunately don't have as much cisco experience as I would like, and what you're asking is currently outside of my skill level on these devices.

    -Jacob



  • 11.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 06, 2011 07:42 AM

    conf t
    interface gi0/15
    speed 1000
    duplex full

    This sets the switch port to 1 Gbit using full duplex....

    You might have to do the same thing on the ESX side for the vmnic connected to this switch port..

    /Rubeck



  • 12.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 06, 2011 04:29 PM

    Forcing the speed/duplex in that way caused the switch to stop detecting a link at all. I also checked the hardware status lights on the esxi server and they had shut off. (Where previously there was one green, one yellow/orange light)

    You inspired me to try some things though, so I tried forcing speed on the switch to 10mb, which also did not work. Setting the switch back to auto with

    no speed 10

    no duplex full

    brought the status back to up, though the vmnic0 still displays a link down. I also tried forcing vmnic0 to match the 100mb by issuing

    esxcfg-nics -s 100 -d full vmnic0

    No errors returned, but neither did it work.

    I put all settings back to automatic once I was done.

    Any more ideas? I'm certainly willing to try.

    -Jacob



  • 13.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 06, 2011 08:48 PM

    Here are some comments on the information from your physical Cisco switch (see bold comments inside your text) :

    Jacob0B wrote:

    GigabitEthernet0/15 is up, line protocol is up (connected)

    This means that from the switch side the link is totally up and no problems have been detected enabling the port.

    Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX

    As Rubeck pointed out, it is interesting that the switch has selected 100/full. Full duplex suggests that the other side is also set to auto; if it were not, the switch would have fallen back to half duplex. This probably means that your adapter, when negotiating, reports that it is only capable of 100 Mbit/full duplex.

    0 packets input, 0 bytes, 0 no buffer

    This means that not a single frame has been received from the ESXi host to the switch port.

    1706 packets output, 146545 bytes, 0 underruns

    But from the switch side the port is up and 1706 packets have been sent into the ESXi host, most likely broadcast frames.

    This seems to mean that the network card is logically "down" from the ESXi host, even if it is physically "up". Still a mystery why this state has been set and how to revert it.
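The reasoning above (link up, negotiated at 100/full, frames going out but none coming back) can be turned into a quick check. The following Python is a rough sketch, not a Cisco tool: the function name `summarize_show_interface` and the `switch_hears_nothing` key are invented for illustration, and the regular expressions assume the `show interfaces` wording seen in this thread, which can vary between IOS versions.

```python
import re

# A few lines copied from the "show interfaces gi0/15" output earlier in the thread.
SAMPLE_SHOW_INTERFACE = """\
GigabitEthernet0/15 is up, line protocol is up (connected)
  Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
     0 packets input, 0 bytes, 0 no buffer
     0 input packets with dribble condition detected
     1706 packets output, 146545 bytes, 0 underruns
"""


def summarize_show_interface(output):
    """Pull duplex, speed and packet counters out of `show interfaces` text."""
    duplex_speed = re.search(r"(\w+)-duplex, (\d+)Mb/s", output)
    packets_in = re.search(r"(\d+) packets input", output)
    packets_out = re.search(r"(\d+) packets output", output)

    received = int(packets_in.group(1)) if packets_in else None
    sent = int(packets_out.group(1)) if packets_out else None
    return {
        "duplex": duplex_speed.group(1) if duplex_speed else None,
        "speed_mbps": int(duplex_speed.group(2)) if duplex_speed else None,
        "packets_in": received,
        "packets_out": sent,
        # True when the switch is transmitting but has never received a frame,
        # which is the one-way pattern described in the post above.
        "switch_hears_nothing": received == 0 and (sent or 0) > 0,
    }


if __name__ == "__main__":
    print(summarize_show_interface(SAMPLE_SHOW_INTERFACE))
```

With the counters from this thread the one-way flag comes out true: the switch has sent 1706 packets and received none.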



  • 14.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 06, 2011 11:02 PM

    Rickard Nobel wrote:

    This seems to mean that the network card is logically "down" from the ESXi host, even if it is physically "up". Still a mystery why this state has been set and how to revert it.

    That was my hypothesis, yes. I'm glad we've managed to show more evidence than just my gut feeling, however.

    I feel like there must be something in vmware which is disabling the port. Some sort of override safety I may have triggered in the driver for the port itself maybe? The thing is I would expect any sort of override or auto shutoff to show in the logs somewhere, or to display a big red flashing warning in vsphere or something similar.

    If you have any ideas as to commands I could use to try to enable the card, I'm all for it. I've already identified and given ethtool a shot, with no success.

    Interestingly, ethtool vmnic0 shows:

    ~ # ethtool vmnic0
    Settings for vmnic0:
            Supported ports: [ TP ]
            Supported link modes:   10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
                                    1000baseT/Full
            Supports auto-negotiation: Yes
            Advertised link modes:  Not reported
            Advertised auto-negotiation: No
            Speed: Unknown! (65535)
            Duplex: Unknown! (255)
            Port: Twisted Pair
            PHYAD: 1
            Transceiver: internal
            Auto-negotiation: off
            Supports Wake-on: pumbag
            Wake-on: g
            Current message level: 0x00000001 (1)
            Link detected: no

    This would make me think it is a driver issue of some sort?
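As a side note for readers comparing their own output: the `Speed: Unknown! (65535)` and `Duplex: Unknown! (255)` values are what ethtool generally prints when no link is established, so on their own they do not necessarily point at the driver. A small Python sketch for pulling the key fields out of such a dump follows; `parse_ethtool` is a hypothetical helper name, and it assumes the "Key: value" layout with indented continuation lines shown above.

```python
# Sample text copied from the ethtool output in this post (abbreviated).
SAMPLE_ETHTOOL = """\
Settings for vmnic0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  Not reported
        Advertised auto-negotiation: No
        Speed: Unknown! (65535)
        Duplex: Unknown! (255)
        Auto-negotiation: off
        Link detected: no
"""


def parse_ethtool(output):
    """Parse ethtool's "Key: value" lines into a dict of strings."""
    settings = {}
    last_key = None
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "Settings for <nic>:" banner.
        if not stripped or stripped.startswith("Settings for"):
            continue
        if ":" in stripped:
            key, _, value = stripped.partition(":")
            last_key = key.strip()
            settings[last_key] = value.strip()
        elif last_key is not None:
            # Continuation line (e.g. further link modes): append to previous key.
            settings[last_key] += " " + stripped
    return settings


if __name__ == "__main__":
    info = parse_ethtool(SAMPLE_ETHTOOL)
    print(info["Link detected"], info["Speed"], info["Auto-negotiation"])
```

Note that the dump also shows auto-negotiation as off and nothing advertised, which may be worth ruling out before blaming the driver.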

    -Jacob



  • 15.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 07, 2011 10:28 AM

    If you use the vSphere Client to check the vmnic0 settings for speed and duplex, what do they look like?

    And also, since you have made different tests and re-configuration, there is some vSwitch attached to the VMNIC0 right? And some VM or something attached to that vSwitch? Just so there is something internal that could actually send frames out to the physical switch.



  • 16.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 07, 2011 05:56 PM

    Using the vsphere client, the nic reports that it is down, but configured for "1000 Full". I don't see any way to change this from vsphere? I've tried changing this from the ssh session, but it does not update in vsphere.

    I have reattached a vswitch while I was testing things. I didn't have anything running all the time on this vswitch; everything that used to use it has been moved to the other nic. You make a good point, however, so I've created a vm with the express purpose of sitting on that interface in the hopes that it forces it to activate.

    One of my coworkers was wondering if there is a way that this interface could be somehow stuck in a bridging-type mode, and refusing to send packets to avoid the broadcast loop discussed earlier. Is there an interface bridging built into vmware?

    -Jacob



  • 17.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 07, 2011 08:41 PM

    After you connected a vSwitch to the vmnic you could go through the vSwitch setting and edit the vmnic duplex and speed. Make sure to set it to Auto. Start any VM connected to the vSwitch and see if anything happens.

    One of my coworkers was wondering if there is a way that this interface could be somehow stuck in a bridging-type mode, and refusing to send packets to avoid the broadcast loop discussed earlier. Is there an interface bridging built into vmware?

    I would say that it is unlikely. The vSwitches have no Spanning Tree or any other loop detection / prevention, so they should not really be able to tell that a loop is taking place.



  • 18.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 08, 2011 08:10 PM

    I now have a vm set up on a vswitch connected to vmnic0, and vmnic0 is set to auto negotiate. Unfortunately, vsphere still shows that the interface is down.

    -Jacob



  • 19.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 08, 2011 10:51 PM

    It certainly seems stuck at a shutdown state for no obvious reason. Have you had the opportunity to reboot the host since the incident?



  • 20.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 09, 2011 12:56 AM

    Not since I switched all the vm's to the other nic.

    However, in the course of figuring out that the nic would no longer work, the server was rebooted several times. I'm skeptical that rebooting it one more time would work, but at this point I guess I'm willing to try just about anything.

    I'll power down the vm's and the server after hours tomorrow to give rebooting one last hurrah.

    -Jacob



  • 21.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 09, 2011 08:37 AM

    If you have already rebooted the host then this will most likely not help now. Something that might be possible would be to shut down the host, disable the NIC in BIOS, reboot and let ESXi see this, then reboot again and re-enable it in BIOS, just to see if it "becomes visible" again. Not entirely likely, but we are running a bit out of options here. :)



  • 22.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 13, 2011 04:07 PM

    That's a very interesting idea. Seems like a bit of a long shot, but like you pointed out, we don't exactly have a lot of options.

    I ended up not getting a chance to reboot the server after hours the other day; it was a bit of a busy week for us. I'm doing some other maintenance on that server after hours on Thursday anyway, so I'll have lots of time to fiddle with it.

    I'll be sure and let you know what happens.

    -Jacob



  • 23.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 16, 2011 07:26 AM

    Hello guys,

    we are experiencing exactly the same issue on two servers.

    ESXi 5.0, but we have the same with ESX 4.1

    Server: HP DL 380 G6

    nic with problem Broadcom NetXtreme II BCM 5709

    disabling/enabling the nic didn't help.

    We are going to open a ticket with VMware in the meanwhile... any other ideas?

    Thanks

    Giorgio



  • 24.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 16, 2011 04:15 PM

    Yesterday I had lots of time to mess with the server. We basically shut down for two weeks around the Christmas holiday, and most of the people who rely on that server took off a couple days early. Basically, that gave me yesterday and today to mess with it.

    That server has had slow I/O for a long time, simply because its only drives were 2 1TB drives in a mirrored RAID. Just 7200RPM SATA, nothing special. It also used to have only 8GB of RAM and a single quad core Xeon. We recently upgraded the RAM to 64GB and added a second Xeon. At the same time, we bought two new 1TB drives, but had to hold off because installing the drives involved a full backup of the server and a reinstall. (We only have four drive slots, and before there was a system drive + the 1TB RAID array = 3 slots used.)

    For the backup process I booted into a live Ubuntu flash disk, simply because I am more comfortable in a real linux environment than the kind-of-linux of esxi. I used this opportunity to test the nic. It was still dead.

    Once I initialized the new RAID10 array and got esxi reinstalled, I double checked using the new esxi. Again, the nic is still dead. I'm going to have to call it - either through coincidence or a weird overload condition, the botched bridging experiment I explained in my first post must have killed the nic.

    gctn wrote:

    ESXi 5.0, but we have the same with ESX 4.1

    Server: HP DL 380 G6

    nic with problem Broadcom NetXtreme II BCM 5709

    disabling/enabling the nic didn't help.

    While the symptoms sound similar, the nic I am describing is an "Intel Corporation 82574L Gigabit".

    Ricknob has been very helpful in pointing out ways to check the nic. He may be able to help you.

    Since you say you are having the same problem, I am assuming that your switch is reporting that the nic is active, and that there are link lights on the physical hardware, but that esxi is not detecting the link. If so, then I would definitely say you must have found a bug. For me, however, I am fairly certain I have managed to overload the nic in some way. The only way this could still be esxi's mistake is if the generic e1000 driver has somehow misconfigured a firmware-level setting for my nic, which I find unlikely.

    Good luck on getting your issue resolved, however.

    -Jacob



  • 25.  RE: nic link down but nic shows status lights (esxi)

    Posted Dec 21, 2011 05:49 PM

    Thanks for your answer Jacob. I confirm the problem is the same, even if on a different nic, and unfortunately it has not been solved yet.

    I have opened tickets with both VMware and HP, and we are running dozens of tests and experiments, but for the time being there is no solution to the problem.

    Thanks

    Giorgio



  • 26.  RE: nic link down but nic shows status lights (esxi)

    Posted Oct 14, 2015 05:20 PM

    I bought two Dell R730xd with Qlogic nics, 10GB and 1 GB combo.

    They came with ESXi 6.0.0 build number starts with 28.... I have to go to build 302... which is ESXi 6.0.0 U1.

    VMware posted a network isolation issue and recommended 6.0.0 U1A build 307...

    I upgraded one server with the offline bundle and the other fresh install.

    Now both have the "down" nics (especially the 1 GB ones), while the activity lights are up on the back of the nic.

    These servers had only the management nic configured and nothing else.

    I tried different versions of drivers for the nics, swapped cables around... no luck.

    I have also reverted to U1, but no change at all.



  • 27.  RE: nic link down but nic shows status lights (esxi)

    Posted Sep 17, 2018 04:33 PM

    Ran in to this same issue.  The server has 2 onboard NICs (0 and 1), plus 2 quad-NIC cards, stacked horizontally.  My confusion was thinking that NICs 3-5 were the bottom board, and 6-9 the  top (number label 6-9 runs down the center, between the two cards, no other numbers visible).  My problem was that I was plugged in to NIC 9, but my vSwitch was configured to use NIC 5.  Changed to NIC9 at the vSwitch - all good.