VMware vSphere


NIC failing during traffic

  • 1.  NIC failing during traffic

    Posted Feb 27, 2012 08:53 PM

    Hello there,

    I've recently received what I hoped would be my new lab rig, this SuperMicro 1U Twin server: http://www.supermicro.nl/products/system/1U/5016/SYS-5016TI-TF.cfm

    It's not on the HCL, but it's still server-grade hardware:

    - Xeon X3440

    - ECC RAM

    - Dual Intel 82574L nic onboard

    Most if not all of these components are on the HCL, and truth is, ESXi installs without complaining and works out of the box.

    The problem is that when I use the NICs (for example, for an iSCSI SAN), the NIC "crashes", but ESXi itself keeps running.

    I've attached an extract of the vmkernel log captured during the following sequence:

    - Boot a VM

    - Start a Debian installation inside the VM

    At some point during the Debian installation, the NIC used for iSCSI traffic just crashes, and the following lines appear in the log:

    2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1713: Scheduler [0x4100045c5b80] lock up [stopped=0] for vmnic3:
    2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1723: detected at 866999 while last xmit at 861775 and 2/37092 packets/bytes in flight [window full 1] and binary heap size 1 [stress 0]
    2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1732: Packets completion seem stuck, issuing reset on vmnic3 [stress 0]

    The lines that follow are the iSCSI software initiator complaining that it cannot write to the LUN.

    I've also run the same test on the other onboard NIC with no iSCSI traffic (i.e. with VM Network traffic), with the same result.

    Any ideas on how to fix the problem?

    I have another dual-port card in the server that I intend to use in the meantime, but losing two onboard NICs that are on the HCL makes me sick...
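    For anyone hitting the same symptom, one way to spot these events is to scan vmkernel.log for the NetSched warnings quoted above. Below is a minimal Python sketch, assuming the line layout matches these excerpts exactly (other ESXi builds word them differently); `find_stalls` and `reset_nics` are hypothetical helper names. For each lockup it reports the gap between "detected at" and "last xmit at", in whatever raw unit vmkernel uses for those counters (the log does not say).

```python
import re

# Matches the ESXi 5.0-era NetSched lockup line, e.g.
# "... cpu2:2050)WARNING: NetSched: 1723: detected at 866999 while last xmit at 861775 ..."
# Assumption: the layout is exactly as quoted in this thread.
LOCKUP_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)WARNING: NetSched: \d+: "
    r"detected at (?P<detected>\d+) while last xmit at (?P<last_xmit>\d+)"
)
# Matches the follow-up line announcing which physical NIC gets reset.
RESET_RE = re.compile(r"issuing reset on (?P<nic>vmnic\d+)")


def find_stalls(lines):
    """Return (timestamp, gap) pairs, where gap = detected - last_xmit.

    The unit of the two counters is not stated in the log, so the gap is
    reported in the same raw units.
    """
    stalls = []
    for line in lines:
        m = LOCKUP_RE.match(line.strip())
        if m:
            gap = int(m.group("detected")) - int(m.group("last_xmit"))
            stalls.append((m.group("ts"), gap))
    return stalls


def reset_nics(lines):
    """Return the vmnic names the scheduler reset, in log order."""
    return [m.group("nic") for line in lines if (m := RESET_RE.search(line))]


if __name__ == "__main__":
    sample = [
        "2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1723: detected at 866999 "
        "while last xmit at 861775 and 2/37092 packets/bytes in flight [window full 1] "
        "and binary heap size 1 [stress 0]",
        "2012-02-27T20:15:31.595Z cpu2:2050)WARNING: NetSched: 1732: Packets completion "
        "seem stuck, issuing reset on vmnic3 [stress 0]",
    ]
    print(find_stalls(sample))  # [('2012-02-27T20:15:31.595Z', 5224)]
    print(reset_nics(sample))   # ['vmnic3']
```

    Pointed at a full vmkernel.log, this gives a quick count of how often, and on which vmnic, the scheduler gave up and reset the card.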

    Sylvain.



  • 2.  RE: NIC failing during traffic

    Posted Feb 28, 2012 12:52 AM

    Could you try upgrading the motherboard BIOS and/or the NIC firmware? A fix for this issue may already be available...



  • 3.  RE: NIC failing during traffic

    Posted Feb 28, 2012 07:15 AM

    Hello,

    In fact I already checked, and I'm at the latest revision available for the BIOS, NIC and IPMI firmware.

    The output of ethtool -i vmnic2 indicates that I'm running the latest version of the Intel driver supported on the HCL.
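    For reference, `ethtool -i` prints one `key: value` pair per line (driver, version, firmware-version, bus-info, ...). When comparing several ports or hosts, a small Python sketch like the one below can turn that output into a dictionary. The sample text uses placeholder version strings, not the values from this host, and `parse_ethtool_info` is a hypothetical name.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i <nic>` output ("key: value" per line) into a dict.

    Only the first colon splits key from value, so values such as a PCI
    bus address ("0000:02:00.0") are kept intact.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    # Placeholder output: the version strings below are illustrative, not real.
    sample = """driver: e1000e
version: X.Y.Z-NAPI
firmware-version: A.B-C
bus-info: 0000:02:00.0
"""
    info = parse_ethtool_info(sample)
    print(info["driver"], info["version"], info["firmware-version"])
```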

    Sylvain.



  • 4.  RE: NIC failing during traffic

    Posted Feb 28, 2012 04:07 PM

    No ideas? Anyone?

    I can give any additional details that may be necessary.

    Sylvain.



  • 5.  RE: NIC failing during traffic

    Posted Feb 28, 2012 06:03 PM

    We are experiencing the same issue on hardware we just purchased.  I have just started digging into it...

    I saw your other post at: http://communities.vmware.com/message/1997632#1997632.  FYI, we are using a Dell PowerEdge R710 (http://www.dell.com/downloads/global/products/pedge/en/server-poweredge-r710-specs-en.pdf) with two Broadcom 5709 Dual Port 1GbE NIC w/TOE iSCSI, PCIe-4 (430-3260).

    If you find out anything more about your issue, it would be helpful if you could update this thread. I'll do the same. BTW, I saw this: http://www.vmug.nl/phpbb/viewtopic.php?t=5680. It's in Dutch, but I think they are saying there is some incompatibility with the NICs the poster was using.



  • 6.  RE: NIC failing during traffic

    Posted Apr 01, 2012 01:13 PM

    Still no luck getting this to work.

    I just applied a bunch of patches, at least three of them related to the e1000e driver, and the problem is still not fixed.

    2012-04-01T13:03:33.645Z cpu6:9136)WARNING: NetSched: 1817: Scheduler [0x4100188b79c0] lock up [stopped=0] for vmnic3:

    2012-04-01T13:03:33.645Z cpu6:9136)WARNING: NetSched: 1827: detected at 4035995 while last xmit at 4030121 and 1/60 packets/bytes in flight [window full 0] and binary heap size 0 [stress 0]

    2012-04-01T13:03:33.645Z cpu6:9136)WARNING: NetSched: 1836: Packets completion seem stuck, issuing reset on vmnic3 [stress 0]

    Any ideas are welcome...

    Sylvain.



  • 7.  RE: NIC failing during traffic

    Posted Apr 04, 2012 04:14 PM


  • 8.  RE: NIC failing during traffic

    Posted Apr 04, 2012 07:45 PM

    I don't think it's related, because these links apply to a similar problem but at the virtual machine level (vNIC), whereas my problem is at the physical NIC level (vmnic).

    Thanks for the info anyway, it was worth looking into it.

    Sylvain.



  • 9.  RE: NIC failing during traffic

    Posted Apr 04, 2012 08:35 PM

    You might also look at this thread and see if it relates to what you're experiencing:

    http://communities.vmware.com/message/1430032

    Datto



  • 10.  RE: NIC failing during traffic

    Posted Apr 07, 2012 08:16 AM

    I tried all three IntMode values (0, 1, 2) for the e1000e driver, with no luck.

    With IntMode=0,0 (there are two NICs in this box), the problem was worse: the NIC failed just from loading a simple web page in the only VM using this vmnic.

    The other two modes didn't help much either; the NIC kept failing after somewhere between 100Mb and 400Mb of traffic had passed through the vmnic.
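    IntMode takes one comma-separated value per port, which is why two ports give `IntMode=0,0`. Per Intel's e1000e driver documentation, 0 selects legacy interrupts, 1 MSI and 2 MSI-X. The sketch below only builds the parameter string and the matching `esxcli system module parameters set` command line; it does not run anything. It assumes the ESXi 5.x esxcli syntax (`-m` for the module, `-p` for the parameter string), and a host reboot is normally needed before a new module parameter takes effect. The helper names are hypothetical.

```python
import shlex

# Interrupt modes for the e1000e IntMode parameter, as documented for
# Intel's e1000e driver: 0 = legacy INTx, 1 = MSI, 2 = MSI-X.
INT_MODES = {0: "legacy", 1: "MSI", 2: "MSI-X"}


def intmode_param(mode, ports):
    """Build an IntMode string with the same mode on every port, e.g. 'IntMode=1,1'."""
    if mode not in INT_MODES:
        raise ValueError(f"IntMode must be one of {sorted(INT_MODES)}, got {mode}")
    if ports < 1:
        raise ValueError("need at least one port")
    return "IntMode=" + ",".join([str(mode)] * ports)


def esxcli_set_command(param, module="e1000e"):
    """Return the esxcli argument list (assumed ESXi 5.x syntax); nothing is executed."""
    return ["esxcli", "system", "module", "parameters", "set", "-m", module, "-p", param]


if __name__ == "__main__":
    # Two onboard 82574L ports, as on this board; one command per mode to try.
    for mode, name in INT_MODES.items():
        cmd = esxcli_set_command(intmode_param(mode, ports=2))
        print(f"{name:>6}: {shlex.join(cmd)}")
```

    Building the string per port keeps the value count in step with the number of e1000e ports, which is easy to get wrong by hand on boxes with more NICs.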

    At least I have things to try!

    Keep them coming if you have any other thoughts on the matter :)

    Sylvain.



  • 11.  RE: NIC failing during traffic

    Posted Apr 07, 2012 03:51 PM

    You might run a network cable tester or swap the network cable to see if there's any positive effect.

    Datto



  • 12.  RE: NIC failing during traffic

    Posted Apr 07, 2012 04:01 PM

    I get the very same problem on two boards, and on all the onboard NICs on both boards.

    No problem whatsoever with the additional, add-in (non-onboard) NICs.

    Besides, changing/checking the cables was one of my first steps in troubleshooting the problem :)

    I'll see if I can get someone from Supermicro to look into it, but as they do not officially support this board with ESXi, I suspect it's a lost battle anyway.

    Sylvain.



  • 13.  RE: NIC failing during traffic

    Posted Apr 07, 2012 04:19 PM

    I once had some Dell servers whose NICs wouldn't work unless I disabled the USB ports in the system BIOS, and they showed a symptom similar to yours. If disabling the USB ports does help, you may then be able to re-assign the interrupts to find an arrangement that also works with USB enabled.

    Datto



  • 14.  RE: NIC failing during traffic

    Posted Aug 26, 2012 01:21 AM

    Hi Sylvain,

    Did you get any further with this issue? I'm tearing my hair out over this exact problem too, in my case on a Supermicro X8S16-F, and I too bought it specifically because everything was supported in ESXi...

    I've purchased another Intel NIC and it works perfectly, but having two onboard NICs just sitting there is very frustrating (especially as I had plans for them). (Rant over)

    In any case, does anyone have any further suggestions?



  • 15.  RE: NIC failing during traffic

    Posted Aug 27, 2012 07:25 PM

    Hi thomasq,

    As a matter of fact, yes I have it working fine now.

    After trying to fix the problem myself, I opened a case with Supermicro sometime in June.

    They were able to reproduce the problem, then got in touch with Intel, and they had me try a couple of firmware versions until the last one, which they sent me at the end of July, fixed my problem.

    I didn't get the chance to try it until 3 days ago, but I can report that it is working fine so far, even under heavy stress (I switched my iSCSI network onto it and ran a steady 90+Mb/s of throughput through it for 12 hours straight).
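    To put that soak test in perspective against the 100Mb to 400Mb of traffic that used to kill the NIC: assuming "Mb" means megabits in both posts (neither says), 12 hours at a steady 90 Mb/s is at least 90 × 43,200 s = 3,888,000 Mb, roughly 486 GB. A quick Python check of that arithmetic:

```python
SECONDS_PER_HOUR = 3600


def megabits_transferred(rate_mbit_per_s, hours):
    """Data moved at a constant rate, in megabits (assumes 'Mb' = megabit)."""
    return rate_mbit_per_s * hours * SECONDS_PER_HOUR


soak_mbit = megabits_transferred(90, 12)   # lower bound: the post says "90+"
soak_gbyte = soak_mbit / 8 / 1000          # 8 bits per byte, decimal GB
ratio_vs_old_failure = soak_mbit / 400     # vs. the upper end of the old 100-400 Mb range

print(soak_mbit, soak_gbyte, ratio_vs_old_failure)  # 3888000 486.0 9720.0
```

    In other words, under that assumption the patched firmware carried nearly ten thousand times the traffic that previously triggered a reset.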

    I recommend that you check with Supermicro, as their support staff has really been helpful on this one.

    I hope you'll get your NICs working!

    Sylvain.



  • 16.  RE: NIC failing during traffic

    Posted Aug 27, 2012 08:33 PM

    Sylvain,

    Thank you so much for the reply - I had given up hope!

    regards,

    Thomas.

    -- Thomas B. Quillinan



  • 17.  RE: NIC failing during traffic

    Posted May 13, 2016 04:22 AM

    Hi Sylvain,

    Do you mean that Supermicro sent you an updated BIOS that solved the problem? Any further details? I ran into a similar problem several months ago, with the following log in vmkwarning.log, and it really drove me mad. Any suggestions? Thanks in advance.

    2016-04-15T09:43:22.985Z cpu26:33758)WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out

    2016-04-15T09:43:27.036Z cpu48:1887628)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic8: failed to push the coalescing settings cranking upinflight window to infinite: Failure

    2016-04-15T09:43:28.037Z cpu14:1887636)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic9: failed to push the coalescing settings cranking upinflight window to infinite: Failure

    2016-04-15T09:43:34.000Z cpu24:33758)WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out

    2016-04-15T09:43:38.037Z cpu22:1887666)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic8: failed to push the coalescing settings cranking upinflight window to infinite: Failure

    2016-04-15T09:43:38.041Z cpu80:1887668)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3921: vmnic9: failed to push the coalescing settings cranking upinflight window to infinite: Failure

    2016-04-15T09:43:44.012Z cpu58:33751)WARNING: LinNet: netdev_watchdog:3474: NETDEV WATCHDOG: vmnic9: transmit timed out

    2016-04-15T09:43:44.046Z cpu72:1887668)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3863: vmnic9 : scheduler(0x410c9034c2f0)/device(0x410b5da37940) 0/1 lock up [stopped=0]:

    2016-04-15T09:43:44.046Z cpu72:1887668)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3874: detected at 682017011 while last xmit at 682011996 and 4401 bytes in flight [window 1500000 bytes] and last enqueued/dequeued at 682016995/682011996 [st$

    2016-04-15T09:43:44.046Z cpu72:1887668)WARNING: netsched: NetSchedMClkWatchdogSysWorld:3890: vmnic9: packets completion seems stuck, issuing reset
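    These messages come from a newer ESXi build than the 2012 logs, so the wording differs: "transmit timed out" from the NETDEV watchdog, "failed to push the coalescing settings" from the netsched watchdog, and "packets completion seems stuck, issuing reset". When comparing logs like these, it can help to count each event per vmnic first. The following Python sketch does that for the message texts quoted in this thread (both the 2012 and the 2016 wording); the regexes are assumptions based only on these excerpts, and `tally_events` is a hypothetical name.

```python
import re
from collections import Counter

# (event name, pattern); group 1 is always the vmnic name.
# Assumption: message texts exactly as quoted in this thread (ESXi 5.0 and a 2016 build).
EVENT_PATTERNS = [
    ("tx_timeout", re.compile(r"NETDEV WATCHDOG: (vmnic\d+): transmit timed out")),
    ("coalesce_fail", re.compile(r"(vmnic\d+): failed to push the coalescing settings")),
    ("reset", re.compile(r"(vmnic\d+): packets completion seems stuck, issuing reset")),
    ("reset", re.compile(r"Packets completion seem stuck, issuing reset on (vmnic\d+)")),
]


def tally_events(lines):
    """Count (vmnic, event) pairs; each line goes to the first pattern that matches."""
    counts = Counter()
    for line in lines:
        for event, pattern in EVENT_PATTERNS:
            m = pattern.search(line)
            if m:
                counts[(m.group(1), event)] += 1
                break
    return counts


if __name__ == "__main__":
    import sys

    # Usage (file name is only an example): python tally.py vmkwarning.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            counts = tally_events(f)
        for (nic, event), n in sorted(counts.items()):
            print(f"{nic:8} {event:14} {n}")
    else:
        print("usage: tally.py <logfile>")
```

    Seeing whether the resets cluster on one vmnic (vmnic9 here) or spread across every port on the same chip helps separate a single bad port from a driver or firmware problem like the one Supermicro and Intel eventually fixed earlier in this thread.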

    Regards,

    Felix