VMware vSphere

  • 1.  eth0: tx hang

    Posted Jun 01, 2016 07:00 AM

    One node hit a tx hang while running MapReduce (MR) jobs. The node is one of the datanodes in our Hadoop cluster, and we were importing databases with Sqoop when the errors below appeared. Any ideas?

    Jun  1 14:01:19 kernel: ------------[ cut here ]------------

    Jun  1 14:01:19 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)

    Jun  1 14:01:19 kernel: Hardware name: VMware Virtual Platform

    Jun  1 14:01:19 kernel: NETDEV WATCHDOG: eth0 (vmxnet3): transmit queue 2 timed out

    Jun  1 14:01:19 kernel: Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc autofs4 8021q garp stp llc vsock(U) ipv6 microcode vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip6t_REJECT]

    Jun  1 14:01:19 kernel: Pid: 18580, comm: java Not tainted 2.6.32-431.el6.x86_64 #1

    Jun  1 14:01:19 kernel: Call Trace:

    Jun  1 14:01:19 kernel: <IRQ>  [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0

    Jun  1 14:01:19 kernel: [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50

    Jun  1 14:01:19 kernel: [<ffffffff8147b74b>] ? dev_watchdog+0x26b/0x280

    Jun  1 14:01:19 kernel: [<ffffffff81083e75>] ? internal_add_timer+0xb5/0x110

    Jun  1 14:01:19 kernel: [<ffffffff8147b4e0>] ? dev_watchdog+0x0/0x280

    Jun  1 14:01:19 kernel: [<ffffffff81084b07>] ? run_timer_softirq+0x197/0x340

    Jun  1 14:01:19 kernel: [<ffffffff810ac8f5>] ? tick_dev_program_event+0x65/0xc0

    Jun  1 14:01:19 kernel: [<ffffffff8107a8e1>] ? __do_softirq+0xc1/0x1e0

    Jun  1 14:01:19 kernel: [<ffffffff810ac9ca>] ? tick_program_event+0x2a/0x30

    Jun  1 14:01:19 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30

    Jun  1 14:01:19 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0

    Jun  1 14:01:19 kernel: [<ffffffff8107a795>] ? irq_exit+0x85/0x90

    Jun  1 14:01:19 kernel: [<ffffffff815310aa>] ? smp_apic_timer_interrupt+0x4a/0x60

    Jun  1 14:01:19 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20

    Jun  1 14:01:19 kernel: <EOI>

    Jun  1 14:01:19 kernel: ---[ end trace 808af6e00c97548a ]---

    Jun  1 14:01:19 kernel: vmxnet3 0000:03:00.0: eth0: tx hang

    Jun  1 14:01:24 kernel: vmxnet3 0000:03:00.0: eth0: resetting

    Jun  1 14:01:24 kernel: vmxnet3 0000:03:00.0: eth0: intr type 3, mode 0, 9 vectors allocated

    Jun  1 14:01:24 kernel: vmxnet3 0000:03:00.0: eth0: NIC Link is Up 10000 Mbps



  • 2.  RE: eth0: tx hang

    Posted Jun 02, 2016 04:57 PM

    Read this KB: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2055140

    Then disable TSO on your guest and check again. If the problem is not resolved, change your NIC to E1000.

    If the problem still exists, you can disable TSO on ESXi as a test, but I don't recommend it.
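
    For the guest-side step, a minimal sketch of how one might check the current TSO state and build the disable command. It assumes the guest has `ethtool` installed and that the interface is `eth0` as in the log above; `parse_offloads`, `tso_disable_command`, and the `SAMPLE` text are hypothetical names and illustrative data, not captured output from the affected node.

```python
import re

def parse_offloads(ethtool_k_output: str) -> dict:
    """Map each feature printed by `ethtool -k <iface>` to True (on) or False (off).

    Older ethtool builds print names with spaces ("tcp segmentation offload"),
    newer ones with hyphens; both are normalized to the hyphenated form.
    """
    features = {}
    for line in ethtool_k_output.splitlines():
        m = re.match(r"\s*([\w -]+?):\s+(on|off)\b", line)
        if m:
            name = m.group(1).strip().replace(" ", "-")
            features[name] = (m.group(2) == "on")
    return features

def tso_disable_command(iface: str) -> list:
    """Argument list for turning TSO off on one interface (needs root to run)."""
    return ["ethtool", "-K", iface, "tso", "off"]

# Illustrative text in the shape of `ethtool -k eth0` output (assumed, not real data).
SAMPLE = """Features for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-receive-offload: on
"""

if __name__ == "__main__":
    state = parse_offloads(SAMPLE)
    if state.get("tcp-segmentation-offload"):
        print("TSO is on; to disable:", " ".join(tso_disable_command("eth0")))
```

    On a real guest you would feed the function the output of `ethtool -k eth0` and run the resulting command as root. A change made with `ethtool -K` does not survive a reboot on its own, so it would need to be reapplied at boot if it helps.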



  • 3.  RE: eth0: tx hang

    Posted Jun 03, 2016 06:29 AM

    Thanks, Davoud. I will try your suggestion and continue monitoring this guest. :)
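
    For the monitoring part, a rough sketch that counts watchdog timeouts per interface and queue from kernel log lines. It assumes the message format is the same as in the first post; `count_tx_timeouts` is a hypothetical helper, and the sample lines are copied from that log.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel: NETDEV WATCHDOG: eth0 (vmxnet3): transmit queue 2 timed out
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (\S+) \((\S+)\): transmit queue (\d+) timed out"
)

def count_tx_timeouts(lines):
    """Count watchdog timeouts keyed by (interface, driver, queue number)."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG.search(line)
        if m:
            counts[(m.group(1), m.group(2), int(m.group(3)))] += 1
    return counts

# Lines taken from the log in the first post.
SAMPLE_LOG = [
    "Jun  1 14:01:19 kernel: Hardware name: VMware Virtual Platform",
    "Jun  1 14:01:19 kernel: NETDEV WATCHDOG: eth0 (vmxnet3): transmit queue 2 timed out",
    "Jun  1 14:01:19 kernel: vmxnet3 0000:03:00.0: eth0: tx hang",
    "Jun  1 14:01:24 kernel: vmxnet3 0000:03:00.0: eth0: resetting",
]

if __name__ == "__main__":
    for (iface, driver, queue), n in count_tx_timeouts(SAMPLE_LOG).items():
        print(f"{iface} ({driver}) queue {queue}: {n} timeout(s)")
```

    In practice the lines would come from the guest's kernel log file rather than a hard-coded list; comparing the counts before and after disabling TSO gives a simple way to tell whether the change made a difference.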



  • 4.  RE: eth0: tx hang

    Posted Jun 08, 2016 07:08 PM


  • 5.  RE: eth0: tx hang

    Posted Jun 02, 2017 07:42 AM

    Hi,

    We are using VMware ESXi 5.5.0 build-1331820 and are facing the same issue (tx hang) with the same backtrace in the vmxnet3 driver.

    Please let us know whether a patch is available.

    As mentioned, TSO is disabled by default in our guest VM.

    Thanks in advance.