ESXi

unresponsive host

  • 1.  unresponsive host

    Posted Jun 01, 2015 06:57 AM

    Hi

    Since I upgraded to VMware ESXi 6 and vCenter 6, I have the following issue:

    The host shows as greyed out and is not responding.

    All the VMs on that host are also greyed out and show as disconnected.

    I can connect to the host directly, but it is unresponsive.

    If I reboot the host, it reconnects fine and everything works. This has happened on 2 hosts so far: one was an HP and the other was an Intel.

    Any help or input will really be appreciated.

    Thanks



  • 2.  RE: unresponsive host

    Posted Jun 01, 2015 07:10 AM

    I would recommend restarting the management agents:

    VMware KB: Restarting the Management agents on an ESXi or ESX host

    What happens if you right-click the disconnected server in vCenter and choose Reconnect? Does it ask you for credentials?

    Maybe the certificates were lost during the upgrade to vSphere 6.0, and vCenter now thinks this could be a different server.

    In that case you have to reconnect the hosts manually.



  • 3.  RE: unresponsive host

    Posted Jun 01, 2015 08:11 AM

    hi

    I haven't tried to disconnect and reconnect the host, but I did try to right-click and select Connect. This did nothing.

    When I have the problem again, I'll try to restart the management agents.



  • 4.  RE: unresponsive host

    Posted Jun 03, 2015 01:17 AM

    Hi,

    I have had something similar on an upgraded test host, the server would randomly disconnect and a reboot resolved it. Eventually the host wouldn't reconnect to vcenter at all.

    The fix for me was to uninstall the vpxa agent, restart the host, then reconnect it to vCenter (as though connecting a new host).

    R



  • 5.  RE: unresponsive host

    Posted Jul 03, 2015 05:36 PM

    could you please confirm how you uninstall the vpxa agent...



  • 6.  RE: unresponsive host

    Posted Jun 02, 2015 07:17 AM

    Hello all!

    I had the same issue some days ago in two different environments: one standalone free ESXi 6.0 hypervisor and one in a two-node cluster managed by vCenter Server Appliance 6.0.

    I tried to reconnect the host, but it didn't work for me.

    At the DCUI I tried to enter my password, but the host did not respond. Only a reboot solved my problem. After that everything was fine.

    I'm running the ESXi 6.0 on a Fujitsu RX200 S6 and RX 200 S7.

    Please let me know if there is a fix for this issue.

    Regards,

    schulzman



  • 7.  RE: unresponsive host

    Posted Jul 03, 2015 05:46 PM


  • 8.  RE: unresponsive host



  • 9.  RE: unresponsive host

    Posted Jul 21, 2015 05:59 PM

    If you're seeing the messages below in your vmkernel.log at the time of the disconnect, it could be related to an issue that will one day be described at the link below (it is not live at this time). We see this after a random amount of time, and nothing VMware technical support tried has helped except rebooting the host.

    http://kb.vmware.com/kb/2124669

    vmkernel.log:

    2015-07-19T08:22:35.552Z cpu0:33257)WARNING: LinNet: netdev_watchdog:3678: NETDEV WATCHDOG: vmnic4: transmit timed out

    2015-07-19T08:22:35.552Z cpu0:33257)WARNING: at vmkdrivers/src_92/vmklinux_92/vmware/linux_net.c:3707/netdev_watchdog()(inside vmklinux)

    2015-07-19T08:22:35.552Z cpu0:33257)Backtrace for current CPU #0,worldID=33257, rbp=0x430609af4380

    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49be10:[0x418029896b4e]vmk_LogBacktraceMessage@vmkernel#nover+0x22 stack: 0x430609af4380, 0

    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49be30:[0x418029f1e7b7]watchdog_work_cb@com.vmware.driverAPI#9.2+0x27f stack: 0x430609ac3ce

    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bea0:[0x418029f44a5f]vmklnx_workqueue_callout@com.vmware.driverAPI#9.2+0xd7 stack: 0x4306

    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bf30:[0x41802984f872]helpFunc@vmkernel#nover+0x4e6 stack: 0x0, 0x430609ac3ce0, 0x27, 0x0,

    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bfd0:[0x418029a1231e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0, 0x0, 0x0, 0x0,
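    To check whether a host has hit these warnings, a log can be scanned for the watchdog message. The following is only a minimal sketch: the function names are made up for illustration, and the pattern is based solely on the log lines quoted in this thread, so other builds or drivers may format the message differently.

```python
import re
from collections import Counter

# Pattern derived from the vmkernel.log lines quoted in this thread.
# \s* tolerates the message being wrapped onto a following line.
WATCHDOG_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z) cpu\d+:\d+\)"
    r"WARNING: LinNet: netdev_watchdog:\d+:\s*"
    r"NETDEV WATCHDOG: (?P<nic>\w+): transmit timed out"
)


def find_watchdog_timeouts(log_text):
    """Return a list of (timestamp, uplink) pairs, one per transmit timeout."""
    return [(m.group("ts"), m.group("nic")) for m in WATCHDOG_RE.finditer(log_text)]


def count_by_nic(log_text):
    """Count transmit timeouts per uplink (for example vmnic4)."""
    return Counter(nic for _, nic in find_watchdog_timeouts(log_text))
```

    For example, reading a copy of vmkernel.log into a string and passing it to count_by_nic() shows which uplinks reported timeouts and how often.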



  • 10.  RE: unresponsive host

    Posted Aug 11, 2015 05:21 PM

    sdnbtech, have you heard or seen any updates on the issue you described? VMware confirmed that engineering is working on a solution, but in the few weeks since then I haven't been able to get an update on the status of a fix. A host downgrade to 5.5 was the only recommendation aside from rebooting the 6.0 hosts each time networking drops.



  • 11.  RE: unresponsive host

    Posted Aug 12, 2015 03:33 PM

    I seem to be having very similar issues:

    2015-08-11T11:14:53.340Z cpu23:33256)WARNING: LinNet: netdev_watchdog:3678: NETDEV WATCHDOG: vmnic4: transmit timed out

    2015-08-11T11:14:53.340Z cpu23:33256)<6>ixgbe 0000:41:00.0: vmnic4: Fake Tx hang detected with timeout of 160 seconds

    When this happens, both ports on a dual-port NIC die at the same time and only a reboot fixes it. I opened an SR with VMware support with a reference back to this thread and the not-yet-existing KB posted above, and will follow up if/when I hear something back on this.
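    To see whether the ports that hang sit on the same physical card, the "Fake Tx hang" lines can be grouped by PCI address without the function number. This is a rough sketch only: the function name is hypothetical, the pattern is taken from the single ixgbe line quoted above, and the second sample line (vmnic5 at 0000:41:00.1) is invented purely for illustration.

```python
import re
from collections import defaultdict

# Pattern derived from the one "Fake Tx hang" line quoted in this thread.
TX_HANG_RE = re.compile(
    r"<\d>(?P<driver>\w+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2})\.(?P<func>[0-7]): "
    r"(?P<nic>\w+): Fake Tx hang detected with timeout of (?P<secs>\d+) seconds"
)


def group_tx_hangs_by_card(log_text):
    """Map (driver, PCI address without function) to the set of uplinks that hung."""
    cards = defaultdict(set)
    for m in TX_HANG_RE.finditer(log_text):
        cards[(m.group("driver"), m.group("pci"))].add(m.group("nic"))
    return dict(cards)


# First line is quoted from the thread; the second line is a made-up example
# of what the other port on the same card might log.
SAMPLE = (
    "2015-08-11T11:14:53.340Z cpu23:33256)<6>ixgbe 0000:41:00.0: vmnic4: "
    "Fake Tx hang detected with timeout of 160 seconds\n"
    "2015-08-11T11:14:53.340Z cpu23:33256)<6>ixgbe 0000:41:00.1: vmnic5: "
    "Fake Tx hang detected with timeout of 160 seconds\n"
)

if __name__ == "__main__":
    for (driver, pci), nics in group_tx_hangs_by_card(SAMPLE).items():
        print(driver, pci, sorted(nics))
```

    Two uplinks listed under the same key would suggest that both ports of one card hung together, which matches the behaviour described above.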



  • 12.  RE: unresponsive host

    Posted Aug 14, 2015 03:51 PM

    Confirmed what sdnbtech stated above. The "transmit timed out" is a known issue. There is no ETA for a fix yet, and support was not very forthcoming with details. Basically, I was told to downgrade if this issue is affecting me, as there is no workaround. The engineer I spoke to says he sees this at least once a week.



  • 13.  RE: unresponsive host

    Posted Aug 24, 2015 06:34 PM

    I checked this morning and there are a few options. 1) Apply a debug build of ESXi that will still be affected by the problem but gather more information for the development team, 2) There is a script that has to be run at each boot of each ESXi server that they believe fixes the issue entirely but can cause performance degradation, 3) Downgrade to 5.5 or below.

    My case has now been open 60 days regarding this issue. It's very disappointing.



  • 15.  RE: unresponsive host

    Posted Aug 24, 2015 07:35 PM

    hello,

    I have the same problem with two HP DL580 G7 servers.

    Any chance you could share the script that has to be run at each boot?

    thanks !



  • 16.  RE: unresponsive host

    Posted Sep 10, 2015 08:43 PM

    The fix script for this is now available here.

    That KB article seems to have been published today, the same day 6.0 U1 came out.  No mention that 6.0U1 fixes this problem.  In fact it specifically states "After upgrading to or installing ESXi 6.0.x and ESXi 6.0 Update 1, you may experience these symptoms" so one would have to assume the problem still persists in 6.0 U1 also.  If it was fixed in 6.0 U1, I would expect the article to say as much.



  • 17.  RE: unresponsive host

    Posted Aug 13, 2015 05:42 PM

    Troubleshooting a non-responsive host without looking at the logs is not really effective. You can open a service request with VMware.



  • 18.  RE: unresponsive host

    Posted Aug 13, 2015 08:24 PM

    Please share the log details; without logs it is hard to find the root cause. Storage might also be the reason: the APD recovery issue is still unresolved in 6.0.

    What about the VMs on the host, are they still running when the host goes unresponsive? Even a time sync problem can make a host disconnect.



  • 19.  RE: unresponsive host

    Posted Sep 11, 2015 03:15 PM

    Hi,

    Check the article below. It could be a firewall issue at the vCenter Server. Hope this helps.

    VMware KB: ESXi/ESX hosts enter a Not Responding state after connecting to vCenter Server



  • 20.  RE: unresponsive host

    Posted Sep 11, 2015 03:20 PM

    Thanks for the info. I guess I will wait for the patch; I don't want to run into performance issues :(

    VMware really needs to have a warning on their download page that references this KB.



  • 21.  RE: unresponsive host

    Posted Sep 12, 2015 02:12 PM

    Did you try to connect to ESXi directly using the VI Client? Is that working?

    1) I would suggest restarting the management agents to see if that works.

    2) At the moment you try connecting ESXi to vCenter Server, keep the vCenter Server logs open so you can analyse them and identify the cause of the problem.



  • 22.  RE: unresponsive host

    Posted Sep 15, 2015 01:27 PM

    In my case you couldn't try that, even when accessing the host through the console. The host appears to be locked up.

    I have not seen this issue before, and I have been working with VMware since version 3.



  • 23.  RE: unresponsive host

    Posted Sep 16, 2015 09:36 AM

    I have not come across this problem either.

    In ESXi, there are several menu options provided to test the management network. If the ESXi host responds to user interaction but does not respond to pings, you may have a networking issue.

    As you stated, you are not even able to access the console. Can you check what the enclosure reports about the state of the host? I suspect the hardware might be incompatible with version 6.

    Try rolling back and check whether the blade responds well. If it does, I suggest you first update the hardware firmware/BIOS version and then install ESXi 6.x.



  • 25.  RE: unresponsive host



  • 26.  RE: unresponsive host

    Posted Sep 21, 2015 01:45 PM

    ESXi 6.0 network connectivity is lost with NETDEV WATCHDOG timeouts in the vmkernel.log (2124669)



  • 27.  RE: unresponsive host