  • 1.  Host connection failure

    Posted Dec 07, 2017 09:15 AM

    Dear all,

    I am facing a strange issue. I have a vSAN cluster, and every day one of the ESXi hosts appears as "not responding" in vCenter. However, I can successfully ping the host, and I can access the host's DCUI normally through iLO. The host returns to the normal state after a restart.

    Please advise if you have any suggestions on how to stop this behavior.

    Attached is the event log.



  • 2.  RE: Host connection failure

    Posted Dec 07, 2017 09:35 AM

    Hi Tarigx

    Can you ping another vSAN host over its vSAN VMkernel port with vmkping?

    Is the host shown as "not responding" only in vCenter, or are the other vSAN hosts also complaining that the host is gone?

    Best regards,

    Dave



  • 3.  RE: Host connection failure

    Posted Dec 07, 2017 09:43 AM

    Hi dgreebe

    - I did not try vmkping, but I will if it happens again today.

    - Yes, the other hosts are complaining, and HA does not work at the time of the "not responding" message.



  • 4.  RE: Host connection failure

    Posted Dec 07, 2017 10:03 AM

    Tarigx,

    Please try vmkping while vCenter reports that host as "not responding".

    Try both the vSAN VMkernel port and the one where you have configured management.

    The point is that vSAN HA traffic goes over the vSAN VMkernel port, but the connection to vCenter goes over your management VMkernel port.
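
    A minimal Python sketch of that two-path check, assuming it runs in the ESXi shell where vmkping is available. The helper names (`build_vmkping_cmd`, `check_paths`), interface names, and peer addresses are hypothetical placeholders, not VMware tooling. It also assumes vmkping exits non-zero when no replies come back, the way ping does.

```python
import subprocess


def build_vmkping_cmd(vmk, target_ip, count=3):
    """Return the vmkping argv that sends `count` pings out of VMkernel port `vmk`."""
    # -I selects the outgoing VMkernel interface, -c the number of echo requests.
    return ["vmkping", "-I", vmk, "-c", str(count), target_ip]


def check_paths(paths, count=3, run=subprocess.run):
    """Ping each peer over its VMkernel port; map port name -> reachable (bool).

    `paths` maps a VMkernel port (e.g. "vmk0") to a peer IP on that network.
    `run` is injectable so the logic can be tested without vmkping installed.
    Assumption: vmkping exits with status 0 only when replies come back.
    """
    results = {}
    for vmk, ip in paths.items():
        try:
            proc = run(build_vmkping_cmd(vmk, ip, count),
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            results[vmk] = proc.returncode == 0
        except OSError:
            # vmkping not found (e.g. not on an ESXi shell): report as unverified.
            results[vmk] = False
    return results
```

    For example, `check_paths({"vmk0": "192.0.2.11", "vmk1": "198.51.100.11"})` with your own management and vSAN peer addresses returns one True/False per port; a failure on the management port alone would point at the vCenter path rather than the vSAN network.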

    If both use the same NIC, my advice is to check the driver and firmware of that NIC and confirm that they are on the VMware HCL.

    Hope to hear from you tomorrow, hopefully with some more information.

    Since HA is not failing over, I think there is still some kind of connection.

    What is your HA setup? What FTT (failures to tolerate) does your vSAN policy use?



  • 5.  RE: Host connection failure

    Posted Dec 11, 2017 01:48 PM

    Dear dgreebe,

    The host is in the "not responding" state right now. vmkping works properly from all the other ESXi hosts to the affected host. I am using fully automated DRS and an N+1 configuration in vSAN.



  • 6.  RE: Host connection failure

    Posted Dec 07, 2017 09:46 AM

    Hi,

    Are you able to log in to the host using the DCUI, or do you just see the logon screen?

    If the host is not responding, that indicates the hostd agent is not responding. Please check the hostd log on the host around the timestamp of the event and see whether any backtrace is reported. Also check /var/core for any hostd dump file.

    hostd log location: /var/log/hostd.log. Look in /scratch/log/ for older hostd logs in case the logs have rolled over.
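
    A rough Python sketch of that check, assuming hostd.log lines start with an ISO 8601 UTC timestamp and that rotated copies are gzip files under /scratch/log/. All function names are hypothetical, and "backtrace" is only a default search keyword; lines without a leading timestamp (such as individual stack frames) are skipped, so adjust the keyword to whatever your log actually shows.

```python
import glob
import gzip
import os
import re
from datetime import datetime, timedelta

# Assumption: hostd.log lines begin with an ISO 8601 timestamp such as
# "2017-12-07T09:15:02.123Z info hostd[...] ..."; fractional seconds are ignored.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def scan_lines(lines, around, window=timedelta(minutes=10), keyword="backtrace"):
    """Return timestamped lines within +/- `window` of `around` that mention `keyword`."""
    hits = []
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # continuation lines have no timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
        if abs(ts - around) <= window and keyword.lower() in line.lower():
            hits.append(line.rstrip("\n"))
    return hits


def open_log(path):
    # Rotated logs are assumed to be gzip-compressed (e.g. hostd.0.gz).
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def candidate_logs():
    # Current log plus rotated copies; the /scratch/log pattern is an assumption.
    return glob.glob("/var/log/hostd.log") + sorted(glob.glob("/scratch/log/hostd*"))


def hostd_core_dumps(core_dir="/var/core"):
    """List files in the core directory whose names mention hostd."""
    if not os.path.isdir(core_dir):
        return []
    return sorted(f for f in os.listdir(core_dir) if "hostd" in f)


def report(when):
    """Print matching log lines near `when` (UTC) and any hostd core files."""
    for path in candidate_logs():
        with open_log(path) as fh:
            for hit in scan_lines(fh, when):
                print(path + ": " + hit)
    for name in hostd_core_dumps():
        print("core dump: " + name)
```

    Call `report(datetime(2017, 12, 7, 9, 15))` with the UTC time at which vCenter flagged the host; it prints any matching log lines and any file in /var/core whose name mentions hostd.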

    Post the hostd log here if you can't find any clue.



  • 7.  RE: Host connection failure

    Posted Dec 07, 2017 10:16 AM

    Hi,

    Yes, I am able to log in normally through the DCUI.



  • 8.  RE: Host connection failure

    Posted Dec 11, 2017 08:38 PM

    Can you share your vpxd.log from vCenter covering a time when the host became disconnected? I am facing a similar issue and want to compare your logs with mine.



  • 9.  RE: Host connection failure

    Posted Jan 31, 2018 01:23 PM

    Same issue here, and it is a disaster. We see these alerts generated constantly, every single night, and nothing in the logs indicates what is causing the problem: no backups are running and the log entries are empty. All of a sudden, VMs show as disconnecting, then the host shows as not responding, then disconnected, and some time later (often within seconds, but sometimes up to 40 minutes later) I see "Established a connection".

    I don't believe the VMs or the host are actually disconnecting from the network; otherwise, we would see alerts from our monitoring system, which hooks directly into the guest OS and would page our support staff.

    I see the VMware Knowledge Base article about increasing the handshakeTimeoutMs value.

    I agree that in many cases this could relieve the alerts, but when these alerts appear in the logs, they aren't actually triggered and DO NOT even show up under triggered alerts in the web client. BOGUS!

    We recently consolidated our vCenter installations from 4 to 2, so more hosts are now managed under a single vCenter, but nonetheless we only have 72 hosts. This is not a large inventory, and we deploy vCenter as if we had a maximum-size environment.