  • 1.  ESXi Server Randomly Becomes "Not Responding" and VM Disconnection

    Posted 13 days ago

    I have a cluster of 8 ESXi 6.7 (build 14320388) hosts on Dell R640 servers. Occasionally, a random host goes into "Not Responding" status and the virtual machines on it show as "disconnected," although those virtual machines keep running.

    In the /var/log/hostd.log file, there are many lines like this:

    d[2595562] [Originator@6876 sub=IoTracker] In thread 2100290, access("/vmfs/volumes/642fde55-b53efb8c-836f-908d6ec63b42/catalog") took over 15503 sec.

    d[2595562] [Originator@6876 sub=IoTracker] In thread 2100474, access("/vmfs/volumes/642fde55-b53efb8c-836f-908d6ec63b42/catalog") took over 12372 sec.

    This path is one of the Dell ME5084 datastores, backed by HDDs, and vCenter shows no alerts indicating any errors. I cannot log in through the ESXi web interface because it times out, and after entering the password in the DCUI it takes 7-10 minutes to log in. Running any list command over SSH also hangs the console.
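    To check whether only this datastore is affected, the IoTracker warnings can be tallied per volume. The following is a minimal Python sketch; it assumes every warning follows the exact format of the two lines quoted above (the regex and the names `volume_of` and `summarize_iotracker` are illustrative, not part of any VMware tool), and the durations are simply the values hostd reported.

```python
import re
from collections import defaultdict

# Assumed format, based only on the two hostd.log lines quoted above:
#   ... sub=IoTracker] In thread <tid>, <op>("<path>") took over <n> sec.
IOTRACKER_RE = re.compile(
    r'sub=IoTracker\] In thread (?P<thread>\d+), '
    r'(?P<op>\w+)\("(?P<path>[^"]+)"\) took over (?P<secs>\d+) sec'
)


def volume_of(path):
    """Return the datastore UUID from /vmfs/volumes/<uuid>/..., else the path."""
    parts = path.split("/")
    if len(parts) > 3 and parts[1:3] == ["vmfs", "volumes"]:
        return parts[3]
    return path


def summarize_iotracker(lines):
    """Map each volume to (number of slow calls, longest reported seconds)."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = IOTRACKER_RE.search(line)
        if match is None:
            continue  # not an IoTracker warning
        entry = stats[volume_of(match.group("path"))]
        entry[0] += 1
        entry[1] = max(entry[1], int(match.group("secs")))
    return {vol: (count, worst) for vol, (count, worst) in stats.items()}


sample = [
    'd[2595562] [Originator@6876 sub=IoTracker] In thread 2100290, '
    'access("/vmfs/volumes/642fde55-b53efb8c-836f-908d6ec63b42/catalog") '
    'took over 15503 sec.',
    'd[2595562] [Originator@6876 sub=IoTracker] In thread 2100474, '
    'access("/vmfs/volumes/642fde55-b53efb8c-836f-908d6ec63b42/catalog") '
    'took over 12372 sec.',
]
print(summarize_iotracker(sample))
# {'642fde55-b53efb8c-836f-908d6ec63b42': (2, 15503)}
```

    Fed with the lines of a copied hostd.log, a single key in the result points at one volume, while several keys suggest the problem reaches beyond it.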

    I have been able to resolve this issue by restarting the ESXi server, but I would like to know if there is a way to solve this problem without rebooting the host.



  • 2.  RE: ESXi Server Randomly Becomes "Not Responding" and VM Disconnection

    Posted 13 days ago
    It's been a long time since I ran v6.7, but I seem to recall that your issue may have been a software bug.
    You should be able to restart the host management agents and get the host reconnected without having to reboot.
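    Restarting the management agents typically means restarting hostd (the host agent) and vpxa (the vCenter agent) through their /etc/init.d service scripts, as VMware documents for ESXi. The sketch below only builds those two commands and runs them when asked; the function name and the `dry_run` switch are hypothetical, and it assumes a Python 3 interpreter is available on the host. Typing `/etc/init.d/hostd restart` and `/etc/init.d/vpxa restart` in the ESXi shell does the same thing.

```python
import subprocess
from typing import List

# Management agent service scripts on ESXi (hostd = host agent,
# vpxa = vCenter agent). Verify these paths on your own build.
AGENT_SCRIPTS = ["/etc/init.d/hostd", "/etc/init.d/vpxa"]


def restart_management_agents(dry_run: bool = True) -> List[List[str]]:
    """Build (and, if dry_run is False, run) the restart command for each agent."""
    commands = [[script, "restart"] for script in AGENT_SCRIPTS]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a restart fails.
            subprocess.run(cmd, check=True)
    return commands


for cmd in restart_management_agents():
    print(" ".join(cmd))
# /etc/init.d/hostd restart
# /etc/init.d/vpxa restart
```

    Keeping `dry_run=True` as the default lets you review the commands before anything is restarted.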

    Paul Boserup
    Senior Server Engineer
    Information Systems - Technical Services
    Sarasota Memorial Hospital
    603-276-5329 mobile





  • 3.  RE: ESXi Server Randomly Becomes "Not Responding" and VM Disconnection

    Posted 13 days ago

    The root of the problem is an issue with that volume; the host disconnecting from vCenter is only a symptom. Accessing volumes has a higher priority than the vCenter agent (the service that connects the host to vCenter), so the host is more concerned with getting that volume working than with staying connected to vCenter. The same thing explains why logging in through the DCUI takes so long, and it will also make logging in to the host client take forever.

    Is it only that volume showing up in the logs? If so, something is wrong with it. Are all 8 hosts showing the same error? If multiple volumes on that storage array are affected, you may have an issue with the storage array or its cabling. You also didn't mention whether the array is connected over FC, iSCSI, or NFS.
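    The triage questions above (one volume or several, one host or all 8, one array or more) can be written down as a small decision function. This is a hypothetical Python sketch: it assumes you have already collected, per host, the set of datastore UUIDs that appear in slow-I/O warnings, plus an inventory mapping each datastore UUID to its array. The host and array names in the example are placeholders.

```python
from typing import Dict, Set, Tuple


def classify_scope(slow_by_host: Dict[str, Set[str]],
                   array_of_volume: Dict[str, str]) -> Tuple[str, int, int]:
    """Return (scope, hosts_affected, hosts_checked) for the triage above.

    slow_by_host:    host name -> datastore UUIDs seen in slow-I/O warnings
    array_of_volume: datastore UUID -> storage array name (your inventory)
    """
    hosts_checked = len(slow_by_host)
    hosts_affected = sum(1 for vols in slow_by_host.values() if vols)
    slow_volumes = set()
    for vols in slow_by_host.values():
        slow_volumes |= vols

    if not slow_volumes:
        scope = "none"
    elif any(v not in array_of_volume for v in slow_volumes):
        scope = "unknown: volume missing from inventory"
    elif len(slow_volumes) == 1:
        # Only one volume is slow: suspect that volume itself.
        scope = "single volume"
    elif len({array_of_volume[v] for v in slow_volumes}) == 1:
        # Several volumes on one array: suspect the array or its cabling.
        scope = "multiple volumes, one array"
    else:
        scope = "multiple arrays"
    return scope, hosts_affected, hosts_checked


# Placeholder host and array names; the UUID is the one from the log above.
uuid = "642fde55-b53efb8c-836f-908d6ec63b42"
print(classify_scope({"esx01": {uuid}, "esx02": set()}, {uuid: "ME5084-A"}))
# ('single volume', 1, 2)
```

    Comparing hosts_affected with hosts_checked answers the "all 8 hosts?" question alongside the volume and array checks.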

    Another thing: did you update ESXi and/or firmware recently? That could have introduced an incompatibility with something.