vCenter

  • 1.  Host disconnected from VC and vmware-watchdog: not found

    Posted Jan 15, 2019 02:06 PM

    Hi!

    I have a host (ESXi 6.0) that is disconnected from vCenter, and I started investigating.

    The hostd service doesn't start.

    [root@esx-35:/var/run/vmware] /etc/init.d/hostd restart

    watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID does not exist

    watchdog-hostd: Unable to terminate watchdog: No running watchdog process for hostd

    sh: you need to specify whom to kill

    Ramdisk 'hostd' with estimated size of 1803MB already exists

    [root@esx-35:/var/run/vmware] /opt/vmware/vpxa/bin/vmware-watchdog -r hostd

    -sh: /opt/vmware/vpxa/bin/vmware-watchdog: not found

    [root@esx-35:/var/run/vmware] /sbin/watchdog.sh -r hostd

    (no output)

    [root@esx-35:/var/run/vmware] ls -l vmware-hostd.PID watchdog-hostd.PID

    ls: watchdog-hostd.PID: No such file or directory

    -rw-r--r--    1 root     root             8 Jan 15 13:46 vmware-hostd.PID

    hostd.log excerpt:

    2019-01-15T10:00:16.218Z error hostd[676C1B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x667580a4, h:36, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)

    2019-01-15T10:02:59.391Z error hostd[67CC4B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x6665fc6c, h:31, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)

    2019-01-15T10:03:24.108Z error hostd[674BAB70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-22-71b7 user=root] GetPrimitiveParam: Cannot find (help)

    2019-01-15T10:03:24.408Z error hostd[674FBB70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-a0-71cb user=root] GetPrimitiveParam: Cannot find (help)

    2019-01-15T10:03:24.926Z error hostd[67CC4B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x67a3a74c, h:34, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)

    2019-01-15T10:03:24.977Z error hostd[67CC4B70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-e7-71db user=root] GetPrimitiveParam: Cannot find (help)

    2019-01-15T14:47:58.335Z warning -[FFA75B20] [Originator@6876 sub=Default] Estimated fds limit 4864 > 4096 max supported by setrlimit. Setting fds limit to 4096

    2019-01-15T14:47:58.336Z warning hostd[FFA75B20] [Originator@6876 sub=Default] Unrecognized log/level '' using 'info'

    2019-01-15T14:47:58.380Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Removing duplicate pools.xml entry 'resourcePool[0003]'

    2019-01-15T14:47:58.380Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1'

    2019-01-15T14:47:58.386Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277'

    2019-01-15T14:47:58.386Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277/worldGroup.15702277'

    I have read these KB articles:

    https://kb.vmware.com/s/article/1005566

    https://kb.vmware.com/s/article/1003490?1=

    In my case I use LACP, and KB 1003490 says:

    • If LACP is enabled and configured, do not restart management services using services.sh command. Instead restart independent services using the /etc/init.d/module restart command.

    I ran "services.sh restart" on this host and on other hosts. The other hosts are fine, but this one has gone haywire.

    Any ideas?

    P.S. I can't reboot the host.



  • 2.  RE: Host disconnected from VC and vmware-watchdog: not found

    Posted Jan 15, 2019 06:32 PM

    Have you tried removing it from vCenter and re-adding it? The host should stay online while this happens; just make sure you know the root password before you do this.



  • 3.  RE: Host disconnected from VC and vmware-watchdog: not found
    Best Answer

    Posted Jan 18, 2019 11:07 AM

    The problem is solved!

    Look at my hostd.log:

    2019-01-17T09:33:59.202Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Removing duplicate pools.xml entry 'resourcePool[0003]'

    2019-01-17T09:33:59.203Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1'

    2019-01-17T09:33:59.208Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277'

    2019-01-17T09:33:59.208Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277/worldGroup.15702277'

    error: Sysinfo error on operation returned status : Operation not permitted. Please see the VMkernel log for detailed error information

    See the first line: Removing duplicate pools.xml entry 'resourcePool[0003]'.
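    To check other hosts for the same symptom, here is a minimal Python 3 sketch that pulls these warnings out of a copy of hostd.log. The regex is an assumption based only on the log lines in this thread, and find_duplicate_pool_entries is a hypothetical helper, not a VMware tool.

```python
import re
import sys

# Pattern taken from the hostd.log warning quoted in this thread:
#   ... Removing duplicate pools.xml entry 'resourcePool[0003]'
DUPLICATE_RE = re.compile(r"Removing duplicate pools\.xml entry '([^']+)'")


def find_duplicate_pool_entries(log_text):
    """Return (timestamp, entry) pairs for every duplicate-entry warning."""
    hits = []
    for line in log_text.splitlines():
        match = DUPLICATE_RE.search(line)
        if match:
            # The first whitespace-separated token of a hostd.log line is the timestamp.
            timestamp = line.split()[0]
            hits.append((timestamp, match.group(1)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_dup_pools.py hostd.log  (hypothetical script name)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for timestamp, entry in find_duplicate_pool_entries(handle.read()):
            print(timestamp, entry)
```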

    Then I looked at the pools.xml file:

    <resourcePool id="0002">
      <lastModified>2018-12-18T14:14:46.494756Z</lastModified>
      <name>EIS</name>
      <objID>pool1</objID>
      <path>host/user/pool2</path>
    </resourcePool>
    <resourcePool id="0003">
      <lastModified>2018-12-18T14:14:46.508533Z</lastModified>
      <name>zak.local</name>
      <objID>pool2</objID>
      <path>host/user/pool2</path>

    The two <path> lines must be different, but in my case they were identical. I compared this file with the one on another host.

    On the other hosts, the zak.local resource pool has three levels of nesting: host/user/pool2.

    But the EIS pool on the other hosts has four levels, so on the problem host its path should be:

    host/user/pool2/pool1

    The last component, pool1, is taken from <objID>pool1</objID>.

    Final fix:

    <resourcePool id="0002">
      <lastModified>2018-12-18T14:14:46.494756Z</lastModified>
      <name>EIS</name>
      <objID>pool1</objID>
      <path>host/user/pool2/pool1</path>

    In any case, look at the same file on other hosts to understand its logic.

    Then I restarted vpxa and hostd and boom!! The host is alive!

    No host reboot needed!!!
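    The pools.xml check described in this post can be scripted. This is a minimal Python 3 sketch to run against a copy of pools.xml. The rule it enforces (each <path> ends with the pool's own <objID>, and no two pools share a <path>) is inferred from the entries quoted in this thread, not taken from VMware documentation, and check_pools is a hypothetical helper name.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict


def check_pools(xml_text):
    """Return a list of problems found in the <resourcePool> entries of pools.xml.

    ASSUMED rule, inferred from the entries in this thread (not VMware docs):
    every <path> ends with the pool's own <objID>, and no two pools share a <path>.
    """
    root = ET.fromstring(xml_text)
    problems = []
    ids_by_path = defaultdict(list)
    # iter() finds <resourcePool> elements at any depth, whatever the root element is.
    for pool in root.iter("resourcePool"):
        pool_id = pool.get("id", "?")
        obj_id = (pool.findtext("objID") or "").strip()
        path = (pool.findtext("path") or "").strip()
        ids_by_path[path].append(pool_id)
        # Last path component, e.g. "host/user/pool2/pool1" -> "pool1"
        if path.rsplit("/", 1)[-1] != obj_id:
            problems.append(
                f"resourcePool {pool_id}: path '{path}' does not end with objID '{obj_id}'"
            )
    for path, ids in ids_by_path.items():
        if len(ids) > 1:
            problems.append(f"path '{path}' is shared by resourcePools {', '.join(ids)}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_pools.py pools.xml  (hypothetical script name; run on a copy)
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in check_pools(handle.read()):
            print(problem)
```

    Against the two broken entries quoted above, it should report that resourcePool 0002 does not end with its objID and that 0002 and 0003 share host/user/pool2; with the fixed EIS path it reports nothing for those entries.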



  • 4.  RE: Host disconnected from VC and vmware-watchdog: not found

    Posted Oct 13, 2021 09:04 PM

    You have to run these commands twice to fix the issue: the first run will fail and the second run will do the job.

    /etc/init.d/hostd restart

    /etc/init.d/vpxa restart