DX Unified Infrastructure Management

  • 1.  Error during gethubs request to hub

    Posted Jun 22, 2016 07:16 AM

    The discovery_server log file contains a number of "Error during gethubs request to hub" errors.

    In most cases, they occur when discovery_server (on our central hub) attempts to contact a subsidiary hub.

    An example would be:

         Error during gethubs request to hub /UIM/ldnfpi-mgt01/ldnfpi-mgt01/hub : (2) communication error, I/O error on nim session (C) com.nimsoft.nimbus.NimNamedClientSession(/10.15.10.57:56336-/10.115.10.74:48083): Read timed out

     

    Note that the hub "/UIM/xxxx-mgt01/xxxx-mgt01/hub" is connected to an intermediary hub by a tunnel

    That is, we have tunnels between:

    /UIM/xxxx-mgt01/xxxx-mgt01/hub and /UIM/xxxx-rhub01/xxxx-rhub01/hub

    /UIM/xxxx-rhub01/xxxx-rhub01/hub and /UIM/xxxxhubrdr01/xxxxhubrdr01/hub

    /UIM/xxxxhubrdr01/xxxxhubrdr01/hub and the central hub are on the same network (no tunnel)

    There is a firewall between each layer where a tunnel is used and, as advised in previous documentation, we allow ports 48000-48030 through the firewall

     

    In this error, it seems discovery_server is attempting to connect to port 48083 on a server

    (other similar errors reflect different ports all above 48030)
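
    The comparison described above (destination port versus the 48000-48030 firewall range) can be sketched in Python. This is a minimal sketch, assuming log lines follow the format of the example in this post; the function names and the regular expression are illustrative and not part of UIM.

```python
import re

# Port range opened on the firewalls, as stated in the post.
ALLOWED_PORTS = range(48000, 48031)

# Captures the hub address and both endpoints of the NimNamedClientSession.
# Assumption: every error line looks like the single example quoted above.
LOG_PATTERN = re.compile(
    r"Error during gethubs request to hub (?P<hub>\S+)"
    r".*?\(/(?P<src_ip>[\d.]+):(?P<src_port>\d+)"
    r"-/(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)\)"
)


def parse_gethubs_error(line):
    """Return (hub, dst_ip, dst_port) for a gethubs error line, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return match["hub"], match["dst_ip"], int(match["dst_port"])


def outside_allowed_range(port):
    """True if the port is not in the range the firewalls let through."""
    return port not in ALLOWED_PORTS


sample = (
    "Error during gethubs request to hub /UIM/ldnfpi-mgt01/ldnfpi-mgt01/hub : "
    "(2) communication error, I/O error on nim session (C) "
    "com.nimsoft.nimbus.NimNamedClientSession"
    "(/10.15.10.57:56336-/10.115.10.74:48083): Read timed out"
)

hub, dst_ip, dst_port = parse_gethubs_error(sample)
print(hub, dst_ip, dst_port, outside_allowed_range(dst_port))
```

    Running the same check over the whole log would show whether every failing request targets a port above 48030.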

     

    Any ideas where I should look?



  • 2.  Re: Error during gethubs request to hub

    Posted Jun 22, 2016 11:15 AM

    Hello,

     

    In IM, try Tools > Connect to Robot and then specify the remote hub by IP (try both ports 48000 and 48002) to see if it connects. If it fails, it could be a network issue.
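
    A rough equivalent of this reachability test can be run outside Infrastructure Manager. The sketch below only attempts a plain TCP connection; it does not speak the Nimsoft protocol, so success means the port is reachable, not that the hub answers correctly. The function names are hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_hub_ports(host, ports=(48000, 48002), timeout=5.0):
    """Map each port to whether a TCP connection to it could be opened."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

    For example, `check_hub_ports("10.115.10.74")` would test the two ports mentioned above against the destination address from the log line in the first post.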

     

    Make sure you are not using both a static route (name services in hub gui) to the hub AND a tunnel. That's always a no-no.

     

    Also try using the callback manually from your primary, and ping all your different hubs to see which are not accessible. Press ctrl + p on the hub probe and run the "gethubs" callback.

     

    Additionally, make sure the hub/robot that is using port 48083 has the "first_probe_port" parameter in the robot.cfg set to something like 48003 (default).
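
    To check this setting on several machines without opening each file by hand, something like the following could be used. It is a minimal sketch, assuming robot.cfg stores settings as `key = value` lines; the sample text is made up for illustration and is not copied from a real robot.cfg.

```python
import re

# One "key = value" assignment per line; section tags such as <controller>
# do not match and are skipped.
KEY_VALUE = re.compile(r"^\s*(?P<key>[A-Za-z0-9_]+)\s*=\s*(?P<value>.*?)\s*$")


def read_cfg_value(cfg_text, key):
    """Return the first value assigned to key in cfg_text, or None if absent."""
    for line in cfg_text.splitlines():
        match = KEY_VALUE.match(line)
        if match and match["key"] == key:
            return match["value"]
    return None


# Illustrative content only (assumed layout).
sample_cfg = """\
<controller>
   first_probe_port = 48000
</controller>
"""

print(read_cfg_value(sample_cfg, "first_probe_port"))
```

    In practice `cfg_text` would come from reading the robot.cfg file on the hub or robot in question.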

     

    Lastly, you can manually check the connectivity from one system's probe to another system's probe by using the "nametoip" callback (press ctrl + p again). Just input the entire address, like nimsoft_domain/nimsoft_hubA/nimsoft_robot123/cdm etc.

     

    Cheers,

     

    A



  • 3.  Re: Error during gethubs request to hub

    Posted Jun 22, 2016 12:17 PM

    Thanks for that

    See notes below

     

    In IM, try Tools > Connect to Robot and then specify the remote hub by IP (try both ports 48000 and 48002) to see if it connects. If it fails, it could be a network issue.

    This works OK.

     

    Make sure you are not using a static route (name services in hub gui) to the hub AND a tunnel. That's always a no-no.

    There is a static route to XXXRDG01_HUB on XXXHUBENF01 but no Tunnel

    There is a tunnel from XXXRDG01_HUB to ***-BIR-RHUB01 but no static route (there is a static route to XXXHUBENF01)

    There is a tunnel from ***-BIR-RHUB01 to ***-FPI-MGT01 (the target of the discovery connection), but NO static routes

     

    Also try using the callback manually from your primary, and ping all your different hubs to see which are not accessible. Press ctrl + p on the hub probe and run the "gethubs" callback.

    The callback works fine when done like this.

    Some Hubs show a Tunnel Port, and some of those are outside the port range 48000-48030

     

    Additionally, make sure the hub/robot that is using port 48083 has the "first_probe_port" parameter in the robot.cfg set to something like 48003 (default).

    The "first_probe_port" parameter is set to 48000

     

    Lastly, you can manually check the connectivity from one system's probe to another system's probe by using the "nametoip" callback (press ctrl + p again). Just input the entire address, like nimsoft_domain/nimsoft_hubA/nimsoft_robot123/cdm etc.

    OK: when executed against the hub for my failing system, this returns the expected IP and port 48083!

    This is the same port listed in Discovery and causing read timeouts

    Checking the controller on that system shows the "first_probe_port" parameter is set to 48000

    Infrastructure Manager shows the hub running on port 48002

     

    Checking the “Hubs” tab in the hub GUI for XXXRDG01_HUB, shows the connection is XXXRDG01_HUB port 48083 == ***-BIR-RHUB01 48002

     

    i.e. the downstream hub is listening on the correct port; the upstream hub is using a non-standard port

     

    Maybe this is right and I'm just seeing network issues?

     

    Ivan Blair




  • 4.  Re: Error during gethubs request to hub

    Posted Jun 22, 2016 12:45 PM

    It's possibly a network problem: a firewall, an application firewall or antivirus.
    You can use Wireshark or tcpdump on the hub that fails to see whether it even receives the request.

     

    Double-check the robot.cfg on that hub and see whether the hub port is set to the default, 48002. If the hub is actually using port 48003 while the cfg dictates 48002, you'll have a problem. Try hardcoding the port the hub is currently using (48083?) in robot.cfg and bounce the service.

     

    "In IM , try tools>connect to robot and then specify the remote hub through IP, (try both 48000 and 48002 ports) to see if it connects. If it fails it could be a network issue." Use port 48003 here, as tunneled hubs use this by default.

     

    A



  • 5.  Re: Error during gethubs request to hub

    Posted Aug 28, 2016 11:24 AM

    I see this too - very frequently. There are a number of causes but the number one that isn't obvious is that if there is a tunnel to traverse to get to the destination, the local hub will have a cached list of the IP and port that identifies this side of the tunnel that the destination hub is on the other side of. The mechanism for maintaining this cached list is slow and in my experience subject to an unacceptably high level of error. This list of known hubs will also get polluted with hubs that no longer exist (consider what happens when you rename one for instance) but discovery will keep trying to reach these. Finally, discovery will try to open connections to many hubs at the same time. It is possible that the capacity of the local hub to handle that traffic (from a bookkeeping standpoint, not processing capability) is being exhausted.
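
    One way to spot hubs that discovery keeps retrying (for example, renamed or retired ones still in the cached list) is to tally the errors per hub address in the discovery_server log. This is a small sketch, assuming the error text matches the example earlier in the thread; the function name is made up.

```python
import re
from collections import Counter

# Captures the hub address that follows the error text.
HUB_PATTERN = re.compile(r"Error during gethubs request to hub (\S+)")


def count_gethubs_errors(lines):
    """Count gethubs errors per hub address across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HUB_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

    Passing an open log file to `count_gethubs_errors` and calling `.most_common()` on the result lists the hubs that fail most often, which can then be compared with the hubs that actually exist.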

     

    -Garin