DX Unified Infrastructure Management


No contact with Controller

  • 1.  No contact with Controller

    Posted Jun 04, 2020 06:48 AM

    Hi all,

    I am frequently facing a "No contact with controller" issue. The hub log shows errors like these:


    Jun  4 14:49:11:108 [29516] hub: nimSession - failed to connect session to HUBIP:48001, error code 10055
    Jun  4 14:49:11:108 [29516] hub: nimSession - failed to connect session to HUBIP:48001, error code 10055

    Jun  4 14:50:01:381 [14244] hub: (nim_ldap_get_connection): LDAP server spec 'IP of LDAP server' failed (secure=0)
    Jun  4 14:50:01:496 [14244] hub: do_ldap_query [LDAP] - open failed: auth (ldap_simple_bind_s) failed: 'Server Down' (81)

    Jun  4 14:50:01:614 [29012] hub: do_ldap_query [LDAP] - open failed: auth (ldap_simple_bind_s) failed: 'Server Down' (81)
    Jun  4 14:50:01:619 [5356] hub: (nim_ldap_get_connection): LDAP server spec 'IP of LDAP server' failed (secure=0)
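    To put a number on "frequently", the failures can be tallied per minute. Below is a minimal Python sketch. It assumes the hub log lines look exactly like the excerpt above, with the timestamp first and then the message. The pattern labels and the function name count_hub_errors are hypothetical, and other hub versions may word these messages differently.

```python
import re
from collections import Counter

# Leading timestamp truncated to the minute, e.g. "Jun  4 14:49".
TIMESTAMP = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2})")

# Message patterns copied from the hub log excerpt (assumed wording).
PATTERNS = {
    "nimSession connect failure": re.compile(
        r"nimSession - failed to connect session to \S+, error code \d+"),
    "LDAP server spec failure": re.compile(
        r"nim_ldap_get_connection\): LDAP server spec '[^']*' failed"),
    "LDAP bind failure": re.compile(
        r"ldap_simple_bind_s\) failed: '[^']*' \(\d+\)"),
}


def count_hub_errors(lines):
    """Count matching failure lines per (minute, failure type)."""
    counts = Counter()
    for line in lines:
        stamp = TIMESTAMP.match(line.strip())
        minute = stamp.group(1) if stamp else "unknown"
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(minute, label)] += 1
    return counts
```

    For example, pass in the lines of a copy of the hub log and print `counts.most_common(10)` to see which failure dominates and when.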


    Regards
    Amar


  • 2.  RE: No contact with Controller
    Best Answer

    Posted Jun 04, 2020 08:31 AM
    It looks like the hub is having problems connecting to the LDAP server.
    If this is related to the primary hub being overloaded, LDAP authentication can be moved to a secondary hub.
    More generally, make sure the latest robot_update and hub hotfixes appropriate for your version of UIM are in place.
    https://support.broadcom.com/external/content/release-announcements/CA-Unified-Infrastructure-Management-Hotfix-Index/7233

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 3.  RE: No contact with Controller

    Posted Jun 05, 2020 02:49 AM
    Edited by amar kondraju Jun 05, 2020 02:54 AM
    Hi David,

    Where can I get this hotfix (7233)? I am unable to get it from the link you provided. We have only one hub and no secondary hub, so what should we do to reduce this load?

    Regards
    Amar


  • 4.  RE: No contact with Controller

    Broadcom Employee
    Posted Jun 05, 2020 03:11 AM
    Hi,

    For UIM 9.02, the relevant hotfixes are:
    hub_7.97HF6
    robot_update-7.97HF9

    These should be downloadable from the earlier link:
    https://support.broadcom.com/external/content/release-announcements/CA-Unified-Infrastructure-Management-Hotfix-Index/7233

    If they are not, something in your environment may be blocking the download.


  • 5.  RE: No contact with Controller

    Posted Jun 05, 2020 08:19 AM
    Hi David and Ravi,

    What does this error mean: "nimSession - failed to connect session to HUBIP:48001, error code 10055"? Please let me know how to fix this permanently, since we have only one hub.


    Regards
    Amar


  • 6.  RE: No contact with Controller

    Broadcom Employee
    Posted Jun 05, 2020 08:53 AM
    What version of UIM are you running?
    You can check <nimsoft>\version.txt on the primary hub.
    What version of the controller are you running?
    What version of the hub are you running?
    How many robots report to the primary hub, given that you have no secondary hubs?
    I suggest applying the settings from the following two KB articles to the hub and controller:
    hub and tunnel connection settings in 7.x, 8.x, and 9.x (Knowledge Base Articles - 72187)
    https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=72187

    Avoiding communication errors when configuring UIM components and improving tunnel stability for hub-to-hub tunnels (Knowledge Base Articles - 10838)
    https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=10838

    ------------------------------
    Gene Howard
    Principal Support Engineer
    Broadcom
    ------------------------------



  • 7.  RE: No contact with Controller

    Posted Jun 05, 2020 11:55 AM
    10055 is an OS-level Winsock error being passed back to the application.

    According to Microsoft, Winsock error 10055 (WSAENOBUFS) means that a socket operation failed because the system lacked sufficient buffer space or a queue was full. When this happens, ports cannot be accessed, and a telnet to any port would most likely fail.
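    The number at the end of the nimSession line is the raw Winsock code. A small lookup, sketched here in Python, helps separate a local resource problem (10055) from a refused or timed-out connection. A refused or timed-out connection would point at the remote side or the network instead. The table holds only a few codes from Microsoft's Windows Sockets error list, and describe_error is a hypothetical helper name.

```python
import re

# A few Winsock codes from Microsoft's Windows Sockets error code reference.
WINSOCK_ERRORS = {
    10054: ("WSAECONNRESET", "connection reset by peer"),
    10055: ("WSAENOBUFS", "no buffer space available (local resource problem)"),
    10060: ("WSAETIMEDOUT", "connection timed out"),
    10061: ("WSAECONNREFUSED", "connection refused by the target"),
}

ERROR_CODE = re.compile(r"error code (\d+)")


def describe_error(line):
    """Return (code, name, meaning) for the 'error code N' in a log line, or None."""
    match = ERROR_CODE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, meaning = WINSOCK_ERRORS.get(code, ("unknown", "not in this short table"))
    return code, name, meaning
```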

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 8.  RE: No contact with Controller

    Posted Jun 08, 2020 12:50 AM
    Hi David and Gene,

    So error 10055 is OS-related, right? How do we fix this error on the server?

    Is my understanding correct that this needs to be fixed at the OS level? If so, should we inform the server team?

    On the UIM side, we are running UIM 9.0.2 in production with only one primary hub and no secondary hub.
    We have integrated the primary hub with LDAP. Please suggest what needs to be done on the UIM side to fix this.



    Regards
    Amar 







  • 9.  RE: No contact with Controller

    Posted Jun 08, 2020 07:30 AM
    Following Gene's recommendations will help UIM overcome communication problems as well as it can. However, if the cause is outside UIM, as the Winsock error indicates, it will be necessary to engage the network team. Telnet is a great tool for testing because it clearly shows whether the problem is outside of UIM: remote into the system hosting the primary hub and telnet to the LDAP server.
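    Where telnet is not installed, you can run the same test with a few lines of Python on the hub system. This is a sketch, and can_connect is a hypothetical helper. The host names and ports in the usage line are placeholders. Port 389 is the standard port for unencrypted LDAP, which matches secure=0 in the log, but use whatever your LDAP server spec actually lists.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connection, like `telnet host port`.

    Returns (True, None) on success or (False, exc) with the OSError raised.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # refused, timed out, unresolvable host, out of buffers...
        return False, exc
```

    For example, `print(can_connect("ldap.example.com", 389))` for the LDAP server, or `can_connect("HUBIP", 48001)` for the hub port from the log. On Windows, a failure caused by buffer exhaustion should surface as an OSError whose `winerror` attribute is 10055.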

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 10.  RE: No contact with Controller

    Posted Jun 10, 2020 12:24 PM
    Hi David,

    I have reviewed Gene's links. We don't have a multi-tier hub setup; we have only one hub.

    We have set max_heartbeat = 1800 in the <tunnel> section of hub.cfg, which is not what the referenced doc recommends. Also, reuse_async_session = 1 is not present in the controller section of the robot configuration. Is there any way to bulk-update reuse_async_session = 1? And error code 10055 is still occurring.

    The doc's values for the <tunnel> section are:

          protocol_mode = 3
          max_heartbeat = 30
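    On the bulk-update question, the change itself is a one-line edit per config file. The Python sketch below shows only that text transformation on a local copy of a file. It assumes the nested <section> ... </section> layout with key = value lines that UIM .cfg files use, and set_cfg_key is a hypothetical name. Pushing the result to many robots would still need UIM's own distribution tooling or your own deployment process. Back up each file before changing it.

```python
def set_cfg_key(text, section, key, value):
    """Set `key = value` directly inside the first <section> block of a UIM-style cfg text.

    Replaces the key if it already exists at that level; otherwise inserts it just
    before the section's closing tag. Returns the new text.
    """
    lines = text.splitlines()
    open_tag = f"<{section}>"
    depth = 0  # 0 = not yet inside the target section; 1 = directly inside it
    for i, line in enumerate(lines):
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        if depth == 0:
            if stripped == open_tag:
                depth = 1
            continue
        if stripped.startswith("</"):
            depth -= 1
            if depth == 0:  # end of target section and key not found: add it
                lines.insert(i, f"{indent}   {key} = {value}")
                return "\n".join(lines) + "\n"
        elif stripped.startswith("<") and stripped.endswith(">"):
            depth += 1  # entering a nested subsection; leave its keys alone
        elif depth == 1 and "=" in stripped and stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{indent}{key} = {value}"
            return "\n".join(lines) + "\n"
    raise ValueError(f"section {open_tag} not found or not closed")
```

    The same helper covers the hub side, for example `set_cfg_key(hub_cfg_text, "tunnel", "max_heartbeat", "30")`.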


    Regards
    Amar




  • 11.  RE: No contact with Controller

    Broadcom Employee
    Posted Jun 11, 2020 08:49 AM

    As David mentioned, error code 10055 is a Winsock error returned by the operating system.

    Please have your system administrator look into what is causing the OS to return this error.



    ------------------------------
    Gene Howard
    Principal Support Engineer
    Broadcom
    ------------------------------



  • 12.  RE: No contact with Controller

    Broadcom Employee
    Posted Jun 11, 2020 11:19 AM
    Amar,

    Winsock error 10055 means your server does not have enough buffers to create the connection.  This can be caused by a few things.

    • Too many open connections. Windows can handle thousands of connections, but you can run netstat -o and compare the PIDs to those in Task Manager. A PID holding a very large number of connections could indicate an application that is not closing its ports correctly.
    • Running out of RAM. Reboot the server and see if the problem goes away.
    • Swap file too small. Lately I have seen Windows admins configure small swap files (<6 GB) on servers with >24 GB of RAM. Applications are going to swap; that is just the nature of Windows memory management. If the system can't swap to get more pages of RAM, you will see errors, not just with Winsock (on a server handling many connections) but possibly also Java memory dumps and the like. Reboot the server and see if the problem goes away.
    • Application not cleaning up its own RAM. One or two versions of the hub probe have been seen with high memory utilization. I can't remember the exact version, but it was around 7.96. Check the memory consumption of the hub probe; it will vary with the volume of messages the hub is processing. If you have the processes probe with a profile for hub.exe, look at its history to see whether there is a memory leak.
    • Also, on Windows the hub can't have more than 64 permanent or temporary queues. Since each queue is tied to a TCP socket, this might be the problem. This is just a guess, but count them; if you have more than 64, reduce your queue count and see if the error goes away.
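    The netstat check in the first point can be scripted. Below is a minimal Python sketch that counts TCP connections per owning PID from saved `netstat -ano` output. The flags are `-a` for all connections, `-n` for numeric addresses and `-o` for the owning PID. tcp_connections_per_pid is a hypothetical helper, and it assumes the usual five-column Windows layout for TCP rows.

```python
from collections import Counter


def tcp_connections_per_pid(netstat_output):
    """Count TCP entries per owning PID in Windows `netstat -ano` text."""
    per_pid = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # TCP rows have five columns: Proto, Local, Foreign, State, PID.
        # Header, UDP and blank lines are skipped.
        if len(parts) == 5 and parts[0] == "TCP" and parts[4].isdigit():
            per_pid[int(parts[4])] += 1
    return per_pid
```

    Save the output with `netstat -ano > netstat.txt`, pass the file's text in, and look up the PIDs from `most_common(5)` in Task Manager.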


    ------------------------------
    Customer Success Architect
    Broadcom
    ------------------------------