Automic Workload Automation

  • 1.  Agents repeatedly disconnecting every Monday from Automic

    Posted Sep 07, 2020 01:05 PM
    Hello, dear community members,

    I am kind of lost with an issue we have been facing for over 2 months. Let me briefly describe our environment first. We have two separate instances of Automic, running on Windows Server 2016 R2 with the following config:

    Automic v12.3
    MS SQL 2014 std edition

    We are currently facing a very weird scenario (in both environments). Every Monday morning, at approximately the same time, all connected agents disconnect from the environment and reconnect after a certain period (usually a couple of minutes). In the agent log files, we observe the following messages:

    Win agents:

    20200907/163926.447 - U02000042 Connection aborted. Error code '10053', error description: 'An established connection was aborted by the software in your host machine.'.
    20200907/163926.447 - U02000010 Connection to Server 'ILMARINEN_1' terminated.
    20200907/163926.447 - U02000072 Connection to system 'UC4PE' initiated.
    20200907/163926.447 - U02000011 Connection to Server '172.23.248.35:2217' initiated.
    20200907/163926.447 - U02000011 Connection to Server '172.23.248.36:2218' initiated.
    20200907/163926.463 - U02000011 Connection to Server 'C105S273VM014:2219' initiated.
    20200907/163931.088 - U02001040 Error in function 'Connect', error code '10022', error description: 'An invalid argument was supplied.'.
    20200907/163931.088 - U02000012 Connection to Server 'C105S273VM014:2219' denied.
    20200907/163931.088 - U02000011 Connection to Server 'C105S273VM014:2220' initiated.
    20200907/163935.666 - U02001040 Error in function 'Connect', error code '10022', error description: 'An invalid argument was supplied.'.
    20200907/163935.666 - U02000012 Connection to Server 'C105S273VM014:2220' denied.

    UX agents:

    20200907/071115.727 - U02003044 Invalid 'send' call, socket '0'. Error code: ('88' - 'Socket operation on non-socket')

    or

    20200907/065125.119 - U02003044 Invalid 'read' call, socket 'UC4TE#CP005'. Error code: ('104' - 'Connection reset by peer')
    20200907/065125.119 - U02000010 Connection to Server 'C105S1449VM011:2219' terminated.
    20200907/065133.173 - U02003044 Invalid 'read' call, socket 'UC4TE#CP006'. Error code: ('104' - 'Connection reset by peer')
    20200907/065133.173 - U02000010 Connection to Server 'C105S1449VM011:2220' terminated.

    AS/400:

    20200907/065125.119 - U02003044 Invalid 'read' call, socket 'UC4TE#CP005'. Error code: ('104' - 'Connection reset by peer')
    20200907/065125.119 - U02000010 Connection to Server 'C105S1449VM011:2219' terminated.
    20200907/065133.173 - U02003044 Invalid 'read' call, socket 'UC4TE#CP006'. Error code: ('104' - 'Connection reset by peer')
    20200907/065133.173 - U02000010 Connection to Server 'C105S1449VM011:2220' terminated.

    ...and quite similar messages on mainframe agents.

    We haven't found a root cause yet. Most of the agents are on v12. Here is what we have tried so far: sending periodic telnet attempts from the agents to the application server (no gaps); disabling the McAfee software (no help); allowing all ports on the Windows firewall (no help); investigating from the DB perspective (no help); checking the event log entries (no help). The only agents that remain connected are those on localhost or on the same network as the application server. It is very odd that this happens every Monday at a similar time. Could you please suggest what we should try in order to identify the problem? Is anyone else facing the same issue?

    I also tried changing the following values in UC_HOSTCHAR_DEFAULT:

    KEEP_ALIVE 60
    RECONNECT_TIME 180

    No help at all.

    I am getting very frustrated with this issue and would appreciate any kind of help. Thanks for your suggestions!

    PS: Broadcom support closed the case, stating that the issue is not caused by Automic (which seems obvious).

    From my perspective, it looks like some routine outside of Automic (backup, monitoring, vulnerability scan, ...) is running and causing this. I would love to know how to identify what it is.

    Thanks again for any input!
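
    One way to confirm the weekly pattern before hunting for the external routine is to tabulate the disconnect messages by weekday and hour. The following is a minimal Python sketch, assuming the agent log lines have the `YYYYMMDD/HHMMSS.mmm - Uxxxxxxxx text` layout shown in the excerpts above. The function names and the choice of message numbers counted as disconnects are assumptions made for illustration; they are not part of Automic.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# 20200907/163926.447 - U02000010 Connection to Server 'ILMARINEN_1' terminated.
LINE_RE = re.compile(r"^\s*(\d{8}/\d{6})\.\d{3} - (U\d{8}) (.*)$")

# Message numbers taken from the log excerpts in this post. Treating exactly
# these as "disconnect" events is an assumption made for this sketch.
DISCONNECT_CODES = {"U02000010", "U02000042", "U02003044"}

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def disconnect_times(lines):
    """Yield a datetime for every log line that reports a lost connection."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) in DISCONNECT_CODES:
            yield datetime.strptime(match.group(1), "%Y%m%d/%H%M%S")


def summarize(lines):
    """Count disconnect events per (weekday name, hour of day)."""
    return Counter(
        (WEEKDAYS[t.weekday()], t.hour) for t in disconnect_times(lines)
    )


if __name__ == "__main__":
    sample = [
        "20200907/163926.447 - U02000042 Connection aborted. Error code '10053'.",
        "20200907/163926.447 - U02000010 Connection to Server 'ILMARINEN_1' terminated.",
        "20200907/163926.447 - U02000072 Connection to system 'UC4PE' initiated.",
    ]
    for (weekday, hour), count in sorted(summarize(sample).items()):
        print(f"{weekday} {hour:02d}:00  {count} disconnect message(s)")
```

    Feeding it several weeks of logs from a few agents should show whether the events really cluster in one Monday time slot, which narrows down the window to compare against backup, scan, or network maintenance schedules. Note that the timestamps are in each agent's local time.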


  • 2.  RE: Agents repeatedly disconnecting every Monday from Automic

    Broadcom Employee
    Posted Sep 08, 2020 02:22 AM
    Hi Vojtech,
    you should activate trace mode on one of the agents an hour or so before the typical connection loss time.
    I would suggest starting with the value 3 and, if that does not provide new insights, using the value 9.

    You can activate trace mode in the [Trace] section of the agent's INI file. Use tcp/ip=3 or tcp/ip=9.



    ------------------------------
    Sr. Solution Architect
    Broadcom
    ------------------------------



  • 3.  RE: Agents repeatedly disconnecting every Monday from Automic

    Posted Sep 08, 2020 03:57 AM
    Thanks a lot, Kay. I will start with a tcp/ip trace level of 3 on one of the agents. In the meantime, do you have any idea what could cause this trouble? Thanks in advance.


  • 4.  RE: Agents repeatedly disconnecting every Monday from Automic
    Best Answer

    Posted Sep 08, 2020 11:11 AM
    I would want to know exactly when and how your AE servers are backed up.  Some backup procedures include a minor stun when they complete, and sometimes a server stun of 1-2 seconds can cause agents to "disconnect".  But they reconnect the next time they try.  (Personal experience.)

    ------------------------------
    Pete Wirfs
    SAIF Corporation
    Salem Oregon USA
    ------------------------------



  • 5.  RE: Agents repeatedly disconnecting every Monday from Automic

    Posted Sep 17, 2020 04:12 AM
    Hi Kay, can you please describe the difference between debug levels 3 and 9? Which one do you recommend setting?


  • 6.  RE: Agents repeatedly disconnecting every Monday from Automic

    Broadcom Employee
    Posted Sep 17, 2020 07:29 AM
    Hi Vojtech,
    it defines the level of detail. Start with level 3; if that does not help, use level 9, but note that this may generate a lot of debug data.
    Regards
    Kay

    ------------------------------
    Sr. Solution Architect
    Broadcom
    ------------------------------



  • 7.  RE: Agents repeatedly disconnecting every Monday from Automic

    Posted Sep 22, 2020 04:27 AM
    Hello, I was able to get debug logs at tcp/ip level 3 from one of the connected Linux agents. They are here:

    https://www.dropbox.com/s/rgehpwy4ik58b7t/debug_logs.zip?dl=0

    Unfortunately, I am unable to interpret them myself. Could someone take a look at these logs and check whether they contain any information that could lead to the root cause?

    Thanks in advance! I do owe a beer to everyone helping out here :)


  • 8.  RE: Agents repeatedly disconnecting every Monday from Automic

    Posted Sep 22, 2020 07:16 AM

    Hi,

    I found the following message in your log file:

    So please check whether another process is using this port, or change the port in the agent INI file and restart the agent.
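
    A quick way to test whether a port is already taken on the agent host is to try binding a socket to it. This is a minimal Python sketch, not an Automic tool: the function name `port_is_free` and the port number 2300 in the usage block are hypothetical placeholders, since the actual port from the log message is not shown here. Run it while the agent is stopped, otherwise the agent itself will be holding the port.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now.

    Any OSError from bind (address already in use, missing permission, ...)
    is treated as "not available" for the purposes of this check.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # Hypothetical port number; replace it with the one from the agent INI file.
    agent_port = 2300
    state = "free" if port_is_free(agent_port) else "in use or not bindable"
    print(f"Port {agent_port} is {state}")
```

    If the port turns out to be occupied, tools such as `netstat` (Windows and Linux) or `ss` (Linux) can show which process owns it.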