Automic Workload Automation


Strange behavior with CP, JCP and agent reconnect

  • 1.  Strange behavior with CP, JCP and agent reconnect

    Posted Jan 13, 2020 03:08 AM
    My AE System
    12.3.0+hf.3.build.1568920495898 on Windows Server 2012 R2 with Postgresql 10.4
    DB-Agent on the same machine with version '12.3.0+build.1560758094161'

    I run this system with 2 normal CPs and 1 JCP. The MSSQL DB agent was already connected to the system and working. After a restart of that agent, the system refuses to let it reconnect and logs this error:

    20200113/063349.415 - U00003412 Agent 'MSSQL' logged on (Client connection='430').
    20200113/063349.431 - U00003590 UCUDB - DB error: 'ERROR: duplicate key value violates unique constraint "pk_mqsrv" DETAIL: Key (mqsrv_name)=(MSSQL) already exists.', '', '', ''
    20200113/063349.431 - U00003366 Connection to agent 'MSSQL' already exists (old connection '*CP002#00000007', new connection '*CP001#00000430').
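
    For anyone scanning CP logs for this pattern, here is a minimal Python sketch that pulls the agent name and the two server process names out of a U00003366 line. The regular expression is inferred only from the log lines quoted in this thread, and `parse_u3366` is a hypothetical helper name, not part of any Automic tooling.

```python
import re

# Pattern for the U00003366 message as it appears in the log excerpts in this thread.
# The connection format '*CP002#00000007' is an assumption based on those excerpts only.
U3366 = re.compile(
    r"U00003366 Connection to agent '(?P<agent>[^']+)' already exists "
    r"\(old connection '\*(?P<old_proc>[A-Z]+\d+)#(?P<old_id>\d+)', "
    r"new connection '\*(?P<new_proc>[A-Z]+\d+)#(?P<new_id>\d+)'\)"
)


def parse_u3366(line):
    """Return the agent and process names from a U00003366 line, or None."""
    m = U3366.search(line)
    if m is None:
        return None
    return {
        "agent": m["agent"],
        "old_process": m["old_proc"],  # process still holding the old connection
        "new_process": m["new_proc"],  # process the agent just tried to log on to
    }


line = (
    "20200113/063349.431 - U00003366 Connection to agent 'MSSQL' already exists "
    "(old connection '*CP002#00000007', new connection '*CP001#00000430')."
)
print(parse_u3366(line))
# {'agent': 'MSSQL', 'old_process': 'CP002', 'new_process': 'CP001'}
```

    If the old process name turns out to be the one currently running as JCP, that would match the behavior described in this thread.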


    The same happens with a JMS agent and a WebService REST agent, but not with a SOAP agent.

    I checked the CP process objects in C0000, and in the explorer view I manually shut down CP2, which was the JCP. Keeping the JCP shut down allowed the DB agent to log on to the system and connect to CP03.
    Then I started the JCP and started the REST agent, again successfully.

    To me it seems as if the new dynamic switching of server process names causes this problem. Does anybody have any insights into this problem or how to solve it?

    Thanks & regards
    Christoph


  • 2.  RE: Strange behavior with CP, JCP and agent reconnect

    Posted Apr 29, 2020 06:56 AM
    Hello Chris,
    We are using the following versions of AE and the Windows agent. Today we saw a similar issue, where we had to kill the JCPs and start them again to get the agent connected.
    AE version : 12.3.1+hf.1.build.1573752065178
    Windows Agent Version : 12.2.3+build.1558154957665
    As a follow-up, I have raised a case with Automic on this. I will post an update once I hear back from them.

    ------------------------------
    Regards,
    Prosenjit
    ------------------------------



  • 3.  RE: Strange behavior with CP, JCP and agent reconnect

    Posted Apr 29, 2020 07:46 AM
    Hi

    It seems that my issue was similar, if not the same.

    https://community.broadcom.com/enterprisesoftware/communities/community-home/digestviewer/viewthread?GroupId=1435&MessageKey=3f372ef0-bd9a-4ad1-826f-ea3f898e5f6b&CommunityKey=2e1b01c9-f310-4635-829f-aead2f6587c4&tab=digestviewer&ReturnUrl=%2fenterprisesoftware%2fcommunities%2fcommunity-home%2fdigestviewer%3fcommunitykey%3d2e1b01c9-f310-4635-829f-aead2f6587c4%26tab%3ddigestviewer

    We had this issue twice: once when starting a system with a cloned DB, and once after a crashed OS server was restarted.

    The problem was caused by the JCP and a regular CP swapping their numbers, i.e. CP2 had been a CP and was the JCP afterwards, and vice versa.

    The agent connected to any CP, which found an existing connection to CP2 and forwarded the connection to that CP.
    Unfortunately, that was now the JCP, which wasn't able to handle the agent connection and dumped (but kept running).

    Within the agent log there was no indication that anything was wrong; the log simply stopped after the Environment info.

    Solution: stop the JCP, start/restart/reconnect all affected agents, then start the JCP.

    KR Wolfgang

    ------------------------------
    Support Info:
    if you are using one of the latest version of UC4 / AWA / One Automation please get in contact with Support to open a ticket.
    Otherwise update/upgrade your system and check if the problem still exists.
    ------------------------------



  • 4.  RE: Strange behavior with CP, JCP and agent reconnect

    Posted Apr 29, 2020 07:57 AM
    Hi Wolfgang, 

    A second solution could be, as @Pete Wirfs wrote:

    delete from mqsrv where mqsrv_name='<agentname>'

    https://community.broadcom.com/enterprisesoftware/communities/community-home/digestviewer/viewthread?MessageKey=3077e3e0-909f-43c9-9711-ce793c694615&CommunityKey=2e1b01c9-f310-4635-829f-aead2f6587c4&tab=digestviewer#bm3077e3e0-909f-43c9-9711-ce793c694615

    ------------------------------
    Thx & rgds
    Christian
    ------------------------------



  • 5.  RE: Strange behavior with CP, JCP and agent reconnect

    Posted Apr 29, 2020 08:03 AM
    Hi Christian,

    Absolutely correct, BUT: as UC4 dinosaurs we learned to never fiddle around in the UC4 DB manually.

    :-)

    cheers, Wolfgang




  • 6.  RE: Strange behavior with CP, JCP and agent reconnect

    Posted Apr 29, 2020 09:36 AM

    Hello Everyone,

    While troubleshooting, I stopped the agent, deleted the affected Windows agent from client 0, and started the agent again 5 minutes later.

    I was hoping the stuck connection would be dropped and the agent would connect, but that did not happen.

    I still saw the log message below. Later I stopped and started CP02 (which was the JCP).

    U00003366 Connection to agent 'XYZ' already exists (old connection '*CP002#00000081', new connection '*CP009#00013326')



    ------------------------------
    Regards,
    Prosenjit
    ------------------------------



  • 7.  RE: Strange behavior with CP, JCP and agent reconnect

    Posted Apr 29, 2020 10:58 AM
    LEGAL DISCLAIMER: I take no personal responsibility for any damage you may cause to your AE database by manipulating its contents via SQL.  Test everything carefully!  Pete

    Stop the agent via servicemanagerdialog again, and run this query:
    select * from mqsrv where mqsrv_name='<agentname>'
    This query should return nothing when the agent is stopped (you can confirm this by stopping and starting a healthy agent).

    If it returns a row even though the agent is stopped, you'll need to delete the row like so:
    delete from mqsrv where mqsrv_name='<agentname>'

    Then use servicemanagerdialog to start the agent again and it should reconnect.
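
    The check-then-delete sequence above can be sketched in Python as follows. This is only an illustration: it runs against an in-memory SQLite stand-in with a single `mqsrv_name` column, the helper name `remove_stale_mqsrv_row` is made up, and the real AE table has more columns and lives in your production database. The same caution about touching the AE database by hand applies.

```python
import sqlite3


def remove_stale_mqsrv_row(conn, agent_name):
    """Delete the mqsrv row for a STOPPED agent; return the number of rows deleted."""
    cur = conn.cursor()
    # Parameter binding handles the quoting. sqlite3 uses '?'; other drivers differ.
    cur.execute("SELECT COUNT(*) FROM mqsrv WHERE mqsrv_name = ?", (agent_name,))
    (count,) = cur.fetchone()
    if count == 0:
        return 0  # expected state for a stopped agent: nothing to clean up
    cur.execute("DELETE FROM mqsrv WHERE mqsrv_name = ?", (agent_name,))
    conn.commit()
    return cur.rowcount


# Stand-in database (assumption): only the one column named in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mqsrv (mqsrv_name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO mqsrv VALUES ('MSSQL')")
conn.commit()

print(remove_stale_mqsrv_row(conn, "MSSQL"))  # 1: the stale row was removed
print(remove_stale_mqsrv_row(conn, "MSSQL"))  # 0: nothing left to delete
```

    Running the select first keeps the delete from firing blindly, which is the point of the two-step approach.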


    My observation has been that this is a new issue with V12.  We never had this issue with V11.

    ------------------------------
    Pete
    ------------------------------



  • 8.  RE: Strange behavior with CP, JCP and agent reconnect
    Best Answer

    Posted Apr 29, 2020 11:57 PM

    Thanks, Peter. You are right, it is a bug. Below is the response from Automic.

    --------------------------------------------------------------------------
    Hi, This was identified as a bug and is fixed with 12.3.2. In the meantime you can use the steps below as a workaround:
    1. Stop JCP
    2. Delete agent
    3. Restart agent
    4. Start JCP
    For the long term solution please upgrade the Automation Engine to the latest (at the very minimum version 12.3.2).
    --------------------------------------------------------------------------



    ------------------------------
    Regards,
    Prosenjit
    ------------------------------