Automic Workload Automation

  • 1.  Agents not shown active after DB clone

    Posted Apr 17, 2020 04:14 AM
    Hi guys,

    A question for those in the know: why do the agents not show up as running in AWI?

    We performed a DB clone from the test environment to the sandbox, which went fine (a COLD start was performed).
    We have done this successfully several times in the past (V11.2.X).

    Now I am restarting the agents as usual after renewing the transfer key, but some SQL agents are still shown as inactive in AWI (AE V12.3.0).
    In SMGR and in the log I can see that they connect to the system, as shown below.

    20200417/093931.489 - ------------------------------------------------------------------------------------------
    20200417/093931.490 - U02000071 Current directory: /opt/uc4/V12_3/agents/postgres/bin
    20200417/093931.490 - U02000066 Host information: Host name='XYZ', IP address='XYZ'
    20200417/093931.569 - U02000153 The JVM Option HeapDumpOnOutOfMemoryError is enabled.
    20200417/093931.577 - U02000072 Connection to system 'GMUC4' initiated.
    20200417/093931.578 - U02000011 Connection to Server 'XYZ' initiated.
    20200417/093931.591 - U02000004 Connection to Server 'GMUC4#CP004' successfully created.
    20200417/093931.591 - U02000075 CP Server 'GMUC4#CP004' has '1' client connections.
    20200417/093931.658 - U02000020 Environment: Hardware = 'SQLPOSTGRESQL'.
    20200417/093931.659 - U02000021 Environment: Software = 'SQLPOSTGRESQL'.
    20200417/093931.659 - U02000022 Environment: SW version = '1.0'.

    The log stops here.


    What exactly I did:

    1st try
    renewed the transfer key
    stopped and started the agent

    2nd try
    deleted ucxjsqlx.kstr
    renewed the transfer key
    stopped and started the agent

    3rd try
    deleted ucxjsqlx.kstr
    deleted the agent object in client 0
    started the agent

    The agents that came up correctly have the following additional entries in their log:
    20200416/124148.820 - U02000017 The check interval for 'Jobs' has been set to '60' seconds.
    20200416/124148.820 - U02000017 The check interval for 'Server' has been set to '660' seconds.
    20200416/124148.820 - U02000017 The check interval for 'Reconnect' has been set to '600' seconds.
    20200416/124148.821 - U02000017 The check interval for 'Report' has been set to '60' seconds.
    20200416/124148.878 - U07001001 Charset used by the Agent: 'ISO-8859-15'
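
The difference between the two kinds of log can be checked mechanically. The following is a minimal Python sketch, assuming the log format shown in this post (timestamp, ` - `, message number). The function name `agent_login_state` and its state labels are made up for illustration, and treating U02000017 as the marker for a completed logon is an assumption based only on the logs quoted here.

```python
import re

# Message number as it appears in the agent log lines, e.g. "U02000004".
MSG_RE = re.compile(r"\b(U\d{8})\b")

def agent_login_state(log_text):
    """Classify an agent log as 'complete', 'stuck', or 'not connected'."""
    seen = set(MSG_RE.findall(log_text))
    connected = "U02000004" in seen   # connection to the CP was created
    configured = "U02000017" in seen  # check intervals set (assumed logon marker)
    if connected and configured:
        return "complete"
    if connected:
        return "stuck"
    return "not connected"

stuck_log = """\
20200417/093931.591 - U02000004 Connection to Server 'GMUC4#CP004' successfully created.
20200417/093931.659 - U02000022 Environment: SW version = '1.0'.
"""
healthy_log = stuck_log + (
    "20200416/124148.820 - U02000017 The check interval for 'Jobs' "
    "has been set to '60' seconds.\n"
)

print(agent_login_state(stuck_log))
print(agent_login_state(healthy_log))
```

Run against every agent log after a clone, a check like this would single out the agents that connect to a CP but never finish the logon.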

    Only SQL agents are affected, and although I deleted their keystore file and the agent object in client 0 itself,
    I cannot get them to start in a way that makes them show up in client 0 in AWI.

    Any hints?

    many THX
    Wolfgang


    ------------------------------
    Support Info:
    if you are using one of the latest version of UC4 / AWA / One Automation please get in contact with Support to open a ticket.
    Otherwise update/upgrade your system and check if the problem still exists.
    ------------------------------


  • 2.  RE: Agents not shown active after DB clone
    Best Answer

    Posted Apr 20, 2020 09:43 AM
    Just for the sake of completeness: my colleague found the cause. He opened the explanation with the words "you won't believe it..."
    And yes, I did not believe it.

    What you need to know beforehand:
    Before the DB export, CP5 was an ordinary, real CP.
    After the DB import, CP5 was a JCP.

    We did a COLD start.
    Then we renewed all transfer keys, and some agents stayed offline (in AWI), although their logs looked fine at first glance.

    The cause: the CP the agent connected to (new CP2) still knew about the agent's old connection to its previous CP (old CP5) and tried to hand the agent over to that same CP (new CP5 = JCP).
    But CP5 was now a JCP, which could not handle the logon attempt. It wrote a memory dump and did nothing else, and nothing appeared in the agent log.

    RA WS agent log snippet:
    20200420/130309.493 - U02000072 Connection to system 'UC4server' initiated.
    20200420/130309.493 - U02000011 Connection to Server 'UC4server:Port' initiated.
    20200420/130309.509 - U02000004 Connection to Server 'UC4server#CP002' successfully created.
    20200420/130309.510 - U02000075 CP Server 'UC4server#CP002' has '1' client connections.
    20200420/130309.547 - U02000020 Environment: Hardware = 'CIT'.
    20200420/130309.547 - U02000021 Environment: Software = 'CIT'.
    20200420/130309.547 - U02000022 Environment: SW version = '7.2.0+build.1'.
    20200420/130515.013 - U02000017 The check interval for 'Jobs' has been set to '60' seconds.
    20200420/130515.013 - U02000017 The check interval for 'Server' has been set to '660' seconds.
    20200420/130515.013 - U02000017 The check interval for 'Reconnect' has been set to '600' seconds.
    20200420/130515.014 - U02000017 The check interval for 'Report' has been set to '60' seconds.
    20200420/130515.023 - U02013327 Timestamp of the Solution in the local file system: 'Thu Feb 14 13:32:38 CET 2019'
    20200420/130515.023 - U02013326 Timestamp of the Solution in the database: 'Thu Feb 14 13:32:38 CET 2019'

    CP002 log snippet:
    20200420/130309.527 - U00003412 Agent 'RAWS02' logged on (Client connection='68').
    20200420/130309.547 - U00003366 Connection to agent 'RAWS02' already exists (old connection '*CP005#00000049', new connection '*CP002#00000068').
    20200420/130309.548 - U00003490 Connection to 'UC4server:Port' initiated, client connection '69(30)'
    20200420/130309.557 - U00003489 Server 'UC4server#CP005' logged on (Client connection='69').
    20200420/130514.501 - U00003316 Zero Downtime information: MixedMode='N', base MQSet='1', active MQSet='1', own MQSet='1', MQSet PWP='1'.
    20200420/130514.553 - U00003472 Connection to Server 'UC4server#CP005' has been closed.
    20200420/130514.553 - U00003407 Client connection '69(29)' from 'UC4server:Port' has logged off from the Server.
    20200420/130530.403 - U00003397 Agent 'RAWS02' logged off (client connection='68').
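
The telltale line in the CP002 log is U00003366, which names both the old and the new connection. A minimal Python sketch for pulling those out of a CP log follows, assuming the message wording and the connection format (`*CP005#00000049`) are exactly as quoted above. The function name `find_handovers` is hypothetical.

```python
import re

# Pattern for the U00003366 message as quoted in the CP002 log.
# A connection looks like '*CP005#00000049': process name, '#', connection id.
STALE_RE = re.compile(
    r"U00003366 Connection to agent '(?P<agent>[^']+)' already exists "
    r"\(old connection '\*(?P<old_cp>\w+)#(?P<old_id>\d+)', "
    r"new connection '\*(?P<new_cp>\w+)#(?P<new_id>\d+)'\)"
)

def find_handovers(log_text):
    """Return (agent, old_cp, new_cp) for each U00003366 line where the CPs differ."""
    result = []
    for m in STALE_RE.finditer(log_text):
        if m["old_cp"] != m["new_cp"]:
            result.append((m["agent"], m["old_cp"], m["new_cp"]))
    return result

cp_log = (
    "20200420/130309.547 - U00003366 Connection to agent 'RAWS02' already exists "
    "(old connection '*CP005#00000049', new connection '*CP002#00000068').\n"
)
print(find_handovers(cp_log))
```

Any old CP reported this way that is now a JCP in the cloned system is a candidate for the behavior described in this post.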

    CP005 log snippet:
    20200420/130309.549 - 25 U00003406 Client connection '16' from 'UC4server:Port' has logged on to the Server.
    20200420/130309.564 - 25 U00009907 Memory dump 'Unknown message from 16' (Address='n/a', Length='000099')
    20200420/130309.566 - 25 00000000 F4600052 41575330 32202020 20202020 >ô`.RAWS02       <
    20200420/130309.566 - 25 00000010 20202020 20202020 20202020 20202020 >                <
    20200420/130309.566 - 25 00000020 2020202A 43503030 35233030 30303030 >   *CP005#000000<
    20200420/130309.566 - 25 00000030 34392020 20202020 20202020 20202020 >49              <
    20200420/130309.566 - 25 00000040 2020202A 43503030 32233030 30303030 >   *CP002#000000<
    20200420/130309.566 - 25 00000050 36382020 20202020 20202020 20202020 >68              <
    20200420/130309.566 - 25 00000060 202020 >   <
    20200420/130309.568 - 29 U00003489 Server 'UC4server#CP002' logged on (Client connection='16').
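
The memory dump is readable once the hex column is decoded. Below is a minimal Python sketch that does this for the dump above. Reading the three tokens as agent name, old connection, and new connection is an inference from the matching U00003366 line in the CP002 log, not a documented message layout, and the function name `decode_dump` is hypothetical.

```python
# Hex words copied from the CP005 memory dump (offsets 0x00 to 0x60).
DUMP_HEX = """
F4600052 41575330 32202020 20202020
20202020 20202020 20202020 20202020
2020202A 43503030 35233030 30303030
34392020 20202020 20202020 20202020
2020202A 43503030 32233030 30303030
36382020 20202020 20202020 20202020
202020
"""

def decode_dump(hex_text):
    """Return the byte length and the printable tokens of a hex dump."""
    raw = bytes.fromhex("".join(hex_text.split()))
    # Replace non-printable bytes with a space, then split on whitespace.
    text = "".join(chr(b) if 0x20 <= b < 0x7F else " " for b in raw)
    return len(raw), text.split()

length, tokens = decode_dump(DUMP_HEX)
print(length)       # matches Length='000099' in the dump header
print(tokens[-3:])  # agent name, old connection, new connection (assumed meaning)
```

The decoded payload carries the same agent and connection names as the U00003366 line, which fits the explanation that CP002 forwarded the handover request to CP005.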

    After stopping the new CP5 (JCP) and restarting the agents, everything worked well.
    I think the connection handling of the CPs (or JCPs) is missing a check for this case, and the agent logs are missing a corresponding message.

    KR Wolfgang




  • 3.  RE: Agents not shown active after DB clone

    Posted Apr 20, 2020 11:52 AM
    Makes me wonder if this problem might be a cousin of the "agents won't reconnect after AE reboot for maintenance" problem, where the root cause is that AE leaves connection data in the mqsrv table that has to be cleared for proper operation.

    ------------------------------
    Pete
    ------------------------------



  • 4.  RE: Agents not shown active after DB clone

    Posted Apr 21, 2020 05:04 AM
    Hi Pete,

    That sounds plausible.

    The best practice would be to start the AE, wait until everything is connected, and only then start the JCPs.

    On the other hand, that is not ideal either, because many functions are not available without a JCP...

    KR Wolfgang
