Automic Workload Automation

  • 1.  Agent shown as active although server is disconnected/isolated

    Posted Sep 22, 2020 10:59 AM
    Hello

    Since V12 we have had the problem that agents do not disconnect in the event of network problems, or that they remain shown as online in the AE for a very long time.
    Jobs are started on such an agent (because it is officially online) but then get stuck in the 'Start initiated' status.

    In earlier tests (with the network of a single agent switched off), we adjusted four TCP kernel settings which, according to support, are now involved in agent handling.
    In my opinion, the decisive factor was the value of tcp_retries2; with it we could actually influence the disconnect times.
    A value of 4 caused a disconnect after about 60 seconds.

    /proc/sys/net/ipv4/tcp_keepalive_time = 1200
    /proc/sys/net/ipv4/tcp_keepalive_intvl = 30
    /proc/sys/net/ipv4/tcp_keepalive_probes = 3
    /proc/sys/net/ipv4/tcp_retries2 = 4
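As a rough cross-check of the four values above, the following Python sketch estimates how long the Linux kernel would take to give up on a dead peer. It rests on assumptions not stated in the post: the keepalive timers only apply if the application enables SO_KEEPALIVE on the socket, tcp_retries2 only matters while unacknowledged data is in flight, and the retransmission timeout is modelled as starting at 200 ms and doubling up to a 120 s cap (the usual Linux defaults). The function names are hypothetical, and the figures are a model, not a measurement. The 60 seconds observed in the test may well involve other factors.

```python
def keepalive_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Idle time until keepalive declares a silent peer dead.

    The kernel waits keepalive_time seconds of idleness, then sends
    keepalive_probes probes spaced keepalive_intvl seconds apart.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


def retries2_timeout_seconds(retries2, rto_min=0.2, rto_max=120.0):
    """Hypothetical time until unacknowledged data causes a connection abort.

    Assumes the retransmission timeout starts at rto_min and doubles on
    every retry, capped at rto_max (assumed Linux defaults).
    """
    return sum(min(rto_min * 2 ** i, rto_max) for i in range(retries2 + 1))


if __name__ == "__main__":
    # Values from the post above.
    print(keepalive_detection_seconds(1200, 30, 3))   # idle connection
    print(round(retries2_timeout_seconds(4), 1))      # data in flight
    print(round(retries2_timeout_seconds(15), 1))     # kernel default of 15
```

Under these assumptions an idle connection would only be dropped after 1200 + 3 * 30 = 1290 seconds (about 21.5 minutes), so keepalive alone cannot explain a disconnect within a minute, nor agents that stay online for more than two hours.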

    Last weekend we ran a failsafe test again (loc. 1 isolated, everything running on loc. 2).
    1700 agents are connected; about 800 of them should have gone offline.

    After 2 hours only about 100 were gone; the rest were still shown as active in the AE, although the servers were unreachable (as expected).
    The entries in the TCP config files showed no effect.
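One thing that may be worth ruling out is whether the intended values are actually live on the host in question. The sketch below is a minimal, hypothetical helper (the names `read_tcp_settings` and `diff_settings` are illustrative and not part of any Automic tooling) that reads the four entries from /proc/sys/net/ipv4 and reports those that differ from the expected values. It assumes a Linux host and read access to /proc.

```python
from pathlib import Path

# The four kernel settings discussed in this thread.
TCP_KEYS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
    "tcp_retries2",
)


def read_tcp_settings(base="/proc/sys/net/ipv4"):
    """Return {name: int or None}; None marks a missing or unreadable entry."""
    settings = {}
    for key in TCP_KEYS:
        try:
            settings[key] = int((Path(base) / key).read_text().strip())
        except (OSError, ValueError):
            settings[key] = None
    return settings


def diff_settings(actual, expected):
    """Return {name: (actual, expected)} for every value that does not match."""
    return {
        key: (actual.get(key), want)
        for key, want in expected.items()
        if actual.get(key) != want
    }


if __name__ == "__main__":
    expected = {
        "tcp_keepalive_time": 1200,
        "tcp_keepalive_intvl": 30,
        "tcp_keepalive_probes": 3,
        "tcp_retries2": 4,
    }
    print(diff_settings(read_tcp_settings(), expected))
```

An empty result would suggest that the values are in effect on that host, in which case the cause more likely lies elsewhere.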

    One of our systems:
    1700 agents, mixed versions 12.2.1, 12.2.4, 12.3.0, 12.3.1 ...
    AE: 12.3.1+build.157046751939

    Does anyone know this problem, and has anyone perhaps already solved it?
    Thanks in advance
     
    Carsten

    ------------------------------
    Application Analyst Senior
    Postbank Systems
    ------------------------------


  • 2.  RE: Agent shown as active although server is disconnected/isolated

    Broadcom Employee
    Posted Sep 30, 2020 07:00 AM
    Hi, basically the AE and Agents use their own algorithm for the KeepAlive timer. Maybe check the KEEP_ALIVE parameter in UC_HOSTCHAR_DEFAULT and try to decrease the value?

    ------------------------------
    Product Manager - Automation
    CA Technologies, A Broadcom Company
    ------------------------------



  • 3.  RE: Agent shown as active although server is disconnected/isolated

    Posted Oct 01, 2020 02:46 AM
    Hi
    In our test system, where we got the positive results with one agent, the value is 180.
    There the agent logged off after approx. 1 minute. In all other environments the value is 600.
    During the FLSE test (failsafe test), almost nothing had disconnected after 2 hours!
    I don't think that is the cause ...


    ------------------------------
    Application Analyst Senior
    Postbank Systems
    ------------------------------