Automic Workload Automation

  • 1.  After upgrade from 11.2 to 12.3, command style unix jobs have 'ENDED_VANISHED' status

    Posted Nov 20, 2019 11:06 AM

    We recently upgraded from v11.2 to v12.3 and are having a problem with Unix jobs that we run with job type 'Command'. We have many jobs set up like this, and they ran fine on v11.2. On v12.3, even though the job log shows that the job finished successfully, it ends up with a status of 'ENDED_VANISHED'. The jobs do not fail every time, and when they do fail, we can restart the job and it finishes successfully.

     

    If we change the job type to 'Shell Script' and rework our commands to run that way, we don't seem to have this problem. However, this would affect a large number of jobs, and we do not want to make the changes unless absolutely necessary. Is there something we may have missed in our configuration that would take care of this?

     

    Here's a link to the documentation where it describes these two job types:

    https://docs.automic.com/documentation/webhelp/english/AA/12.3/DOCU/12.3/Automic%20Automation%20Guides/help.htm#AWA/Objects/obj_job_UNIX.htm?Highlight=unix%20job

    We see this in the unix agent log:

    MAIN-THREAD      20191119/133356.354 start_job <-- (terminated normally, Result (pid): '50754')

    MAIN-THREAD      20191119/133356.354 process_aktj: result of function start_job()-call : 50754

    MAIN-THREAD      20191119/133356.354 process_aktj: job was started with fork() call!

    MAIN-THREAD      20191119/133356.354 CheckPidJEntry(search pid  50754) -->

    MAIN-THREAD      20191119/133356.354 CheckPidJEntry <-- (terminated normally, entry unavailable)

    MAIN-THREAD      20191119/133356.354 process_aktj: job-entry with process ID '50754' was not found!
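
    To see how often this happens and for which PIDs, the agent log could be scanned for the "was not found" message. The following is only a rough sketch: the line layout is assumed from the excerpt above, and `find_lost_pids` is a made-up helper name, not part of any Automic tooling.

```python
import re

# Layout assumed from the log excerpt above: thread name, timestamp, message.
NOT_FOUND = re.compile(
    r"^(?P<thread>\S+)\s+(?P<stamp>\d{8}/\d{6}\.\d{3})\s+process_aktj: "
    r"job-entry with process ID '(?P<pid>\d+)' was not found!"
)


def find_lost_pids(lines):
    """Return (timestamp, pid) pairs for every 'was not found' log line."""
    hits = []
    for line in lines:
        match = NOT_FOUND.match(line.strip())
        if match:
            hits.append((match.group("stamp"), int(match.group("pid"))))
    return hits


# Two lines copied from the excerpt above; only the second one should match.
sample = [
    "MAIN-THREAD      20191119/133356.354 process_aktj: job was started with fork() call!",
    "MAIN-THREAD      20191119/133356.354 process_aktj: job-entry with process ID '50754' was not found!",
]
print(find_lost_pids(sample))
```

    Running it over a whole log file (for example `find_lost_pids(open(path))`, with `path` pointing at the agent log) would give a list that can be compared against the run times of the vanished jobs.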



    ------------------------------
    Cloud Engineer
    ------------------------------


  • 2.  RE: After upgrade from 11.2 to 12.3, command style unix jobs have 'ENDED_VANISHED' status

    Posted Nov 20, 2019 11:33 AM
    This is only for Unix/Linux jobs I assume. Do you start all these jobs with the same user?

    All of this is still pretty much a stab in the dark, but try checking whether the "dot files" for the user the agent runs as contain anything out of the ordinary. ENDED_VANISHED would probably indicate that the agent has lost track of the OS process it started. Since you say that it only fails for "command" and not "shell script", I'd check the .profile, .bash_profile etc. for the user used in the login objects, and also for the user the agent actually runs as. Also check the "generic" equivalent files in /etc. Check if there's something that could cause a uid switch or a subshell to be spawned when the user logs in. You'll probably want your user profiles for Automic job execution to be clean of those things.

    You can also make a long running job with a command such as "sleep 500" and bring up a root shell on the agent machine and look at how that executes, with ps, pstree and (advanced) strace -f. See if the child process (the "sleep 500") stays alive or if it dies somehow, warranting an actual "ENDED_VANISHED" message. If it dies, one would need to find out why.
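
    If you would rather script the "is the child still there?" check than watch ps by hand, something like the following could work. It is a minimal sketch, not anything Automic ships: `pid_alive` is a hypothetical helper, and the `sleep 500` child started here merely stands in for the process the agent would start. Signal 0 only tests for existence, so nothing is actually sent to the process.

```python
import os
import subprocess


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


# Stand-in for the job's child process; in practice you would pass the PID
# reported in the agent log instead of starting your own child.
child = subprocess.Popen(["sleep", "500"])
print(pid_alive(child.pid))

child.terminate()
child.wait()  # reap the child so the PID is really gone
print(pid_alive(child.pid))
```

    Note that an exited child that has not yet been reaped by its parent (a zombie) still counts as existing for this check, so ps or pstree remain the better tools for seeing the actual process state.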

    If jobs only fail occasionally and not always, however, I could also imagine that the fork to a child process fails. You might need to use strace to figure out why that is; in theory, your available PIDs or your open files could be exhausted. In that vein, also check if there is a ulimit set. Process spawning could fail occasionally because you hit the limit of open files/file handles. I wager that might also produce an ENDED_VANISHED (though with proper programming at Automic's end it should not). You probably want ulimit -a to effectively say "unlimited" for open files and such.
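
    For what it's worth, the same limits that ulimit -a prints can also be read from a small script, which is handy if you want to run it as a job through the agent and see the limits the job actually inherits. This is just a sketch using Python's standard resource module; `current_limits` and `describe` are names I made up, and RLIMIT_NPROC is looked up defensively because not every Unix exposes it.

```python
import resource


def describe(value):
    """Render a limit the way ulimit does: a number or 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the limits relevant to fork failures."""
    wanted = {
        "open files": "RLIMIT_NOFILE",
        "max user processes": "RLIMIT_NPROC",  # not available on every Unix
    }
    limits = {}
    for label, const_name in wanted.items():
        const = getattr(resource, const_name, None)
        if const is None:
            continue
        soft, hard = resource.getrlimit(const)
        limits[label] = (describe(soft), describe(hard))
    return limits


for label, (soft, hard) in current_limits().items():
    print(f"{label}: soft={soft} hard={hard}")
```

    The soft value is the one that applies to the running process; if it differs between an interactive shell and a job started by the agent, the agent's start-up environment is the place to look.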

    You can also enable a trace (=9 is maximum) on the agent, in the Admin view's advanced options for that particular execution agent. The trace files might hold clues.

    Hth,
    Carsten


  • 3.  RE: After upgrade from 11.2 to 12.3, command style unix jobs have 'ENDED_VANISHED' status

    Posted Nov 20, 2019 04:00 PM
    Carsten, thank you for your input. In our migration from 11.2 to 12.3, we are using the same user IDs as before. Another thing to note is that when a vanished job is re-run, it typically succeeds, which leads me to think it's not a .bash_profile issue. We are continuing to explore those options, though, particularly the file limit idea! I would also like to add a screenshot: some of our WPs are not being assigned ports, which is probably not ideal. I'm a little stumped as to how to get these port assignments added.


    ------------------------------
    Cloud Engineer
    ------------------------------



  • 4.  RE: After upgrade from 11.2 to 12.3, command style unix jobs have 'ENDED_VANISHED' status
    Best Answer

    Broadcom Employee
    Posted Nov 21, 2019 05:52 AM
    Hi @Jennifer LeBlanc,
    There are some changes in V12.3 around port assignments. In V12.3, the PWP is the only WP that must have a port assigned.

    This information can be found in the incompatibilities checklist between Version 12.2 and 12.3 in the documentation.
    We always list any changes that could affect upgrading customers between all the versions - Checking for incompatibilities
     
    David



    ------------------------------
    Senior Product Line Manager - Automation
    CA Technologies, A Broadcom Company
    ------------------------------



  • 5.  RE: After upgrade from 11.2 to 12.3, command style unix jobs have 'ENDED_VANISHED' status

    Posted Nov 21, 2019 09:17 AM
    Thanks David, much appreciated!

    ------------------------------
    Cloud Engineer
    ------------------------------