Automic Workload Automation

  • 1.  AE v9: KillSignal = SIGTERM (too wimpy) or SIGKILL (too mean). Anyone have a better solution?

    Posted Mar 14, 2014 04:13 PM

    Our AppWorx installation (I'm not sure if it was a custom script or a standard feature) had a Unix/Linux kill script that tried the more graceful "kill -15" signal first, then moved on to "kill -9" if the process or its children hadn't actually ended after a few seconds.
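    Here is a minimal sketch of that AppWorx-style behavior for a single PID. It assumes bash and the procps `ps` found on most Linux systems. The names `escalating_kill` and `is_alive` and the 5-second default grace period are hypothetical choices for illustration, not an AppWorx or Automic feature. This version does not touch child processes.

```shell
#!/usr/bin/env bash
# Sketch of a "SIGTERM first, SIGKILL if needed" kill for ONE PID.
# escalating_kill / is_alive are hypothetical helper names; the default
# 5-second grace period is an arbitrary assumption.

# Succeeds if the PID exists and is not a zombie (a zombie has already
# exited and is only waiting for its parent to reap it).
is_alive() {
  local stat
  stat=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
  case $stat in
    '' | *Z*) return 1 ;;
  esac
  return 0
}

# Usage: escalating_kill PID [GRACE_SECONDS]
# Returns 0 once the process is gone, 1 if it survived even SIGKILL.
escalating_kill() {
  local pid=$1 grace=${2:-5} waited=0
  is_alive "$pid" || return 0
  kill -TERM "$pid" 2>/dev/null || true     # polite request first
  while [ "$waited" -lt "$grace" ]; do
    is_alive "$pid" || return 0             # exited gracefully
    sleep 1
    waited=$((waited + 1))
  done
  echo "PID $pid still running after ${grace}s; sending SIGKILL" >&2
  kill -KILL "$pid" 2>/dev/null || true     # last resort
  sleep 1
  if is_alive "$pid"; then return 1; fi
  return 0
}
```

    For example, `escalating_kill 12345 10` would give PID 12345 ten seconds to exit after SIGTERM before sending SIGKILL. Zombies count as already dead, so the loop does not keep waiting on a process that has exited but has not been reaped yet.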

    We've had several occasions lately where the child processes continued to do work (deleting records, no less), even after the job was in ENDED_CANCEL status.

    So we'll switch to SIGKILL all the time, but since it doesn't let processes shut down gracefully, it's not ideal. Has anyone developed a workaround more like the one I described above?



  • 2.

    Posted Mar 14, 2014 05:45 PM

    Hi Jessica,

    Sending a -9 kill to a process (or its children) that is connected to a database is always risky. Because of the client-server nature of the connection, a process could inadvertently be left spinning and continue doing work.

    To kill DB-related jobs/processes: the best way to kill a database process is to stop the session in the database itself first, then clean up the UNIX processes. I've never used an automated script that can kill a DB session when all you know is a UNIX process ID; I've only done that manually.

    A script could be written, though. If you want to try, look at: http://www.dba-oracle.com/tips_killing_oracle_sessions.htm. It will take time to perfect the automation, and you could end up with a messed-up DB. Fun Sunday afternoon activity.
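    As a hedged starting point for such a script, this sketch only prints Oracle SQL for a DBA to review. It does not connect to or change the database. It assumes an Oracle database and assumes that `V$SESSION.PROCESS` holds the client's numeric OS process ID. That depends on the client, so check it against your environment. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: PRINT (never run) Oracle SQL for killing the sessions that a
# given client OS process opened. oracle_kill_sql_for_pid is a hypothetical
# helper; it ASSUMES V$SESSION.PROCESS holds that numeric client PID.
oracle_kill_sql_for_pid() {
  local pid=$1
  # Accept digits only, so nothing unexpected ends up inside the SQL text.
  case $pid in
    '' | *[!0-9]*)
      echo "usage: oracle_kill_sql_for_pid NUMERIC_PID" >&2
      return 2
      ;;
  esac
  # Unquoted heredoc so $pid expands; \$ keeps the v$session name literal.
  cat <<EOF
-- 1. Review the sessions opened by client process $pid:
SELECT sid, serial#, username, program, status
  FROM v\$session
 WHERE process = '$pid';

-- 2. Generate the kill statements, review them, then run them:
SELECT 'ALTER SYSTEM KILL SESSION ''' || sid || ',' || serial# || ''' IMMEDIATE;'
  FROM v\$session
 WHERE process = '$pid';
EOF
}
```

    Run it as `oracle_kill_sql_for_pid 12345`, paste the first query into SQL*Plus, and run the generated ALTER SYSTEM statements only once the sessions look right. Then clean up the UNIX processes, in the order described above.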

    For UNIX/Linux processes only: if SIGTERM (-15) doesn't work and you need to resort to SIGKILL (-9), run this command against the PID so that it kills every process in the same session, which normally includes the whole tree (in fact, you should do this for the -15 too):

    kill $(ps -o pid= -s $(ps -o sess --no-heading --pid $PID))

    As written, the command sends SIGTERM (kill -15). Add -9 to send SIGKILL instead:

    kill -9 $(ps -o pid= -s $(ps -o sess --no-heading --pid $PID))
    Definitely test this one out. It's always worked for me, but it's one typo away from causing a lot of damage. For example, if $PID shares a session with your own login shell, that shell gets killed too.
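    The same escalation can be applied to the whole session that those one-liners target. Here is a minimal sketch, assuming bash and procps `ps`. `session_escalating_kill` is a hypothetical name. The sketch refuses to act on its own session. It also re-lists the session's members on every check, so processes forked after the SIGTERM are still caught.

```shell
#!/usr/bin/env bash
# Sketch: SIGTERM, wait, then SIGKILL every process in the session of a PID,
# using the same ps session lookup as the one-liners above.
# session_escalating_kill is a hypothetical name; requires procps ps.

# Print the PIDs of the session's members that are not zombies.
live_session_members() {
  ps -o pid= -o stat= -s "$1" 2>/dev/null | awk '$2 !~ /Z/ { print $1 }'
}

# Usage: session_escalating_kill PID [GRACE_SECONDS]
# Returns 0 when the session is empty, 1 if something survived SIGKILL,
# and 2 if PID is in this shell's own session (which it refuses to kill).
session_escalating_kill() {
  local target=$1 grace=${2:-5} waited=0 sid mysid pids
  sid=$(ps -o sess= -p "$target" 2>/dev/null | tr -d ' ')
  [ -n "$sid" ] || return 0                  # target already gone
  mysid=$(ps -o sess= -p "$$" | tr -d ' ')
  if [ "$sid" = "$mysid" ]; then
    echo "refusing: PID $target shares this shell's session ($sid)" >&2
    return 2
  fi
  pids=$(live_session_members "$sid")
  [ -n "$pids" ] || return 0
  kill -TERM $pids 2>/dev/null || true       # graceful request first
  while [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
    # Re-list each time so processes forked after SIGTERM are seen too.
    [ -n "$(live_session_members "$sid")" ] || return 0
  done
  pids=$(live_session_members "$sid")
  [ -n "$pids" ] || return 0
  echo "session $sid still running after ${grace}s; SIGKILL to:" $pids >&2
  kill -KILL $pids 2>/dev/null || true
  sleep 1
  [ -z "$(live_session_members "$sid")" ]
}
```

    For example, `session_escalating_kill 12345 10` sends SIGTERM to every process in the session of PID 12345, then sends SIGKILL to any that are still running ten seconds later.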

    Best regards,
    Marc

    Extra note: you can also kill a whole process group in UNIX/Linux. Use this with caution, since you could easily kill too much depending on how you found the process group ID.
    Depending on how something was spawned, you might cause a lot of chaos:

    kill -9 -- -processgroupid

    A negative number tells kill to signal every process in the group with that ID. The group ID is the PID of the group leader, which is often, but not always, the parent process you started. So killing the process group led by a parent process with a PID of 123 would look like this:

    kill -9 -- -123

    This kills all the children that are still in that group when you have to resort to -9. You can send softer signals (such as -15) the same way before you resort to this.
    From: http://stackoverflow.com/questions/392022/best-way-to-kill-all-child-processes/15139734#15139734 
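    Here is a sketch of the "softer signals first" sequence for a process group, assuming bash and procps `ps`. The function names are hypothetical. The sketch looks up the group ID of the given PID instead of assuming that the PID is the group leader, and it will not signal the caller's own group.

```shell
#!/usr/bin/env bash
# Sketch: SIGTERM, wait, then SIGKILL a whole process group via kill's
# negative-PID form. pgroup_escalating_kill is a hypothetical name;
# requires bash and procps ps.

# Print non-zombie PIDs whose process group ID is $1.
live_group_members() {
  ps -e -o pgid= -o pid= -o stat= |
    awk -v g="$1" '$1 == g && $3 !~ /Z/ { print $2 }'
}

# Usage: pgroup_escalating_kill PID [GRACE_SECONDS]
# Returns 0 when the group is empty, 1 if something survived SIGKILL,
# and 2 if PID is in this shell's own process group.
pgroup_escalating_kill() {
  local target=$1 grace=${2:-5} waited=0 pgid
  pgid=$(ps -o pgid= -p "$target" 2>/dev/null | tr -d ' ')
  [ -n "$pgid" ] || return 0                 # target already gone
  # Never signal our own group: that would kill this shell as well.
  if [ "$pgid" = "$(ps -o pgid= -p "$$" | tr -d ' ')" ]; then
    echo "refusing: PID $target is in this shell's group ($pgid)" >&2
    return 2
  fi
  kill -TERM -- "-$pgid" 2>/dev/null || true # softer signal first
  while [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
    [ -n "$(live_group_members "$pgid")" ] || return 0
  done
  echo "process group $pgid ignored SIGTERM; sending SIGKILL" >&2
  kill -KILL -- "-$pgid" 2>/dev/null || true
  sleep 1
  [ -z "$(live_group_members "$pgid")" ]
}
```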



  • 3.

    Posted Mar 24, 2014 06:28 PM
    Thanks, Marc.

    I guess the problem is that we need an automated solution, but would prefer one that is smart enough to try the gentle kill first, then the hard kill. 

    It seems like anything scripted to handle the child processes would have to actually take place before killing the job through the Automation Engine -- because if you don't have PIDs for the child processes ahead of time, you can't go back and find the (now-orphaned) children of a parent that doesn't exist anymore.
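    Here is a minimal sketch of that ordering, assuming bash and procps `pgrep`/`ps`, with hypothetical function names. It walks the process tree while the parent is still alive, then sends SIGTERM and, if needed, SIGKILL to the recorded PIDs. One untested way to run it before the Automation Engine sends its own signal would be a `trap` in the job script. Whether that works depends on how the agent delivers the signal, which this sketch does not assume.

```shell
#!/usr/bin/env bash
# Sketch: snapshot a process tree while its parent is still alive, then
# SIGTERM it and SIGKILL whatever is left. descendants, is_alive and
# tree_escalating_kill are hypothetical names; requires procps pgrep and ps.
# Processes forked after the snapshot are not covered.

# Print every descendant PID of $1 (children, grandchildren, ...).
descendants() {
  local child
  for child in $(pgrep -P "$1"); do
    echo "$child"
    descendants "$child"
  done
}

# Succeeds if the PID exists and is not a zombie.
is_alive() {
  local stat
  stat=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
  case $stat in
    '' | *Z*) return 1 ;;
  esac
  return 0
}

# Usage: tree_escalating_kill PID [GRACE_SECONDS]
tree_escalating_kill() {
  local target=$1 grace=${2:-5} waited=0 pid pids survivors
  # Record the tree first: once the parent dies, its children are
  # re-parented and can no longer be found through its PID.
  pids="$(descendants "$target") $target"
  survivors=$pids
  kill -TERM $pids 2>/dev/null || true
  while [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
    survivors=""
    for pid in $pids; do
      if is_alive "$pid"; then survivors="$survivors $pid"; fi
    done
    [ -n "$survivors" ] || return 0
  done
  echo "still running after ${grace}s:" $survivors "; sending SIGKILL" >&2
  kill -KILL $survivors 2>/dev/null || true
  sleep 1
  for pid in $survivors; do
    if is_alive "$pid"; then return 1; fi
  done
  return 0
}
```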

    So the short answer appears to be "no," no one's solved this problem yet. :-)

     



  • 4.
    Best Answer

    Posted Apr 04, 2014 11:50 AM

    The kill signal is normally SIGTERM. It can be controlled by changing the agent's INI file:

    ; KillSignal=SIGKILL
    ; FT_Owner=user

    I don't think there is any way for Automic to control the killing of Oracle sessions that are left behind after the OS process is killed.



  • 5.

    Posted Apr 05, 2014 10:46 AM

     

    Thanks to Steven as well. I do know about the INI file, but I was hoping for a compromise like AppWorx had: first try SIGTERM, and if the process doesn't exit, then try SIGKILL. But it seems we'll have to settle for one or the other. Marking the question answered so the forum quits bugging me about it!