Automic Workload Automation

Way to monitor or properly report execution of nohup script

1. Way to monitor or properly report execution of nohup script

Michael_Coxson_5769
Posted Apr 18, 2017 04:05 PM

A user has requested we develop a method to properly monitor the execution of a script being run as nohup. Has anyone dealt with this before? I can imagine there are a few methods to do so, but I wanted to see if there's a tried-and-true way before I start in on testing my ideas.

Any input is appreciated!
2. Way to monitor or properly report execution of nohup script

Michael_Coxson_5769
Posted Apr 19, 2017 10:19 AM

After a little thinking, I believe I've come up with a decent solution that (a) gets approximately the correct runtime, (b) captures the log output, and (c) succeeds or aborts based on the log content.

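#!/bin/ksh

:SET &LOGRUNID# = RUNNR2ALPHA(&$RUNID#)

# Start the command in the background; stdout and stderr are appended to this job's report file
nohup sh -c '&COMMAND# >> /var/automic/agent/out/O&LOGRUNID#.TXT 2>&1' &
PID=$!
# Poll every 5 seconds until the background process has exited
while ps -p "$PID" > /dev/null
do
sleep 5
done
# Succeed only if the success string appears in the report (quoted in case it contains spaces)
if grep -q "&SUCCESS#" /var/automic/agent/out/O&LOGRUNID#.TXT; then
exit 0
else
exit 1
fi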
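To try the poll-and-grep logic outside Automic, here is a plain POSIX sh sketch. The Automic variables (&COMMAND#, &SUCCESS# and the report file path) are replaced by hypothetical function parameters, the poll interval is made configurable so it can be tested quickly, and `kill -0` stands in for `ps -p` as the "is it still running?" check.

```shell
#!/bin/sh
# Plain-shell sketch of the polling wrapper, runnable outside Automic.
# Hypothetical parameters stand in for the Automic variables:
#   $1 = command string (&COMMAND#)
#   $2 = log file (the job report O&LOGRUNID#.TXT)
#   $3 = success string (&SUCCESS#)
#   $4 = poll interval in seconds (defaults to 5)
run_and_check() {
    cmd=$1
    log=$2
    pattern=$3
    interval=${4:-5}

    # Start the command under nohup, appending stdout and stderr to the log.
    nohup sh -c "$cmd" >> "$log" 2>&1 &
    pid=$!

    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done

    # Return 0 only if the success string (matched literally) is in the log.
    grep -qF -- "$pattern" "$log"
}
```

For example, `run_and_check 'echo job finished OK' /tmp/job.log 'finished OK' 1` returns 0, while a command whose output lacks the success string makes it return non-zero. Note that this only looks at the log; the command's own exit code is ignored, just as in the script above.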
3. Way to monitor or properly report execution of nohup script

Carsten Schmitz
Posted Apr 19, 2017 11:42 AM

Hope this isn't too bluntly put, but I'm not sure what one would be trying to achieve with this.

Nohup, in my understanding, does two things: it detaches a process from its controlling terminal, and it makes the process ignore the HUP signal, which (usually, on a modern UNIX-like OS) is sent by the shell to its children when the controlling terminal is closed. In UC4's case there is no controlling terminal; the agent is the parent of the jobs.

The former, i.e. detaching stdin and stdout from the controlling parent, probably causes UC4 to lose control over a nohup job, meaning it can no longer monitor its status.

I assume the motivation behind the "nohup" in the first place is to keep a process alive even if it loses its controlling terminal.
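The HUP-ignoring half of this is easy to check by experiment. The POSIX sh sketch below (the function name `hup_test` is only for illustration) starts a background `sleep`, with or without `nohup`, sends it SIGHUP and reports whether it survived. It does not exercise the terminal or output side of nohup at all.

```shell
#!/bin/sh
# Sketch: does a background process survive SIGHUP?
# Usage: hup_test [prefix...], e.g. "hup_test nohup" or just "hup_test".
# Prints "survived" or "killed".
hup_test() {
    # With no arguments, "$@" expands to nothing and plain sleep is run.
    "$@" sleep 30 >/dev/null 2>&1 &
    pid=$!
    sleep 1                  # let nohup install SIG_IGN before the signal arrives
    kill -HUP "$pid"
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        echo survived
        kill "$pid"          # SIGTERM is not ignored, so this cleans up
    else
        echo killed
    fi
    wait "$pid" 2>/dev/null  # reap the child either way
}
```

This assumes the calling shell was not itself started with SIGHUP ignored; if it was, the plain `sleep` inherits that and survives too.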

But I'm not even sure whether jobs started by the agent receive a HUP signal at all when the controlling process (i.e. the agent) dies, because the agent isn't a shell. My gut feeling (based on some experiments with crashing agents on Windows) is that the process gets taken over by "init" and continues to run, even without the nohup. One would probably have to test what happens to child processes of a UC4 agent on UNIX when the agent dies; it may turn out that the "nohup" is entirely redundant.
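A rough way to run that kind of test without crashing a real agent is sketched below in POSIX sh. A short-lived intermediate shell stands in for the agent (an assumption: a real agent is not a shell and could behave differently). It starts a background `sleep` and exits, and the sketch then reports whether the orphan is still running and who its new parent is. On Linux this is normally PID 1 or a process registered as a "child subreaper". `ps -o ppid=` is used, with a Linux `/proc` fallback where `ps` is not installed.

```shell
#!/bin/sh
# Sketch: what happens to a child when its parent exits?
# An intermediate sh stands in for the agent; it is an assumption that a
# real agent behaves the same way.
orphan_test() {
    pidfile=$(mktemp)
    # The stand-in parent writes its own PID ($$) and the child PID ($!), then exits.
    sh -c 'sleep 10 >/dev/null 2>&1 & echo "$$ $!" > "$1"' sh "$pidfile"
    read -r parent child < "$pidfile"
    rm -f "$pidfile"
    sleep 1
    # Parent PID of the orphan; fall back to /proc (Linux) if ps is unavailable.
    newppid=$(ps -o ppid= -p "$child" 2>/dev/null || awk '{print $4}' "/proc/$child/stat")
    newppid=$(echo "$newppid" | tr -d ' ')
    if kill -0 "$child" 2>/dev/null; then state=running; else state=gone; fi
    echo "parent=$parent child=$child state=$state newppid=$newppid"
    kill "$child" 2>/dev/null   # clean up the orphaned sleep
}
```

If the output shows `state=running` with a new parent PID, the child outlived its parent without any nohup, because no HUP is sent merely because the parent exits.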

If, however, a HUP signal were in fact sent, then yes, your process would ignore it due to the nohup. But the surrounding shell script, which is kept alive by the loop, would still be susceptible to that signal and would terminate. The wrapped process would then definitely be taken over by init and keep running, but since your shell script is now dead, it never gets to the "grep" part, and nothing is reported back to UC4. So this may just wrap a certain problem in a new, slightly more complicated layer.
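This failure mode can be reproduced with a small POSIX sh sketch (function and file names are illustrative). A wrapper shell starts `nohup sleep`, waits for it and would then write a report file; only the wrapper is sent SIGHUP.

```shell
#!/bin/sh
# Sketch: HUP sent to a wrapper that waits for a nohup'd job.
wrapper_hup_test() {
    dir=$(mktemp -d)
    # The wrapper: start the job under nohup, record its PID, wait, then "report".
    sh -c 'nohup sleep 30 >/dev/null 2>&1 &
           echo $! > "$1/child.pid"
           wait
           echo reported > "$1/report"' sh "$dir" &
    wrapper=$!
    sleep 1
    child=$(cat "$dir/child.pid")
    kill -HUP "$wrapper"        # signal the wrapper only, not the job
    wait "$wrapper"
    echo "wrapper exit status: $?"
    if kill -0 "$child" 2>/dev/null; then echo "job: still running"; else echo "job: gone"; fi
    if [ -f "$dir/report" ]; then echo "report: written"; else echo "report: missing"; fi
    kill "$child" 2>/dev/null   # clean up the leftover job
    rm -rf "$dir"
}
```

Expected result: the wrapper exits with a non-zero, signal-related status (129 in bash and dash), the job keeps running, and the report is never written.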

I'm not sure what the eventual solution is; that depends on your actual reason for the nohup. I would possibly look into "disown" and "screen", which are alternative ways of keeping processes alive. If your actual command is a program, you might also be able to have it change its own signal handling to ignore any unwanted signals.
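For a shell wrapper, the shell-level counterpart of having a program ignore unwanted signals is `trap '' HUP`. In the sketch below (illustrative names; a plain `sleep 2` stands in for the real job), the wrapper sets it for itself. Because an ignored signal stays ignored in child processes, the job ignores HUP as well, and the wrapper survives the signal and still writes its report. This is offered as one option alongside disown and screen; it has not been tried against an Automic agent.

```shell
#!/bin/sh
# Sketch: a wrapper that ignores HUP itself, so it survives long enough to report.
wrapper_ignoring_hup() {
    dir=$(mktemp -d)
    sh -c 'trap "" HUP                # ignore HUP here; children inherit this
           sleep 2 >/dev/null 2>&1 &  # stand-in for the real job
           wait $!
           echo "reported status $?" > "$1/report"' sh "$dir" &
    wrapper=$!
    sleep 1
    kill -HUP "$wrapper"        # ignored by the wrapper and by its job
    wait "$wrapper"
    if [ -f "$dir/report" ]; then cat "$dir/report"; else echo "report: missing"; fi
    rm -rf "$dir"
}
```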

Best,
Carsten
4. Way to monitor or properly report execution of nohup script

Michael_Coxson_5769
Posted Apr 19, 2017 12:38 PM

Thanks for the input Carsten. I was boggling about what they would need it for as well.
5. Way to monitor or properly report execution of nohup script

MikeBurnham603785
Posted Apr 19, 2017 07:37 PM

I've had to deal with a few cases where we were running a binary that created a log we needed to watch, or where we were taking over a script, formerly run by an operator, that hadn't been updated.

I didn't want to rely on "time boxing" to identify the log, so we backgrounded the command (using &) to get the child PID. I did this in a Bourne shell script called by a job rather than directly in the JCL, but it probably isn't any different. You need to tell the shell to wait for the background process.

I agree: I can't see any need for the nohup. It just creates output-redirection issues and might interfere with the ability to cancel a job from Automic and kill its children too.

This is a way to deal with the background/wait scenario. You can add nohup and output redirection around MYCOMMAND if needed.

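MYCOMMAND &                  # start the job in the background
PROCESS_PID=$!               # PID of that background job
echo "PID ${PROCESS_PID}"
wait "${PROCESS_PID}"        # block until the job exits
rtn=$?                       # wait returns the job's exit status
echo "status $rtn"
exit $rtn                    # pass the job's exit status back to Automic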
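For the original requirement (succeed or abort based on the log content), the wait approach can be combined with the log check from the earlier polling script. Below is a minimal POSIX sh sketch in which hypothetical parameters stand in for the command, the log file and the success string. The job fails if its exit code is non-zero, and it also fails if it exits 0 but the success string is missing from the log.

```shell
#!/bin/sh
# Sketch: wait for a background job, then check exit code AND log content.
# Hypothetical parameters: $1 = command string, $2 = log file, $3 = success string.
run_wait_and_check() {
    sh -c "$1" >> "$2" 2>&1 &
    pid=$!
    echo "PID $pid"
    wait "$pid"
    rtn=$?                       # exit status of the background job
    echo "status $rtn"
    if [ "$rtn" -ne 0 ]; then
        return "$rtn"            # propagate a non-zero exit code unchanged
    fi
    # Exit code 0 is not enough: the success string must also be in the log.
    grep -qF -- "$3" "$2" || return 1
}
```

Inside an Automic job, the arguments would presumably come from script variables such as &COMMAND# and &SUCCESS#, and the function's status would be handed to `exit`.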