AutoSys Workload Automation


 Job stuck in loop condition due to NotRunning(Job) condition

Kurk posted Dec 20, 2024 10:15 AM

We have an AutoSys job (call it JobA) that is stuck in a loop because of its condition NotRunning(JobA). The job continuously restarts itself: it moves from STARTING to RUNNING, completes in either SU or FA, and starts again, all within seconds.

We are unable to kill, terminate, or hold the job because it restarts within a fraction of a second. (We tried pushing the sendevent commands in a loop using a script, but it didn't help.)
We tried updating the JIL, but that does not work while the job is RUNNING.
We can't put the machine offline because it has high-priority jobs running.

Any suggestions would be very helpful. Thank you. 

Broadcom Employee Venkateswarlu Dondapati  Best Answer

Hi Kurk,

You are right, a job update is not allowed while the job is in the STARTING/RESTART state.

But the update will succeed once the job moves to a state such as FAILURE, SUCCESS, TERMINATED, or RUNNING.

You can use the simple bash script below to attempt updating the job with a global variable in a loop.

The job update will succeed once the job moves out of the STARTING/RESTART state, i.e. once the job moves to FAILURE after max retries.

The script below keeps trying to update the job once per second, up to 1000 times.

The script exits once the job update succeeds or the 1000 attempts are used up.

Thanks

Venkat D (Broadcom Engineering)

Sample script below. You may want to change the job name, global variable name, loop count, etc.

=========================

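#!/bin/bash
# Retry the JIL update once per second, up to 1000 times.
# The new condition also requires global variable GBL1 to equal 1,
# so the job should not start again until GBL1 is set to 1.
for i in {1..1000}; do
  # The if branch is taken when jil exits with status 0 (update accepted).
  if echo "update_job: jobA condition: v(GBL1)=1 and notrunning(jobA)" | jil; then
    echo "*Job updated successfully*"
    exit 0
  else
    echo "****Attempting to update the job again***"
    sleep 1
  fi
done
# All attempts were rejected.
exit 1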
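
Once the loop reports success, it may be worth confirming that the new condition is actually stored on the job. The following is a minimal sketch, assuming `autorep -J <job> -q` prints the job definition as JIL with the `condition:` attribute on its own line; `extract_condition` is a hypothetical helper name, not part of AutoSys.

```shell
#!/bin/bash
# Print the value of the "condition:" attribute from JIL text read on stdin.
# Prints nothing if the definition contains no condition line.
extract_condition() {
  sed -n 's/^[[:space:]]*condition:[[:space:]]*//p'
}

# Hypothetical usage against a live instance (jobA is a placeholder):
#   autorep -J jobA -q | extract_condition
```

If the printed condition still shows only notrunning(jobA), the update has not been applied yet.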

Bryan Kelleman

Create a global variable with a corresponding value. Then add a condition to the box/job so that it cannot run unless the global variable has the expected value (starting condition not met). Toggle the global variable's value as needed to stop or start the job.

On rereading your post: my reply above may not work, because you can’t update the JIL.

Broadcom Employee Richard Heaney

Good morning Kurk,

What Bryan says is correct. If you create a global variable and then run a JIL update command to add this to your condition, it will eventually pick up that new condition and stop itself from running.

I tested this out with a simple job that runs a sleep and had a "not running" condition on itself. It continued to run and I couldn't stop the job with terminated/inactive/on_hold commands. I then created a global variable called stop_job and set it to true and added the global variable to the condition of my job. It looked like this:

condition: n(rh_simple_test_run_c) & v(stop_job) = "false"

On the next run of the job, it stopped as the condition was not met.
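
The steps above can be sketched as a small shell helper. This is only a sketch under assumptions, not the exact commands used in the test: `guard_jil` is a hypothetical helper name, the job and variable names are the ones from this post, and the global variable is assumed to be set with `sendevent -E SET_GLOBAL`.

```shell
#!/bin/bash
# Emit JIL that adds a global-variable guard to a self-triggering job.
# $1 = job name, $2 = global variable name (both placeholders).
guard_jil() {
  printf 'update_job: %s\n' "$1"
  printf 'condition: n(%s) & v(%s) = "false"\n' "$1" "$2"
}

# Hypothetical usage against a live instance:
#   sendevent -E SET_GLOBAL -G "stop_job=true"     # block the job
#   guard_jil rh_simple_test_run_c stop_job | jil  # apply the guarded condition
#   sendevent -E SET_GLOBAL -G "stop_job=false"    # allow the job to run again
```

Keeping the JIL in a function makes it easy to print and review the definition before piping it to jil.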

Another option is to remove the condition from the job, which works as well. The next time the job runs, it picks up the updated JIL and will not run again.

Cheers

Richie