This is a case where I'd go back to the developer of the "job" and ask them to manage the recurrence more robustly.
Every 30 seconds is a huge number of executions, and each execution adds a little weight (just a feather) to your infrastructure. Automic handles high execution volumes wonderfully, for the most part. But let's say you have 1000 jobs that run hourly. That's 24k executions per day. Now you add 2880 daily executions for one single job. It's 0.1% of your jobs, and it takes over 10% of your executions! That means 10% of your history tables, 10% of your WP resources, 10% of the communication between master and agent, and so on. Moreover, you have set a precedent of treating every execution like it's free.
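To check that arithmetic, here is a quick back-of-the-envelope calculation in Python. The 1000 hourly jobs are the made-up figure from above, and the helper name `daily_runs` is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_runs(interval_seconds: int) -> int:
    """Executions per day for a job started every interval_seconds."""
    return SECONDS_PER_DAY // interval_seconds


hourly_jobs = 1000                              # illustrative figure from the post
hourly_total = hourly_jobs * daily_runs(3600)   # 1000 jobs x 24 runs each
frequent_total = daily_runs(30)                 # the single 30-second job

share_of_jobs = 1 / (hourly_jobs + 1)
share_of_executions = frequent_total / (hourly_total + frequent_total)

print(f"{share_of_jobs:.2%} of jobs, {share_of_executions:.1%} of executions")
# prints "0.10% of jobs, 10.7% of executions"
```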
Especially for a simple job (which it must be, running in less than 30 seconds), it's not difficult to code a loop around a set of commands, no matter what language. Take a parameter for number of loops or number of minutes; run the commands and sleep 30 seconds until the number is met, tracking success or failure; and then exit appropriately (or exit immediately if the error is fatal). For example, let the command run 500 times (about 4 hours if the work takes 1 second) and exit, then restart the job 1 minute later.
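As a rough idea of what such a wrapper could look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the name `run_loop`, the `max_failures` parameter, and the convention that exit code 2 from the wrapped command means "fatal". Adapt it to whatever your command actually returns.

```python
import subprocess
import time

# Hypothetical convention: the wrapped command returns 2 for a fatal error.
FATAL_EXIT_CODE = 2


def run_loop(command, iterations, sleep_seconds=30, max_failures=3,
             run=subprocess.call, sleep=time.sleep):
    """Run `command` up to `iterations` times, sleeping between runs.

    Returns the exit code the scheduler should see:
    0 if the loop completed, 1 once `max_failures` non-fatal failures
    have accumulated, FATAL_EXIT_CODE as soon as a fatal error occurs.
    `run` and `sleep` are injectable so the loop can be tested quickly.
    """
    failures = 0
    for i in range(iterations):
        code = run(command)
        if code == FATAL_EXIT_CODE:
            return FATAL_EXIT_CODE      # fatal: stop immediately
        if code != 0:
            failures += 1               # tolerate occasional failures
            if failures >= max_failures:
                return 1                # too many: let the scheduler notify
        if i < iterations - 1:
            sleep(sleep_seconds)        # no need to sleep after the last run
    return 0
```

The scheduled job would then call something like `sys.exit(run_loop(["/path/to/task.sh"], iterations=500))` (the path is a placeholder), and Automic only sees one execution per loop instead of 500.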
You might think at first that you will lose some information this way, but it is usually much better, operationally.
- You can look back over several days of history without scrolling through thousands of records.
- If the Automic master goes down for 2 hours (but the job host is OK), no problem... your job keeps running fine without having to check in with the master; otherwise it would stop running for the whole 2 hours.
- If the runtime of the actual command doubles, you'll see it much more clearly when the job goes from 4 hours to 4.5 hours than when a single run goes from 0.8 seconds to 1.6 seconds.
- If you have a notification set for every failure, but it's normal to have a few failures occasionally, then you can give the job just a bit of internal logic so it only exits after N errors. Otherwise you get lots of unactionable notifications (evil), or you have to code this kind of error handling in UC4 script, which is again more overhead and usually less transparent to non-experts.
With over 300k activations per day, I feel pretty strongly about this. :smiley: