One of the Unix jobs in our shop intermittently fails with a SUBERROR and completes successfully when restarted. Reviewing the agent logs, I found the following error in runner_os_component.log.
Fri Aug 10 06:00:09 2018: Preparing job BDM99405/UBDMBTCH.5640/MAIN
Fri Aug 10 06:00:13 2018: Cannot fork a new process to execute the job:BDM99405/UBDMBTCH.5640/MAIN, reason:Not enough space
Fri Aug 10 06:00:13 2018: Job BDM99405/UBDMBTCH.5640/MAIN failed - Submission error
Fri Aug 10 06:00:13 2018: Transmitter: Sending AFM: 20180810 06001300+0500 JavaAgent#tcpip@TSO4_MANAGER OS_COMPONENT BDM99405/UBDMBTCH.5640/MAIN State SUBERROR Failed SetEnd Status(Submission error) Cmpc(12)
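Since the failure is intermittent, it may help to pull every fork failure out of the log and compare the timestamps with what else was running on the box. Below is a minimal Python sketch, assuming the entries always start with a timestamp in the format shown above. The function names and the SAMPLE string are illustrative only and are not part of the agent.

```python
import re

# Timestamp prefix as it appears in the log above, e.g. "Fri Aug 10 06:00:09 2018: "
STAMP = re.compile(r"[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}: ")

# Matches the "Cannot fork" entry and captures timestamp, job name and reason.
FORK_FAILURE = re.compile(
    r"^(?P<ts>.{24}): Cannot fork a new process to execute the job:"
    r"(?P<job>[^,]+), reason:(?P<reason>.+)$"
)

# Log excerpt copied from the post (entries may run together without newlines).
SAMPLE = (
    "Fri Aug 10 06:00:09 2018: Preparing job BDM99405/UBDMBTCH.5640/MAIN"
    "Fri Aug 10 06:00:13 2018: Cannot fork a new process to execute the "
    "job:BDM99405/UBDMBTCH.5640/MAIN, reason:Not enough space"
    "Fri Aug 10 06:00:13 2018: Job BDM99405/UBDMBTCH.5640/MAIN failed - Submission error"
    "Fri Aug 10 06:00:13 2018: Transmitter: Sending AFM: 20180810 06001300+0500 "
    "JavaAgent#tcpip@TSO4_MANAGER OS_COMPONENT BDM99405/UBDMBTCH.5640/MAIN "
    "State SUBERROR Failed SetEnd Status(Submission error) Cmpc(12)"
)


def split_entries(text):
    """Split log text into one string per timestamped entry."""
    starts = [m.start() for m in STAMP.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def find_fork_failures(text):
    """Return a (timestamp, job, reason) tuple for each fork failure."""
    failures = []
    for entry in split_entries(text):
        m = FORK_FAILURE.match(entry)
        if m:
            failures.append((m.group("ts"), m.group("job"), m.group("reason")))
    return failures


if __name__ == "__main__":
    for ts, job, reason in find_fork_failures(SAMPLE):
        print(ts, job, reason, sep=" | ")
```

To scan the real log, read runner_os_component.log into a string and pass it to find_fork_failures in place of SAMPLE.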
While searching the CA Knowledge Base, I found the article below, which points to low swap space as the cause.
Jobs on a UNIX Workload Agent are failing with sub - CA Knowledge
However, the server seems to have enough swap space:
load averages: 38.30, 38.42, 38.86 14:58:23
185 processes: 156 sleeping, 1 running, 5 zombie, 23 on cpu
CPU states: 45.1% idle, 24.7% user, 30.2% kernel, 0.0% iowait, 0.0% swap
Memory: 128G real, 53G free, 33G swap in use, 35G swap free
total: 27736336k bytes allocated + 6985240k reserved = 34721576k used, 36502440k available
/dev/swap 4294967295,4294967295 16 8388592 8388592
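As a quick sanity check on the figures above, here is a small Python sketch that parses the "total:" line (which looks like swap -s output) and converts it to GiB. The function names are made up for illustration. Note that these numbers were captured at 14:58, not at 06:00 when the job failed, so they only show the state at the time of the snapshot.

```python
import re

# The "total:" line copied from the output above; values are in kilobytes.
SWAP_LINE = (
    "total: 27736336k bytes allocated + 6985240k reserved = "
    "34721576k used, 36502440k available"
)


def parse_swap_summary(line):
    """Extract the four kilobyte counters from a swap summary line."""
    m = re.search(
        r"(\d+)k bytes allocated \+ (\d+)k reserved = (\d+)k used, (\d+)k available",
        line,
    )
    if m is None:
        raise ValueError("line does not look like a swap summary")
    allocated, reserved, used, available = (int(g) for g in m.groups())
    return {
        "allocated": allocated,
        "reserved": reserved,
        "used": used,
        "available": available,
    }


def kib_to_gib(kib):
    """Convert kilobytes (1024 bytes) to GiB."""
    return kib / (1024 * 1024)


if __name__ == "__main__":
    s = parse_swap_summary(SWAP_LINE)
    pct_used = 100.0 * s["used"] / (s["used"] + s["available"])
    print("used:      %.1f GiB" % kib_to_gib(s["used"]))
    print("available: %.1f GiB" % kib_to_gib(s["available"]))
    print("in use:    %.1f%%" % pct_used)
```

The used and available figures round to the 33G and 35G that top reports, so the two commands agree. To see whether swap is actually short when the job fails, the same check would have to run around the job's start time, for example from a scheduled script that logs the output.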
Is there anything else we could look at to figure out the root cause of this issue?
See if the link below resolves the issue.
Cannot fork a new process to execute the job - CA Knowledge
a) You might have hit the maximum number of processes per user (maxuproc). Don's suggestion might help.
b) You might have run out of file descriptors (ulimit -n). Increase the number of file descriptors available to the user starting the CA WA System Agent process.
c) On AIX, one could run out of paging space. Monitor the paging space with the lsps -a command and increase it if necessary.
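For options a and b, one way to see the limits a process actually runs with is to query them from inside that user's session. The following is a rough Python sketch using the standard resource module. It has to be run as the user that starts the agent to be meaningful, and RLIMIT_NPROC is not defined on every platform, so the code skips it when it is missing. The labels and function names are illustrative.

```python
import resource

# Limits relevant to the suggestions above, looked up by name so that a
# constant missing on this platform is skipped instead of raising an error.
LIMIT_NAMES = {
    "max open files (ulimit -n)": "RLIMIT_NOFILE",
    "max user processes": "RLIMIT_NPROC",
}


def current_limits():
    """Return {label: (soft, hard)} for the limits defined on this platform."""
    limits = {}
    for label, const_name in LIMIT_NAMES.items():
        const = getattr(resource, const_name, None)
        if const is None:
            continue  # limit not exposed by this OS / Python build
        limits[label] = resource.getrlimit(const)
    return limits


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print("%-28s soft=%s hard=%s" % (label, fmt(soft), fmt(hard)))
```

The soft value is the one in effect. If it is low compared with what the agent and its jobs need, that would support option a or b.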
Thanks Don & Chandru,
Sorry, I forgot to mention the operating system: it is SunOS 5.10. The jobs run as the user 'bdm'. Since this is a Sun system, I assume option c does not apply, and options a and b need to be applied to the bdm user's settings.
On Solaris, /tmp is a tmpfs file system backed by swap, so a huge file dropped under /tmp counts against swap. If your application users are in the habit of using /tmp for big transient files, that could be the reason for the "Not enough space" error.
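To check whether /tmp is the culprit, one could list the largest files under it around the time the job fails. Here is a minimal Python sketch. The function name and the default of ten results are arbitrary choices, and files the current user cannot read are silently skipped.

```python
import os
import stat


def largest_files(root, top_n=10):
    """Return up to top_n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or permission denied
            if stat.S_ISREG(st.st_mode):
                sizes.append((st.st_size, path))
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    for size, path in largest_files("/tmp"):
        print("%12d  %s" % (size, path))
```

Running it as root gives the most complete picture, since other users' directories under /tmp may not be readable otherwise.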