David
Yes, I have.
When we upgraded from AppWorx v7 to AM v8.0, we had this problem a lot.
After many weeks of working with support on log files and tweaking settings and timeout values, the conclusion was that the new Java engine (which replaced the old C code) was more susceptible to periods of high CPU utilisation, and that no code fix was going to resolve the problem.
Our master and primary local agent run on the same machine as our Oracle Retail ERP database and server for overnight batch processes. As a result, the application jobs that AppWorx is managing now cause AppWorx itself to stop responding.
In the end, support found a script that they had used at another customer site and provided it to us.
It is a pair of Unix wrapper scripts for the startso and stopso scripts, called startam_monitor and stopam_monitor, which in turn call the Perl script uc4_daemon_monitor.pl (also custom code from UC4), which stays running 24/7.
This Perl script periodically checks the status of the agents and, if it finds they have stopped (which is what happens after AwE-5103), restarts them. It will keep doing this if they fall over again (e.g. if CPU activity is still too high).
While it does not fix the underlying problem, it does mean that there is no more than a 5-minute delay before the UC4 engine starts more jobs in chains/queues.
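To give an idea of the general approach, here is a minimal sketch of a check-and-restart loop in Python. This is not the UC4 script (which is Perl and which I cannot post); the function name, the parameters and the 300-second interval are all my own assumptions, with the interval chosen only to match the 5-minute delay mentioned above. The actual status check and restart command are passed in as callables, because I am not going to guess at how the real script does them.

```python
import time
from typing import Callable, Optional


def monitor(is_running: Callable[[], bool],
            restart: Callable[[], None],
            interval_seconds: float = 300.0,   # assumed: 5 minutes
            max_checks: Optional[int] = None,  # None means run forever
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll is_running() and call restart() each time it returns False.

    Hypothetical helper for illustration only. Returns the number of
    restarts performed (only reachable when max_checks is set).
    """
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_running():
            # Agent is down: restart it. If it falls over again, the
            # next pass of the loop will simply restart it again.
            restart()
            restarts += 1
        checks += 1
        # Wait before the next check, except after the final one.
        if max_checks is None or checks < max_checks:
            sleep(interval_seconds)
    return restarts
```

In real use, is_running would wrap whatever status check suits your site and restart would call your own start script (for example through subprocess.run), with max_checks left as None so the loop never exits.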
As the monitor script was provided by UC4 rather than written by us, I am unsure of the copyright restrictions, so I would suggest you contact support and quote the info I have provided here, along with our support incident #204515.
Alan