We have a situation where, every other day, the job scheduler hangs while the process engine remains unaffected. No jobs run on the affected BG instance, but processes continue to work. To resolve it, we have to restart the BGs every time, and they take a couple of minutes to stop.
- Clarity logs: no errors.
- Thread dump: shows threads in the BLOCKED state, but no other significant information.
- Reviewed the jobs and processes running around the time the issue happens. Everything seems normal, nothing alarming.
Any ideas?
If you don't already have an issue open for this with Support, and you can raise one, then I would suggest doing so on the next recurrence of this fault and including the following items:
1. Provide the thread dumps. For the thread dumps to be of best use, please take one set before the problem begins (let's say after the bg has been running for half a day or so) and another later on, once the problem has recurred and been observed, without any restarts of the bg instance in between. That gives us some additional useful data for comparison.
2. Provide your bg-ca.log* and bg-system.log files from your $NIKU_HOME/logs folder, covering the period from when this bg instance started until the problem recurred and the last thread dump was taken.
3. If you can run queries on the database, provide the contents of the CMN_SCH_JOBS, CMN_SCH_JOB_RUNS, CMN_SCH_JOB_DEFINITIONS, and PRLOCK tables in a readable format (e.g. an Excel spreadsheet usually works well).
4. Be sure to mention your exact version of Clarity (including any patches that have been applied).
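If you want a quick look at the before/after thread dumps yourself while waiting on Support, something like the following rough Python sketch could help. It only counts thread states and the lock addresses that threads are waiting on, then shows how the counts changed between the two dumps. It assumes the dumps are in the usual HotSpot text format (as written by jstack or kill -3), which may not match your JVM exactly, and the function names are made up for this example, not part of Clarity.

```python
import re
from collections import Counter

# Assumption: HotSpot-style dump lines such as
#   '   java.lang.Thread.State: BLOCKED (on object monitor)'
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")
# Assumption: contended monitors appear as
#   '- waiting to lock <0x000000076ab12345> (a java.lang.Object)'
WAIT_RE = re.compile(r"-\s+waiting to lock\s+<(0x[0-9a-fA-F]+)>")


def summarize_dump(text):
    """Return (thread-state counts, contended-lock counts) for one dump."""
    states = Counter(STATE_RE.findall(text))
    locks = Counter(WAIT_RE.findall(text))
    return states, locks


def compare_dumps(before_text, after_text):
    """Return the change in each thread-state count from 'before' to 'after'."""
    before, _ = summarize_dump(before_text)
    after, _ = summarize_dump(after_text)
    return {s: after[s] - before[s] for s in sorted(set(before) | set(after))}
```

To use it, read the two dump files into strings and call compare_dumps(healthy_text, hung_text). A large positive BLOCKED delta, together with one lock address dominating the second value returned by summarize_dump, would point at the monitor most of the threads are stuck on. It does not tell you why the lock is held, so the full dumps are still worth sending.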
All right, we were able to track it down: it was somehow related to the Windows server environment. We had to monitor the Windows events and observed that the beacons were locking due to an event error every day.
The Windows support team resolved the environmental issue, which stopped the bg from locking. It was weird, but that is what helped.