We are seeing an entry in the PRLOCK table against the CMN_SCH_JOB table, and none of the scheduled jobs are running. What is the reason for this lock?
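For reference, the entry we are describing shows up with a query along these lines (the PRLOCK table name is from our environment, but the PRTABLENAME column is an assumption on my part; adjust to your Clarity version and schema):

```sql
-- Inspect the lock row blocking the scheduler (column name assumed)
SELECT * FROM PRLOCK WHERE PRTABLENAME LIKE 'CMN_SCH_JOB%';
```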
1. Have you checked the logs?
2. Have you checked if any report is running as well?
Are all your jobs, scheduled and immediate, not running? If so, we have seen locks generated by orphan-record processing in the background. If that is the case, please let me know and I can provide the workaround queries to overcome it.
None of the scheduled jobs were running; even a job submitted to run immediately does not run. Processes that were initiated were running, though. We checked the logs and found no useful information. The PRLOCK row shows the locked time, but tracing the job log around that time we haven't seen any related errors. We have to recycle the BG services to get this fixed, and it is happening once or twice a week, with no specific timing.
In our system we have configured the rate matrix incremental job to run every 15 minutes, and the rate matrix full load once a day. We also have a data mart job that copies the rates and OBS information to the data mart, configured to run hourly (for this data mart job we have added the rate matrix job as an incompatible job). Could these scheduled jobs be causing any problem? Also, if we are restarting the BG services, what steps should we take?
I have also read your previous post on this, and the scripts you provided. In our case we haven't stopped or restarted the BG services for any patch. This is happening to us once or twice a week.
Thanks in advance,
Small typo above: I meant "patch", not "path". In our case we haven't stopped or restarted the BG services for any patch.
The restart is one main cause, but even without a restart, a database disconnection due to network issues can also put you in this state. In a cluster environment, if the beacon is unable to poll the other BG, there is also a chance of hitting this problem. Either way, the workaround is to run those queries and restart the BG.
Got the reason and answer. I have some more doubts.
In our case we haven't restarted the services, so it must be a DB disconnection or, since our environment is a cluster, the beacon being unable to poll the other BG.
We have been restarting the BG services whenever this happens, and that has fixed our issue so far during this month.
Do we need to run the scripts you have given, without fail, every time this happens? In that case, why has restarting the BG been solving our problem till now? Are we breaking anything with this activity?
Is there any chance that an excessive job load (rate matrix, data mart, other jobs) could cause this problem?
If restarting fixes the problem, then you don't need the script. The reason is that your BG is unstable, or there is a hung thread and the BG is unable to process; once a restart is done, all the hung threads in the pipeline get cleaned up. So the next point of analysis should be why a thread is hanging. It could be due to a memory issue, i.e. the JVM memory allocated.
Would it be possible to share the properties.xml after masking the sensitive information? I can take a look.
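Before recycling the services, it can help to capture a thread dump of the BG JVM (for example with `jstack <bg-pid>`; `jstack` is a standard JDK tool, but whether it is available on your BG host is an assumption) and look for BLOCKED threads. A minimal sketch, using a made-up dump file rather than a live JVM:

```shell
# Write a made-up (hypothetical) thread dump; in practice this file would come
# from: jstack -l <bg-pid> > /tmp/bg_dump.txt
cat > /tmp/bg_dump.txt <<'EOF'
"Dispatch pool-5-thread-2" prio=10 tid=0x1 nid=0x2 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
"Thread-5" prio=10 tid=0x3 nid=0x4 runnable
   java.lang.Thread.State: RUNNABLE
EOF

# Count threads stuck in BLOCKED state; a non-zero count during a stall points
# at the thread(s) worth inspecting before a restart wipes the evidence.
grep -c 'Thread.State: BLOCKED' /tmp/bg_dump.txt
```

If the same thread shows as BLOCKED in two dumps taken a minute apart, that is a strong hint it is the hung thread the restart is clearing.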
Also, during that time we saw the error below in the log. Is this relevant to this issue?
ERROR 2014-10-20 01:48:11,034 [http-bio-8080-exec-548] view.ViewL10nSAXHandler (clarity:akum145:6321064__E7860BB4-0E1F-4DD0-94CE-7B21CBD7638D:odf.filterStateChange) Could not locate vxsl file '' in component odf
This error is from the app, not the BG; it is a benign error occurring due to a bug. It cannot cause the BG to hang.
We have two BG servers; restarting one of them fixes the issue.
Below is part of the properties.xml from the config directory of the Clarity installation on the BG server. Let me know if this helps.
Also, our Clarity installation is a big one, with 6 application servers and 2 BG servers. We have 30k resources, 30k projects, and many investments. We are just one month into this new system, piloting with around 200 people.
properties.xml is shown below:
<applicationServer vendor="tomcat" useLdap="false" home="E:\CA\apachetomcat7.0.42" adminPassword="admin" externalUrl="" tokenCacheCapacity="0" tokenCacheStrategy="none">
<databaseServer vendor="oracle" home="/oracle/home" useMultilingualSort="false" sysPassword="change_on_install" largeTables="USERS_LARGE" smallTables="USERS_SMALL" largeIndex="INDX_LARGE" smallIndex="INDX_SMALL" highVolatilityParameters="PCTFREE 20 PCTUSED 60" lowVolatilityParameters="PCTFREE 5 PCTUSED 80" fetchSize="60" />
<processEngineMonitorConfig disable="false" numberOfThreads="1" appId="app">
<processEngineMonitorTask name="bpmMonitorLoopDetector" className="com.niku.bpm.utilities.BpmProcessMonitorLoopDetector" initialDelay="60" period="60" disable="false">
<taskItem name="putOnHold" value="true" dataType="boolean"/>
<taskItem name="sendNotification" value="true" dataType="boolean"/>
<taskItem name="abortInstances" value="true" dataType="boolean"/>
<taskItem name="timeThreshold" value="1" dataType="int"/>
<taskItem name="loopLimit" value="100" dataType="int"/>
<taskItem name="processExceptionList" value="" dataType="string"/>
<processEngineMonitorTask name="bpmMonitorStormDetector" className="com.niku.bpm.utilities.BpmProcessMonitorStormDetector" initialDelay="60" period="60" disable="false">
<taskItem name="processLimit" value="100" dataType="int"/>
Error from the BG log yesterday, when it hung:
SYS 2014-10-30 04:46:16,330 [Thread-4] njs.SchedulerImpl (none:none:none:none) Clarity 188.8.131.526 Job Scheduler bg@*** stopping...
SYS 2014-10-30 05:00:54,200 [WrapperSimpleAppMain] bgp.JobLogger (clarity:none:none:none) Tenant assigned: clarity
SYS 2014-10-30 05:00:55,809 [WrapperSimpleAppMain] niku.union (clarity:scheduler:6371042__6D4DCF16-DBE5-45CE-89FE-4300A95B11F9:none) Initializing Event Manager on server instance: bg
SYS 2014-10-30 05:00:56,294 [WrapperSimpleAppMain] niku.union (clarity:scheduler:6371042__6D4DCF16-DBE5-45CE-89FE-4300A95B11F9:none) Starting Process Engine on server instance: bg
SYS 2014-10-30 05:00:56,419 [WrapperSimpleAppMain] niku.union (clarity:scheduler:6371042__6D4DCF16-DBE5-45CE-89FE-4300A95B11F9:none) Process Engine Name: bg-***
SYS 2014-10-30 05:00:56,512 [WrapperSimpleAppMain] niku.union (clarity:process_admin:6371043__076C04AE-F78D-4400-B426-8C795F16B658:none) Starting Message Receiver on Event Manager
SYS 2014-10-30 05:00:56,559 [Event Interest Registration Thread] niku.union (clarity:none:none:none) Registering event interests...
SYS 2014-10-30 05:00:56,591 [Thread-5] njs.SchedulerImpl (clarity:none:none:none) Niku Job Scheduler (tenant=clarity)
SYS 2014-10-30 05:00:56,591 [Thread-5] njs.SchedulerImpl (clarity:none:none:none) Clarity 184.108.40.2066 Job Scheduler bg@APSEP3912 initializing...
SYS 2014-10-30 05:00:56,606 [Event Interest Registration Thread] niku.union (clarity:none:none:none) Event registration completed. Event manager started succesfully.
SYS 2014-10-30 05:00:56,762 [Thread-5] njs.SchedulerImpl (clarity:scheduler:6371044__258876EB-CA19-4F74-9631-BC8834E45621:none) Clarity 220.127.116.116 Job Scheduler bg@xxxxx initialized
ERROR 2014-10-30 05:00:59,684 [Dispatch pool-5-thread-2 : bg@xxxxx (tenant=clarity)] niku.reporting (clarity:admin:6371055__E6A1B713-28BA-495F-BCDB-AD5FDCBAF88E:D10 Create BusinessObjects Users) Login failed, unknown error.There was an error reading the shared secret from trusted principal configuration file. (FWM 02045)
ERROR 2014-10-30 05:00:59,684 [Dispatch pool-5-thread-2 : bg@xxxxx (tenant=clarity)] niku.njs (clarity:admin:6371055__E6A1B713-28BA-495F-BCDB-AD5FDCBAF88E:D10 Create BusinessObjects Users) Error executing job: 5042001
Caused by: com.crystaldecisions.sdk.exception.SDKException$TrustedPrincipalConfigError: There was an error reading the shared secret from trusted principal configuration file. (FWM 02045)
This error only says the BO trusted authentication is not correct. Are you not running reports?
I don't see the backgroundServer tag in the properties.xml. Also, it is better to have a dedicated BG for the job scheduler and another for processes, rather than running both on one service. That is what I prefer, and I have seen better performance that way too.
Maybe the log was written when we restarted the BG after the jobs were stuck. We are not yet running BO reports.
Below is the backgroundServer tag:
<backgroundServer jvmParameters="-Xms3072m -Xmx6144m -XX:-UseGCOverheadLimit -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=1 -DforceMemorySettings=false -XX:PermSize=96m -XX:MaxPermSize=256m" programParameters=""/>
I did not fully understand what your last post said exactly. Can we have one BG server for job scheduling and another one for processes? Is that what you mean by dedicated servers?
Not sure why PermSize and MaxPermSize are set that high; ideally it should be as below (you can change the memory as per your requirement). And yes, one BG for jobs and one for processes.
-Xms512m -Xmx1024m -XX:-UseGCOverheadLimit -DforceMemorySettings=false -XX:MaxPermSize=192m
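Putting those parameters back into the backgroundServer tag from properties.xml would look like this (the tag shape is copied from the snippet posted earlier in this thread; the heap sizes are the suggestion above, not a measured recommendation for your load):

```xml
<backgroundServer jvmParameters="-Xms512m -Xmx1024m -XX:-UseGCOverheadLimit -DforceMemorySettings=false -XX:MaxPermSize=192m" programParameters=""/>
```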
I am seeing the same issue in my production instance, and none of the jobs are running, scheduled or immediate. I have restarted the BG and removed and re-deployed it, but no improvement. Could you please provide us the queries?
Please try the queries in this tec doc:
Clarity: Scheduled jobs stuck in waiting or scheduled status
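The exact queries are in the tec doc and are not reproduced here. Purely as an illustration of the shape of such a workaround (the PRLOCK table name comes from this thread, but the PRTABLENAME column and the matching value are assumptions; verify against the tec doc, stop the BG services first, and back up before changing anything):

```sql
-- ASSUMED column/value names; confirm with the tec doc before running.
-- With the BG services stopped, remove the stale scheduler lock row:
DELETE FROM PRLOCK WHERE PRTABLENAME LIKE 'CMN_SCH_JOB%';
COMMIT;
```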