Clarity PPM


Performance Tips

  • 1.  Performance Tips

    Posted 07-23-2015 05:55 AM

    Hi guys,

     

    I am going to try to provide as much info as I can, as well as what we have already tried thus far.

    I have a Sandbox environment, version 13.1 SP8. The server specifications are below:

     

    App: Windows Server 2008 R2 Standard Virtual Machine SP1

    64 Bit

    RAM: 12GB (Allocated 4GB to the app in the properties.xml)

    CPU0 Cores: 4 Processors 4

    CPU1 Cores: 4 Processors 4

    C: 40GB (10GB Free)

    D: 40GB (26GB Free - Clarity is installed on D:)

     

    DB: Windows Server 2008 R2 Standard Virtual Machine SP1

    64 Bit

    RAM: 32GB (in SQL Server Management Studio, I have set the minimum server memory to 10GB and the maximum server memory to 28GB)

     

    CPU0 Cores: 4 Processors 4

    CPU1 Cores: 4 Processors 4

     

    C: 40GB (19GB Free)

    D: 700GB (60GB Free - contains DB files and log file)

    N: 2GB (1.77GB Free - used for SSIS)

     

     

    Usually our Sandbox is used only for testing fixes and small pieces of development. However, we recently upgraded our Development environment to 14.2, and we cannot develop in 14.2 and promote to Production, which is still on 13.1. So we have been using our Sandbox for development work.

    More development work is now taking place there, but the environment still looks adequately resourced to me.

     

    However, the development team has reported the following:

     

    Intermittently the system responds normally, but most of the time we are seeing the following issues:

    Jobs are taking longer than normal to run, e.g. the 'Annuities Marketing - Disable Unrequired Notification for all users' job normally takes 3 seconds. It took 1 minute 56 seconds to run last night.

    Database updates are taking longer than normal to run, e.g. an update statement to a trigger normally takes 1 second, but one ran for 8 minutes yesterday before being cancelled.

    General navigation is very slow, e.g. logging in, navigating to projects, opening resources on the admin side, and checking the jobs log or process engine.

    As of this morning the process engine has become stuck. No new processes are starting and the queue length is not reducing (I have restarted the bg service).

    [Screenshot: bg.PNG]

     

    Steps I have taken so far:

     

    Increased the memory allocation available to the app from 2.5GB to 4GB (<applicationServerInstance id="app" serviceName="Niku Server" rmiPort="23791" jvmParameters="-Xms4096m -Xmx4096m")


    Increased the RAM on the DB server from 16GB to 32GB.

    Also applied some of the tips suggested in the performance tuning webinar.
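    The heap change above is made through the -Xms/-Xmx flags in jvmParameters. As a quick sanity check that the heap still leaves room on the 12GB app VM, here is a minimal Python sketch that parses those flags. The helper names and the 2GB headroom default are hypothetical choices for illustration, not Clarity settings or recommendations.

```python
import re

# Matches heap flags like -Xms4096m or -Xmx4g (k/m/g suffixes only; a bare
# byte count without a suffix is deliberately not handled in this sketch).
HEAP_FLAG = re.compile(r"-Xm([sx])(\d+)([kKmMgG])")

TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def heap_settings_mb(jvm_parameters: str) -> dict:
    """Return {'min': MB, 'max': MB} for the -Xms/-Xmx flags that are present."""
    settings = {}
    for kind, amount, unit in HEAP_FLAG.findall(jvm_parameters):
        key = "min" if kind == "s" else "max"
        settings[key] = int(amount) * TO_MB[unit.lower()]
    return settings


def heap_fits(jvm_parameters: str, host_ram_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if -Xmx leaves at least headroom_gb of the VM's RAM free.

    headroom_gb is a hypothetical allowance for the OS and other services,
    not a figure taken from Clarity documentation.
    """
    max_mb = heap_settings_mb(jvm_parameters).get("max")
    if max_mb is None:
        raise ValueError("no -Xmx flag found")
    return max_mb <= (host_ram_gb - headroom_gb) * 1024


print(heap_settings_mb("-Xms4096m -Xmx4096m"))  # {'min': 4096, 'max': 4096}
print(heap_fits("-Xms4096m -Xmx4096m", host_ram_gb=12))  # True
```

    Setting -Xms equal to -Xmx, as in the post, means the JVM claims the full heap at startup, so the check compares the maximum only.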

     

    Is there anything else I can try?



  • 2.  Re: Performance Tips

    Posted 07-23-2015 06:04 AM

    I should also add that occasionally, when users try to access the URL, they get a "page cannot be displayed" type of error.

    When I check, all the services are up.



    Usually I need to reboot the app VM, and then it's OK again.



  • 3.  Re: Performance Tips

    Posted 07-23-2015 07:04 AM

    Hi Colin,

     

    The architecture doesn't look like an issue. If it's specific to slow processes and jobs, we will have to look for orphans, as there was a known bug. I'm also assuming you have tracing disabled, as tracing hits the database heavily.

     

      <logger alternateDirectory="/opt/ca/clarity/logs"    dynamicConfigurationEnabled="true"    traceEnabled="false" traceJDBCEnabled="false"/>

     

    Make sure the logger element includes traceEnabled="false" traceJDBCEnabled="false" so that tracing is turned off.
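    To confirm both trace switches are really off before restarting, the logger element can be checked programmatically. A minimal Python sketch, assuming the file's contents are read into a string and that the logger element is not in an XML namespace; trace_flags and tracing_off are hypothetical helpers, not part of Clarity.

```python
import xml.etree.ElementTree as ET

TRACE_ATTRS = ("traceEnabled", "traceJDBCEnabled")


def trace_flags(properties_xml: str) -> dict:
    """Return the trace attributes of the first <logger> element (None if absent)."""
    root = ET.fromstring(properties_xml)
    # Assumption: <logger> is unqualified (no namespace) somewhere in the tree.
    logger = root if root.tag == "logger" else root.find(".//logger")
    if logger is None:
        raise ValueError("no <logger> element found")
    return {name: logger.get(name) for name in TRACE_ATTRS}


def tracing_off(properties_xml: str) -> bool:
    """True only if both trace attributes are explicitly set to "false"."""
    return all(value == "false" for value in trace_flags(properties_xml).values())


sample = (
    '<properties><logger alternateDirectory="" dynamicConfigurationEnabled="true" '
    'traceEnabled="false" traceJDBCEnabled="false"/></properties>'
)
print(tracing_off(sample))  # True
```

    A missing attribute is reported as None and treated as "not off", which matches the advice to add both attributes explicitly.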


    Do let me know how it goes.


    Regards

    Suman Pramanik



  • 4.  Re: Performance Tips

    Posted 07-23-2015 07:27 AM

    Thank you Suman.

     

    So I stopped all services (service stop all).

    I opened properties.xml for editing, and the logger config now looks like this:

     

    <logger alternateDirectory="" dynamicConfigurationEnabled="true" multitenantErrorReportingEnabled="false" traceEnabled="false" traceJDBCEnabled="false"/>

     

    Then I ran admin general upload-config.

     

    Then I restarted the services (service start all).

     

    I have now handed the environment back to the development team for further monitoring. Hopefully that will do the trick.

    PS: We also checked for orphans, but none were present.

     

    Thanks again for your help.



  • 5.  Re: Performance Tips

    Posted 07-23-2015 07:31 AM

    Is this a non-Tomcat instance? admin general upload-config is not required for Tomcat.



    Can you send me your properties.xml so that I can take a look?

     

    Regards

    Suman Pramanik



  • 6.  Re: Performance Tips

    Posted 07-23-2015 07:38 AM

    Hi Suman,

     

    I just emailed you the properties.xml file.



  • 7.  Re: Performance Tips

    Posted 07-23-2015 07:48 AM

    Hi Colin,

     

    I added the hprof extension to the heap dump setting so that the heap dump is generated properly.

     

    Regards

    Suman Pramanik



  • 8.  Re: Performance Tips

    Posted 07-23-2015 08:18 AM

    Hi Suman.

     

    After we made the change and saved it, I restarted the services.

    Logging in is taking a very long time.

     

    Is this expected?



  • 9.  Re: Performance Tips

    Posted 07-23-2015 08:24 AM

    It is slow the first time after a restart, but it will settle down.

     

    Regards

    Suman



  • 10.  Re: Performance Tips

    Posted 07-23-2015 08:37 AM

    Logins are timing out for most of our developers.
    Is there anything else we can try?



  • 11.  Re: Performance Tips

    Posted 07-23-2015 08:39 AM

    Ah, I just got logged in.

    I logged out and back in again. Looks OK now, thanks!



  • 12.  Re: Performance Tips

    Posted 07-23-2015 08:42 AM

    Can you try hitting the application server and see if you can log in? This change doesn't affect login.

     

    Regards

    Suman Pramanik



  • 13.  Re: Performance Tips

    Posted 07-23-2015 08:45 AM

    OK, so now logins appear to be fine.

    Users can log in without any problem, but navigation is quite slow. I tried opening timesheets and then moving to the dashboards, but it takes a long time.



  • 14.  Re: Performance Tips

    Posted 07-23-2015 09:22 AM

    Suman, I just opened a case, as our developers are currently unable to do any work.



  • 15.  Re: Performance Tips

    Posted 07-24-2015 07:46 AM

    Another idea we had was to check the bg logs for any errors or jobs that were failing and retrying over and over:

    Yesterday 4 bg log files were created, and I see the following errors occurring repeatedly:

     

    ERROR 2015-07-23 18:31:25,203 [Dispatch pool-4-thread-7 : bg@SERVER (tenant=clarity)] xql2.xbl (clarity:process_admin:199902280__BD3CCFF4-46D4-4119-B0A6-453C2AF32827:Import Financial Actuals) ****IMPORT WIP ACTUALS: Failed to create assignment for WIP record ID = 23

    ERROR 2015-07-23 18:32:41,019 [Post Condition Transition Pipeline 0 (tenant=clarity)] bpm.engine (clarity:process_admin:199899659__EC83C8FF-9981-4531-9B58-0052069CEFAB:none) Error (will retry) caused by Step Instance: com.niku.bpm.engine.objects.StepInstance@501df7fd [Id: 12973610 Process Instance Id: 7373537 Step Id: 5106919 State: BPM_SIS_READY_TO_TRANSITION Step Name: null Start Date: 2014-12-23 14:32:22.113 Expected End Date: null Percent Complete: 0.25 Warned: false Retry Count: 106 No of Pre Conditions: -1 No of Post Conditions: -1 Last Condition Eval Time: 1437658361162 Pre Condition Wait Events: null Post Condition Wait Events: null Pass Conditions: [5256331] Error Id: -1

    Process Thread: com.niku.bpm.engine.objects.ProcessThread@5f842c22 [Id: 7385671 Parent Step Instance Id: -1 Join Step Instance Id: -1]

    Split Threads: null

    Join Threads:null]

     

    All four log files are filled with this error.

     

    But when I check the performance dashboard, it does look like the job eventually completes. I'm just wondering whether all the failures and retries could be causing the performance problem.
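    One way to test that idea is to measure how often each step is being retried. A minimal Python sketch, assuming the log text is read into a string and that each entry keeps the single-line "Error (will retry) caused by Step Instance: ..." layout quoted above; summarize_retries is a hypothetical helper, not part of Clarity.

```python
import re
from collections import defaultdict

# One retry entry: in the excerpts above the StepInstance fields follow
# "Error (will retry)" on the same line, so "." (no newlines) is enough.
RETRY_ENTRY = re.compile(
    r"Error \(will retry\) caused by Step Instance:.*?"
    r"Step Id: (\d+).*?Retry Count: (\d+)"
)


def summarize_retries(log_text: str) -> dict:
    """Map step id -> (number of retry errors, highest Retry Count seen)."""
    counts = defaultdict(int)
    highest = defaultdict(int)
    for match in RETRY_ENTRY.finditer(log_text):
        step_id, retry_count = int(match.group(1)), int(match.group(2))
        counts[step_id] += 1
        highest[step_id] = max(highest[step_id], retry_count)
    return {step_id: (counts[step_id], highest[step_id]) for step_id in counts}
```

    Reading each bg log file and passing its contents to summarize_retries shows whether a single step (such as 5106919 here) accounts for most of the noise, which is a better guide than the raw file count.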



  • 16.  Re: Performance Tips

    Posted 08-04-2015 04:53 PM

    Just a small correction for future clarification: If you modify the properties.xml directly then running admin general upload-config is still advised even for Tomcat environments.

     

    Essentially, it syncs the changes made to the file with the copy held in the CMN_CONFIG table in Clarity. Although this table and its contents were added primarily for WebSphere and WebLogic, in some cases common code that runs regardless of your app server vendor may read the data from the table instead of the file. So keeping them in sync is advisable, even if most of Clarity appears to work as expected without doing this.



  • 17.  Re: Performance Tips

    Posted 07-24-2015 07:52 AM

    Hi Colin,

     

    What is the performance dashboard? It could be an issue with the assignment for WIP record ID = 23 that got fixed, after which the record was processed.

     

    Regards

    Suman Pramanik



  • 18.  Re: Performance Tips

    Posted 07-28-2015 08:39 AM

    Hi Suman,

     

    With regard to our virtual environment and the tuning document:

    Dedicate CPU and memory resources to the VMs running CA Clarity PPM.

     

    We have dedicated resources to the VM running Clarity and our performance is still poor.

    As there is only 1 app server in the environment, the other tips are not applicable to us.

     

    Is there anything else we could try?

    Should we also dedicate resources to the DB server?



  • 19.  Re: Performance Tips

    Posted 07-28-2015 08:53 AM

    Hi Colin,

     

    As this is a development box with 1 app server, I need to know what kind of activities are performed on it. Your database looks quite good. Also, please describe what sort of performance issues you are facing.

     

    Regards

    Suman Pramanik



  • 20.  Re: Performance Tips

    Posted 07-28-2015 09:09 AM

    Hi Suman,

     

    According to our developers:

     

    Intermittently the system responds normally, but most of the time we are seeing the following issues:

    Jobs are taking longer than normal to run, e.g. the 'Annuities Marketing - Disable Unrequired Notification for all users' job normally takes 3 seconds. It took 1 minute 56 seconds to run last night.

    Database updates are taking longer than normal to run, e.g. an update statement to a trigger normally takes 1 second, but one ran for 8 minutes yesterday before being cancelled.

    General navigation is very slow, e.g. logging in, navigating to projects, opening resources on the admin side, and checking the jobs log or process engine.



  • 21.  Re: Performance Tips

    Posted 07-28-2015 09:19 AM

    Hi Colin,

     

    If the jobs are taking longer, you could try adding another server and deploying BG on it. But if updates at the database level are taking longer, you should consult your DBA and do a health check to see if any improvements can be made. Also check the I/O stats between the app and database servers: if I/O is slow, transactions will take longer.

     

    Regards

    Suman Pramanik



  • 22.  Re: Performance Tips

    Posted 07-29-2015 09:02 AM

    Hi Suman,

     

    As this is a Sandbox environment that we manage ourselves, technically I am the DBA!

     

    So I have checked the SQL Server error logs and can see many instances of the error below:

     

    2015-07-29 01:15:38.86 spid4s      SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\SQL_Database\niku_2.ndf] in database [niku] (7).  The OS file handle is 0x000000000000080C.  The offset of the latest long I/O is: 0x000008a7a4e000

     

    These errors point to problems with disk I/O, so we are taking a closer look at how the storage and disks are configured on the database server. I will post an update on any progress we make, hopefully this afternoon.
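    To see which database files the slow I/O is concentrated on, the long-I/O messages can be totalled per file. A minimal Python sketch, assuming the error log text is available as a string and the message wording matches the excerpt above; long_io_by_file is a hypothetical helper.

```python
import re
from collections import Counter

# Message text as it appears in the SQL Server error log excerpt above.
LONG_IO = re.compile(
    r"encountered (\d+) occurrence\(s\) of I/O requests taking longer than "
    r"\d+ seconds to complete on file \[([^\]]+)\]"
)


def long_io_by_file(errorlog_text: str) -> Counter:
    """Total the reported long-I/O occurrences per database file."""
    totals = Counter()
    for occurrences, path in LONG_IO.findall(errorlog_text):
        totals[path] += int(occurrences)
    return totals


line = (r"2015-07-29 01:15:38.86 spid4s      SQL Server has encountered 2 occurrence(s) "
        r"of I/O requests taking longer than 15 seconds to complete on file "
        r"[D:\SQL_Database\niku_2.ndf] in database [niku] (7).")
for path, total in long_io_by_file(line).items():
    print(path, total)  # D:\SQL_Database\niku_2.ndf 2
```

    If the data files and the log file on the same drive both show up, that supports separating them onto different volumes.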

     

    Thanks again for your help.



  • 23.  Re: Performance Tips

    Posted 07-29-2015 09:47 AM

    Thanks, Colin. Slow I/O really does bring performance down drastically.



  • 24.  Re: Performance Tips

    Posted 07-31-2015 09:15 AM

    Hi Suman.

     

    Just to update you: we previously had 1 large drive hosting the database files as well as the log.

    Now we have created a separate .vmdk for the data files and another for the logs. We hope this will let sequential writes run faster and so boost I/O performance.

     

    We also had another look through the bg logs.

     

    We see the following error, or similar, occurring repeatedly. Can you please help us analyse it?

     

     

     

    WARN  2015-07-31 08:49:00,130 [Post Condition Transition Pipeline 0 (tenant=clarity)] bpm.engine (clarity:process_admin:199923659__78EA89F4-90A1-4996-B49A-FFA0A42462B8:none) Step Instance has be retried 50 times. Step: com.niku.bpm.engine.objects.StepInstance@5c953e9e [Id: 12974738 Process Instance Id: 7374675 Step Id: 5106919 State: BPM_SIS_READY_TO_TRANSITION Step Name: null Start Date: 2014-12-23 15:24:07.08 Expected End Date: null Percent Complete: 0.25 Warned: false Retry Count: 50 No of Pre Conditions: -1 No of Post Conditions: -1 Last Condition Eval Time: 1438332235933 Pre Condition Wait Events: null Post Condition Wait Events: null Pass Conditions: [5256331] Error Id: -1

    Process Thread: com.niku.bpm.engine.objects.ProcessThread@3cf15ab8 [Id: 7386718 Parent Step Instance Id: -1 Join Step Instance Id: -1]

    Split Threads: null

    Join Threads:null]

    ERROR 2015-07-31 08:49:00,134 [Post Condition Transition Pipeline 0 (tenant=clarity)] bpm.engine (clarity:process_admin:199923659__78EA89F4-90A1-4996-B49A-FFA0A42462B8:none) Error (will retry) caused by Step Instance: com.niku.bpm.engine.objects.StepInstance@19e7f5f3 [Id: 13040868 Process Instance Id: 7398100 Step Id: 5106919 State: BPM_SIS_READY_TO_TRANSITION Step Name: null Start Date: 2015-01-12 23:34:27.2 Expected End Date: null Percent Complete: 0.25 Warned: false Retry Count: 50 No of Pre Conditions: -1 No of Post Conditions: -1 Last Condition Eval Time: 1438332016388 Pre Condition Wait Events: null Post Condition Wait Events: null Pass Conditions: [5256331] Error Id: -1

    Process Thread: com.niku.bpm.engine.objects.ProcessThread@4bc431ce [Id: 7411170 Parent Step Instance Id: -1 Join Step Instance Id: -1]

    Split Threads: null

    Join Threads:null]

    java.lang.NullPointerException



  • 25.  Re: Performance Tips

    Posted 07-31-2015 11:47 AM

    Hi Colin,

     

    After adding the hardware, did you check the I/O stats to see if they had improved? This process error shouldn't cause navigation issues.

     

    Regards

    Suman Pramanik



  • 26.  Re: Performance Tips

    Posted 07-31-2015 12:03 PM

    That log entry looks very similar to the one in my post, "How do you read a bg log error".

    I would be very interested to hear the interpretation and the cause.

    We get slow performance when the log fills up with those entries.



  • 27.  Re: Performance Tips

    Posted 07-31-2015 12:11 PM

    Hi Urmas,

     

     

    If you feel that this is causing the performance problem, please stop the BG service so the process engine doesn't interfere; you should then see better performance. Can you please test that? The above error means there is a process step with step ID 5106919, so you have to go to BPM_RUN_STEPS, get the process ID from there, and see exactly what the step is doing and why it is retrying 50 times.
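    Following that reading, the useful values in each entry are the step ID (to look up in BPM_RUN_STEPS), the process instance ID and the retry count. A minimal Python sketch that pulls them out of one entry; the field labels are copied from the log lines in this thread, while the output key names and the helper itself are hypothetical.

```python
import re

# Fields of interest in a bpm.engine StepInstance dump; the dictionary keys
# are hypothetical labels chosen for this sketch, the patterns follow the log.
STEP_FIELDS = {
    "step_instance_id": r"\[Id: (\d+)",
    "process_instance_id": r"Process Instance Id: (\d+)",
    "step_id": r"Step Id: (\d+)",
    "state": r"State: (\S+)",
    "retry_count": r"Retry Count: (\d+)",
}


def parse_step_instance(entry: str) -> dict:
    """Extract identifying fields from one 'Error (will retry)' log entry."""
    # Start at the StepInstance dump so "[Id:" matches the step instance,
    # not the ProcessThread id that appears later in the entry.
    start = entry.find("StepInstance@")
    body = entry[start:] if start != -1 else entry
    fields = {}
    for key, pattern in STEP_FIELDS.items():
        match = re.search(pattern, body)
        if match:
            value = match.group(1)
            fields[key] = int(value) if value.isdigit() else value
    return fields
```

    Only the step_id value is what the reply above says to look up in BPM_RUN_STEPS; how that table relates to the other IDs is not covered here.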

     

     

    Hope that helps and have a nice weekend

     

     

    Regards

    Suman Pramanik



  • 28.  Re: Performance Tips

    Posted 07-31-2015 03:41 PM

    CMCN1982

     

    How many of those do you have? Tens, hundreds, thousands, tens of thousands?

    How big is your bg log? Megabytes, tens of megabytes, hundreds of megabytes? Gigabytes?
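    Both questions can be answered per file with a small script that reports each bg log's size and how many "Error (will retry)" lines it holds. A minimal Python sketch; the default pattern bg*.log* is an assumption based on the bg-ca.log1/bg-ca.log2 names mentioned in this thread, and the directory is whatever your installation uses.

```python
from pathlib import Path


def bg_log_report(log_dir: str, pattern: str = "bg*.log*") -> list:
    """Return (file name, size in MB, 'Error (will retry)' line count) per bg log."""
    report = []
    # Assumption: rotated bg logs match the glob pattern in log_dir.
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        size_mb = path.stat().st_size / (1024 * 1024)
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            retries = sum(1 for line in handle if "Error (will retry)" in line)
        report.append((path.name, round(size_mb, 2), retries))
    return report
```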



  • 29.  Re: Performance Tips

    Posted 08-03-2015 12:20 PM

    Thanks, SumanPramanik.

    We do restart the bg service and move the log to another folder.

    That helps until, at some point, the errors start appearing again and the situation recurs.



  • 30.  Re: Performance Tips

    Posted 08-04-2015 08:37 AM

    Hi Urmas,

     

    We keep our bg logs at the default setting, so once a log grows over 5MB a new log file is created (e.g. bg-ca.log1, bg-ca.log2, etc.).

    Currently we have 4 bg log files, and each contains the error messages described above.



  • 31.  Re: Performance Tips

    Posted 08-05-2015 05:57 AM

    Hi guys.

     

    We were able to identify the process that step 5106919 belongs to. There are no active instances of this process in the troublesome environment, but there were some old instances that had caused an error. These instances were aborted, so the engine should not have been retrying them. I have deleted these old instances from the environment, so BPM_RUN_STEPS is now clear of step ID 5106919.

     

    We will monitor the logs and performance over the next few hours to see if there is any improvement.

     

    Thanks for the tip!



  • 32.  Re: Performance Tips

    Posted 08-05-2015 08:53 AM

    Just curious: I did further analysis of my case (see the thread referenced above). I found the process instances in the BPM_RUN tables, but did not see the same number of them among the initiated process instances.



    Did you see those processes among the initiated instances in the GUI?

    What is your policy on deleting initiated instances?