I have CA SOI 4.0.2 integrated with CA APM, CA UIM and other modules using the SNMP Connector. Recently SOI received an alarm storm from CA UIM, which had an impact on AlarmGlobalUpdates. Please see the picture below:
As the picture above shows, the queue for AlarmGlobalUpdates reached up to 300K. Is there any way to clear the AlarmGlobalUpdates queue quickly? I have tried stopping the SOI services and then removing the files under the activemq-data folder, but it didn't work.
MichaelBoehm Brahma Britta_Hoffner
You may have a look at https://communities.ca.com/thread/241786296-best-practices-for-soi-administration-during-os-patch-installation. The thing is, when you get alarm storms the SOI Manager's job queues get flooded, and this delays the processing of alerts to the SOI Console. Before you stop the services on the SOI Manager, stop the CA SAM Integration Service on the connector servers, and also the Catalyst Container Service on the connector servers in case you have Catalyst Connectors. Stop the services on the SOI UI server as well.
Run SOI\tools\soitoolbox --stopAllServices on the Manager. Once these services are stopped, clean up the activemq-data folder in SOI\tomcat\webapps\activemq-web\activemq-data, remove the localhost folder from SOI\tomcat\work\Catalina, and remove all files from SOI\tomcat\temp. Start the CA SAM Application Service on the SOI Manager. Open the Manager debug page http://<soimanager>:7090/sam/debug, click Triage Tests, then Run Tests. Wait until the server application startup shows as Completed.
Run SOI\tools\soitoolbox --startAllServices on the Manager. Wait until all services are up and running. Start all services on the SOI UI server. Now start the CA SAM Integration and/or Catalyst Container services one after the other, and double-check that the job queues are processing. If you still have a high number of alarms from the alarm storm in the MDR, you should clean those up before you start the related connector.
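To keep the order straight, here is the whole stop/clean/start sequence from the two replies above, condensed into a checklist printed by a small shell function. The service and folder names are the ones mentioned in this thread; verify the exact service names in services.msc on your own servers before scripting against them.

```shell
#!/bin/sh
# Ordered recovery checklist for an SOI alarm-storm cleanup.
# Names and paths are taken from this thread; adjust to your install.
recovery_plan() {
  cat <<'EOF'
1. connector servers: stop CA SAM Integration Service
2. connector servers: stop Catalyst Container Service (if present)
3. SOI UI server:     stop all SOI services
4. SOI manager:       SOI\tools\soitoolbox --stopAllServices
5. SOI manager:       delete SOI\tomcat\webapps\activemq-web\activemq-data
6. SOI manager:       delete SOI\tomcat\work\Catalina\localhost
7. SOI manager:       empty SOI\tomcat\temp
8. SOI manager:       start CA SAM Application Service
9. debug page:        http://<soimanager>:7090/sam/debug -> Triage Tests -> Run Tests
10. SOI manager:      SOI\tools\soitoolbox --startAllServices
11. SOI UI server:    start all services
12. connectors:       start one integration/container service at a time
EOF
}
recovery_plan
```

The key idea is that nothing upstream (connectors, UI) should be feeding the Manager while its queues and the ActiveMQ persistence folder are being cleared.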
Thanks for your help.
You are welcome.
A few months ago this procedure was shared by someone from CA Support during an incident we had, but I still don't understand why it takes so long to process all the alarms from the job queue.
After removing all the folders you mentioned previously, I still see the previous alarms in the SOI console. And the saddest thing of all: I had to wait 12 hours for SOI to release all the alarms from the AlarmGlobalUpdates queue (at other times we had to wait roughly 2.5 hours). I guess this occasion was atypical. Does it have to do with the Maximum Size value?
Here are a few questions:
Is this related to the volume of information stored in the SQL Server database?
I would really appreciate your comments...
jOAKO, clarifying here: that Max Size column represents the maximum value the queue reached. So at one point it had 1.4M items in the queue. Woah.
Also, when you find yourself in this situation, shut down your connectors and the other SOI services on the UI box. Let the Manager just run and process all these items without the connectors sending in more data for it to process.
Do you have actions associated with specific alerts that the SOI Manager might be waiting to complete before it moves along? If so, check what it's doing and tweak accordingly.
If you stop your SOI Manager, then check your:
folder. If it's filled with files, delete them all, then restart just the CA UCF Broker and the CA SAM Application Service. Let it run until the Manager Debug Queue Monitor page above settles down.
Also run the "Database Tables" and "Show Report" utilities. Check your alert history. If it's very large, you should run your DB maintenance commands to purge old alert history.
Open a command prompt and cd to:
C:\Program Files (x86)\CA\SOI\Tools>
soitoolbox -x --purgeClearedAlerts 90
soitoolbox -x --cleanHistoryData 90
soitoolbox -x --purgeDBInconsistencies -b 300 -t 1200
soitoolbox -x --rebuildIndexes -b600 -t600
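If you end up running these regularly, a small wrapper helps keep the order and retention values consistent. This is my own sketch, not from CA docs: it defaults to a dry run that only prints the commands, and the SOI_TOOLS path is an assumption to adjust for your install.

```shell
#!/bin/sh
# Runs the four soitoolbox maintenance commands above, in order.
# DRY_RUN=1 (the default) only prints them; set DRY_RUN=0 to execute.
SOI_TOOLS="C:/Program Files (x86)/CA/SOI/Tools"   # assumption: adjust to your install

run_maintenance() {
  for args in \
    "--purgeClearedAlerts 90" \
    "--cleanHistoryData 90" \
    "--purgeDBInconsistencies -b 300 -t 1200" \
    "--rebuildIndexes -b600 -t600"
  do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "soitoolbox -x $args"
    else
      (cd "$SOI_TOOLS" && ./soitoolbox -x $args)
    fi
  done
}
run_maintenance
```

Review the dry-run output first, then rerun with DRY_RUN=0 during a maintenance window, since the purge and reindex steps can be heavy on the database.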
Also, if your SOI Manager is taking that long, then another thing to check is your DB and your SOI Manager box's disk activity response time:
If the numbers are high, then that's the bottleneck. Check the overall system running your SOI deployment to see if it's maxed out on I/O.
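On Windows you can sample disk response time from the command line with typeperf, e.g. `typeperf "\LogicalDisk(_Total)\Avg. Disk sec/Transfer" -sc 10`. As a rough rule of thumb (my assumption, not a CA figure), sustained values above ~20 ms suggest an I/O bottleneck; here is a tiny helper to classify a sampled value:

```shell
#!/bin/sh
# Classify an "Avg. Disk sec/Transfer" sample (given in seconds) against
# a ~20 ms threshold. The threshold is a common rule of thumb, not a CA value.
classify_latency() {
  awk -v v="$1" 'BEGIN { print (v > 0.020) ? "bottleneck" : "ok" }'
}

classify_latency 0.005    # a healthy disk
classify_latency 0.045    # likely your bottleneck
```

If the disk holding the SQL Server data files or the ActiveMQ persistence folder stays in the "bottleneck" range while the queue drains, that explains the multi-hour processing times.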