Service Operations Insight

  • 1.  How to reset values of AlarmGlobalUpdates queues?

    Posted Nov 22, 2017 12:05 AM

    Hello Everyone,

     

    I have CA SOI 4.0.2 integrated with CA APM, CA UIM, and other modules using the SNMP Connector. Recently SOI received an alarm storm from CA UIM, which affected the AlarmGlobalUpdates queue; please see the picture below:

    As the picture above shows, the AlarmGlobalUpdates queue reached 300K entries. Is there any way to clear the AlarmGlobalUpdates queue quickly? I tried stopping the SOI services and then removing the files under the activemq-data folder, but it didn't work.

     

    CC:

    MichaelBoehm Brahma Britta_Hoffner

     

    Best Regards,

    Okik Setianto



  • 2.  Re: How to reset values of AlarmGlobalUpdates queues?
    Best Answer

    Broadcom Employee
    Posted Nov 22, 2017 01:34 AM

    Hello Okik,

     

    Have a look at https://communities.ca.com/thread/241786296-best-practices-for-soi-administration-during-os-patch-installation. During an alarm storm, the SOI Manager's job queues get flooded, which delays the processing of alerts to the SOI Console. Before you stop the services on the SOI Manager, stop the CA SAM Integration Service on the connector servers and, if you have Catalyst connectors, the Catalyst Container Service on the connector server. Stop the services on the SOI UI Server as well.

    Run SOI\tools\soitoolbox --stopAllServices on the Manager. Once these services are stopped, clean up the activemq-data folder in SOI\tomcat\webapps\activemq-web\activemq-data, remove the localhost folder from the SOI\tomcat\work\Catalina folder, and remove all files from the SOI\tomcat\temp folder. Start the CA SAM Application Service on the SOI Manager. Open the Manager debug page http://<soimanager>:7090/sam/debug and click Triage Tests, then Run Tests. Wait until the Server application startup shows as Completed.

    Run SOI\tools\soitoolbox --startAllServices on the Manager. Wait until all services are up and running. Start all services on the SOI UI Server. Now start the CA SAM Integration and/or Catalyst Container services one after the other, and double-check that the job queues are processing. If the MDR still holds a large number of alarms from the alarm storm, clean those up before you start the related connector.
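
    The file cleanup in the middle of this procedure could be scripted roughly as follows. This is only a sketch: the function name clear_soi_caches and the soi_home parameter are made up for illustration, and it assumes that "clean up" means emptying the activemq-data and temp folders while keeping the folders themselves. It must only be run while all SOI services are stopped.

```python
import shutil
from pathlib import Path


def clear_soi_caches(soi_home: Path) -> list:
    """Remove the ActiveMQ data and Tomcat work/temp content named in the steps above.

    soi_home is the SOI installation folder (hypothetical parameter).
    Returns the list of paths that were removed.
    """
    tomcat = soi_home / "tomcat"
    removed = []

    # Assumption: these two folders are emptied, but the folders themselves stay.
    for folder in (
        tomcat / "webapps" / "activemq-web" / "activemq-data",
        tomcat / "temp",
    ):
        if folder.is_dir():
            for entry in folder.iterdir():
                if entry.is_dir():
                    shutil.rmtree(entry)
                else:
                    entry.unlink()
                removed.append(entry)

    # The "localhost" folder under work\Catalina is removed as a whole.
    localhost = tomcat / "work" / "Catalina" / "localhost"
    if localhost.is_dir():
        shutil.rmtree(localhost)
        removed.append(localhost)

    return removed
```

    Calling it with the installation folder, for example Path(r"C:\Program Files (x86)\CA\SOI"), returns the paths it deleted, which is handy to log before the services are started again.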

     

    Kind regards,

    Britta Hoffner

    CA Support



  • 3.  Re: How to reset values of AlarmGlobalUpdates queues?

    Posted Nov 22, 2017 02:16 AM

    Hi Britta,

     

    Thanks for your help.

     

    Best Regards,

    Okik



  • 4.  Re: How to reset values of AlarmGlobalUpdates queues?

    Broadcom Employee
    Posted Nov 22, 2017 02:17 AM

    You are welcome.



  • 5.  Re: How to reset values of AlarmGlobalUpdates queues?

    Posted Dec 16, 2017 12:14 PM

    Hi Britta,

     

    A few months ago, someone from CA Support shared this procedure with us during an incident, but I still don't understand why it takes so long to process all the alarms in the job queue.

     

    After removing all the folders you mentioned, I still see the earlier alarms in the SOI console. Worst of all, I had to wait 12 hours for SOI to release all the alarms from the AlarmGlobalUpdates queue (at other times we have had to wait about 2.5 hours), so I guess this occasion was atypical. Does this have to do with the Maximum Size value?

     

    Here are a few questions:

     

    • Why do we have to wait so long?
    • Is there any way to improve this? (It does not matter if I have to lose information)
    • Is there any way to change the Maximum Size of the job queue? As you can see in my screenshot, I have a value of 1,453,793.

    [Screenshot: Job queue]

     

    Is this related to the volume of information stored in the SQL Server database?

     

    I would really appreciate your comments.

     

    Kind Regards,

    Joaquín L.



  • 6.  Re: How to reset values of AlarmGlobalUpdates queues?

    Posted Dec 18, 2017 11:23 AM

    jOAKO, to clarify: the Maximum Size column shows the largest size the queue has ever reached. So at one point it had 1.4M items in the queue. Whoa.

    Also, when you find yourself in this situation, shut down your connectors and the other SOI services on the UI box. Let the Manager run on its own and process all these items without the connectors sending in more data for it to process.

    Do you have actions associated with specific alerts that the SOI Manager may be waiting on before it moves along? If so, check what it's doing and tweak accordingly.

    If you stop your SOI Manager, check this folder:

    ...\CA\SOI\tomcat\webapps\activemq-web\activemq-data

    If it's filled with files, delete them all, then restart just the CA UCF Broker and the CA SAM Application Server. Let it run until the Manager debug Queue Monitor page above settles down.

     

    Also run the "Database Tables" and "Show Report" utilities and check your Alert History. If it's very large, run the DB maintenance commands to purge old alert history.

    Open a command prompt and change to the Tools directory:

    C:\Program Files (x86)\CA\SOI\Tools>

    soitoolbox -x --purgeClearedAlerts 90
    soitoolbox -x --cleanHistoryData 90
    soitoolbox -x --purgeDBInconsistencies -b 300 -t 1200
    soitoolbox -x --rebuildIndexes -b600 -t600
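
    If you want to run these four commands as one scheduled job, a small wrapper along these lines might help. It is just a sketch: the function names, the retention parameter, and the dry_run switch are invented for illustration, it assumes the trailing 90 is a retention age in days, and it uses only the soitoolbox options listed above. Running through the shell is also an assumption, made so that Windows can resolve soitoolbox from the Tools directory.

```python
import subprocess
from pathlib import Path


def build_maintenance_commands(retention: int = 90) -> list:
    """Return the four maintenance commands from the post, in the same order."""
    age = str(retention)  # assumption: retention age in days
    return [
        ["soitoolbox", "-x", "--purgeClearedAlerts", age],
        ["soitoolbox", "-x", "--cleanHistoryData", age],
        ["soitoolbox", "-x", "--purgeDBInconsistencies", "-b", "300", "-t", "1200"],
        ["soitoolbox", "-x", "--rebuildIndexes", "-b600", "-t600"],
    ]


def run_maintenance(tools_dir: Path, retention: int = 90, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in tools_dir."""
    for cmd in build_maintenance_commands(retention):
        line = " ".join(cmd)
        print(line)
        if not dry_run:
            # check=True stops the sequence at the first failing command,
            # so the index rebuild never runs after a failed purge.
            subprocess.run(line, cwd=tools_dir, shell=True, check=True)
```

    With dry_run left at True it only prints the command lines, so you can review them before letting it touch the database.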

     

    Also, if your SOI Manager is taking that long, another thing to check is the Disk Activity Response Time on your DB server and your SOI Manager box:

    If the numbers are high, then that's the bottleneck. Check the overall system running your SOI deployment to see whether it's maxed out on I/O.