A couple of months ago our Test environment experienced a live metric explosion, and since then the collectors have been hitting the live metric clamp. We have looked at the agents, and nothing stands out as a culprit. Apart from raising the live metric clamp and reducing either the number of agents or the number of metrics per agent, is there anything else we can look at and change?
I would also suggest pruning your collectors. Pruning cleans up historical metrics on the collectors and frees capacity for both live and historical metrics.
Thanks for the reply. Can you explain what you mean by pruning the collectors?
Look at the historical metric count under “Number of historical metrics”. A spike, whether sudden or sustained, means the historical metric count is growing and will eventually breach its threshold. The APM Status console will confirm whether the historical metric count has already breached its threshold. The introscope.enterprisemanager.metrics.historical.limit clamp is a good indicator of each EM's historical limit. The solution is to trim excess metrics from SmartStor to free up space for the live metric flow. We have SmartStor cleanup procedures for this.
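To make the "sudden or sustained spike" check concrete, here is a minimal Python sketch. It assumes you have sampled the historical metric count at regular intervals (for example daily) and know the introscope.enterprisemanager.metrics.historical.limit value for the EM; collecting the samples is not shown. The function name, the 1.2 spike ratio, and the sample numbers are hypothetical, not product defaults.

```python
from typing import List, Optional


def check_historical_growth(counts: List[int], limit: int, spike_ratio: float = 1.2) -> dict:
    """Flag sudden and sustained growth in sampled historical metric counts.

    counts: historical metric counts sampled at regular intervals, oldest first
            (hypothetical input; how you collect it is up to you).
    limit: the EM's introscope.enterprisemanager.metrics.historical.limit value.
    spike_ratio: assumed threshold; a sample at least this many times the
                 previous one is treated as a sudden spike.
    """
    if len(counts) < 2:
        raise ValueError("need at least two samples")

    # Sudden spike: a single interval grows by at least spike_ratio.
    spikes = [
        i for i in range(1, len(counts))
        if counts[i - 1] > 0 and counts[i] / counts[i - 1] >= spike_ratio
    ]

    # Sustained growth: the count never drops and ends higher than it started.
    steady = all(b >= a for a, b in zip(counts, counts[1:])) and counts[-1] > counts[0]

    # Average growth per interval, used for a naive straight-line projection.
    growth = (counts[-1] - counts[0]) / (len(counts) - 1)
    intervals_left: Optional[float]
    if counts[-1] >= limit:
        intervals_left = 0.0  # already at or over the limit
    elif growth > 0:
        intervals_left = (limit - counts[-1]) / growth
    else:
        intervals_left = None  # not growing, so no projected breach

    return {
        "spike_indices": spikes,
        "steady_growth": steady,
        "avg_growth_per_interval": growth,
        "intervals_until_limit": intervals_left,
    }


# Made-up daily samples and limit, for illustration only.
report = check_historical_growth([800_000, 820_000, 845_000, 1_100_000, 1_130_000], limit=1_200_000)
print(report)
```

With these made-up numbers the jump on the fourth sample is flagged as a sudden spike, the series also counts as sustained growth, and the straight-line projection puts the limit less than one interval away. The projection is deliberately simple; it shows roughly how much headroom is left, not exactly when cleanup becomes unavoidable.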
Here is the KB article on trimming the SmartStor database:
How do I cleanup SmartStor data? - CA Knowledge
The following link to the CLW command reference will also help you run some queries:
CLW Command Reference - CA Application Performance Management - 9.7 - CA Technologies Documentation
The SmartStor Tools utility mentioned in the KB article from Junaid Wahab's first link can be used to check metric counts before removal, which might give you an idea of the source of the metric explosion. More details here: Configure and Manage SmartStor Data - CA Application Performance Management - 10.7 - CA Technologies Documentation
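Once you have metric counts per agent, comparing a snapshot from before the explosion with one from after is often more telling than looking at current totals, because an agent that is large but stable is not the source of a recent explosion. The Python sketch below assumes you have already turned SmartStor Tools output (or CLW query results) into a dict mapping agent name to metric count for each date; parsing that output is not shown, and the function name and sample agent names are hypothetical.

```python
from typing import Dict, List, Tuple


def top_growth_sources(
    before: Dict[str, int], after: Dict[str, int], top_n: int = 5
) -> List[Tuple[str, int, float]]:
    """Rank sources (agents or metric-path prefixes) by metrics added.

    before/after: metric count per source at two points in time
                  (hypothetical input, e.g. transcribed from SmartStor Tools).
    Returns (source, metrics_added, share_of_total_growth), largest first.
    """
    sources = set(before) | set(after)
    # A source missing from one snapshot counts as 0 there (new or removed agent).
    deltas = {s: after.get(s, 0) - before.get(s, 0) for s in sources}
    growers = [(s, d) for s, d in deltas.items() if d > 0]
    total_growth = sum(d for _, d in growers)
    # Sort by growth descending, then by name so ties are deterministic.
    growers.sort(key=lambda item: (-item[1], item[0]))
    return [(s, d, d / total_growth) for s, d in growers[:top_n]]


# Made-up snapshots, for illustration only.
before = {"AgentA": 10_000, "AgentB": 12_000, "AgentC": 9_000}
after = {"AgentA": 10_500, "AgentB": 40_000, "AgentC": 9_000, "AgentD": 3_500}

for source, added, share in top_growth_sources(before, after, top_n=3):
    print(f"{source}: +{added} metrics ({share:.1%} of growth)")
```

With the sample data, AgentB accounts for 87.5% of the growth, and AgentD shows up only because it is new. If the growth is instead spread thinly across many agents, that in itself suggests no single agent is the culprit.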
Did Junaid's and Lynn's responses answer all of your outstanding questions?