All the metrics from the agents went missing in Introscope.
I can see this message in the Collector logs: "EM load exceeds hardware capacity. Timeslice data is being aggregated into longer periods."
I also see a clamp in the Status Console for introscope.enterprisemanager.metrics.historical.limit: it reached 1,200,000 and no new metrics were populated in Introscope. I have manually increased the value on all the Collectors to 1,300,000, but I need to fix the underlying issue. Can you help me with how to remove the old historical metrics?
Note: This is our production environment, running version 9.5.2 with 1 MOM and 4 Collectors on a Linux box.
Nice to see someone else still on Introscope 9.5.2; I sometimes get the same problem.
In my opinion, a larger value for "introscope.enterprise.agent.metrics.limit" in "apm-events-thresholds-config.xml" doesn't fix your problem. Try increasing "introscope.enterprise.harvest.duration" and "introscope.enterprise.smartstore.duration" from "3500" to "7000" to account for slow SmartStor access (the new values become active after a few minutes, without a Collector restart).
An easy way to reduce the collected metric data is to modify the "IntroscopeEnterpriseManager.properties"
to a smaller age value. You will lose some old metric data, but this removes lots of old metrics easily (a Collector restart is necessary to activate the new values).
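To see why a smaller age value frees space, here is a back-of-the-envelope sketch. It assumes SmartStor keeps each metric in tiers, where each tier stores one data point per sampling period for a given number of days. The (period, age) pairs below are made-up example values, not the product defaults, and the function names are hypothetical.

```python
# Rough estimate of how many historical data points SmartStor keeps.
# ASSUMPTION: each tier keeps one point per `period_s` seconds for `age_days` days.
# All tier values used here are hypothetical examples, not Introscope defaults.

SECONDS_PER_DAY = 86_400


def points_per_metric(tiers):
    """Sum the data points one metric keeps across all (period_s, age_days) tiers."""
    return sum(age_days * SECONDS_PER_DAY // period_s for period_s, age_days in tiers)


def total_points(metric_count, tiers):
    """Scale the per-metric estimate up to a whole Collector."""
    return metric_count * points_per_metric(tiers)


if __name__ == "__main__":
    current = [(15, 7), (60, 30), (900, 365)]  # hypothetical (period_s, age_days)
    reduced = [(15, 7), (60, 14), (900, 90)]   # same periods, smaller ages
    metrics = 1_200_000                         # the historical limit from the question
    before = total_points(metrics, current)
    after = total_points(metrics, reduced)
    print(f"before={before:,} after={after:,} saved={1 - after / before:.0%}")
```

With these example numbers a single metric drops from 118,560 to 69,120 stored points, which is the kind of saving a shorter age buys at the cost of losing the oldest history.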
Also try to find out which agents are reporting lots of new data; sometimes we have application problems that cause some agents to report a lot of data.
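One way to spot the noisy agents is to compare per-agent metric counts taken at two points in time and rank the agents by growth. This sketch assumes you have already exported those counts into plain dictionaries (from whatever per-agent metric count you track); the agent names and numbers are hypothetical.

```python
def rank_by_growth(before, after, top=5):
    """Return (agent, growth) pairs sorted by how many metrics each agent added.

    `before` and `after` map agent name -> metric count at two points in time.
    Agents that appear only in `after` are treated as starting from zero.
    """
    growth = {agent: count - before.get(agent, 0) for agent, count in after.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical snapshots; replace them with your own exported counts.
before = {"host1|Tomcat|AppA": 4_000, "host2|JBoss|AppB": 3_500}
after = {"host1|Tomcat|AppA": 4_100, "host2|JBoss|AppB": 21_000, "host3|WebLogic|AppC": 2_000}

for agent, delta in rank_by_growth(before, after):
    print(f"{agent}: +{delta:,} metrics")
```

An agent that suddenly jumps by thousands of metrics (AppB in this made-up data) is the first place to look for an application problem or an overly broad tracer configuration.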
In addition to Lutz's advice:
If you are seeing the message "[WARN] [Async MDQ 1] [Manager] Timed out adding to outgoing message queue" after the "EM load exceeds hardware capacity" message, there are some KBs that may help with these symptoms:
APM Cluster Performance Health Check
Why I am getting Out Of Memory errors when viewing historical metrics?
Also, if you search this APM Community for the string "outgoing message queue", you will find others.
Hope it helps
This is a common problem caused by breaching the metric capacity. It happens when agents send a lot of data.
The solution is to add new Collectors, clean up SmartStor, and fine-tune the agents to send only the required data. I wrote a post summarizing all these things.
You can read it here: How to improve the performance of APM? It covers these types of problems and their solutions.
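For the "add new Collectors" option, a rough sizing check is a ceiling division of the total metric count by the per-Collector budget. This is only a sketch under stated assumptions: metrics are treated as spreading evenly across Collectors, and both the 20% headroom and the use of the 1,200,000 figure from the question as a budget are illustrative, not sizing recommendations. The function name is hypothetical.

```python
def collectors_needed(total_metrics, metrics_per_collector, headroom_pct=20):
    """Collectors required so each stays `headroom_pct` percent below its budget.

    ASSUMPTION: metrics spread evenly across Collectors; real clusters are lumpier
    because whole agents, not individual metrics, land on a Collector.
    """
    usable = metrics_per_collector * (100 - headroom_pct) // 100
    return -(-total_metrics // usable)  # ceiling division using integers only


# Four Collectors all sitting at the 1,200,000 clamp from the question:
print(collectors_needed(4 * 1_200_000, 1_200_000))
```

Integer arithmetic avoids floating-point surprises right at the boundary; in this example the four saturated Collectors would need a fifth to get back under the assumed headroom.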
I understand that I need to clean up my SmartStor. Please clarify the following questions.
1) I know "./SmartStorTools.sh test_regex -metrics ".*SQL.*" -src /col2/data" will list all the SQL metrics from this Collector. If I use "./SmartStorTools.sh remove_metrics -src /col2/data -metrics ".*SQL.*" -dest /col2/datanew", will it delete both live and historical SQL metrics? I might need the SQL data for the past 30 days. How can we achieve that?
2) What will happen if we just "prune" the SmartStor data without deleting any of the metrics?
3) How can we identify old agents that are not sending any live data but still exist in the historical metrics?
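Before running remove_metrics, it can help to check offline how broad a pattern like ".*SQL.*" really is. This sketch assumes the tool matches the whole metric name against the regex (as Java's String.matches does); that is an assumption, and test_regex remains the authoritative check. The metric names are made up for illustration.

```python
import re

# Made-up metric names for illustration; real names come from your SmartStor.
METRICS = [
    "Backends|WebAppDB on dbhost-1521 (Oracle DB)|SQL|Dynamic|Query|SELECT ...:Average Response Time (ms)",
    "Backends|MySQL on dbhost-3306|SQL|Dynamic|Update:Responses Per Interval",
    "Frontends|Apps|SQLReportPortal:Average Response Time (ms)",
    "GC Heap:Bytes In Use",
]


def preview(pattern, names):
    """Return the names the pattern matches in full (like Java's String.matches)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


broad = preview(r".*SQL.*", METRICS)                # also catches the frontend metric
narrow = preview(r"Backends\|.*\|SQL\|.*", METRICS)  # only backend SQL metrics
print(len(broad), len(narrow))
```

The broad pattern also sweeps up any metric that merely contains "SQL" somewhere in its name, so anchoring the regex to the part of the tree you actually want to drop is the safer choice.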
1) If you run "./SmartStorTools.sh remove_metrics -src /col2/data -metrics ".*SQL.*" -dest /col2/datanew", all SQL metrics in the SmartStor database will be purged. There is no option in the SmartStor tool to remove metrics based on a time range. After the SmartStor cleanup, live SQL metrics will continue to report as long as the agent is enabled to monitor SQL queries.
2) The above command removes the metric values for SQL metrics. When you run the prune command after it, prune removes the metric names, metric categories, etc. from the metadata. Basically, prune removes metrics that have no values in the historical store.
3) I don't think we have an option to list the old agents that are not sending live data but still have data in the historical store. You can, however, see the count of agents without any data using this supportability metric:
*SuperDomain*|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual)|Enterprise Manager|Data Store|SmartStor|MetaData:Agents without Data
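The interplay between remove_metrics and prune described in answers 1 and 2 can be modeled in a few lines. This is a toy in-memory model for reasoning only; the real tools operate on SmartStor files, and the ToyStore class and its methods are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a SmartStor: values plus metadata."""
    values: dict = field(default_factory=dict)  # metric name -> list of data points
    metadata: set = field(default_factory=set)  # metric names known to the store

    def record(self, name, point):
        self.values.setdefault(name, []).append(point)
        self.metadata.add(name)

    def remove_metrics(self, pattern):
        """Drop every data point of matching metrics; there is no time-range filter."""
        regex = re.compile(pattern)
        for name in list(self.values):
            if regex.fullmatch(name):
                self.values[name] = []

    def prune(self):
        """Forget metric names that no longer have any data points."""
        self.metadata = {name for name in self.metadata if self.values.get(name)}


store = ToyStore()
store.record("Backends|OrdersDB|SQL:Average Response Time (ms)", 12)
store.record("GC Heap:Bytes In Use", 1024)

store.prune()                     # prune alone: every metric still has data
print(len(store.metadata))        # 2
store.remove_metrics(r".*SQL.*")  # values gone, name still in metadata
print(len(store.metadata))        # 2
store.prune()                     # now the empty SQL metric is dropped
print(sorted(store.metadata))     # ['GC Heap:Bytes In Use']
```

In this model, pruning without removing anything first is a no-op for metrics that still hold data, which matches the answer to question 2.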
A correction on the third point: we do have an option to list historical agents that are not sending data but still have data in the historical store. Use the Command Line Workstation (CLW) List Historical Agents command.
Please look at the CLW commands in the documentation. Here is a snippet:
List Historical Agents:
The List Historical Agents command lists all agents with data in SmartStor that are unmounted and not sending data to an Enterprise Manager.
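If you also want to cross-check that output against the agents that are currently connected, the comparison is a set difference. This sketch assumes both listings were saved as one agent name per line; the sample names are hypothetical, and you will likely need to adapt the parsing to the exact text your CLW version prints.

```python
def parse_agents(text):
    """Turn a one-agent-per-line listing into a set, ignoring blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def historical_only(historical_text, live_text):
    """Agents present in the historical listing but absent from the live one."""
    return sorted(parse_agents(historical_text) - parse_agents(live_text))


# Hypothetical listings in "Host|Process|AgentName" form.
historical = """
host1|Tomcat|OrdersAgent
host2|Tomcat|OldBillingAgent
host3|JBoss|RetiredAgent
"""
live = """
host1|Tomcat|OrdersAgent
"""

print(historical_only(historical, live))
```

Sorting the result keeps the output stable between runs, which makes it easier to diff one cleanup candidate list against the next.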
To all posters: make sure to still vote up these ideas (so that they do not fall off the roadmap :)
SmartStorTools to support time ranges
SmartStorTools Enhancement: Agent Renaming
Enhance SmartStorTools to provide the option of copying the SmartStor
and possibly one of my favorites (although more as a hint to CA to do more thinking than to execute this idea as-is):
More advanced SmartStor historization rules