We just upgraded CA APM from 220.127.116.11 to 10.5.2. Post upgrade, we are seeing very high harvest duration, and because of that we are having issues with metrics and graphs.
Even with a single agent reporting fewer than 2,000 metrics, the harvest duration is showing around 25k.
We stopped back-end metrics and all other metrics that could potentially report high metric counts, but despite that there was no significant improvement in harvest duration.
Can anybody throw some light on the probable culprit(s)?
Converted to a discussion.
Asking your fellow admins/APM users/CA Partners to respond first with their suggestions.
Is this a cluster and the High Harvest Duration symptom is the same on all Collectors?
Does anything else in the perflog.txt look abnormal? Relevant KB: How to perform an APM Cluster Performance Health Check.
Thanks Lynn, I have checked that. Occasionally there are differences in the agents' number of metrics and the metric data rate. Afterwards I shut off most of the metrics, and currently I am allowing only 7,000 to be reported to the EM (I got the value from Supportability > Number of Metrics). But even with this metric load, the harvest duration rose to >50k and the EM CPU usage reached >350%.
Could you please suggest what possibly could go wrong?
You did not say whether this is a standalone EM or a cluster with MOM/Collector EMs; please advise.
Another possible root cause could be the calculator load, but that should only apply to an EM running standalone or as the MOM in a cluster (it would not apply to a Collector). See: CA APM Performance Monitoring Using Supportability Metrics - CA Application Performance Management - 10.5 - CA Technolog…
Hope it helps.
It is a standalone implementation, and I turned off all the calculators as well.
Do you have the "MOM Infrastructure Monitoring" and "EM-Collector" management modules from config/modules/examples deployed and configured for your environment?
If so, could you post the screen shots of the "EM - Data Harvesting", "EM - SmartStor Data Processing", "EM - Event Handling", "EM - Query Performance" and "EM - Resource Capacity" dashboards?
With a combination of the graphs from the above dashboards, there is a very good chance that I might be able to at least point you in the right direction.
More specifically, I would need to know more about your hosting setup:
Please provide the system description for the collector:
virtual / physical
CPU cores allocated/installed
JVM Heap sizes max/min
APM environment description
Cluster / Single
If Clustered (per collector)
Number of collectors
Number of metrics
Number of agents
Number of Applications
Number of events
Number of historic Metrics
Here are my suggestions:
1) For testing purposes, try disabling all management modules (rename config/modules to config/modules_original).
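For step 1, a minimal sketch of the rename on a Unix host; APM_HOME here points at a throwaway demo directory, so substitute your real EM install path:

```shell
# Demo layout: substitute your real EM install directory for APM_HOME.
APM_HOME="${APM_HOME:-/tmp/apm_demo}"
mkdir -p "$APM_HOME/config/modules"

# Move the management modules aside so the EM loads none of them,
# then recreate an empty modules directory for the EM to scan.
mv "$APM_HOME/config/modules" "$APM_HOME/config/modules_original"
mkdir "$APM_HOME/config/modules"
```

Restore by reversing the rename once testing is done.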
2) Switch from Concurrent (CMS) GC to G1GC. In Introscope_Enterprise_Manager.lax, in the lax.nl.java.option.additional property, replace
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50
with just:
-XX:+UseG1GC
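For step 2, the edit in Introscope_Enterprise_Manager.lax would look roughly like this; the heap flags shown are illustrative placeholders for whatever else your property already contains:

```properties
# Before (CMS collector):
lax.nl.java.option.additional=-Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50

# After (G1 collector):
lax.nl.java.option.additional=-Xms2048m -Xmx2048m -XX:+UseG1GC
```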
3) Make sure DEBUG logging is disabled in the IntroscopeEnterpriseManager.properties.
4) Make sure you have allocated enough memory, and set the initial heap size (-Xms) equal to the maximum heap size (-Xmx) in Introscope_Enterprise_Manager.lax or EMService.conf.
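As a sketch of step 4, with illustrative sizes; EMService.conf follows the Java Service Wrapper convention, so the wrapper property names below are an assumption based on that format:

```properties
# Introscope_Enterprise_Manager.lax (keep -Xms equal to -Xmx):
lax.nl.java.option.additional=-Xms4096m -Xmx4096m

# EMService.conf (Java Service Wrapper; sizes in MB):
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
```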
5) If the upgrade involved migrating the EM to another server, then make sure the SmartStor db is pointing to a dedicated disk controller and that introscope.enterprisemanager.smartstor.dedicatedcontroller=true, which allows the EM to fully utilize the dedicated disk. Failing to do this will reduce collector performance by 50%. From https://docops.ca.com/ca-apm/10-5/en/installing/apm-sizing-and-performance-guide/system-information-and-requirements/ca-apm-data-storage-requirements
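For step 5, the relevant SmartStor settings in IntroscopeEnterpriseManager.properties would look like this; the directory path is illustrative, and it must sit on its own disk/controller before you enable the flag:

```properties
# SmartStor data on its own spindle/controller (path is an example):
introscope.enterprisemanager.smartstor.directory=/smartstor/data
introscope.enterprisemanager.smartstor.dedicatedcontroller=true
```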
“The dedicated controller property is set to false by default. You must provide a dedicated disk I/O path for SmartStor to set this property to true; it cannot be set to true when there is only a single disk for each Collector. When the dedicated controller property is set to false, **the metric capacity can decrease up to 50 percent.**”
6) I wonder if Differential Analysis is causing a lot of alerts to be automatically created and propagated to the AppMap, causing extra overhead in the AppMap alert-to-states mapping handling. For testing purposes, temporarily disable the AppMap alert mapping in the EM (empty teamcenter-status-mapping.properties).
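For step 6, one way to empty the file while keeping a backup; CONFIG_DIR here is a throwaway demo directory with stand-in content, so point it at your EM's real config directory instead:

```shell
# Demo: substitute your real EM config directory for CONFIG_DIR.
CONFIG_DIR="${CONFIG_DIR:-/tmp/apm_config_demo}"
mkdir -p "$CONFIG_DIR"
printf 'example.mapping=DANGER\n' > "$CONFIG_DIR/teamcenter-status-mapping.properties"  # stand-in content

# Back up the current mapping, then truncate it to disable AppMap alert mapping.
cp "$CONFIG_DIR/teamcenter-status-mapping.properties" \
   "$CONFIG_DIR/teamcenter-status-mapping.properties.bak"
: > "$CONFIG_DIR/teamcenter-status-mapping.properties"
```

Restore the .bak copy to re-enable the mapping after testing.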
7) Check the log for any known issues; try searching for the following keywords:
- Outgoing message queue is not moving
- No space left on device
- reported Metric clamp hit
- capacity
- too many
- reached
- slowly
- combining
- outofmemory
- skewed
- cannot keep
- Internal cache is corrupt
- Processing of alerts is overloaded
- [ERROR
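The search in step 7 can be scripted; the sample log content and path below are illustrative, so point EM_LOG at your real IntroscopeEnterpriseManager.log instead:

```shell
# Demo: write a small stand-in log, then scan it for the warning signs above.
EM_LOG="${EM_LOG:-/tmp/IntroscopeEnterpriseManager.log}"
printf '%s\n' \
  '[INFO] [Manager] EM started' \
  '[WARN] [Manager] reported Metric clamp hit' \
  '[ERROR] [Manager] No space left on device' \
  > "$EM_LOG"

# Case-insensitive search, with line numbers; missing patterns are not an error.
for pattern in \
  'Outgoing message queue is not moving' \
  'No space left on device' \
  'reported Metric clamp hit' \
  'capacity' 'too many' 'reached' 'slowly' 'combining' \
  'outofmemory' 'skewed' 'cannot keep' \
  'Internal cache is corrupt' \
  'Processing of alerts is overloaded' \
  '\[ERROR'
do
  grep -in "$pattern" "$EM_LOG" || true
done
```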
I hope this helps,