DX Application Performance Management


High Harvest Duration

  • 1.  High Harvest Duration

    Posted 08-21-2017 09:41 AM

    We just upgraded CA APM from 9.6.0.0 to 10.5.2. Post upgrade, we are getting a very high harvest duration, due to which we are having issues with metrics and graphs.

    Even with a single agent reporting merely <2000 metrics, the harvest duration shows 25k...

     

    We stopped back-end metrics and all other metrics with the potential to report high metric counts, but despite that there was no significant improvement in harvest duration.

    Can anybody throw some light on the probable culprit(s)?

     

    Thanks.

     

    Best, JD



  • 2.  Re: High Harvest Duration

    Posted 08-21-2017 09:49 AM

    Hi Joydip:

    Converted to a discussion. 

    Asking your fellow admins, APM users, and CA Partners to respond first with their suggestions.



  • 3.  Re: High Harvest Duration

    Posted 08-22-2017 12:32 AM

    Hi JD,

    Is this a cluster and the High Harvest Duration symptom is the same on all Collectors?

    Does anything else in the perflog.txt look abnormal? Relevant KB: How to perform an APM Cluster Performance Health Check. 
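    To skim perflog.txt for the harvest samples, a minimal sketch (the exact perflog column layout varies by release, and logs/perflog.txt is the default location, so treat the path and pattern below as assumptions):

```shell
# show_harvest: skim a perflog file for harvest-related lines.
# $1 is the path to perflog.txt (default location: logs/perflog.txt).
show_harvest() {
  head -1 "$1"                       # header row names the columns
  grep -i 'harvest' "$1" | tail -20  # most recent harvest-related lines
}
# Typical call from the EM install directory:
# show_harvest logs/perflog.txt
```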

     

    Thanks

    Lynn



  • 4.  Re: High Harvest Duration

    Posted 08-22-2017 03:14 PM

    Thanks Lynn... I have checked the same. Occasionally there are differences in agent metric counts and metric data rates. Afterwards I shut off most of the metrics, and currently I am allowing only 7000 to be reported to the EM (got the value from Supportability > Number of Metrics). But even with this metric load, the harvest duration rose to >50k and EM CPU utilization reaches >350%.

     

    Could you please suggest what possibly could go wrong?

     

    Thanks, Joydip 



  • 5.  Re: High Harvest Duration

    Posted 08-22-2017 10:27 PM

    Hi Joydip,

    You did not say if this was a standalone EM or a cluster with MOM/Collector EMs - please advise?

    Another possible root cause could be the calculator load, but that should only apply to an EM which is running as standalone or as MOM in a cluster (it would not apply to a Collector).
    CA APM Performance Monitoring Using Supportability Metrics - CA Application Performance Management - 10.5 - CA Technolog… 

    Hope it helps.

    Regards,

    Lynn



  • 6.  Re: High Harvest Duration

    Posted 08-23-2017 04:25 AM

    Hi Lynn,

    It is a standalone implementation, and I turned off all the calculators as well.

    Best Regards, Joydip



  • 7.  Re: High Harvest Duration

    Posted 08-25-2017 10:57 AM

    Do you have the "MOM Infrastructure Monitoring" and "EM-Collector" management modules from config/modules/examples deployed and configured for your environment?

     

    If so, could you post screenshots of the "EM - Data Harvesting", "EM - SmartStor Data Processing", "EM - Event Handling", "EM - Query Performance" and "EM - Resource Capacity" dashboards?

     

    With a combination of the graphs from the above dashboards, there is a very good chance that I might be able to at least point you in the right direction.

     

    More specifically, I would need to know more about your hosting setup:

     

    Please provide the system description for the collector:

       virtual / physical

       CPU cores allocated/installed

       RAM allocated/installed

       JVM Heap sizes max/min

     

    APM environment description

       Cluster / Single

       If Clustered (per collector)

          Number of collectors

          Number of metrics

          Number of agents

          Number of Applications

          Number of events

          Number of historic Metrics

     

     

    Billy



  • 8.  Re: High Harvest Duration

    Posted 08-28-2017 04:24 AM

    Hello Joydip,

    Here are my suggestions:

     

    1) For testing purposes, try disabling all management modules (rename config/modules to config/modules_original).
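    The rename can be scripted like this (a sketch: stop the EM first, and /opt/CA/Introscope below is a hypothetical install path):

```shell
# disable_modules: move the management-modules directory aside so the EM
# starts with no modules loaded. $1 is the EM install directory.
disable_modules() {
  mv "$1/config/modules" "$1/config/modules_original"
}
# disable_modules /opt/CA/Introscope
```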


    2) Switch from Concurrent Mark Sweep GC to G1GC. In the Introscope_Enterprise_Manager.lax > lax.nl.java.option.additional property, replace the settings below:

    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50
    With just this:
    -XX:+UseG1GC


    3) Make sure DEBUG logging is disabled in IntroscopeEnterpriseManager.properties.


    4) Make sure you have allocated enough memory, and set the initial heap size (-Xms) equal to the maximum heap size (-Xmx) in Introscope_Enterprise_Manager.lax or EMService.conf.
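    Putting steps 2 and 4 together, the lax property might look like this (4096m is only a placeholder; size the heap for your actual metric load per the sizing guide):

```
lax.nl.java.option.additional=-Xms4096m -Xmx4096m -XX:+UseG1GC
```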


    5) If the upgrade involved migrating the EM to another server, make sure the SmartStor db is pointing to a dedicated disk controller and introscope.enterprisemanager.smartstor.dedicatedcontroller=true is set, which allows the EM to fully utilize this setting. Failing to do this can reduce collector metric capacity by up to 50%. From https://docops.ca.com/ca-apm/10-5/en/installing/apm-sizing-and-performance-guide/system-information-and-requirements/ca-apm-data-storage-requirements

    “The dedicated controller property is set to false by default. You must provide a dedicated disk I/O path for SmartStor to set this property to true; it cannot be set to true when there is only a single disk for each Collector.
    When the dedicated controller property is set to false, **the metric capacity can decrease up to 50 percent.**”


    6) I wonder if Differential Analysis is causing a lot of alerts to be created automatically and propagated to AppMap, adding overhead in the AppMap alert-to-state mapping handling. For testing purposes, temporarily disable the AppMap alert mapping in the EM (empty teamcenter-status-mapping.properties).
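    Emptying the mapping file can be done safely with a backup first (a sketch; the config-directory path in the usage comment is hypothetical):

```shell
# disable_appmap_alert_mapping: back up, then truncate the mapping file so
# AppMap alert-to-state mapping is effectively off. $1 is the EM config dir.
disable_appmap_alert_mapping() {
  cp "$1/teamcenter-status-mapping.properties" \
     "$1/teamcenter-status-mapping.properties.bak"
  : > "$1/teamcenter-status-mapping.properties"
}
# disable_appmap_alert_mapping /opt/CA/Introscope/config
```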


    7) Check for any known issues in the log by searching for the following keywords:

     

    - Outgoing message queue is not moving
    - No space left on device
    - reported Metric clamp hit
    - capacity
    - too many
    - reached
    - slowly
    - combining
    - outofmemory
    - skewed
    - cannot keep
    - Internal cache is corrupt
    - Processing of alerts is overloaded
    - [ERROR
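    The whole keyword list can be searched in one pass with grep; a minimal sketch (the log path in the usage comment is the default location, so adjust it for your install):

```shell
# scan_em_log: search an EM log for the known problem signatures listed above.
# Prints matching lines with line numbers, case-insensitively. $1 is the log path.
scan_em_log() {
  grep -inE 'outgoing message queue is not moving|no space left on device|metric clamp|capacity|too many|reached|slowly|combining|outofmemory|skewed|cannot keep|internal cache is corrupt|processing of alerts is overloaded|\[ERROR' "$1"
}
# scan_em_log logs/IntroscopeEnterpriseManager.log
```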

     

    I hope this helps,

    Regards,

    Sergio