DX Application Performance Management

  • 1.  Metric clamp issues - Threshold values

    Posted Jul 08, 2015 09:16 AM

    Hello Folks,

     

    We recently had an issue in which the metric clamp was breached and the agents dropped off the super domain.

    Looking at the historical metric limits, we found some inconsistencies. Is there a precedence order that determines which threshold value overrides which?

     

    I understand that the values should be consistent, but is precedence an issue in any way?

     

    MOM historic threshold limit:

    <clamp id="introscope.enterprisemanager.metrics.historical.limit">
        <threshold value="1200000"/>
    </clamp>

     

    But the collectors had different, higher values:

    Collector Server 1

    Agent1

    <clamp id="introscope.enterprisemanager.metrics.historical.limit">
        <threshold value="5200000"/>
    </clamp>

     

    Agent2

    <clamp id="introscope.enterprisemanager.metrics.historical.limit">
        <threshold value="3200000"/>
    </clamp>

     

    Collector Server 2

    Agent1

    <clamp id="introscope.enterprisemanager.metrics.historical.limit">
        <threshold value="3200000"/>
    </clamp>

     

    Collector Server 3

    Agent1

    <clamp id="introscope.enterprisemanager.metrics.historical.limit">
        <threshold value="6200000"/>
    </clamp>

     

    Agent2

    <clamp id="introscope.enterprisemanager.metrics.historical.limit">
        <threshold value="6200000"/>
    </clamp>
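
A short script can help with comparing the historical limit across several Enterprise Managers. The following is a minimal sketch in Python, assuming each EM's clamp file contains `<clamp>` elements shaped like the fragments quoted in the post and uses no XML namespace. The `<clamps>` wrapper, the `make_config` helper, and the labels are illustrative assumptions, not taken from a real configuration file; the threshold values are the ones listed in the post.

```python
import xml.etree.ElementTree as ET

CLAMP_ID = "introscope.enterprisemanager.metrics.historical.limit"


def historical_limit(xml_text):
    """Return the threshold value for CLAMP_ID as an int, or None if absent."""
    root = ET.fromstring(xml_text)
    for clamp in root.iter("clamp"):
        if clamp.get("id") == CLAMP_ID:
            threshold = clamp.find("threshold")
            if threshold is not None:
                return int(threshold.get("value"))
    return None


def make_config(value):
    # Hypothetical minimal wrapper around the fragment quoted in the post.
    return (
        "<clamps>"
        f'<clamp id="{CLAMP_ID}"><threshold value="{value}"/></clamp>'
        "</clamps>"
    )


# Threshold values copied from the post; the labels are illustrative only.
configs = {
    "MOM": make_config(1200000),
    "Collector Server 1 / Agent1": make_config(5200000),
    "Collector Server 1 / Agent2": make_config(3200000),
    "Collector Server 2 / Agent1": make_config(3200000),
    "Collector Server 3 / Agent1": make_config(6200000),
    "Collector Server 3 / Agent2": make_config(6200000),
}

limits = {name: historical_limit(text) for name, text in configs.items()}
distinct = sorted(set(limits.values()))

for name, limit in limits.items():
    print(f"{name}: {limit}")
print("distinct limits:", distinct)
```

In practice the strings in `configs` would be replaced by the contents of each EM's own file. Listing the values side by side makes inconsistencies like the ones in the post easy to spot.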



  • 2.  Re: Metric clamp issues - Threshold values

    Posted Jul 08, 2015 09:36 AM

    No, there is no precedence. introscope.enterprisemanager.metrics* is a per-EM limit (not per agent). It takes into account metrics with SmartStor data (i.e. live and historical metrics).



  • 3.  Re: Metric clamp issues - Threshold values
    Best Answer

    Broadcom Employee
    Posted Jul 08, 2015 09:41 AM

    Hi,

     

    Every EM can have different settings, although I would suggest configuring at least all collectors the same unless there is a very compelling reason not to. The MOM usually has far fewer metrics than the collectors. There is no precedence: every server has its own settings, and the MOM does not overrule the collector settings. If you look at the "Historical metrics" metric you will see that.

     

    You wrote Agent1 and Agent2. I guess you mean that you have two collectors running on the same physical box, correct?

     

    But agents should not "drop off". You just won't be able to see any new metrics that the agents report to the EM.

     

    Quote from docs: When the Enterprise Manager detects that it has exceeded this limit, the Enterprise Manager stops all agents from registering new metrics. Old metrics continue to report.

     

    Ciao,

    Guenter
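
The documentation quote in the post above can be illustrated with a toy model. This is a minimal sketch in Python, assuming the clamp takes effect once the count of known metrics exceeds the limit; the class `ToyEnterpriseManager` and its methods are hypothetical and do not reflect the real Enterprise Manager implementation.

```python
class ToyEnterpriseManager:
    """Toy model of the quoted clamp behaviour; not the real EM logic."""

    def __init__(self, historical_limit):
        self.historical_limit = historical_limit
        self.known_metrics = set()

    def clamped(self):
        # The quote says the clamp applies once the limit has been exceeded.
        return len(self.known_metrics) > self.historical_limit

    def report(self, metric_name):
        """Return True if a data point for metric_name is accepted."""
        if metric_name in self.known_metrics:
            return True  # old metrics continue to report
        if self.clamped():
            return False  # new metrics are not registered while clamped
        self.known_metrics.add(metric_name)
        return True


em = ToyEnterpriseManager(historical_limit=2)
results = [em.report(name) for name in ["a", "b", "c", "d", "a"]]
print(results)  # prints [True, True, True, False, True]
```

In this model the agent itself is never disconnected: metric "d" is refused because the clamp is active, while the already registered metric "a" keeps reporting.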



  • 4.  Re: Metric clamp issues - Threshold values

    Posted Jul 09, 2015 08:47 AM

    Thanks for your answer, Guenter.

    Just a follow-up question: we do see the agents drop off from the super domain of the MOM each time a metric clamp is reached on the collector.

    Could that be because the collector is becoming unresponsive? Would there be any other reason?

    The number of agents is well within the allowed limits.



  • 5.  Re: Metric clamp issues - Threshold values

    Posted Jul 09, 2015 10:21 AM

    Hi Venkit,

     

    We faced an issue in our environment where agents would drop off one collector (where the historical metric clamp was breached) and reconnect to a different collector in the cluster. We then found that this reconnection was done by MOM load balancing.

     

    The reason is that when a collector's metric clamp is breached, it will not accept any new metrics. When the MOM load balances, it checks for overloaded collectors. If a collector is overloaded with too many historical metrics, the MOM redirects some of its agents to an under-loaded collector. When the issue got worse (all collectors overloaded with historical metrics), I noticed that the APM Status console showed some denied agents, so I suspect that all collectors in the cluster then refuse to accept any agent.

     

    We work around this by cleaning up SmartStor periodically, but you can also consider reducing excessive metrics at the agent level.

     

    It is best to do an end-to-end performance check of the complete cluster to understand the behavior.

     

    Thanks,

    Karthik
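
The behavior Karthik describes can be sketched as a simple selection rule. This is only a toy model in Python of his observation, not the documented MOM load-balancing algorithm: the `Collector` class, the `pick_collector` function, and the metric counts are hypothetical, and choosing the lowest ratio of metrics to limit is an assumption made for illustration. The limits are taken from the first post.

```python
from dataclasses import dataclass


@dataclass
class Collector:
    name: str
    historical_metrics: int  # hypothetical current count
    historical_limit: int    # clamp threshold configured on this collector

    def over_limit(self):
        return self.historical_metrics > self.historical_limit


def pick_collector(collectors):
    """Return the least-loaded collector under its limit, or None."""
    candidates = [c for c in collectors if not c.over_limit()]
    if not candidates:
        # Mirrors the "denied agents" case: no collector can take the agent.
        return None
    # Assumption: load is compared as the fraction of the limit in use.
    return min(candidates, key=lambda c: c.historical_metrics / c.historical_limit)


cluster = [
    Collector("collector1", 5300000, 5200000),  # clamp breached
    Collector("collector2", 1000000, 3200000),
    Collector("collector3", 6000000, 6200000),
]
target = pick_collector(cluster)
print(target.name if target else "agent denied")

all_breached = [Collector("collector1", 5300000, 5200000)]
print(pick_collector(all_breached))
```

With one collector over its limit the agent is placed elsewhere; with every collector over its limit the function returns `None`, which corresponds to the denied agents seen in the status console.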



  • 6.  Re: Metric clamp issues - Threshold values

    Broadcom Employee
    Posted Jul 09, 2015 02:58 PM

    Karthik Nalliyappan wrote:

     

    We work around this by cleaning up SmartStor periodically, but you can also consider reducing excessive metrics at the agent level.

    It's always better to cure the cause (metric explosion) than to ease the symptoms (high historical metric count)!

     

    Ciao,

    Guenter



  • 7.  Re: Metric clamp issues - Threshold values

    Posted Jul 10, 2015 01:53 AM

    You are right, Guenter. Because of the excessive metrics, we end up doing cleanups very frequently nowadays, so we are going to check which agents are sending too many metrics.