DX Application Performance Management

  • 1.  max limit of APM agents per APM cluster/collector

    Posted May 05, 2015 10:21 AM

    Hi all,

    We are trying to create an APM cluster (v9.7.1) with 10 collectors that will accommodate as many agents as possible. Our plan is to have 800 agents per collector, each sending at most 500 live metrics per interval, giving a total of 400,000 live metrics per collector and 8,000 agents per APM cluster. Is it possible to have this many agents while still staying within the officially recommended 400,000 live metrics per collector? Does anyone have field experience with deployments this large? I know the standard recommendation is to keep at most 400 agents per collector. Please advise.
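    To make the plan's arithmetic explicit, here is a small Python sketch. It treats the per-agent and per-collector figures as live-metric counts (an assumption about the intended unit) and assumes every collector carries the same load. It shows that the plan puts each collector exactly at the quoted 400,000 ceiling, i.e. at 100% utilization with no headroom left.

```python
# Figures from the plan above; even distribution across collectors is assumed.
COLLECTORS = 10
AGENTS_PER_COLLECTOR = 800
METRICS_PER_AGENT = 500        # upper bound per agent, per interval
COLLECTOR_LIMIT = 400_000      # live-metric ceiling per collector quoted above


def plan_totals(collectors: int, agents_per_collector: int, metrics_per_agent: int) -> dict:
    """Return live-metric and agent totals for a uniformly loaded cluster."""
    per_collector = agents_per_collector * metrics_per_agent
    return {
        "live_metrics_per_collector": per_collector,
        "agents_in_cluster": collectors * agents_per_collector,
        "live_metrics_in_cluster": collectors * per_collector,
    }


totals = plan_totals(COLLECTORS, AGENTS_PER_COLLECTOR, METRICS_PER_AGENT)
utilization = totals["live_metrics_per_collector"] / COLLECTOR_LIMIT
print(totals)
print(f"per-collector utilization: {utilization:.0%}")
```

    Any agent that exceeds its 500-metric budget pushes its collector over the ceiling, which is why the replies below focus on headroom.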

    Regards,

    Adam Bezecny



  • 2.  Re: max limit of APM agents per APM cluster/collector
    Best Answer

    Posted May 07, 2015 10:57 AM

    I implemented an EM cluster with 4,000+ agents, and there are a few things worth mentioning so far:

    - The initial loading of all the agents into the client can be a problem.

    - When the agents start load balancing, that also creates a massive load and can lead to problems.

    - If you let the MOM load balance freely, you might end up with 10x the historical metric count per collector that you calculated.

    - The agents really need to be controlled so they don't suddenly deliver massive numbers of metrics.

    - The historical metric count has an impact as well, depending on how long you store the metric data, and over time the environment can get slower.

    - As with any sizing, give the collectors and the MOM enough power and headroom to work with.

    But in general it works well. Depending on the issues you face, you might have to analyze, configure, and test things no one else has done before, not even CA.



  • 3.  Re: max limit of APM agents per APM cluster/collector

    Posted May 08, 2015 04:11 AM

    The big problem is: when you run an environment that close to capacity, what happens if one or two collectors need restarting?

    Their load will be spread across the remaining collectors and might cause those to fail as well, causing more outages until the whole thing falls over like dominoes.

    My rule of thumb is to keep the environment below capacity so that, in a failure event, the remaining collectors can sustain the increased load until the failed collectors are available again.
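    One way to put this rule of thumb into numbers, assuming (hypothetically) that a failed collector's load is spread evenly over the survivors: if the cluster must survive k of N collectors being down, each collector should run at no more than (N - k)/N of its ceiling in normal operation. The Python sketch below applies that to the 10-collector, 4,000,000-metric plan from the question; the function names are illustrative only.

```python
def max_safe_utilization(collectors: int, tolerated_failures: int) -> float:
    """Highest steady-state load fraction per collector that still fits the
    ceiling after `tolerated_failures` collectors drop out and their load is
    redistributed evenly (an assumption; real rebalancing may be uneven)."""
    if not 0 <= tolerated_failures < collectors:
        raise ValueError("tolerated_failures must be in [0, collectors)")
    return (collectors - tolerated_failures) / collectors


def load_after_failures(total_metrics: int, collectors: int, failed: int) -> float:
    """Average live metrics per surviving collector after `failed` go down."""
    return total_metrics / (collectors - failed)


print(max_safe_utilization(10, 2))            # fraction of the ceiling
print(load_after_failures(4_000_000, 10, 2))  # metrics per survivor
```

    With two collectors down, the plan would push each survivor to roughly 500,000 live metrics, above the 400,000 figure quoted in the question; tolerating two failures would mean running at about 80% (320,000 live metrics) per collector.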



  • 4.  Re: max limit of APM agents per APM cluster/collector

    Posted May 08, 2015 08:50 AM

    This is what I meant when I said that letting the MOM load balance freely can multiply the historical metric count per collector.

    One way around it, at the cost of losing active/active capability, is to not allow load balancing at all and force each agent to stick to one collector, or to force the agents onto, e.g., a collector pair so you keep limited active/active capability.

    In any case, you need to calculate the historical metric count per collector based on how you load balance the agents.
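    A rough worst-case model of why free load balancing inflates the historical count: a collector keeps history for every agent that reported to it during the retention period, not only the agents currently connected. The Python sketch below assumes, hypothetically, that every agent eventually reports to every collector in its allowed group and that metric names stay stable. Under those assumptions the bound is simply live metrics per collector times group size; with 10 collectors and free load balancing, that gives the 10x mentioned earlier in the thread.

```python
def worst_case_historical_metrics(live_per_collector: int, group_size: int) -> int:
    """Upper bound on distinct metrics a collector may hold in history.

    group_size is the number of collectors an agent may move between:
    1 for pinned agents, 2 for a collector pair, the cluster size for free
    load balancing. Assumes even load and stable metric names (hypothetical).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return live_per_collector * group_size


LIVE_PER_COLLECTOR = 400_000
strategies = [("pinned", 1), ("collector pair", 2), ("free, 10 collectors", 10)]
for label, group in strategies:
    bound = worst_case_historical_metrics(LIVE_PER_COLLECTOR, group)
    print(f"{label:>20}: {bound:,} historical metrics per collector")
```

    Real clusters will usually land somewhere below this bound, but it is the figure to size heap and storage against if agents can roam freely.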



  • 5.  Re: max limit of APM agents per APM cluster/collector

    Posted May 18, 2015 01:01 PM

    Disclaimer: I am in Presales AND I like to follow the rules. Yes, it's true. So my concerns would be:

    • Heap footprint as it relates to historical metric count
    • Logically assigned agents per collector (i.e., a dashboard or report that queries across multiple collectors for a single view; add in a historical time range and you're only as fast as your slowest collector)
    • Lack of headroom to handle the unexpected (new agents, metric name changes, a failing collector, required collector restarts, ...)
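    The "only as fast as your slowest collector" point can be illustrated with a generic fan-out model: a cross-collector view is complete only when every collector has answered, so its latency is the maximum, not the average, of the per-collector response times. This Python sketch uses hypothetical collector names and simulated delays; it is a general illustration, not a description of how the MOM actually dispatches queries.

```python
import concurrent.futures
import time


def query_collector(name: str, delay_s: float) -> str:
    """Stand-in for a per-collector query; sleeps to simulate response time."""
    time.sleep(delay_s)
    return name


def fan_out(delays: dict[str, float]) -> float:
    """Query all collectors in parallel and return the wall-clock seconds
    until the last one has answered."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(query_collector, n, d) for n, d in delays.items()]
        for future in concurrent.futures.as_completed(futures):
            future.result()
    return time.perf_counter() - start


# Two fast collectors and one slow one (e.g. under heavy historical load).
elapsed = fan_out({"collector-1": 0.05, "collector-2": 0.05, "collector-3": 0.30})
print(f"dashboard ready after ~{elapsed:.2f}s")  # dominated by collector-3
```

    Adding a long historical time range tends to slow the most loaded collector most, which is exactly the one the whole view waits on.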

    I wish you well and would like to hear how well the plan works out!