DX Application Performance Management


Disconnected Historical Agent Limit

  • 1.  Disconnected Historical Agent Limit

    Posted Aug 16, 2018 10:31 AM

    APM Version: 10.5.2.92

    Total Agents: 1580

    Number of Collectors: 9

    On the APM Status Console, we have an active clamp on one of our collectors:

    introscope.enterprisemanager.disconnected.historical.agent.limit   collector009@5001   400   400   5:36:27 07/22/18

    The number of agents on each of the nine collectors varies between 118 and 250. Typically once a month, the APM cluster is restarted to pick up OS patches. During startup, the agents jump between collectors and then level out after a few hours of load balancing. The problem is that during this process a collector may have had over 400 agents connect and disconnect before the agents are balanced.
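    To illustrate the startup churn described above, here is a simplified Python model. It assumes (hypothetically; this is not how the Enterprise Manager is documented to track agents) that every collector an agent touched during startup keeps it as a disconnected historical agent once the agent moves on, and that the clamp trips once that count reaches the 400 default. The function, agent, and collector names are made up for illustration.

```python
from collections import defaultdict

# Assumption: out-of-the-box value of
# introscope.enterprisemanager.disconnected.historical.agent.limit, per this thread.
DISCONNECTED_HISTORICAL_AGENT_LIMIT = 400


def disconnected_counts(connection_log):
    """Count, per collector, agents that connected there but have since moved on.

    connection_log is an ordered list of (agent, collector) connection events.
    Simplified, hypothetical model: an agent's current collector is the one in
    its latest event; every other collector it visited still holds it as a
    disconnected historical agent.
    """
    visited = defaultdict(set)  # collector -> agents that ever connected to it
    current = {}                # agent -> collector it is connected to now
    for agent, collector in connection_log:
        visited[collector].add(agent)
        current[agent] = collector
    return {
        collector: sum(1 for agent in agents if current[agent] != collector)
        for collector, agents in visited.items()
    }


def clamped_collectors(connection_log, limit=DISCONNECTED_HISTORICAL_AGENT_LIMIT):
    """Collectors whose modeled disconnected count has reached the limit."""
    counts = disconnected_counts(connection_log)
    return sorted(c for c, n in counts.items() if n >= limit)


# Startup churn: 450 agents land on collector009, then 420 of them are
# rebalanced to collector001 (hypothetical numbers).
startup_log = [(f"agent{i}", "collector009") for i in range(450)]
startup_log += [(f"agent{i}", "collector001") for i in range(420)]

print(disconnected_counts(startup_log))  # {'collector009': 420, 'collector001': 0}
print(clamped_collectors(startup_log))   # ['collector009']
```

    In this model, collector009 ends up with only 30 connected agents yet still trips the clamp, because the limit counts agents that left, not agents that stayed.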

     

    Question time:

    1.  What is the behavior of a collector that has hit the historical.agent.limit clamp (400)?

    2.  Depending on that behavior, is this really an error-level clamp/limit?

    3.  Again depending on the behavior, is there a way to address this issue without assigning agents to specific collectors, i.e., artificially dividing the agents into groups of fewer than 400 per collector?

    4.  What is the impact on the APM cluster when one or more of the collectors report the disconnected.historical.agent.limit clamp? In a nutshell, when do I, as the APM admin, need to take preventive action?



  • 2.  Re: Disconnected Historical Agent Limit
    Best Answer

    Broadcom Employee
    Posted Aug 16, 2018 12:42 PM

    The following reference might be helpful. The section "introscope.enterprisemanager.disconnected.historical.agent.limit" describes why this clamp occurs, how the EM responds, and offers some suggestions on how to address it.

    apm-events-thresholds-config.xml - CA Application Performance Management - 10.5 - CA Technologies Documentation

    Just some suggestions to check:

    "If there is no historical agent that the Enterprise Manager can automatically unmount, this means that CA APM users mounted manually all the disconnected historical agents. The Enterprise Manager never tries to unmount a disconnected historical agent that a CA APM user mounted manually."

    Have you mounted any agents manually?

    "Workstation displays an error message instructing the user to unmount some historical agents to make room to mount new historical agents."

    Have you seen such a message?

    My suggestion would be to try introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector=always to prevent the clamp issue during startup.

    I have covered the most common issues and recommendations regarding clustering in this KB:

    Introscope Enterprise Manager Troubleshooting and - CA Knowledge

    See point #15, which covers load balancing.



  • 3.  Re: Disconnected Historical Agent Limit

    Posted Aug 16, 2018 03:13 PM

    So I read through the doc and thought, let's go look for these disconnected but mounted agents. I would typically go to the Custom Metric Host to locate any agent that is grayed out:

    • *SuperDomain*|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual) (collector009.aessuccess.org@5001)|Agents

    To my surprise, no agents are grayed out. I then clicked on the Agents folder and searched for "ConnectionStatus", and all agents have a value of 1.

    I would expect to see a hundred or so mounted but disconnected (grayed-out) agents under the Agents folder, but I don't. The collector has around 177 agents in total, yet the APM Status Console is still reporting the active clamp:

    introscope.enterprisemanager.disconnected.historical.agent.limit   collector@5001   400   400   5:36:27 07/22/18

    I couldn't find any metric on the Custom Metric Host that I could use to cross-check the APM Status Console.

    Looking at the clamp line, the clamp occurred on July 22. Our agents unmount after 24 hours of being disconnected.

    So it looks like a ghost message stuck in the APM Status Console, since I see no sign that the collector associated with the active clamp is actually in that condition.

    Does anyone know how to kick the APM Status Console so it checks again and clears the message?



  • 4.  Re: Disconnected Historical Agent Limit

    Posted Aug 16, 2018 04:39 PM

    Just some suggestions to check:

    "If there is no historical agent that the Enterprise Manager can automatically unmount, this means that CA APM users mounted manually all the disconnected historical agents. The Enterprise Manager never tries to unmount a disconnected historical agent that a CA APM user mounted manually."

    Have you mounted any agents manually?

    "Workstation displays an error message instructing the user to unmount some historical agents to make room to mount new historical agents."

    Have you seen such a message?

    Francis



  • 5.  Re: Disconnected Historical Agent Limit

    Posted Aug 17, 2018 06:44 AM

    Thank you Francis.

    There are only two people with administration rights, which include mounting and unmounting agents. Neither of us has mounted or unmounted any agents.

    We haven't seen any messages about unmounting historical agents to make room.

    Thank you again,

    Billy



  • 6.  Re: Disconnected Historical Agent Limit

    Broadcom Employee
    Posted Aug 17, 2018 05:23 AM

    Hi Billy,

    My suggestion would be to try introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector=always to prevent the clamp issue during startup.

    I have covered the most common issues and recommendations regarding clustering in this KB:

    Introscope Enterprise Manager Troubleshooting and - CA Knowledge

    See point #15, which covers load balancing.

    I hope this helps,

    Regards,

    Sergio



  • 7.  Re: Disconnected Historical Agent Limit

    Posted Aug 17, 2018 07:02 AM

    Thank you Sergio.

    I will be going through the KB line by line against our new 10.5.2 cluster.

    On load balancing, I can see where setting staywithhistoricalcollector to always would help, but currently the collector with the historical agent limit warning does not appear to have any disconnected mounted agents. So the APM Status Console seems to think there is a clamp on historical agents, but there does not appear to be one.

    How does setting it to always affect failures? If one of the collectors were to fail or be unable to accept agents, would the agents move to a different collector until the failed collector returned to service?

    The message was from July 22, and I reason it was due to a cluster restart. I would expect that after 24 hours the disconnected agents would unmount due to introscope.enterprisemanager.autoUnmountDelayInMinutes=1440, and I'm guessing that historic.agent.limit shouldn't count unmounted agents.
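    Billy's reasoning about the unmount delay can be checked with a small Python model. It assumes the agents disconnected at the time shown on the clamp line (read as 05:36:27), that every agent not mounted manually unmounts once autoUnmountDelayInMinutes has elapsed, and, as Billy guesses, that unmounted agents no longer count toward the limit. still_mounted is a hypothetical helper, not an APM API.

```python
from datetime import datetime, timedelta

# Value quoted in the thread for introscope.enterprisemanager.autoUnmountDelayInMinutes.
AUTO_UNMOUNT_DELAY_MINUTES = 1440


def still_mounted(disconnected_at, now, delay_minutes=AUTO_UNMOUNT_DELAY_MINUTES,
                  manually_mounted=frozenset()):
    """Return the agents this simplified model expects to still be mounted at `now`.

    disconnected_at maps agent name -> datetime it disconnected. Agents mounted
    manually are never auto-unmounted (per the documentation quoted in this
    thread); every other agent is assumed to unmount once the delay has elapsed.
    """
    delay = timedelta(minutes=delay_minutes)
    return {
        agent
        for agent, when in disconnected_at.items()
        if agent in manually_mounted or now - when < delay
    }


clamp_time = datetime(2018, 7, 22, 5, 36, 27)  # time on the clamp line (assumed)
disconnected = {f"agent{i}": clamp_time for i in range(400)}

print(len(still_mounted(disconnected, datetime(2018, 7, 22, 12, 0))))   # 400
print(len(still_mounted(disconnected, datetime(2018, 8, 16, 15, 13))))  # 0
```

    Under these assumptions the mounted count drops to zero by July 23, so a clamp still shown as active on August 16 points to a stale status entry rather than a real backlog of mounted agents.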

     

    On a cluster restart, I would expect to see quite a few APM Status Console messages, which should clear once the cluster has become balanced, plus 24 hours for the disconnected agents to unmount.

    Thank you,

    Billy



  • 8.  Re: Disconnected Historical Agent Limit

    Broadcom Employee
    Posted Aug 19, 2018 06:55 PM

    Hi Billy,

    Sergio also just updated this older KB covering that property: Tip for loadbalancing configuration when upgrading - CA Knowledge 

    Sergio can confirm, but I believe introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector=always means that:

    - as long as the Collector is up, the agent will wait for a connection to it even if it is overloaded.

    - if the Collector is down the agent will be redirected to another Collector
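    A minimal Python sketch of the two clauses above, which also bears on Billy's failover question. pick_collector and its inputs are hypothetical, and the least-loaded fallback is a stand-in assumption, not the MOM's actual load-balancing algorithm.

```python
def pick_collector(historical, up_collectors, load, stay_with_historical="always"):
    """Hypothetical sketch of the behavior described for staywithhistoricalcollector=always.

    historical: collector the agent was last connected to.
    up_collectors: set of collectors currently accepting connections.
    load: collector -> current load (any comparable number).
    """
    if stay_with_historical == "always" and historical in up_collectors:
        # Clause 1: stay with the historical collector even if it is overloaded.
        return historical
    if not up_collectors:
        return None  # no collector available; the agent keeps retrying
    # Clause 2: redirect to another live collector. Picking the least-loaded
    # one (ties broken by name) is an assumption made for this sketch.
    return min(sorted(up_collectors), key=lambda c: load[c])


load = {"collector001": 120, "collector009": 250}
print(pick_collector("collector009", {"collector001", "collector009"}, load))  # collector009
print(pick_collector("collector009", {"collector001"}, load))                  # collector001
```

    In this model, failover still happens: the agent only sticks to its historical collector while that collector is up.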

     

    Hope that helps.

    Regards,

    Lynn



  • 9.  Re: Disconnected Historical Agent Limit

    Posted Aug 20, 2018 01:41 PM

    Thank you Lynn.

    Now I'm a bit confused about how this setting might help in my case, since at least once a month all collectors are stopped, which would trigger the second clause. Then, when the collectors start, each agent will most likely reach the collector it is trying from its original collector list, as it stood before the collectors/MOM were shut down (MOM first, so that the collector list is not updated).

    Could it be that this specific APM Status Console alert has its trigger set to "Whenever Severity Increases" rather than "Whenever Severity Changes", so the active clamp never clears? I don't see any metrics showing that the collector has more than a few hundred active agents, and I don't see any metric on the Custom Metric Host that might be a historical agent count. Then again, I could be missing the metric that drives the active clamp alert.
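    To make this hypothesis concrete, here is a small Python model. It only illustrates the idea; it is not how the APM Status Console is known to work. The severity numbers (0 = OK, 2 = danger) and the function names are hypothetical.

```python
def fired_notifications(severities, trigger):
    """Return (step, severity) pairs that fire for a hypothetical alert trigger.

    severities: severity sampled at each step (0 = OK, higher = worse).
    trigger: "increases" models "Whenever Severity Increases",
             "changes" models "Whenever Severity Changes".
    """
    fired = []
    for step in range(1, len(severities)):
        prev, cur = severities[step - 1], severities[step]
        if (trigger == "increases" and cur > prev) or (trigger == "changes" and cur != prev):
            fired.append((step, cur))
    return fired


def displayed_state(severities, trigger):
    """Severity a console would show if it only updated on fired notifications."""
    fired = fired_notifications(severities, trigger)
    return fired[-1][1] if fired else severities[0]


# Clamp trips during the restart, then clears after the agents unmount.
history = [0, 2, 2, 0, 0]
print(displayed_state(history, "increases"))  # 2 (looks stuck)
print(displayed_state(history, "changes"))    # 0 (clears)
```

    With an increase-only trigger the drop back to OK never produces a notification, so a display driven only by notifications would keep showing the clamp.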

     

    Regards,

    Billy



  • 10.  Re: Disconnected Historical Agent Limit

    Broadcom Employee
    Posted Aug 20, 2018 09:07 PM

    Hi Billy,

    Looping in Sergio Morales for any more input he may have.

    I am thinking that even though agents might initially reach their preferred Collector during startup, the suggested setting could still help keep agents from subsequently being moved between Collectors while the MOM balances the metric load across the Collectors during the dynamic startup period.

    Regarding the alert not clearing from the APM Status Console, perhaps you can open a support case so we can research in more detail why it is not clearing.

    Thanks,

    Lynn



  • 11.  Re: Disconnected Historical Agent Limit

    Posted Aug 21, 2018 07:57 AM

    Thank you Lynn.

    I have opened support case 01171999.



  • 12.  Re: Disconnected Historical Agent Limit

    Broadcom Employee
    Posted Aug 17, 2018 08:33 AM

    Dear Billy:

    It is always great to see a good conversation taking place. I combined Tom's, Sergio's and Francis's responses into one response.

    Thanks,

    Hal



  • 13.  Re: Disconnected Historical Agent Limit

    Posted Sep 18, 2018 10:49 AM

    Support Case: 01171999

    APM Status Console - reporting "disconnected historic agent limit" - active clamp

    We found a reporting/refresh issue in the APM Status Console: when agents stopped reporting and there were more of them than the out-of-the-box setting of 400, the active clamp alert appeared and did not clear, even after the agents had unmounted.

    1. We deployed a revised "/product/enterprisemanager/plugins/com.wily.introscope.em_10.5.2.jar".

    2. In apm-events-thresholds-config.xml, set the "introscope.enterprisemanager.disconnected.historical.agent.limit" threshold value="1".

    3. Shut down an EPAgent.

    4. Manually unmounted the agent.

    5. After a short while, the APM Status Console cleared the active clamp notice.

    Hope this helps,

    Billy



  • 14.  Re: Disconnected Historical Agent Limit

    Broadcom Employee
    Posted Sep 18, 2018 06:18 PM

    Thank you, Billy Cole, for letting the community know.