DX Application Performance Management


APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

  • 1.  APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

    Posted 12-14-2015 05:42 AM

    Hi

     

    I am evaluating the 10.1 agents in a POC. We define two collectors in the agent profile and do not use MOM load balancing.

     

    I shut down the primary collector and the agents failed over to their secondary OK, but when I started the primary collector up again, the 10.1 Java agents did not fail back.

    They stayed connected to the secondary collector defined in the profile. Is this behaviour expected?

    The 9.5 agents in this cluster have actually failed back, but the 10.1 agents have not.

     

    I have this set in the agent profile:

    introscope.agent.enterprisemanager.failbackRetryIntervalInSeconds=120

     

     

    We get this in the agent log:

     

    12/14/15 10:04:31 AM GMT [WARN] [IntroscopeAgent.ConnectionThread] Failed to re-connect to the Introscope Enterprise Manager at sxep.wload.barclays.co.uk:25318,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory (1).

    12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.IsengardServerConnectionManager] Connected controllable Agent to the Introscope Enterprise Manager at zeqc.wload.barclays.co.uk:25318,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory. Host = "gbrdsr000000577", Process = "WebSphere", Agent Name = "SIT/gbrdsr000000577_wasadmin1-bmbmobilegatewayR6-server01", Active = "true".

    12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.ConnectionThread] New list {}@0 downloaded from zeqc.wload.barclays.co.uk:25318,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory

    12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.Agent] New list accepted

    12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.ConnectionThread] Connected to zeqc.wload.barclays.co.uk:25318,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory in allowed mode.
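A quick way to confirm which collector an agent is currently on is to look for the last "Connected controllable Agent" line in its log. Here is a minimal Python sketch, assuming the log lines keep the format shown above. The function name and the regular expression are illustrative and not part of the product.

```python
import re

# Pattern for the "connected" message as it appears in the log excerpt above.
# The capture groups (host, port) are an assumption about where the EM address sits.
CONNECTED = re.compile(
    r"Connected controllable Agent to the Introscope Enterprise Manager at "
    r"([^:,\s]+):(\d+)"
)


def last_connected_em(lines):
    """Return (host, port) from the last 'Connected controllable Agent' line, or None."""
    result = None
    for line in lines:
        match = CONNECTED.search(line)
        if match:
            result = (match.group(1), int(match.group(2)))
    return result


# Two lines copied from the log excerpt in this post.
SAMPLE_LOG = [
    '12/14/15 10:04:31 AM GMT [WARN] [IntroscopeAgent.ConnectionThread] '
    'Failed to re-connect to the Introscope Enterprise Manager at '
    'sxep.wload.barclays.co.uk:25318,'
    'com.wily.isengard.postofficehub.link.net.DefaultSocketFactory (1).',
    '12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.IsengardServerConnectionManager] '
    'Connected controllable Agent to the Introscope Enterprise Manager at '
    'zeqc.wload.barclays.co.uk:25318,'
    'com.wily.isengard.postofficehub.link.net.DefaultSocketFactory. '
    'Host = "gbrdsr000000577", Process = "WebSphere", '
    'Agent Name = "SIT/gbrdsr000000577_wasadmin1-bmbmobilegatewayR6-server01", '
    'Active = "true".',
]

print(last_connected_em(SAMPLE_LOG))  # ('zeqc.wload.barclays.co.uk', 25318)
```

Running this over the agent logs after a collector restart would show whether each agent is still on the secondary collector.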



  • 2.  Re: APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

    Posted 12-15-2015 03:47 AM

    Hi Dave,

     

    Is this a cluster or standalone environment?

     

    As far as I know, from 9.1 onwards the failback retry only works when the agent is connected to a collector in disallowed mode.

    In your case the agent does not move back to the primary EM because it is already connected to the backup EM in allowed mode (see the last line of your log).

     

    https://docops.ca.com/display/APMDEVOPS101/Java+Agent+Properties#JavaAgentProperties-introscope.agent.enterprisemanager.connectionorder

     

    To get the pre-9.1 behavior (failback retry) you would need to do one of the following:

     

    Option 1) Latch the agent to its preferred collector in the load balancing XML, for example:

     

    Let's say your agent profile contains something like:

    introscope.agent.enterprisemanager.connectionorder=EM1,EM2

     

    You would need to update your LB xml as below:

     

    <agent-collector name="Example">
        <agent-specifier>.*\|.*\|.*</agent-specifier>
        <include>
            <collector host="EM1" port="5001" latched="true"/>
            <collector host="EM2" port="5001"/>
        </include>
    </agent-collector>

     

    Make sure to adjust the agent-specifier to match your requirements.

    Also, in the MOM properties, set introscope.apm.agentcontrol.agent.emlistlookup.enable=true
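To illustrate how the agent-specifier in the entry above selects agents: it is a regular expression matched against the agent's Host|Process|AgentName string. The Python sketch below checks a few specifiers against the agent from the log at the top of this thread. The helper names and the WebSphere-only and Tomcat-only patterns are hypothetical. The sketch also assumes that the EM matches the whole string and that Python's `re` module agrees with the EM's Java regular expressions for patterns this simple.

```python
import re

# Values taken from the agent log posted earlier in this thread.
HOST = "gbrdsr000000577"
PROCESS = "WebSphere"
AGENT_NAME = "SIT/gbrdsr000000577_wasadmin1-bmbmobilegatewayR6-server01"


def agent_key(host, process, agent_name):
    """Build the Host|Process|AgentName string the specifier is matched against."""
    return "|".join([host, process, agent_name])


def specifier_matches(specifier, host, process, agent_name):
    """Return True if the specifier regex matches the whole agent key (assumed semantics)."""
    return re.fullmatch(specifier, agent_key(host, process, agent_name)) is not None


CATCH_ALL = r".*\|.*\|.*"               # the specifier from the example above
WEBSPHERE_ONLY = r".*\|WebSphere\|.*"   # hypothetical: only WebSphere processes
TOMCAT_ONLY = r".*\|Tomcat\|.*"         # hypothetical: should not match this agent

for name, spec in [("catch-all", CATCH_ALL),
                   ("WebSphere only", WEBSPHERE_ONLY),
                   ("Tomcat only", TOMCAT_ONLY)]:
    print(name, specifier_matches(spec, HOST, PROCESS, AGENT_NAME))
```

The catch-all pattern latches every agent to EM1, which is rarely what you want in a cluster with several collectors. A narrower specifier per agent group keeps each group on its own preferred collector.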

     

    Option 2) Force the agent onto the preferred collector by adding a new agent-collector entry to the load balancing XML.

     

    With either option, you need to wait for the next rebalancing cycle (every 10 minutes by default: introscope.enterprisemanager.loadbalancing.interval=600).

     

    I hope this helps,

     

    Regards,

    Sergio



  • 3.  Re: APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

    Posted 12-15-2015 04:29 AM

    Hi Sergio. Yes this is a clustered solution, MOM + 4 collectors.

     

    So what you are saying is that this IS expected behaviour for 9.5 agents? That's OK.

     

    It's not a major issue for us as long as I know that is the case and we know what to expect. We have been rolling out 9.5 agents this calendar year, but we haven't had any collector outages, so I have not seen this before.

    When we were at 9.1 and earlier, the agents would fail back to their primary collector whenever it became available, so we used that information when we were patching the EM infrastructure. We used to check the agent count and move on to the next collector once the agent load was back to the same level. That won't be the case now.

     

    Is my understanding correct?



  • 4.  Re: APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?
    Best Answer

    Posted 12-15-2015 04:47 AM

    Hi Dave,

    Yes, that is correct.

    To keep the pre-9.1 behavior, you would need to enter these agent-collector connections explicitly in the load balancing XML (using include with the latched option), as in the example in my previous reply.

    Regards,

    Sergio



  • 5.  Re: APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

    Posted 12-15-2015 04:54 AM

    Thanks for the clarification Sergio.