APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

1. APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

David Richards
Posted Dec 14, 2015 05:42 AM

Hi

I am looking at the 10.1 agents in a POC. We define 2 collectors in the agent profile and do not use MOM load balancing.

I shut down the primary collector and the agents failed over to the secondary OK, but when I started the primary collector again, the 10.1 Java agents did not fail back.
They stayed connected to the secondary collector defined in the profile. Is this behaviour expected?
The 9.5 agents in this cluster did fail back, but the 10.1 agents have not.

I have this set in the profile:
introscope.agent.enterprisemanager.failbackRetryIntervalInSeconds=120

We get this in the agent log:

12/14/15 10:04:31 AM GMT [WARN] [IntroscopeAgent.ConnectionThread] Failed to re-connect to the Introscope Enterprise Manager at sxep.wload.barclays.co.uk:25318,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory (1).
12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.IsengardServerConnectionManager] Connected controllable Agent to the Introscope Enterprise Manager at zeqc.wload.barclays.co.uk:25318,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory. Host = "gbrdsr000000577", Process = "WebSphere", Agent Name = "SIT/gbrdsr000000577_wasadmin1-bmbmobilegatewayR6-server01", Active = "true".
12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.ConnectionThread] New list {}@0 downloaded from zeqc.wload.barclays.co.uk:25318,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory
12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.Agent] New list accepted
12/14/15 10:04:31 AM GMT [INFO] [IntroscopeAgent.ConnectionThread] Connected to zeqc.wload.barclays.co.uk:25318,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory in allowed mode.
2. Re: APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

Broadcom Employee

Sergio Morales Correa
Posted Dec 15, 2015 03:47 AM

Hi Dave,

Is this a cluster or standalone environment?

As far as I know, from 9.1 onwards the failback retry capability only works when agents connect to a collector in disallowed mode.
In your case, the agent doesn't move back to the primary EM because it is already connected to the backup EM in allowed mode.

https://docops.ca.com/display/APMDEVOPS101/Java+Agent+Properties#JavaAgentProperties-introscope.agent.enterprisemanager.connectionorder

To get the pre-9.1 behavior (failback retry), you would need to do one of the following:

Option 1) Latch the agent to its collector in the LB xml, for example:

Let's say your agent profile has something like:
introscope.agent.enterprisemanager.connectionorder=EM1,EM2

You would then update your LB xml as below:

<agent-collector name="Example">
    <agent-specifier>.*\|.*\|.*</agent-specifier>
    <include>
        <collector host="EM1" port="5001" latched="true"/>
        <collector host="EM2" port="5001"/>
    </include>
</agent-collector>

Make sure to update the agent-specifier according to your requirements.
And, in the MOM properties, set introscope.apm.agentcontrol.agent.emlistlookup.enable=true

Option 2) Force the agent to move to the preferred collector by adding a new agent-collector entry in the load balancing xml.
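
As a rough illustration of Option 2 (my own sketch, not taken from the product documentation): an extra agent-collector entry whose include list names only the preferred collector. The entry name and the agent-specifier are hypothetical, EM1 and port 5001 are reused from the Option 1 example, and I am assuming the three specifier fields are host, process and agent name, as the agent log suggests.

```xml
<!-- Hypothetical entry: the name and agent-specifier are illustrative only -->
<agent-collector name="PreferEM1">
    <!-- Assumed field order Host|Process|AgentName: any host, process "WebSphere", any agent name -->
    <agent-specifier>.*\|WebSphere\|.*</agent-specifier>
    <include>
        <!-- Only the preferred collector is listed -->
        <collector host="EM1" port="5001"/>
    </include>
</agent-collector>
```

Narrow the agent-specifier to the agents you actually want to move, and check the behavior in a test cluster first.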

For either of the above options, you would need to wait for the next rebalancing (every 10 minutes by default, introscope.enterprisemanager.loadbalancing.interval=600).

I hope this helps,

Regards,
Sergio
3. Re: APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

David Richards
Posted Dec 15, 2015 04:29 AM

Hi Sergio. Yes this is a clustered solution, MOM + 4 collectors.

So what you are saying is that this IS expected behaviour for 9.5 agents? That's OK.

It's not a major issue for us as long as I know that is the case and we know what to expect. We have been rolling out 9.5 agents this calendar year, but we haven't had any collector outages, so I have not seen this before.
When we were at 9.1 and earlier, the agents would fail back to their primary collector whenever it became available, so we used that information when patching the EM infrastructure: we would check the agent count and move on to the next collector once the agent load was back to the same level. That won't be the case now.

Is my understanding correct there?
4. Re: APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?
Best Answer

Broadcom Employee

Sergio Morales Correa
Posted Dec 15, 2015 04:47 AM

Hi Dave,
Yes, that is correct.
To keep your pre-9.1 behavior, you would need to enter these agent-collector connections explicitly in the LB xml (via include with the latch option), as in the example in my previous note.
Regards,
Sergio
5. Re: APM 10.1 - java agent - failbackRetryIntervalInSeconds - has the behavior changed ?

David Richards
Posted Dec 15, 2015 04:54 AM

Thanks for the clarification Sergio.
