Unable to update load balancing for collector

1. Unable to update load balancing for collector

1 Recommend
osvaldog
Posted Mar 23, 2017 09:22 AM

Hi everybody, I'm having an issue in my monitoring environment: the node tree doesn't show metrics, and the MOM log shows this:

3/23/17 10:06:40.096 AM GMT-03:00 [ERROR] [ClusterManager Async Executor] [Manager] Error registering with ARPAPP064@5001
3/23/17 10:06:40.096 AM GMT-03:00 [INFO] [ClusterManager Async Executor] [Manager.GeoLocation] Register GeoLocation Message Service for Collector-15
3/23/17 10:06:40.096 AM GMT-03:00 [INFO] [ClusterManager Async Executor] [Manager.GeoLocation] GeoLocationProvider registration failed for collector: ARPAPP064@5001com.wily.isengard.message.MessageUndeliverableException: Outgoing mailbox is closed. Message cannot be sent
3/23/17 10:06:40.217 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@6001
3/23/17 10:06:40.218 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: buaplnx009@5001
3/23/17 10:06:46.113 AM GMT-03:00 [WARN] [ClusterManager Async Executor] [Manager] Unable to update load balancing for collector "ARPAPP068@6001"
com.wily.isengard.message.MessageUndeliverableException
at com.wily.isengard.messageprimitives.service.MessageServiceClient.sendRequest(MessageServiceClient.java:173)
at com.wily.isengard.messageprimitives.service.MessageServiceClient.invoke(MessageServiceClient.java:356)
at com.sun.proxy.$Proxy218.removeCollector(Unknown Source)
at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancer.removeCollector(ClusteredLoadBalancer.java:733)
at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancerBean.collectorsRemoved(ClusteredLoadBalancerBean.java:436)
at com.wily.introscope.spec.server.beans.clusters.ClusterNotification.dataRemoved(ClusterNotification.java:28)
at com.wily.isengard.ongoingquery.AbstractQueryServiceManager$NotifyRemoved.run(AbstractQueryServiceManager.java:438)
at com.wily.isengard.ongoingquery.QueryServiceManager2$1.execute(QueryServiceManager2.java:46)
at com.wily.isengard.ongoingquery.QueryServiceManager2.runNotification(QueryServiceManager2.java:85)
at com.wily.isengard.ongoingquery.AbstractQueryServiceManager.stateRemoved(AbstractQueryServiceManager.java:231)
at com.wily.introscope.server.beans.AOngoingQueriableBean.stateRemoved(AOngoingQueriableBean.java:89)
at com.wily.introscope.server.beans.clusters.ClusterManager.collectorRemoved(ClusterManager.java:277)
at com.wily.introscope.server.beans.clusters.ClusterManager.access$1(ClusterManager.java:261)
at com.wily.introscope.server.beans.clusters.ClusterManager$CollectorRemovedCommand.run(ClusterManager.java:376)
at com.wily.EDU.oswego.cs.dl.util.concurrent.QueuedExecutor$RunLoop.run(QueuedExecutor.java:88)
at java.lang.Thread.run(Unknown Source)
3/23/17 10:06:46.129 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP068@6001
3/23/17 10:06:55.166 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@5001
3/23/17 10:06:55.166 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: buaplnx010@5001
3/23/17 10:07:02.077 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP068@5001
3/23/17 10:07:02.077 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@6001
3/23/17 10:07:02.079 AM GMT-03:00 [WARN] [ClusterManager Async Executor] [Manager] Unable to update load balancing for collector "ARPAPP068@5001"
com.wily.isengard.message.MessageUndeliverableException
at com.wily.isengard.messageprimitives.service.MessageServiceClient.sendRequest(MessageServiceClient.java:173)
at com.wily.isengard.messageprimitives.service.MessageServiceClient.invoke(MessageServiceClient.java:356)
at com.sun.proxy.$Proxy218.removeCollector(Unknown Source)
at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancer.removeCollector(ClusteredLoadBalancer.java:733)
at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancerBean.collectorsRemoved(ClusteredLoadBalancerBean.java:436)
at com.wily.introscope.spec.server.beans.clusters.ClusterNotification.dataRemoved(ClusterNotification.java:28)
at com.wily.isengard.ongoingquery.AbstractQueryServiceManager$NotifyRemoved.run(AbstractQueryServiceManager.java:438)
at com.wily.isengard.ongoingquery.QueryServiceManager2$1.execute(QueryServiceManager2.java:46)
at com.wily.isengard.ongoingquery.QueryServiceManager2.runNotification(QueryServiceManager2.java:85)
at com.wily.isengard.ongoingquery.AbstractQueryServiceManager.stateRemoved(AbstractQueryServiceManager.java:231)
at com.wily.introscope.server.beans.AOngoingQueriableBean.stateRemoved(AOngoingQueriableBean.java:89)
at com.wily.introscope.server.beans.clusters.ClusterManager.collectorRemoved(ClusterManager.java:277)
at com.wily.introscope.server.beans.clusters.ClusterManager.access$1(ClusterManager.java:261)
at com.wily.introscope.server.beans.clusters.ClusterManager$CollectorRemovedCommand.run(ClusterManager.java:376)
at com.wily.EDU.oswego.cs.dl.util.concurrent.QueuedExecutor$RunLoop.run(QueuedExecutor.java:88)
at java.lang.Thread.run(Unknown Source)
3/23/17 10:07:02.079 AM GMT-03:00 [INFO] [ClusterManager Async Executor] [Manager] BaselineAlertMoMService: Detected collector disconnected: buaplnx009@5001
3/23/17 10:07:02.079 AM GMT-03:00 [ERROR] [ClusterManager Async Executor] [Manager] Uncaught Exception in Enterprise Manager: In thread ClusterManager Async Executor and the message is java.lang.NullPointerException
3/23/17 10:07:02.126 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@5001
3/23/17 10:07:02.126 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: buaplnx010@5001
3/23/17 10:07:04.283 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@6001
3/23/17 10:07:10.179 AM GMT-03:00 [ERROR] [Alarm Pooled Worker] [Manager] Uncaught Exception in Enterprise Manager: In thread Alarm Pooled Worker and the message is java.lang.IllegalStateException: This is not a simple alert.
3/23/17 10:07:25.176 AM GMT-03:00 [ERROR] [Alarm Pooled Worker] [Manager] Uncaught Exception in Enterprise Manager: In thread Alarm Pooled Worker and the message is java.lang.IllegalStateException: This is not a simple alert.
3/23/17 10:07:27.285 AM GMT-03:00 [INFO] [ClusterManager Async Executor] [Manager] BaselineAlertMoMService: Detected collector disconnected: ARPAPP064@6001
3/23/17 10:07:27.286 AM GMT-03:00 [ERROR] [ClusterManager Async Executor] [Manager] Uncaught Exception in Enterprise Manager: In thread ClusterManager Async Executor and the message is java.lang.NullPointerException
3/23/17 10:07:55.229 AM GMT-03:00 [ERROR] [Alarm Pooled Worker] [Manager] Uncaught Exception in Enterprise Manager: In thread Alarm Pooled Worker and the message is java.lang.IllegalStateException: This is not a simple alert.
2. Re: Unable to update load balancing for collector
Best Answer

3 Recommend
Broadcom Employee

Matthew Muskaloon
Posted Mar 23, 2017 09:44 AM

Please check the MOM log and the Collector log for ARPAPP068. Just from the above, it appears that either MOM disconnected the Collector, or the Collector was brought down or went down on its own. Either way, it's definitely a disconnection between the Collector and MOM; the thing is to find out what caused it.
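To see which Collectors dropped and how often, the disconnect messages in the MOM log can be tallied per Collector. The following is a minimal Python sketch, assuming the log lines look like the excerpt in the first post; the function name `tally_disconnects` and the regular expression are illustrative and not part of the product.

```python
import re
import sys
from collections import Counter

# Message shapes taken from the MOM log excerpt in the first post
# ("Ingore" is spelled that way in the log itself):
#   ... Ingore collector went away: ARPAPP064@6001
#   ... Detected collector disconnected: buaplnx009@5001
#   ... Unable to update load balancing for collector "ARPAPP068@6001"
DISCONNECT = re.compile(
    r'(?:collector went away: |collector disconnected: '
    r'|load balancing for collector ")([\w.-]+@\d+)'
)

def tally_disconnects(lines):
    """Count disconnect-related messages per collector (host@port)."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Usage (the log path is whatever your MOM writes to; pass it as an argument).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for collector, n in tally_disconnects(log).most_common():
            print(f"{n:5d}  {collector}")
```

A Collector that shows up far more often than the others is a reasonable place to start reading the Collector-side log.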
3. Re: Unable to update load balancing for collector

0 Recommend
osvaldog
Posted Mar 23, 2017 10:02 AM

Thanks very much, Musma. It seems to be a memory issue on the MOM server; we have requested a restart and will see what happens.
4. Re: Unable to update load balancing for collector

5 Recommend
Broadcom Employee

Sergio Morales Correa
Posted Mar 28, 2017 10:23 AM

Hi Osvaldo,

Indeed, this looks like a capacity/resource issue.
If the problem persists after increasing the memory, open a support case; we would need to do a log analysis.
Below is a list of the top 5 common performance issues we see here in support.
In the EM (MOM and Collector) logs, search for the keywords below:

1- reported Metric clamp hit
Example:
[INFO] [PO:client_main Mailman 2] [Manager] Collector jhbpsr020000011@25318 reported Metric clamp hit.
[WARN] [Harvest Engine Pooled Worker] [Manager.Agent] [The EM has too many historical metrics reporting from Agents and will stop accepting new metrics from Agents. Current count

Recommendation: Increase the clamp in apm-events-thresholds-config.xml on the EM/Collectors. However, any increase over the default value will have an impact on overall performance.
Also remember that once a Collector has hit its metric clamp, MOM won't redirect Agents to that Collector any more. And if all Collectors hit the clamp at some point, MOM won't find any Collector for an incoming Agent.
When we are in that situation:
- If it's a 91+ Agent, MOM will keep the Agent connected to itself in disallowed mode. When this list grows too big, MOM will have trouble keeping up with the connections and accepting any new connection.
- If it's a pre-91 Agent, MOM will reject the Agent and the Agent will come back again later. This also adds connection load to the MOM.

2- reached
Example:
[WARN] [Dispatcher 1] [Manager] Timed out adding to outgoing message queue. Limit of 3000 reached. Terminating connection: Node=Workstation_29, Address=22.240.96.38/22.240.96.38:46896, Type=socket

Recommendation:
Add or increase the following properties in the MOM and Collector properties files. The impact of these changes will be on memory usage.
transport.outgoingMessageQueueSize=10000
transport.override.isengard.high.concurrency.pool.min.size=10
transport.override.isengard.high.concurrency.pool.max.size=10

3- slowly
Example:
[VERBOSE] [PO:main Mailman 6] [Manager.Cluster] Outgoing message queue is moving slowly: Node=Server, Address=/22.240.96.8:25318, Type=socket

Recommendation:
This could be due to a very large amount of SmartStor metadata / historical data.
Have you increased the live/historical metric clamp?

If you are running CLWorkstation queries, it could be due to very large queries. Set:
introscope.enterprisemanager.query.datapointlimit=5000000
introscope.enterprisemanager.query.returneddatapointlimit=1000000

4- too many
Example:
java.io.IOException: Too many open files

Recommendation: Make sure the maximum number of open file handles is at least 4096 on both MOM and Collectors. You can check the current limit on open file descriptors by running "ulimit -n" as the user who starts the EM processes. You might need to increase the maximum number of open files allowed for that user.
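The same check can be scripted. Below is a minimal Python sketch, assuming a Unix host and that it is run as the user who starts the EM processes; the helper name `nofile_limit_ok` is illustrative, and the 4096 threshold is the one from the recommendation above.

```python
import resource  # Unix-only standard library module

RECOMMENDED_MIN = 4096  # minimum suggested in the recommendation above

def nofile_limit_ok(soft_limit, minimum=RECOMMENDED_MIN):
    """Return True if the soft open-file limit meets the minimum."""
    # RLIM_INFINITY means "no limit", which always satisfies the minimum.
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum

if __name__ == "__main__":
    # The soft limit is the value "ulimit -n" reports; the hard limit is its ceiling.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} ok={nofile_limit_ok(soft)}")
```

This reports the limit for the process running the script, so it only reflects the EM's limit when run under the same user and shell environment.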

5- outofmemory
Example:
java.lang.OutOfMemoryError: GC overhead limit exceeded

Recommendation: Increase the heap size by 2 GB
Or
java.lang.OutOfMemoryError: PermGen space

Recommendation: Increase -XX:MaxPermSize

And of course, search for any [ERROR] messages.
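The keyword search described above can be done in one pass over a log file. This is a rough Python sketch, not a support tool: the keyword list is copied from the five items above, and the function name `scan` and the case-insensitive matching are my own assumptions.

```python
import sys
from collections import Counter

# Keywords from the five items above, plus the [ERROR] level marker.
KEYWORDS = [
    "reported Metric clamp hit",
    "reached",
    "slowly",
    "too many",
    "outofmemory",
    "[ERROR]",
]

def scan(lines, keywords=KEYWORDS):
    """Return a Counter mapping each keyword to the number of lines containing it.

    Matching is case-insensitive, so "outofmemory" also finds "OutOfMemoryError".
    """
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts

# Usage: pass the path of a MOM or Collector log as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for keyword, n in scan(log).most_common():
            print(f"{n:6d}  {keyword}")
```

Short keywords such as "reached" and "slowly" will also match unrelated lines, so treat the counts as pointers to where to read, not as a diagnosis.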

I hope this helps,
Regards,
Sergio
5. Re: Unable to update load balancing for collector

0 Recommend
osvaldog
Posted Mar 28, 2017 10:58 AM

Thanks very much, very clear, complete, and useful info.

