DX Application Performance Management

  • 1.  Unable to update load balancing for collector

    Posted Mar 23, 2017 09:22 AM

    Hi everybody, I'm having an issue in my monitoring environment: the node tree doesn't show metrics, and the MOM log shows this:

     

    3/23/17 10:06:40.096 AM GMT-03:00 [ERROR] [ClusterManager Async Executor] [Manager] Error registering with ARPAPP064@5001
    3/23/17 10:06:40.096 AM GMT-03:00 [INFO] [ClusterManager Async Executor] [Manager.GeoLocation] Register GeoLocation Message Service for Collector-15
    3/23/17 10:06:40.096 AM GMT-03:00 [INFO] [ClusterManager Async Executor] [Manager.GeoLocation] GeoLocationProvider registration failed for collector: ARPAPP064@5001com.wily.isengard.message.MessageUndeliverableException: Outgoing mailbox is closed. Message cannot be sent
    3/23/17 10:06:40.217 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@6001
    3/23/17 10:06:40.218 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: buaplnx009@5001
    3/23/17 10:06:46.113 AM GMT-03:00 [WARN] [ClusterManager Async Executor] [Manager] Unable to update load balancing for collector "ARPAPP068@6001"
    com.wily.isengard.message.MessageUndeliverableException
    at com.wily.isengard.messageprimitives.service.MessageServiceClient.sendRequest(MessageServiceClient.java:173)
    at com.wily.isengard.messageprimitives.service.MessageServiceClient.invoke(MessageServiceClient.java:356)
    at com.sun.proxy.$Proxy218.removeCollector(Unknown Source)
    at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancer.removeCollector(ClusteredLoadBalancer.java:733)
    at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancerBean.collectorsRemoved(ClusteredLoadBalancerBean.java:436)
    at com.wily.introscope.spec.server.beans.clusters.ClusterNotification.dataRemoved(ClusterNotification.java:28)
    at com.wily.isengard.ongoingquery.AbstractQueryServiceManager$NotifyRemoved.run(AbstractQueryServiceManager.java:438)
    at com.wily.isengard.ongoingquery.QueryServiceManager2$1.execute(QueryServiceManager2.java:46)
    at com.wily.isengard.ongoingquery.QueryServiceManager2.runNotification(QueryServiceManager2.java:85)
    at com.wily.isengard.ongoingquery.AbstractQueryServiceManager.stateRemoved(AbstractQueryServiceManager.java:231)
    at com.wily.introscope.server.beans.AOngoingQueriableBean.stateRemoved(AOngoingQueriableBean.java:89)
    at com.wily.introscope.server.beans.clusters.ClusterManager.collectorRemoved(ClusterManager.java:277)
    at com.wily.introscope.server.beans.clusters.ClusterManager.access$1(ClusterManager.java:261)
    at com.wily.introscope.server.beans.clusters.ClusterManager$CollectorRemovedCommand.run(ClusterManager.java:376)
    at com.wily.EDU.oswego.cs.dl.util.concurrent.QueuedExecutor$RunLoop.run(QueuedExecutor.java:88)
    at java.lang.Thread.run(Unknown Source)
    3/23/17 10:06:46.129 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP068@6001
    3/23/17 10:06:55.166 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@5001
    3/23/17 10:06:55.166 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: buaplnx010@5001
    3/23/17 10:07:02.077 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP068@5001
    3/23/17 10:07:02.077 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@6001
    3/23/17 10:07:02.079 AM GMT-03:00 [WARN] [ClusterManager Async Executor] [Manager] Unable to update load balancing for collector "ARPAPP068@5001"
    com.wily.isengard.message.MessageUndeliverableException
    at com.wily.isengard.messageprimitives.service.MessageServiceClient.sendRequest(MessageServiceClient.java:173)
    at com.wily.isengard.messageprimitives.service.MessageServiceClient.invoke(MessageServiceClient.java:356)
    at com.sun.proxy.$Proxy218.removeCollector(Unknown Source)
    at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancer.removeCollector(ClusteredLoadBalancer.java:733)
    at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancerBean.collectorsRemoved(ClusteredLoadBalancerBean.java:436)
    at com.wily.introscope.spec.server.beans.clusters.ClusterNotification.dataRemoved(ClusterNotification.java:28)
    at com.wily.isengard.ongoingquery.AbstractQueryServiceManager$NotifyRemoved.run(AbstractQueryServiceManager.java:438)
    at com.wily.isengard.ongoingquery.QueryServiceManager2$1.execute(QueryServiceManager2.java:46)
    at com.wily.isengard.ongoingquery.QueryServiceManager2.runNotification(QueryServiceManager2.java:85)
    at com.wily.isengard.ongoingquery.AbstractQueryServiceManager.stateRemoved(AbstractQueryServiceManager.java:231)
    at com.wily.introscope.server.beans.AOngoingQueriableBean.stateRemoved(AOngoingQueriableBean.java:89)
    at com.wily.introscope.server.beans.clusters.ClusterManager.collectorRemoved(ClusterManager.java:277)
    at com.wily.introscope.server.beans.clusters.ClusterManager.access$1(ClusterManager.java:261)
    at com.wily.introscope.server.beans.clusters.ClusterManager$CollectorRemovedCommand.run(ClusterManager.java:376)
    at com.wily.EDU.oswego.cs.dl.util.concurrent.QueuedExecutor$RunLoop.run(QueuedExecutor.java:88)
    at java.lang.Thread.run(Unknown Source)
    3/23/17 10:07:02.079 AM GMT-03:00 [INFO] [ClusterManager Async Executor] [Manager] BaselineAlertMoMService: Detected collector disconnected: buaplnx009@5001
    3/23/17 10:07:02.079 AM GMT-03:00 [ERROR] [ClusterManager Async Executor] [Manager] Uncaught Exception in Enterprise Manager: In thread ClusterManager Async Executor and the message is java.lang.NullPointerException
    3/23/17 10:07:02.126 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@5001
    3/23/17 10:07:02.126 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: buaplnx010@5001
    3/23/17 10:07:04.283 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@6001
    3/23/17 10:07:10.179 AM GMT-03:00 [ERROR] [Alarm Pooled Worker] [Manager] Uncaught Exception in Enterprise Manager: In thread Alarm Pooled Worker and the message is java.lang.IllegalStateException: This is not a simple alert.
    3/23/17 10:07:25.176 AM GMT-03:00 [ERROR] [Alarm Pooled Worker] [Manager] Uncaught Exception in Enterprise Manager: In thread Alarm Pooled Worker and the message is java.lang.IllegalStateException: This is not a simple alert.
    3/23/17 10:07:27.285 AM GMT-03:00 [INFO] [ClusterManager Async Executor] [Manager] BaselineAlertMoMService: Detected collector disconnected: ARPAPP064@6001
    3/23/17 10:07:27.286 AM GMT-03:00 [ERROR] [ClusterManager Async Executor] [Manager] Uncaught Exception in Enterprise Manager: In thread ClusterManager Async Executor and the message is java.lang.NullPointerException
    3/23/17 10:07:55.229 AM GMT-03:00 [ERROR] [Alarm Pooled Worker] [Manager] Uncaught Exception in Enterprise Manager: In thread Alarm Pooled Worker and the message is java.lang.IllegalStateException: This is not a simple alert.



  • 2.  Re:  Unable to update load balancing for collector
    Best Answer

    Broadcom Employee
    Posted Mar 23, 2017 09:44 AM

    Please check the MOM log and the Collector log for ARPAPP068. From the above, it appears that either the MOM disconnected the Collector, or the Collector was brought down or went down on its own. Either way, it's definitely a disconnection between the Collector and the MOM; the task is to find out what caused it.
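
    As a starting point for that check, here is a minimal sketch that pulls out the log lines mentioning one Collector host and tallies them by level. The line layout is inferred only from the excerpt posted above, and the function name `events_for_collector` is a hypothetical helper, not part of any APM tooling.

```python
import re
from collections import Counter

# Layout assumed from the posted excerpt:
# <date> <time> <AM|PM> <tz> [LEVEL] [thread] [module] message
LINE_RE = re.compile(
    r"^\S+ \S+ [AP]M \S+ \[(?P<level>[A-Z]+)\] \[[^\]]*\] \[[^\]]*\] (?P<msg>.*)$"
)

def events_for_collector(lines, host):
    """Return (level, message) pairs for log lines that mention the given host."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and host in m.group("msg"):
            events.append((m.group("level"), m.group("msg")))
    return events

# Three lines copied from the MOM log excerpt in the first post.
sample = [
    '3/23/17 10:06:46.113 AM GMT-03:00 [WARN] [ClusterManager Async Executor] [Manager] Unable to update load balancing for collector "ARPAPP068@6001"',
    "3/23/17 10:06:46.129 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP068@6001",
    "3/23/17 10:06:55.166 AM GMT-03:00 [INFO] [MOM Collection Buffer consumer thread.] [Manager] Ingore collector went away: ARPAPP064@5001",
]

events = events_for_collector(sample, "ARPAPP068")
print(Counter(level for level, _ in events))
```

    Stack-trace lines do not match the pattern and are skipped, so only the timestamped entries are counted.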



  • 3.  Re:  Unable to update load balancing for collector

    Posted Mar 23, 2017 10:02 AM

    Thanks very much Musma, it seems to be a memory issue on the MOM's server. We have requested a restart of it and will see what happens.



  • 4.  Re:  Unable to update load balancing for collector

    Broadcom Employee
    Posted Mar 28, 2017 10:23 AM


    Hi Osvaldo,

     

    Indeed, this looks like a capacity/resource issue.
    If the problem persists after increasing the memory, open a support case, as we would need to do a log analysis.
    Below is the list of the top 5 common performance issues we see here in support.
    In the EM (MOM and Collector) logs, search for the following keywords:

     

    1- reported Metric clamp hit
    Example:
    [INFO] [PO:client_main Mailman 2] [Manager] Collector jhbpsr020000011@25318 reported Metric clamp hit.
    [WARN] [Harvest Engine Pooled Worker] [Manager.Agent]  [The EM has too many historical metrics reporting from Agents and will stop accepting new metrics from Agents.  Current count

    Recommendation: Increase the clamp in the EM/Collectors' apm-events-thresholds-config.xml; however, any increase over the default value will have an impact on overall performance.
    Also remember that once a Collector has hit its metric clamp, the MOM won't redirect Agents to that Collector any more. And if all Collectors hit the clamp at some point, the MOM won't find any Collector for an incoming Agent.
    When we are in this situation:
    - If it's a 9.1+ Agent, the MOM will keep the Agent with it in disallowed mode. When this list grows too big, the MOM will have trouble keeping up with the connections and accepting any new connection.
    - If it's a pre-9.1 Agent, the MOM will reject the Agent and the Agent will come back again later. This will also add connection load to the MOM.

     

    2- reached
    Example:
    [WARN] [Dispatcher 1] [Manager] Timed out adding to outgoing message queue. Limit of 3000 reached. Terminating connection: Node=Workstation_29, Address=22.240.96.38/22.240.96.38:46896, Type=socket

    Recommendation:
    Add or increase the following properties in the MOM and Collector properties files. These changes will increase memory usage.
    transport.outgoingMessageQueueSize=10000
    transport.override.isengard.high.concurrency.pool.min.size=10
    transport.override.isengard.high.concurrency.pool.max.size=10
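
    To see which of these settings are missing or lower than recommended, a minimal sketch along these lines could be used. It parses plain key=value text; the sample excerpt and the helper names (`parse_properties`, `below_recommended`) are hypothetical, and only the three property names and values come from the recommendation above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

# Values taken from the recommendation above.
RECOMMENDED = {
    "transport.outgoingMessageQueueSize": 10000,
    "transport.override.isengard.high.concurrency.pool.min.size": 10,
    "transport.override.isengard.high.concurrency.pool.max.size": 10,
}

def below_recommended(props):
    """Return properties that are missing or set lower than recommended."""
    findings = {}
    for key, wanted in RECOMMENDED.items():
        current = props.get(key)
        if current is None or int(current) < wanted:
            findings[key] = (current, wanted)
    return findings

# Hypothetical excerpt of an EM properties file, for illustration only.
sample = """\
# transport settings
transport.outgoingMessageQueueSize=3000
transport.override.isengard.high.concurrency.pool.max.size=10
"""

findings = below_recommended(parse_properties(sample))
for key, (current, wanted) in findings.items():
    print(f"{key}: current={current}, recommended={wanted}")
```

    In practice the text would be read from the properties file of each MOM and Collector rather than from an inline string.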

     

    3- slowly
    Example:
    [VERBOSE] [PO:main Mailman 6] [Manager.Cluster] Outgoing message queue is moving slowly: Node=Server, Address=/22.240.96.8:25318, Type=socket

    Recommendation:
    This could be due to a very large amount of SmartStor metadata / historical data.
    Have you increased the live/historical metric clamp?

     

    If you are running CLWorkstation queries, it could be due to very large queries. In that case, set:
    introscope.enterprisemanager.query.datapointlimit=5000000
    introscope.enterprisemanager.query.returneddatapointlimit=1000000

     

    4- too many
    Example:
    java.io.IOException: Too many open files

    Recommendation: Make sure the maximum number of open file handles is at least 4096 on both the MOM and the Collectors. You can check the current open file descriptor limit by running "ulimit -n" as the user who starts the EM processes. You might need to increase the maximum number of open files allowed for that user.
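
    The same limit can also be read programmatically. The following is a minimal sketch using Python's standard `resource` module (Unix only); it reports the limits of the process that runs it, so it is only meaningful when run as the user who starts the EM. The 4096 threshold comes from the recommendation above, and the helper names are hypothetical.

```python
import resource

REQUIRED = 4096  # minimum from the recommendation above

def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def meets_minimum(soft, required=REQUIRED):
    """True if the soft limit is unlimited or at least `required`."""
    return soft == resource.RLIM_INFINITY or soft >= required

soft, hard = nofile_limits()
print(f"soft={soft} hard={hard} ok={meets_minimum(soft)}")
```

    The soft limit is the one "ulimit -n" prints by default; it can be raised up to the hard limit without root privileges.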

     

    5- outofmemory
    Example:
    java.lang.OutOfMemoryError: GC overhead limit exceeded

    Recommendation: Increase the heap size by 2 GB.

    Or:
    java.lang.OutOfMemoryError: PermGen space

    Recommendation: Increase -XX:MaxPermSize
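
    As a worked example of the heap arithmetic, the sketch below raises the -Xmx value in a JVM option string by 2 GB. The starting option string is hypothetical, and `bump_heap` is an illustrative helper; where the options are actually configured depends on how the EM is launched. Note that -XX:MaxPermSize only applies to Java 7 and earlier, since PermGen was removed in Java 8.

```python
import re

def bump_heap(jvm_args, extra_mb=2048):
    """Return jvm_args with the -Xmx value raised by extra_mb megabytes."""
    def repl(match):
        value, unit = int(match.group(1)), match.group(2).lower()
        mb = value * 1024 if unit == "g" else value  # normalise to megabytes
        return f"-Xmx{mb + extra_mb}m"
    return re.sub(r"-Xmx(\d+)([mMgG])", repl, jvm_args)

# Hypothetical current options, for illustration only.
current = "-Xms1024m -Xmx4096m -XX:MaxPermSize=256m"
print(bump_heap(current))
```

    Make sure the host has enough free physical memory for the larger heap before applying the change.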

     

    And of course, search for any [ERROR] message.
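
    The keyword search described above can be sketched as a small script. The keyword list is the one given in this post plus the [ERROR] marker; the sample lines are copied from this thread, and the function name `scan` is a hypothetical helper. Matching is case-insensitive, which is an assumption made so that "outofmemory" also catches "OutOfMemoryError".

```python
# Keywords from the list above, plus the [ERROR] marker.
KEYWORDS = [
    "reported Metric clamp hit",
    "reached",
    "slowly",
    "too many",
    "outofmemory",
    "[ERROR]",
]

def scan(lines):
    """Count, per keyword, the lines that contain it (case-insensitive)."""
    counts = {kw: 0 for kw in KEYWORDS}
    for line in lines:
        lowered = line.lower()
        for kw in KEYWORDS:
            if kw.lower() in lowered:
                counts[kw] += 1
    return counts

# Sample lines copied from this thread.
sample = [
    "[WARN] [Dispatcher 1] [Manager] Timed out adding to outgoing message queue. Limit of 3000 reached. Terminating connection: Node=Workstation_29, Address=22.240.96.38/22.240.96.38:46896, Type=socket",
    "java.io.IOException: Too many open files",
    "java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "3/23/17 10:07:02.079 AM GMT-03:00 [ERROR] [ClusterManager Async Executor] [Manager] Uncaught Exception in Enterprise Manager: In thread ClusterManager Async Executor and the message is java.lang.NullPointerException",
]

counts = scan(sample)
for kw, n in counts.items():
    print(f"{kw!r}: {n}")
```

    Short keywords such as "reached" can match unrelated lines, so the counts are a pointer to where to look rather than a diagnosis.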

     

    I hope this helps,
    Regards,
    Sergio



  • 5.  Re:  Unable to update load balancing for collector

    Posted Mar 28, 2017 10:58 AM

    Thanks very much, very clear, complete, and useful info.