DX Application Performance Management

Uncaught Exception on MoM in Threadpool

1. Uncaught Exception on MoM in Threadpool

0 Recommend
Manish Parikh
Posted Jan 19, 2017 10:27 AM

I need help with this ERROR; it keeps being generated repeatedly.

1/19/17 10:16:55.182 AM EST [ERROR] [pool-11-thread-286204] [Manager] Uncaught Exception in Enterprise Manager: In thread pool-11-thread-286204 and the message is com.wily.util.exception.UnexpectedExceptionError: Tranport for the registry service at address: {1} is down
1/19/17 10:17:09.726 AM EST [WARN] [PO:main Mailman 2] [Manager] Unable to send signal for clearing denied agents to collector "10.60.168.40@5001"
com.wily.isengard.messageprimitives.ConnectionException: Tranport for the registry service at address: {1} is down
at com.wily.isengard.postofficehub.ClonedRegistry.getEntry(ClonedRegistry.java:141)
at com.wily.isengard.messageprimitives.service.MessageServiceFactory.internalGetServiceInterface(MessageServiceFactory.java:317)
at com.wily.isengard.messageprimitives.service.MessageServiceFactory.internalGetServiceInterface(MessageServiceFactory.java:270)
at com.wily.isengard.messageprimitives.service.MessageServiceFactory.getService(MessageServiceFactory.java:132)
at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadRebalancer.getLoadBalancerAdmin(ClusteredLoadRebalancer.java:65)
at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadBalancer.clearDeniedAgentsOnAllCollectors(ClusteredLoadBalancer.java:704)
at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadRebalancer.rebalance(ClusteredLoadRebalancer.java:275)
at com.wily.introscope.server.beans.loadbalancer.ClusteredLoadRebalancer$RebalanceTask.run(ClusteredLoadRebalancer.java:1095)
at com.wily.EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:728)
at java.lang.Thread.run(Thread.java:745)

Thanks in advance for any/all assistance.
Manish
2. Re: Uncaught Exception on MoM in Threadpool

0 Recommend
Manish Parikh
Posted Jan 19, 2017 10:41 AM

I can provide the logs one-on-one if anyone needs them. Please reach out to me via private message.
3. Re: Uncaught Exception on MoM in Threadpool

2 Recommend
Broadcom Employee

Matthew Muskaloon
Posted Jan 19, 2017 11:04 AM

Hi Manish,

Looks like the MOM lost connection to the Collector. Check the MOM logs to see if this particular collector was disconnected or running slow around or before the time listed above.
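
For anyone doing this check against a large MOM log, here is a minimal Python sketch that pulls out only the lines mentioning one collector and parses their timestamps. The line layout is assumed from the log excerpts posted in this thread, and the names `LINE_RE` and `collector_events` are made up for this illustration; they are not part of APM.

```python
import re
from datetime import datetime

# Timestamp and level at the start of a log line, as seen in this thread,
# e.g. "1/19/17 10:17:09.726 AM EST [WARN] [PO:main Mailman 2] ...".
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M) \S+ "
    r"\[(?P<level>[A-Z]+)\]"
)


def collector_events(lines, collector):
    """Return (timestamp, level, line) for each log line that mentions the collector."""
    events = []
    for line in lines:
        if collector not in line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines without a leading timestamp, e.g. stack-trace frames
        ts = datetime.strptime(match.group("ts"), "%m/%d/%y %I:%M:%S.%f %p")
        events.append((ts, match.group("level"), line.rstrip()))
    return events
```

Feeding it the lines of the MOM log (for example via `open(path)`, where the path is whatever your installation uses) and the collector address `10.60.168.40@5001` gives a time-ordered list that can be compared against the time of the ERROR.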
4. Re: Uncaught Exception on MoM in Threadpool

0 Recommend
Manish Parikh
Posted Jan 19, 2017 11:45 AM

Hi Matt,
This is what I see when I search on that IP:

Line 478842: 1/19/17 10:07:09.416 AM EST [WARN] [PO:main Mailman 7] [Manager] Unable to send signal for clearing denied agents to collector "10.60.168.40@5001"
Line 478897: 1/19/17 10:17:09.726 AM EST [WARN] [PO:main Mailman 2] [Manager] Unable to send signal for clearing denied agents to collector "10.60.168.40@5001"
Line 478950: 1/19/17 10:27:10.900 AM EST [WARN] [PO:main Mailman 6] [Manager] Unable to send signal for clearing denied agents to collector "10.60.168.40@5001"
Line 479012: 1/19/17 10:28:37.274 AM EST [WARN] [ClusterManager Async Executor] [Manager] Unable to update load balancing for collector "10.60.168.40@5001"
Line 479116: 1/19/17 10:28:37.931 AM EST [WARN] [ClusterManager Async Executor] [Manager] Unable to update load balancing for collector "10.60.168.40@5001"
Line 479158: 1/19/17 10:28:46.819 AM EST [WARN] [Collector 10.60.168.40@5001] [Manager.Cluster] Lost contact with the Introscope Enterprise Manager at 10.60.168.40@5001
5. Re: Uncaught Exception on MoM in Threadpool

0 Recommend
Manish Parikh
Posted Jan 19, 2017 11:47 AM

I went ahead and restarted the MoM when I posted this question. I will continue to monitor to see whether these ERRORs come back.
6. Re: Uncaught Exception on MoM in Threadpool

1 Recommend
Broadcom Employee

Hallett German
Posted Jan 19, 2017 11:55 AM

Thanks, Manish. If this issue does not recur by the end of today, may we mark this as answered? (Knowing you may post additional questions/comments as needed.)
7. Re: Uncaught Exception on MoM in Threadpool

0 Recommend
Manish Parikh
Posted Jan 19, 2017 12:04 PM

Hal,
Sounds good. I am hoping for the best.
Thanks
8. Re: Uncaught Exception on MoM in Threadpool
Best Answer

1 Recommend
Broadcom Employee

Matthew Muskaloon
Posted Jan 19, 2017 12:08 PM

After further research: this warning message is logged by the MOM when it cannot deliver the notification to the collector for clearing denied agents during load rebalancing. This usually indicates that the collector has a connectivity issue with the MOM or that its message queue is already full. The MOM will retry sending the notification on each load balancing cycle. What is the value of your introscope.enterprisemanager.loadbalancing.interval property?

This can happen every so often if the collector was restarted and had not yet reconnected to the MOM at the time of load balancing. However, if these warning messages occur repeatedly, they are likely a side effect of some other connectivity or performance issue.
9. Re: Uncaught Exception on MoM in Threadpool

0 Recommend
Manish Parikh
Posted Jan 20, 2017 09:18 AM

introscope.enterprisemanager.loadbalancing.interval=600
10. Re: Uncaught Exception on MoM in Threadpool

1 Recommend
Broadcom Employee

Matthew Muskaloon
Posted Jan 20, 2017 09:20 AM

OK, that is not ridiculously low, as we saw in one issue a while back. 10 minutes should be sufficient.
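
As a quick cross-check, the spacing of the "Unable to send signal for clearing denied agents" warnings quoted earlier in the thread can be compared with this interval. The Python sketch below does that arithmetic. It assumes the property value is in seconds (600 = 10 minutes, as stated above); the 5-second tolerance and the function names are arbitrary choices for illustration.

```python
from datetime import datetime

# introscope.enterprisemanager.loadbalancing.interval as posted in this thread
INTERVAL_SECONDS = 600

# Times of the three "clearing denied agents" warnings quoted in the thread
warning_times = ["10:07:09.416", "10:17:09.726", "10:27:10.900"]


def gaps_in_seconds(times):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in times]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


def matches_interval(gaps, interval, tolerance=5.0):
    """True if every gap is within `tolerance` seconds of the interval (tolerance is an assumption)."""
    return all(abs(gap - interval) <= tolerance for gap in gaps)


gaps = gaps_in_seconds(warning_times)
on_schedule = matches_interval(gaps, INTERVAL_SECONDS)
```

The gaps come out at roughly 600 and 601 seconds, so the warnings line up with one failed notification per load balancing cycle rather than a faster retry loop.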
11. Re: Uncaught Exception on MoM in Threadpool

1 Recommend
Billy Cole
Posted Jan 20, 2017 09:57 AM

In one of the tuning guides that we had for 9.0.5.6, and also back in 9.1.1.1, it was suggested to set the outgoing message queue size to 6000:

transport.outgoingMessageQueueSize=6000

We have added this to every one of our enterprise managers in our 9.6 and now our 10.0 environments.

Would this help the communication queue buffer issue between the MOM and the collectors?
12. Re: Uncaught Exception on MoM in Threadpool

3 Recommend
Broadcom Employee

Matthew Muskaloon
Posted Jan 20, 2017 10:11 AM

Yes, that would help the communication buffer queue. It should never need to go above 8000. Beyond that point, two things apply. First, the higher you set the number, the more resources it consumes. Second, there is likely something underlying that is contributing to the problem, and that is what we would troubleshoot.
13. Re: Uncaught Exception on MoM in Threadpool

2 Recommend
Manish Parikh
Posted Jan 20, 2017 10:20 AM

transport.outgoingMessageQueueSize=10000
transport.override.isengard.high.concurrency.pool.min.size=20
transport.override.isengard.high.concurrency.pool.max.size=20

I have these set on all of my collectors and the MoM. I had a case open about it (a separate issue), and these values were suggested.
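
To compare settings like these against the 8000 figure mentioned earlier in the thread across several Enterprise Managers, a small Python sketch along the following lines could help. The 8000 ceiling is the informal guidance from this discussion, not a documented limit, and the function names are hypothetical. The parser handles only simple `key=value` lines, not the full Java properties syntax.

```python
QUEUE_KEY = "transport.outgoingMessageQueueSize"
QUEUE_GUIDELINE = 8000  # informal ceiling from this thread, not a documented limit


def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and '#' or '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def review_queue_size(props):
    """Return a short note on the outgoing message queue size setting."""
    raw = props.get(QUEUE_KEY)
    if raw is None:
        return f"{QUEUE_KEY} is not set"
    size = int(raw)
    if size > QUEUE_GUIDELINE:
        return f"{QUEUE_KEY}={size} exceeds {QUEUE_GUIDELINE}; look for an underlying cause"
    return f"{QUEUE_KEY}={size} is within the guideline"


settings = """
transport.outgoingMessageQueueSize=10000
transport.override.isengard.high.concurrency.pool.min.size=20
transport.override.isengard.high.concurrency.pool.max.size=20
"""
note = review_queue_size(parse_properties(settings))
```

With the three values above, the note flags the queue size of 10000 as being over the guideline, which is a prompt to investigate further rather than proof of a misconfiguration.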
14. Re: Uncaught Exception on MoM in Threadpool

2 Recommend
Billy Cole
Posted Jan 20, 2017 10:42 AM

My little APM voices are having a field day. Something looks wrong, but I am not really sure what it is. I agree with musma03: with anything above 8000, there is more than likely something underlying contributing to the cause.

mparikh72, could you provide some APM environment details?

What version of APM are you running?

Do you have the "MOM_Infra_Monitoring_MM.jar" deployed and adjusted for your environment?
Do you have the "Collector.jar" management modules deployed for each of your collectors?

These dashboards have been very useful to us in understanding what is going on within the APM cluster.

What OS are your enterprise managers running on?
Are the hosts physical or virtual?
How is the CPU/Memory/Disk/Network performing on the enterprise managers?

How many collectors are you running?
Are the collectors pretty well balanced, metric wise?

Do you have any other metrics that are not out of the box, such as an environment performance agent with plugins to gather other OS metrics?

How many agents, and metrics (live/historic) are your collectors dealing with?
How many workstations and end users are typically logged in to APM?

Are your collectors running lots of traces?
What do your harvest and SmartStor durations look like?
What is your MetaData write duration?

Way back in 9.0.5.6 and also in 9.1.1.1 we had someone from CA Services come in and do a review of the health of the APM cluster, which turned into a professional services engagement to help us tune our hardware (virtual servers) to contend with what and how APM operates.

Sorry for so many questions,

Billy
15. Re: Uncaught Exception on MoM in Threadpool

2 Recommend
Broadcom Employee

Matthew Muskaloon
Posted Jan 20, 2017 10:49 AM

And that is fine. As long as your server is beefy and can handle 10000, 20, and 20, you should be OK. But if you wanted to raise the queue size to 20000, I would advise against it for the reasons I specified above.

Some other environments cannot handle 10000, 20, and 20 because they are not powerful enough.
16. Re: Uncaught Exception on MoM in Threadpool

0 Recommend
Broadcom Employee

Hallett German
Posted Jan 20, 2017 08:00 AM

Dear Manish:
I was hoping to hear back good news from you. As previously agreed, since there was no response by the end of yesterday, I am marking this as answered. Matt's last note gives some good leads on what the issue is likely to be. You are more than welcome to post any status updates and further questions as needed, and I will do what I can to get you a response.

Happy Friday
Hal German
