DX Application Performance Management

Caught exception trying to get the difference between MOM and this Collector's harvest time:

  • 1.  Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 11:12 AM

    My MoM log is full of these ERRORs...

     

    6/07/16 11:29:44.119 PM EDT [ERROR] [Collector xx.xx.xx.xx@5001] [Manager.Cluster] Caught exception trying to get the difference between MOM and this Collector's harvest time: Collector xx.xx.xx.xx@5001: com.wily.isengard.message.MessageUndeliverableException: Outgoing mailbox is closed. Message cannot be sent

     

    6/07/16 12:43:56.155 AM EDT [ERROR] [pool-11-thread-1] [Manager] Uncaught Exception in Enterprise Manager:  In thread pool-11-thread-1 and the message is com.wily.util.exception.UnexpectedExceptionError: Tranport for the registry service at address: {1} is down

     

    the following properties are already in place on MoM:

    transport.outgoingMessageQueueSize=10000

    transport.override.isengard.high.concurrency.pool.min.size=15

    transport.override.isengard.high.concurrency.pool.max.size=15

     

    So the above "Caught Exception" error is ONLY happening for 2 specific collectors (over and over again).
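
To confirm that only certain Collectors are affected, the occurrences can be tallied per Collector. The following is a minimal sketch, assuming the log lines follow the format quoted above; the function name `count_harvest_errors` is hypothetical and not part of APM.

```python
import re
from collections import Counter

# Pattern modeled on the quoted log line:
# ... [ERROR] [Collector <host>@<port>] [Manager.Cluster] Caught exception trying to get the difference ...
HARVEST_ERROR = re.compile(
    r"\[ERROR\] \[Collector (?P<collector>[^\]]+)\] \[Manager\.Cluster\] "
    r"Caught exception trying to get the difference"
)


def count_harvest_errors(lines):
    """Return a Counter mapping each Collector (host@port) to its error count."""
    counts = Counter()
    for line in lines:
        match = HARVEST_ERROR.search(line)
        if match:
            counts[match.group("collector")] += 1
    return counts
```

Passing an open log file (any iterable of lines) to `count_harvest_errors` and printing `most_common()` on the result shows which Collectors account for the errors.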

     

    APM v10.1 with standard TIM



  • 2.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 02:19 PM

    Hi Manish:

    Looking at prior cases, I see this is due to some agents sending too many metrics to the EM, or to an overloaded EM. I am sure that you are already checking for clamps, SmartStor/harvest durations, low heap, etc.

     

    Thanks

    Hal German



  • 3.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 02:44 PM

    Yeah, I have checked for clamping and all that other good stuff.

    We don't have that much of a load: we are only monitoring 5 applications, and only 2 of them report to these two collectors.

     

    Let me grab some screenshots to show you



  • 4.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 04:03 PM

    Hi Manish,

     

    6/07/16 11:29:44.119 PM EDT [ERROR] [Collector xx.xx.xx.xx@5001] [Manager.Cluster] Caught exception trying to get the difference between MOM and this Collector's harvest time: Collector xx.xx.xx.xx@5001: com.wily.isengard.message.MessageUndeliverableException: Outgoing mailbox is closed. Message cannot be sent

     

    This message refers to the number of time slices that the Collector is behind, relative to the MOM's harvest.  A positive number means that the MOM is ahead of the Collector.  0 means that they are at the same time period.  A negative number means that the Collector is ahead of the MOM.
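
The sign convention described above can be summarized in a short sketch. The function name `describe_harvest_offset` is hypothetical and only illustrates how to read the number; it is not an APM API.

```python
def describe_harvest_offset(time_slices: int) -> str:
    """Interpret the MOM-versus-Collector harvest difference, in time slices."""
    if time_slices > 0:
        # Positive: the MOM's harvest is ahead of the Collector's.
        return f"MOM is ahead of the Collector by {time_slices} time slice(s)"
    if time_slices < 0:
        # Negative: the Collector's harvest is ahead of the MOM's.
        return f"Collector is ahead of the MOM by {-time_slices} time slice(s)"
    # Zero: both are harvesting the same time period.
    return "MOM and Collector are at the same time period"
```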

     

    When this message is reached, the EM requests a disconnect. 

     

    There are a few things to check here.

    1. Are all the transport settings set on the Collectors?  I see you have it set on the MOM.

    2. Are the EMs on a virtual server?  Sometimes the host server does not distribute enough resources equally to all of its VMs.

     

    Thanks,
    Matt



  • 5.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 04:06 PM

    Which is also the reason why we insist on reservations.



  • 6.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 04:12 PM

    musma03 Hiko_Davis

     

    All collectors are set to:

    transport.outgoingMessageQueueSize=6000

    I was told by CA Support not to set the max/min pool size on Collectors (only on the MoM). The Collectors are on VMs and the MoM is a physical server. How do I determine whether the number is negative/zero/positive?

     



  • 7.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 04:16 PM

    I think DEBUG may give you that number.

     

    As for whoever told you not to put the min/max on the Collectors: no, we require that on the MOM and all Collectors when facing any type of communication/performance issue.  Who told you that?



  • 8.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 04:19 PM

    If it was me, then my apologies! I know better than that!



  • 9.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-08-2016 06:05 PM

    It wasn't Hiko_Davis who told me; it was a CA Support case. I won't disclose the case # since that would be a dead giveaway.

    Here is what the comments say in EM's property file:

     

    # Maximum simultaneous output connections.  A higher number does not directly increase

    # response time, but will keep isolated slow connections from crippling the whole manager.

    # The default, 5, is suitable for most standalone EMs and Collectors.  MOMs should have

    # one thread per Collector, plus about 1 for every 10 workstations.  Do not oversize, as

    # additional thread overhead may cause OutOfMemory exceptions - especially on 32 bit JVMs.

    # transport.override.isengard.high.concurrency.pool.max.size=10  ## Uncomment for MOMs

     

    So, I have 9 Collectors, 1 WV Server, 1 MoM and no more than 5-7 wkst connections overall. My current value on the MoM is 15. According to the math described above, isn't 15 more than enough?

    What should the value be on each of the 9 collectors? 10 or 15? I would think 10 should suffice, right? B/c I've disabled direct wkst login to any of the collectors (only to MoM).
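
The rule of thumb in the property-file comments can be written out as arithmetic. This is a rough sketch only: rounding "about 1 for every 10 workstations" up to a whole thread is an assumption, and the function name `suggested_mom_pool_size` is hypothetical.

```python
import math


def suggested_mom_pool_size(collectors: int, workstations: int) -> int:
    """One thread per Collector, plus about 1 per 10 workstation connections.

    Rounding the workstation share up is an assumption, not something the
    property-file comment specifies.
    """
    return collectors + math.ceil(workstations / 10)


# 9 Collectors and at most 7 workstation connections, as described above.
print(suggested_mom_pool_size(9, 7))  # 9 + ceil(0.7) = 10
```

By that reading of the comment, the figure for this cluster comes to 10, so the MoM's current value of 15 is above it.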

     

    I have two environments (Prod & Dev). I assume it will require a restart of the EMs (which sucks) after I change the following properties?

    transport.override.isengard.high.concurrency.pool.min.size=??

    transport.override.isengard.high.concurrency.pool.max.size=??

     

    Should I keep the value at 10 or change it to match MoM's (15)?



  • 10.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-09-2016 07:51 AM

    Hi Manish:

      Once musma03 answers your last question

     

    Should I keep the value at 10 or change it to match MoM's (15)?

     

    Are we good to close, since the original question has been answered and then some?



  • 11.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:
    Best Answer

    Posted 06-09-2016 08:58 AM

    Hi Manish,

     

    Yes, any time you change any one of those transport properties, a restart is required.

     

    Those comments in the properties file were written a very long time ago, back when clusters were just starting out and environments put less load on the Collectors and more on the MOM, due to the Management Modules and whatnot.

     

    In your situation, I would change both properties to 15. 

     

    After changing and restarting, if you still experience the above messages, then examine your VMs and check your VM host server to ensure that they are being given proper resources and that nothing is being choked.

     

    Also if you need further assistance from us on this, I would recommend opening an issue with us and uploading the entire logs directory and config directory from MOM and all Collectors so that we can analyze them and see if anything else is affecting the performance.

     

    Thanks,

    Matt



  • 12.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-09-2016 09:31 AM

    musma03,

     

    I'll go ahead and add that property and set the value to 15 (starting with Dev Env first). Also, for the other property...

         transport.outgoingMessageQueueSize=10000

    Should I make that value 10000 (like it is on MoM) on the Collectors as well or leave it at 6000?

     

    Thanks
    Manish



  • 13.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-09-2016 09:34 AM

    Hi Manish,

     

    I would recommend 10,000 only if your servers are high powered and have the resources to handle it.  If not, then I would go as high as 8,000.

     

    Thanks,
    Matt



  • 14.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-09-2016 09:41 AM

    musma03

    Then I will leave it at 6000 for now on the collectors and just monitor them to see how they behave.

     

    Thanks a lot everyone for all your help.

     

    Cheers,

    Manish



  • 15.  Re: Caught exception trying to get the difference between MOM and this Collector's harvest time:

    Posted 06-13-2016 01:05 PM

    Related official doc supporting this discussion...the text in "bold red" was changed by panja07. Thanks for that.

     

    Capacity Planning and Server Deployment Options - CA Application Performance Management - 10.1 - CA Technologies Documen…

     

    Follow these steps:

    1. On each Collector and the MOM, configure the IntroscopeEnterpriseManager.properties file.
      1. Go to the <EM_Home>/config directory and open the IntroscopeEnterpriseManager.properties file.
      2. Add the outgoing message queue property and set the value to 6000.
        transport.outgoingMessageQueueSize=6000
      3. On the MOM and all Collectors, uncomment the transport.override.isengard.high.concurrency.pool.max.size property and set the value to 10.
        transport.override.isengard.high.concurrency.pool.max.size=10
      4. Save and close the file.
      5. Restart all the Collectors and the MOM.
    2. If necessary, increase the JVM heap size as appropriate for your environment.
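
Before restarting, it can help to confirm that the transport properties from the steps above are actually active (not commented out) on each EM. The following is a minimal sketch, assuming simple `key=value` lines as in IntroscopeEnterpriseManager.properties; the helper names `parse_properties` and `transport_settings` are hypothetical, and the parser does not handle every feature of the Java properties format (for example, line continuations).

```python
# Property names taken from the thread above.
TRANSPORT_KEYS = (
    "transport.outgoingMessageQueueSize",
    "transport.override.isengard.high.concurrency.pool.min.size",
    "transport.override.isengard.high.concurrency.pool.max.size",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # commented-out properties are not active
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def transport_settings(text):
    """Return each transport property's active value, or None if unset."""
    props = parse_properties(text)
    return {key: props.get(key) for key in TRANSPORT_KEYS}
```

Reading the file from `<EM_Home>/config` on each Collector and the MOM and passing its contents to `transport_settings` shows at a glance which properties are still commented out or missing.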