We have an issue with an agent that is stopped during an alert downtime schedule (ADS) when the ADS is longer than the agent unmount time frame (default 60 minutes). By the time the ADS ends, the agent has already been unmounted, so the agent connection status has no data and no alert is sent to say that the agent did not restart.
My thought is to extend the unmount configuration on the EMs to longer than any of our ADS schedules (more than seven hours but less than 24 hours), so that the agent does not become unmounted while the agent connection status alert is under an ADS.
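To make the sizing rule concrete, here is a minimal sketch of the check being proposed. The worst case is an agent that stops right as the longest ADS begins, so the unmount delay must be strictly longer than that ADS. The function names are hypothetical (nothing here is a CA APM API), and the 24-hour upper bound is only the limit suggested in this post, not a product constraint.

```python
from datetime import timedelta

def longest_ads(ads_durations):
    """Worst case: the agent stops at the very start of the longest ADS."""
    return max(ads_durations)

def delay_is_safe(candidate, ads_durations, upper_bound=timedelta(hours=24)):
    """Hypothetical check: True if an agent that stops anywhere inside any ADS
    is still mounted (so its connection status still has data) when the ADS ends.
    upper_bound is the self-imposed limit from the post, not a product limit."""
    return longest_ads(ads_durations) < candidate < upper_bound

# Example schedules: a 7-hour maintenance window and a shorter 2-hour one.
ads = [timedelta(hours=7), timedelta(hours=2)]
print(delay_is_safe(timedelta(minutes=60), ads))  # False: the default is too short
print(delay_is_safe(timedelta(hours=12), ads))    # True
```

The comparison is strict on purpose: with a delay exactly equal to the ADS length, an agent that stops at the start of the window would be unmounted at the same moment the window closes.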
What are the drawbacks of extending the agent unmount time from 60 minutes to 12 hours?
Let us know if Fred's response was helpful, since he did talk about the tradeoffs, so the question can be marked as answered. Or do you have follow-up questions, or want to hear from additional users?
We actually extended the time to a week to account for an extended weekend plus some margin.
My original setting was 3 days, but Operations wanted more.
I had the same question, and at the time (v9.5) we were told not to be too concerned about performance drawbacks. The EM keeps tracking the agent in case the interruption is temporary, and the first property controls the delay before it releases the resources associated with tracking the agent.
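The tracking behavior described above can be pictured as a timer per disconnected agent. The class below is a hypothetical model, not the Introscope implementation: it keeps a disconnected agent "mounted" until the unmount delay has elapsed, a reconnect cancels the pending unmount, and a periodic sweep releases agents whose delay has expired. Times are plain seconds passed in by the caller.

```python
class AgentTracker:
    """Hypothetical model of an EM tracking disconnected agents until an
    unmount delay expires. Illustrative only; not the product's code."""

    def __init__(self, unmount_delay_s):
        self.unmount_delay_s = unmount_delay_s
        self.mounted = set()          # agents the EM still holds resources for
        self.disconnected_at = {}     # agent name -> time it disconnected

    def connect(self, agent):
        self.mounted.add(agent)
        # A reconnect within the delay cancels the pending unmount.
        self.disconnected_at.pop(agent, None)

    def disconnect(self, agent, now):
        # The agent stays mounted; we only start its unmount timer.
        self.disconnected_at[agent] = now

    def sweep(self, now):
        """Unmount every agent whose delay has expired and return their names."""
        expired = [a for a, t in self.disconnected_at.items()
                   if now - t >= self.unmount_delay_s]
        for agent in expired:
            del self.disconnected_at[agent]
            self.mounted.discard(agent)
        return expired

tracker = AgentTracker(unmount_delay_s=60 * 60)  # 60-minute default
tracker.connect("agentA")
tracker.disconnect("agentA", now=0)
print(tracker.sweep(now=30 * 60))  # []: still tracked, so it can still alert
print(tracker.sweep(now=60 * 60))  # ['agentA']: unmounted, no more data
```

In this model, a longer delay only means more entries sitting in `disconnected_at` and `mounted` at any one time, which is consistent with being told the performance cost is small.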
The only side effect is that if an agent has been down for a week and you restart the MoM or modify the module containing the alert, the alerts are re-evaluated and resent. I do not remember the exact case, but one of the alerts I set is triggered when the metric falls to 0, and I think I get that alert when the agent unmounts 5 days later. If you have lots of agents and a certain percentage of them are in maintenance, this is a bit of a nuisance, and it forced us to plan our MoM restarts.
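A rough way to estimate that nuisance before a planned MoM restart is to count the agents that are down but not yet unmounted, since those are the ones whose alerts would be re-evaluated and resent. This is a hypothetical sketch of the behavior described in the post; the `Agent` type and the fleet data are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    hours_down: float  # 0 means the agent is currently connected

def alerts_resent_on_restart(agents, unmount_delay_h):
    """Hypothetical model: on a MoM restart (or a change to the module),
    every alert is re-evaluated, so each agent that is down but still
    mounted (down for less than the unmount delay) alerts again."""
    return [a.name for a in agents if 0 < a.hours_down < unmount_delay_h]

fleet = [Agent("web01", 0), Agent("batch07", 30),
         Agent("legacy3", 100), Agent("old9", 200)]
print(alerts_resent_on_restart(fleet, unmount_delay_h=1))       # []
print(alerts_resent_on_restart(fleet, unmount_delay_h=7 * 24))  # ['batch07', 'legacy3']
```

With the 60-minute default almost nothing is still mounted, while a one-week delay keeps every agent that has been in maintenance for under a week eligible to alert again.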
Marking as a question to get a wider audience...
Marking as answered since a response was given by Fred and there are no follow-up questions.
Thank you Fred.
So if the MoM is restarted or the module is modified, does it resend the alert, or does it not send an alert at all?
Here is the case:
Our ADS runs on the 2nd Sunday from 3:00 am until 10:00 am, and within it the APM cluster is restarted from 5:00 am until a little past 6:00 am.
If, between 3:00 am and 5:00 am, an agent is restarted but does not come back up, and the APM cluster is then restarted, will we get an alert when the primary ADS ends at 10:00 am?
We typically set all of our alerts to "whenever severity changes" to try to prevent duplicate alert notifications.
I should have phrased it better: "As far as I know, after increasing the timeout we have not observed any bad things."
- When you restart the MoM, all alerts are re-evaluated.
- I would postulate that with the default you miss the Agent Down alert, but if you extend the timeouts you will see it.
- Our Operations team wanted to ensure that the Down Alert was never ignored (cleared automatically on the upstream system), so it is set to "When Severity Increase".
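The difference between the two notification settings mentioned in this thread can be shown with a small filter over a sequence of alert states. This is a hypothetical sketch: the policy names only mirror the options discussed, and the severity labels and their ordering are assumptions, not the product's internals. "changes" stays quiet while the state is steady (no duplicates) but also notifies on recovery, which is what can auto-clear an upstream ticket; "increases" never sends the recovery notification.

```python
# Assumed severity ordering; labels are illustrative.
SEVERITY = {"normal": 0, "caution": 1, "danger": 2}

def notifications(states, policy):
    """Return the (previous, current) transitions that would notify
    under a hypothetical 'changes' or 'increases' policy."""
    if policy not in ("changes", "increases"):
        raise ValueError(f"unknown policy: {policy}")
    sent = []
    for prev, curr in zip(states, states[1:]):
        delta = SEVERITY[curr] - SEVERITY[prev]
        if delta > 0 or (policy == "changes" and delta < 0):
            sent.append((prev, curr))
    return sent

history = ["normal", "danger", "danger", "normal", "danger"]
print(notifications(history, "changes"))
# [('normal', 'danger'), ('danger', 'normal'), ('normal', 'danger')]
print(notifications(history, "increases"))
# [('normal', 'danger'), ('normal', 'danger')]
```

Neither policy notifies on the repeated "danger" sample, which is the duplicate suppression both posters were after; they differ only in whether the drop back to "normal" is sent.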