I'm trying to get the ConnectionStatus of all agents and alert if any one of them is down.
I have the following:
But I'm not receiving values.
Is this possible, or do I need to set up one alert for each ConnectionStatus?
Hi, I'm not sure about the definition of the metric grouping you are using. This works for me
and the alert is defined like this
and I received the alert when the agent disconnected.
But bear in mind that the connection status will only show agents that have disconnected, i.e. if the agent doesn't connect in the first place there will be no metric value and so no alert. Also, when the agent is unmounted it will disappear from the grouping.
You're welcome. I just noticed a stray "t" in the metric expression; it should be
It was fine for me as all my agents were Tomcat.
There have been several discussions on using agent connection status as an up/down check.
One of the issues with this is the behavior of an agent during an Alert Downtime Schedule (ADS) and the agent unmount time.
Agent Connection Status - Unmounted during ADS
To help get around this issue, I've started to use a Calculator for the number of EM - Collectors
We have two alerts set up. The first is on the value of the connection status (1, 2, 3). The second is on a calculator that sums all of the collectors' connection statuses together and alerts if the total is not equal to 4 (in this environment we have four collectors).
Additionally, the resolution is changed to 6 minutes and the trigger is each period, so we get another email message every 6 minutes.
There is an ADS that covers any maintenance windows and includes both alerts. But as soon as the ADS expires, the calculator alert will fire, even if the metric is unmounted.
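The sum check described above can be sketched roughly as follows. This is a plain Python illustration, not Introscope calculator syntax. The function name and collector names are hypothetical, and it assumes that a connected collector reports 1 (which is what makes four collectors total 4) and that an unmounted collector reports no value at all.

```python
from typing import Mapping, Optional

# Assumption: four collectors, each reporting 1 while connected.
EXPECTED_TOTAL = 4


def collector_sum_alert(statuses: Mapping[str, Optional[int]],
                        expected_total: int = EXPECTED_TOTAL) -> bool:
    """Return True when the summed connection statuses differ from the expected total.

    A value of None models an unmounted metric ("no data"); it adds nothing
    to the sum, so the total falls short and the check still fires.
    """
    total = sum(value for value in statuses.values() if value is not None)
    return total != expected_total


# Hypothetical collector names, for illustration only.
all_up = {"collector1": 1, "collector2": 1, "collector3": 1, "collector4": 1}
one_down = {"collector1": 1, "collector2": 3, "collector3": 1, "collector4": 1}
one_unmounted = {"collector1": 1, "collector2": None, "collector3": 1, "collector4": 1}

print(collector_sum_alert(all_up))         # all four connected
print(collector_sum_alert(one_down))       # one collector reports 3
print(collector_sum_alert(one_unmounted))  # one collector has no data
```

The point of summing is that a missing value changes the total, so the calculator alert can fire in a case where an alert on the individual status value has nothing to evaluate.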
Thanks for the info, Billy. How are you handling the alerts when the MOM re-load-balances the agents?
There are two settings we are using to help with the load balancing of agents. The first is to set the combination to "all", so that the connection status would be 3 on the original collector and 1 on the new collector.
The second is to extend the periods over threshold from 1,1 to 8,8 (two minutes). Load balancing typically takes less than a minute, but could take a bit longer depending on how loaded the EM collector is.
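As a rough illustration of why 8,8 damps the rebalancing noise: assuming a 15-second reporting interval (which is what makes 8 periods equal two minutes), the alert only trips once the status has been at or over the threshold in 8 of the last 8 periods. The Python below is a hypothetical model of that rule, not Introscope's implementation; the function name and the "at or above threshold" comparison are assumptions.

```python
from collections import deque
from typing import Deque, Iterable, List


def periods_over_threshold(values: Iterable[int], threshold: int,
                           required: int, observed: int) -> List[bool]:
    """For each period, report whether the alert would be in a triggered state.

    The alert triggers when at least `required` of the last `observed`
    periods had a value at or above `threshold`.
    """
    window: Deque[bool] = deque(maxlen=observed)
    fired: List[bool] = []
    for value in values:
        window.append(value >= threshold)
        # Do not trigger until a full observation window has been seen.
        fired.append(len(window) == observed and sum(window) >= required)
    return fired


# A 45-second blip (three periods at status 3), as during a rebalance.
blip = [1, 1, 3, 3, 3, 1, 1, 1, 1, 1]
print(any(periods_over_threshold(blip, threshold=3, required=1, observed=1)))
print(any(periods_over_threshold(blip, threshold=3, required=8, observed=8)))

# A sustained outage: eight consecutive periods at status 3.
outage = [3] * 8
print(periods_over_threshold(outage, threshold=3, required=8, observed=8)[-1])
```

With 1,1 the short blip triggers immediately, while with 8,8 it is ignored and only the sustained outage triggers, at the cost of a two-minute delay before the alert.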
This setup would not catch the case where an ADS is active and the agent is stopped and not restarted. This is mainly due to two factors. The first is the alert notification trigger being set to "Whenever Severity Changes".
With the trigger set to "Whenever Severity Changes", the value would need to change while the alert is not covered by an ADS.
The second is the agent unmount period. If an agent covered by an ADS goes down and the ADS is longer than the unmount period, there is a good chance that the agent will have been unmounted by the time the ADS expires, so the agent connection data would be "no data". There isn't an alert setting to catch a "no data" condition.
Be careful about using a Management Module to do this aggregation. Load balancing will cause false positives.