I'm wondering if there is some way to query alerts and group them by Management Module, showing how many times each alert was triggered in a period of time and the overall total for the chosen criteria, e.g.:
Last 24 hours
alert_name_2 : 19
Thanks in advance.
Including Hiko_Davis, Guenter_Grossberger, musma03 and SergioMorales in case they can respond...
Thank you German!
Alerts are written to the EM log, so we could theoretically read that and provide a per-interval counter.
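A minimal sketch of the log-reading idea, in Python. The EM log line format is not described in this thread, so the regular expression (LINE_RE) and the sample lines below are hypothetical placeholders; adapt them to the entries your EM actually writes. The sketch counts matching lines per management module and alert name within a 24-hour window and prints them in the format asked for above, with per-module and grand totals.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# HYPOTHETICAL line format: the real EM log layout is not shown in this thread.
# Assumed example line:
#   2024-05-01 10:15:00 ALERT module="Frontend MM" alert="alert_name_2" state=Danger
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ALERT '
    r'module="(?P<module>[^"]+)" alert="(?P<alert>[^"]+)"'
)


def count_alerts(lines, end, window=timedelta(hours=24)):
    """Count alert lines per management module and alert name in [end - window, end]."""
    start = end - window
    counts = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not (assumed-format) alert entries
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if start <= ts <= end:
            counts[m["module"]][m["alert"]] += 1
    return counts


def report(counts):
    """Render counts grouped by module, with a per-module total and a grand total."""
    out = []
    grand = 0
    for module in sorted(counts):
        out.append(f"{module}:")
        for alert, n in sorted(counts[module].items()):
            out.append(f"  {alert} : {n}")
        subtotal = sum(counts[module].values())
        out.append(f"  total : {subtotal}")
        grand += subtotal
    out.append(f"Grand total : {grand}")
    return "\n".join(out)


if __name__ == "__main__":
    sample = [
        '2024-05-01 10:15:00 ALERT module="Frontend MM" alert="alert_name_2" state=Danger',
        '2024-05-01 11:00:00 ALERT module="Frontend MM" alert="alert_name_2" state=Caution',
        '2024-04-29 09:00:00 ALERT module="Frontend MM" alert="alert_name_1" state=Danger',
        '2024-05-01 12:00:00 some unrelated log line',
    ]
    print(report(count_alerts(sample, end=datetime(2024, 5, 1, 12, 0))))
```

In this sample, the 2024-04-29 entry falls outside the 24-hour window and the unrelated line does not match, so only the two alert_name_2 entries are counted.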
Out of the box (OOTB), we don't have anything that will do that. CLW (Command Line Workstation) can list the alerts but won't give you the values.
You could write a script, have the alert's shell command action pass the values to it, and do it that way. But keep in mind that each script invocation runs on its own, so you'd have to tie the results together somehow.
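A sketch of the script approach, assuming (hypothetically) that the alert's shell command action is configured to run something like: python record_alert.py record "<management module>" "<alert name>". Because every invocation runs on its own, each one appends a timestamped row to a shared CSV file, and a separate "summary" run reads that file back and counts events in the last 24 hours. The script name, file name (alert_events.csv), and arguments are illustrative assumptions, not an APM interface.

```python
import csv
import sys
from collections import Counter
from datetime import datetime, timedelta
from pathlib import Path

# HYPOTHETICAL shared file that ties the independent script invocations together.
LOG_FILE = Path("alert_events.csv")


def record(module, alert, log_file=LOG_FILE, now=None):
    """Append one alert event as a timestamped CSV row."""
    now = now or datetime.now()
    with log_file.open("a", newline="") as f:
        csv.writer(f).writerow([now.isoformat(timespec="seconds"), module, alert])


def summarize(log_file=LOG_FILE, end=None, window=timedelta(hours=24)):
    """Count recorded events per (module, alert) inside the window ending at `end`."""
    end = end or datetime.now()
    counts = Counter()
    if not log_file.exists():
        return counts
    with log_file.open(newline="") as f:
        for ts, module, alert in csv.reader(f):
            if end - window <= datetime.fromisoformat(ts) <= end:
                counts[(module, alert)] += 1
    return counts


def main(argv):
    # Assumed invocations:
    #   record "<management module>" "<alert name>"   (from the alert action)
    #   summary                                      (run on demand)
    if len(argv) == 3 and argv[0] == "record":
        record(argv[1], argv[2])
    elif argv == ["summary"]:
        for (module, alert), n in sorted(summarize().items()):
            print(f"{module} | {alert} : {n}")
    else:
        print("usage: record <module> <alert> | summary")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Appending one short row per event keeps each invocation independent of the others; if many alerts can fire at the same instant, you may want to add file locking.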
So the responses indicate that custom program(s) would need to be created to do this. If you want this as a built-in feature, consider submitting an idea. Step-by-step details on creating the custom programs are beyond the scope of this forum.
Please let us know whether the answers were helpful and whether you have what you need to proceed with this effort, either through an idea or custom code.
One additional thought... if you have Spectrum or SOI (or some other NMS with alerting/reporting capabilities) that you could integrate APM with, you may be able to use the reporting capabilities in those other tools to produce what you are looking for.
Just a thought... that might be an easier solution than writing a bunch of custom code.
Thanks, everybody. I had considered writing a script to parse the logs, but I wasn't sure whether APM had a tool that could help me do this in a better way. I'll investigate the best way to do this.
Thanks Osvaldo, Matt, Hiko, and Chris for responding. Marking as answered. We would all like to hear about your solution once it's implemented!
You can track the alert status as a metric:
*SuperDomain*|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual)|Alerts|<management module name>|<alert name>
0 = no data (alert is configured but metric grouping is not matching anything)
1 = normal (no threshold breached)
2 = caution level
3 = danger level
Inactive alerts will not appear
From there, you could track changes in the metric's value that indicate an alert condition was raised, such as a transition from 1 to 2, 1 to 3, or 2 to 3.
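Following that idea, a minimal sketch of counting raised alerts from the status metric. It assumes you have already exported the metric's per-interval values for the period of interest (for example, the last 24 hours) for each management module and alert; how you fetch them is not covered here. Only the transitions named above (1 to 2, 1 to 3, 2 to 3) are counted as an alert being raised, and results are totalled per management module.

```python
from collections import Counter

# Status values of ...|Alerts|<management module name>|<alert name>:
# 0 = no data, 1 = normal, 2 = caution, 3 = danger.
RAISED = {(1, 2), (1, 3), (2, 3)}


def count_raised(samples):
    """Count transitions that indicate an alert condition was raised.

    `samples` holds the metric's value per interval, oldest first.
    None marks an interval with no reported value; it is skipped, so the
    comparison is made against the last reported value.
    """
    raised = 0
    prev = None
    for value in samples:
        if value is None:
            continue
        if prev is not None and (prev, value) in RAISED:
            raised += 1
        prev = value
    return raised


def summarize(series_by_alert):
    """Map (management module, alert name) -> samples into per-alert and per-module counts."""
    per_alert = Counter({key: count_raised(s) for key, s in series_by_alert.items()})
    per_module = Counter()
    for (module, _alert), n in per_alert.items():
        per_module[module] += n
    return per_alert, per_module
```

For example, the series 1, 1, 2, 3, 1, 3, 2, 3 contains four raising transitions (1 to 2, 2 to 3, 1 to 3, 2 to 3); drops such as 3 to 1 are ignored.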