Hello guys,
I stumbled over a problem with alerts and their evaluation.
Short example to explain the problem: I use a metric grouping on a metric whose value is either 0 or 1. 0 means there is no problem, 1 means there is a problem. An alert is defined on this metric grouping with the following settings:
Resolution: 15 sec
Comparison Comparator: Equal To
Trigger Alert Notification: When Severity Increases
Combination: All
Notify by Individual Metric: disabled
Danger => Threshold: 1; Periods Over Threshold / Observed Periods: both 1
Caution (default values) => Threshold: 0; Periods Over Threshold / Observed Periods: 1
The danger action list contains an email notification action which reports that there is a problem.
So as I understand it, the email notification should be sent only when the value of the metric changes from 0 to 1. Unfortunately, the email is sometimes sent again even though the problem still exists (the metric keeps reporting 1). I had a deeper look at the metric and found out that in some cases no value is reported for the metric in one interval. You can see this behaviour in the image below at 01:00:30. There is no data point.
I had the idea of increasing both Periods Over Threshold and Observed Periods to 2, because I thought this would rule out the wrong evaluation when data is missing for one period. Unfortunately it doesn't.
I looked deeper into the documentation and found out that Introscope provides metrics which show the alert status. The following picture shows this metric for the alert described above. The alert was in the danger status (value 3) as expected. During the one period of missing data the status metric has the value 0 (not reporting). So it shows the values as described in the documentation.
Taking these facts into consideration, I think that Introscope includes "not reporting" periods in the alert evaluation. This results in an email being sent at 01:01:00, because the severity increases again after it had dropped due to the missing data in one period.
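To make this theory concrete, here is a minimal Python sketch of how I imagine the evaluation works. This is not Introscope code: the function names are made up, the status value 1 for "normal" is an assumption (only 0 and 3 are taken from what I see in the status metric), and the caution threshold is ignored for simplicity.

```python
from typing import List, Optional

# Status values: 0 and 3 match the alert status metric described above;
# 1 for "normal" is an assumption made for this sketch.
NOT_REPORTING, NORMAL, DANGER = 0, 1, 3


def status_for(sample: Optional[int]) -> int:
    """Map one 15-second sample to an alert status (None = no data point)."""
    if sample is None:
        return NOT_REPORTING
    return DANGER if sample == 1 else NORMAL


def notifications(samples: List[Optional[int]]) -> List[int]:
    """Return the period indices where a 'severity increases' email would fire."""
    fired = []
    previous = NORMAL
    for i, sample in enumerate(samples):
        current = status_for(sample)
        # The danger action fires whenever the status rises to danger.
        if current == DANGER and current > previous:
            fired.append(i)
        previous = current
    return fired


# 0 -> 1, then one period without data while the problem persists.
print(notifications([0, 1, 1, None, 1, 1]))  # [1, 4]
```

If the model is right, the gap at index 3 resets the status to 0, so the next reported 1 counts as a new severity increase and triggers the second email.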
What do you guys think? Does Introscope include the "not reporting" periods in the alert evaluation? Am I using the alerts incorrectly?
Basically, what I want is that Introscope doesn't fire the action again after a period of missing data when the alert is effectively still in danger status.
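As a rough illustration of the behaviour I am after, the following Python sketch simply skips periods without data and keeps the last known status. It only describes the desired outcome; the function name and the status value 1 for "normal" are assumptions, and I don't know whether Introscope can be configured to behave like this.

```python
from typing import List, Optional

NORMAL, DANGER = 1, 3  # 3 = danger as in the status metric; 1 = normal is assumed


def notifications_ignoring_gaps(samples: List[Optional[int]]) -> List[int]:
    """Return the period indices where an email should fire if gaps are ignored."""
    fired = []
    previous = NORMAL
    for i, sample in enumerate(samples):
        if sample is None:
            # No data point: keep the previous status instead of resetting it.
            continue
        current = DANGER if sample == 1 else NORMAL
        if current > previous:
            fired.append(i)
        previous = current
    return fired


# Same series as in my case: one missing period while the problem persists.
print(notifications_ignoring_gaps([0, 1, 1, None, 1, 1]))  # [1]
```

With this logic a second email would only be sent if the metric actually reported 0 in between and then went back to 1.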
Greetings,
Matthias