Idea Details

Event rules: set threshold to constant value over baseline

Last activity 06-13-2019 09:40 AM
Mark.melchers
03-14-2018 11:05 AM

  • Some metrics, such as temperature, can differ widely between devices and between components within a device.
  • Baseline-based alerting would be a good way of raising an alarm when temperatures are higher than normal.
  • Baseline-based event rules can only use multiples of the “standard deviation”, e.g. trigger when a value is more than 3 standard deviations above the baseline.
  • The standard deviation increases when there is more variation in the metric values.
  • However, when there is very little variation, the standard deviation can be as low as 0.1 or even 0.0.
  • If you configure an alarm for values more than 3 standard deviations above the baseline and the standard deviation is 0.1, the alarm is generated when the temperature is just 0.3 degrees above the baseline. This is far too sensitive. In this scenario, to get realistic alarms you need to set the threshold to 10 or 20 times the standard deviation.
    • In some cases the standard deviation is 0.0. If you multiply 0.0 by 20 it is still 0.0, so any value higher than the baseline exceeds the threshold and the event is always triggered. This makes this kind of event rule impractical.
  • In other cases there is a lot more variation. If the standard deviation is, say, 5 degrees, a multiplier of 10 to 20 means the alarm is only generated when the temperature is 50 to 100 degrees above normal. This makes the alarm useless for devices with high variation.
  • --> Standard deviation based event rules are either triggered too early (when there is almost no variation in the data) or not at all (when there is more variation in the metric value). In many cases they therefore cannot be used in practice.
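
The arithmetic behind these bullets can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not how the product computes thresholds; the function name `sigma_threshold` and the baseline of 40 degrees are assumptions made for the example.

```python
def sigma_threshold(baseline: float, std_dev: float, multiplier: float) -> float:
    """Trigger level for an 'N standard deviations above baseline' rule (hypothetical helper)."""
    return baseline + multiplier * std_dev


baseline = 40.0  # assumed baseline temperature in degrees, for illustration only

# Very little variation: 3 x 0.1 puts the trigger just 0.3 degrees above baseline.
low = sigma_threshold(baseline, std_dev=0.1, multiplier=3)

# No variation: the trigger sits on the baseline regardless of the multiplier.
zero = sigma_threshold(baseline, std_dev=0.0, multiplier=20)

# High variation: a multiplier of 20 puts the trigger 100 degrees above baseline.
high = sigma_threshold(baseline, std_dev=5.0, multiplier=20)

print(round(low - baseline, 2), zero - baseline, high - baseline)
```

The same multiplier therefore gives a margin of 0 degrees on one device and 100 degrees on another, which is the mismatch described above.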

Example of temperature metric and its variation

 

Proposed solution

  • Create the ability to set the threshold to a constant value (e.g. 10 degrees) or a percentage above the baseline.
  • This feature is already present in SNMPCollector in UIM and works quite well.

Feature as implemented in SNMPCollector (UIM)


Comments

09-26-2018 02:58 PM

Hi, the documentation includes some useful examples: Configure Threshold Profiles - CA Performance Management - 3.6 - CA Technologies Documentation.

Percent of Baseline Event Conditions

Event rules that use Percent of Baseline compare the poll results to the calculated baseline plus or minus a percentage of the calculated baseline for the device or component. An event is triggered when the qualifying poll data meets the criteria that are specified for a Percent of Baseline event rule condition. Percent of Baseline event conditions are useful when there is a lot of or very little variation in the metric values. Consider using Percent of Baseline conditions when the standard deviation is above 3 or extremely low like 0.1 or 0.0.

Examples

The calculated baseline is 60 degrees and the specified Percent of Baseline is 50%. The rule states that an event triggers when the temperature rises more than 50% above the baseline. This condition triggers when the temperature is higher than 90 degrees.

Math: 60 + (+50%*60) = 60 + 30 degrees = 90 degrees

The calculated baseline is 60 degrees and the specified Percent of Baseline is -50%. The rule states that an event triggers when the temperature falls more than 50% below the baseline. This condition triggers when the temperature is lower than 30 degrees.

Math: 60 + (-50%*60) = 60 - 30 degrees = 30 degrees
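
As a rough check of the two examples, the same math can be written as a small Python sketch. The function names are hypothetical and are not part of CA Performance Management; the sketch only reproduces the arithmetic quoted from the documentation.

```python
def percent_of_baseline_threshold(baseline: float, percent: float) -> float:
    """Baseline plus (or minus) a percentage of the baseline (hypothetical helper)."""
    return baseline + (percent / 100.0) * baseline


def triggers(value: float, baseline: float, percent: float) -> bool:
    """A positive percent triggers above the threshold, a negative percent below it."""
    threshold = percent_of_baseline_threshold(baseline, percent)
    return value > threshold if percent >= 0 else value < threshold


upper = percent_of_baseline_threshold(60.0, 50.0)   # 60 + 30
lower = percent_of_baseline_threshold(60.0, -50.0)  # 60 - 30

print(upper, lower)
print(triggers(95.0, 60.0, 50.0), triggers(45.0, 60.0, -50.0))
```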

09-25-2018 05:16 PM

Could you show examples of how to do this in CAPM?

04-16-2018 02:18 AM

Hello Chazz,

 

Thanks again for your efforts; they are much appreciated. There are situations where just a fixed value above baseline is sufficient. Temperature is one example: it is simpler to use a fixed value there, since the temperature metric does not have the range that, for example, error and discard counters have. If you only implemented a combination of percentage and fixed value above baseline, we could make that work too. A percentage above baseline alone, without a minimum fixed value above baseline, would be unusable for metrics such as errors and discards, because these values are normally at 0 frames per second, so any increase is automatically an infinite percentage increase. So:

 

-Just a fixed value: works for some metrics (temperature). Not ideal, but workable for some others (errors, discards).

-Just a percentage value: works for some metrics (temperature). Unusable for some other metrics (errors, discards).

-A combination of a minimum fixed value AND a percentage above baseline: not ideal but workable for some metrics (temperature), ideal for other metrics (errors, discards). As far as I can think of, we can make this work in all cases.
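
To make the third option concrete, here is a minimal Python sketch. It assumes one possible reading of "combination": the value must exceed the baseline by both the percentage and the minimum fixed amount, so the larger of the two margins applies. The function name and the sample numbers are illustrative only and do not describe an existing product feature.

```python
def combined_threshold(baseline: float, percent: float, min_offset: float) -> float:
    """Baseline plus the larger of a percentage margin and a minimum fixed margin.

    Hypothetical helper: assumes both conditions must hold before an event triggers.
    """
    percent_margin = baseline * percent / 100.0
    return baseline + max(min_offset, percent_margin)


# Errors/discards with a baseline of 0: a percentage alone gives a margin of 0,
# so the minimum fixed value (here 5 per second, an assumed number) sets the trigger.
errors = combined_threshold(baseline=0.0, percent=50.0, min_offset=5.0)

# Temperature with an assumed baseline of 40 degrees: the 50% margin (20 degrees)
# is larger than the 5 degree minimum, so the percentage sets the trigger.
temperature = combined_threshold(baseline=40.0, percent=50.0, min_offset=5.0)

print(errors, temperature)
```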

 

I hope this helps. 

 

regards,

 

Mark

04-09-2018 02:55 PM

Hi Mark,

 

We ended up moving forward with the % of baseline solution based on other customer feedback. We think it addresses the widest set of use cases. We're still open to implementing fixed value from baseline, but it will have to be a future project.

 

For the fixed values above baseline you gave, could you use simple fixed values? This is what we usually see for most thresholds related to temperature, errors, and discards. Are there situations where the combination of fixed value and % from baseline does not address your needs?

 

Thank you,

Chazz

03-23-2018 04:33 AM

Hi Chazz,

 

Thanks for your interest in this idea. A percentage of baseline would be sufficient if that would be easier to implement, but we would rather have a constant value or the choice between constant value and percentage. This makes the trigger level more predictable. 

 

We mainly would like to apply these settings for temperature and memory utilization. These are all in roughly the same range for all devices. We generally trigger temperature (Celsius) alarms at 20 degrees C above baseline. This usually does the job pretty well in UIM.

 

We could use it for things like interface discards and errors as well, and there the use of a percentage above baseline would be essential, as the baseline values could vary widely between devices. However, that could generate very trigger-happy alarms if the baseline is very close to 0. A percentage in combination with a minimum constant value would be even better in that case.

 

regards

 

Mark

03-21-2018 12:51 PM

Rather than a scalar, would a percent of baseline address your needs? Looking at modeling, it should handle both high variance and low variance devices. Do you have devices in vastly different temperature ranges? What's the threshold you'd like to trigger off of?

 

Thank you,