We have the following issue:
- use process probe to gather information about key OS process availability
- create SLAs based on that information
- the process probe has a known quirk where the actual probing interval is not exactly what is configured, so every now and then the configured 60 s turns out to be 61 s, or just 60.1 s (example from this morning: one sample is from 9:48:59 and the next from 9:50:00, which means we get a one-minute SLA breach at 9:49)
- the SLA calculation then decides that data is missing -> SLA breach
- however, interval-based calculation has to stay in place, because when a host is down no data arrives at all, so we need to detect that missing data
- there seems to be no way to make the probe precise in its timing; a fix is hopefully coming
- async SLA calculation cannot be used, since it would mean that a host can be down and the SLA is still 100%
- there seems to be no tuning option for SLA calculations to add "fuzziness", so that it would understand that data arriving very soon after the "required time" is still OK.
Have others had this same issue? Any ideas on how to overcome this?
Which version of the processes probe are you currently running?
The latest, 4.32, and we have had a binary fix for some Linux platforms. (Mark, you know this case.)
If you are still seeing the same issue that you raised a case for, please update the case. We haven't seen the time slip in the lab.
A fix for this has been promised, though there is no ETA for this probe release.
And my question actually is whether there is any known way to make SLA calculations work when the interval between samples is not exactly what it should be. For example, a robot/probe restart will always be seen as an SLA breach.
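
To make the "fuzziness" idea concrete, here is a minimal Python sketch of the two ways of detecting missing data, using the 9:48:59 / 9:50:00 example from above. It is only an illustration of the logic, not something the SLA engine is known to support: the function names, the 5-second tolerance and the calendar date are all assumptions made up for the example.

```python
from datetime import datetime, timedelta


def strict_missing_buckets(samples, start, end, interval):
    """Return the start of every fixed bucket in [start, end) that holds no sample."""
    missing = []
    bucket = start
    while bucket < end:
        if not any(bucket <= s < bucket + interval for s in samples):
            missing.append(bucket)
        bucket += interval
    return missing


def tolerant_gaps(samples, interval, tolerance):
    """Return (previous, next) sample pairs spaced more than interval + tolerance apart."""
    ordered = sorted(samples)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > interval + tolerance]


# Hypothetical data: the date is arbitrary, the times follow the example above.
day = datetime(2024, 1, 1)
samples = [
    day.replace(hour=9, minute=47, second=59),
    day.replace(hour=9, minute=48, second=59),
    day.replace(hour=9, minute=50, second=0),   # arrived 61 s after the previous sample
    day.replace(hour=9, minute=51, second=0),
]
start = day.replace(hour=9, minute=47)
end = day.replace(hour=9, minute=52)
interval = timedelta(seconds=60)
tolerance = timedelta(seconds=5)  # assumed value, not a known product setting

# Fixed one-minute buckets: the 9:49 bucket is empty, so it counts as a breach.
print([b.strftime("%H:%M") for b in strict_missing_buckets(samples, start, end, interval)])

# Gap-based check with tolerance: a 61 s gap is accepted, so nothing is reported.
print(tolerant_gaps(samples, interval, tolerance))
```

With this kind of check a real outage is still caught, because the gap between the last sample before the host went down and the first one after it came back is far larger than the interval plus the tolerance. A robot/probe restart would only count as a breach if it takes longer than the tolerance allows.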