Idea Details

Spectrum should reliably detect reboots - no more false alarms

Last activity 05-30-2019 12:33 AM
Kristian Schuster
02-03-2015 07:24 AM

## What is the use case of the new feature?

----------------------------------------

In Spectrum, we want a device-reboot alarm only when the device has actually rebooted, not every time the uptime counter (sysUpTime, or preferably hrSystemUptime) wraps.


## Describe the new feature in detail

----------------------------------

First of all, Spectrum should monitor the OID hrSystemUptime instead of sysUpTime to recognize reboots, because hrSystemUptime reports the uptime of the operating system (OS), while sysUpTime only reports the uptime of the SNMP agent.

Second, Spectrum should be able to tell whether a counter reset was caused by a reboot or merely by a counter wrap. In the latter case, Spectrum should not create an alarm.


## Describe how you envision this new feature being implemented.

-------------------------------------------------------------

Spectrum should save the last polled uptime value and compare it with the currently polled value, just as eHealth does.
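A minimal sketch of such a comparison, in Python. It assumes the uptime is a 32-bit TimeTicks value (hundredths of a second) and that the poller knows the wall-clock time between two polls. The function name, the tolerance parameter and the return labels are hypothetical; this is not how Spectrum or eHealth actually implement it.

```python
# Hypothetical sketch (not Spectrum or eHealth code): classify the change
# between two polls of a 32-bit TimeTicks uptime (hundredths of a second).

WRAP = 2**32  # TimeTicks values count modulo 2^32


def classify_uptime_change(prev_ticks: int, curr_ticks: int,
                           elapsed_s: float, tolerance_s: float = 60.0) -> str:
    """Return "ok", "wrap", "reboot" or "unknown".

    prev_ticks: uptime stored at the previous poll.
    curr_ticks: uptime returned by the current poll.
    elapsed_s: wall-clock seconds between the two polls.
    tolerance_s: assumed allowance for polling jitter and clock drift.
    """
    # Assumption: some agents report the value as a signed integer; fold it
    # back into the unsigned range 0 .. 2^32 - 1.
    prev_ticks %= WRAP
    curr_ticks %= WRAP
    tolerance = round(tolerance_s * 100)
    # Uptime we expect now if the device kept running (may exceed 2^32 - 1).
    expected = prev_ticks + round(elapsed_s * 100)

    if abs(curr_ticks - expected) <= tolerance:
        return "ok"          # normal progress
    if abs(curr_ticks + WRAP - expected) <= tolerance:
        return "wrap"        # counter passed 2^32 - 1 and restarted at 0
    if curr_ticks < expected - tolerance:
        return "reboot"      # uptime restarted: raise the reboot alarm
    return "unknown"         # uptime ran ahead of the wall clock
```

Comparing against the *expected* uptime, rather than only against the previous value, also catches a reboot when the previous uptime was small: `classify_uptime_change(1000, 10000, 300)` reports a reboot even though the new value is larger than the old one.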


## What business problem will be solved by adding this new feature?

----------------------------------------------------------------

There would be no more false alarms caused merely by counter wraps.

Currently, these false alarms lead to unnecessary support cases.

## Describe the importance and urgency

-----------------------------------

importance: low

urgency: high


Comments

01-23-2019 06:25 AM

This was missed in the release notes; we are working on updating them to reflect the fix.

Regards, 

Amit

01-21-2019 09:01 AM

Hi

Can you please explain how this has been fixed in 10.3?

I've had a quick look through the release notes but don't see anything referencing this.


John

11-02-2018 10:44 AM

This could probably be written better, but the logic is: "when a cold start trap event (0x10306) appears on a device model, look for the Host_Resource (rfc2790) model. If the device is online and (the uptime is at least 5 minutes OR the uptime is negative), generate an event that will clear the alarm (0xfff00ce1)."

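Our default polling interval is 5 minutes, so the theory goes: "if a cold start trap is received and the device has been up for less than 5 minutes, generate an alarm." In the procedure, 5 minutes appears as { U 30000 }, because the uptime is counted in hundredths of a second.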


There are cases of false positives where:

  1. the device was up, but the community string in snmpd.conf was wrong;
  2. when that is fixed and the SNMP agent is restarted, the event procedure runs before Spectrum marks the device online.


I need to introduce a wait timer before running the procedure to fix this, but it is not a common enough occurrence to dedicate time to it.
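One way such a wait timer could behave, sketched in Python under stated assumptions: re-check the device state a few times, sleeping between attempts, and evaluate the uptime only once the device is online. `is_online`, `read_uptime_ticks`, the 60-second delay and the attempt count are all hypothetical stand-ins, not Spectrum APIs; the 30000-tick threshold is the one used in the procedure.

```python
import time
from typing import Callable, Optional


def evaluate_after_wait(is_online: Callable[[], bool],
                        read_uptime_ticks: Callable[[], int],
                        delay_s: float = 60.0,
                        max_attempts: int = 5,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[bool]:
    """Hypothetical wait-then-check logic for a cold start trap.

    Returns True if the reboot alarm should stand, False if it should be
    cleared, or None if the device never came online while we waited.
    """
    for attempt in range(max_attempts):
        if is_online():
            uptime = read_uptime_ticks()
            # Same rule as the procedure: clear when uptime >= 5 minutes
            # (30000 ticks) or negative; otherwise the alarm stands.
            return 0 <= uptime < 30_000
        if attempt < max_attempts - 1:
            sleep(delay_s)  # give Spectrum time to mark the device online
    return None
```

Passing `sleep` in as a parameter is only a convenience so the logic can be exercised without actually waiting.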


"ForEach( GetModelsByRelationName( {C CURRENT_MODEL}, {S \"Manages\"}, {U 1} ),
{ V rfcMH },
{ V dummyRetValue },
{ U 0 },
If( Regexp( ReadAttribute( { V rfcMH }, { H 0x10000 } ), { S \"^rfc2790App\" } ),
If( And(Equals(ReadAttribute({C CURRENT_MODEL},{H 0x110ed}),{I 1}),
Or(GreaterOrEqual(ReadAttribute({ V rfcMH },{ H 0xc4072e }),{ U 30000 }),
Less(ReadAttribute({ V rfcMH },{ H 0xc4072e }),{ U 0 }))),
CreateEventWithVariables({ C CURRENT_MODEL },{ H 0xfff00ce1 },GetEventVariableList())),
Nil() ),
Nil())"
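For readers less familiar with the event procedure syntax, the decision above can be restated in Python. This is a hypothetical illustration, not Spectrum code: the function names are invented, `device_online` stands in for the check that attribute 0x110ed equals 1 (the post's "device is online" test), and each `(model_name, uptime_ticks)` pair stands in for a managed model and its 0xc4072e attribute.

```python
import re
from typing import Iterable, Tuple

FIVE_MINUTES_TICKS = 30_000  # 300 s in hundredths of a second ({ U 30000 })


def should_clear_reboot_alarm(device_online: bool, uptime_ticks: int) -> bool:
    """online AND (uptime >= 5 minutes OR uptime < 0) -> raise clear event 0xfff00ce1."""
    return device_online and (uptime_ticks >= FIVE_MINUTES_TICKS or uptime_ticks < 0)


def clear_events_for_cold_start(device_online: bool,
                                managed_models: Iterable[Tuple[str, int]]) -> int:
    """Mirror the ForEach: count clear events raised for rfc2790App models."""
    count = 0
    for model_name, uptime_ticks in managed_models:
        if re.match(r"^rfc2790App", model_name) and \
                should_clear_reboot_alarm(device_online, uptime_ticks):
            count += 1
    return count
```

The first false-positive case falls straight out of this logic: if the trap is processed while `device_online` is still False, no clear event is raised and the alarm stands even though the host has been up for days.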

10-24-2018 02:25 AM

Dear Community members, 

This has been delivered with our 10.3 release 

Regards, 

Amit

10-22-2018 09:54 AM

Is there a date available for this? Our NOC users are having a pain with this.

03-29-2018 09:33 AM

Do you mind sharing that procedure?

03-29-2018 09:06 AM

Any update on this?  It's tying up our NOC users.

03-14-2018 11:12 AM

And I see one side effect here: a reboot occurring within that interval could not be detected.

03-14-2018 10:50 AM

Hmm, lutelewis, but what about the following approach, which seems technically quite feasible? Why shouldn't it be possible to evaluate, on every polling interval, the difference between the polled sysUpTime and the 32-bit maximum? If the difference is less than the polling interval (300 seconds), store sysUpTime = 0; during the following polling interval you then get a sysUpTime (> 1) greater than the stored value (= 0).

Works?
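The proposal can be sketched in Python as follows, assuming sysUpTime in hundredths of a second, a 300-second polling interval and one stored value per device. The class and method names are hypothetical. The sketch also reproduces the side effect raised in the reply above: a genuine reboot inside the near-wrap window is masked.

```python
from typing import Optional

WRAP = 2**32            # TimeTicks wrap modulo 2^32
POLL_TICKS = 300 * 100  # 300 s polling interval in hundredths of a second


class UptimeTracker:
    """Hypothetical per-device tracker for the 'store 0 before the wrap' idea."""

    def __init__(self) -> None:
        self.stored: Optional[int] = None  # no value before the first poll

    def poll(self, ticks: int) -> bool:
        """Record a polled sysUpTime; return True if a reboot alarm would fire."""
        reboot = self.stored is not None and ticks < self.stored
        if WRAP - ticks < POLL_TICKS:
            # Within one polling interval of the wrap: store 0 so the small,
            # wrapped value on the next poll is not mistaken for a reset.
            self.stored = 0
        else:
            self.stored = ticks
        return reboot
```

Because the stored value is 0 after a near-wrap poll, any value on the next poll passes, including the small value produced by a real reboot; that is the blind spot described above.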

11-20-2017 11:41 AM

Because of this problem, we are disabling the Device Reboot alarm. It would be good to have this fixed in a future release.

11-20-2017 09:17 AM

Hello Nagesh,


Do you have any idea when this idea will be delivered? It generates a lot of false alarms for us too. 


regards


mark

03-09-2017 04:18 AM

Dear Spectrum Community Users, 

This idea will be delivered as part of a future Spectrum release.

Thanks,

Nagesh 

01-17-2017 07:58 PM

This feature isn't technically possible, since both uptime variables are defined in their respective MIBs as 32-bit TimeTicks values (hundredths of a second). These values roll over after about 497 days (2^32 hundredths of a second), and depending on the snmpd agent they sometimes roll over to negative values and sometimes not. You could recompile net-snmp to use a 64-bit integer, but there will still be millions of devices and appliances that will always use 32 bits.
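A quick check of the rollover arithmetic. SMIv2 (RFC 2578) defines TimeTicks as hundredths of a second modulo 2^32, so the unsigned counter wraps after about 497.1 days; an agent or manager that misreads it as a signed 32-bit integer would see it turn negative after 2^31 ticks, about 248.6 days.

```python
# TimeTicks counts hundredths of a second modulo 2^32 (SMIv2, RFC 2578).
TICKS_PER_SECOND = 100
SECONDS_PER_DAY = 86_400


def ticks_to_days(ticks: int) -> float:
    """Convert a TimeTicks count to days."""
    return ticks / TICKS_PER_SECOND / SECONDS_PER_DAY


# Unsigned 32-bit counter: wraps back to 0 after 2^32 ticks.
unsigned_wrap_days = ticks_to_days(2**32)
# Misread as signed 32-bit: turns negative after 2^31 ticks.
signed_negative_days = ticks_to_days(2**31)

print(f"wraps to 0 after {unsigned_wrap_days:.1f} days")
print(f"turns negative (if read as signed) after {signed_negative_days:.1f} days")
```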


The way we have reliably handled this for years is an Event Procedure that reads the hrSysUpTime attribute when a Cold Start trap is received. If the uptime is less than 5 minutes then a server restart alarm is generated.

09-09-2016 03:14 AM

Hello NAGESH JAISWAL,

As this idea now has 29 votes and was opened a year and a half ago, would it be possible to provide some feedback about it? Have you already reviewed or discussed this idea?

Thanks in advance,

René

06-15-2016 07:22 AM

Dear Spectrum Community Users,

Due to some confusion, I marked this idea as Delivered. I am reopening it, and it is open for further voting. My apologies.

Thanks,

Nagesh

05-16-2016 11:43 AM

Hi Nagesh_Jaiswal,


What information can you provide about this?  Is there a new Event Code that represents reliably detected reboots?

05-09-2016 06:23 AM

Is this also part of any 9.x release?

04-07-2016 01:33 AM

Dear Spectrum Community Members,


This idea is delivered as a part of Spectrum 10.1 release.


Thanks,

Nagesh

03-22-2016 05:17 AM

Hi CA,

One year has passed already; is there any news about this idea?

It would be nice if ideas that are a year old were definitely reviewed.

Thanks and regards

Kristian