We have a recent requirement to detect a server reboot or outage quickly, as soon as the server goes down.
We currently have two types of monitoring in place for this:
1) net_connect: checks the availability of the server at an interval of 10 min.
2) Boot alarms: using cdm (but these normally fire only after a server has rebooted and come back up).
But the requirement here is to detect the case where the server reboots and then hangs. In that case we will not receive the boot alarm until the server comes back up, and net_connect can take up to 10 min to detect that it is unavailable. That is too long for a critical application server; we want to detect this as soon as possible (within 2-3 min).
Is there any way to do this apart from these two options? Or is there any modification we can make to these two options?
You could lower the hub_update_interval to 80 seconds on critical controllers, but I believe such a low setting isn't recommended by CA.
If it's just one machine that is critical, then configure that machine's profile in net_connect with a 1 min polling cycle.
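To illustrate what a 1 min polling cycle buys you, here is a minimal Python sketch of the same idea outside of net_connect. It is not how the probe is implemented; it uses a plain TCP connect as the availability check, and the names `is_reachable` and `wait_until_down`, the port, and the two-failure threshold are all assumptions made for the example.

```python
import socket
import time


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def wait_until_down(host, port, interval=60.0, max_failures=2,
                    timeout=3.0, sleep=time.sleep):
    """Poll host:port every `interval` seconds.

    Returns the failure count once `max_failures` consecutive checks
    have failed. A successful check resets the counter, so a single
    dropped connection does not raise a false alarm.
    """
    failures = 0
    while failures < max_failures:
        if is_reachable(host, port, timeout=timeout):
            failures = 0
        else:
            failures += 1
        if failures < max_failures:
            sleep(interval)
    return failures
```

Calling something like `wait_until_down("appserver01", 22)` (hypothetical host name) with the defaults would return roughly two minutes after the server stops answering in the worst case, which is in the 2-3 min range asked for. Requiring two consecutive failures is a trade-off: it delays detection by one interval but filters out one-off network blips.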
You could also poll system uptime over SNMP and raise an alert when the poll gets no response, because the agent should be unresponsive when the box goes down.
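A rough sketch of the uptime-polling idea in Python, assuming the net-snmp `snmpget` command is installed and the server runs an SNMP v2c agent. The function names, the `public` community string, and the timeout values are placeholders for illustration. Besides "no response", comparing two readings also catches a reboot that happened between polls, since the counter restarts from zero.

```python
import subprocess

# sysUpTime.0 from the standard MIB-II system group, in hundredths of a second.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def read_uptime_ticks(host, community="public", timeout=2, retries=1):
    """Return sysUpTime in timeticks, or None if the agent does not answer.

    Assumes the net-snmp `snmpget` binary is on PATH.
    -Oqvt asks for the bare value with timeticks printed as a raw integer.
    """
    cmd = ["snmpget", "-v2c", "-c", community,
           "-t", str(timeout), "-r", str(retries),
           "-Oqvt", host, SYS_UPTIME_OID]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout * (retries + 1) + 5)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None


def classify(previous_ticks, current_ticks):
    """Compare two consecutive uptime readings.

    "down"     -> the agent did not answer this poll
    "rebooted" -> uptime went backwards since the previous poll
    "ok"       -> uptime is unchanged or increasing
    """
    if current_ticks is None:
        return "down"
    if previous_ticks is not None and current_ticks < previous_ticks:
        return "rebooted"
    return "ok"
```

Two caveats: sysUpTime counts time since the SNMP agent was last re-initialized, so restarting only the agent also looks like a reboot, and the 32-bit counter wraps after roughly 497 days, which would show up as a false "rebooted".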