Hi Joseph,
We have had incidents in the past where a server had a memory leak, which left the Altiris Monitor agent unable to alert us about a failing Windows service.
To overcome this issue I have created a SQL query that runs on a Microsoft SQL Server. This database server is used to store data sent by our application servers. The application servers send a "server alive" heartbeat using Microsoft Message Queuing (MSMQ), and the SQL Server keeps a heartbeat table in our custom database. The query compares the current time on the SQL Server against the last heartbeat time recorded for each server. If the difference is more than 5 minutes, a critical alert is triggered.
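The comparison the query makes can be sketched in Python as follows. This is only an illustration of the logic, not the actual SQL query: it assumes the heartbeat table has already been read into a mapping of server name to last heartbeat time, and the function name `find_stale_servers` and the server names are hypothetical.

```python
from datetime import datetime, timedelta

# Threshold matching the 5-minute differential described above.
STALE_AFTER = timedelta(minutes=5)


def find_stale_servers(last_heartbeats, now, threshold=STALE_AFTER):
    """Return the names of servers whose last heartbeat is older than threshold.

    last_heartbeats: hypothetical mapping of server name -> datetime of the
    most recent heartbeat, e.g. as read from the heartbeat table.
    now: the current time on the SQL Server.
    """
    return sorted(
        server
        for server, last_seen in last_heartbeats.items()
        if now - last_seen > threshold  # strictly "more than" 5 minutes
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    heartbeats = {
        "APP01": now - timedelta(minutes=2),  # recent heartbeat: healthy
        "APP02": now - timedelta(minutes=7),  # no heartbeat for 7 minutes: stale
    }
    for server in find_stale_servers(heartbeats, now):
        print(f"CRITICAL: no heartbeat from {server} for more than 5 minutes")
```

Because the comparison is strictly "more than", a heartbeat exactly 5 minutes old is not yet treated as stale; in this example only APP02 raises an alert.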
This method allows us to monitor a collection of servers that depend on each other.
My suggestions are as follows:
1. Create a Server Heartbeat table on the Notification Server.
2. Enable Server Heartbeat Alive by default, with a polling interval of 180 seconds.
3. When a server does not respond within 180 seconds, trigger the alert. The critical alert should warn that the server may not be online and running. The rule should reset to normal when a server heartbeat is received.
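The alert rule in the steps above can be sketched as a small per-server state machine. This is a hypothetical Python illustration, not how the Notification Server implements rules: the class name `HeartbeatRule`, the state names, and the choice to emit the alert only on the transition to CRITICAL (so it is not repeated on every poll) are all assumptions.

```python
from datetime import datetime, timedelta

NORMAL = "NORMAL"
CRITICAL = "CRITICAL"


class HeartbeatRule:
    """Hypothetical per-server rule: goes CRITICAL when no heartbeat has
    arrived within the polling interval, resets to NORMAL on the next one."""

    def __init__(self, server, last_heartbeat, interval=timedelta(seconds=180)):
        self.server = server
        self.last_heartbeat = last_heartbeat
        self.interval = interval
        self.state = NORMAL

    def on_heartbeat(self, received_at):
        # Any heartbeat shows the server is alive, so the rule resets to normal.
        self.last_heartbeat = received_at
        self.state = NORMAL

    def poll(self, now):
        """Evaluate the rule; return an alert message only when the state
        changes from NORMAL to CRITICAL, otherwise return None."""
        if self.state == NORMAL and now - self.last_heartbeat > self.interval:
            self.state = CRITICAL
            seconds = int(self.interval.total_seconds())
            return (f"CRITICAL: {self.server} has not sent a heartbeat in over "
                    f"{seconds} seconds and may not be online")
        return None


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, 0)
    rule = HeartbeatRule("APP01", last_heartbeat=start)
    print(rule.poll(start + timedelta(seconds=120)))  # None: still within the interval
    print(rule.poll(start + timedelta(seconds=200)))  # alert message
    rule.on_heartbeat(start + timedelta(seconds=210))
    print(rule.state)  # NORMAL
```

Keeping the state per server means one failing server raises one alert, and the rule clears itself as soon as that server's heartbeat resumes.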