In some cases we've seen CDM probes stop reporting QOS data because the CPU on the host system is pegged at 100%. Once CPU reaches 100%, the probes effectively become useless. When the CPU spikes quickly, we often don't get an alarm either. Monitoring for missing QOS data seems like a way to identify that there was a problem, and we'd like to do this in a proactive manner. Robot Inactive alerts and net_connect ping checks don't seem to be sufficient.
Is there a way we can monitor when a host stops reporting QOS data?
The only approach I've found so far is to query the QOS tables and compare the hosts that reported data in the last ~1 hour against a known list of hosts.
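To make that comparison concrete, here is a rough Python sketch of the logic only. It assumes you have already pulled the most recent sample time per host from the QOS tables with your own query. The function name `find_silent_hosts`, the host names, and the shape of the input are all made up for illustration and are not part of any probe or product API.

```python
from datetime import datetime, timedelta


def find_silent_hosts(known_hosts, last_sample_by_host, now, window=timedelta(hours=1)):
    """Return known hosts with no QOS sample inside the window.

    known_hosts:         iterable of host names we expect to report (hypothetical list)
    last_sample_by_host: dict mapping host name -> datetime of its latest QOS sample,
                         as returned by whatever query you run against the QOS tables
    now:                 reference time, passed in so the check is repeatable
    window:              how far back a sample still counts as "reporting"
    """
    cutoff = now - window
    # Hosts whose newest sample is at or after the cutoff are still reporting.
    reporting = {host for host, ts in last_sample_by_host.items() if ts >= cutoff}
    # Anything expected but not reporting is silent, including hosts with no rows at all.
    return sorted(set(known_hosts) - reporting)


# Example with invented data: hostA reported 10 minutes ago, hostB 2 hours ago,
# and hostC has no QOS rows at all.
example_now = datetime(2024, 1, 1, 12, 0, 0)
example_last_samples = {
    "hostA": example_now - timedelta(minutes=10),
    "hostB": example_now - timedelta(hours=2),
}
example_silent = find_silent_hosts(["hostA", "hostB", "hostC"], example_last_samples, example_now)
print(example_silent)
```

Run on a schedule, the returned list could be used to raise an alarm for each silent host. The 1 hour window is just the figure mentioned above and would need to be longer than the slowest QOS sample interval to avoid false positives.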