How can I monitor my data_engine probe? I sometimes see the data_engine probe stop while its queue grows too large, and this brings my monitoring system to a halt.
You can explore the CA UIM Hub Queue Statistics Probe.
Usually this happens because of a problem with the database.
You can monitor the data_engine log for failure and timeout messages to alarm on.
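As a rough illustration of the log check described above, here is a minimal Python sketch that picks failure and timeout lines out of a log. The keyword pattern and the function names (`find_alert_lines`, `scan_log`) are assumptions made for this example; the actual wording of data_engine log messages may differ, so adjust the pattern to what your log really contains.

```python
import re
from typing import Iterable, List

# Assumed keywords; tune these to the messages your data_engine log actually writes.
ALERT_PATTERN = re.compile(r"fail|timeout|timed out", re.IGNORECASE)


def find_alert_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention a failure or a timeout."""
    return [line.rstrip("\n") for line in lines if ALERT_PATTERN.search(line)]


def scan_log(path: str) -> List[str]:
    """Read a log file and return its suspicious lines."""
    # errors="replace" keeps the scan going even if the log has odd bytes in it.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_alert_lines(handle)
```

Each returned line is a candidate for an alarm; how you raise that alarm depends on your setup.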
You could also set up the sqlserver probe with a custom checkpoint that does a row count on the cm_computer_system table or the s_qos_data table. If the count comes back less than 1, send an alarm.
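The logic of that row-count check can be sketched in Python as follows. This is only an illustration of the idea, not how the sqlserver probe is configured: the function name `empty_tables` is made up for the example, and it takes any DB-API style connection (the demo at the bottom uses an in-memory SQLite database as a stand-in for the real UIM database).

```python
import sqlite3

# Table names taken from the suggestion above; kept as a fixed list so that
# no untrusted text is ever formatted into the SQL statement.
CHECKED_TABLES = ("cm_computer_system", "s_qos_data")


def empty_tables(conn) -> list:
    """Return the checked tables whose row count is less than 1."""
    empty = []
    for table in CHECKED_TABLES:
        cursor = conn.cursor()
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        (row_count,) = cursor.fetchone()
        if row_count < 1:
            empty.append(table)
    return empty


if __name__ == "__main__":
    # Stand-in database for demonstration only.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE cm_computer_system (id INTEGER)")
    demo.execute("CREATE TABLE s_qos_data (id INTEGER)")
    demo.execute("INSERT INTO s_qos_data VALUES (1)")
    for name in empty_tables(demo):
        print(f"ALARM: {name} returned no rows")
```

A full COUNT(*) on a large s_qos_data table can be expensive, so in practice you may want a cheaper existence check.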
Indeed, using the Queue Statistics Probe is helpful.
Monitoring the queue size of the data_engine queue on the NMS hub is the very first thing to do.
You can use callbacks on the data_engine to get some statistics (get_statistics) and also use some of the following metrics:
* CPU used by the data_engine process with the processes probe
* Memory used by the data_engine process with the processes probe
* State of the data_engine process with the processes probe
* Size of the queue directory for data_engine containing *.sds files with the dirscan probe
* Age of the oldest *.sds file in that queue directory with the dirscan probe
* Number of *.sds files in the queue directory with the dirscan probe
That gives you a good overview.
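To make the three directory metrics above concrete (total size, age of the oldest file, and file count for *.sds files), here is a small Python sketch that computes them for a given directory. It is not a replacement for the dirscan probe; the function name `sds_queue_stats` and the returned keys are invented for this example, and you have to supply the queue directory path for your own hub.

```python
import time
from pathlib import Path
from typing import Optional


def sds_queue_stats(queue_dir: str, now: Optional[float] = None) -> dict:
    """Summarize the *.sds files found directly inside queue_dir."""
    now = time.time() if now is None else now
    files = [p for p in Path(queue_dir).glob("*.sds") if p.is_file()]
    stats = [p.stat() for p in files]
    return {
        # Number of *.sds files in the queue directory
        "file_count": len(files),
        # Combined size of those files, in bytes
        "total_bytes": sum(s.st_size for s in stats),
        # Age of the oldest file by modification time; 0.0 when the queue is empty
        "oldest_age_seconds": max((now - s.st_mtime for s in stats), default=0.0),
    }
```

A steadily growing file count or oldest-file age is the sign that the data_engine is no longer draining its queue.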
You can also use a small Lua script from: QueueCheck LUA script v2.2; this tool creates the alarms and QoS entries needed for alerting and follow-up (a sample list view is included).
I would say that a Lua script run by the NAS is less efficient than a dedicated probe for this, but it's another method! :)