What is your current approach to detecting and alarming on a QOS that has not received data for the last hour or days?
I have created a very simple bat file that checks a QOS via the REST API and returns "0" if data is present for the last hour, "1" otherwise. It can be wrapped in a script to monitor critical QOS metrics that must be "alive" and receiving data all the time.
Sample usage:
C:\Users\falne02\Desktop>checkqos.bat falne02-ump <user> <password> uimdemo uimdemo QOS_CPU_USAGE
Calling URL: http://falne02-ump/rest/qos/data/name/QOS_CPU_USAGE/uimdemo/uimdemo/lasthour/now/0
QOS data found for last hour
The bat file takes 6 parameters in this order:
NOTE: the script requires curl in the PATH of the server it is executed from.
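For anyone who prefers not to depend on curl, the same check can be sketched in Python. This is not the bat file itself, only a rough equivalent under stated assumptions: the URL pattern is copied from the sample output above, while HTTP basic authentication, the JSON response format, and the rule "any non-empty payload means data is present" are guesses that should be verified against your UMP. The names `fourth_arg` and `fifth_arg` are placeholders for the two path segments between the QOS name and `lasthour`.

```python
import base64
import json
import urllib.request


def build_url(host, qos, fourth_arg, fifth_arg):
    # URL pattern taken from the "Calling URL" line in the sample output.
    return (f"http://{host}/rest/qos/data/name/{qos}/"
            f"{fourth_arg}/{fifth_arg}/lasthour/now/0")


def fetch_with_basic_auth(url, user, password):
    # Assumption: the REST endpoint accepts HTTP basic auth and can return JSON.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    })
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def has_data(body):
    # Assumption: an empty or unparseable payload means "no data for last hour".
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return bool(payload)


def check_qos(host, user, password, fourth_arg, fifth_arg, qos,
              fetch=fetch_with_basic_auth):
    # Same argument order and return convention as the bat file:
    # 0 if data was found for the last hour, 1 otherwise.
    url = build_url(host, qos, fourth_arg, fifth_arg)
    try:
        body = fetch(url, user, password)
    except OSError:
        # Network and HTTP errors are treated as "no data".
        return 1
    return 0 if has_data(body) else 1
```

A wrapper script could call `check_qos(...)` with the six command-line arguments and pass the result to `sys.exit` so that a scheduler or monitoring probe can act on the exit code.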
Thanks for any comments/feedback
Must this be executed on the UIM or the UMP server?
It does not matter, as long as you can reach the UMP server. Note that you need the curl utility in the PATH of the box from which you launch the tool.
I have a Lua script that runs the following SQL:
SELECT max(Q.source) AS source,
       isnull(min(DATEDIFF(SECOND, d.sampletime, getdate()) - 21600 + rn.tz_offset), 99999999) AS age,
       MIN(r.user_tag_1) AS user_tag_1
FROM S_QOS_DATA Q
LEFT JOIN S_QOS_SNAPSHOT D ON Q.table_id = D.table_id
LEFT JOIN CM_NIMBUS_ROBOT R ON q.source = r.robot
LEFT JOIN RN_QOS_DATA_0012 RN ON q.table_id = rn.table_id AND d.sampletime = rn.sampletime
WHERE q.qos = 'QOS_COMPUTER_UPTIME' AND ( r.is_hub = 1 )
GROUP BY q.source
ORDER BY age DESC
I then create alarms based on the age.
And because this is actually being used to detect hubs (is_hub=1) that aren't reporting as a way to determine how long their tunnels have been down, I'm saving off the age in another table so that when the data for a hub ages out of QOS_COMPUTER_UPTIME I still know when it was last heard from.
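The bookkeeping described above can be sketched roughly as follows. This is an illustration in Python rather than the actual Lua script: the function names, the in-memory dictionary standing in for the extra table, and the one-hour threshold in the usage example are all hypothetical. The only value taken from the query is the 99999999 sentinel that `isnull` returns when no sample exists.

```python
from datetime import datetime, timedelta

# Sentinel the query returns for a hub with no sample at all.
NO_DATA_AGE = 99999999


def update_last_seen(rows, last_seen, now):
    # rows: iterable of (source, age_seconds) as produced by the query.
    # A real age refreshes the stored timestamp; the sentinel leaves the
    # previous value untouched, so a hub whose data has aged out keeps
    # its last known contact time.
    for source, age in rows:
        if age != NO_DATA_AGE:
            last_seen[source] = now - timedelta(seconds=age)
    return last_seen


def stale_hubs(last_seen, now, max_age_seconds):
    # Return (source, seconds_since_last_seen) for every hub that has been
    # silent longer than the threshold, sorted by source name.
    return sorted(
        (source, int((now - seen).total_seconds()))
        for source, seen in last_seen.items()
        if (now - seen).total_seconds() > max_age_seconds
    )
```

For example, calling `update_last_seen` on each run and then `stale_hubs(last_seen, now, 3600)` would list every hub not heard from for more than an hour, including hubs whose rows have already dropped out of QOS_COMPUTER_UPTIME.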