DX Infrastructure Management


Monitor all new QoS_Computer_Uptime entries and alert on if < 300s - Is this possible?

  • 1.  Monitor all new QoS_Computer_Uptime entries and alert on if < 300s - Is this possible?

    Posted 11-21-2014 07:50 PM

    So I have a question. We were having issues with both the cdm probe and the ntevl probe detecting whether a machine was rebooted. UIM support stated that yes, there were 'defects' in each probe, and we are still not 100% sure that we can rely on either of these probes to detect reboot scenarios.

     

    Yesterday a request introduced me to the QoS_Computer_Uptime metric that is collected by the CDM probe. That got me thinking that we could use this value to determine whether a machine has been rebooted. Since we have cdm deployed everywhere and are already collecting this value, I wanted to know if, and how exactly, I would set up a watcher that looks at ALL new QOS_COMPUTER_UPTIME entries being added to the database for each robot, and has the relevant portion of UIM (assuming SLM manager) generate an alert if the value is < 300s. A value that low means the machine was rebooted within the last 5 minutes, which would give us an accurate indication that the machine rebooted.

     

    I don't know the SLM tool very well. I primarily use it to check QoS metrics and delete QoS data.

     

    I was playing around in it all morning and can't figure out if this is possible. Is there a way to define a global QoS watcher on this one specific metric and set up a rule that says: if QoS_Computer_Uptime < 300, then trigger an alert?

     

    I was trying to set this up using the old SLM fat client, but when I looked at the help, some of the examples refer to features or menu entries that no longer exist.

     

    My questions:

    1. Is this possible?

    2. If yes, can this be set up at the Domain level rather than on each and every Hub/Robot?

    3. If yes, how and where exactly do I do this? Does anyone have a tutorial on setting something like this up?

    4. What is this kind of task called? Would it be considered an SLA, a QoS monitor, or something else?

     

    Thank you,

    Dan



  • 2.  Re: Monitor all new QoS_Computer_Uptime entries and alert on if < 300s - Is this possible?

    Posted 11-23-2014 12:17 AM
    I don't think this is easily achieved with a QoS monitor or an SLA. I'd go for a plain SQL query and sql_response.
     
    Your query could be similar to this:

    select
        qd.origin, qd.robot, qs.samplevalue
    from
        S_QOS_DATA as qd,
        S_QOS_SNAPSHOT as qs
    where
        qd.qos = 'QOS_DISK_USAGE'  -- for the uptime check, use 'QOS_COMPUTER_UPTIME' here
        and qs.table_id = qd.table_id

    If you want to get the origins right you'll have to do some nas magic, but that's not a big problem. You could set the robot as the row key in the sql_response profile, or, if you might have two robots with the same name, include table_id in the query and use that instead. Then just create a threshold for the "samplevalue" column. Alternatively, add "and qs.samplevalue < 300" to the query so that it only returns robots that rebooted recently.
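
To make the idea concrete, here is a minimal sketch of the query above adapted to the uptime case, run against an in-memory SQLite database rather than a real UIM database. The two toy tables contain only the columns named in the query (the real S_QOS_DATA and S_QOS_SNAPSHOT tables have more), and the robot names, origins, and the `recently_rebooted` helper are made up for illustration.

```python
import sqlite3

# Toy stand-in for the UIM database: only the columns used in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S_QOS_DATA (table_id INTEGER PRIMARY KEY, origin TEXT, robot TEXT, qos TEXT);
    CREATE TABLE S_QOS_SNAPSHOT (table_id INTEGER, samplevalue REAL);
""")
conn.executemany("INSERT INTO S_QOS_DATA VALUES (?, ?, ?, ?)", [
    (1, "hubA", "web01", "QOS_COMPUTER_UPTIME"),
    (2, "hubA", "db01", "QOS_COMPUTER_UPTIME"),
    (3, "hubA", "web01", "QOS_DISK_USAGE"),
])
conn.executemany("INSERT INTO S_QOS_SNAPSHOT VALUES (?, ?)", [
    (1, 120.0),     # web01 has been up for 2 minutes: recently rebooted
    (2, 864000.0),  # db01 has been up for 10 days
    (3, 42.0),      # a disk usage sample, which must not match
])

REBOOT_WINDOW_SECONDS = 300  # the 5-minute threshold from the original question


def recently_rebooted(connection, window=REBOOT_WINDOW_SECONDS):
    """Return (table_id, origin, robot, uptime) for robots whose latest uptime sample is below window."""
    rows = connection.execute(
        """
        SELECT qd.table_id, qd.origin, qd.robot, qs.samplevalue
        FROM S_QOS_DATA AS qd
        JOIN S_QOS_SNAPSHOT AS qs ON qs.table_id = qd.table_id
        WHERE qd.qos = 'QOS_COMPUTER_UPTIME'
          AND qs.samplevalue < ?
        """,
        (window,),
    )
    return rows.fetchall()


rebooted = recently_rebooted(conn)
print(rebooted)  # [(1, 'hubA', 'web01', 120.0)]
```

In sql_response the same SELECT would go into the profile, with table_id (or robot) as the row key. Note that the check interval has to be shorter than the 300 second window, otherwise a reboot can fall between two runs and go unnoticed.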
     
    -jon
     


  • 3.  Re: Monitor all new QoS_Computer_Uptime entries and alert on if < 300s - Is this possible?

    Posted 11-26-2014 04:47 PM

    We use ntperf to collect the systemuptime (among other values).

    We use it only for reporting, but the probe has an alarm option.

    You could enter a small value, which would be an indicator that the computer has rebooted.



  • 4.  Re: Monitor all new QoS_Computer_Uptime entries and alert on if < 300s - Is this possible?

    Posted 11-26-2014 06:45 PM

    CDM also has an option to alarm on reboot, and this works for us. We also use a low hub check interval, which helps detect reboots as well.



  • 5.  Re: Monitor all new QoS_Computer_Uptime entries and alert on if < 300s - Is this possible?

    Posted 11-26-2014 10:26 PM

    Thanks for the suggestions everyone.