DX Unified Infrastructure Management


Uptime Monitoring

  • 1.  Uptime Monitoring

    Posted Jan 27, 2011 05:15 PM

    Hi all - I have been trying to do something that I would think is pretty straightforward on our Nimsoft monitoring system, and have run into something of a brick wall.

     

    What I basically want is an uptime gauge that appears on the dashboard for a particular server, and also on SLA reports. I have tried putting a gauge on the dashboard with QOS_COMPUTER_UPTIME as its data source; however, this just seems to return a value in seconds rather than a percentage for a set period.

     

    Is this possible? I would have thought that showing uptime for a server is a pretty basic thing.



  • 2.  Re: Uptime Monitoring
    Best Answer

    Posted Jan 27, 2011 05:47 PM

    Yes, this is exactly what SLAs are designed to report. You just have to choose which QoS metric determines whether a given server is up or down at a particular point in time. Add that to an SLA, and you should be all set.

     

    SLA reports also let you choose a threshold that determines whether the value of the QoS metric indicates a breach of the SLA. If you just want to know if a server was up or down but do not care about how quickly it responded (or the value it returned for the QoS), you can just set the breach criteria to something the measurement can never violate, like "greater than or equal to 0." Then the SLA becomes a simple up/down report.

     

    -Keith



  • 3.  Re: Uptime Monitoring

    Posted Feb 07, 2011 04:40 PM

    Ok, that's great.

     

    Is there a way to easily get this uptime information into the dashboard for customers? What we ideally want is a gauge to show their uptime (as a percentage) presumably basing it off the current SLA value.



  • 4.  Re: Uptime Monitoring

    Posted Feb 07, 2011 05:35 PM

    Yes, after SLAs are created they're available for reference from the "Data Source" section.

     

     

    1. First create an SLA
    2. After the SLA has generated, there should now be QoS messages in the DB for it - open the dashboard designer now to pull in the latest DB info
    3. Drag and drop a widget onto the canvas - something like the Gauge from the Meters section
    4. With the widget selected go up to the "Data Source" section and then select the "SLA" tab
    5. You should now see your available SLA objects, either drag and drop them onto the object or simply hit the "Apply" button
    Best of luck,
    Dustin

     



  • 5.  Re: Uptime Monitoring

    Posted Sep 27, 2011 05:03 PM

    Hi Matthew /  Keith / Dustin,

     

    There seems to be some confusion here.

     

    To start with, I am not sure whether this approach will report host availability correctly. Even when a host is rebooted, the next time the CDM probe computes uptime it will still report a value, though a smaller one than before. A setting of "greater than or equal to 0" will therefore always show the server as available, even though it was rebooted.

     

    I tried replicating this on a development VM (with "Detected Reboot" under CDM probe already enabled) and rebooted the VM.

     

    I have pasted the values from SLM Manager for "Computer Uptime" of my development VM below. As the data shows, the value increases linearly until the VM is rebooted, at which point it starts increasing from 0 again.

     

    Time               Value (s)   Sample Rate (s)
    9/26/2011 20:00        31892              3600
    9/26/2011 21:00        35492              3600
    9/26/2011 22:00        39092              3600
    9/26/2011 23:00        42692              3600
    9/27/2011 0:00         46292              3600
    9/27/2011 1:00         49892              3600
    9/27/2011 2:00         53492              3600
    9/27/2011 3:00         57092              3600
    9/27/2011 4:00         60691              3600
    9/27/2011 5:00         64291              3600
    9/27/2011 6:00         67892              3600
    9/27/2011 7:00         71491              3600
    9/27/2011 8:00         75092              3600
    9/27/2011 9:00         78691              3600
    9/27/2011 10:00        82291              3600
    9/27/2011 11:00        85892              3600
    9/27/2011 12:00        89492              3600
    9/27/2011 13:00        93092              3600
    9/27/2011 14:00        96692              3600
    9/27/2011 15:00       100291              3600
    9/27/2011 16:00       103891              3600
    9/27/2011 17:00       107492              3600
    9/27/2011 18:00         2415              3600
    9/27/2011 19:00         6014              3600
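
    As a rough illustration of the point above, the following sketch runs a simple per-sample threshold check over the posted values and, separately, looks for a drop in the counter. The function names are made up for this example and the per-sample percentage is an assumption about how a threshold check behaves in general, not a description of how SLM actually calculates compliance.

```python
# Hourly QOS_COMPUTER_UPTIME values in seconds, copied from the table above.
UPTIME_SAMPLES = [
    31892, 35492, 39092, 42692, 46292, 49892, 53492, 57092,
    60691, 64291, 67892, 71491, 75092, 78691, 82291, 85892,
    89492, 93092, 96692, 100291, 103891, 107492, 2415, 6014,
]


def compliance_percent(samples, threshold):
    """Percentage of samples that satisfy 'greater than or equal to threshold'.

    Hypothetical helper: a plain per-sample check, not the SLM algorithm.
    """
    if not samples:
        raise ValueError("no samples to evaluate")
    compliant = sum(1 for value in samples if value >= threshold)
    return 100.0 * compliant / len(samples)


def reboot_indexes(samples):
    """Indexes where uptime is lower than the previous sample (counter reset)."""
    return [i for i in range(1, len(samples)) if samples[i] < samples[i - 1]]


if __name__ == "__main__":
    # Every uptime value is >= 0, so this check can never flag the reboot.
    print(compliance_percent(UPTIME_SAMPLES, 0))
    # Comparing neighbouring samples does find the reset at 18:00.
    print(reboot_indexes(UPTIME_SAMPLES))
```

    With a threshold of 0 every sample passes, so the reboot is invisible to the threshold check; only the comparison between consecutive samples reveals it.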

     

    If I select QOS_COMPUTER_UPTIME as the QoS in the quality of service entry of the SLO, with the same source and target, and set "Expect Quality of Service to be" to "Greater than or equal to" 30 seconds, the SLA is still reported as 0.00%. The screenshots are attached.

     

    Please help.

     

    Regards,

    Amit Saxena

     

    Attachment(s)

    doc
    screenshot1.doc   99 KB 1 version


  • 6.  Re: Uptime Monitoring

    Posted Sep 27, 2011 07:42 PM

    I do not think the SLA is calculating 0%. It looks like the SLA is not calculating. You will notice that the SLO properties show:

     

    N/A %

    N/A %

     

    I am not sure why that would be the case, but I would consider those to be bad signs. You might want to look at the calculation jobs in SLM to see if the SLA is calculating.

     

    I would also recommend not using the computer uptime QoS as the basis of an SLA. It is only measured once an hour and does not give you any indication of how long a host was down. You would probably be better off with something that measures more often and can give you some idea of the downtime, such as a ping or TCP port test.

     

    -Keith



  • 7.  Re: Uptime Monitoring

    Posted Sep 27, 2011 08:27 PM

    Hi Keith,

     

    How do I check the calculation jobs in SLM to see whether the SLA is calculating?

     

    I would like to calculate server availability and set an SLA for it. The idea is to display an SLA breached status if a system is rebooted. The only criterion is that the monitoring must be done from within the system and not from another host.

     

    That's why I tried to use QOS_COMPUTER_UPTIME for this. However, the QoS value only reflects the system uptime, which is continuously increasing, so there is no value I can specify as the threshold.

     

    Please let me know if there is an alternative way to achieve this. We want to monitor the host for reboots from within the host itself and not from outside. We have the Nimsoft robot deployed on all the hosts.

     

    Regards,

    Amit Saxena

     



  • 9.  Re: Uptime Monitoring

    Posted Sep 27, 2011 08:53 PM

    Hi Keith,

     

    I changed the QoS threshold value to "Greater than or equal to" 3600 seconds and the SLO started calculating. I chose 3600 seconds on the assumption, which may be false, that it will take at most 1 hour for the uptime calculation to be invoked.

     

    Regards,

    Amit Saxena

     



  • 10.  Re: Uptime Monitoring

    Posted Apr 18, 2013 02:22 PM

    This is not an accurate way to determine uptime. It only works if the outage lasts less than an hour, because the uptime metric is reported hourly (hence >3600 to detect the counter being reset).

     

    If you have a 12-hour outage in a day, the result should be 50% (based on 24 hours), but the SLA will count only 1 hour of downtime, because when the server reports in after 12 hours offline it returns a single value of <3600.

     

    We had a server offline for days and it was reported as 1 hour.

     

    The reason for this is the gaps in the data (see picture): if the server is down and does not return a value, that interval is not included in the calculation.
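
    The effect of those gaps can be sketched with a toy calculation. This is only an illustration under stated assumptions: the sample values are invented, `None` stands for an hour in which no sample arrived, and `count_missing_as_breach` is a hypothetical flag for this example, not a real SLM option.

```python
def sla_percent(samples, threshold, count_missing_as_breach=False):
    """Percentage of intervals that meet 'value >= threshold'.

    samples holds one entry per expected interval; None marks a missing sample.
    By default missing intervals are dropped from the calculation entirely.
    """
    present = [value for value in samples if value is not None]
    compliant = sum(1 for value in present if value >= threshold)
    total = len(samples) if count_missing_as_breach else len(present)
    if total == 0:
        raise ValueError("no intervals to evaluate")
    return 100.0 * compliant / total


# Invented day of hourly samples: 6 hours up, 12 hours down, 6 hours up again.
before_outage = [90000 + 3600 * i for i in range(6)]
outage = [None] * 12
after_reboot = [1800 + 3600 * i for i in range(6)]  # first value is below 3600
day = before_outage + outage + after_reboot

if __name__ == "__main__":
    # Gaps ignored: only the single sample below 3600 counts as a breach.
    print(round(sla_percent(day, 3600), 2))
    # Gaps counted as breaches: much closer to the real 50% availability.
    print(round(sla_percent(day, 3600, count_missing_as_breach=True), 2))
```

    In this made-up day the host was down for half the time, yet ignoring the gaps leaves 11 of 12 present samples compliant, which is why the outage shows up as roughly one hour.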



  • 11.  Re: Uptime Monitoring

    Posted Apr 18, 2013 03:09 PM

    In my opinion Nimsoft is more focused on performance than on availability reporting.

     

    Our company cares much more about availability than performance...

    We use ntperf to collect system uptime at 10-minute intervals (the hourly collection by cdm is not accurate enough).

    We then extract the data from the db and create our own availability reports.
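
    The post does not say how those reports are calculated, so the following is only one possible sketch, not the poster's method. It assumes the extracted data is a list of (timestamp, uptime) pairs in seconds; each sample then proves the host was up from `timestamp - uptime` until `timestamp`, and availability is the share of the period covered by those spans. All names here are hypothetical.

```python
def availability_percent(samples, period_start, period_end):
    """Share of [period_start, period_end] proven 'up' by uptime samples.

    samples: iterable of (timestamp_seconds, uptime_seconds) pairs.
    Time after the last sample before an outage is not proven, so the
    result is a lower bound on the real availability.
    """
    if period_end <= period_start:
        raise ValueError("period_end must be after period_start")

    # Turn each sample into the span of time it proves, clipped to the period.
    spans = []
    for timestamp, uptime in samples:
        start = max(timestamp - uptime, period_start)
        end = min(timestamp, period_end)
        if end > start:
            spans.append((start, end))
    spans.sort()

    # Merge overlapping spans and add up the covered time.
    covered = 0
    current_start = current_end = None
    for start, end in spans:
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start

    return 100.0 * covered / (period_end - period_start)


# Invented 24-hour example with 10-minute samples: the host booted 1000 s
# before the period, ran for 6 hours, was down for 12 hours, then ran again.
DAY = 86400
first_run = [(t, t + 1000) for t in range(600, 21601, 600)]
second_run = [(t, t - 64800) for t in range(65400, 86401, 600)]

if __name__ == "__main__":
    print(availability_percent(first_run + second_run, 0, DAY))
```

    Because the calculation works from what each sample proves rather than from counting samples, hours with no data simply contribute no covered time instead of being dropped from the total.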



  • 12.  Re: Uptime Monitoring

    Posted Apr 29, 2013 06:18 PM

    You should be able to create an SLA that treats missing data as if it were NULL.

     

    I do not recall off the top of my head exactly what the option is, but check out the different calculation methods. You may need to define your own with the right set of options.