DX NetOps

  • 1.  CA Spectrum Watch Creation for Hard Disk Busy I/O Utilization > 50% (Sustained 4 Hours)

    Posted Oct 07, 2015 03:05 AM

    Dear All,

     

    We want to monitor disk I/O busy utilization and generate an alarm if it stays above 50 percent for a sustained four hours. Our polling interval is 5 minutes.

     

    As per the CA Spectrum Watches User Guide, Release 9.3 (page 54), the following is the procedure for sustained monitoring of CPU utilization. The MIB name cpqHoCpuUtilMin used in the first watch's expression is for CPU.

     

    Could you please help us find the MIB name required for disk I/O busy utilization?

    https://support.ca.com/cadocs/0/CA%20Spectrum%209%203%200-ENU/Bookshelf_Files/PDF/Spectrum_Watches_User_ENU.pdf

    "''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    The example watches that are described here are used for usability and testing.

    The first watch monitors the CPU for a sustained referenced usage value (80% in this example).

    The second watch triggers an alarm if the CPU usage remains at a certain level (80%) for a sustained period of time. This watch calculates this time period using the threshold value (3), multiplied by the polling interval (5 minutes). Therefore, this watch violates the threshold and triggers an alarm if the CPU usage exceeds 80% for 15 minutes. You can adjust these values to suit your requirements.

    Examples: Usability and Testing Watches

    The first watch consists of the following parameters:

    ■ Name: CPU_Duration_Over_80

    ■ Data Type: Integer

    ■ Expression

    – Expression: MAX(0, MIN(1, INTEGER((INTEGER(cpqHoCpuUtilMin.#) >= 80))))

    – Instance: All

    ■ Properties

    – Default Activation: Active

    – Evaluate: On Demand

    – Inheritable: False

    ■ Threshold: None

    The second watch consists of the following parameters:

    ■ Name: CPU_Time_Duration

    ■ Data Type: Integer

    ■ Expression

    – Expression: ((CPU_Time_Duration.# + 1) * CPU_Duration_Over_80.#)

    – Instance: All

    ■ Properties

    – Default Activation: Active

    – Evaluate: By Polling

    – Poll Interval: 0 + 00:05:00

    – Inheritable: False

    ■ Threshold

    – Threshold violated if value >= 3

    – Threshold reset if value < 3

    ■ Alarm

    – Alarm Severity: Minor

    – Alarm Description: ErrorTholdAlarm

    – Alarm is user clearable

    – Watch is not reset upon user clearing of alarm.

    – Script: None

    "''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

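For the 4-hour requirement, the same two-watch pattern would presumably need a threshold of 48 instead of 3 (240 minutes divided by the 5-minute polling interval). The Python sketch below only simulates that arithmetic and the counter expression outside Spectrum; the function names and the sample readings are hypothetical, and it assumes the second watch is re-evaluated once per poll as the guide describes.

```python
def polls_needed(sustain_minutes: int, poll_minutes: int) -> int:
    """Number of consecutive polls that span the sustained period."""
    # Ceiling division so a partial interval still counts as one poll.
    return -(-sustain_minutes // poll_minutes)


def over_flag(value: int, limit: int) -> int:
    """Mirror of MAX(0, MIN(1, INTEGER(value >= limit))): 1 at or above the limit, else 0."""
    return max(0, min(1, int(value >= limit)))


def run_counter(samples, limit):
    """Apply (counter + 1) * flag at each poll; any reading below the limit zeroes the counter."""
    counter = 0
    history = []
    for value in samples:
        counter = (counter + 1) * over_flag(value, limit)
        history.append(counter)
    return history


threshold = polls_needed(4 * 60, 5)    # 4 hours at 5-minute polls
samples = [60, 70, 40, 55, 65, 90]     # made-up utilization readings (%)
history = run_counter(samples, 50)
print(threshold, history)
```

The counter only reaches the threshold after that many consecutive polls at or above the limit, because a single reading below it multiplies the running count by zero.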

    Thanks and Regards,

    Swapnil



  • 2.  Re: CA Spectrum Watch Creation for Hard Disk Busy I/O Utilization > 50% (Sustained 4 Hours)

    Posted Oct 08, 2015 02:48 PM

    Hi Swapnil


    Whether an attribute for disk IO is exposed depends on the SNMP agent on the device. The ucdavis MIB, for example, includes:

     

    diskIOLA1

        "The 1 minute average load of disk (%)"

         ::= { diskIOEntry 9 }

     

    -- 1.3.6.1.4.1.2021.13.15.1.1.

     

     

     

    Compaq has:

     

    cpqDaLogDrvPerfTotalIO OBJECT-TYPE

       

        "Array Logical Drive Performance Monitor Total I/O.

     

                This value shows the total number of read and write requests

                for the logical volume expressed in reads and writes per second."

         ::= { cpqDaLogDrvPerfEntry 8 }

     

    -- 1.3.6.1.4.1.232.3.2.8.1.1.

     

     

    Once you find a MIB object, or group of MIB objects, that provides the IO data, you could use an event duration rule.  From the Event Configuration Editor doc:

    "...You can use the D flag to detect that a second event in what was expected to be a pair of events did not occur within a specified period of time."

     

    Use the threshold watch event as the first event, and the threshold reset event as the second event. If the second event isn't generated within 15 minutes, then alarm.
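The pairing logic described above can be sketched in Python as follows. This is only an illustration of the idea, not how Spectrum implements the D flag; the function name, the event labels "violated" and "reset", and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def unresolved_violations(events, window, now):
    """Return timestamps of 'violated' events that were not reset within the window.

    events: list of (timestamp, kind) tuples sorted by time,
            where kind is 'violated' or 'reset'.
    """
    alarms = []
    pending = None  # timestamp of a violation still waiting for its reset
    for ts, kind in events:
        if kind == "violated":
            if pending is None:
                pending = ts
        elif kind == "reset":
            # A reset that arrives too late still counts as a missed pair.
            if pending is not None and ts - pending > window:
                alarms.append(pending)
            pending = None
    # A violation with no reset at all, older than the window, also alarms.
    if pending is not None and now - pending > window:
        alarms.append(pending)
    return alarms


t0 = datetime(2015, 10, 8, 12, 0)
events = [
    (t0, "violated"),
    (t0 + timedelta(minutes=5), "reset"),
    (t0 + timedelta(minutes=30), "violated"),
]
alarms = unresolved_violations(events, timedelta(minutes=15), t0 + timedelta(minutes=50))
print(alarms)
```

For a 4-hour sustained condition, the window would be set to 4 hours instead of 15 minutes.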

     

    Scott



  • 3.  Re: CA Spectrum Watch Creation for Hard Disk Busy I/O Utilization > 50% (Sustained 4 Hours)

    Posted Oct 09, 2015 02:28 AM

    Dear Scott,

     

    We have the CA SysEDGE agent installed on the server. Could you please let me know the MIB name / attribute for this? Is it "diskStatsUtilization"?

     

    Thanks and Regards,

    Swapnil



  • 4.  Re: CA Spectrum Watch Creation for Hard Disk Busy I/O Utilization > 50% (Sustained 4 Hours)
    Best Answer

    Posted Oct 09, 2015 09:32 AM

    Yes:

     

    diskStatsUtilization.png

    sysEDGE_Disk_Stats_table.png



  • 5.  Re: CA Spectrum Watch Creation for Hard Disk Busy I/O Utilization > 50% (Sustained 4 Hours)

    Posted Oct 12, 2015 02:01 AM

    Dear Scott,

     

    Thank you very much for your support.

     

    I am attaching screenshots of the watches we have created to monitor disk I/O busy utilization > 50 percent sustained for 4 hours.

     

    The first screenshot in the attached doc shows that the first watch's status is INITIAL, which means it has never run. Could you please verify whether this procedure is correct?

     

    Thanks and Regards,

    Swapnil



  • 6.  Re: CA Spectrum Watch Creation for Hard Disk Busy I/O Utilization > 50% (Sustained 4 Hours)

    Posted Oct 12, 2015 03:01 PM

    Start by creating a watch that represents your threshold value of 80%.  This watch will be evaluated on demand, and will show up as initial in the watch list:

     

    80pcnt.png

     

    Use the above watch as an attribute in the expression of your disk IO watch. The disk IO watch expression will include only (all instances of) the diskStatsUtilization attribute:

     

    diskiowatchexpress.png

    diskiowatchprops.png

     

    It violates when diskStatsUtilization hits 80% (>= Disk_Util_Threshold). When violated, this watch will generate event 0xffff4321. When it drops below 80% again, it will reset and generate event 0xffff9876:

     

    diskiowatchthold.png
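To make the violate/reset behavior concrete, here is a minimal Python sketch that turns a series of readings into the two events named above. It assumes the watch emits one event per transition, which is an assumption based on the description rather than confirmed Spectrum behavior; the function name and the sample values are hypothetical.

```python
VIOLATED_EVENT = 0xFFFF4321  # event code from the post: threshold exceeded
RESET_EVENT = 0xFFFF9876     # event code from the post: threshold reset


def threshold_events(samples, limit):
    """Emit (poll index, event code) once per crossing of the limit."""
    events = []
    violated = False
    for index, value in enumerate(samples):
        if not violated and value >= limit:
            violated = True
            events.append((index, VIOLATED_EVENT))
        elif violated and value < limit:
            violated = False
            events.append((index, RESET_EVENT))
    return events


samples = [70, 85, 90, 60, 82]  # made-up diskStatsUtilization readings (%)
events = threshold_events(samples, 80)
print([(i, hex(code)) for i, code in events])
```

Staying above the limit for several polls produces only one violation event, which is why a separate rule is needed to detect how long the violation lasts.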

     

     

     

    Now create an event pair rule that will generate an alarm if the threshold violation is not reset within 60 seconds (see the docs regarding event pairs: https://wiki.ca.com/display/CASP10/Manage+and+Configure+Events).

     

    The rule will generate event 0xffff2468 if the threshold exceeded event 0xffff4321 is not followed by a reset event 0xffff9876 within 60 seconds.

     

     

    Edit event 0xffff2468 so that it generates alarm 0xffff4321:

     

     

     

    Scott