Input regarding qos driven availability dashboards

1. Input regarding qos driven availability dashboards

alberto rodriguez
Posted Aug 22, 2016 11:46 AM

Hello,

I am creating 'server availability' dashboards for our business, but they need to scale: creating a new "button" widget per server is very inefficient. The list widget is nice in this respect because it lets me build a large server list or use a USM group to populate the hosts.

The issue is that the list widget (or list view) does not let me specify an alarm filter for each host, so I essentially need to rely on some QoS metric to reflect server availability. Which QoS metric could help me identify server availability?

For example, QOS_NET_CONNECT_PACKETLOSS? It tells me there is inconsistency, but it does not confirm that a server is down.

net_connect Metrics - CA Unified Infrastructure Management Probes - CA Technologies Documentation

A
2. Re: Input regarding qos driven availability dashboards

Broadcom Employee

Sayeed Islam
Posted Aug 23, 2016 04:31 AM

Hello,
This may help you in this regard.
System Uptime can be monitored through either cdm, net_connect, ntperf, or snmpget.

cdm
Computer uptime, as reported by the cdm probe, is shown as a cumulative value, in minutes, since the robot was started. If you are looking for availability, you might instead use the net_connect probe to ping a device and base your SLA on the maximum response time at which you still consider the node 'available' rather than unavailable.

Uptime is the number of seconds the computer has been running since reboot. It is not that useful for an SLA because it is a counter that continues to rise until a reboot occurs.
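Because the counter only rises until a reboot, one way to get something useful out of it is to look for drops between consecutive samples. The following is a minimal sketch of that idea, not something the probes do for you: the function name `reboot_indexes` and the sample values are hypothetical, and it assumes you have already exported the uptime samples in time order.

```python
from typing import List, Sequence


def reboot_indexes(uptime_samples: Sequence[float]) -> List[int]:
    """Return the indexes of samples whose uptime is lower than the previous one.

    A drop means the counter was reset, so a reboot happened somewhere
    between that sample and the one before it.
    """
    return [
        i
        for i in range(1, len(uptime_samples))
        if uptime_samples[i] < uptime_samples[i - 1]
    ]


# Hypothetical uptime samples in seconds, collected every 5 minutes.
uptimes = [3600, 3900, 4200, 120, 420, 720]
print(reboot_indexes(uptimes))  # [3]
```

This only tells you that a reboot occurred, not how long the server was down, which is why it complements rather than replaces a ping-based availability check.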

net_connect (most common approach)
You can use net_connect to ping the node every x minutes, then use the SLA/SLO engine to define what counts as a good value for each sample (for example, zero or greater), which shows that the computer was responding. If the computer does not respond, the net_connect probe sends a NULL, which the SLA engine counts as a 'fail'. The SLA engine then compares the number of good samples with the number of failures and reports the percentage of the SLA period in which the computer was 'up'.
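The arithmetic behind that percentage can be sketched as follows. This is only an illustration of the good-versus-failed sample count, not the SLA engine's actual implementation: `availability_percent`, its `max_response_ms` parameter, and the sample data are all hypothetical, and `None` stands in for the NULL that net_connect sends.

```python
from typing import Optional, Sequence


def availability_percent(
    samples: Sequence[Optional[float]],
    max_response_ms: Optional[float] = None,
) -> float:
    """Return the percentage of samples that count as 'up'.

    None stands in for a NULL sample (no reply) and always counts as a
    failure. If max_response_ms is given, slower replies also count as
    failures.
    """
    if not samples:
        raise ValueError("no samples in the SLA period")

    def is_good(value: Optional[float]) -> bool:
        if value is None or value < 0:
            return False
        return max_response_ms is None or value <= max_response_ms

    good = sum(1 for value in samples if is_good(value))
    return 100.0 * good / len(samples)


# Hypothetical ping response times in ms, one sample per interval.
pings = [12.0, 15.5, None, 11.2, None, 14.0, 13.3, 250.0]
print(availability_percent(pings))                         # 75.0
print(availability_percent(pings, max_response_ms=100.0))  # 62.5
```

The optional threshold mirrors the idea of basing the SLA on a maximum acceptable response time; leave it out to count any reply as 'up'.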

ntperf
Performance counter: \System\System Up Time

In Windows, System Up Time is the elapsed time (in seconds) that the computer has been running since it was last started. This counter displays the difference between the start time and the current time.

snmpget
You can search the web for the uptime OID, or browse for it in the MIB browser, and then add it to your snmpget configuration.
MIB variable: sysUpTime
OID: .1.3.6.1.2.1.1.3.0
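Note that sysUpTime is an SNMP TimeTicks value, that is, hundredths of a second, so the raw number usually needs converting before it is displayed. A minimal sketch of the conversion, assuming you already have the raw integer from the poll (the function name `ticks_to_uptime` and the sample value are hypothetical):

```python
from datetime import timedelta


def ticks_to_uptime(ticks: int) -> timedelta:
    """Convert an SNMP TimeTicks value (1 tick = 10 ms) to a timedelta."""
    if ticks < 0:
        raise ValueError("TimeTicks cannot be negative")
    # Multiply by 10 to get milliseconds and keep the arithmetic in integers.
    return timedelta(milliseconds=ticks * 10)


# Hypothetical raw value returned for .1.3.6.1.2.1.1.3.0
raw_ticks = 123456789
uptime = ticks_to_uptime(raw_ticks)
print(uptime.total_seconds())  # 1234567.89
print(uptime)                  # 14 days, 6:56:07.890000
```

TimeTicks is a 32-bit counter, so it wraps back to zero after roughly 497 days; a sudden drop is therefore not always a reboot on long-running devices.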

Create a new QoS definition for uptime and use the collected value as desired.

Displaying Uptime
In UMP, I would recommend a dashboard view, or possibly a List View, depending on your needs.

In USM, the uptime stats from cdm (QOS_COMPUTER_UPTIME), once collected, are displayed under the given node->Metrics Tab->System->Host.

Thanks.
Regards
-Sayeed
3. Re: Input regarding qos driven availability dashboards

alberto rodriguez
Posted Aug 23, 2016 01:54 PM

Sayeed,

Your input is much appreciated. Certainly things to consider.

Unfortunately, the dashboards and list views currently limit how far I can apply your suggestions.

For example, one host may have many "targets", and there is no granular way to select one specific target within a group of host servers.

Thank you

A
