HW monitoring

1. HW monitoring

Anon Anon
Posted Jan 11, 2008 03:37 AM

Hi there,

A while ago I put a question to Nimsoft support regarding HW monitoring on servers:

Hello,
What is the best way to monitor HW according to Nimsoft? I chose the cim_trap component, since this seems to be one way to do it on HP Compaq servers. I haven't tried it yet, though. But in general, since we might use other brands:
Are there any other effective ways to do this and still keep the integration with NimBUS?
We need to detect RAID errors, disk problems, fan failures, etc.

This was the answer:

The best way to monitor H/W devices, according to Nimsoft, is to use the SNMP-based probes. The probes using SNMP are SNMPGET, SNMPGTW, SNMPTD and INTERFACE_TRAFFIC. You may also use cim_trap, which converts traps from Compaq messages into NimBUS alarms.

Well, I have now tried the NimBUS way of HW monitoring: snmptd with the cim_trap extension.

I must say that the problem descriptions added to the alarms are a bit short. For example, I removed a disk from the RAID on a lab server, and yes, an alarm was created: critical, saying "Status is now 3". Hmm, OK.

Let's say this alarm is received by a stressed tech or viewed by the operations NOC. What is status 3? On what?

After a bit of investigation, and with one hint from the alarm as seen through the NimBUS manager (suppression key snmptd/cpqDa6LogDvrStatusChange..), they might identify this as drive or storage related.

OK, let's look at the Event Viewer. This is what I can see:

“Drive Array Logical Drive Status Change. Logical drive number 1 on the array controller in Slot 4 has a new status of 3.

(Logical Drive status values: 1=other, 2=ok, 3=failed, 4=unconfigured, 5=recovering, 6=readyForRebuild, 7=rebuilding, 8=wrongDrive, 9=badConnect, 10=overheating, 11=shutdown, 12=expanding, 13=notAvailable, 14=queuedForExpansion)”
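To illustrate what a more readable alarm text could look like, here is a minimal Python sketch that translates the numeric status into the names listed in the event log message above. The function name describe_logical_drive_status and its drive/slot parameters are hypothetical, made up for illustration; this is not how snmptd or cim_trap builds its alarm text, only a sketch of the lookup involved.

```python
# Status names copied from the event log message quoted above.
LOGICAL_DRIVE_STATUS = {
    1: "other", 2: "ok", 3: "failed", 4: "unconfigured", 5: "recovering",
    6: "readyForRebuild", 7: "rebuilding", 8: "wrongDrive", 9: "badConnect",
    10: "overheating", 11: "shutdown", 12: "expanding", 13: "notAvailable",
    14: "queuedForExpansion",
}


def describe_logical_drive_status(status, drive=None, slot=None):
    """Hypothetical helper: turn a numeric status into a readable alarm text."""
    name = LOGICAL_DRIVE_STATUS.get(status, "unknown")
    parts = []
    if drive is not None:
        parts.append(f"logical drive {drive}")
    if slot is not None:
        parts.append(f"array controller in slot {slot}")
    # Fall back to a generic location when no drive or slot is known.
    location = " on ".join(parts) if parts else "logical drive"
    return f"Status of {location} is now {status} ({name})"


if __name__ == "__main__":
    print(describe_logical_drive_status(3, drive=1, slot=4))
```

With the values from the lab test (status 3, drive 1, slot 4), the text names the failed state directly instead of leaving the operator to look up the number.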

This is a bit more informative.

I know the MIB itself doesn't provide all the detailed info found in the event log, and from what I can see there are some more variables that can be added to the alarm text. But to interpret this you really have to go through every possible MIB and manually edit and add every profile in order to get an understandable alarm text.

If we were to use another server brand, what then? Using the event log instead isn't going to help us on a Linux server either.

OK, this is better than nothing, but :/

I'm a bit of a novice at interpreting traps, so I rely on the monitoring software to do this for me. So please share your knowledge.

Any tips and tricks? Other tools or gadgets that still integrate with NimBUS to create alarms? Has someone else already done this? Created your own extensions to snmptd, or ...?

What do you guys out there use to secure HW monitoring?
2. HW monitoring
Best Answer

Anon Anon
Posted Jan 12, 2008 07:13 AM

We primarily expect the vendor-provided hardware agents to generate meaningful log messages when there is something wrong, and we would alert on those. In your example, you could use the ntevl probe to get that message from the event log. I think we have a pretty solid setup of this for Windows. I am not sure on our Unix servers if the hardware agents are writing to syslog, but that would be the way it should work here.

Keith
3. HW monitoring

Anon Anon
Posted Jan 16, 2008 10:59 PM

Hi Keith,
and thank you for your input.
Yes, this seems to be the best way to do it: creating a profile in the ntevl probe that filters on the source of the vendor-provided agents.
In our case we ran a test on our lab server using the regular expression
/^(N100|Storage Agents|NIC Agents|Server Agents|CPQDAEN|cpqasm)+$/
Then we did some controlled HW damage on disk, NIC and fan. It seems to work fine.
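To check which event sources a pattern like this lets through, here is a minimal sketch using Python's re module. It assumes the probe applies the expression to the whole source name; how ntevl actually evaluates it may differ. The helper name is_hw_agent_source and the sample source names are made up for illustration.

```python
import re

# The pattern from the post, without the surrounding slashes.
SOURCE_PATTERN = re.compile(
    r"^(N100|Storage Agents|NIC Agents|Server Agents|CPQDAEN|cpqasm)+$"
)


def is_hw_agent_source(source):
    """Hypothetical helper: True if the event source matches the pattern."""
    return SOURCE_PATTERN.search(source) is not None


if __name__ == "__main__":
    # Sample source names, chosen only to exercise the pattern.
    for source in ["Storage Agents", "cpqasm", "Service Control Manager",
                   "Storage Agents Extra"]:
        print(f"{source!r}: {is_hw_agent_source(source)}")
```

Because the expression is anchored with ^ and $, a source such as "Storage Agents Extra" does not match. Note that the trailing + also accepts a listed name repeated back to back, which is harmless here but not needed.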

Perhaps logmon is the option for Linux? Though I haven't seen how HW-related errors are presented in the messages log.

So are there any Linux gurus out there who know? Hmm, perhaps it is time to schedule another lab.

//J.L
4. Re: HW monitoring

Anon Anon
Posted Feb 19, 2013 11:54 AM

Try the following setup: HPSIM -> (logfile / SNMP traps) -> Nimsoft snmptd/logmon.

This gives you a very strong hardware monitoring environment.

DX Unified Infrastructure Management
