
PM Vertica monitoring improvements

Last activity 05-29-2019 06:15 PM
raphael.franck
01-25-2018 05:05 AM

Hi all, @Lutz_Holzbecher, @Dan_Holmes,

Customers would probably benefit from improved out-of-the-box monitoring of HPE Vertica if trap mappings based on the VERTICA-MIB were added to Spectrum.

What have we done in total?

- configured the standard net-snmp agent on the Vertica hosts

- added device models to Spectrum for all Vertica hosts

- added standard device thresholds, filesystem thresholds and process monitoring in Spectrum

- added trap mapping to Spectrum based on the VERTICA-MIB

- configured Vertica to send traps for conditions worth being notified about

The VERTICA-MIB does not specify any pollable attributes. There is neither a dedicated SNMP agent nor any integration with the default operating system SNMP agent (net-snmp). Only a single trap type is defined in that MIB.

According to the HPE online documentation (to be verified for newer Vertica versions), SNMP event trapping is configured in Vertica as follows:

Version 7.0.x:
https://my.vertica.com/docs/7.0.x/HTML/Content/Authoring/AdministratorsGuide/Monitoring/Vertica/ConfiguringEventTrappingForSNMP.htm

$ /opt/vertica/bin/vsql <DB-NAME>

=> SELECT SET_CONFIG_PARAMETER('SnmpTrapsEnabled', 1);
=> SELECT SET_CONFIG_PARAMETER('SnmpTrapDestinationsList', 'hostname1 port1 CommunityString1,hostname2 port2 CommunityString2');
=> SELECT SET_CONFIG_PARAMETER('SnmpTrapEvents', 'Low Disk Space, Read Only File System, Loss of K Safety, Current Fault Tolerance at Critical Level, Too Many ROS Containers, WOS Over Flow, Node State Change, Recovery Failure, Stale Checkpoint');
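The SnmpTrapDestinationsList value packs all trap receivers into one string: each receiver is "host port community", and receivers are separated by commas. With several Spectrum trap receivers this string is easy to get wrong, so a small helper can build and check it before it is pasted into vsql. This is only a sketch: the format is inferred from the documentation examples above, the host names are hypothetical, 162 is simply the standard SNMP trap port, and the rule that host and community must not contain spaces or commas is an assumption based on the separators.

```python
"""Build and parse an SnmpTrapDestinationsList value (sketch)."""
from typing import List, Tuple

# (host, port, community) for one trap receiver, e.g. a Spectrum trap receiver.
Destination = Tuple[str, int, str]


def format_destinations(destinations: List[Destination]) -> str:
    """Join receivers into the 'host port community,host port community' form."""
    parts = []
    for host, port, community in destinations:
        if not 0 < port < 65536:
            raise ValueError(f"invalid port {port} for {host}")
        for field in (host, community):
            # Assumption: spaces and commas are separators, so fields may not contain them.
            if not field or " " in field or "," in field:
                raise ValueError(f"invalid field {field!r}")
        parts.append(f"{host} {port} {community}")
    return ",".join(parts)


def parse_destinations(value: str) -> List[Destination]:
    """Split a destinations string back into (host, port, community) tuples."""
    result = []
    for entry in value.split(","):
        fields = entry.split()
        if len(fields) != 3:
            raise ValueError(f"expected 'host port community', got {entry!r}")
        host, port, community = fields
        result.append((host, int(port), community))
    return result


if __name__ == "__main__":
    # Hypothetical Spectrum trap receivers; 162 is the standard SNMP trap port.
    value = format_destinations([("spectrum1", 162, "public"),
                                 ("spectrum2", 162, "public")])
    print(value)  # spectrum1 162 public,spectrum2 162 public
    assert parse_destinations(value) == [("spectrum1", 162, "public"),
                                         ("spectrum2", 162, "public")]
```

Parsing the string back is a cheap way to confirm that what goes into Vertica has the expected three fields per receiver.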

 

Version 7.1.x:
https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/AdministratorsGuide/Monitoring/Vertica/ConfiguringEventTrappingForSNMP.htm

$ /opt/vertica/bin/vsql

=> ALTER DATABASE <DB-NAME> SET SnmpTrapsEnabled = 1;
=> ALTER DATABASE <DB-NAME> SET SnmpTrapDestinationsList = 'hostname1 port1 CommunityString1,hostname2 port2 CommunityString2';
=> ALTER DATABASE <DB-NAME> SET SnmpTrapEvents = 'Low Disk Space, Read Only File System, Loss of K Safety, Current Fault Tolerance at Critical Level, Too Many ROS Containers, WOS Over Flow, Node State Change, Recovery Failure, Stale Checkpoint';
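Since 7.0.x and 7.1.x set the same three parameters and differ only in syntax, a short script can print the right variant for the version in use. A minimal sketch, assuming the database name is a plain identifier (no quoting is applied to it) and using a hypothetical destination "spectrum1 162 public"; the event names are copied from the documentation excerpt above. It only prints the statements; running them through vsql is left to the operator.

```python
"""Print the vsql statements that enable Vertica SNMP traps (sketch)."""
from typing import List

# Event names as listed in the Vertica 7.x documentation quoted above.
TRAP_EVENTS = [
    "Low Disk Space", "Read Only File System", "Loss of K Safety",
    "Current Fault Tolerance at Critical Level", "Too Many ROS Containers",
    "WOS Over Flow", "Node State Change", "Recovery Failure", "Stale Checkpoint",
]


def sql_literal(text: str) -> str:
    """Quote a value as an SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def trap_config_statements(db_name: str, destinations: str,
                           events: List[str], legacy: bool) -> List[str]:
    """Return the three statements; legacy=True selects the 7.0.x syntax."""
    settings = [
        ("SnmpTrapsEnabled", "1"),
        ("SnmpTrapDestinationsList", sql_literal(destinations)),
        ("SnmpTrapEvents", sql_literal(", ".join(events))),
    ]
    if legacy:
        # 7.0.x: SELECT SET_CONFIG_PARAMETER('<name>', <value>);
        # (the database is chosen when starting vsql, so db_name is unused here)
        return [f"SELECT SET_CONFIG_PARAMETER({sql_literal(name)}, {value});"
                for name, value in settings]
    # 7.1.x: ALTER DATABASE <db> SET <name> = <value>;
    return [f"ALTER DATABASE {db_name} SET {name} = {value};"
            for name, value in settings]


if __name__ == "__main__":
    # Hypothetical database name and trap receiver.
    for stmt in trap_config_statements("mydb", "spectrum1 162 public",
                                       TRAP_EVENTS, legacy=False):
        print(stmt)
```

Joining the event names with ", " reproduces exactly the SnmpTrapEvents value shown in the documentation, so trimming the list to the events you care about is a one-line change.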

 

The attachment contains sample configuration files for setting up the trap mapping in Spectrum. The event IDs used are based on a specific developer ID (not CA's). You can change them, but technically there should be no need to. The EventDisp adds some extra logic to produce fault-specific alarm titles and to clear alarms automatically when "ok" traps are received.
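Because the VERTICA-MIB defines only one trap type, the specific fault, and whether a trap reports a problem or an "ok", have to be taken from the trap's content rather than its type. The Python model below illustrates the behaviour described for the EventDisp: one open alarm per host and fault, a fault-specific title, and clearing on the matching "ok" trap. It is not Spectrum EventDisp syntax, and the VerticaTrap fields (event_type, is_ok) are hypothetical stand-ins for the real VERTICA-MIB varbinds, which are not reproduced here.

```python
"""Model of fault-specific alarm titles with auto-clear on "ok" traps (sketch)."""
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VerticaTrap:
    # Hypothetical decoded trap: the real VERTICA-MIB varbinds are not modelled.
    host: str        # Vertica node that sent the trap
    event_type: str  # e.g. "Low Disk Space"
    is_ok: bool      # True for the "ok" (condition cleared) notification


class AlarmTable:
    """Tracks at most one open alarm per (host, event type)."""

    def __init__(self) -> None:
        self.open: Dict[Tuple[str, str], str] = {}

    def handle(self, trap: VerticaTrap) -> Optional[str]:
        key = (trap.host, trap.event_type)
        if trap.is_ok:
            # An "ok" trap clears only the alarm for the same fault on the same host.
            title = self.open.pop(key, None)
            return None if title is None else f"CLEAR {trap.host}: {title}"
        if key in self.open:
            return None  # fault already alarmed; avoid a duplicate alarm
        title = f"Vertica {trap.event_type}"  # fault-specific alarm title
        self.open[key] = title
        return f"RAISE {trap.host}: {title}"


if __name__ == "__main__":
    table = AlarmTable()
    print(table.handle(VerticaTrap("vnode1", "Low Disk Space", False)))
    # RAISE vnode1: Vertica Low Disk Space
    print(table.handle(VerticaTrap("vnode1", "Low Disk Space", True)))
    # CLEAR vnode1: Vertica Low Disk Space
```

Keying alarms on host plus fault means an "ok" for one condition (say, Node State Change) never clears an unrelated open alarm such as Low Disk Space on the same node.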

 

Regards,

Raphael


Comments

01-25-2018 06:23 AM

Fantastic idea!

01-25-2018 05:10 AM

Great idea, Raphael!!!

rgds Steve