With the release of SNMPCollector 2.0, we were quick to implement the probe due to the new features and expanded support for network devices. However, as with most probe releases, it was more complicated and bug-ridden than the documentation let on.
Notable bugs: The probe does not work on tunneled hubs with robots attached. When configuring the probe you get a 'MONS-002' error and cannot save the templates, so you cannot monitor anything.
Templating: While the templating process introduces new features, configuring alarms is a rather complicated process. Features like being able to specify the alert messages are now gone; you get generic messages that are sometimes hard to decode for those who haven't memorized OID values. You are also unable to specify the sampling count and rearm.
Benefits: The new discovery filter is definitely nice for a multi-client environment. Also, the new probe is able to monitor many metrics that were unavailable before (e.g. interface metrics for switches, firewalls, etc., depending on the versions).
Does anyone else have any thoughts or want to share their experiences?
Awesome timing as we have been playing around with it ourselves the last couple of days, trying to figure out if we can use it, and if so, how.
The conclusion so far is no, we can't.
Some of my preliminary findings. Keep in mind that it might just be because I don't understand the probe fully yet and haven't figured out how to do a specific thing.
Since we run no monitoring on our primary hub, everything is attempted on remote hub(s).
< /rant >
The icmp probe basically has the same problems. I sincerely hope icmp and snmpcollector aren't a sign of a new trend that is starting. If they are, I am worried.
So I do know what probe_discovery messages are. They're compressed JSON data describing the devices being monitored by the probe. The discovery server processes them and updates the CM_ and possibly other tables accordingly.
TNT = The Next Thing (NIS)
TNT2 = The Next Thing 2 (NIS2)
TNT3 = The Next Thing 3 (NIS3) ...
I'm fuzzy on where TNT and TNT2 are separated, but up through TNT2, it's all using niscache on the robot. Discovery_server periodically polls, collects, and cleans up some of that cache while publishing details to the CM tables.
The data is a combination of object style thingies.
device = a device, presumably with an IP
dev_id = encoded device id, a unique identifier for the device
ci = typically a component on a device, like eth0
ci_id = encoded ci id, a unique identifier for a ci
ci_type = equivalent to the subsysid assigned in the nas, something like System.Disk 1.1 or whatever
Metric_Item = the metrics available under a ci_type: octets in, octets out, etc.
MI_id = numeric value of an MI
ci_type_id = numeric value of a ci_type, aka subsysid
metric_type_id = combination of ci_type_id and metric number, e.g. 1.1:39 = System.Disk:Read In
metric_id = an encoded, measurable instance of a metric_type_id for a specific ci on a specific device, e.g. Network.Interface OctetsIn on eth0
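As I understand the convention above, the composite IDs can be sketched like this (a toy illustration only; the function names are mine, not a Nimsoft API):

```python
# Sketch of the ID convention described above (field names are my own,
# not an official Nimsoft API): metric_type_id = "<ci_type_id>:<metric_number>".

def make_metric_type_id(ci_type_id: str, metric_number: int) -> str:
    """Combine a ci_type_id like '1.1' with a metric number like 39."""
    return f"{ci_type_id}:{metric_number}"

def split_metric_type_id(metric_type_id: str) -> tuple:
    """Split '1.1:39' back into ('1.1', 39)."""
    ci_type_id, metric_number = metric_type_id.rsplit(":", 1)
    return ci_type_id, int(metric_number)

print(make_metric_type_id("1.1", 39))  # 1.1:39
```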
This is basically TNT2. discovery_server collects this data and publishes it to the CM tables. The formal units, types, and associations allow for automated reporting in proper formats, associated with the devices and their components being reported on.
The downside is that all of the met_ids, ci_ids, and dev_ids end up as files in a flat directory on the robot (./niscache), which can become an I/O bottleneck when you are monitoring things like switches.
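To get a feel for how bad that bottleneck is on a given robot, a quick sketch for counting the id files in the niscache directory (the path is assumed; adjust it to your robot's install directory):

```python
import os

def count_niscache_entries(path: str = "./niscache") -> int:
    """Count the dev_/ci_/met_ id files sitting in a robot's flat niscache
    directory; returns 0 if the directory doesn't exist. (Path assumed.)"""
    if not os.path.isdir(path):
        return 0
    return len(os.listdir(path))

# A switch-heavy robot can accumulate tens of thousands of these files,
# which is where the flat-directory I/O pain comes from.
print(count_niscache_entries())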
TNT3 adds the probe_framework, which is the basis for new probes going forward, including snmptoolkit, icmp... and vmware.
It allows the probe developer to discover the met_id objects and organize them logically into objects and containers, with some other attributes that allow automated generation of the configuration "gui", as they keep calling it, in Admin Console. This device topology is published as a "graph" under the subject probe_discovery. It's compressed JSON that goes to the discovery_server for processing.
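As a rough sketch of what that pipeline looks like (the schema and field names here are invented, and I'm assuming zlib-style compression; the real probe_discovery format isn't documented in this thread):

```python
import json
import zlib

# Hypothetical sketch: the thread says probe_discovery payloads are compressed
# JSON describing a device/ci/metric graph. The exact schema and compression
# are not documented here; this assumes zlib and an invented structure.
sample_graph = {
    "devices": [
        {"dev_id": "D-abc123", "ip": "10.0.0.1",
         "cis": [{"ci_id": "C-def456", "ci_type": "1.1",
                  "metrics": ["1.1:39"]}]}
    ]
}
payload = zlib.compress(json.dumps(sample_graph).encode("utf-8"))

# A consumer (e.g. discovery_server) would reverse the process and walk
# the graph to update the CM tables:
graph = json.loads(zlib.decompress(payload))
for dev in graph["devices"]:
    for ci in dev["cis"]:
        print(dev["dev_id"], ci["ci_id"], ci["metrics"])
```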
ppm fits in there somewhere. Maybe something to do with applying the Canonical Topology Description (CTD) to the topology information in a probe to generate the config gui. I think maybe ppm is a bridge of some sort between cfg, ctd, graph, or old nis2 probes not built on the Probe Framework, but who knows?
The gateway to TNT2 is using ciOpen, ciAlarm, and the various metric functions instead of the much simpler nimalarm... For the extra effort, this buys you magically configured graphs associated with your device in USM from a custom probe.
TNT3 is all probe framework. I haven't sifted it all out yet, but the outputs appear to be very much a work in progress. The whole direction is fairly promising.
Side note: If you have a HUGE vCenter, the compressed graph in probe_discovery messages can exceed 1 MB. This is significant due to a bug in hubs prior to the 7.x series: a lazy megabyte (1,000,000 bytes) was a hard-coded maximum in an internal hub routine that took messages off the spooler in_ queues and pushed them to the hub for processing. The internal spooler would accept the message and say OK to the sender, then bork when trying to send it to the hub, and you would lose anywhere between zero and 19 other messages as collateral damage, due to a hard-coded bulk size of 20 in the operation, without any alert.
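A sender-side sanity check for that old limit might look like this (hypothetical, not a real Nimsoft API; note the limit was 1,000,000 bytes, not 1 MiB = 1,048,576):

```python
import json
import zlib

# Sketch of the pre-7.x hub pitfall described above. The internal limit was
# a "lazy megabyte" of 1,000,000 bytes. A hypothetical sender-side check:
HUB_MAX_MESSAGE_BYTES = 1_000_000  # hard-coded limit in pre-7.x hubs

def fits_in_old_hub(graph: dict) -> bool:
    """Return True if the compressed graph stays under the old hub limit."""
    compressed = zlib.compress(json.dumps(graph).encode("utf-8"))
    return len(compressed) <= HUB_MAX_MESSAGE_BYTES

# A huge vCenter graph could exceed the limit even after compression, and
# take up to 19 neighboring spooled messages down with it (bulk size 20).
print(fits_in_old_hub({"devices": []}))  # a tiny graph fits -> True
```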
I think the ctd is related to ppm, but ppm is a black box to everyone I've spoken to.
I don't think udm_manager is related to it, but who knows. From what I understand, udm_manager is heavily used by USM and plays a part in datomic keys and discovery.
Alright, I was looking at the callbacks and there is stuff for getting and setting ctd configuration.
More info on udm_manager has been posted to the wiki:
One thing has become abundantly clear with 8.1 and anything new that's supposed to replace the "legacy" probes they don't want to support: they are not developing these probes for large enterprise or service provider environments. The idea of running hundreds of nas instances and then somehow syncing them to support predictive alerting is boggling, and they can't provide any guidance on how that would work because they aren't even trying it.
My guess is they'll start using alarm_enrichment as a hack to rewrite the message text on their poorly designed generic QoS alarms, expose configuring that "message pool" in Admin Console right next to predictive alerting, and call it a new feature delivered!
Sad state of things. Maybe if they had a reference architecture, all of the new programmers they have hacking out code would have some idea of the large-scale and diverse deployment scenarios they need to keep in mind with their designs, and maybe customers wouldn't have to treat deploying the new version like a research project.
I agree with most of those, minus the OOB template. It did include some standard metrics but had no alarms configured, which made it useless. Luckily, you can copy the default to a new template and create the alarms (a somewhat odd process at first, so I stuck to statics).
Exact same issue over here. Have you found any solutions or workarounds yet?
I have tested 2.0 as well, and as the others have noted, I found the same issues for the most part. One other I experienced: when creating a template you can select an alert threshold, but when the template gets deployed no alerts are configured. I also requested they provide a way to copy and/or push templates out. I was able to copy a template created on one hub to another without an issue; I just could not edit it because of the bug described. I would also like to set the QoS source to the hostname or the IP address, not just the IP address, as I did not see a way to change the source of the QoS data.
Because the new probe is able to pull metrics that 1.61 couldn't (interface statistics on a lot of switches, TCP, etc.), I was somewhat desperate to get it working on a client hub.
Setup: Primary Hub -> Secondary -> Client Hub -> Robot with SNMPCollector 2.0
Issue: PPM doesn't seem to work with SNMPCollector 2.0 when it's on a remote hub that has a robot attached. When saving any configuration it throws a "MONS-002" error and templates are never actually created. Nimsoft was able to replicate this bug on their end.
Workaround: I installed the probe on a hub without robots attached and it worked. From there I created the templates and copied them over to the client hub/robot. Restart the probe and the template should apply correctly. I haven't seen any issues with metrics being polled yet.
I know that I have deployed snmpcollector on a remote hub, at least in the lab, and configured it using Admin Console on the primary hub. I don't think I had to do anything special? I did have to upgrade to the latest ppm on all hubs, but yeah.
We created a support case for the MONS-002 issue and for not being able to clone/copy templates. It turned out there was an issue with the monitoring_services probe. We received a new version (2.0.2), which fixed the problem for us.
If you run into this issue, the best thing to do is create a support case and ask for version 2.0.2 of the monitoring_services probe.
I have not yet found a way to take the template from one hub and deploy it on another. Does anybody know how to do this?
Oh and we created an idea for being able to choose how to publish the QOS from the SNMPcollector:
snmpcollector : QoS source with "Profile name"
Please promote it if you would like to have this feature.
We are in the middle of UIM 8.1 training, and the first thing we wanted to learn about was SNMP Collector 2.0.
1. First off, the major thing that threw me off was the naming convention. Why are the machines or devices being monitored by the SNMP Collector referred to as "Profiles"? They should be called "Resources" or "Monitored Objects". Secondly, I think the Templates should be called Profiles, and within Profiles you should have a different "Template" for each filter you can set up. The naming is totally screwing me up right off the bat.
2. It is very unintuitive. There really should be a wizard, like the way the Discovery Wizard walks you through the whole process.
3. I don't see a way for it to load custom MIB files and then create a "Template" off a third-party MIB file. This ability is essential for custom hardware and vendors.