Have you ever found something in Nimsoft and thought - Huh?
Whether it's inconsistencies with the probe messages (varying from platform to platform), weird probe pre-requisites, strange messages in the log files or anything else odd, post them here. The kind of things to post about are so trivial no one in their right mind would waste time raising a call for them (even speeling mistakes).
cisco_ucs probe 2.14
This has a requirement of the cisco_ucs_migration 2.0 (which is used to convert 1.x config to 2.x). If you've never used cisco_ucs 1.x there's nothing to migrate!!! In other words, if you've never used it before you've still got to deploy the migration tool first.
Disk space alarm levels are defined as Major/Minor in Windows, but Linux (and probably others) alarm levels are Major/Warning.
ntevl (various versions)
On startup, the messages "raj beofore calling evlControlVista fun." and "raj beofore inside evlControlVista fun." can be seen in the log.
There's a callback in a lot of probes that allows the loglevel to be changed "on-the-fly" without restarting the probe (very useful).
Unfortunately the callback name is not standard. I've seen the variations "loglevel", "log_level" and "set_loglevel" so far.
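One way to cope with the naming differences is to try each known variation in turn. The sketch below only shows the name-picking step. It assumes you already have the list of callback names a probe exposes (how you obtain that list is not shown, and `pick_loglevel_callback` and the `get_info` callback are hypothetical names used for illustration).

```python
# Callback names reported in this thread for changing the loglevel on the fly.
LOGLEVEL_CALLBACK_CANDIDATES = ("loglevel", "log_level", "set_loglevel")


def pick_loglevel_callback(available_callbacks):
    """Return the first known loglevel callback name the probe exposes, or None.

    `available_callbacks` is any iterable of callback names; obtaining it
    from a real probe is outside the scope of this sketch.
    """
    available = set(available_callbacks)
    for name in LOGLEVEL_CALLBACK_CANDIDATES:
        if name in available:
            return name
    return None


# Hypothetical probe that only exposes the "set_loglevel" spelling.
chosen = pick_loglevel_callback(["get_info", "set_loglevel"])
```

If none of the three spellings is present the function returns `None`, which is the cue to fall back to a restart.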
controller (5.90, 7.10)
In the Secondary HUB section, unchecking "Search the subnet for a temporary..." changes the top radio button text to "Wait for primary hub...". Apply this, close the GUI and re-open it. Notice that even though "Search the subnet for a temporary..." is unchecked, the top radio button text displays the default "Automatic detect" (which is wrong).
In the web archive (on here), click on the "Help" icon: the link takes you to the 7.0 documentation (this one will be fixed VERY quickly, methinks).
Anyone got anything to add?
Adding default messages/alerts in the new versions of the probe that cannot be disabled and/or configured. Logmon now sends a critical alert when a logfile cannot be found. Can't turn it off, can't change the severity. VMware and other probes have had the same thing occur.
Callbacks are not documented at all
Not all probes allow the end users to set basic things like log level or log size
Hitting X to "save"
logging into the support site - you can't hit enter after you enter your password, you have to know to click the login button
"Adding default messages/alerts in the new versions of the probe that cannot be disabled and/or configured. Logmon now sends a critical alert when a logfile cannot be found. Can't turn it off, can't change the severity. VMware and other probes have had the same thing occur."
This thread makes me chuckle in agreement
Here are some more...
Yeah, good stuff. Weird that I can't think of many
Often folders get "stuck" in SLM and you can't drag items from one folder to another.
That's good. I think it is a great feature but needs to be configurable. There have been times when I wanted the probe to let us know if the expected log was missing, and there have been times that I took advantage of the fact that it would ignore missing logs.
One pet peeve.. in log viewer the "clear all" doesn't really clear all.
In case you didn't know yet, the logmon probe has been corrected with version 3.32.
I did - thanks. But as with every new probe version - it breaks something else I care about
1) You can't delete the "CIM Traps" and "DOM Traps" folders (looks like they are hard-coded into the GUI!) - Support advised "This is by design".
2) When you use $VARIABLE_DUMP it starts at V(0) yet variables start with $1 - Support told me this was "working as designed". Personally, I think this is counter-intuitive.
Online Documentation: http://docs.nimsoft.com/prodhelp/en_US/Probes/Catalog/data_engine/7.9/1913330.html
Now click on "Release Notes" link. This new page says "Release notes for this probe can be found of the CA Nimsoft Archive".
Follow the link to go to the archive.
Q: Where is the data engine???
Perhaps you can find it on King's Cross Station Platform 9¾ aboard the Hogwarts Express.
Being told to submit a defect as an "idea"
Oh, nice one bvloch! +1 for that.
keithk wrote: Perhaps you can find it on King's Cross Station Platform 9¾ aboard the Hogwarts Express.
Before they redeveloped KingsX station, I knew exactly where platform 9¾ was (walked past it occasionally).
bvloch, have a +1 from me too (logmon by any chance?)
Because the data_engine is a core component of UIM, its release notes were incorporated into the UIM release notes.
You can find the latest UIM RNs here:
Upgrading & Release Notes - CA Unified Infrastructure Management - 8.31 - CA Technologies Documentation
The data_engine does have its own probe documentation, which can be found on the probes wiki:
data_engine - CA Unified Infrastructure Management Probes - CA Technologies Documentation
Hope that helps.
I like how Nimsoft have one setting as the default value in probes, yet the documentation often has a different value as the recommended setting. One example being the *_processor_batch_size settings for dashboard_engine where default value is 0, yet recommended value is 500:
Here's a good one. Just got it from support today:
Engineering has informed me that a fix will be provided in the next release of the robot. Please note that this release will come with NMS 7.5 and will actually be robot version 7.05 which is lower than the current robot version of 7.10.
Kind of hard to take them seriously when they do stuff like this.
perfect logic.. version 1 is good, then you do version 2 and it's ****, you rollback your code and rework stuff out and tada, it's 1.5!
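To see why 7.05 counts as "lower" than 7.10, here is a minimal sketch that compares dotted versions component by component. It assumes the version strings are plain dot-separated integers; whether the archive or the robot compares versions this way is an assumption, and `parse_version` is a hypothetical helper.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


current_robot = parse_version("7.10")  # version already deployed
fixed_robot = parse_version("7.05")    # version support says carries the fix

# Tuples compare element by element, so (7, 5) sorts before (7, 10).
fix_looks_older = fixed_robot < current_robot
```

Under this comparison the "fixed" robot looks like a downgrade, which is the complaint in the post above.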
robot_* (in the archive)
Why is robot_deb, robot_exe, robot_rpm and robot_sol v1.10 and robot_update v7.10?
They all contain the same version should they not all be v7.10?
URL_response version 4.16: I installed it, created a group and set up four URLs to monitor. In the profile name I put the URL so that the message gave the URL and not the profile name. When I saved the configuration, the probe reverted to the default http profile (enabled) and cleared all of my entries and the group I had created.
I would have expected it to tell me I can't use the URL as the profile name, not clear what I had set up...
Now using the Advanced tab with a source override of $url.
Definitely +1 to bvloch
I've got so many of these it's not even funny..
- hub gui can't be resized (someone else already mentioned this one) -- SO annoying!!
- Every time you add/remove a profile in url_response, it refreshes all of the groups and closes them all again
- Every time you reload or relogin to IM it forgets your window layout. I use Ctrl-L on my archive and primary hub probe view and hit the bar on the right so that they get put in the bottom frame. I have to redo that every time.
- log viewer is pretty much useless if the logs are more than a couple megs, I can go have a cig before it stops scrolling (which can't be interrupted till it's caught up)
- The robots with hubs on them can't be "pinned" to the top of mini hubs, depending on the name they could be anywhere (which is really annoying when a lot of hubs are on "utilXYZ" hosts)
- in the archive, the probe status says "new version available" next to every obsolete version... How about just putting in a different color entry that's "the new version"
- Robots that are running on systems that have DNS or hosts file issues reporting to the hub that their IP is 127.0.0.1 or 169.254.x.x
- The central hub can tell if tunnels are down but will only ever create one alarm
- Liferay is ajax capable but for some reason the UMP is not, every time you move or resize a portlet the whole page reloads
- You can't have multiple filters in the thick client alarm console
- You can have multiple filters in the UMP alarm console, but they're all AND
- When defining source/probe/etc in PRD, adding an OR automatically is *everything* before the 'or' or *everything* after it
- Hubs that are configured to tunnel to <remote_ip>/<remote_port> often complain that they can't reach the remote hub using <local_ip>/<some_other_port>
- If you sort alarms by date in the NAS status page, it sorts them ALPHABETICALLY by MONTH. Really guys??
- In many/most probes, the columns get reset to something annoying by default, often just by switching profiles (interface_traffic in particular)
- The USM view shows a dozen columns that are pretty much always empty, at least in our environments. If you rearrange or resize the settings can't be saved
- The "set default view" option in the alarm console doesn't save (for example removing the host column, adding the source column, and moving the source column left)
- Many probes still don't let you override the alarm source (*cough* cisco_monitor)
I could go on...
Best post I've seen in a long time btw!! Big thumbs up!
"This application is busy and bla bla.." - Switch To, Retry, or Cancel?
Don't forget java_jre versions 1.62 AND 1.6.2 in the archive..
- If you sort alarms by date in the NAS status page, it sorts them ALPHABETICALLY by MONTH. Really guys?? -- HAHA You know that's not a bug - since it is sorting. Never mind that it's not what users would want. Seriously, how does this get past QA?
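For anyone who hasn't hit this, the difference between sorting date strings as text and sorting them as dates looks roughly like this. The timestamp layout ("Mon DD YYYY") and the sample values are made up for illustration; the real NAS column format may differ.

```python
# Month abbreviations mapped to their calendar position.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def date_key(stamp):
    """Turn 'Mon DD YYYY' into a (year, month, day) tuple for sorting."""
    month, day, year = stamp.split()
    return (int(year), MONTHS[month], int(day))


stamps = ["Jan 15 2014", "Apr 02 2014", "Dec 30 2013", "Feb 20 2014"]

sorted_as_text = sorted(stamps)                 # what the page appears to do
sorted_as_dates = sorted(stamps, key=date_key)  # what users expect

# sorted_as_text  -> Apr, Dec, Feb, Jan
# sorted_as_dates -> Dec 2013, Jan 2014, Feb 2014, Apr 2014
```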
There's also a sorting bug in UMP's slm portlet. I believe in a case it was attributed to being a "flash's sorting method".
From within IM, access the Service Manager app. It only allows 1 connection, so if that is in use it gives you a pop-up asking whether you want to reconfigure. Click No and it still brings up the configure options. What would happen if you click Yes?
"- Robots that are running on systems that have DNS or hosts file issues reporting to the hub that their IP is 127.0.0.1 or 169.254.x.x"
I have seen this, and the only solution I found was to add robotip= to the robot.cfg (as I'm not allowed to change hosts/DNS).
I don't know if it's directly related to this, but I've also seen that it can fail with "not a valid local ip address" in controller.log - that then requires local_ip_validation = no to be added to robot.cfg
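If you have a list of the addresses your robots report, a quick way to spot the affected ones is to flag anything in the loopback or link-local ranges. This is only a sketch using Python's standard `ipaddress` module; `is_suspicious_robot_ip` and the sample addresses are hypothetical, and pulling the addresses out of the hub is not shown.

```python
import ipaddress


def is_suspicious_robot_ip(address):
    """True if the address is loopback (127.0.0.0/8) or link-local (169.254.0.0/16)."""
    ip = ipaddress.ip_address(address)
    return ip.is_loopback or ip.is_link_local


# Hypothetical addresses as reported by three robots.
reported = ["127.0.0.1", "169.254.10.20", "10.1.2.3"]
flagged = [addr for addr in reported if is_suspicious_robot_ip(addr)]
```

Robots that end up in `flagged` are the candidates for the robot.cfg workaround described above.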
I wouldn't even know how to add to this thread but. I'll try.
Mostly it would be why dead stop with one path and pickup another. Even though I hate flash and the portal is limited at best at least SDP could display a customer a view of just their systems. I haven't tried my new install of 7.5 because 6.2 was broke and so much stuff for me is broken I don't even know where to start. Every time I think, hey I'll use UMP half of the features are missing. It's like 7.5 with the admin console was 10% of what they wanted it to be. So now I'm asking if what I am doing will even be doable this summer or whenever IM is going away? It feels like I've been put into a blender.
iostat (this is taken from the release note attached to the archive listing):
Installation notes: The probe will fail to start if the SDK_Perl package is not installed prior to the installation
But isn't that why we have a dependency tab in the packages?
Looks like the latest version 1.80 allows you to actually get alerts when the devices you want to monitor are timing out (send session alarms) - this of course is not documented in release notes.
The tab in which you define the timeout thresholds does not contain the check box to send the alarms when these values are exceeded, oh no, this is on another completely different tab. I don't know why I did not think to check the General tab to tell the probe to alert me when what I configured to be monitored, can't be.
the latest robot update (7.62) file on the archive - filename: robot_update (1).zip
Nice to see the QA/Standards department earning their keep
Thank you Charile.
In order to figure out some random FAIL blips we are seeing in our net_connect.log files, we have been picking net_connect apart and trying to decipher how it actually works. And in the process, we have discovered something deserving of a major facepalm.
It seems the threading in net_connect is split up into something like this:
1x MAIN SEND (ALARM)
1x MAIN RECV (ALARM)
N PING QOS
1 x QOS DISPATCHER
1 x RESOLVER
N x ConnectToServiceSrv (that connects to tcp services)
And we noticed that both the MAIN SEND and assorted PING QOS threads are doing pings to our servers:
Jul 30 08:30:46:217  nc: send_ping - 31: SEND 80.76.x.x, ident=19481, seq=1Jul 30 08:30:46:217  nc: send_ping - 31: SEND 80.76.x.x, ident=19481, seq=2Jul 30 08:30:46:217  nc: send_ping - 31: SEND 80.76.x.x, ident=19481, seq=3Jul 30 08:30:48:656  nc: send_ping - 9: SEND 80.76.x.x, ident=20029, seq=1Jul 30 08:30:48:656  nc: send_ping - 9: SEND 80.76.x.x, ident=20029, seq=2Jul 30 08:30:48:656  nc: send_ping - 9: SEND 80.76.x.x, ident=20029, seq=3Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=1Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=2Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=3
In this case, 47955888007488 is the MAIN SEND thread, and 47954818042176 is one of a handful of "PING QOS" threads.
In my config, we have interval set to 2min, and burst set to 3 packets.
So the first 3 entries at 08:30:46:217 is the alarm check running at interval, and it gets a random FAIL due to what appears to be a different problem in the net_connect probe (but we'll get to that later). The next 3 at 08:30:48:656 is the retry 2sec later (I have a timeout of 2min). And then the 3rd part 08:31:20:289 is something else which actually appears to be another ping used for qos also collecting every 2 min:
Jul 30 08:31:20:289  nc: ptNetPingWait: host = 80.76.x.x socket, 7 is aquired
Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=1
Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=2
Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=3
Jul 30 08:31:20:307  nc: (recv_ping) 7: RECV 80.76.x.x, ident=20270, seq=1
Jul 30 08:31:20:307  nc: ConnectForPingQos - 80.76.x.x [18ms], status=18)
And we find all 9 packets with tcpdump as well.
So, it REALLY does seem like net_connect does DOUBLE pings against each device. One used for alarms and one used for qos, which basically doubles the work net_connect has to do.
Good thing they resolved that in any one of these:
v2.90: "Revamped probe for scalability"
v2.03: "Performance improvements done to handle 5000+ profiles."
Probably gonna put up a new thread "Dissecting net_connect" with all the findings I end up with today, but the executive summary is mainly that net_connect is horribly broken.
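For anyone who wants to repeat the count on their own net_connect.log, here is a rough sketch that tallies SEND lines per host and ident. The regular expression is written against the log excerpt quoted in the post above and is an assumption about the line layout at that loglevel; `count_sends_per_burst` is a hypothetical helper, not part of the probe.

```python
import re
from collections import Counter

# Matches lines such as:
#   nc: send_ping - 31: SEND 80.76.x.x, ident=19481, seq=1
SEND_RE = re.compile(r"send_ping - (\d+): SEND ([^,\s]+), ident=(\d+), seq=(\d+)")


def count_sends_per_burst(log_text):
    """Count SEND entries per (host, ident) pair found in the log text."""
    counts = Counter()
    for match in SEND_RE.finditer(log_text):
        _socket, host, ident, _seq = match.groups()
        counts[(host, int(ident))] += 1
    return counts


sample = """\
Jul 30 08:30:46:217  nc: send_ping - 31: SEND 80.76.x.x, ident=19481, seq=1
Jul 30 08:30:46:217  nc: send_ping - 31: SEND 80.76.x.x, ident=19481, seq=2
Jul 30 08:30:46:217  nc: send_ping - 31: SEND 80.76.x.x, ident=19481, seq=3
Jul 30 08:30:48:656  nc: send_ping - 9: SEND 80.76.x.x, ident=20029, seq=1
Jul 30 08:30:48:656  nc: send_ping - 9: SEND 80.76.x.x, ident=20029, seq=2
Jul 30 08:30:48:656  nc: send_ping - 9: SEND 80.76.x.x, ident=20029, seq=3
Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=1
Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=2
Jul 30 08:31:20:289  nc: send_ping - 7: SEND 80.76.x.x, ident=20270, seq=3
"""

bursts = count_sends_per_burst(sample)
total_packets = sum(bursts.values())
```

With the sample above this gives three bursts of three packets each against the same host, matching the 9 packets seen with tcpdump.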
1) ntevl: open the probe UI, change the loglevel from 0 to max with the slider control (in one step). You'll notice that the apply button stays gray and when you press OK, your changes are ignored
2) Being told by support that 1) is not a defect because previous versions of the ntevl probe behaved the same way so it must be "works as designed". I guess the "design" here is "make assumptions of what the user wants to do -- and then do exactly the opposite" ;-)
3) open a probe UI and be informed that the probe is locked by yourself even though you've closed the probe UI cleanly before
4) sqlserver: The sqlserver probe runs (like any other probe) as user system. When you configure a new database connect you can choose between a pure SQL user or a domain account. In some cases this domain account is then used to do further filesystem access in C:\Program Files\Nimsoft\niscache which fails
5) Finding out that support cannot replicate an issue because they are using a newer version of the probe that has been released *after* I opened the issue. I only noticed because I saw the new version in the archive looked at the release notes and found my exact problem described
6) Seeing that the oracle probe still does not support oracle 12
forgot a nice one
7) If you are using a non-english windows version with the ntevl probe you have to tell the probe the localized strings for "critical", "warning", etc (http://docs.nimsoft.com/prodhelp/en_US/Probes/Catalog/ntevl/3.9/index.htm?toc.htm?2252074.html) At the same time the package editor does not allow to select a robot by OS language so it is basically impossible to ship one package for every new windows host if you have a mixed environment (e.g. both german and english servers)
Pattern matching: This method is similar to the one found in many shells (UNIX)
For example, *C:* will not monitor the disk usage matching the given regular expression *C:* i.e. Disk C.
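The two quoted sentences call the same pattern shell-style matching and then a regular expression, and those are not the same thing. As an illustration only, here is how `*C:*` behaves under Python's shell-style `fnmatch` versus Python's `re` module. This says nothing about which engine the probe actually uses; the helper names are hypothetical.

```python
import fnmatch
import re


def matches_as_shell_pattern(pattern, name):
    """Case-sensitive shell-style (glob) match, where * means 'any characters'."""
    return fnmatch.fnmatchcase(name, pattern)


def is_valid_regex(pattern):
    """True if Python's re module accepts the pattern as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


glob_hit = matches_as_shell_pattern("*C:*", "C:\\")   # shell-style: matches disk C
glob_miss = matches_as_shell_pattern("*C:*", "D:\\")  # shell-style: does not match D
regex_ok = is_valid_regex("*C:*")                     # a leading * has nothing to repeat
```

As a shell pattern `*C:*` is perfectly sensible; as a Python regular expression it does not even compile.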
On support.nimsoft.com it's 2.04 but when imported into the archive it's 2.03
The included PDF is v2.01 and missing the function "file.mkdir" which was introduced in 2.00 (according to the release notes)
This was a recent idea:
Prerequisites for logmon version 3.42 are asking for the manual registering of a DLL file using the Administrator ID. It's new in version 3.42 and should NOT ask for such thing as it might bring security issues. Make the DLL install fully automatic when the probe is deployed using the Nimsoft Administrator credentials.
This will cause us some administrative overhead and I am wondering why the installer cannot perform this action?
1. implement an object-based monitoring structure.
2. don't document it.
3. use the sendmail queue design for storing objects. (every object is a file in a flat directory)
4. start making everything an object.
5. Ignore reports of IO death for systems due to hotspot on the disk in niscache and opt to leave the bad cache design in place.
6. Explain that TNT2 will replace niscache while expanding its use in probes and APIs then integrating it with TNT2.
Boo. Filesystem as a database is lazy 70's style programming. It shouldn't have lasted past the POC.
All I wanted to do was monitor some simple snmp stuff using snmpcollector...
There is so much wrong with this I am at a loss for words.
We faced similar woes. However, that alarm can be ignored I believe, it just means that you can't use the time to/over alarms. I have yet to try and configure alarm_enrichment. Does anyone have tips or words of wisdom?
On a side note, after having spent well over 30-40 hours working with/on the snmpcollector probe (v1.61-2) I feel that I'm confident with the way it's setup and configured. The new templating is somewhat tedious at first since it has no alarm thresholds configured (maybe that's due to the ppm error?) but once you start digging in, it's definitely powerful. Devices that previously didn't have supported MIBS/OIDS in the probe now do and pull an array of metrics. The interfaces tab/discovery is also solid.
Have you tried looking up the documentation for the ppm probe? Oh yeah, that doesn't exist either.
Taken from https://wiki.ca.com/display/UIM81/Primary+Hub+Component+Reference
I don't know what's worse, the '?' - which seems to indicate that they aren't sure about their own product, or the phrase "It does something".
ntevl release notes from web archive: http://support.nimsoft.com/unsecure/archive.aspx?id=74
For Detailed Release Notes-Please refer https://wiki.ca.com/display/UIMPGA/NT+Event+Log+Monitor+%28ntevl%29+Release+Notes
the wiki page actually contains LESS information.
I know this reeks of user error but try to change your password on the wiki site. It will tell you that your current password is incorrect. I had others try it just because this seemed so silly
In the nas when creating a trigger, one of the filter options is the "green clear alarm status box". If you select just this one thing, it will only ever report zero. That's because triggers only count active alarms and the clear alarm takes the record of the alarm out of the alarm table.
Wonder if there's special code to handle that or if it looks for active alarms with a status of clear anyway?
Processes probe v3.83 has the single following changelog:
Fixed a defect where alarm was not getting cleared-Salesforce case 00146533
Well, it turns out it uses different supp_key on the messages for the alarm and clear message:
alarm: "processes/ntpd/process_state" clear: "process/ntpd"
I'm not sure what processes v3.83 fixes, but it clearly doesn't fix the alarm clearing.
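Here is a toy model of why the mismatch matters. It assumes a clear only removes an alarm when it carries exactly the same supp_key as the alarm it is meant to clear; that is an assumption made for this sketch, and `AlarmTable` is a hypothetical stand-in, not the NAS implementation.

```python
class AlarmTable:
    """Toy alarm table keyed by suppression key (supp_key)."""

    def __init__(self):
        self.active = {}

    def alarm(self, supp_key, message):
        """Raise (or update) the alarm stored under this supp_key."""
        self.active[supp_key] = message

    def clear(self, supp_key):
        """Remove the alarm under this supp_key; return True if one was removed."""
        return self.active.pop(supp_key, None) is not None


table = AlarmTable()
table.alarm("processes/ntpd/process_state", "ntpd is not running")

# The clear arrives with the other key quoted in the post above.
cleared_with_mismatch = table.clear("process/ntpd")
still_active = "processes/ntpd/process_state" in table.active

# A clear using the alarm's own key would have removed it.
cleared_with_match = table.clear("processes/ntpd/process_state")
```

In this model the mismatched clear finds nothing to remove, so the original alarm stays active.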
Hey, that was my defect they tried to fix.
Haven't had the chance to test yet - looks like I don't need to....
anders_synstad wrote: Processes probe v3.83 has the single following changelog: Fixed a defect where alarm was not getting cleared-Salesforce case 00146533 Well, it turns out it uses different supp_key on the messages for the alarm and clear message: alarm: "processes/ntpd/process_state" clear: "process/ntpd" I'm not sure what processes v3.83 fixes, but it clearly doesn't fix the alarm clearing.
From support - "Defect DE40797 is already raised for the same"
Some articles that contain kind of funky information.. such as these
https://na4.salesforce.com/articles/TroubleshootingObj/Nas-alarm-counts-are-not-in-sync-between-UMP-USM-and-the-Infrastructure-Manager-alarm-subconsole?popup=true resets all alarms and doesn't say so. A bit of an overkill I'd say as the first step, anyway.
https://na4.salesforce.com/articles/HowToProcedures/How-to-delete-or-merge-old-obsolete-origins-so-they-no-longer-display-in-IM-or-UMP?popup=true "(Best Practices/Other Notes)" has some issues..
update: they've now fixed the latter article.
kananda wrote: Issue with process probe should be fixed at the earliest as this is leading to improper notification. Have encountered two such instances in just one week. -kag
While I agree wholeheartedly with the desire, I think that I'd poke my left eye out in joy if that was the worst of the issues I faced with UIM. I opened my support case on this issue 6 months and 8 days ago. On the other hand I've been told to expect a beta for a fix in the next week or two depending on your interpretation of what "mid" means with respect to a month.
I've included a link here to the most recent ppm probe documentation. If there is anything in particular that is missing from the doc or any information you want us to enhance, please let us know.
ppm - CA Unified Infrastructure Management Probes - CA Technologies Documentation
v1.3 prediction_engine AC GUI Reference - CA Unified Infrastructure Management Probes - CA Wiki
v2.6 baseline_engine Raw Configuration - CA Unified Infrastructure Management Probes - CA Wiki
I know I've not posted in a while, but: 1 and 2 = Binary?
Ironically, the next parameter uses binary (this time true/false) correctly.
Should I be concerned about the future of this product with programming and documentation gaffes like this?
Thanks for the feedback on these docs. You're saying that "binary" is not the technically correct term. We'll make that fix in the documentation soon.
I think you are missing the whole point of this thread - it was solely created to highlight (or poke fun) at some of the things the user base has found when using the product. The thread could have easily been called WT.? (but chose the more polite 'Huh?')
Your reply to my 'data_engine' post missed the point I was making: the fact that the documentation quoted in the post pointed to something that did not actually exist.
To quote the (in)famous saying: "There are only 10 types of people in the world: Those who understand binary, and those who don't".
I do understand the point of the thread. And empathize with the users who posted to it. I figured if there was anything we could do to improve the docs and make the users' experience better, I would do that. Can you tell me more about what you mean by "the documentation quoted in the post pointed to something that did not actually exist"? I changed the wording about that particular setting so that it is not described as binary. Is there anything else you'd recommend to improve these documents? Thanks!