Hello, I'm looking for some guidance on creating a script or a pu command to check whether all my vmware probe instances are working correctly and not in a corrupted state.
What I mean by this is that, out of the blue, we'll try to open a vmware probe and get the error: "key host not found". The probe's GUI doesn't load, and while in this state the probe isn't monitoring anything.
We've already implemented the increased Java memory startup change to start the vmware probe with 256 MB, but that doesn't fix it 100% of the time.
Investigating the probe while in this broken state, we found that the <properties> section, which contains the username, password, etc., is missing from the <resources> section of the vmware probe's configuration.
So if you open the probe in RAW mode, you'd see the entries under the resources folder, but there would be no sub-folders. When you don't see the sub-folders 'auto_monitors' and 'properties', the probe is in this broken state.
We then have to go through the whole process of restoring the probe to a good working state from a backed-up, known-good cfg file.
Anyway, my question: is there a way to check for the existence of these sub-entries under the <resource> entries to verify they exist? Right now the only way to verify is to manually go to each instance, open it in RAW mode, and check. Would this be possible through a pu.exe callback or something else? There is no way to know the vmware probe is in this broken state unless you happen to open the probe and actually get the error pop-up: "key host not found".
We are also facing the same issue sometimes; once the probe is restarted, it works fine. Ideally the probe is used to monitor the servers; it would be bad to have to monitor the probe itself. This needs to be fixed permanently on the probe side, and it would be helpful to have a dashboard in UMP for vmware probe health status.
Maybe you can look at the log file, find anything matching the error, and set up a logmon profile on it. I do this for quite a few critical probes. But again, this only diagnoses the problem, and it is useful when we already know the fix. And yes, the vmware probe is often prone to issues, so I always keep a package backup in my archive.
Okay, I just figured out a way to do this much quicker than manually going through each one, by using the pu.exe command in the \Nimsoft\bin directory.
So in the IM tool, do Tools > Find, select Probe, and enter vmware. Generate your full list and copy the results out to Excel. What we need is the probe address column. Delete everything else and leave the probe address column in column B.
In Column A put: pu -u administrator -p password
In Column B put the vmware probe address list entries. Example: /UIM/Hub1/prihub1/vmware
In Column C put: get_node_values NULL >> vm_check.txt
So the full command would be:
pu -u administrator -p password /UIM/HUB1/Robot1/vmware get_node_values NULL >> vm_check.txt
We're getting each node in vmware, checking its properties, and appending the output to the file vm_check.txt.
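If you'd rather skip the Excel step, the same command lines can be generated straight from a plain-text list of probe addresses. Here's a minimal POSIX-shell sketch (a `for /f` loop in a Windows batch file would do the same job); `addresses.txt`, the sample addresses, and the credentials are all placeholders to substitute with your own:

```shell
#!/bin/sh
# Placeholder address list -- in practice, paste the probe address column
# from the IM Find results here, one address per line.
printf '%s\n' '/UIM/Hub1/prihub1/vmware' '/UIM/HUB1/Robot1/vmware' > addresses.txt

# Emit one pu call per address into a batch file.
while IFS= read -r addr; do
  echo "pu -u administrator -p password $addr get_node_values NULL >> vm_check.txt"
done < addresses.txt > run_vm_check.bat

cat run_vm_check.bat
```

Running the generated run_vm_check.bat from \Nimsoft\bin then produces the combined vm_check.txt described below.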
So a good working return looks like:
======================================================
Address: /UIM/HUB2/robot1/vmware Request: get_node_values
resources PDS_PPDS 229
0 PDS_PDS 220
port PDS_PCH 4 443
host PDS_PCH 13 HOST1-vc01
interval PDS_PCH 6 10min
name PDS_PCH 13 HOST1-vc01
ID PDS_PCH 13 HOST1-vc01
active PDS_PCH 5 true
user PDS_PCH 5 root
msg PDS_PCH 17 ResourceCritical
key PDS_PCH 21 HOST1-vc01.Profile
pass PDS_PCH 25 N6nxas2dFvs2wdH8qh0ToGw==
status PDS_PCH 3 OK
May 16 12:45:10:169 pu: SSL - init: mode=0, cipher=DEFAULT, context=OK
A bad entry would look like this; since there are no user or pass entries, we can tell it is in a broken state:
Address: /UIM/HUB1/robot1/vmware Request: get_node_values
resources PDS_PPDS 110
0 PDS_PDS 101
name PDS_PCH 11 10.1.10.27
ID PDS_PCH 11 10.1.10.27
May 16 12:44:59:357 pu: SSL - init: mode=0, cipher=DEFAULT, context=OK
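Rather than eyeballing vm_check.txt, the "missing user/pass" check itself can be scripted. A sketch in awk, assuming the get_node_values output format shown above (the sample file here is trimmed test data, not live output):

```shell
#!/bin/sh
# Sample vm_check.txt: one healthy block and one broken block
# (trimmed from the output shown above; sample data only).
cat > vm_check.txt <<'EOF'
Address: /UIM/HUB2/robot1/vmware Request: get_node_values
resources PDS_PPDS 229
host PDS_PCH 13 HOST1-vc01
user PDS_PCH 5 root
pass PDS_PCH 25 N6nxas2dFvs2wdH8qh0ToGw==
status PDS_PCH 3 OK
Address: /UIM/HUB1/robot1/vmware Request: get_node_values
resources PDS_PPDS 110
name PDS_PCH 11 10.1.10.27
ID PDS_PCH 11 10.1.10.27
EOF

# Flag any probe block that lacks a "user" or "pass" entry.
awk '
  /^Address:/ { if (addr != "" && !(u && p)) print "BROKEN: " addr
                addr = $2; u = 0; p = 0 }
  $1 == "user" { u = 1 }
  $1 == "pass" { p = 1 }
  END { if (addr != "" && !(u && p)) print "BROKEN: " addr }
' vm_check.txt > broken.txt

cat broken.txt   # -> BROKEN: /UIM/HUB1/robot1/vmware
```

Each "Address:" line starts a new block, so the script only has to remember whether it saw user and pass lines since the last one.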
So once you have the full list of all vmware probe instance locations, copy the results into Notepad++, then replace all \t (tabs) with " " (a space). Throw this into a batch file.
Create a batch file with the results:
cd D:\Program Files (x86)\Nimsoft\bin
pu -u administrator -p password /UIM/Hub1/prihub1/vmware get_node_values NULL >> vm_check.txt
Then you can go through the list and check. This beats going to each one, one by one, and opening and verifying it. Hope this helps, folks.
Found a better method: get_status, which returns the last status of each profile set up in a vmware instance. The returned values make it easier to tell good from bad.
The results look like this now:
Address: /UIM/Hub1/robot/vmware Request: get_status
loc1vc1 PDS_PCH 7 POLLED
loc2vc1 PDS_PCH 7 POLLED
May 16 13:27:37:087 pu: SSL - init: mode=0, cipher=DEFAULT, context=OK
So it's easier now to see which ones are broken or empty; broken entries have NOK values:
Address: /UIM/Hub2/robot1/vmware Request: get_status
PROD-VC PDS_PCH 4 NOK
May 16 13:28:49:341 pu: SSL - init: mode=0, cipher=DEFAULT, context=OK
So the new command would be:
pu -u administrator -p password /UIM/Hub1/prihub1/vmware get_status NULL >> vm_check.txt
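With get_status, the "is anything broken?" check reduces to searching the output for NOK. A sketch using grep on a Linux hub (on Windows, findstr /C:" NOK" vm_check.txt would be the rough equivalent); the file contents here are sample data standing in for the real pu output:

```shell
#!/bin/sh
# Sample get_status output (sample data; in practice this file is produced
# by the pu get_status call above).
cat > vm_check.txt <<'EOF'
Address: /UIM/Hub1/robot/vmware Request: get_status
loc1vc1 PDS_PCH 7 POLLED
loc2vc1 PDS_PCH 7 POLLED
Address: /UIM/Hub2/robot1/vmware Request: get_status
PROD-VC PDS_PCH 4 NOK
EOF

# -B1 also prints the line before each match, so the Address of the
# broken instance shows up alongside the NOK profile.
grep -B1 ' NOK$' vm_check.txt > nok.txt
cat nok.txt
```

An empty result means every profile in the file reported a healthy status.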
How and where can we run "get_status" for vmware probes? Could you send an example?
What do you think about saving the results from "get_status" into a vmware-log.txt file and using the logmon probe to look for the word "NOK" in vmware-log.txt?
Note: I'm about to put CA UIM into production as a monitoring system, and I'm a beginner UIM user.
So the command would be:
pu -u administrator -p password /UIM/HUB1/Robot1/vmware get_status NULL >> vm_check.txt
Yes, you could set up a profile in logmon and scan this output file. The above command would have to be run on a regular interval so the logmon probe has fresh output to read.
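One way to get that regular interval is a small wrapper script scheduled via cron (or Task Scheduler on Windows), appending a timestamped get_status snapshot to the file logmon watches. This is a hypothetical sketch; the credentials, probe address, and file name are placeholders, and the pu call itself is commented out since it needs pu.exe and a live hub:

```shell
#!/bin/sh
# Hypothetical scheduled wrapper: append a fresh get_status snapshot to
# the file logmon scans, so new NOK lines are picked up each run.
LOG=vmware-log.txt

# Timestamp each run so every interval adds new lines for logmon to read.
echo "=== run at $(date) ===" >> "$LOG"

# The real collection step -- commented out here because it requires
# pu.exe and a reachable hub (placeholder credentials and address):
# pu -u administrator -p password /UIM/HUB1/Robot1/vmware get_status NULL >> "$LOG"

tail -n 1 "$LOG"
```

A logmon watcher profile matching "NOK" against vmware-log.txt then turns the broken-probe state into an alarm instead of a manual check.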