DX Unified Infrastructure Management

NAS script to ping inactive robots to check if just robot or robot and server are down

1. NAS script to ping inactive robots to check if just robot or robot and server are down

Anon Anon
Posted Apr 14, 2014 03:58 PM

Afternoon All,

I came across the following problem at a number of customers recently:

When we get a robot inactive alert, ops don’t know how to prioritise the alarm, because they don’t know whether it’s just the robot or the whole server that is down.

I wrote the following simple script as a solution. It uses action.ping rather than a callback to net_connect to run a profile, mainly because the customers I’ve dealt with on this don’t want to set up a net_connect profile for each and every robot. It should be pretty straightforward to change it to use a callback to net_connect; I’ll give it a go.

The script runs on robot inactive alarms, gets the IP of the offending robot, pings it, and updates the alarm message and severity. The script assumes that robot inactive alarms have been set to major on the hub, mainly because the customers I spoke to wanted it that way.
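
The decision the script makes can be sketched as a small function. This is a minimal Python sketch of the idea, not the attached Lua script: it assumes UIM's 0 to 5 severity scale (4 = major, 5 = critical), and the escalation to critical, the message wording and the function name are hypothetical. The ping check is passed in as a function so the logic can be exercised without a network.

```python
from typing import Callable, Tuple

# Assumed UIM severity numbers (0 = clear ... 5 = critical).
MAJOR = 4     # level the hub is assumed to assign to robot-inactive alarms
CRITICAL = 5  # hypothetical escalation when the server itself does not answer


def classify_inactive_robot(robot: str, ip: str,
                            ping: Callable[[str], bool]) -> Tuple[int, str]:
    """Return (severity, message) for one robot-inactive alarm."""
    if ping(ip):
        # The server answers: only the robot process is down, so keep it at major.
        return MAJOR, f"Robot {robot} is inactive, server {ip} responds to ping (robot only)"
    # No reply: the server (or the network path to it) is probably down too.
    return CRITICAL, f"Robot {robot} is inactive, server {ip} does not respond to ping (robot and server down)"


if __name__ == "__main__":
    # Stub ping functions stand in for a real ICMP check.
    print(classify_inactive_robot("robot01", "10.0.0.5", lambda ip: True))
    print(classify_inactive_robot("robot01", "10.0.0.5", lambda ip: False))
```

Keeping the ping outside the decision function also makes it easy to swap in a net_connect callback later, as suggested above.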

We’re also assuming network connectivity from the primary hub to the robot. In multi-site tunnelled environments we’d have to run a nas on the remote hubs; I’ll set up a test environment and make any required additions to the script later.

I don’t know about anybody else, but in my opinion this is the sort of functionality we should be incorporating into NMS in future.

I’d welcome any feedback good or bad (I’ll just ignore the bad stuff :-), only joking)

Cheers
Dave

David Higginbotham
CA Technologies
Sr Consultant, Pre-Sales

Attachment(s)

ping inactive robot.txt (1 KB)
2. Re: NAS script to ping inactive robots to check if just robot or robot and server are down

Christopher Duryea
Posted Aug 28, 2017 11:39 AM

Works great! It displays the helpful message, BUT moments later all inactive alarms revert to the original default message. I haven't figured out what triggers that yet. It seems the script works while the alarm count is 1, but once the count rises above 1 on the next check, the script does not fire and the message reverts to the default. Still playing around with AO settings.
3. Re: NAS script to ping inactive robots to check if just robot or robot and server are down

Christopher Duryea
Posted Aug 28, 2017 12:30 PM

Found it, but there is a secondary issue. First, make sure that the following is set to "greater than or equal" in your AO.

Second, after setting the above correctly, the alarm console has gone mad. It appears the AO is too slow and does not run before the alarm is first displayed. When the alarm comes in, it is displayed with the default message; a couple of seconds later the AO fires and the script changes the message. On the next alarm count increase the message is changed back to the default, then seconds later the AO updates it again, and this repeats nonstop.

The AO needs to be processed before the alarm is displayed.
4. Re: NAS script to ping inactive robots to check if just robot or robot and server are down

Broadcom Employee

Gene Howard
Posted Aug 28, 2017 01:27 PM

Did you try setting this up as a pre-processing script rather than an AO?
5. Re: NAS script to ping inactive robots to check if just robot or robot and server are down

Alquin
Posted Aug 29, 2017 12:37 PM

Hi,

We have a similar situation, but we took the pre-processing route that Gene suggested above. The catch is that you lose some of CA's custom Lua extensions, such as the action.* functions. The code we use is available on GitHub at the link below.

robot-inactive/robot-inactive.lua at master · adgayle/robot-inactive · GitHub

Our pre-processing setup is below. We change the down server alarms to informational, but you can set the level to 0, which throws them away. The code is commented to tell you how to do that.
6. Re: NAS script to ping inactive robots to check if just robot or robot and server are down

Michael Arnone
Posted Aug 29, 2017 04:59 PM

Hi Alquin,

It looks like you changed the hub setting for robot alarms (Major by default) to Critical, since the Critical filter is checked in your Pre-Processing Rule screenshot.

Your Pre-Processing Rule leaves the alarm at Critical if the ping test fails to reach the server.
Otherwise, if the problem is only communication with the robot (really inactive), the script itself changes the alarm to Informational [or whatever severity # you put into the line: event.level = # ].
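
Mike's reading of the rule can be written as a small decision function. This Python sketch only illustrates that logic and is not Alquin's Lua code; the function name and the `discard` flag are hypothetical, the returned value stands in for the `event.level = #` assignment, and level 0 being dropped follows Alquin's note above.

```python
# Assumed UIM levels: 0 = clear (thrown away when set in pre-processing), 1 = informational.
DISCARD = 0
INFORMATIONAL = 1


def preprocess_level(level: int, server_pings: bool, discard: bool = False) -> int:
    """Return the level to publish for a robot-inactive event before it reaches the console."""
    if not server_pings:
        # Ping failed: leave the hub-assigned level (Critical in Alquin's setup).
        return level
    # Server answers, so only the robot is inactive: downgrade, or drop the event entirely.
    return DISCARD if discard else INFORMATIONAL


if __name__ == "__main__":
    print(preprocess_level(5, server_pings=False))               # prints 5
    print(preprocess_level(5, server_pings=True))                # prints 1
    print(preprocess_level(5, server_pings=True, discard=True))  # prints 0
```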

Looks good, especially since the hub has no built-in ping test. Using Pre-Processing Rules is a better solution than Profile Rules, as they are processed before the alarm is published. The trick is working with a limited command set, as only a small subset of the normal Lua methods is supported on events. Using os.execute to run the ping command was a good workaround.

Thanks,
Mike
7. Re: NAS script to ping inactive robots to check if just robot or robot and server are down

Michael Arnone
Posted Aug 29, 2017 05:11 PM

The UIM Developers should consider building this Ping Test solution into the product, so it's part of UIM Out-Of-The-Box, rather than everyone having to reinvent the wheel (or Copy & Paste from here if they manage to find this posting).
I don't think they understand how many people are having this problem with these ambiguous "Robot server_name is inactive" Alarms.
8. Re: NAS script to ping inactive robots to check if just robot or robot and server are down

GregPolenta
Posted Sep 11, 2017 01:17 PM

Care needs to be taken with these types of actions. Using Alquin's example, we have seen the os.execute running a `ping -n xx` cause a delay in nas processing. A ping against a device that is down can take nearly 10 seconds to complete (~1 second when the device responds). This delay holds up nas probe processing and will cause it to fall behind on its alarm queue if multiple robots are inactive.
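
One way to bound that delay is to send a single echo request with an explicit reply timeout and a hard cap on the whole call. This is a minimal Python sketch assuming the Windows `ping` (`-n`, `-w` in milliseconds) and Linux iputils `ping` (`-c`, `-W` in seconds) flags; other platforms (macOS, for instance) interpret `-W` differently, and both function names are hypothetical.

```python
import platform
import subprocess
from typing import List


def ping_command(host: str, timeout_ms: int = 1000, system: str = "") -> List[str]:
    """Build a single-probe ping command with an explicit reply timeout."""
    system = system or platform.system()
    if system == "Windows":
        # -n 1: one echo request; -w: reply timeout in milliseconds
        return ["ping", "-n", "1", "-w", str(timeout_ms), host]
    # Linux iputils (assumed): -c 1: one echo request; -W: reply timeout in whole seconds
    return ["ping", "-c", "1", "-W", str(max(1, timeout_ms // 1000)), host]


def host_responds(host: str, timeout_ms: int = 1000) -> bool:
    """True if the host answered; the subprocess timeout caps the worst case."""
    try:
        result = subprocess.run(ping_command(host, timeout_ms),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout_ms / 1000 + 2)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0
```

This only caps the wait: inside a single-threaded nas each capped ping still blocks the queue, so it reduces rather than removes the delay. On Windows, `ping` can exit with 0 on a "Destination host unreachable" reply, so the return code is only a hint there.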

The original script attached to this thread calls `alarm.list()`, which reads all alarms and tries to match the pattern. If your AO rule is already matching on the alarm message, it would be better to use `alarm.get()` instead. I am not sure whether the `action.ping` used in the initial script causes the same amount of delay.
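
The cost difference can be illustrated outside nas. In this Python sketch, the first function works on the one alarm that triggered the rule (the `alarm.get()` approach), while the second scans every open alarm (the `alarm.list()` approach), so its cost grows with the size of the console. The message pattern assumes the "Robot server_name is inactive" wording quoted later in this thread; real message text may differ, and both function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Assumed message format, based on the "Robot server_name is inactive" text quoted in this thread.
INACTIVE_RE = re.compile(r"^Robot (\S+) is inactive")


def robot_from_message(message: str) -> Optional[str]:
    """Extract the robot name from the single triggering alarm's message."""
    m = INACTIVE_RE.match(message)
    return m.group(1) if m else None


def robots_from_all(messages: Iterable[str]) -> List[str]:
    """Scan every open alarm message: work grows with the whole alarm list."""
    return [r for r in map(robot_from_message, messages) if r]
```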

Greg
9. Re: NAS script to ping inactive robots to check if just robot or robot and server are down

0 Recommend
Alquin
Posted Sep 12, 2017 09:47 AM

Hi All,

Greg is 100% right on this. nas is single-threaded, so any task you give it can cause the queue to back up if it is not completed in a timely manner. Coupled with the frequency of robot inactive alarms, this can take a significant toll on nas. I should have mentioned this, and I apologize.

For our installation we are looking at alternative ways of doing this, but based on the information above it has to be external to nas, using a polling method via the API to manipulate the alarms after they arrive.
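
An external poller of that kind might look like the following minimal Python sketch. It assumes hypothetical `ping` and `update` callables that wrap your ping check and the UIM alarm API, and alarm dicts with hypothetical `id` and `ip` keys; no real API call is shown. Running the pings in a thread pool means slow timeouts against dead hosts overlap instead of queuing one after another, and none of the work happens inside nas.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def poll_once(alarms: List[Dict[str, str]],
              ping: Callable[[str], bool],
              update: Callable[[str, bool], None],
              workers: int = 8) -> int:
    """Ping every robot-inactive alarm's server in parallel, then update each alarm.

    alarms: dicts with hypothetical 'id' and 'ip' keys fetched from the alarm API.
    update: hypothetical wrapper that rewrites alarm `id` given whether the server answered.
    Returns the number of alarms processed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Pings run concurrently; results come back in the same order as `alarms`.
        results = list(pool.map(lambda a: ping(a["ip"]), alarms))
    for alarm, server_up in zip(alarms, results):
        update(alarm["id"], server_up)
    return len(alarms)
```

A scheduler or simple loop would call `poll_once` on each polling interval with the robot-inactive alarms fetched since the last pass.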
