

CA on CA Tech Tip - Using the UIM API to close robot inactive alarms for down systems 

Nov 20, 2017 11:17 AM

Hi Community,

When CA Unified Infrastructure Management (UIM) robots go offline, an alarm is raised so we can fix the robot and restore monitoring. However, this alarm is really only useful when the server hosting the robot is up and functional. Rather than burden the nas probe with checking whether the robot host is down, we use the REST API to gather the alarms, check whether each system is down, remove the robot from its hub and close the alarm. The code is shared as part of our library of UIM REST API calls, which requires the webservices_rest package to be installed and configured. Along the way we will use UIM API calls to retrieve a filtered list of alarms, make a callback to a probe and, finally, close the alarm.

 

To do this, I created the Python script ackrobotinactive.py. Let's take a quick walk through it so you can understand it and see whether it can help you as well.

# Set up ping TTL (how long to wait) and retry count
ttl = 300
retry = 2

# Set up alarm filter
alarm_filter = {}
alarm_filter['level'] = 5
alarm_filter['probe'] = 'hub'
alarm_filter['subsystem_id'] = '1.2.2'

Above, we set up our ping settings: the number of retries and how long to wait before declaring the target host down. Then we set up our alarm filter. It says we are looking for critical (level 5) alarms from the hub probe with subsystem ID 1.2.2, which is how robot inactive alarms are sent by default. You can refine the filter as needed; see the documentation for the available fields.
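To make the filter semantics concrete, here is a minimal client-side sketch, assuming alarms are plain dicts that use the same keys as alarm_filter: an alarm matches only when every filter field is equal (a logical AND). In the real script the filtering is done by the REST API call, so matches_filter, apply_filter and the sample alarm records below are hypothetical illustrations, not part of cauimws.

```python
from typing import Any, Dict, List

# Hypothetical alarm record type; the real fields returned by get_alarms may differ.
Alarm = Dict[str, Any]


def matches_filter(alarm: Alarm, alarm_filter: Dict[str, Any]) -> bool:
    """True when every key in the filter is present in the alarm with an equal value."""
    return all(alarm.get(key) == value for key, value in alarm_filter.items())


def apply_filter(alarms: List[Alarm], alarm_filter: Dict[str, Any]) -> List[Alarm]:
    """Keep only the alarms that satisfy every criterion of the filter."""
    return [a for a in alarms if matches_filter(a, alarm_filter)]


alarm_filter = {'level': 5, 'probe': 'hub', 'subsystem_id': '1.2.2'}

sample_alarms = [
    {'id': 'AB-1', 'level': 5, 'probe': 'hub', 'subsystem_id': '1.2.2', 'source': '10.0.0.5'},
    {'id': 'AB-2', 'level': 3, 'probe': 'hub', 'subsystem_id': '1.2.2', 'source': '10.0.0.6'},
    {'id': 'AB-3', 'level': 5, 'probe': 'cdm', 'subsystem_id': '1.1.1', 'source': '10.0.0.7'},
]

robot_inactive = apply_filter(sample_alarms, alarm_filter)
print([a['id'] for a in robot_inactive])  # ['AB-1']
```

Only AB-1 survives: AB-2 is not critical and AB-3 comes from a different probe and subsystem.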

# Init the dict with UIM REST API information
uim_ws = {}
uim_ws['user'] = 'uim_web_service_user'
uim_ws['password'] = 'uim_web_service_user_password'
uim_ws['url'] = 'http://ump.ca.com/rest'
uim_ws['domain'] = 'uim_domain'

Above, we specify access to our UIM REST API, as in our previous example. Now to the fun part. First, we have to get the open robot inactive alarms:

alarms = get_alarms(uim_ws, alarm_filter)
for alarm in alarms:
   logging.info('%s --> %s', alarm['source'], alarm['message'])

As you can see, we call get_alarms in the cauimws library, which returns a list of the open alarms, and we log each alarm's source and message. Now that we have the list of robot inactive alarms, let's ping each alarm source and, if it is down, take action.
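The is_reachable helper itself is not shown in this walkthrough. As a hedged stand-in, the sketch below checks reachability with retries using a TCP connect instead of an ICMP ping (raw ICMP usually needs elevated privileges). The function name tcp_reachable, the port and the timeout are assumptions, not part of the original script. Choose a port the host itself always exposes, for example 22 for SSH on Linux or 3389 for RDP on Windows; probing the robot's own port would report a live host with a dead robot as down, which is exactly the case the script must leave alone.

```python
import socket


def tcp_reachable(host: str, port: int, retries: int = 2, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within 1 + retries attempts.

    Hypothetical stand-in for an ICMP ping: success means something on the
    host accepted the connection. Any failure (refused, timed out, name not
    resolvable) counts as one failed attempt.
    """
    for _ in range(retries + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            continue  # try again until the attempts are used up
    return False
```

In the loop, a call such as `tcp_reachable(alarm['source'], 22, retries=retry)` could take the place of is_reachable. The original ttl value is not mapped onto timeout here because its unit is not stated.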

# Check to see if alarm source is online
if is_reachable(alarm['source'], retry, ttl):
   logging.info('Device %s is online. Leaving alarm open', alarm['source'])
else:
   logging.warning(
      'Robot %s is offline. Removing from hub and acknowledging alarm',
      alarm['robot'][0]
   )
   # Get a list of all the UIM hubs
   hubs = get_hubs(uim_ws)

   # Search the hubs by name to find the robot hosting our hub
   hub_robot = find_hub_robot(hubs, alarm['hub'][0])

   # Remove the offline robot from the hub so it stops checking it
   remove_robot(uim_ws, alarm['hub'][0], hub_robot, alarm['robot'][0])

   # Close the robot inactive alarm
   # --> Robot will rejoin the hub when it comes back online
   acknowledge_alarm(uim_ws, alarm['id'])

Above, we simply ask whether the source of the robot inactive alarm is reachable. If it is, the robot itself most likely needs fixing, so the alarm stays open. If it is not, we find the hub that owns the robot and make a probe callback to that hub to remove the robot, which stops further robot inactive alarms for it. Once this is done, we can safely acknowledge the alarm and move on to the next one.

To get the most out of this, you will need to schedule it with your favorite job scheduler, be it CA Workload Automation - Autosys Edition, CA Automic, Windows Task Scheduler or, for simple cases like this, even the logmon probe (coming in another CA on CA Tech Tip).
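The decision logic of the loop can be condensed into one testable function. This is a hedged sketch, not the author's code: the REST calls are passed in as plain callables so the flow can be dry-run without a UIM environment, and find_hub_robot is a hypothetical reimplementation that assumes each hub record is a dict with 'name' and 'robot' keys (the real get_hubs output may look different).

```python
import logging
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical record shapes; the real cauimws data structures may differ.
Alarm = Dict[str, Any]
Hub = Dict[str, Any]


def find_hub_robot(hubs: Iterable[Hub], hub_name: str) -> Optional[str]:
    """Return the name of the robot hosting the hub called hub_name, or None."""
    for hub in hubs:
        if hub.get('name') == hub_name:
            return hub.get('robot')
    return None


def close_alarms_for_down_hosts(
    alarms: Iterable[Alarm],
    is_up: Callable[[str], bool],
    get_hubs: Callable[[], List[Hub]],
    remove_robot: Callable[[str, str, str], None],
    acknowledge_alarm: Callable[[str], None],
) -> List[str]:
    """Remove robots whose hosts are down and acknowledge their alarms.

    Returns the ids of the alarms that were acknowledged.
    """
    closed: List[str] = []
    hubs: Optional[List[Hub]] = None  # fetched once, and only if some host is down
    for alarm in alarms:
        if is_up(alarm['source']):
            logging.info('Device %s is online. Leaving alarm open', alarm['source'])
            continue
        if hubs is None:
            hubs = get_hubs()
        hub_name, robot_name = alarm['hub'][0], alarm['robot'][0]
        hub_robot = find_hub_robot(hubs, hub_name)
        if hub_robot is None:
            # Without the hub's robot we cannot address the callback, so skip safely.
            logging.error('Hub %s not found; leaving alarm %s open', hub_name, alarm['id'])
            continue
        logging.warning('Robot %s is offline. Removing from hub and acknowledging alarm', robot_name)
        remove_robot(hub_name, hub_robot, robot_name)
        acknowledge_alarm(alarm['id'])
        closed.append(alarm['id'])
    return closed


# Dry run with fake dependencies: web01 answers, db01 does not.
sample_alarms = [
    {'id': 'A1', 'source': 'web01', 'hub': ['hubA'], 'robot': ['web01']},
    {'id': 'A2', 'source': 'db01', 'hub': ['hubA'], 'robot': ['db01']},
]
removed, acked = [], []
closed = close_alarms_for_down_hosts(
    sample_alarms,
    is_up=lambda host: host == 'web01',
    get_hubs=lambda: [{'name': 'hubA', 'robot': 'hubA-robot'}],
    remove_robot=lambda hub, hub_robot, robot: removed.append((hub, hub_robot, robot)),
    acknowledge_alarm=acked.append,
)
print(closed)  # ['A2']
```

Two small design choices differ from the walkthrough: the hub list is fetched at most once per run instead of once per offline robot, and an alarm is left open when its hub cannot be found rather than calling remove_robot without a hub robot. To wire it to the library, wrap the cauimws calls, for example by passing `lambda hub, hub_robot, robot: remove_robot(uim_ws, hub, hub_robot, robot)` as the remove_robot argument.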

 

Both the Python library cauimws.py and the code above (ackrobotinactive.py) are available on GitHub. For the true programmers out there, please feel free to make improvements. I have tried to keep things as simple and readable as I can (mostly for my own sake), so no advanced Python features are used.

 

Hope you find it useful!

 

Disclaimer: I work for CA Technologies in the IT department, in the Tools and Automation Group. However, I can neither share any insight into product futures (typically I learn of product changes when you do) nor make the product management or development team change the product. I also recommend consulting CA Support and validating all changes in test environments.
