Automic Workload Automation

Monitoring an agent's status

1. Monitoring an agent's status

Legacy User
Posted Jun 28, 2016 09:55 AM

Should you need to know whether one agent or the other is up, you may use the SYS_HOST_ALIVE script command to get its current status.

You could even set up a recurring time event (EVNT.TIME) that checks the status of the agent(s) and restarts them or triggers an alert when one has gone down.

> From the !Process tab of the event, trigger a notification if the agent is down (the "ALARM.AGENT.DOWN" object needs to be created separately):

:SET &STATUS# = SYS_HOST_ALIVE("SAP01", "CONN.R3.ECC.ABAP")
:IF &STATUS# = 'N'
:  SET &ACT# = ACTIVATE_UC_OBJECT("ALARM.AGENT.DOWN")
:ENDIF

> Restart the agent with MODIFY_SYSTEM:

:SET &STATUS# = SYS_HOST_ALIVE("SAP01", "CONN.R3.ECC.ABAP")
:IF &STATUS# = 'N'
! Restart the same agent whose status was just checked
:  SET &START# = MODIFY_SYSTEM("STARTUP", "SAP01")
:ENDIF

Please note that this only works if the Service Manager record has been filled in correctly in the agent's properties in the System Overview.


You may also want to check the Agent Restarter if you want to be able to restart several agents at once.
2. Monitoring an agent's status

laura_albrecht_automic
Posted Jun 29, 2016 03:25 PM

Just curious - I looked at the Agent Restarter. It seems like that is a manual process. So you run the script, approve which agents to restart, etc. I'm assuming we could modify the process slightly so that it could run every x minutes and just go ahead and start any downed agents.
3. Monitoring an agent's status

Eric Felker
Posted Jun 29, 2016 04:12 PM

Laura Albrecht said:
Just curious - I looked at the Agent Restarter. It seems like that is a manual process. So you run the script, approve which agents to restart, etc. I'm assuming we could modify the process slightly so that it could run every x minutes and just go ahead and start any downed agents.

I wrote something like this that runs periodically (I think mine runs every 10 minutes) and restarts any offline agents that are marked for monitoring in the "VARA.AGENT.MONITOR" variable object. The first column of this VARA holds the exact agent name, and the second column acts as a boolean flag for whether to monitor that agent: "Y" means the agent is monitored, and any other value causes that line of the VARA object to be ignored.

The trouble I found is that the system is not always able to restart an agent. Depending on the agent's state, I've found they can get hung in such a way that the system's MODIFY_SYSTEM function fails to restart the agent. So it's not entirely reliable.

Here is my script -- not guaranteed to be the best way to do this!

(syntax highlighting is erroneous)
:SET &HND# = PREP_PROCESS_VAR(VARA.AGENT.MONITOR)
:PROCESS &HND#
:  P " "
:  PSET &AGENT_NAME# = GET_PROCESS_LINE(&HND#,1)
:  PSET &MONITOR_FLAG# = GET_PROCESS_LINE(&HND#,2)
! Check if restart flag is set in vara object
:  IF &MONITOR_FLAG# = "Y"
! Check agent's state/health
:    PSET &AGENT_STATE# = SYS_HOST_ALIVE(&AGENT_NAME#)
! Is agent online?
:    IF &AGENT_STATE# = "N"
:      P "Trying to restart &AGENT_NAME#"
! Restart agent
:      PSET &MOD_SYS_RC# = MODIFY_SYSTEM('STARTUP', &AGENT_NAME#)
:      P "Return code for restart was: &MOD_SYS_RC#"
! Send notification of this event
:      PSET &CALL_RC# = ACTIVATE_UC_OBJECT(CALL.MAIL.HTML.AGENT.MONITOR,,,,,PASS_VALUES)
:    ENDIF
:  ENDIF
:  P " "
:ENDPROCESS
:CLOSE_PROCESS &HND#
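
As a rough illustration of the selection logic in the script above, here is a small Python model of it (not Automic script). The function name `agents_to_restart`, the sample agent names, and the `is_alive` callback standing in for SYS_HOST_ALIVE are all assumptions made for this sketch. It only decides which agents would be restarted and does not talk to an Automation Engine.

```python
from typing import Callable, Iterable, List, Tuple


def agents_to_restart(rows: Iterable[Tuple[str, str]],
                      is_alive: Callable[[str], str]) -> List[str]:
    """Return the agents that are flagged for monitoring and reported down.

    rows     -- (agent name, monitor flag) pairs, modelling the two VARA columns
    is_alive -- hypothetical stand-in for SYS_HOST_ALIVE, returns "Y" or "N"
    """
    selected = []
    for name, flag in rows:
        # Only an exact "Y" marks the agent for monitoring; any other value skips the row.
        if flag != "Y":
            continue
        if is_alive(name) == "N":
            selected.append(name)
    return selected


# Hypothetical sample data: only UNIX01 is both monitored and down.
sample_rows = [("UNIX01", "Y"), ("WIN01", "N"), ("SAP01", "Y"), ("DB01", "")]
sample_status = {"UNIX01": "N", "WIN01": "N", "SAP01": "Y", "DB01": "N"}
```

Calling `agents_to_restart(sample_rows, sample_status.__getitem__)` selects only `UNIX01`: `WIN01` and `DB01` are down but not flagged, and `SAP01` is flagged but up.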

More recently I have tended to use the Service Manager dialog via CLI commands: attempt a graceful shutdown, followed by an abnormal kill if needed, of any agent that is no longer connected from the platform's standpoint, and then restart it, also through the Service Manager dialog CLI. This seems to be more reliable, but there are a lot of moving parts to get exactly right. So far I've only tried this with Linux agents.
4. RE: Monitoring an agent's status

Laura Albrecht
Posted Aug 12, 2019 10:07 AM

Hi. I'm posting to this kind of old thread because we're seeing a problem with SYS_HOST_ALIVE and just wondering if anyone else has seen / experienced it.

I implemented that process to run every 15 minutes to check the status of all the agents in the system. If SYS_HOST_ALIVE returns N, it then submits a job to try and restart the agent using the ucybsmgr program. This works great 99% of the time. Of course, if the service manager is down, it doesn't work, but otherwise, it does.

However, what we've started seeing recently (or maybe it didn't start recently and was only recently reported) is that people sometimes get alerts for an agent being down, but when they check, the agent is up. So basically, SYS_HOST_ALIVE returned an N and the job to restart the agent failed (because the agent was already running), even though the agent was in fact up.

In an attempt to get around what appears to be an intermittent network issue, I implemented a couple of additional checks / waits in my process. The process now works like this:

- Check the status of an agent using SYS_HOST_ALIVE.
- If it returns an N, it waits 15 seconds.
- Check the status of the agent using SYS_HOST_ALIVE.
- If it still returns an N, it submits the job to restart the agent.
- In the PreProcess tab of that restart job, it first waits 120 seconds before doing anything.
- Check the status of the agent using SYS_HOST_ALIVE.
- If it still returns an N, the command to restart the agent via ucybsmgr will run.
- Check the status of the agent using SYS_HOST_ALIVE.
- If it still returns an N, it will send an email / open up a ticket.
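
The sequence above amounts to "only act on an N that survives several re-checks". Below is a minimal Python model of that control flow (not Automic script). The names `confirmed_down` and `monitor_once`, and the `check`, `restart`, and `alert` callbacks, are hypothetical stand-ins for SYS_HOST_ALIVE, the ucybsmgr restart job, and the email / ticket step. The 15 and 120 second delays are taken from the list above.

```python
import time
from typing import Callable, Sequence


def confirmed_down(check: Callable[[], str],
                   delays: Sequence[float] = (15.0, 120.0),
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if the first check and every re-check report "N".

    The first check runs immediately; one further check runs after each
    delay in `delays`. A single "Y" ends the sequence early.
    """
    if check() != "N":
        return False
    for delay in delays:
        sleep(delay)
        if check() != "N":
            # The agent answered on a re-check, so the earlier "N" was transient.
            return False
    return True


def monitor_once(check: Callable[[], str],
                 restart: Callable[[], None],
                 alert: Callable[[], None],
                 delays: Sequence[float] = (15.0, 120.0),
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Run one monitoring pass and report what happened."""
    if not confirmed_down(check, delays, sleep):
        return "up"
    restart()
    # Final check after the restart attempt; alert only if it is still down.
    if check() == "N":
        alert()
        return "alerted"
    return "restarted"
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Note that this kind of re-checking only filters out an N that clears within the waiting period. It cannot help if SYS_HOST_ALIVE keeps returning N for an agent that is actually running.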

Even doing this, we're still experiencing the issue with the agent being up and SYS_HOST_ALIVE returning an N. It's not the end of the world, but it does cause a little extra work for the support team as they have to go close these tickets that are being erroneously opened and do troubleshooting on an agent that is technically fine.

Any thoughts / ideas? I've reached out to the Linux / Networking team to see if there is high CPU usage or anything else going on, but I'm not 100% clear on how exactly SYS_HOST_ALIVE works.

Thanks in advance.

------------------------------
Enterprise Scheduling Lead
Takeda
------------------------------

5. Monitoring an agent's status

Legacy User
Posted Jul 04, 2016 08:00 AM

LauraAlbrecht608310

Yes, that is correct: the Agent Restarter has to be triggered manually.

You can create an EVNT object that periodically checks the agents' statuses and restarts any that are down.

An easy way to get an agent's status is to use the SYS_HOST_ALIVE script command. It will then return either 'Y' or 'N'.

Here's a glimpse of what it could look like (the code should be set in the !Process tab of the EVNT):
:SET &STATUS# = SYS_HOST_ALIVE("MyAgent")
:IF &STATUS# = 'N'
:  SET &ACT# = MODIFY_SYSTEM("STARTUP", "MyAgent")
:ENDIF
6. Monitoring an agent's status

Legacy User
Posted Aug 31, 2016 05:59 AM

You can also use the UC_HOSTCHAR_DEFAULT variable with the following keys:

EXECUTE_ON_ASSIGNMENT
EXECUTE_ON_END
EXECUTE_ON_LOST
EXECUTE_ON_START

These keys let you react when an agent goes down.
You can use them to start any executable object, for example one that starts the agent again.
7. Monitoring an agent's status

VidyaPraveenBatchu605049
Posted Jan 03, 2017 01:15 AM

Christian Boeck said:
You can also use the UC_HOSTCHAR_DEFAULT variable with the following keys:

EXECUTE_ON_ASSIGNMENT
EXECUTE_ON_END
EXECUTE_ON_LOST
EXECUTE_ON_START

These keys let you react when an agent goes down.
You can use them to start any executable object, for example one that starts the agent again.

This is very helpful, as we get immediate notifications (which have to be created manually) when an agent ends or starts. I have used these objects in my previous assignments and they were accurate.
8. Monitoring an agent's status

Legacy User
Posted Feb 24, 2017 11:15 AM

Antoine Sauteron said:
Should you need to know whether one agent or the other is up, you may use the SYS_HOST_ALIVE script command to get its current status.

You could even set up a recurring time event (EVNT.TIME) that checks the status of the agent(s) and restarts them or triggers an alert when one has gone down.

> From the !Process tab of the event, trigger a notification if the agent is down (the "ALARM.AGENT.DOWN" object needs to be created separately):

:SET &STATUS# = SYS_HOST_ALIVE("SAP01", "CONN.R3.ECC.ABAP")
:IF &STATUS# = 'N'
:  SET &ACT# = ACTIVATE_UC_OBJECT("ALARM.AGENT.DOWN")
:ENDIF

> Restart the agent with MODIFY_SYSTEM:

:SET &STATUS# = SYS_HOST_ALIVE("SAP01", "CONN.R3.ECC.ABAP")
:IF &STATUS# = 'N'
! Restart the same agent whose status was just checked
:  SET &START# = MODIFY_SYSTEM("STARTUP", "SAP01")
:ENDIF

Please note that this only works if the Service Manager record has been filled in correctly in the agent's properties in the System Overview.

You may also want to check the Agent Restarter if you want to be able to restart several agents at once.

Additionally, you can add another script object that checks the status of the agent after a short wait and sends an email with the result:

:WAIT 30
:SET &STATUS# = SYS_HOST_ALIVE("enter agent name here")
:IF &STATUS# = 'Y'
:  SET &YES# = SEND_MAIL("enter email address here",,"Agent Restarted Successfully","Agent restarted")
:ELSE
:  SET &NOT# = SEND_MAIL("enter email address here",,"Agent NOT restarted Successfully","Agent NOT restarted")
:ENDIF
