Automic Workload Automation

Monitoring an agent's status

  • 1.  Monitoring an agent's status

    Posted 06-28-2016 09:55 AM
    If you need to know whether a given agent is up, you can use the SYS_HOST_ALIVE script function to get its current status.

    You could even have a recurring EVNT.TIME check the status of the agent(s) and restart an agent or trigger an alert if it has gone down.


    > From the PostProcess (!Process) tab of the event, trigger a notification if the agent is down (the "ALARM.AGENT.DOWN" object will need to be created separately):

    :SET &STATUS# = SYS_HOST_ALIVE("SAP01", "CONN.R3.ECC.ABAP")
    :IF &STATUS# = 'N'
    :  SET &ACT# = ACTIVATE_UC_OBJECT("ALARM.AGENT.DOWN")
    :ENDIF

    > Restart the agent with MODIFY_SYSTEM:

    :SET &STATUS# = SYS_HOST_ALIVE("SAP01", "CONN.R3.ECC.ABAP")
    :IF &STATUS# = 'N'
    :  SET &START# = MODIFY_SYSTEM("STARTUP", "SAP01")
    :ENDIF
    Please note this will only work if the Service Manager Record has been filled in correctly in the agent's properties in the System Overview.

    [Image: https://us.v-cdn.net/5019921/uploads/editor/t0/g9ijnzpxv0x0.png]

    You may also want to check the Agent Restarter if you want to be able to restart several agents at once.


  • 2.  Monitoring an agent's status

    Posted 06-29-2016 03:25 PM
    Just curious - I looked at the Agent Restarter.  It seems like that is a manual process.  So you run the script, approve which agents to restart, etc.  I'm assuming we could modify the process slightly so that it could run every x minutes and just go ahead and start any downed agents.


  • 3.  Monitoring an agent's status

    Posted 06-29-2016 04:12 PM
    I wrote something like this that runs periodically (I think mine runs every 10 minutes) and restarts any offline agents that are 'marked' for monitoring in the "VARA.AGENT.MONITOR" variable object. The first column of this VARA holds the exact agent name, and the second column acts as a boolean that says whether to monitor that agent or leave it out of the process. Agents flagged "Y" are monitored; any other value causes that line in the VARA object to be ignored.

    The trouble I found is that the system is not always able to restart an agent. Depending on the agent's state, I've found they can get hung in such a way that the system's MODIFY_SYSTEM function fails to restart the agent. So it's not entirely reliable.

    Here is my script -- not guaranteed to be the best way to do this!

    : SET &HND# = PREP_PROCESS_VAR(VARA.AGENT.MONITOR)
    : PROCESS &HND#
    : P " "
    :   PSET &AGENT_NAME# = GET_PROCESS_LINE(&HND#,1)
    :   PSET &MONITOR_FLAG# = GET_PROCESS_LINE(&HND#,2)
    !   Check if restart flag is set in vara object
    :   IF &MONITOR_FLAG# = "Y"
    !     Check agent's state/health
    :     PSET &AGENT_STATE# = SYS_HOST_ALIVE(&AGENT_NAME#)
    !     Is agent online?
    :     IF &AGENT_STATE# = "N"
    :       P "Trying to restart &AGENT_NAME#"
    !       Restart agent
    :       PSET &MOD_SYS_RC# = MODIFY_SYSTEM('STARTUP', &AGENT_NAME#)
    :       P "Return code for restart was: &MOD_SYS_RC#"
    !       Send notification of this event
    :       PSET &CALL_RC# = ACTIVATE_UC_OBJECT(CALL.MAIL.HTML.AGENT.MONITOR,,,,,PASS_VALUES)
    :     ENDIF
    :   ENDIF
    : P " "
    : ENDPROCESS
    : CLOSE_PROCESS &HND#

    More recently I have tended to use the service manager dialog via CLI commands instead. For any agent that is no longer connected from the platform's standpoint, I first try a graceful shutdown, follow it with an abnormal kill if needed, and then restart the agent, again through the service manager dialog CLI. This seems to be more reliable, but there are a lot of moving parts to get exactly right. So far I've only tried this with Linux agents.
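The order of operations described above (graceful stop, kill only if needed, then start) can be sketched independently of any tool. The following is a minimal Python simulation, not Automic script and not the poster's actual job. All five callables and the `grace_seconds` default are hypothetical stand-ins: in practice they would wrap whatever service manager CLI calls and process checks are used on the agent host.

```python
from typing import Callable


def recycle_agent(
    stop: Callable[[], None],
    kill: Callable[[], None],
    start: Callable[[], None],
    is_running: Callable[[], bool],
    sleep: Callable[[float], None],
    grace_seconds: float = 30.0,  # assumed grace period, not from the thread
) -> str:
    """Stop gracefully, kill only if the process survives, then start."""
    stop()                  # ask the agent to shut down cleanly
    sleep(grace_seconds)    # give it time to exit on its own
    killed = False
    if is_running():        # still there: escalate to a hard kill
        kill()
        killed = True
    start()                 # bring the agent back up in either case
    return "killed-then-started" if killed else "stopped-then-started"


if __name__ == "__main__":
    steps = []
    result = recycle_agent(
        stop=lambda: steps.append("stop"),
        kill=lambda: steps.append("kill"),
        start=lambda: steps.append("start"),
        is_running=lambda: True,          # simulate a hung agent
        sleep=lambda s: steps.append("wait"),
    )
    print(result, steps)
```

Passing the actions in as callables keeps the escalation logic testable without touching a real agent, which may help given how many moving parts are involved.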



  • 4.  RE: Monitoring an agent's status

    Posted 08-12-2019 10:07 AM
    Hi.  I'm posting to this kind of old thread because we're seeing a problem with SYS_HOST_ALIVE and just wondering if anyone else has seen / experienced it.

    I implemented that process to run every 15 minutes to check the status of all the agents in the system.  If SYS_HOST_ALIVE returns N, it then submits a job to try and restart the agent using the ucybsmgr program.  This works great 99% of the time.  Of course, if the service manager is down, it doesn't work, but otherwise, it does.

    However, what we've started seeing recently (or maybe it didn't start recently, but was only recently reported) is that people sometimes get alerts for an agent being down, but when they go to check, the agent is up.  So basically, SYS_HOST_ALIVE returned an N when in fact the agent was up, and the job to restart the agent failed because the agent was already running.

    In an attempt to get around what appears to be an intermittent network issue, I implemented a couple of additional checks / waits in my process.  The process now works like this:

    - Check the status of the agent using SYS_HOST_ALIVE.
    - If it returns an N, wait 15 seconds.
    - Check the status of the agent again using SYS_HOST_ALIVE.
    - If it still returns an N, submit the job to restart the agent.
    - In the job's PreProcess tab, first wait 120 seconds before doing anything.
    - Check the status of the agent using SYS_HOST_ALIVE.
    - If it still returns an N, run the command to restart the agent via ucybsmgr.
    - Check the status of the agent using SYS_HOST_ALIVE.
    - If it still returns an N, send an email / open a ticket.
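The staged checks above can be modelled as a small function to make the decision points explicit. This is a minimal sketch in Python rather than Automic script. `is_alive`, `restart`, and `sleep` are hypothetical stand-ins for SYS_HOST_ALIVE, the ucybsmgr restart command, and the waits, and the function name is made up for illustration.

```python
from typing import Callable, List


def check_and_recover(
    is_alive: Callable[[], bool],
    restart: Callable[[], None],
    sleep: Callable[[float], None],
) -> str:
    """Run the staged checks and report what happened.

    Returns "up" if any check before the restart finds the agent alive,
    "restarted" if it is alive right after the restart command,
    and "alert" if it is still reported down.
    """
    if is_alive():
        return "up"
    sleep(15)       # short wait to ride out a brief blip
    if is_alive():
        return "up"
    sleep(120)      # longer wait at the start of the restart job
    if is_alive():
        return "up"
    restart()       # three checks in a row said "down": try the restart
    if is_alive():
        return "restarted"
    return "alert"  # still down: send an email / open a ticket


if __name__ == "__main__":
    # Simulated status answers: down, down, then up after the long wait.
    answers = iter([False, False, True])
    waits: List[float] = []
    outcome = check_and_recover(lambda: next(answers), lambda: None, waits.append)
    print(outcome, waits)  # prints: up [15, 120]
```

Written this way, it is easy to see that the checks before the restart span roughly 135 seconds, so a false N that persists longer than that would still reach the restart and alert steps.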

    Even doing this, we're still experiencing the issue with the agent being up and SYS_HOST_ALIVE returning an N.  It's not the end of the world, but it does cause a little extra work for the support team as they have to go close these tickets that are being erroneously opened and do troubleshooting on an agent that is technically fine.

    Any thoughts / ideas?  I've reached out to the Linux / Networking team to see if there is high CPU usage or anything else going on, but I'm not 100% clear on how exactly SYS_HOST_ALIVE works.

    Thanks in advance.


    ------------------------------
    Enterprise Scheduling Lead
    Takeda
    ------------------------------



  • 5.  Monitoring an agent's status

    Posted 07-04-2016 08:00 AM
    LauraAlbrecht608310

    Yes, that is correct: the Agent Restarter has to be triggered manually.

    You can create an EVNT object that periodically checks the agents' statuses and restarts any agent that is down.

    An easy way to get an agent's status is to use the SYS_HOST_ALIVE script command. It will then return either 'Y' or 'N'.

    Here's a glimpse of what it could look like (the code should be set in the !Process tab of the EVNT):
    :SET &STATUS# = SYS_HOST_ALIVE("MyAgent")
    :IF &STATUS# = 'N'
    :SET &ACT# = MODIFY_SYSTEM("STARTUP", "MyAgent")
    :ENDIF




  • 6.  Monitoring an agent's status

    Posted 08-31-2016 05:59 AM
    You can also use UC_HOSTCHAR_DEFAULT with the keys:

    EXECUTE_ON_ASSIGNMENT
    EXECUTE_ON_END
    EXECUTE_ON_LOST
    EXECUTE_ON_START

    to react when an agent goes down.
    You can use these keys to start any executable object, for example to start the agent again.


  • 7.  Monitoring an agent's status

    Posted 01-03-2017 01:15 AM
    Christian Boeck said:
    You can also use UC_HOSTCHAR_DEFAULT with the keys:

    EXECUTE_ON_ASSIGNMENT
    EXECUTE_ON_END
    EXECUTE_ON_LOST
    EXECUTE_ON_START

    to react when an agent goes down.
    You can use these keys to start any executable object, for example to start the agent again.

    This is very helpful, as we get immediate notifications (which have to be created manually) when an agent ends or starts. I have used these objects in my previous assignments and they worked accurately.


  • 8.  Monitoring an agent's status

    Posted 02-24-2017 11:15 AM
    Additionally, you can add another script object that will check the status of the agent and send an email:

    :WAIT 30
    :SET &STATUS# = SYS_HOST_ALIVE("enter agent name here")
    :IF &STATUS# = 'Y'
    :  SET &YES# = SEND_MAIL("enter email address here",,"Agent Restarted Successfully","Agent restarted")
    :ELSE
    :  SET &NOT# = SEND_MAIL("enter email address here",,"Agent NOT restarted Successfully","Agent NOT restarted")
    :ENDIF