AutoSys Workload Automation

  • 1.  Autosys HA problem

    Posted Sep 08, 2020 03:43 AM
    version: 11.3.6.8

    We have AutoSys application servers running in HA mode across two different sites, A and B, and a server with an agent executing jobs in site C.

    If the Server in site A goes down (server turned off), Server in site B takes over as expected and jobs executing in site C return expected results.

    If the network in site A goes down (completely unavailable due to a site outage), the server in site B is able to contact and send jobs to the agent in site C, but does not get any response. Effectively, the job hangs.

    Can anyone think of why this could be happening?

    ------------------------------
    steve
    ------------------------------


  • 2.  RE: Autosys HA problem

    Posted Sep 10, 2020 11:33 AM
    You will need to review the AutoSys agent logs on one of the servers in C.

    Are the jobs going to Starting state and then staying there?  Or do they go to Running and then not respond when done?

    Regards,
    JoeP


  • 3.  RE: Autosys HA problem

    Posted Sep 17, 2020 06:59 AM
    Hi Joe, it's the second case you mention: the jobs go to Running and then do not respond when done. We have confirmed that the jobs do complete at the agent end. The network team is currently checking routing, but on your point, have you seen this behavior before?

    ------------------------------
    steve
    ------------------------------



  • 4.  RE: Autosys HA problem

    Posted Sep 17, 2020 07:05 AM

    It is either the firewall not being truly bi-directional (as opposed to stateful), or the agent needs to be restarted, or it is talking to the wrong scheduler (check agentparm.txt).
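
    One way to test the bi-directional point is to try opening a TCP connection from the agent machine in site C back to each scheduler. The snippet below is only a rough sketch: the host names and port in the usage note are placeholders, not values from this environment, and a successful connect only shows that the path is open at that moment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_targets(targets, timeout: float = 3.0) -> dict:
    """Map each (host, port) pair to whether a connection could be opened."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

    Run from the agent machine, something like check_targets([("sched-a.example.com", 7507), ("sched-b.example.com", 7507)]) would show whether the agent can reach both schedulers. Both host names and the port are hypothetical; use the values from your own agentparm.txt.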

    Steve C.





  • 5.  RE: Autosys HA problem

    Posted Sep 14, 2020 12:08 PM
    autoping -m ALL
    will let the machines know which EP is now in charge.
    Could the issue be that you are working with load balancers?
    Also be wary of agents where agentparm.txt may be read-only.
    Good luck.

    Steve C.


  • 6.  RE: Autosys HA problem

    Posted Sep 17, 2020 06:53 AM
    Thanks Steve. We use load balancing for the website but not for the connection out to the agents. Our network team is currently checking routing from the agent machines.

    ------------------------------
    steve
    ------------------------------



  • 7.  RE: Autosys HA problem

    Broadcom Employee
    Posted Oct 06, 2020 04:22 PM
    Hi Steve,

    When a scheduler starts, it sends a special message to the agent saying that all messages for its instance should be sent to it.  The agent's agentparm.txt file is updated if that information is new or changed.  That is how a scheduler that has taken over after a failover starts getting status.  If the primary is started back up, it should start sending those messages again.
    I would check the agentparm.txt file for those settings to be sure they are correct and, as Steve mentioned, check the transmitter.log to see if there are any errors when the agent tries to send messages back.
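
    As a rough sketch of that check, the snippet below pulls the manager-related entries out of agentparm.txt so they can be compared with the scheduler that is currently active. It assumes the file is plain key=value text and that the relevant keys begin with "communication.manager"; the sample values are made up for illustration and are not from this environment.

```python
def manager_settings(agentparm_text: str) -> dict:
    """Collect key=value entries whose key starts with 'communication.manager'."""
    settings = {}
    for raw in agentparm_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith("communication.manager"):
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical excerpt; real files will contain different hosts, ports and IDs.
SAMPLE = """\
# agentparm.txt (illustrative only)
agentname=AGENT_C
communication.manageraddress_1=sched-a.example.com
communication.managerport_1=7507
communication.managerid_1=ACE_SCH
"""

for key, value in manager_settings(SAMPLE).items():
    print(f"{key} = {value}")
```

    If the address shown still points at the site A scheduler after site B has taken over, that would be consistent with the agent sending its status messages to the wrong place.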

    Regards,
    Mike