AutoSys Workload Automation


Autosys 4.5 Event Scheduler Heartbeat Missing

  • 1.  Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 09:18 AM

    Hi.
    We are currently running the unsupported Autosys 4.5.0.G2 due to problems migrating our batch jobs to Autosys 11.3.x.
    Our existing EPs run without issue on Solaris 9 servers with Sybase ASE 12.5 event servers on Solaris 10.
    We are now trying to set up a new instance of Autosys 4.5.0.G2 with the EPs on Solaris 10 and Sybase ASE 15.5 event servers on RHEL 6.
    The EPs start OK, except for a couple of deadlock errors when trying to access the alamode table, and our jobs are then processed successfully.
    But then after some time we get "heartbeat missing, appears busy" errors in the EP log.
    Running truss on the relevant EP reveals it is polling, making repeated pollsys, fstat, getmsg, nanosleep, alarm, fstat & putmsg system calls.

    If anyone could assist with this issue it would be most appreciated. 



  • 2.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 09:37 AM

    Steve

     

    Not everything uses a heartbeat, and I'm unsure what issues you may be having migrating.

    Only the adapters really issue heartbeats, unless it's the EP heartbeat, and that could be a network issue.

     

    I am available btw, if you want to reach me offline - scarrobis@stirlingsystems.com

     

    Good Luck

     

    Steve C.



  • 3.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 12:50 PM

    Thanks Steve.   I have just received more errors in the EP log :

     

    [17:34:09.2802] [1] EP #7 heartbeat missing, appears busy. kill -9 1394 to restart.

    [17:34:09.2809] [1] EP #2 heartbeat missing, appears busy. kill -9 1389 to restart.
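
When these errors recur, it can help to pull the EP number and PID out of each line so they can be matched against ps or truss output. The following is a minimal Python sketch that assumes every line follows exactly the format of the two lines quoted above; the function name and the regular expression are illustrative and not part of Autosys.

```python
import re

# Pattern inferred only from the two log lines quoted in this thread;
# other Autosys releases may format the message differently.
HEARTBEAT_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\]\s+\[\d+\]\s+"
    r"EP #(?P<ep>\d+) heartbeat missing, appears busy\. "
    r"kill -9 (?P<pid>\d+) to restart\."
)


def parse_heartbeat_errors(lines):
    """Return (time, ep_number, pid) tuples for lines that match the pattern."""
    hits = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            hits.append(
                (match.group("time"), int(match.group("ep")), int(match.group("pid")))
            )
    return hits


sample = [
    "[17:34:09.2802] [1] EP #7 heartbeat missing, appears busy. kill -9 1394 to restart.",
    "[17:34:09.2809] [1] EP #2 heartbeat missing, appears busy. kill -9 1389 to restart.",
]

for time_stamp, ep, pid in parse_heartbeat_errors(sample):
    print(f"{time_stamp} EP #{ep} pid={pid}")
```

Lines that do not match are skipped, so the same function could be fed a whole EP log file opened with open().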

     

    As we have never had these errors on Solaris 9, and as the errors on our Solaris 10 servers begin after the EPs have been working correctly for some time, I suspect it could be a memory leak. Do you know if Autosys 4.5.0.G2 was supported on Solaris 10?

     

    Regards



  • 4.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 12:55 PM

    Upgrade to 4.5.1 if possible.

    There were issues in 4.0 with EP#0, and 4.5.0 had some issues of its own. It is possible something is killing the EPs.

    Not sure if you will be able to get 4.5.1 now, or whether it would be more prudent to get on the latest and greatest.

    Hope that helps.



  • 5.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 01:01 PM

    Check the event_demon.misc; it may have messages in there.



  • 6.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 10:54 AM

    You can try increasing the EPHeartbeatInterval.

    We are interested in helping to resolve your migration issues.

    Please consider engaging CA Services, Support or excellent independent consultants like Steve C.



  • 7.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 12:57 PM

    Thanks Mark.

     

    The EPHeartbeatInterval is already set to 20 minutes. I suppose I could set it much higher, but I think this would just delay me finding out there was a problem. The heartbeat missing errors appear to be bona fide, in that the EPs have stopped logging into the event servers and genuinely need restarting before they start processing jobs again:

     

    [17:34:09.2802] [1] EP #7 heartbeat missing, appears busy. kill -9 1394 to restart.

    [17:34:09.2809] [1] EP #2 heartbeat missing, appears busy. kill -9 1389 to restart.

     

    As I said to Steve C, I suspect it is a problem with Autosys 4.5 running on Solaris 10.

     

    Regards



  • 8.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 01:05 PM

    Also check /var/log or /var/adm/messages, whichever is on Solaris. It's been a wee bit.

    ;-)



  • 9.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 01:16 PM

    Version 4.5.0 G2 is the same as 4.5.1 G2.

    Solaris 10 is (was) supported.

    Sybase 15 however is (was) not supported.



  • 10.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 01:21 PM

    Yeah, I didn't see that they went to 15.

    Yep, 12.5 is as good as it should have been. It's possible the EP is timing out of the DB.

    Check the DBA logs.



  • 11.  Re: Autosys 4.5 Event Scheduler Heartbeat Missing

    Posted Oct 21, 2015 01:26 PM

    Steve & Mark have very valid observations.

    Based on your statements "couple of deadlock errors" & "kill -9 1394 to restart", I would check the database error logs [/usr/u/sybase/logs] and the Solaris logs [/var/log/syslog & /var/adm/messages].

    Are you able to locate the owner of PID 1394?
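
One way to go through those logs is to scan them for the strings already mentioned in this thread. The sketch below is a minimal Python example under stated assumptions: the keyword list is my own guess at useful search terms, the helper names are hypothetical, and the file paths are simply the ones given above, which may not exist on every host.

```python
from pathlib import Path

# Assumed search terms taken from the symptoms described in this thread.
KEYWORDS = ("deadlock", "heartbeat missing")


def scan_log(path, keywords=KEYWORDS):
    """Yield (line_number, text) for lines containing any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    with open(path, "r", errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            if any(k in text.lower() for k in lowered):
                yield lineno, text


def scan_logs(paths, keywords=KEYWORDS):
    """Scan each existing regular file in paths; missing paths and directories are skipped."""
    results = {}
    for candidate in paths:
        candidate = Path(candidate)
        if candidate.is_file():
            results[str(candidate)] = list(scan_log(candidate, keywords))
    return results
```

A call such as scan_logs(["/var/log/syslog", "/var/adm/messages"]) returns a dictionary keyed by file name. Because /usr/u/sybase/logs looks like a directory, the individual files inside it would need to be listed and passed in explicitly.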

    - Chris