AutoSys Workload Automation


 Filewatcher and Autosys agent behind VIP

marcovan posted Mar 19, 2025 02:10 PM

Question for anyone who has similar requirements or thoughts on this. We are testing the following setup:

We have two client machines, test1 and test2, behind the cluster VIP TEST. Both client machines share SPOOL. The cluster VIP is defined as a machine in Autosys as TESTVIP, and the JIL for the test jobs has machine: TESTVIP. Non-filewatcher jobs run on either test1 or test2 with no issue.

The problem is with filewatcher-type jobs. During testing, a filewatcher will intermittently remain in RU status after a kill job is issued and is never terminated. This one is part of a cyclic box; multiple runs of the filewatcher terminated successfully, but the last one kept running past the run window of the box, so the box remained in RU status.

Example from the log of a run that terminated successfully:

[03/14/2025 19:17:17]      CAUAJM_I_40245 EVENT: CHK_TERM_RUNTIME  JOB: MVTEST_ABCD_f
[03/14/2025 19:17:17]      CAUAJM_I_10082 [TESVIP connected for KILLJOB MVTEST_ABCD_f 392.2210347.1]
[03/14/2025 19:17:18]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: TERMINATED      JOB: MVTEST_ABCD_f MACHINE: TESTVIP
[03/14/2025 19:17:18]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBTERMINATED    JOB: MVTEST_ABCD_f MACHINE: TESTVIP EXITCODE:  9

 

Example from the log of the same filewatcher failing to terminate:

 

[03/14/2025 20:17:19]      CAUAJM_I_40245 EVENT: CHK_TERM_RUNTIME  JOB: MVTEST_ABCD_f
[03/14/2025 20:17:19]      CAUAJM_I_10082 TESTVIP connected for KILLJOB MVTEST_ABCD_f 392.2210597

 

As you can see, the KILLJOB connected but the status never changed to TERMINATED. This affected about 10 different test filewatchers and their respective boxes, and subsequent jobs were delayed because they did not trigger while the boxes were still in RU status.
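
For anyone wanting to spot this pattern without reading the log by eye, here is a minimal Python sketch that flags jobs whose KILLJOB connection is not followed by a TERMINATED status change. It assumes the log lines look like the excerpts quoted in this post; the regular expressions and the function name `unconfirmed_kills` are my own illustration, not anything shipped with Autosys.

```python
import re

# Patterns are derived only from the log lines quoted in this post (assumption:
# other releases or configurations may format these messages differently).
KILL_RE = re.compile(r"CAUAJM_I_10082\s+\[?\S+ connected for KILLJOB (\S+)")
TERM_RE = re.compile(r"EVENT: CHANGE_STATUS\s+STATUS: TERMINATED\s+JOB: (\S+)")


def unconfirmed_kills(lines):
    """Return sorted job names whose latest KILLJOB has no later TERMINATED status."""
    pending = set()
    for line in lines:
        kill = KILL_RE.search(line)
        if kill:
            # A kill was sent; wait for a matching TERMINATED event.
            pending.add(kill.group(1))
            continue
        term = TERM_RE.search(line)
        if term:
            # The kill was confirmed, so the job is no longer pending.
            pending.discard(term.group(1))
    return sorted(pending)
```

Passing the lines of the scheduler log (for example, an open file object) to `unconfirmed_kills` returns the job names that still need attention. It only reports the symptom; it does not tell you which client machine behind the VIP actually received the kill.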

From what it looks like, Autosys started the FW job on the test2 client machine, but when the term time was reached it issued the KILLJOB on the test1 client machine, which could not locate the job information.

I opened a case with Broadcom, but their suggestion was to run the filewatchers on a single machine instead of the cluster VIP machine name, and I already knew that answer.

I was just wondering whether anyone has run into something like this and has a possible solution that keeps the cluster VIP machine name for filewatchers.