AutoSys Workload Automation

  • 1.  Event management in Autosys

    Posted Sep 08, 2020 10:25 AM
    Hi!
    I have 2 instances of Autosys on Linux (dev and prod). There is a problem with event management on one of them. Event management used to work on both instances; now it works on only one, although I did not make any changes.
    For example, I need to send an SMS or an email if a job fails. So I use the cautil -f filename command to add the following definition to the internal Postgres database.
    define msgrec
           msgid="CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE JOB: NDST-?M* MACHINE: * EXITCODE:  *"
           type="MSG"
           cont='N'
           msgact='Y'
           wcsingle='?'
           wcmany='*'
           case="y"
           regexp="n"
    define msgact
           name=(*,100)
           action="UNIXSH"
           attrib="DEFAULT"
           color="DEFAULT"
           evaluate='Y'
           quiet='N'
           runid="autosys"
           status="ACTIVE"
           sim='N'
           text="sh /apps/CA/scripts/sms_autosys/sms_send.sh sou sou  ORO_Phone.lst ""NDST-SCN2_FAILURE: &7 SRV:&9 RC=&11"""
    There are about 20 different definitions like this, and each of them worked before.
    As far as I can see with unifstat, all services are in the running state. The internal Postgres database is working and the data in it is correct. Processes such as caiopr, oprsafd, event_demon, caiccid and others are also running.
    I have run out of ideas. Please help me solve this problem.
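
    To make the definition above easier to reason about, here is a rough sketch of what the msgid pattern and the &7, &9, &11 placeholders would select. It rests on two assumptions that are not confirmed in this thread: that shell glob matching behaves closely enough to the wcsingle/wcmany wildcards, and that &N refers to the Nth whitespace-separated word of the message. The function names and the sample message are hypothetical.

```shell
#!/bin/sh
# Sketch only: shell globbing stands in for Event Management's own matcher.

# Return 0 if the message text matches the msgid pattern, using shell glob
# rules ('?' = exactly one character, '*' = any run of characters).
msg_matches() {
    # $1 = pattern, $2 = message text
    case $2 in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Print whitespace-separated words 7, 9 and 11 of a message, which is how
# &7, &9 and &11 are ASSUMED to be resolved in the action text.
msg_tokens() {
    printf '%s\n' "$1" | awk '{ print $7, $9, $11 }'
}

pattern='CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE JOB: NDST-?M* MACHINE: * EXITCODE:  *'
# Hypothetical sample message, spaced like the pattern above.
sample='CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE JOB: NDST-XM01 MACHINE: host01 EXITCODE:  1'

if msg_matches "$pattern" "$sample"; then
    echo "match: $(msg_tokens "$sample")"
else
    echo "no match"
fi
```

    Under these assumptions, word 7 is the job name, word 9 the machine and word 11 the exit code, which lines up with the "&7 SRV:&9 RC=&11" text in the action.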


  • 2.  RE: Event management in Autosys

    Posted Sep 09, 2020 03:46 AM
    Hi

    You can have a look at the Event Management log file:
    - load environment
    .  /etc/profile.CA

    - extract today's log:
    cautil select conlog list conlog  > today.txt

    Do you see any CAUAJM_I_40245 message or any error message in today.txt?
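
    A quick way to answer that question, as a minimal sketch: count the lines of the extract that contain the message ID. The helper name count_msgid is hypothetical; today.txt is the file produced by the cautil command above.

```shell
#!/bin/sh
# Count the lines in a console-log extract that contain a message ID.
# $1 = message ID (matched literally), $2 = file produced by the cautil extract.
count_msgid() {
    # grep -c prints the count; it exits non-zero on zero matches, hence '|| true'.
    grep -cF -- "$1" "$2" || true
}

# Example call, run only when the extract from the step above exists.
if [ -f today.txt ]; then
    echo "CAUAJM_I_40245 lines: $(count_msgid CAUAJM_I_40245 today.txt)"
fi
```

    A count of 0 would suggest that the Scheduler messages are not reaching Event Management at all, rather than arriving and failing to match a message record.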

    Another explanation could be that Event Management was unavailable for a few minutes while the event_demon process was running, and the Scheduler stopped sending messages to Event Management. In that case, a stop/start of the Scheduler can solve this.


  • 3.  RE: Event management in Autosys

    Posted Sep 10, 2020 10:42 AM
    Thanks for the answer.

    There were no messages in today's log. I restarted all services, and then event management started to work.
    But a few hours later the WAAE Scheduler and CA-Event Manager went down for no apparent reason.
    autosyslog -e
    CAUAJM_W_00144 The CA WAAE shadow scheduler appears to have failed over prematurely. Consider a higher value for the HAPollInterval setting. Proceeding with shutdown...
    oplog
    p-autosys-app10.bnppua.net.intraroot.CAOP_I_DAEMONTERM Daemon terminated by signal 63583,caioprCAOPLinux-x
    I restarted the services again and now they seem to be working. I'd like to understand what could cause event management to crash, or the scheduler and event manager to shut down. As far as I know, there were no network or hardware problems.


  • 4.  RE: Event management in Autosys

    Posted Sep 14, 2020 04:21 AM
    Hi

    On the primary Scheduler, what do you see in the event_demon log file at the time when "CAUAJM_W_00144 The CA WAAE shadow scheduler appears to have failed over prematurely" appears on the Shadow?

    Does caiopr stop/crash before or after the Scheduler?




  • 5.  RE: Event management in Autosys

    Posted Sep 14, 2020 12:05 PM
    Just drop EM and go to SNMP traps. 

    just my 3 cents 
    Steve C.


  • 6.  RE: Event management in Autosys

    Posted Sep 15, 2020 07:57 AM
    I've found that event management always fails at 03:30 am, when the DBMaint script starts. The scheduler + EM crash was a one-off; there are now no failed processes in the unifstat output. I also see "caiopr" running in the ps output.
    event demon log:
    [09/15/2020 03:30:01] CAUAJM_I_40264 **** Internal Database Maintenance <Date:09/15/2020 03:30:01> ****
    [09/15/2020 03:30:01] CAUAJM_I_20218 Executing Command: exec $AUTOSYS/bin/DBMaint, DBMaint 0.

    [09/15/2020 03:30:55] CAUAJM_I_80021 The agent inventory service has evaluated the statuses of 21 machine(s) in 0.102 seconds.
    [09/15/2020 03:31:00] ----------------------------------------
    [09/15/2020 03:31:00] CAUAJM_I_40266 DBMaint is ended with: normal termination, exit status = 0
    dbmaint log
    DBMaint: Starting at: 09/15/20 03:30
    ------------------------------------------------------------------------
    CAUAJM_I_85000 Compliance Application running
    CAUAJM_I_85001 Compliance Application terminated successfully [0]
    CAUAJM_I_60007 Archiving Events...
    CAUAJM_I_60013 Number of Events Deleted: 37
    CAUAJM_I_60014 Archiving Job Runs...
    CAUAJM_I_60017 Number of job runs deleted: 9
    CAUAJM_I_60436 Archiving Machines...
    CAUAJM_I_60437 Number of Machines Deleted: 0
    CAUAJM_I_60018 Archiving Audit Info...
    CAUAJM_I_60021 Number of Audit Infos Deleted: 0
    CAUAJM_I_10186 No jobs to archive
    ------------------------------------------------------------------------
    09/15/20 03:30
    CAUAJM_I_60019 dbstatistics: Running dbstatistics at: Tue Sep 15 03:30:13 2020
    CAUAJM_I_60020 dbspace: Running dbspace at : Tue Sep 15 03:30:44 2020
    CAUAJM_I_60031 The AutoSys tables have used 19.0 MB disk space.
    ------------------------------------------------------------------------
    DBMaint: Finished at: 09/15/20 03:31

    Autosys version 11.3.6.8

    I've opened a case with tech support, but maybe someone has had the same problem.
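
    To line up the moment event management stops with the DBMaint run on other nights, one option is to pull the DBMaint start and end timestamps out of the event demon log. This is only a sketch: it assumes the log lines keep the "[MM/DD/YYYY HH:MM:SS] CAUAJM_I_40264 ..." and "... CAUAJM_I_40266 ..." layout shown above, and the function name dbmaint_window is hypothetical.

```shell
#!/bin/sh
# Print the timestamps of DBMaint start (CAUAJM_I_40264) and end
# (CAUAJM_I_40266) lines from an event demon log. $1 = path to the log file.
dbmaint_window() {
    awk '
        # The timestamp is the 19 characters right after the first "[".
        /CAUAJM_I_40264/ { print "start", substr($0, index($0, "[") + 1, 19) }
        /CAUAJM_I_40266/ { print "end", substr($0, index($0, "[") + 1, 19) }
    ' "$1"
}
```

    Called with the path of the event demon log, it prints one "start" and one "end" line per DBMaint run, which can then be compared with the time of the last message that still shows up in the Event Management log.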


  • 7.  RE: Event management in Autosys

    Posted Sep 30, 2020 09:57 AM
    Hi,

    You say that Event Management is crashing. Does that mean your Event Manager and AE scheduler server are the same server?

    Do you get any message on the Event Console after this crash?

    Normally we keep the Event Manager server separate from the AE server and just use the Event Agent on the AE server.

    Regards
    Pothen





  • 8.  RE: Event management in Autosys

    Posted Sep 30, 2020 10:09 AM
    Edited by Pothen Verghese Sep 30, 2020 10:23 AM
    Also, do you have duplicate message IDs among your message records? This could also cause problems. Normally we try to keep them unique.

    Perhaps you can share the full output of MsgRec.

    Rgds
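
    One way to check for duplicates before sharing the output, as a minimal sketch: scan the file that is loaded with cautil -f for msgid values that occur more than once. It assumes that "duplicate" means an identical msgid string and that each msgid sits on its own line, as in the definition in post 1; the function name duplicate_msgids is hypothetical.

```shell
#!/bin/sh
# List msgid values that appear more than once in a cautil definition file.
# $1 = file with "define msgrec" blocks, laid out like the one in post 1.
duplicate_msgids() {
    # Keep only the quoted value of each msgid="..." line, then let
    # sort | uniq -d print each value that occurs at least twice.
    sed -n 's/^[[:space:]]*msgid="\(.*\)"[[:space:]]*$/\1/p' "$1" | sort | uniq -d
}
```

    An empty result means no two message records in that file share exactly the same msgid text; overlapping wildcard patterns would not be caught by this check.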