ESP Workload Automation


Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

  • 1.  Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 13, 2019 01:23 AM

    Hi,

     

    Can someone please help me schedule a job that must be resubmitted 4 times on failure and, if it fails a fifth time, should create a Remedy ticket? (Remedy automation is already in place.)

     

    The alert should be triggered based on the alert code.



  • 2.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 13, 2019 08:45 AM

    If this is for ESP, there is a Cookbook entry. Look it over and see whether you could just increase the maximum number of resubmissions from 2 to 5:

     

    https://docops.ca.com/ca-workload-automation-esp-edition/11-4/en/examples-cookbook/resubmit-failed-job-a-maximum-of-two-times

     

    <JC>



  • 3.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 13, 2019 04:14 PM


    Can we trigger the alert (Notify failure statement) based on the return code of a job?

     

    Actually, we have been asked to auto-restart a failed job for a specific return code, with a time delay of 5 minutes and 3 attempts. We did that with an ALERT statement and the %MN variables, but we use the same ALERT mechanism (NOTIFY FAILURE statement) for incident generation. If the job fails with the specific return code, no incident should be generated for the first 2 failures; on the third failure, and for any other return code, an incident should be created.

     

     Any suggestions are welcome!  Thanks!  #resubmit#autorestart#notify
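
    The policy described above can be written down as a small decision function. This is only a sketch in Python, not ESP code: `RESTARTABLE_CODE`, `MAX_FAILURES`, `RESTART_DELAY_MIN` and `next_action` are made-up names for illustration, and the return code value is a placeholder.

```python
# Hypothetical values: the restartable return code is a placeholder.
RESTARTABLE_CODE = "4001"
MAX_FAILURES = 3          # the third failure raises an incident
RESTART_DELAY_MIN = 5     # delay before each automatic restart


def next_action(return_code: str, failure_count: int) -> str:
    """Decide what to do after a failure.

    failure_count is 1 for the first failure, 2 for the second, and so on.
    Returns "RESTART" (auto-restart after the delay) or "INCIDENT".
    """
    # Any other return code goes straight to an incident.
    if return_code != RESTARTABLE_CODE:
        return "INCIDENT"
    # The restartable code is retried until the failure limit is reached.
    if failure_count < MAX_FAILURES:
        return "RESTART"
    return "INCIDENT"
```

    With these values, the first two failures with the restartable code give "RESTART", and the third failure, or any failure with a different code, gives "INCIDENT".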



  • 4.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 14, 2019 07:48 AM

    I did this recently by notifying an event on job completion. The second event uses monitoring variables to "test" for the return code. However, in what I did, the job continues to be resubmitted until it gets a different return code. This also makes it easier to handle the time delay. I am not really sure how you could limit the number of resubmissions with this method. Perhaps you could use a resource, and have a task that marks the job as failed and 'trips' the ticket generation process once the resource is depleted?

     

    <JC>



  • 5.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 14, 2019 08:54 AM

    Hi, 

    I have multiple solutions for this. I am not sure which works best in your case. It has been a while since I looked at this.

    In the method below, the job fails and triggers the RSUB alert/appl. The job shows as failed in CSF, but the status field is changed to "RESUB BY AUTOMATION IN 10 MINS", and it restarts in 10 minutes. If it fails 4 times, the message on the job changes and the REMEDY job is submitted to create the Remedy ticket. The > 4 line in the code prevents the job from creating additional tickets if it is restarted manually and fails again.

     

    APPL DPRSUB05
    IF %MNSUB# > 4 THEN EXIT
    IF %MNSUB# = 4 THEN DO
        ESPNOMSG MGRMSG * . . . %MNFULLNAME/%MNAPPL..%MNAPPLGEN/MAIN +
        STATE FAILED STATUS('RESUB FAILED 3 TIMES, NEED OPERATOR') SETEND
        JOB REMEDY
            RUN ANY
        ENDJOB
        EXIT
    ENDDO
    ESPNOMSG MGRMSG * . . . %MNFULLNAME/%MNAPPL..%MNAPPLGEN/MAIN +
    STATE FAILED STATUS('RESUB BY AUTOMATION IN 10 MINS') SETEND
    JOB RESUB.%MNJOB
    RUN ANY
    RELDELAY 10
    ESPNOMSG AJ %MNFULLNAME RESUBMIT APPL(%MNAPPL..%MNAPPLGEN)
    ENDJOB
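
    To make the three branches easier to follow, here is the same decision logic as a Python sketch. It assumes %MNSUB# holds the submission number of the failed run (1 for the first submission). `rsub_action` is a made-up name, not an ESP function.

```python
def rsub_action(sub_count: int) -> str:
    """Mirror the branches of the DPRSUB05 appl for one failed submission.

    sub_count stands in for %MNSUB# (assumed: 1 = first submission).
    """
    if sub_count > 4:
        # Manual restart after the ticket was created: do nothing more.
        return "EXIT"
    if sub_count == 4:
        # Fourth failed submission: flag the job and run the REMEDY job.
        return "TICKET"
    # Submissions 1 to 3: resubmit after the delay.
    return "RESUB"


# Actions for five consecutive failures (the fifth being a manual restart).
actions = [rsub_action(n) for n in range(1, 6)]
```

    Under that assumption, the first three failures are resubmitted, the fourth creates the ticket, and any later failure is ignored by the automation.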

     

    I also have a more complex case where the job ends successfully the first 3 times and fails only on the 4th time. There are additional pieces in case the job is the last job in the appl or if it releases other jobs. These cases make it tricky.

     

    Don



  • 6.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 15, 2019 01:47 AM

    Hi Don, thanks very much for the response. Please find my questions below.

     

    Question 1:

     

    To generate incidents we use the alert mechanism, NOTIFY FAILURE ALERT(REMEDY). For the auto-restart setup I created my own alert, NOTIFY FAILURE ALERT(BAD). If I place the REMEDY alert inside the BAD proc, the MN variables are not populated.

    CODE:

     

    Procedure:

    Application Autorestart
    Job A
    RUN ANY
    NOTIFY FAILURE ALERT (BAD)
    ENDJOB

     

    Procedure for Bad:

     

    IF (%MNJOB='A' AND %MNCMPC EQ '4001') THEN DO
        REXXON
           TRACE N
           X=TRAPOUT('STS.')
           "ESPNOMSG LTJ %MNJOB I"
           X=TRAPOUT('OFF')
           PARSE VAR STS.3 "COMPLETED," V1 ","
           PARSE VAR STS.4 "COMPLETED," V2 ","
           PARSE VAR STS.5 "COMPLETED," V3 ","
           PARSE VAR STS.6 "COMPLETED," V4 ","
    /*     PARSE VAR STS.6 "COMPLETED," V5 "," */
     IF V1 = ' CC 4001' & V2 = ' CC 4001' & V3 = ' CC 4001' ,
     & V4 = ' CC 4001'  THEN
      DO
      NOTIFY FAILURE ALERT(REMEDY)  ==> (Incident not getting generated due to MN variables not found)
      END
      ELSE DO
     "ESPNOMSG AJ %MNJOB HOLD APPL(%MNAPPL..%MNAPPLGEN)"
     "ESPNOMSG AJ %MNJOB RESUB APPL(%MNAPPL..%MNAPPLGEN)"
     "ESPNOMSG AJ %MNJOB RESET +
       DELAYSUB('REALNOW PLUS 2 MIN') +
       APPL(%MNAPPL..%MNAPPLGEN)"
     "ESPNOMSG AJ %MNJOB RELEASE APPL(%MNAPPL..%MNAPPLGEN)"
      END
    REXXOFF
    EXIT
    ENDDO

     

    If job A fails with return code 4001, the automation should restart it, and on the 5th failure an incident should be generated. However, I am getting the error 'Application not found' because the monitor variables are not populated.
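
    For clarity, the check that the REXX section is trying to perform (the text after "COMPLETED," in each of four status lines must be CC 4001) can be sketched in Python as below. The layout of the LTJ output lines is not shown in this thread, so `SAMPLE` is an invented line, and `completion_field` and `all_failed_with` are made-up helper names.

```python
def completion_field(line: str) -> str:
    """Return the text between "COMPLETED," and the next comma.

    Mirrors: PARSE VAR line "COMPLETED," V ","
    Returns "" when "COMPLETED," is not present in the line.
    """
    after = line.partition("COMPLETED,")[2]
    # strip() stands in for REXX's blank-insensitive "=" comparison.
    return after.split(",")[0].strip()


def all_failed_with(lines, code="CC 4001"):
    """True only if there is at least one line and every line shows `code`."""
    lines = list(lines)
    return bool(lines) and all(completion_field(l) == code for l in lines)


# Invented example line; the real LTJ output format may differ.
SAMPLE = "SUBMITTED AT 10.01, COMPLETED, CC 4001, AT 10.02"
```

    Calling `all_failed_with` on four such lines returns True only when every one of them ended with CC 4001, which is the condition that should lead to the incident rather than another restart.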

     

    Please advise.

     

    QUESTION 2:

    Also, is there a history field that distinguishes between an auto-restart and a manual restart of a job? Basically, we need a reporting mechanism to quantify the success and failure rate of auto-restarted jobs.



  • 7.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 15, 2019 11:30 AM

    Hi, 

    I modified the code a bit. Try it out and let me know if there are any issues.

    There is no field that shows whether a job was restarted automatically or manually.

     

    APPL DPRSUB05

    /* ECHO 'MNSUBABC' %MNSUB# */
    IF %MNSUB# = 4 THEN DO
        ESPNOMSG MGRMSG * . . . %MNFULLNAME/%MNAPPL..%MNAPPLGEN/MAIN +
        STATE FAILED Status('RESUB FAILED TICKET CREATED')
        JOB REMEDY LINK PROCESS
            RUN ANY
        ENDJOB
    ENDDO
    IF %MNSUB# < 4 THEN DO
        ESPNOMSG MGRMSG * . . . %MNFULLNAME/%MNAPPL..%MNAPPLGEN/MAIN +
        STATE FAILED Status('RESUB IN 1 MINUTE')
        JOB RESUB.%MNJOB LINK PROCESS
            RUN ANY
            RELDELAY 1
            ESPNOMSG AJ %MNFULLNAME RESUBMIT APPL(%MNAPPL..%MNAPPLGEN)
        ENDJOB
    ENDDO



  • 8.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 15, 2019 10:58 AM

    Hi, 

    Let me play with this for a little bit. I will probably respond later today.... 



  • 9.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 22, 2019 05:24 PM

    Hi, 

    This is the code I ended up with. 

    IF %MNSUB# < 4 THEN DO
        ESPNOMSG MGRMSG * . . . %MNFULLNAME/%MNAPPL..%MNAPPLGEN/MAIN +
        STATE FAILED Status('RESUB IN 1 MINUTE')
        JOB RESUB.%MNJOB LINK PROCESS
            RUN ANY
            RELDELAY 1
            ESPNOMSG AJ %MNFULLNAME RESUBMIT APPL(%MNAPPL..%MNAPPLGEN)
        ENDJOB
    ENDDO

    IF %MNSUB# = 4 THEN DO
        ESPNOMSG MGRMSG * . . . %MNFULLNAME/%MNAPPL..%MNAPPLGEN/MAIN +
        STATE FAILED Status('RESUB FAILED TICKET CREATED')
        JOB REMEDY LINK PROCESS
            RUN ANY
        ENDJOB
    ENDDO



  • 10.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 22, 2019 05:30 PM

    In our environment, we use the alert mechanism for incident creation:

     

    NOTIFY FAILURE ALERT(REMEDY) 



  • 11.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Mar 22, 2019 05:38 PM

    Yes, 

    When the job fails it kicks off the ALERT(REMEDY) that triggers an application to build. 

    The code shown above is at the top of the application. 

    The first 3 times it reruns the job. 

    On the 4th time it submits the job below (also shown in the code above). This is the job that creates the ticket. The version here is just a dummy job; the real one is more complex.

       

    JOB REMEDY LINK PROCESS 
        RUN ANY 
    ENDJOB 



  • 12.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Apr 01, 2019 05:16 PM

    Hi Don,

     

    We need a reporting mechanism to quantify the success and failure rate of auto-restarted jobs. Is there any way to update the job status in CSF from the automation, so that we can distinguish between an auto-restart and a manual restart of a job?

     

    Suppose job A fails with a specific return code and the automation tries to restart it 3 times. If the job completes successfully under automation, the status of job A in CSF should be 'AUTORESTART COMPLETED'. If the job still fails after 3 attempts, the status should be 'AUTORESTART FAILED AFTER 3 ATTEMPTS'.



  • 13.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Apr 02, 2019 07:41 AM

    Hi, 

    The code given above contains two ESPNOMSG commands. These update the CSF status field. Change the text to whatever you would like.

    ESPNOMSG MGRMSG * . . . %MNFULLNAME/%MNAPPL..%MNAPPLGEN/MAIN + 
        STATE FAILED Status('RESUB FAILED TICKET CREATED') 



  • 14.  Re: Resub a job 4 times, if it fails for the fifth time, create a ticket in remedy

    Posted Apr 02, 2019 10:33 AM

    The message above is for the failed status. What about successful completion? Will the code below work for a successful completion?

     

    ESPNOMSG MGRMSG * . . . %MNFULLNAME/%MNAPPL..%MNAPPLGEN/MAIN + 
        STATE COMPLETED Status('AUTORESTART SUCCESS')