ESP Workload Automation

View Only

Back to discussions

Expand all | Collapse all

ESP: Job runs and abends in the same generation with the same job number

Jump to Best Answer

1. ESP: Job runs and abends in the same generation with the same job number

0 Recommend
Jonathan Calloway
Posted Nov 09, 2017 05:36 PM

Reply Reply Privately
Hello!

We have had an issue where a job seemingly ran twice in the same generation of an application. It failed, but also completed. Looking at the history, both instances of the job were given the same job number. I thought I would see if anyone in communities has seen a similar issue before I open a ticket. Below I have included information from the job history as well as entries for the job from the log:

JOB ZB395W01, MODEL MODEL1, OWNER NONE, 30 JOBS INDEXED, 30 MAX
JOB08792 ON RDR AT 01.02.30 THU 9NOV2017, COMPLETED, CC 0, 14.2 MINS_EXEC, 0:00 CPU
JOB08792 ON RDR AT 01.02.30 THU 9NOV2017, COMPLETED, CC 222, CCFAIL, 0.0 MINS_EXEC, 0:00 CPU
JOB08380 ON RDR AT 01.00.01 WED 8NOV2017, COMPLETED, CC 0, 20.5 MINS_EXEC, 0:00 CPU
JOB03176 ON RDR AT 01.00.00 TUE 7NOV2017, COMPLETED, CC 0, 13.3 MINS_EXEC, 0:00 CPU
JOB06944 ON RDR AT 01.00.01 MON 6NOV2017, COMPLETED, CC 0, 18.2 MINS_EXEC, 0:00 CPU
JOB03324 ON RDR AT 05.16.23 SUN 5NOV2017, COMPLETED, CC 0, 11.7 MINS_EXEC, 0:00 CPU
JOB04720 ON RDR AT 01.00.01 SAT 4NOV2017, COMPLETED, CC 0, 12.6 MINS_EXEC, 0:00 CPU

01.00.00 THU 9NOV17 ESP6260I ResoMGR: Job ZB395W01 in Appl FACETS.1109 Resources Satisfied
01.00.00 THU 9NOV17 ESP6633I APPLMGR: APPL FACETS.1109 JOB ZB395W01 READIED
01.00.00 THU 9NOV17 ESP6108I EVENTEX: ESPPROD.FACETS scheduled, for Job ZB395W01 in Appl FACETS.1109, Unconditional
01.00.30 THU 9NOV17 ESP6220I MgrMsg: . ZB395W01/FACETS.1109/MAIN State SUBDELAY Status(Transmission error)
01.01.58 THU 9NOV17 ESP6220I MgrMsg: bcapp602 ZB395W01/FACETS.1109/MAIN State EXEC SetStart Status(Executing at bcapp602) Jobno(2556)
01.01.58 THU 9NOV17 ESP6220I          JobLogId(C91B6FB7227BF5D0E16C0F4E5E43F92908744AF2) User(bcapp602) Host(bcapp602)
01.02.30 THU 9NOV17 ESP6220I MgrMsg: . ZB395W01/FACETS.1109/MAIN State READY Status(Agent Notified)
01.02.30 THU 9NOV17 ESP6220I MgrMsg: bcapp602 ZB395W01/FACETS.1109/MAIN State EXEC SetStart Status(Executing at bcapp602) Jobno(8792)
01.02.30 THU 9NOV17 ESP6220I          JobLogId(7978C65C04A2F7E5ED32340903C06BCE2E42AE32) User(bcapp602) Host(bcapp602)
01.02.30 THU 9NOV17 ESP6641I APPLMGR: APPL FACETS.1109 JOB ZB395W01 Failed
01.02.31 THU 9NOV17 ESP6015I AgentMR: 20171109 01023116+0500 PROD_MANAGER1 bcapp602 D3696D57.2D5BBEC7/A/QUERY RESPONSE SPOOLFILE D
01.02.31 THU 9NOV17 ESP6015I          ZB395W01/FACETS.1109/MAINStart date Thu Nov 09 01:01:58 2017------
01.04.43 THU 9NOV17 ESP6612I APPLMGR: APPL FACETS.1109 JOB ZB395W01 USER STATUS UPDATED BY T72363B
01.16.40 THU 9NOV17 ESP6220I MgrMsg: bcapp602 ZB395W01/FACETS.1109/MAIN State COMPLETE Cmpc(0) SetEnd
01.16.40 THU 9NOV17 ESP6220I          JobLogId(C91B6FB7227BF5D0E16C0F4E5E43F92908744AF2) User(bcapp602) Host(bcapp602)
01.16.40 THU 9NOV17 ESP6640I APPLMGR: APPL FACETS.1109 JOB ZB395W01 Now Complete
2. Re: ESP: Job runs and abends in the same generation with the same job number

0 Recommend
powdo03
Posted Nov 09, 2017 05:50 PM

Reply Reply Privately
Hi Jonathan,
In one shop I worked there was a requeue job that ran on the system. If there was contention when the job started(for example it needed a dataset that was in use) there was a process behind the scenes to S222 the job and put it back on the input queue on hold with the same job number. This prevented these jobs that couldn't run from holding initiators so other jobs could run. When the dataset became available the job was taken off of hold.

It could be something like this
2¢
Don
3. Re: ESP: Job runs and abends in the same generation with the same job number

0 Recommend
Jonathan Calloway
Posted Nov 09, 2017 05:54 PM

Reply Reply Privately
Don,

This is a distributed job, and doesn't use or require a dataset. Does that matter?
4. Re: ESP: Job runs and abends in the same generation with the same job number

0 Recommend
powdo03
Posted Nov 09, 2017 05:57 PM

Reply Reply Privately
This is not the issue I described. It was not a S222 it was a CC 222
5. Re: ESP: Job runs and abends in the same generation with the same job number

0 Recommend
Jonathan Calloway
Posted Nov 09, 2017 06:02 PM

Reply Reply Privately
The CC 222 is the result of output sent to STDOUT from the script. The output comes from a internally written scripting framework that tries to delete an existing log file on the server (in this case BCAPP602) before writing a new one. If the log was being used by a process, it cannot be deleted, so the script ends (and fails) with a an error code of 222.

In this case, what happened was the job started (first instance) then tried to run a second time, abending with a 222. The developer complained that they got a ticket (for the 222), but that the job actually completed, which it did.
6. Re: ESP: Job runs and abends in the same generation with the same job number

0 Recommend
Jonathan Calloway
Posted Nov 09, 2017 06:19 PM

Reply Reply Privately
I just found another instance of this that occurred on the same server at the same time last night. What's strange is that there is a "transmitter busy" message on both at the same time (01:30). I wonder if this is / was an agent issue?

JOB OJ400W01, MODEL MODEL1, OWNER NONE, 30 JOBS INDEXED, 30 MAX
JOB02472 ON RDR AT 01.02.30 THU 9NOV2017, COMPLETED, CC 0, 2.8 MINS_EXEC, 0:00 CPU
JOB02472 ON RDR AT 01.02.30 THU 9NOV2017, COMPLETED, CC 222, CCFAIL, 0.0 MINS_EXEC, 0:00 CPU

01.00.00 THU 9NOV17 ESP6633I APPLMGR: APPL OJ003DLY.2619 JOB OJ400W01 READIED
01.00.00 THU 9NOV17 ESP6108I EVENTEX: ESPPROD.OJJOBS scheduled, for Job OJ400W01 in Appl OJ003DLY.2619, Unconditional
01.00.30 THU 9NOV17 ESP6220I MgrMsg: . OJ400W01/OJ003DLY.2619/MAIN State SUBDELAY Status(Transmission error)
01.01.58 THU 9NOV17 ESP6220I MgrMsg: bcapp602 OJ400W01/OJ003DLY.2619/MAIN State EXEC SetStart Status(Executing at bcapp602) Jobno(3880)
01.02.30 THU 9NOV17 ESP6220I MgrMsg: . OJ400W01/OJ003DLY.2619/MAIN State READY Status(Agent Notified)
01.02.30 THU 9NOV17 ESP6220I MgrMsg: bcapp602 OJ400W01/OJ003DLY.2619/MAIN State EXEC SetStart Status(Executing at bcapp602) Jobno(2472)
01.02.30 THU 9NOV17 ESP6220I          JobLogId(DA15BE771D9A7AEB86A8517F19E94686BD5075E6) User(bcapp602) Host(bcapp602)
01.02.30 THU 9NOV17 ESP6220I MgrMsg: bcapp602 OJ400W01/OJ003DLY.2619/MAIN State FAILED Cmpc(222) SetEnd
01.02.30 THU 9NOV17 ESP6641I APPLMGR: APPL OJ003DLY.2619 JOB OJ400W01 Failed
01.04.55 THU 9NOV17 ESP6612I APPLMGR: APPL OJ003DLY.2619 JOB OJ400W01 USER STATUS UPDATED BY T72363B
01.05.19 THU 9NOV17 ESP6220I MgrMsg: bcapp602 OJ400W01/OJ003DLY.2619/MAIN State COMPLETE Cmpc(0) SetEnd
01.05.19 THU 9NOV17 ESP6220I MgrMsg: bcapp602 OJ400W01/OJ003DLY.2619/MAIN State COMPLETE Cmpc(0) SetEnd
01.05.19 THU 9NOV17 ESP6220I          JobLogId(A530278EF46C38D59A00A633152FCFCF42EC1424) User(bcapp602) Host(bcapp602)
01.05.19 THU 9NOV17 ESP6640I APPLMGR: APPL OJ003DLY.2619 JOB OJ400W01 Now Complete
01.05.19 THU 9NOV17 ESP6610I APPLMGR: APPL OJ003DLY.2619 JOB OJ400W01 FORCED COMPLETE BY T72363B
7. Re: ESP: Job runs and abends in the same generation with the same job number
Best Answer

1 Recommend
Broadcom Employee

Lucy Zhang
Posted Nov 10, 2017 08:34 AM

Reply Reply Privately
Hi Jonathan,

I believe here is what happened:
- ESP sent a request to the agent, and agent didn't reply as needed within 30 seconds, so the related job was marked with Transmission error or transmitter busy;
- Then agent replied after 1.58 min (very close to 2 mins) that now the job was executing;
- ESP however tried to resend it after two mins.

This will cause two runs on agent server, the initial request and the resent one. But since both are pointed to the same job, the last update will take place.

You can check related agent receiver.log and transmitter.log, where the timestamps are more accurate.

The solution that resolved similar problems is to increase RETRYINTERVAL on related AGENT definition, the default is 2 mins. So that ESP won't resend the request before or around the same time when the connection to agent is resumed.

Hope this helps,

Lucy
8. Re: ESP: Job runs and abends in the same generation with the same job number

0 Recommend
Jonathan Calloway
Posted Nov 10, 2017 09:35 AM

Reply Reply Privately
Lucy,
Thank you very much for this! We are going to monitor to see if this continues to happen, and if so, we will have the server owner make the change you suggested.

Thanks again!

ESP Workload Automation

ESP: Job runs and abends in the same generation with the same job number

Jonathan CallowayNov 09, 2017 05:36 PM

powdo03Nov 09, 2017 05:50 PM

Jonathan CallowayNov 09, 2017 05:54 PM

powdo03Nov 09, 2017 05:57 PM

Jonathan CallowayNov 09, 2017 06:02 PM

Jonathan CallowayNov 09, 2017 06:19 PM

Lucy ZhangNov 10, 2017 08:34 AMBest Answer

Jonathan CallowayNov 10, 2017 09:35 AM

1. ESP: Job runs and abends in the same generation with the same job number

2. Re: ESP: Job runs and abends in the same generation with the same job number

3. Re: ESP: Job runs and abends in the same generation with the same job number

4. Re: ESP: Job runs and abends in the same generation with the same job number

5. Re: ESP: Job runs and abends in the same generation with the same job number

6. Re: ESP: Job runs and abends in the same generation with the same job number

7. Re: ESP: Job runs and abends in the same generation with the same job number Best Answer

8. Re: ESP: Job runs and abends in the same generation with the same job number

7. Re: ESP: Job runs and abends in the same generation with the same job number
Best Answer