ESP Workload Automation


 Intermittently not getting the SetStart Status message after "agent notified"

Keith Grizzell posted Dec 09, 2025 04:08 PM

Hello, we have encountered a peculiar issue on one of our model test servers. Intermittently, the SetStart status message doesn't come through after the "agent notified" message, and the jobs are stuck in "ready" status until I recycle the ESP services. Once the recycle is complete, jobs run successfully for several hours or even days, then the issue pops up again all of a sudden. No patching or upgrades have been done on this server recently, so I'm curious what it could be. I would appreciate any thoughts or suggestions. Example from the syslog below.

Normal successful run at 14:37 today:

ESP6633I APPLMGR: APPL MDLESB64.139486 JOB OSMEC064 READIED

ESP6108I EVENTEX: SPRODID.MDLESB64 scheduled, for Job OSMEC064 in Appl MDLESB64.139486, Unconditional

ESP6125I EVENTEX: PARMS: ESPTRUSR(SPRODID)ESPTRTYPE(P)RESUB_USER()ESPFTFILE('\\gmcc.grange.local\ftp\Data\Paperless

ESP6125I EVENTEX:        MDL\PaperlessSendPaperCommunication\Daily_Bounce_Manifest.xml')

ESP6220I  MgrMsg: . OSMEC064/MDLESB64.139486/MAIN State READY Status(Agent Notified)

ESP6220I  MgrMsg: VMINAPS01 OSMEC064/MDLESB64.139486/MAIN State EXEC SetStart Status(Executing at VMINAPS01)

This one was stuck in Ready at 14:50:

ESP6633I APPLMGR: APPL MDLESB64.139488 JOB OSMEC064 READIED

ESP6108I EVENTEX: SPRODID.MDLESB64 scheduled, for Job OSMEC064 in Appl MDLESB64.139488, Unconditional

ESP6125I EVENTEX: PARMS: ESPTRUSR(SPRODID)ESPTRTYPE(P)RESUB_USER()ESPFTFILE('\\gmcc.grange.local\ftp\Data\Paperless

ESP6125I EVENTEX:        MDL\PaperlessSendPaperCommunication\Daily_Bounce_Manifest.xml')

ESP6220I  MgrMsg: . OSMEC064/MDLESB64.139488/MAIN State READY Status(Agent Notified)

No SetStart message after "Agent Notified".

Rick Romanowski

Have you checked the transmitter log on the server in question?

Check that IP connectivity exists between server and ESP Workload Manager

For Windows, use PowerShell's Test-NetConnection:
1. Open PowerShell (Start > PowerShell).
2. Run: Test-NetConnection <IP_Address_or_Hostname> -Port <Port_Number>
   (e.g., Test-NetConnection <ESP-WLM Address> -Port <ESP-WLM Port>).
3. TcpTestSucceeded: True means the port is open.
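If PowerShell isn't handy, or you want to script the check, the same TCP test can be done with the Python standard library. This is just a generic sketch; the host and port you pass in would be your actual ESP Workload Manager address and port, which I obviously don't know here:

```python
# Minimal cross-platform equivalent of Test-NetConnection's TCP test,
# using only the Python standard library.
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failure, connection refused, or timeout all land here.
        return False
```

Calling `tcp_port_open("your-esp-wlm-host", your_port)` returning True corresponds to `TcpTestSucceeded: True` in the PowerShell output.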

Broadcom Employee Lucy Zhang

Hi Keith,

Would suggest that you open a support case on this.

As you may already know, after the WOB request is sent to the agent, ESP waits for the status update from the agent. So if the expected "EXEC SetStart" message doesn't show up in the ESP audit log, the possible causes include:
- The connection from the agent to ESP has stopped, so the message couldn't be sent; you can check the connection as Rick R mentioned above.
- The agent didn't get the related update from the server OS (or from other applications, if business agents are used), and therefore never created the message.

Check whether the agent transmitter log contains the message; if it doesn't, more research needs to be done on the agent side.

Hope this helps.

Lucy

Keith Grizzell

Thanks Rick and Lucy. Unfortunately, I do not have permissions on this server beyond our cyb_jobs folder and stopping/starting the ESP services. I have created an internal ticket for our server team to provide me with the logs and to test the connection between the server and the ESP Workload Manager. If we cannot determine the issue from those, I will open a support ticket per Lucy's suggestion above.

Keith Grizzell

Update: the server team found no connection errors in the logs. They did find that a drive containing the PowerShell scripts our ESP jobs kick off had zero free space. Since the server team increased the free space, we have not encountered the issue. We still need to verify whether the drive space was the actual cause.

Broadcom Employee Lucy Zhang
Hi Keith,
 
Thank you for your feedback.
 
That reminds me of another possible root cause. As described in the knowledge doc below, it can be related to there being no space to create the agent WOB spool output:
https://knowledge.broadcom.com/external/article?articleId=420527
 
Regards,
 
Lucy
Broadcom Employee Chris_Elvin

In addition to Lucy's answer, if disk space on the drive/partition where the agent is installed is indeed the root cause of the problem then you might wish to consider setting up the agent to send alerts to the ESP logs (or generate an SNMP trap) when disk space gets low. https://techdocs.broadcom.com/us/en/ca-enterprise-software/intelligent-automation/workload-automation-system-agent/24-1/configuring/configure-the-agent/configure-the-agent-to-monitor-available-disk-space.html gives more details.

Other techniques for managing spool space (including Lucy's suggestion) are found at https://techdocs.broadcom.com/us/en/ca-enterprise-software/intelligent-automation/workload-automation-system-agent/24-1/configuring/maintain-spool-and-log-files/spool-file-maintenance.html
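If the agent's built-in disk monitoring can't be enabled right away, a stand-alone free-space check on the spool drive can be scripted with the Python standard library. This is a generic sketch, not part of the agent or its configuration; the path and threshold you pass in are up to you:

```python
# Hedged sketch: a stand-alone low-disk-space check using only the Python
# standard library, for environments where the agent's built-in disk
# monitoring described above is not yet configured.
import shutil


def free_space_mb(path: str) -> float:
    """Return free space on the filesystem containing `path`, in MB."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 * 1024)


def check_threshold(path: str, min_free_mb: float) -> bool:
    """True if free space on `path`'s filesystem is at or above the threshold."""
    return free_space_mb(path) >= min_free_mb
```

A scheduled job could call `check_threshold` against the agent install/spool drive and raise an alert (or run a cleanup) when it returns False.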

Rick Romanowski

You could set up a DISK_MON Workload Object, then check available space before running the job. If space is not available, run a cleanup script and then run the job.

The link below describes an ESP PROC for how to check space after DISK_MON completes.

https://community.broadcom.com/communities/community-home/digestviewer/viewthread?GroupId=1903&MID=768720&CommunityKey=a63272f0-fb9f-44be-b0ff-9657f904076e

Keith Grizzell

Thank you all for the suggestions. We now believe the issue was caused by the lack of free space, and the drive in question was indeed the one the agent is installed on. It looks like when the agent was installed it was set to not delete spool files. When I drilled down into the spool folder, we had spool files in there from as far back as 2020, when the server was first set up. I deleted some 800,000 old spool files to get down to just the last 10 days' worth. I checked the production counterpart, and it was set up to delete spool files, so there are several gigabytes of free space on that one. The server team has already set up monitoring for space on this drive, and I am looking into implementing the suggestions in this post.
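For anyone facing a similar backlog, the age-based cleanup described above (keep only the last N days of files) can be sketched with the Python standard library. This is a generic illustration, not the agent's own spool-maintenance mechanism; the root path is whatever your spool directory is, and the function dry-runs by default so you can review the list before deleting anything:

```python
# Hedged sketch: list (and optionally delete) files under a directory tree
# that are older than a given number of days. Dry-run by default.
import time
from pathlib import Path


def purge_old_files(root: str, keep_days: int, dry_run: bool = True) -> list:
    """Return paths of files older than keep_days under root; delete them
    only when dry_run is False. Directories are left in place."""
    cutoff = time.time() - keep_days * 86400
    removed = []
    for p in Path(root).rglob("*"):
        if p.is_file() and p.stat().st_mtime < cutoff:
            removed.append(str(p))
            if not dry_run:
                p.unlink()
    return removed
```

Running it once with `dry_run=True` against the spool directory shows what would go; a second call with `dry_run=False` performs the deletion.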