Automic Application Manager Version: 9.1.3_28572_28626
Fiserv DNA: Y
Oracle RAC for DNA on 3 Nodes
Automic Master, Automic Agents, and DNA File Share (Output path) are virtual systems.
We are currently seeing nightly batch jobs abort with exit code 255 on most nights, which I believe is a communication-related issue. We have had a Fiserv specialist look into it and reach out to Automic on our behalf, but there has been no real progress, and I'm hoping the community can help me narrow down the issue.
Based on the MS_QRSLT output and the Task Details, I believe Automic is getting through the SQT processing and then failing on the final build of the output file (see attached Task_Details_EF_PBGEN.jpg and Queue_Folder_Contents.jpg).
Start Date/Time      Stop Date/Time       Elapsed Minutes  Return Code  Checkpoint Information
10-09-2018 19:00:06  10-09-2018 19:29:15  29.150           0            Positive Balance File Creation Program Complete
Prog Info : (NTWkNode=2677, OraUID=63, OraSession=)RETURNCODE=0
And here are example details from the EF_PBGEN log:
===== ===== ===== ===== ===== MS_QRSLT REPORT ENDS ===== ===== ===== ===== =====
Combining SQT OS return code (255) and MS_QUERR over-threshold code (0) and MS_QRSLT return code (0) -------to arrive at overall return code: (255).
Processing sqr.log output file (deleting if size less than or equal to 6): sqr.log file size is 0. Deleting empty/small sqr.log file.
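For anyone triaging a pile of these logs, here is a minimal Python sketch that pulls the three component codes out of the "Combining ..." line and reports which one is non-zero. The regular expression is built only from the wording of the log line quoted above, so it is an assumption that other DNA/Automic versions word the line the same way. The function names (parse_combined_codes, nonzero_sources) are made up for illustration and are not part of Automic or DNA.

```python
import re

# Pattern derived from the single "Combining ..." line quoted in this post.
# Assumption: other logs use the same wording.
COMBINE_RE = re.compile(
    r"SQT OS return code \((\d+)\).*?"
    r"MS_QUERR over-threshold code \((\d+)\).*?"
    r"MS_QRSLT return code \((\d+)\).*?"
    r"overall return code: \((\d+)\)"
)


def parse_combined_codes(line):
    """Return the component and overall codes as ints, or None if no match."""
    match = COMBINE_RE.search(line)
    if match is None:
        return None
    sqt_os, ms_querr, ms_qrslt, overall = (int(g) for g in match.groups())
    return {
        "sqt_os": sqt_os,
        "ms_querr": ms_querr,
        "ms_qrslt": ms_qrslt,
        "overall": overall,
    }


def nonzero_sources(codes):
    """List the component codes that are non-zero."""
    return [k for k in ("sqt_os", "ms_querr", "ms_qrslt") if codes[k] != 0]


# The log line from this post, joined onto one line.
sample = (
    "Combining SQT OS return code (255) and MS_QUERR over-threshold code (0) "
    "and MS_QRSLT return code (0) -------to arrive at overall return code: (255)."
)

codes = parse_combined_codes(sample)
print(codes)
print(nonzero_sources(codes))
```

On the line above, only the SQT OS return code is non-zero, which matches the picture of the SQL side finishing cleanly (MS_QRSLT and MS_QUERR both 0) and the failure coming from the operating-system level of the job.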
Has the community encountered anything like this?
Do you do port scanning? If so, for Automic and DNA related systems/processes, which ports do you exclude?
Do you do virus scanning? If so, for Automic and DNA related systems/processes, which folders do you exclude?
Thank you for any assistance and please let me know if you need more detail.
Were you able to get this sorted out? Sorry for adding to this post so late; I only just got notifications set up again for the Automic Community since it moved over to CA. I believe you are correct that it is a communication issue. We have seen this before and were never able to pinpoint the root cause. It hasn't happened in a while, so it has fallen off our radar. I agree that something is interrupting Automic while it writes to the DNA file share. Does it happen at certain times of the day? Did anything in your environment change around the time you first started seeing these aborts?
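If it helps with the time-of-day question, a rough Python sketch like the one below can bucket the stop times of aborted runs by hour. It assumes the timestamps are month-day-year, as they appear to be in the Task Details above. Only the first timestamp in the sample list comes from this thread; the other two are invented placeholders to show the shape of the output, and aborts_by_hour is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

# Assumption: timestamps are MM-DD-YYYY HH:MM:SS, as in the Task Details.
STAMP_FORMAT = "%m-%d-%Y %H:%M:%S"


def aborts_by_hour(stop_times):
    """Count aborted runs per hour of day (0-23), sorted by hour."""
    hours = Counter()
    for stamp in stop_times:
        hours[datetime.strptime(stamp, STAMP_FORMAT).hour] += 1
    return dict(sorted(hours.items()))


stop_times = [
    "10-09-2018 19:29:15",  # stop time from the Task Details in this thread
    "10-10-2018 19:31:02",  # placeholder, not real data
    "10-11-2018 02:05:40",  # placeholder, not real data
]

print(aborts_by_hour(stop_times))
```

If the counts cluster in one or two hours, that window is worth comparing against backup, scan, and patch schedules on the Automic and file share hosts.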