Automic Workload Automation


  • 1.  V24.4.3 WPs crash after DB reconnect

    Posted 28 days ago
    Hello everyone,
     
    In our V24.4.3 environment, the WPs crash during the DB reconnect after a connection loss.
     
    This happens without any log output or trace.
    It seems to be a known issue.
     
    Has anyone else ever had this problem?
    BR
    Ralf


    -------------------------------------------


  • 2.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 27 days ago

    Hello,

    We encountered the same problem, but in our case the corresponding messages were visible in the WP logs.

    This is a known issue and has already been reported to Broadcom.

    Regards

    Stephan

    -------------------------------------------



  • 3.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 27 days ago

    Hello Stephan,

    thanks for the information.

    It's bad that this is happening to customers. It seems that Broadcom doesn't really do internal testing.

    Greetings

    Ralf

    -------------------------------------------



  • 4.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 26 days ago

    👍

    -------------------------------------------



  • 5.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 25 days ago

    Hi Ralf, hi Stephan,

    I did a quick test in our v24.4.3 environment (Windows + MS SQL Server), and the reconnect worked properly; the WP processes stayed online:

    20260213/153435.161 - U00029108 UCUDB: SQL_ERROR    Database handles  DB-HENV: e0441870  DB-HDBC: e04465e0
    20260213/153435.161 - U00003591 UCUDB - DB error info: OPC: 'SQLExecDirect' Return code: 'ERROR'
    20260213/153435.161 - U00003592 UCUDB - Status: '08S01' Native error: '10054' Msg: 'TCP Provider: An existing connection was forcibly closed by the remote host.
    '
    20260213/153435.161 - U00003592 UCUDB - Status: '08S01' Native error: '10054' Msg: 'Communication link failure'
    20260213/153435.161 - U00003536 UCUDB: FATAL DATA BASE ERROR: Re-connection will be attempted in 10 seconds.
    20260213/153435.161 - U00003537 UCUDB - RECONNECT: DB call 'SQLEndTran(Rollback)': Return code: '-1'.
    20260213/153435.161 - U00003590 UCUDB - DB error: 'SQLEndTran(Rollback)', 'ERROR   ', '08S01', 'TCP Provider: An existing connection was forcibly closed by the remote host.
    '
    20260213/153435.161 - U00003592 UCUDB - Status: '08S01' Native error: '10054' Msg: 'TCP Provider: An existing connection was forcibly closed by the remote host.
    '
    ...
    20260213/153449.145 - U00003538 UCUDB: Re-connection to database successful. Processing will continue.
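    To check whether a reconnect actually completed in a given WP log, one could pair the U00003536 message ("Re-connection will be attempted") with the next U00003538 message ("Re-connection to database successful"). A minimal Python sketch, assuming the log line format shown in the excerpt above; the regex and the function name `reconnect_durations` are illustrative assumptions, not part of any Automic tool:

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpts in this thread, e.g.
#   "20260213/153435.161 - U00003536 UCUDB: FATAL DATA BASE ERROR: ..."
# The optional "\d+\s+" also tolerates the thread-id column of the Java WP log.
LINE_RE = re.compile(r"^(\d{8}/\d{6}\.\d{3}) -\s+(?:\d+\s+)?(U\d{8})\b")

RECONNECT_PENDING = "U00003536"  # "Re-connection will be attempted in 10 seconds."
RECONNECT_OK = "U00003538"       # "Re-connection to database successful."


def reconnect_durations(lines):
    """Return seconds from each first reconnect attempt to the matching success.

    An attempt that is never followed by a success message (for example
    because the WP process ended first) is reported as None.
    """
    results = []
    pending = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines, SQL text, etc.
        ts = datetime.strptime(match.group(1), "%Y%m%d/%H%M%S.%f")
        code = match.group(2)
        if code == RECONNECT_PENDING and pending is None:
            pending = ts  # remember only the first attempt of a sequence
        elif code == RECONNECT_OK and pending is not None:
            results.append((ts - pending).total_seconds())
            pending = None
    if pending is not None:
        results.append(None)  # attempt without a logged success
    return results


if __name__ == "__main__":
    sample = [
        "20260213/153435.161 - U00003536 UCUDB: FATAL DATA BASE ERROR: Re-connection will be attempted in 10 seconds.",
        "20260213/153449.145 - U00003538 UCUDB: Re-connection to database successful. Processing will continue.",
    ]
    print(reconnect_durations(sample))
```

    A trailing None in the result would point to the situation described in this thread: the reconnect was announced, but the WP never logged a successful reconnect.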

    Which DB/OS are you using in your environment?

    Best regards

    Stephan

    -------------------------------------------



  • 6.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 22 days ago
    Hello Stephan,
     
    thanks for running the test on your side.
    We have an Oracle database on AIX.
     
    Greetings,
    Ralf

    -------------------------------------------



  • 7.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 21 days ago
    Hello everyone,
    I can confirm that we are experiencing the same issue with PostgreSQL 15 on SUSE Linux.
    Greetings,
    Samir
    -------------------------------------------



  • 8.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 22 days ago

    Ralf,

    I just had this happen in my development environment. The AE runs on Windows Server 2019, and our database runs on SQL Server 2019, also on Windows Server 2019. It is set up with high availability groups, so when a server is restarted, the database fails over to the secondary. During this failover, a process reported that the database was read-only, and all WPs on my system stopped. Before 24.4.3, a failover did not cause the WPs to stop. I am opening a case with support.

    20260216/015841.216 - 45     U00003590 UCUDB - DB error: 'S0002', 'An error occurred during the current command (Done status 0). Failed to update database "DATABASENAMEHERE" because the database is read-only.', '3906', 'com.microsoft.sqlserver.jdbc.SQLServerException'
    20260216/015841.216 - 45               SQL Statement which caused this DB error:
    20260216/015841.216 - 45               DELETE TOP (1) FROM MQ1JWP WITH(READPAST) OUTPUT deleted.MQJWP_PK,deleted.MQJWP_CAddr,deleted.MQJWP_BAcv,deleted.MQJWP_BAddr,deleted.MQJWP_BSRName,deleted.MQJWP_Status,deleted.MQJWP_Msg,deleted.MQJWP_BTable,deleted.MQJWP_CSRName,deleted.MQJWP_PhysAddr,deleted.MQJWP_CAcv
    20260216/015841.217 - 45     U00045014 Exception 'com.automic.database.api.DBException: "DELETE TOP (1) FROM MQ1JWP WITH(READPAST) OUTPUT deleted.MQJWP_PK,deleted.MQJWP_CAddr,deleted.MQJWP_BAcv,deleted.MQJWP_BAddr,deleted.MQJWP_BSRName,deleted.MQJWP_Status,deleted.MQJWP_Msg,deleted.MQJWP_BTable,deleted.MQJWP_CSRName,deleted.MQJWP_PhysAddr,deleted.MQJWP_CAcv"' at 'com.automic.database.impl.DBConnectionImpl.executeInternal():570'.
    20260216/015841.217 - 45     U00045015 The previous error was caused by 'com.microsoft.sqlserver.jdbc.SQLServerException: "An error occurred during the current command (Done status 0). Failed to update database "DATABASENAMEHERE" because the database is read-only."' at 'com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError():278'.


    20260216/015841.217 - 45     U00003620 Routine 'com.automic.kernel.impl.DBAction' forces trace because of error.
    20260216/015841.221 - 45     U00003450 The TRACE file was opened with the switches '0000000000000000'.
    20260216/015841.268 - 45     U00003449 Output to the TRACE file is finished.
    20260216/015842.541 - 40     U00003489 Server 'SYSTEMNAMEHERE#CP007' logged on (Client connection='69124').
    20260216/015842.544 - 39     U00003472 Connection to Server 'SYSTEMNAMEHERE#CP007' has been closed.
    20260216/015842.544 - 39     U00003407 Client connection '69124' from 'atm1dvatmas001:2218' has logged off from the Server.
    20260216/015845.368 - 39     U00045014 Exception 'java.net.SocketException: "Connection reset"' at 'sun.nio.ch.SocketChannelImpl.throwConnectionReset():394'.
    20260216/015845.368 - 39     U00003472 Connection to Server 'SYSTEMNAMEHERE#WP014(SYNC)' has been closed.
    20260216/015845.368 - 39     U00003407 Client connection '61' from '10.109.78.170:57743' has logged off from the Server.
    20260216/015845.705 - 39     U00045014 Exception 'java.net.SocketException: "Connection reset"' at 'sun.nio.ch.SocketChannelImpl.throwConnectionReset():394'.
    20260216/015845.705 - 39     U00003472 Connection to Server 'SYSTEMNAMEHERE#WP012(SYNC)' has been closed.
    20260216/015845.706 - 39     U00003407 Client connection '59' from '10.109.78.170:57693' has logged off from the Server.
    20260216/015845.846 - 39     U00045014 Exception 'java.net.SocketException: "Connection reset"' at 'sun.nio.ch.SocketChannelImpl.throwConnectionReset():394'.


    20260216/015845.925 - U00029108 UCUDB: SQL_ERROR    Database handles  DB-HENV: 2502a5f0  DB-HDBC: 250271b0
    20260216/015845.925 - U00003591 UCUDB - DB error info: OPC: 'SQLExecDirect' Return code: 'ERROR'
    20260216/015845.925 - U00003592 UCUDB - Status: '25000' Native error: '3906' Msg: 'Failed to update database "DATABASENAMEHERE" because the database is read-only.'
    20260216/015845.925 - U00003594 UCUDB Ret: '3590' opcode: 'EXEC' SQL Stmnt: 'update RH set RH_TimeStamp4 = ? where RH_AH_Idnr = ? and RH_TYPE = ?'
    20260216/015845.925 - U00003590 UCUDB - DB error: 'SQLExecDirect', 'ERROR   ', '25000', 'Failed to update database "DATABASENAMEHERE" because the database is read-only.'
    20260216/015845.926 - U00003590 UCUDB - DB error: 'SQLExecDirect', 'ERROR   ', '25000', 'Failed to update database "DATABASENAMEHERE" because the database is read-only.'
    20260216/015845.927 - U00029108 UCUDB: SQL_ERROR    Database handles  DB-HENV: 2502a5f0  DB-HDBC: 250271b0
    20260216/015845.927 - U00003591 UCUDB - DB error info: OPC: 'SQLExecDirect' Return code: 'ERROR'
    20260216/015845.927 - U00003592 UCUDB - Status: '25000' Native error: '3906' Msg: 'Failed to update database "DATABASENAMEHERE" because the database is read-only.'
    20260216/015845.927 - U00003594 UCUDB Ret: '3590' opcode: 'EXEC' SQL Stmnt: 'UPDATE AH set AH_TimeStamp4 = ?, AH_Status = ? where AH_Idnr = ?'
    20260216/015845.927 - U00003620 Routine 'UCMAIN_R' forces trace because of error.
    20260216/015845.927 - U00003590 UCUDB - DB error: 'SQLExecDirect', 'ERROR   ', '25000', 'Failed to update database "DATABASENAMEHERE" because the database is read-only.'
    20260216/015846.001 - U00003449 Output to the TRACE file is finished.
    20260216/015846.003 - U00003100 Memory TRACE was opened with the switches '0000000000000000'.
    20260216/015846.003 - U00003380 Server 'SYSTEMNAMEHERE#WP003' version '24.4.3+build.1765543096850' (Runtime '3/23:59:02', Log# '1', Trc# '0').
    20260216/015846.003 - U00003491 There is a time difference of '0/00:00:00' or '0' seconds to the Primary Server.
    20260216/015846.005 - U00003375 Server usage of the last minute '0%', the last 10 minutes '0%' and the last hour '0%'.
    20260216/015846.005 - U00003343 Server 'SYSTEMNAMEHERE#WP003' processes roles 'O'.

    -------------------------------------------



  • 9.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 8 days ago

    While trying to upgrade from AAKE 21.0.15 to AAKE 24.4.3+HF1, the WP pods kept crashing with exit code 139, and lots of WP entries were created, up to WP999. I'm not sure whether this is the same issue.

    20260212/134634.348 -          AUTOMIC#WP979  WP   0 100.73.11.12                0 2026-01-29 06:44:30 2026-01-29 06:44:43
    20260212/134634.348 -          AUTOMIC#WP980  WP   0 100.73.11.7                 0 2026-01-29 06:44:32 2026-01-29 06:44:46
    20260212/134634.348 -          AUTOMIC#WP981  WP   0 100.73.11.18                0 2026-01-29 06:48:27 2026-01-29 06:48:37
    20260212/134634.348 -          AUTOMIC#WP982  WP   0 100.73.11.19                0 2026-01-29 06:48:38 2026-01-29 06:48:51
    20260212/134634.348 -          AUTOMIC#WP983  WP   0 100.73.10.205               0 2026-01-29 06:48:39 2026-01-29 06:48:53
    20260212/134634.348 -          AUTOMIC#WP984  WP   0 100.73.10.203               0 2026-01-29 06:49:40 2026-01-29 06:49:48
    20260212/134634.348 -          AUTOMIC#WP985  WP   0 100.73.11.12                0 2026-01-29 06:49:53 2026-01-29 06:50:06
    20260212/134634.348 -          AUTOMIC#WP986  WP   0 100.73.11.7                 0 2026-01-29 06:49:54 2026-01-29 06:50:08
    20260212/134634.348 -          AUTOMIC#WP987  WP   0 100.73.10.207               0 2026-01-29 06:50:10 2026-01-29 06:50:19
    20260212/134634.348 -          AUTOMIC#WP988  WP   0 100.73.11.140               0 2026-01-29 09:11:38 2026-01-29 09:12:18
    20260212/134634.348 -          AUTOMIC#WP989  WP   0 100.73.11.19                0 2026-01-29 06:54:03 2026-01-29 06:54:15
    20260212/134634.348 -          AUTOMIC#WP990  WP   0 100.73.10.205               0 2026-01-29 06:54:06 2026-01-29 06:54:18
    20260212/134634.348 -          AUTOMIC#WP991  WP   0 100.73.10.203               0 2026-01-29 06:55:01 2026-01-29 06:55:09
    20260212/134634.348 -          AUTOMIC#WP992  WP   0 100.73.11.12                0 2026-01-29 06:55:13 2026-01-29 06:55:25
    20260212/134634.348 -          AUTOMIC#WP993  WP   0 100.73.11.7                 0 2026-01-29 06:55:13 2026-01-29 06:55:28
    20260212/134634.348 -          AUTOMIC#WP994  WP   0 100.73.11.18                0 2026-01-29 06:59:13 2026-01-29 06:59:23
    20260212/134634.348 -          AUTOMIC#WP995  WP   0 100.73.11.19                0 2026-01-29 06:59:27 2026-01-29 06:59:40
    20260212/134634.349 -          AUTOMIC#WP996  WP   0 100.73.10.205               0 2026-01-29 06:59:29 2026-01-29 06:59:41
    20260212/134634.349 -          AUTOMIC#WP997  WP   0 100.73.10.203               0 2026-01-29 07:00:21 2026-01-29 07:00:30
    20260212/134634.349 -          AUTOMIC#WP998  WP   0 100.73.11.7                 0 2026-01-29 07:00:36 2026-01-29 07:00:48
    20260212/134634.349 -          AUTOMIC#WP999  WP   0 100.73.11.12                0 2026-01-29 07:00:40 2026-01-29 07:00:53
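    For reference, the "code 139" mentioned above follows the usual Linux/Kubernetes convention: a process terminated by a signal is reported with exit status 128 + signal number, so 139 means signal 11 (SIGSEGV), which usually indicates a segmentation fault in the WP process rather than a controlled shutdown. A small Python sketch of that decoding; the helper `describe_exit_code` is hypothetical and not part of AAKE:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the common 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name  # e.g. 11 -> "SIGSEGV"
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))  # killed by signal 11 (SIGSEGV)
```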
    

    -------------------------------------------



  • 10.  RE: V24.4.3 WPs crash after DB reconnect

    Posted 8 days ago

    Hello Kwun,

    in our case, the following messages appear in the log:
    U00003367 No primary work process is registered in the database; next attempt in '10' seconds.

    The following messages appear in the smgr log:
    20260119/130318.178 - U00022012 Process 'UC4 WP6/UC4 DWP-Server [UC4_xxx#WP010] - 8 Connections' (ID '4784') ended.
    20260119/130318.178 - U00022022 Process 'UC4 WP6/UC4 DWP-Server [UC4_xxx#WP010] - 8 Connections' ended, exit code='6'.
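    The U00022022 message above can also be picked out automatically, for example to alert when a WP ends with a non-zero exit code. A minimal Python sketch, assuming the ServiceManager line format quoted above; the regex and the function name `wp_exit_codes` are illustrative assumptions, not part of any Automic tool:

```python
import re

# Assumed shape of the ServiceManager message quoted above:
#   "<timestamp> - U00022022 Process '<text> [<server>] <text>' ended, exit code='<n>'."
ENDED_RE = re.compile(
    r"^(?P<ts>\d{8}/\d{6}\.\d{3}) - U00022022 Process '.*?\[(?P<server>[^\]]+)\].*' "
    r"ended, exit code='(?P<code>-?\d+)'"
)


def wp_exit_codes(lines):
    """Yield (timestamp, server, exit_code) for every U00022022 'ended' message."""
    for line in lines:
        match = ENDED_RE.match(line.strip())
        if match:
            yield match.group("ts"), match.group("server"), int(match.group("code"))


if __name__ == "__main__":
    smgr_log = [
        "20260119/130318.178 - U00022012 Process 'UC4 WP6/UC4 DWP-Server [UC4_xxx#WP010] - 8 Connections' (ID '4784') ended.",
        "20260119/130318.178 - U00022022 Process 'UC4 WP6/UC4 DWP-Server [UC4_xxx#WP010] - 8 Connections' ended, exit code='6'.",
    ]
    for ts, server, code in wp_exit_codes(smgr_log):
        if code != 0:
            print(f"{ts}: {server} ended with exit code {code}")
```

    Only the U00022022 line carries the exit code, so the U00022012 line is skipped by the pattern.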

    RG
    Ralf

    -------------------------------------------