Automic Workload Automation

  • 1.  How does the Service Manager stop processes?

    Posted Nov 19, 2024 05:08 AM
    Edited by Tony Beeston Nov 20, 2024 07:46 AM

    What exactly does the Service Manager do when you instruct it to stop a process?

    Immediately single process corresponds to ucybsmcl stop mode C.
    I guess that it sends an ordinary TERM (kill -15) signal.

    Abnormally corresponds to ucybsmcl stop mode A.
    I guess that it sends a KILL (kill -9) signal.

    Non-Java AE server processes have an additional stop mode: Shutdown (UC4 System).
    This corresponds to ucybsmcl stop mode S.

    I guess that it sends messages via internal AE communications channels, instructing all running AE server processes to stop. (It cannot merely send kill signals, because it also stops AE server processes running on other nodes.)

    What exactly is the mechanism and order of operations?

    Below is a summary of my guesses about how it works. Please correct me if I'm mistaken.

    Service Manager GUI (ucybsmdi)   Service Manager CLI (ucybsmcl)   Description
    Immediately single process       … -c STOP_PROCESS -m C           kill -15 <PID>
    Abnormally                       … -c STOP_PROCESS -m A           kill -9 <PID>
    Shutdown (UC4 System)            … -c STOP_PROCESS -m S           Stop all running AE server processes on all nodes, via internal AE communication.


  • 2.  RE: How does the Service Manager stop processes?

    Posted Nov 19, 2024 08:34 AM
    Edited by Tony Beeston Nov 20, 2024 07:46 AM

    I'm also interested in AAKE. In an AAKE system, there is no Service Manager for the AE server. Instead, Kubernetes fills the role previously filled by the Service Manager, and is responsible for starting and stopping the pods for the AE server processes.

    What are the technical details of how AE server processes are started and stopped in AAKE? How do the AE server processes interact with the Kubernetes pod lifecycle?

    Also, some of the functions available in the AWI do not seem applicable. Others surely work differently than in a conventional AE system.

    Do the Service Manager details that appear when one selects an AE server process and chooses Open play any role in AAKE?

    Does the Update Service Manager Link function do anything in AAKE?

    What does the Stop Process function do in AAKE?



  • 3.  RE: How does the Service Manager stop processes?

    Posted Nov 20, 2024 01:16 PM

    Doesn't stopping an agent from the AWI also work when the agent's Service Manager is not linked with the engine, or when there is no Service Manager at all? I was always under the impression that when you stop the agent, it just stops itself. You only need the Service Manager to start it again :)

    In AAKE you control the number of processes by scaling the deployments. If a process ends itself or crashes, the cluster restarts the pod automatically, so there is no functionality built into the AWI to control that. You could use the Kubernetes Integration (https://marketplace.automic.com/marketplace/browse/integration-kubernetes) to accomplish a similar result.
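As a rough illustration of scaling a deployment, the sketch below builds (and can optionally run) a `kubectl scale` command from Python. It shows only the generic Kubernetes mechanism, not an AAKE-specific procedure. The deployment name `ae-wp` and the namespace `default` are assumptions inferred from the pod name quoted elsewhere in this thread; check your own cluster (for example with `kubectl get deployments`) before relying on them.

```python
import subprocess


def build_scale_command(deployment: str, replicas: int, namespace: str = "default") -> list[str]:
    """Return the kubectl argument list that sets a deployment's replica count."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return [
        "kubectl", "scale", f"deployment/{deployment}",
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


def scale(deployment: str, replicas: int, namespace: str = "default") -> None:
    """Run the command; requires kubectl on the PATH and access to the cluster."""
    subprocess.run(build_scale_command(deployment, replicas, namespace), check=True)


if __name__ == "__main__":
    # "ae-wp" is a hypothetical deployment name; print the command instead of running it.
    print(" ".join(build_scale_command("ae-wp", 3)))
```

Keeping the command construction separate from its execution makes it possible to review the exact command before anything touches the cluster.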

    Cheers,

    Marcin






  • 4.  RE: How does the Service Manager stop processes?

    Broadcom Employee
    Posted Nov 21, 2024 04:07 AM
    Edited by Oana Botez Nov 21, 2024 04:07 AM

    Starting with AAKE 24.2.0, it is possible to use auto-scaling based on memory and CPU utilization. This removes the need to "manually" change the number of replicas (processes) when the load on the system increases.
    You can find more details in the documentation.

    BR,
    Oana




  • 5.  RE: How does the Service Manager stop processes?

    Posted Nov 21, 2024 11:15 AM
    Edited by Michael A. Lowry Nov 21, 2024 11:28 AM

    I did some tests and found answers to a few of my AAKE questions.

    Does the Update Service Manager Link function do anything in AAKE?
    Michael Lowry,  Nov 19, 2024 08:34 AM

    The answer is no. First of all, SMGR_LOOKUP is disabled by default in UC_SYSTEM_SETTINGS in AAKE. If you enable it (by setting its value to YES), the Update Service Manager Link function becomes available in the AWI, but any attempt to use it results in an error message:

    Connection to Service Manager 'ae-wp-f64b48bfd-2ct5v' cannot be established

    This probably answers the related question too.

    Do the Service Manager details that appear when one selects an AE server process and chooses Open play any role in AAKE?
    Michael Lowry,  Nov 19, 2024 08:34 AM

    I suppose you could fill in some details in these fields, but because there is no Service Manager in AAKE, there will be no program to receive the connection attempts.

    That leaves just one question.

    What does the Stop Process function do in AAKE?
    Michael Lowry,  Nov 19, 2024 08:34 AM

    It stops the associated AE server process, but not the pod. The AE server process ends in error and starts up again.

    U00011853 Server 'AAKE_EXP#WP011' was terminated by user 'MYUSER/MYCORP'.
    U00003432 Termination of Server 'AAKE_EXP#WP011' initiated.
    U00011816 Server 'AAKE_EXP#WP011': Termination initiated.
    ...
    U00004108 UCUREP module closed. Total time: '0.262138' seconds.
    WARNING:  there is no transaction in progress
    U00003523 UCUDB: Maximum time required for a DB call: '23:908.653.999'.
    U00003522 UCUDB: Database closed. Total time for DB calls: '64:990.709.999' seconds.
    U00003549 UCUDB: '            4621' 'OTHERS    ' calls took '24:195.852.999' sec.
    U00003549 UCUDB: '            2098' 'SELECT    ' calls took '23:491.793.000' sec.
    U00003549 UCUDB: '             174' 'EXECUTE   ' calls took '0:798.006.000' sec.
    U00003549 UCUDB: '              14' 'UPDATE    ' calls took '0:053.651.000' sec.
    U00003549 UCUDB: '               0' 'DELETE    ' calls took '0:000.000.000' sec.
    U00003549 UCUDB: '             127' 'INSERT    ' calls took '0:772.166.999' sec.
    U00003549 UCUDB: '            7063' 'READ      ' calls took '0:065.346.000' sec.
    U00003549 UCUDB: '            5697' 'CLOSESTMT ' calls took '0:013.258.000' sec.
    U00003549 UCUDB: '            2731' 'TRANSACT  ' calls took '15:600.635.999' sec.
    U00003620 Routine 'UCMAIN_R' forces trace because of error.
    U00000001 NO DATA (end of file, invalid key, no message...)
    Stream closed EOF for default/ae-wp-f64b48bfd-2ct5v (wp)
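The UCUDB lines in a log like this one can be totalled with a few lines of Python. The sketch below assumes that the timing format is seconds, milliseconds, microseconds and nanoseconds ('S:mmm.uuu.nnn'). That reading is an inference from the fact that the per-type values in this log add up to roughly the reported total of 64.99 seconds; it is not taken from the documentation. The function name and the regular expression are illustrative only.

```python
import re

# Matches e.g.: U00003549 UCUDB: '   2098' 'SELECT    ' calls took '23:491.793.000' sec.
LINE_RE = re.compile(
    r"U00003549 UCUDB: '\s*(\d+)' '(\w+)\s*' calls took "
    r"'(\d+):(\d{3})\.(\d{3})\.(\d{3})' sec\."
)


def parse_db_calls(log_text: str) -> dict[str, tuple[int, int]]:
    """Return {call type: (call count, time in nanoseconds)}."""
    result = {}
    for m in LINE_RE.finditer(log_text):
        count, kind, sec, ms, us, ns = m.groups()
        # Assumed layout: seconds:milliseconds.microseconds.nanoseconds
        total_ns = ((int(sec) * 1000 + int(ms)) * 1000 + int(us)) * 1000 + int(ns)
        result[kind] = (int(count), total_ns)
    return result
```

Working in integer nanoseconds avoids floating-point rounding when the values are summed or compared.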

    It probably makes sense to disable or hide inapplicable fields and functions in the AWI for AAKE systems.



  • 6.  RE: How does the Service Manager stop processes?

    Posted Nov 20, 2024 05:02 AM

    Hi Michael,

    Your assumptions are correct: "Immediately" sends kill -TERM/kill -15 and "Abnormally" sends kill -KILL/kill -9 (I checked it with my signal handler tool). However, I was not able to test "Shutdown" because this context menu entry is not available for my tool.

    best regards,
    Peter
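A test like this can be reproduced with a small stand-in process that records the signals it receives. The following is a minimal sketch for Unix, not Peter's actual tool; the names `received` and `install_handlers` are made up for illustration. Note that SIGKILL (kill -9) cannot be caught by any handler, so a kill -9 can only be observed from outside, as the process disappearing.

```python
import os
import signal
import time

received = []  # names of the signals seen so far


def _record(signum, frame):
    """Signal handler: remember which signal arrived."""
    received.append(signal.Signals(signum).name)


def install_handlers():
    """Catch the catchable signals discussed in this thread."""
    for sig in (signal.SIGTERM, signal.SIGUSR1):
        signal.signal(sig, _record)


if __name__ == "__main__":
    install_handlers()
    print(f"PID {os.getpid()} waiting for signals; stop with Ctrl+C")
    try:
        while True:
            time.sleep(1)
            while received:
                print("received", received.pop(0))
    except KeyboardInterrupt:
        pass
```

Because the handlers replace the default action, this process keeps running after a kill -15 and simply reports it.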



    ------------------------------
    Automic Certified Professional/Expert & Broadcom Knight

    For AUTOMIC trainings please check https://www.qskills.de/qs/workshops/automic/
    ------------------------------



  • 7.  RE: How does the Service Manager stop processes?

    Posted Nov 20, 2024 08:39 AM
    Edited by Michael A. Lowry Nov 20, 2024 08:38 AM

    Thanks for the confirmation, @Peter Grundler.

    Next I'd like to document how stop mode S works. I suppose I could capture a trace, but perhaps someone from Broadcom will save me the trouble.



  • 8.  RE: How does the Service Manager stop processes?

    Broadcom Employee
    Posted Nov 20, 2024 11:02 AM

    Correct: kill 15 or kill 9 is used on Unix.
    On Windows, the service has to provide a window so that WM_QUIT can be sent to that window, or TerminateProcess is called.
    These are the equivalents of the kill signals listed above.

    For the shutdown, kill 10 (SIGUSR1) is used ... 
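Putting the Unix part of this answer together with the stop modes from the first post, the mapping can be sketched as below. This is only an illustration of the signals named in this thread, not Service Manager code; the name `stop_process` is hypothetical. The sketch uses `signal.SIGUSR1` rather than the literal 10 because the number assigned to SIGUSR1 differs between Unix platforms.

```python
import os
import signal

# ucybsmcl stop modes mapped to the Unix signals named in this thread.
STOP_MODE_SIGNALS = {
    "C": signal.SIGTERM,  # Immediately single process (kill -15)
    "A": signal.SIGKILL,  # Abnormally (kill -9)
    "S": signal.SIGUSR1,  # Shutdown (UC4 System)
}


def stop_process(pid: int, mode: str) -> signal.Signals:
    """Send the signal for the given stop mode to pid and return that signal."""
    try:
        sig = STOP_MODE_SIGNALS[mode.upper()]
    except KeyError:
        raise ValueError(f"unknown stop mode: {mode!r}") from None
    os.kill(pid, sig)
    return sig
```

For example, `stop_process(pid, "C")` has the same effect as running `kill -15 <PID>` from a shell.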




  • 9.  RE: How does the Service Manager stop processes?

    Posted Nov 21, 2024 06:32 AM
    Edited by Michael A. Lowry Nov 21, 2024 06:57 AM

    Stop mode S (Shutdown UC4 System) is handled by the Primary Work Process (PWP). I verified this by enabling tcp/ip 2 & db 4 tracing.

    When you select a CP or WP from the Service Manager GUI and choose Shutdown (UC4 System), the Service Manager receives this message and passes it along to the selected AE server process. (Interestingly, the Service Manager also appears to send some messages to the CPs.)

    ...
             Command: STOP-PROCESS
    U00022020 Client statement: process 'EXP2-NodeA-WP1' should be stopped
    U00022031 Client data: User 'MYCORP\MYUSER' of computer 'MYPC'.
    U00022010 Terminate process 'EXP-NodeA-WP1/UC4 PWP-Server [AE_EXP#WP003] - 4 Connections' (ID '18561') with mode 'S'.

    The next step takes place on the CP or WP that receives the shutdown message from the Service Manager.

    If you initiate the shutdown from a CP or from a WP other than the PWP, this server process will identify the PWP, and send the shutdown message (F90000?) to the PWP.

    handle_terminate_shutdown(SRV(name=AE_EXP#WP003,runmode=N,srvmode= ,pcx-roles=,netarea=AE_EXP)) -->
    sockets_get_primary() -->
    sockets_get_primary <-- (primary@0x317fa50)
    ActionSendShutdown() -->
    ActionSend(msgsize=51, SOCKET(s=15,name=AE_EXP#WP001,type=04,host=,add=10.95.42.1,port=22100,id=0,netarea=AE_EXP,roles=,nxt=0x2980150)) -->
    U00009909 TRACE: (Send to Server AE_EXP#WP001)                                         0x7ffd6bdafb10 000051
              00000000  30303030 30303531 5543343A 676C6F62  >00000051UC4:glob<
              00000010  616C3030 314E4154 20202020 20202020  >al001NAT        <
              00000020  20202020 20202020 20202020 20202020  >                <
              00000030  F90000                               >...<
    ActionSend <-- (OK)
    ActionSendShutdown <-- (OK,rslt=0)
    handle_terminate_shutdown <-- (OK)
    U00003432 Termination of Server 'AE_EXP#WP003' initiated.

    In this case, I selected WP3 (not the PWP) from the Service Manager GUI to initiate the shutdown. Note that WP3 stopped itself immediately after sending the shutdown message to the PWP. If you initiate the shutdown from the PWP, the above step is skipped.

    The PWP receives the shutdown message. (If the shutdown is initiated from the PWP, several handle_terminate_shutdown messages appear here.)

    socket_read(SOCKET(s=19,name=AE_EXP#WP003,type=02,host=uc4exp.mycompany.com,add=10.95.42.1,port=0,id=223219001,netarea=AE_EXP,roles=R,nxt=0x37a5f70)) -->
    U00009909 TRACE: (Recv from Server AE_EXP#WP003)                                       0x3c848f0 000051
              00000000  30303030 30303531 5543343A 676C6F62  >00000051UC4:glob<
              00000010  616C3030 314E4154 20202020 20202020  >al001NAT        <
              00000020  20202020 20202020 20202020 20202020  >                <
              00000030  F90000                               >...<
    socket_read(1424): msg=0x3c848f0, remaining=51
    socket_read(1455): logical msg len=51
    U00009909 TRACE: (Msg Server AE_EXP#WP003)                                             0x3c848f0 000051
              00000000  30303030 30303531 5543343A 676C6F62  >00000051UC4:glob<
              00000010  616C3030 314E4154 20202020 20202020  >al001NAT        <
              00000020  20202020 20202020 20202020 20202020  >                <
              00000030  F90000                               >...<
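To make the dump easier to read, the 51 bytes can be decoded with a short Python sketch. It only restates what the trace shows, and the field interpretation is an assumption: the first eight bytes look like the message length in ASCII decimal, and the last three bytes (F9 00 00) appear to be the message type referred to above as F90000. The function name `decode_dump` is made up for illustration.

```python
# Hex columns copied from the trace (offsets and ASCII column removed).
DUMP = """
30303030 30303531 5543343A 676C6F62
616C3030 314E4154 20202020 20202020
20202020 20202020 20202020 20202020
F90000
"""


def decode_dump(dump: str) -> dict:
    """Split the raw message into the parts that are visible in the trace."""
    raw = bytes.fromhex(dump)  # fromhex skips ASCII whitespace (Python 3.7+)
    return {
        "raw_length": len(raw),
        "length_field": int(raw[:8].decode("ascii")),  # assumed: ASCII decimal length
        "text": raw[8:-3].decode("ascii").rstrip(),    # printable middle part
        "trailer": raw[-3:].hex().upper(),             # assumed: message type
    }


if __name__ == "__main__":
    print(decode_dump(DUMP))
```

The decoded length field (51) agrees with both `msgsize=51` and the `logical msg len=51` line in the trace, which supports reading the first eight bytes as a length prefix.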
    

    Immediately after receiving the shutdown message, the PWP begins sending shutdown messages to all of the other running AE server processes. Here is an example.

    ActionSendShutdown() -->
    ActionSend(msgsize=51, SOCKET(s=16,name=AE_EXP#CP003,type=40,host=uc4exp.mycompany.com,add=10.95.42.1,port=22190,id=223212027,netarea=AE_EXP,roles=,nxt=0x1ea9470)) -->
    U00009909 TRACE: (Send to Server AE_EXP#CP003)                                         0x7ffc090046b0 000051
              00000000  30303030 30303531 5543343A 676C6F62  >00000051UC4:glob<
              00000010  616C3030 314E4154 20202020 20202020  >al001NAT        <
              00000020  20202020 20202020 20202020 20202020  >                <
              00000030  F90000                               >...<
    ActionSend <-- (OK)
    ActionSendShutdown <-- (OK,rslt=0)

    Judging from the PWP trace, the rest of the shutdown process is quite complicated and involves many additional steps including communication with the other processes. Some processes do not stop immediately. Once these additional shutdown steps have been completed and all of the other AE server processes have stopped, the PWP also stops.

    As far as I can tell, most of the inter-process communication related to an AE system shutdown is conveyed via TCP sockets.