Automic Workload Automation

  • 1.  Bash scripts for the Automation Engine

    Posted Mar 04, 2021 08:13 AM
    Edited by Michael A. Lowry Mar 04, 2021 05:41 PM
    Over the past few years of working with the Automation Engine, I have been working on a set of scripts to simplify working with the Automation Engine and the Service Manager that controls it. Today I'd like to share these scripts with the broader AE community, in the hopes that others may find them useful.

    Attached are three scripts:
    • server_env.sh - defines lists of servers
    • uc4_env.sh - sets global variables including pathnames
    • uc4_functions.sh - defines functions including smcl and ae
    The first two scripts must be customized for your environment.
    The smcl and ae functions are the interesting parts.
    These scripts are meant to be sourced from your user's .bashrc file. I.e.,
    . server_env.sh
    . uc4_env.sh
    . uc4_functions.sh
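
    If the scripts live in their own directory, the three lines above can be wrapped in a small guard so that a missing file does not break your login shell. This is only a sketch: AE_SCRIPT_DIR and source_ae_scripts are hypothetical names that are not part of the attached scripts.

```shell
# Hypothetical directory holding the three scripts; adjust for your setup.
AE_SCRIPT_DIR="${AE_SCRIPT_DIR:-$HOME/bin}"

# Source the scripts in order (server lists, then global variables, then
# functions), warning about any file that is missing or unreadable.
source_ae_scripts() {
    local f
    for f in server_env.sh uc4_env.sh uc4_functions.sh; do
        if [ -r "$AE_SCRIPT_DIR/$f" ]; then
            . "$AE_SCRIPT_DIR/$f"
        else
            echo "warning: $AE_SCRIPT_DIR/$f not found; skipping" >&2
        fi
    done
}
```

    Calling source_ae_scripts once at the end of .bashrc would then load whatever is available.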
    Once everything is set up, you should be able to run the smcl and ae functions directly from the command line.

    Documentation of these functions follows.

    smcl is a wrapper function for the Service Manager Command Line program ucybsmcl. It is specifically adapted for use with the Service Managers that control the Automation Engine (UC4). Because it automatically reads some information from configuration files, it is simpler to use than ucybsmcl. When using smcl, you do not need to specify three ucybsmcl options:

    • -h <hostname:port>
    • -n <phrase>
    • -p <password>

    The smcl function automatically fills in these details based on context - mostly based on the server where you run it.

    If a password is required, you will be prompted for it.
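
    To illustrate the idea (this is not the actual code from uc4_functions.sh), a wrapper of this kind could prepend the connection options and pass everything else through. The variable names SMGR_HOST, SMGR_PORT and SMGR_PHRASE and their default values are assumptions made for this sketch; the real function reads its values from configuration files.

```shell
# Hypothetical settings; the real scripts derive these from configuration files.
SMGR_PORT="${SMGR_PORT:-8871}"      # assumed Service Manager port
SMGR_PHRASE="${SMGR_PHRASE:-UC4}"   # assumed Service Manager phrase

# Print the ucybsmcl command line a wrapper would run for the local node.
# -h and -n are filled in automatically; all other arguments pass through.
# Password handling (-p) is left out of this sketch.
smcl_sketch() {
    local host="${SMGR_HOST:-$(uname -n)}"
    set -- -h "$host:$SMGR_PORT" -n "$SMGR_PHRASE" "$@"
    # Print instead of executing, so the sketch is safe to try anywhere.
    echo ucybsmcl "$@"
}
```

    For example, smcl_sketch -c GET_PROCESS_LIST prints the full command line with the host, port and phrase already filled in.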

    Usage

    Abbreviated syntax

    This is a shortened syntax adapted to make commonly-used functions quicker to type.

    smcl [list] List all services (AE server processes)
    smcl start service Start a service
    smcl stop service Stop a service
    smcl shutdown service Shut down the AE system (provide the name of a running WP)


    If no action is specified, the default is list.

    Classic syntax

    This is the classic syntax used by ucybsmcl.

    smcl -c GET_PROCESS_LIST List all services (AE server processes)
    smcl -c START_PROCESS -s service Start a service
    smcl -c STOP_PROCESS -s service [-m stop mode] Stop a service or stop the system
    smcl -c SET_DATA -s service -d property value Change the properties of a service
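
    The abbreviated actions can be thought of as shorthand for the classic commands listed above. The sketch below shows one way such a translation might look. The function name smcl_translate is hypothetical, and mapping shutdown to stop mode S is an assumption based on the description of the ae shutdown action further down.

```shell
# Translate abbreviated syntax into classic ucybsmcl-style arguments.
# Prints the arguments; a real wrapper would pass them on to ucybsmcl.
smcl_translate() {
    local action="${1:-list}"   # default action is list
    local service="${2:-}"
    case "$action" in
        list)
            printf '%s\n' "-c GET_PROCESS_LIST"
            return 0 ;;
        start|stop|shutdown)
            if [ -z "$service" ]; then
                echo "error: $action requires a service name" >&2
                return 1
            fi ;;
        *)
            echo "error: unknown action: $action" >&2
            return 1 ;;
    esac
    case "$action" in
        start)    printf '%s\n' "-c START_PROCESS -s $service" ;;
        stop)     printf '%s\n' "-c STOP_PROCESS -s $service" ;;
        shutdown) printf '%s\n' "-c STOP_PROCESS -s $service -m S" ;;
    esac
}
```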

    The smcl function is defined in uc4_functions.sh, a script containing shared functions that are also used in other scripts like ae_server.sh. The uc4_functions.sh script is sourced by the .bashrc of all UC4 technical users, so smcl should be available to all of these users.

    Limitations

    smcl works only with the AE server Service Manager running on the same node where you run it. It cannot be used to interact with the Agent Service Manager, or with Service Managers running on other nodes. The script was developed primarily for one AE implementation. It may need to be customized for your environment.

    Implementation

    The script is implemented as a Bash function in uc4_functions.sh. It relies on many other functions in that script.



    AE control script (ae)

    ae is a function that can be used to quickly start or stop the Automation Engine (UC4), or to display Service Manager information for the AE.

    Here's a bit more detail about how these actions work:

    • start starts the Automation Engine, optionally in a specified start mode.
    • kill just kills the AE Server service manager, after first asking for confirmation.
    • status displays information about any running AE Server service manager.
      • ps listing of SMgr process
      • number of child processes
    • list, stop and shutdown use the smcl wrapper for the ucybsmcl program to interact directly with the Service Manager.
      • list: runs the GET_PROCESS_LIST command and prints tab-delimited output.
      • stop: sends STOP_PROCESS -m C to all running AE Server processes, after first asking for confirmation.
      • shutdown: sends STOP_PROCESS -m S to the WP with the highest CPU time (usually the PWP). This will stop all AE server processes on both nodes.

    The script performs lots of error-checking and prompts for confirmation before performing potentially destructive actions. It also prompts for the Service Manager password, if required.
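
    A confirmation prompt of the kind described here can be quite small. The following is a sketch rather than the code used by ae; the function name confirm_action and the prompt wording are made up for illustration.

```shell
# Ask for confirmation before a potentially destructive action.
# Succeeds only if the user types 1; any other input (or EOF) cancels.
confirm_action() {
    local prompt="$1" answer
    # Write the prompt to stderr so it does not mix with command output.
    printf '%s\nType 1 to confirm, anything else to cancel: ' "$prompt" >&2
    read -r answer || return 1
    [ "$answer" = "1" ]
}
```

    Usage might look like this: confirm_action "Stop all AE server processes?" && echo "stopping"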

    Examples

    Start the Automation Engine

    To start the Automation Engine normally, just run ae start by itself or run ae start normal.

    Display status of the Automation Engine

    Run ae by itself or run ae status. This will list details of the AE server Service Manager process and display the number of child processes.

    List running AE server processes

    Run ae list to list all of the running AE server processes.

    This lists only the AE server processes that the Service Manager knows about. If the AE is experiencing problems, there may be a breakdown in the communication between the AE and the Service Manager. In this case, the list of processes displayed may be incomplete or out of date. In this situation, you should use the ps command to make sure you identify all running AE server processes, including those that have lost communication with the Service Manager.
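
    As a rough sketch of the ps approach, the functions below list processes whose command name matches a pattern. The default pattern "ucsrv" is an assumption about how the AE server binaries are named; check your own installation before relying on it. Processes launched through an interpreter such as java show the interpreter as their command, so they may need a different match.

```shell
# Pattern for AE server command names. "ucsrv" is an assumption.
AE_PROC_PATTERN="${AE_PROC_PATTERN:-ucsrv}"

# Keep only lines of "PID COMMAND ARGS..." input whose command matches.
filter_ae_procs() {
    awk -v pat="$AE_PROC_PATTERN" '$2 ~ pat'
}

# List matching processes on this node, whether or not the Service
# Manager still knows about them.
list_ae_procs() {
    ps -e -o pid= -o args= | filter_ae_procs
}
```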

    Stop all running AE server processes

    Run ae stop and type 1 to confirm.

    The stop action does not stop the Service Manager. You will need to run ae kill before trying to start it again. (See the following example.)

    Kill the AE server Service Manager

    Just run ae kill and type 1 to confirm.

    If any child processes of the Service Manager are still running, a kill signal will be sent to them when the Service Manager process ends. Usually this will cause the child processes to end too. However, if the Automation Engine is experiencing problems, some AE server processes may refuse to die. In this situation, it may be necessary to list hung AE server processes using the ps command, and then kill them one by one, or all at once using a command like killall.

    Shut down the AE server on both (all) nodes

    Run ae shutdown and type 1 to confirm.

    This will completely shut down the AE server on all nodes. This is done by sending STOP_PROCESS with stop mode S (shutdown) via the SMgr to the WP with the highest CPU time. (This is often the PWP.) This WP will in turn instruct all other running AE server processes to stop.
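
    Picking the target WP boils down to finding the maximum of a per-process number. The sketch below assumes you already have one line per WP in the form "service_name cpu_seconds"; how those numbers are collected is left out, and the function name busiest_wp is hypothetical.

```shell
# Read "service_name cpu_seconds" pairs on stdin and print the name of
# the service with the highest value. On a tie, the first one wins.
busiest_wp() {
    awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; name = $1 }
         END { if (NR > 0) print name }'
}
```

    The printed name could then be used as the service argument of a STOP_PROCESS command with stop mode S.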

    Start the AE server with a minimal set of processes

    For troubleshooting, it can be useful to start the AE Server gradually, initially starting just a few processes. This makes it much simpler to find errors in the AE server logs, because there are fewer active logs to search through.

    Run ae start with start mode mini.

    The start modes rely on a particular naming convention for Service Manager Command (SMC) files. Take a look at the uc4_functions.sh script to learn more or to customize this for your needs.

    Limitations

    ae was developed primarily for one particular AE implementation. It may need to be customized for your environment.

    Implementation

    The script is implemented as a Bash function in uc4_functions.sh. It relies on many other functions in that script, including the smcl function described above.

    Attachment(s)

    zip
    AE Bash scripts.zip   9 KB 1 version


  • 2.  RE: Bash scripts for the Automation Engine

    Posted Mar 04, 2021 10:34 AM
    Edited by Michael A. Lowry Mar 04, 2021 10:34 AM
    Just a few additional pieces of information to help you get started:

    • The paths where the Service Manager and AE Server are installed are defined in uc4_env.sh. You will almost certainly need to update these to point to where you have these programs installed.
    • The server_env.sh script contains lists of servers by environment, e.g., experimental, development, testing, and production. You'll need to update these lists based on your host names and environments.
    • The functions in uc4_functions.sh are designed based on several assumptions regarding the AE server installation, including a naming convention for configuration files and server manager destination names. This naming convention is based on the environment of the server (e.g., DEV), and whether the server is the primary or secondary in the AE server cluster (i.e., A or B). Both of these details are specified in the aforementioned server_env.sh script.
    • You will have to adhere to the expected naming convention, or customize the script to use a different convention.
    • The script is designed to do plenty of error checking, and to print informative messages if an unexpected condition arises.
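
    To give a feel for how per-environment server lists can drive such a naming convention, here is a small sketch. The variable names, host names and function names are all invented for illustration and do not match the contents of the attached server_env.sh.

```shell
# Hypothetical server lists by environment; host names are invented.
# In each list, the first host is node A and the second is node B.
EXP_SERVERS="aeexp01 aeexp02"
DEV_SERVERS="aedev01 aedev02"

# Print the environment (EXP or DEV) that a host belongs to.
env_of_host() {
    local host="$1" h
    for h in $EXP_SERVERS; do
        if [ "$h" = "$host" ]; then echo EXP; return 0; fi
    done
    for h in $DEV_SERVERS; do
        if [ "$h" = "$host" ]; then echo DEV; return 0; fi
    done
    echo "unknown host: $host" >&2
    return 1
}

# Print A if the host is first in its list, B if it is second.
node_of_host() {
    local host="$1" list
    for list in "$EXP_SERVERS" "$DEV_SERVERS"; do
        case "$list" in
            "$host "*) echo A; return 0 ;;
            *" $host") echo B; return 0 ;;
        esac
    done
    return 1
}
```

    The two results could then be combined into names for configuration files or Service Manager entries.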
    Enjoy.



  • 3.  RE: Bash scripts for the Automation Engine

    Posted Mar 18, 2024 11:32 AM
    Edited by Michael A. Lowry 30 days ago

    I've made some updates to my script of handy Bash functions for AE administrators. I've made a lot of changes in three years, but the most useful is probably a new function called aex that lists running AE processes using the AE Server's Service Manager, and then adds extended information for each running process, including process type, role, and log file path.

    Below is an example of the output.

    $ aex
    SMgr service name     AE Proc  Type/Role  Status    PID      #Conn  Log file
    EXP2-NodeA-REST       CP001    REST API   Running   17537    83     /var/uc4/server/CPsrv_EXP2_log_001_00.txt
    EXP2-NodeA-JCP1       CP002    JCP        Running   3397     92     /var/uc4/server/CPsrv_EXP2_log_002_00.txt
    EXP2-NodeA-JCP2       CP003    JCP        Running   3847     85     /var/uc4/server/CPsrv_EXP2_log_003_00.txt
    EXP2-NodeA-XA2-JCP1   CP004    JCP        Running   7614     82     /var/uc4/server/CPsrv_EXP2_log_004_00.txt
    EXP2-NodeA-XA2-JCP2   CP005    JCP        Running   22691    81     /var/uc4/server/CPsrv_EXP2_log_005_00.txt
    EXP2-NodeA-JWP1       WP001    JWP-AUT    Running   58949    29     /var/uc4/server/WPsrv_EXP2_log_001_00.txt
    EXP2-NodeA-JWP2       WP071    JWP-IDX    Running   59632    35     /var/uc4/server/WPsrv_EXP2_log_071_00.txt
    EXP2-NodeA-JWP3       WP072    JWP-PER    Running   61096    38     /var/uc4/server/WPsrv_EXP2_log_072_00.txt
    EXP2-NodeA-JWP4       WP073    JWP-UTL    Running   64924    36     /var/uc4/server/WPsrv_EXP2_log_073_00.txt
    EXP2-NodeA-JWP5       WP074    JWP        Running   65202    35     /var/uc4/server/WPsrv_EXP2_log_074_00.txt
    EXP2-NodeA-CP1        CP006    CP         Running   9344     79     /var/uc4/server/CPsrv_EXP2_log_006_00.txt
    EXP2-NodeA-CP2        CP007    CP         Running   9545     78     /var/uc4/server/CPsrv_EXP2_log_007_00.txt
    EXP2-NodeA-CP3        CP008    CP         Running   9559     78     /var/uc4/server/CPsrv_EXP2_log_008_00.txt
    EXP2-NodeA-XA2-CP1    CP009    CP         Running   9565     78     /var/uc4/server/CPsrv_EXP2_log_009_00.txt
    EXP2-NodeA-XA2-CP2    CP010    CP         Running   22709    78     /var/uc4/server/CPsrv_EXP2_log_010_00.txt
    EXP2-NodeA-WP1        WP006    PWP*       Running   10040    97     /var/uc4/server/WPsrv_EXP2_log_006_00.txt
    EXP2-NodeA-WP2        WP007    WP         Running   16942    27     /var/uc4/server/WPsrv_EXP2_log_007_00.txt
    EXP2-NodeA-WP3        WP008    DWP        Running   16946    27     /var/uc4/server/WPsrv_EXP2_log_008_00.txt
    EXP2-NodeA-WP4        WP009    DWP        Running   16952    27     /var/uc4/server/WPsrv_EXP2_log_009_00.txt
    EXP2-NodeA-WP5        WP010    DWP        Running   17157    27     /var/uc4/server/WPsrv_EXP2_log_010_00.txt
    EXP2-NodeA-WP6        WP044    WP         Running   18245    27     /var/uc4/server/WPsrv_EXP2_log_044_00.txt
    EXP2-NodeA-WP7        WP012    DWP        Running   17175    27     /var/uc4/server/WPsrv_EXP2_log_012_00.txt
    EXP2-NodeA-WP8        WP013    DWP        Running   17191    27     /var/uc4/server/WPsrv_EXP2_log_013_00.txt
    EXP2-NodeA-WP9        WP002    WP         Running   22713    27     /var/uc4/server/WPsrv_EXP2_log_002_00.txt
    EXP2-NodeA-WP10       WP015    DWP        Running   17661    27     /var/uc4/server/WPsrv_EXP2_log_015_00.txt
    EXP2-NodeA-WP11       WP016    DWP        Running   17861    27     /var/uc4/server/WPsrv_EXP2_log_016_00.txt
    EXP2-NodeA-WP12       WP017    DWP        Running   17865    27     /var/uc4/server/WPsrv_EXP2_log_017_00.txt
    EXP2-NodeA-WP13       WP018    DWP        Running   17875    27     /var/uc4/server/WPsrv_EXP2_log_018_00.txt
    EXP2-NodeA-WP14       WP019    DWP        Running   17883    27     /var/uc4/server/WPsrv_EXP2_log_019_00.txt
    EXP2-NodeA-WP15       WP020    DWP        Running   17888    27     /var/uc4/server/WPsrv_EXP2_log_020_00.txt
    EXP2-NodeA-WP16       WP021    DWP        Running   17898    27     /var/uc4/server/WPsrv_EXP2_log_021_00.txt
    EXP2-NodeA-WP17       WP022    RWP*       Running   18102    27     /var/uc4/server/WPsrv_EXP2_log_022_00.txt
    EXP2-NodeA-WP18       WP023    DWP        Running   18105    27     /var/uc4/server/WPsrv_EXP2_log_023_00.txt
    EXP2-NodeA-WP19       WP024    DWP        Running   18111    27     /var/uc4/server/WPsrv_EXP2_log_024_00.txt
    EXP2-NodeA-WP20       WP025    OWP*       Running   18116    27     /var/uc4/server/WPsrv_EXP2_log_025_00.txt
    EXP2-NodeA-WP21       WP026    DWP        Running   18585    27     /var/uc4/server/WPsrv_EXP2_log_026_00.txt
    EXP2-NodeA-WP22       WP027    DWP        Running   18590    27     /var/uc4/server/WPsrv_EXP2_log_027_00.txt
    EXP2-NodeA-WP23       WP028    DWP        Running   19089    27     /var/uc4/server/WPsrv_EXP2_log_028_00.txt
    EXP2-NodeA-WP24       WP029    DWP        Running   19094    27     /var/uc4/server/WPsrv_EXP2_log_029_00.txt
    EXP2-NodeA-WP25       WP030    DWP        Running   19103    27     /var/uc4/server/WPsrv_EXP2_log_030_00.txt
    EXP2-NodeA-WP26       WP031    DWP        Running   19116    27     /var/uc4/server/WPsrv_EXP2_log_031_00.txt
    EXP2-NodeA-WP27       WP032    DWP        Running   19120    27     /var/uc4/server/WPsrv_EXP2_log_032_00.txt
    EXP2-NodeA-WP28       WP033    DWP        Running   19124    27     /var/uc4/server/WPsrv_EXP2_log_033_00.txt
    EXP2-NodeA-WP29       WP034    WP         Running   19326    27     /var/uc4/server/WPsrv_EXP2_log_034_00.txt
    EXP2-NodeA-WP30       WP035    WP         Running   19331    27     /var/uc4/server/WPsrv_EXP2_log_035_00.txt
    EXP2-NodeA-WP31       WP036    WP         Running   19336    27     /var/uc4/server/WPsrv_EXP2_log_036_00.txt
    EXP2-NodeA-WP32       WP037    DWP        Running   19343    27     /var/uc4/server/WPsrv_EXP2_log_037_00.txt
    

     

    For those who are curious, the script identifies log file paths by listing the open file descriptors of each PID returned by the SMgr. The number of connections is obtained in the same way. (Note that these numbers are somewhat larger than those displayed in the Service Manager Dialog Client.)
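
    For readers who want to try the idea themselves, here is a simplified sketch of the technique, not the code from the attached script. It assumes a Linux system with /proc, assumes that log file names contain "_log_" as in the example output above, and treats every open socket as a connection. The function names are hypothetical.

```shell
# Print the target of every open file descriptor of a PID (Linux only).
fd_targets() {
    local pid="$1" fd
    for fd in /proc/"$pid"/fd/*; do
        readlink "$fd" 2>/dev/null
    done
}

# Print the log files a process currently has open.
open_logs() {
    fd_targets "$1" | grep '_log_' | sort -u
}

# Count open sockets as a rough proxy for the number of connections.
# This counts every socket the process holds, whatever its type.
count_sockets() {
    fd_targets "$1" | grep -c '^socket:'
}
```

    You will normally need to run this as the user that owns the process, or as root, to be able to read its file descriptors.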

    Identification of process types and roles relies on parsing recent logs and looking for particular messages.

    It would be nice if ucybsmcl were capable of collecting and displaying all of this information, in addition to the information displayed in the Service Manager Dialog Client. However, it seems the command line SMgr client is unlikely to receive any significant updates in the future.

    Attachment(s)

    zip
    AE Bash scripts.zip   11 KB 1 version


  • 4.  RE: Bash scripts for the Automation Engine

    Posted Mar 18, 2024 12:33 PM
    Edited by Michael A. Lowry Mar 18, 2024 01:02 PM

    I updated the script once more to fetch a connection count for each AE server process, again by listing the active (network) file descriptors for each PID.

    For some reason, this returned numbers of connections somewhat higher than the numbers listed in the Service Manager Dialog Client. I haven't looked into the reason for the discrepancy. Perhaps some connections are excluded for some reason in the list displayed in the SMgr GUI.



  • 5.  RE: Bash scripts for the Automation Engine

    Posted 30 days ago
    Edited by Michael A. Lowry 30 days ago

    I made another small update. I added a column for the assigned process name, improved identification of DWPs, and added a separate function called aep that can be used to quickly identify a specified AE server process.

    $ aep wp6
    SMgr service name     AE Proc  Type/Role  Status    PID      #Conn  Log file
    EXP2-NodeA-WP1        WP006    PWP*       Running   10040    97     /var/uc4/server/WPsrv_EXP2_log_006_00.txt
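
    In the example above, the argument wp6 matches the AE process name WP006. One plausible way to get from one to the other is to upper-case the prefix and zero-pad the number to three digits. The sketch below shows that idea; it is an assumption about the approach, not the actual code of aep, and normalize_proc_name is a hypothetical name.

```shell
# Normalize a short process name such as "wp6" to the form "WP006".
normalize_proc_name() {
    local input="$1" prefix number
    # Letters form the prefix, digits form the number.
    prefix="$(printf '%s' "$input" | tr -d '0-9' | tr '[:lower:]' '[:upper:]')"
    number="$(printf '%s' "$input" | tr -cd '0-9')"
    if [ -z "$prefix" ] || [ -z "$number" ]; then
        return 1
    fi
    # Strip leading zeros so printf does not read the number as octal.
    number="$(printf '%s' "$number" | sed 's/^0*//')"
    printf '%s%03d\n' "$prefix" "${number:-0}"
}
```

    The normalized name can then be compared against the AE Proc column of the aex output.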