AutoSys Workload Automation

  • 1.  Did you know...what various cold start levels in WA DE mean?

    Broadcom Employee
    Posted Sep 26, 2016 01:16 AM

    CA Workload Automation DE (dSeries) server has 5 possible levels for a Cold Start. Let's see what these 5 different levels are and how they are different from one another.

     

    The cold start process is intended to clear out active workload and delete event schedules. The cold start levels available in Workload Automation DE are:


    start.type.level = -2

    The server performs a normal cold start, but scheduled events that had not been processed at the time of the shutdown are rescheduled.
    Events that require a manual trigger are not preserved.

     

    start.type.level = -1

    The server performs a normal cold start, and the generation count for all applications is set to zero. These applications are also purged.

     

    start.type.level = 0 

    The server cold starts its scheduler and active workload (nothing is preserved).

     

    start.type.level = 1 

    The server performs a cold start, but all scheduled events are preserved.
    Events that had not yet been processed at the time of the shutdown are triggered after the server starts, based on prerequisites.
    Events that require a manual trigger are also preserved.

     

    start.type.level = 2 

    The server performs a cold start, but active workload is preserved.
    Workload that was running at the time of the shutdown continues to run after the server starts.



  • 2.  Re: Did you know...what various cold start levels in WA DE mean?

    Posted Sep 26, 2016 02:16 PM

    How do you know which one to pick? Ideally we would always want to do level 2 to preserve workload that was running, but we don't always have luck with that and have had to choose level 1 or 0.



  • 3.  Re: Did you know...what various cold start levels in WA DE mean?

    Posted Nov 14, 2016 07:12 PM

    Hi Sharon and April,


    This question comes up often. I am working on a tech document to address it, but here is a first draft. Feedback is appreciated for the final doc.

    To answer your question, one needs to understand which dSeries components are affected by a cold start. These are the scheduler (anything to do with events) and the runtime (the active workload in the system, mainly jobs and applications).

     

    What happens when the scheduler is cold started?

     

    1. All timers associated with the scheduler are removed. Timers group events that should occur at the same time. For example, all the events that should be processed together (e.g. at 10:00:00 AM) are stored in a single timer.

     

    2. All data in the ESP_TDR_DATA table is truncated. This essentially wipes out all scheduled event triggers.
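
    To picture the timer grouping described in item 1, here is a minimal Python sketch. It is only an illustration of the idea, not the dSeries implementation; the event names and the function name are made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def group_events_into_timers(events):
    """Group (event_name, trigger_time) pairs so each distinct time maps to one timer."""
    timers = defaultdict(list)
    for name, trigger_time in events:
        timers[trigger_time].append(name)
    return dict(timers)


# Hypothetical events; two of them share the 10:00:00 AM trigger time.
events = [
    ("PAYROLL.DAILY", datetime(2016, 11, 14, 10, 0, 0)),
    ("BACKUP.NIGHTLY", datetime(2016, 11, 14, 22, 0, 0)),
    ("REPORTS.DAILY", datetime(2016, 11, 14, 10, 0, 0)),
]

timers = group_events_into_timers(events)
for trigger_time, names in sorted(timers.items()):
    print(trigger_time.time(), names)
```

    In this picture, a scheduler cold start would amount to throwing away every entry in `timers` along with the stored triggers behind them.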

     

    What happens when a runtime (Distributed Manager) is cold started?

     

    1. All incoming messages to be processed by the distributed manager and all outgoing messages to be sent from the distributed manager are cleared from the database. These include messages to and from the Agents, or other internal dSeries components.

     

    2. The ESP_RT_WOB table is truncated. All active workload is lost. Jobs that were running on the Agent may complete, but the manager will have lost track of them.

     

    3. Any timers associated with the distributed manager are cleared. These include jobs’ time dependencies, external dependencies, ….

     

    4. If global.variables.cold.start is set to true (default is false) in runonce.properties, global variables are deleted.

     

    5. All variable dependencies are removed.

     

    6. All resources are reset to their initial values (run time values are lost).

     

    7. If global.variables.cold.start is set to true (default is false), all invalid applications are removed from the ESP_APPLICATION table.

     

    8. All desktop client related data is truncated, including the ESP_WSS_APPL and ESP_WSS_JOB tables.

     

    9. Status message tables are truncated.
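
    Since items 4 and 7 depend on global.variables.cold.start, it can be worth checking that flag before restarting. A rough Python sketch follows, assuming runonce.properties is a plain key=value file. The parsing helper and the sample content are hypothetical; only the property name and its default of false come from the list above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' or '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def global_variables_deleted(props):
    """True when the cold start would also delete global variables (default: false)."""
    return props.get("global.variables.cold.start", "false").lower() == "true"


# Hypothetical file content, for illustration only.
sample = """
# runonce.properties (sample)
global.variables.cold.start=true
"""

props = parse_properties(sample)
print(global_variables_deleted(props))
```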

     

    Now what happens at each cold start level?

     

    start.type.level = -2

    Cold start the scheduler
    Cold start the runtime
    scheduleallevents command issued

     

    start.type.level = -1
    Cold start the scheduler
    Cold start the runtime
    scheduleallevents command issued
    Application generation count for all applications is set to zero

     

    start.type.level = 0
    Cold start the scheduler
    Cold start the runtime

     

    start.type.level = 1
    Cold start the runtime
    Warm start the scheduler.
    All scheduled events (ESP_TDR_DATA) are preserved and executed.
    All active workload is lost (ESP_RT_WOB).

     

    start.type.level = 2
    Cold start the scheduler
    Warm start the runtime
    This preserves all active workload.
    All scheduled events are lost.
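
    The per-level breakdown above can also be written down as a small lookup table. This is just a sketch of the list in Python; the class, field, and function names are my own, not anything from the product.

```python
from typing import NamedTuple


class ColdStartLevel(NamedTuple):
    scheduler_cold: bool           # scheduler is cold started (ESP_TDR_DATA truncated)
    runtime_cold: bool             # runtime is cold started (ESP_RT_WOB truncated)
    reschedule_events: bool        # scheduleallevents is issued afterwards
    reset_generation_counts: bool  # application generation counts set to zero


# Transcribed from the per-level list above.
LEVELS = {
    -2: ColdStartLevel(True, True, True, False),
    -1: ColdStartLevel(True, True, True, True),
    0: ColdStartLevel(True, True, False, False),
    1: ColdStartLevel(False, True, False, False),
    2: ColdStartLevel(True, False, False, False),
}


def levels_preserving(events=False, workload=False):
    """Return the levels that warm start whichever component holds the requested data."""
    return [
        level
        for level, effect in sorted(LEVELS.items())
        if (not events or not effect.scheduler_cold)
        and (not workload or not effect.runtime_cold)
    ]


print(levels_preserving(events=True))
print(levels_preserving(workload=True))
print(levels_preserving(events=True, workload=True))
```

    Going by this table, no level keeps both scheduled events and active workload. Level -2 rebuilds the schedule through scheduleallevents rather than preserving it, so it is not counted as preserving events here.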


    start.type.level 1 and 2 are useful depending on what you want to achieve. If you want to preserve your active workload, choose 2; if you want to preserve your events, choose 1. Most of the time, we end up performing a cold start when we have data corruption or a poorly maintained database whose performance has degraded to the point where we cannot recover. Your data will dictate which start level to choose.

     

    Hope this explains the different cold start levels in more detail.

     

    Best regards,
    Pradeepan Gunabalasingam
    Principal Support Engineer



  • 4.  Re: Did you know...what various cold start levels in WA DE mean?

    Posted Oct 10, 2016 02:58 PM

    What would be the point of a level 2 cold boot?  From this description it sounds like a plain ole' restart.  

     

    Also, I am not sure I have ever heard an actual technical explanation of what a cold boot is actually doing from a pure technical perspective.  Do you have a technical description somewhere of a cold boot and exactly what is happening to the system during a cold boot?



  • 6.  Re: Did you know...what various cold start levels in WA DE mean?

    Posted Nov 16, 2016 10:35 AM

    This is all good information. 

     

    Ideally we would always like to preserve active workload and events' next scheduled runs. The next most important would be to preserve active workload, since we can run the CLI command "scheduleallevents".

     

    Is there no sure way to know which cold restart to choose first? Choose the least impactful cold start, and if that doesn't work choose another?

     

    Before any cold restart we run a SQL query for active workload so we can retrigger applications/jobs if needed. I see you reference ESP_RT_WOB for active workload; I run my SQL against ESP_WSS_JOB. Do you have any thoughts on which one of these would be better to use?

     

    This is the SQL we use before any cold restart, to have in case we end up having to do one that doesn't preserve active workload:

    SELECT a.APPL_NAME, a.STATUS, j.JOB_NAME, j.STATE_AFM
    FROM ESP_WSS_APPL a
    JOIN ESP_WSS_JOB j ON a.APPL_ID = j.APPL_ID
    WHERE j.STATE_AFM LIKE '%WAIT%';
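
    If it helps anyone, the snapshot can be scripted so the result is saved as CSV before the restart. Below is a rough Python sketch that uses an in-memory SQLite database as a stand-in for the real dSeries database. The two table definitions are cut down to the columns the query uses, and the sample rows and state values are invented for the demo; against a real system you would pass in a connection from your own database driver.

```python
import csv
import io
import sqlite3

SNAPSHOT_SQL = """
SELECT a.APPL_NAME, a.STATUS, j.JOB_NAME, j.STATE_AFM
FROM ESP_WSS_APPL a
JOIN ESP_WSS_JOB j ON a.APPL_ID = j.APPL_ID
WHERE j.STATE_AFM LIKE '%WAIT%'
"""


def snapshot_waiting_jobs(conn):
    """Run the snapshot query and return the result as CSV text with a header row."""
    cur = conn.execute(SNAPSHOT_SQL)
    header = [col[0] for col in cur.description]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(cur.fetchall())
    return out.getvalue()


# Stand-in database with hypothetical, simplified tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ESP_WSS_APPL (APPL_ID INTEGER, APPL_NAME TEXT, STATUS TEXT);
CREATE TABLE ESP_WSS_JOB (APPL_ID INTEGER, JOB_NAME TEXT, STATE_AFM TEXT);
INSERT INTO ESP_WSS_APPL VALUES (1, 'PAYROLL', 'PROCESSING');
INSERT INTO ESP_WSS_JOB VALUES (1, 'LOAD', 'PREDWAIT');
INSERT INTO ESP_WSS_JOB VALUES (1, 'EXTRACT', 'COMPLETE');
""")

snapshot = snapshot_waiting_jobs(conn)
print(snapshot)
```

    Keeping the output as a file gives you a list of applications and jobs to retrigger if the chosen cold start level turns out not to preserve active workload.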

     

    Sharon