CA Workload Automation DE (dSeries) server has 5 possible levels for a Cold Start. Let's see what these 5 different levels are and how they are different from one another.
The cold start process is intended to clear out active workload and delete event schedules. Below are the different levels of Cold Start available in Workload Automation DE:
start.type.level = -2
The server starts with a normal cold start, but scheduled events that had not been processed at the time of the shutdown are re-scheduled. Events that require a manual trigger are not preserved.
start.type.level = -1
The server starts with a normal cold start, and the generation count for all applications is set to zero; these applications will also be purged.
start.type.level = 0
The server cold starts its scheduler and active workload (nothing is preserved).
start.type.level = 1
The server starts with a cold start, but all scheduled events are preserved. Events that were not yet processed at the time of the shutdown will be triggered after the server is started, based on prerequisites. Events that require a manual trigger are also preserved.
start.type.level = 2
The server starts with a cold start, but active workload is preserved. Workload that was running at the time of shutdown continues to run after the server is started.
How do you know which one to pick? Ideally we would always want to use level 2 to preserve workload that was running, but we don't always have luck with that and have had to choose level 1 or 0.
Hi Sharon and April,
This question comes up often. I am working on a tech document to address it, but here is a first draft. Feedback is appreciated for the final doc.
To answer your question, one needs to understand which dSeries components are affected by a cold start: the scheduler (anything to do with events) and the runtime (the active workload in the system, mainly jobs and applications).
What happens when the scheduler is cold started?
1. All timers associated with the scheduler are removed. Timers group events that should occur at the same time. For example, all the events that should be processed together (e.g. at 10:00:00 AM) are stored in a single timer.
2. The ESP_TDR_DATA table is truncated. This essentially wipes out all scheduled event triggers.
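To picture what a timer holds, here is a rough Python sketch of events being grouped by trigger time. It only illustrates the idea described above and is not how dSeries stores timers internally; the event names, times and function name are made up.

```python
from collections import defaultdict

# Hypothetical (event name, next trigger time) pairs, for illustration only.
events = [
    ("PAYROLL.DAILY", "10:00:00"),
    ("REPORTS.DAILY", "10:00:00"),
    ("BACKUP.NIGHTLY", "23:30:00"),
]

def group_into_timers(events):
    """Group events that share a trigger time under one timer."""
    timers = defaultdict(list)
    for name, trigger_time in events:
        timers[trigger_time].append(name)
    return dict(timers)

timers = group_into_timers(events)
for trigger_time, names in sorted(timers.items()):
    print(trigger_time, names)

# A scheduler cold start is comparable to throwing all of these away.
timers.clear()
```

After the clear, nothing is left to fire until the events are scheduled again.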
What happens when a runtime (Distributed Manager) is cold started?
1. All incoming messages to be processed by the distributed manager and all outgoing messages to be sent from the distributed manager are cleared from the database. These include messages to and from the Agents or other internal dSeries components.
2. The ESP_RT_WOB table is truncated. All active workload is lost. Jobs that were running on the Agent may complete, but the manager will have lost track of them.
3. Any timers associated with the distributed manager are cleared. These include jobs' time dependencies, external dependencies, ….
4. If global.variables.cold.start is set to true (default is false) in runonce.properties, global variables are deleted.
5. All variable dependencies are removed.
6. All resources are reset to their initial values (run time values are lost).
7. If global.variables.cold.start is set to true (default is false), all invalid applications are removed from the ESP_APPLICATION table.
8. All desktop client related data is truncated, including the ESP_WSS_APPL and ESP_WSS_JOB tables.
9. Status message tables are truncated.
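As a quick reference, the two lists above can be condensed into a small lookup of the tables each component's cold start empties. This is just a sketch in Python: the dictionary and function names are made up for illustration, and only the tables named in this post are included (the status message tables are not named here, so they are left out).

```python
# Tables named in this thread as truncated when each component is cold started.
# The structure and names below are illustrative, not part of dSeries.
TRUNCATED_BY_COLD_START = {
    "scheduler": ["ESP_TDR_DATA"],
    "runtime": ["ESP_RT_WOB", "ESP_WSS_APPL", "ESP_WSS_JOB"],
}

def tables_lost(components):
    """Return the sorted table names emptied when the given components are cold started."""
    lost = set()
    for component in components:
        lost.update(TRUNCATED_BY_COLD_START[component])
    return sorted(lost)

print(tables_lost(["scheduler"]))
print(tables_lost(["scheduler", "runtime"]))
```

A list like this can help decide which tables to query or back up before the restart.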
Now what happens at each cold start level?
start.type.level = -2: Cold start the scheduler. Cold start the runtime. scheduleallevents command issued.
start.type.level = -1: Cold start the scheduler. Cold start the runtime. scheduleallevents command issued. Application generation count for all applications is set to zero.
start.type.level = 0: Cold start the scheduler. Cold start the runtime.
start.type.level = 1: Cold start the runtime. Warm start the scheduler. All scheduled events (ESP_TDR_DATA) are preserved and executed. All active workload is lost (ESP_RT_WOB).
start.type.level = 2: Cold start the scheduler. Warm start the runtime. This preserves all active workload. All scheduled events are lost.
Levels 1 and 2 are useful depending on what you want to achieve. If you want to preserve your scheduled events, use level 1; if you want to preserve your active workload, use level 2. Most of the time, we end up performing a cold start when there is data corruption, or when a poorly maintained database has degraded in performance to the point where it cannot be recovered. Your data will dictate which start level to choose.
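The level breakdown above can also be written as a small lookup, which makes it easy to see which levels keep what. This is a minimal Python sketch, not anything shipped with dSeries: the class, field and function names are invented, and it assumes the first unlabeled row of the breakdown is level -2.

```python
from typing import NamedTuple

class ColdStart(NamedTuple):
    cold_scheduler: bool      # scheduled events (ESP_TDR_DATA) are wiped
    cold_runtime: bool        # active workload (ESP_RT_WOB) is wiped
    reschedule_events: bool   # scheduleallevents is issued after the start
    reset_generations: bool   # application generation counts are set to zero

# Assumption: the row without a label in the breakdown is level -2.
LEVELS = {
    -2: ColdStart(True, True, True, False),
    -1: ColdStart(True, True, True, True),
    0: ColdStart(True, True, False, False),
    1: ColdStart(False, True, False, False),
    2: ColdStart(True, False, False, False),
}

def levels_preserving(events=False, workload=False):
    """List the cold start levels that keep the requested data intact."""
    return [
        level
        for level, cs in sorted(LEVELS.items())
        if (not events or not cs.cold_scheduler)
        and (not workload or not cs.cold_runtime)
    ]

print(levels_preserving(events=True))
print(levels_preserving(workload=True))
print(levels_preserving(events=True, workload=True))
```

The last call returns an empty list: no cold start level keeps both the scheduled events and the active workload.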
Hope this explains the different cold start levels in more detail.
Best regards,
Pradeepan Gunabalasingam
Principal Support Engineer
What would be the point of a level 2 cold boot? From this description it sounds like a plain ole' restart.
Also, I am not sure I have ever heard an actual technical explanation of what a cold boot is actually doing from a pure technical perspective. Do you have a technical description somewhere of a cold boot and exactly what is happening to the system during a cold boot?
This is all good information.
Ideally we would always like to preserve active workload and the events' next scheduled runs. Next most important would be to preserve active workload, since we can run the CLI command "scheduleallevents".
Is there no sure way to know which cold start to choose first? Should we choose the least impactful cold start and, if that doesn't work, choose another?
Before any cold restart we run a SQL query for active workload so we can retrigger applications/jobs if needed. I see you reference ESP_RT_WOB for active workload; I run my SQL against ESP_WSS_JOB. Do you have any thoughts on which one of these would be better to use?
This is the SQL we use before any cold restart, in case we end up having to do one that doesn't preserve active workload.
Select ESP_WSS_APPL.APPL_NAME,ESP_WSS_APPL.STATUS, ESP_WSS_JOB.JOB_NAME,ESP_WSS_JOB.STATE_AFM
and ESP_WSS_JOB.STATE_AFM like '%WAIT%';