Hello Alan,
> We have quite a busy system and occasionally have Attached Events firing late.
> We are running r12.6
Consider upgrading. Depending on how busy "busy" is, and if it is warranted by the load, you may see some substantial benefits from switching to ITSM 17.2 and using the "Advanced Availability" mode. This gives each Application server its own virtual database, and its own channel to the database, resulting in much better load spreading of the SDM processes. The Animator process is still a singleton process, so it doesn't benefit from having multiple copies of itself, but it does mean that a lot of the rest of the load is moved away, which frees resources for the Animator process.
CA Service Desk Manager Considerations - CA Service Management - 17.2 - CA Technologies Documentation
Besides that, it gets you an updated technology stack and you'll probably update hardware to support the new configuration.
I no longer have access to that document. It's got one of our old "TEC" numbers, and I can't even see a copy in the Google Cache. Is that information about a one minute grace period still in any of the current DocOps or "KB" prefix knowledge documents? A lot of the old TECs were retired if they were not applicable to the current releases.
Still, the reason the suggestion is there is to avoid a race condition between the Animator entries and other housekeeping on the ticket when it is first Saved. I can't recall if there is anything hardwired about how soon the Animator first checks in, but there is definitely housekeeping on the ticket that needs to complete first. A specific example is the Affected End User field, which kicks off a bunch of checks for other fields on the ticket. As the Animator process often includes conditional checks on field values, those values should be present first - and issues have arisen where the Animator runs its checks before other fields have completed their population. The "one minute" recommendation is probably just a "common sense" value that someone decided on, rather than a strict programming rule that "the Animator has a one minute limit before it can attach or fire."
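To make the race condition concrete, here is a toy Python sketch. It is not SDM code: the field names and the timings are invented purely for illustration, and it only models the idea that a condition checked too soon after Save sees a half-populated ticket.

```python
# Toy model only. Field names and timings are hypothetical, not taken from SDM.
# Each value is the number of seconds after Save at which that field
# has finished being populated by the ticket's housekeeping.
FIELD_READY_AT = {
    "affected_end_user": 0,   # entered by the analyst at Save
    "priority": 20,           # derived later by housekeeping
    "group": 35,              # derived later by housekeeping
}


def visible_fields(seconds_after_save):
    """Return the fields that are populated at the given moment."""
    return {name for name, ready in FIELD_READY_AT.items()
            if ready <= seconds_after_save}


def condition_can_be_trusted(required_fields, delay_seconds):
    """True if every field the condition reads is populated by then."""
    return set(required_fields) <= visible_fields(delay_seconds)


# A condition that reads priority and group, checked 5s and 60s after Save.
print(condition_can_be_trusted(["priority", "group"], 5))
print(condition_can_be_trusted(["priority", "group"], 60))
```

With these made-up timings, the check at 5 seconds runs before the fields exist and the check at 60 seconds does not, which is the sort of safety margin a "one minute" rule of thumb would buy.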
The genuine underlying issue is the Animator firing late, and this can be from any number of causes:
* Too many Events for the system to handle.
* Configuration not appropriate for the number of Events.
* Hardware not appropriate for the system load.
These all tie into each other of course, but often you can find that one is more of a limiting factor than the others.
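Before digging into which of these is the limiting factor, it can help to put numbers on "firing late". The Python sketch below assumes you have already exported pairs of (scheduled fire time, actual fire time) for your Attached Events by whatever means you have available. The sample data, the function names and the 60 second threshold are all hypothetical.

```python
from datetime import datetime
from statistics import median


def lateness_seconds(records):
    """Delay in seconds between the scheduled and actual fire time."""
    return [(fired - due).total_seconds() for due, fired in records]


def summarize(records, threshold_seconds=60):
    """Summarise how many Events fired later than the chosen threshold."""
    delays = lateness_seconds(records)
    late = [d for d in delays if d > threshold_seconds]
    return {
        "count": len(delays),
        "late": len(late),
        "median": median(delays),
        "worst": max(delays),
    }


# Hypothetical sample: (scheduled, actual) fire times for three Events.
sample = [
    (datetime(2020, 1, 6, 9, 0, 0), datetime(2020, 1, 6, 9, 0, 5)),
    (datetime(2020, 1, 6, 9, 10, 0), datetime(2020, 1, 6, 9, 13, 0)),
    (datetime(2020, 1, 6, 9, 20, 0), datetime(2020, 1, 6, 9, 20, 30)),
]

print(summarize(sample))
```

Running the same summary for different times of day, or before and after a configuration change, gives you something firmer than "occasionally late" to compare against.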
The Animator is often the visible sign of a performance bottleneck, simply because it is one of the most used processes on the system. It can be the workhorse of the system, and so delays are seen and felt here first. There may not be an Animator issue at all (although there could be); rather, a performance issue elsewhere may be having an impact.
This is where a general performance review should come in.
Here are some common things that we see with Animator.
* Called too frequently.
- Do you really need to check conditions every minute or 10 minutes, or can once an hour or once a day suffice?
- Too many Events for the reality of the business needs.
* Called when not needed.
- Would the functionality be better served by dedicated SPL code, by changing the business process so that it does not use an Event, or by adding a Workflow or an email, etc.?
* System overloaded.
- Are there additional domsrvr/webengine pairs to handle web client load?
- Are there secondary servers for web client load, knowledge, attachments, Web Services etc?
- Do other SDM processes need their own agents?
- Do the pdm_vdbinfo and dbagent commands reveal system stress?
- Is hardware sufficient for needs? (One CPU per domsrvr/webengine pair, for example).
- Where is the bottleneck? SDM process, database, network, CPU, memory, SQL query format etc.
* Are the Events efficient?
- Custom code and Events/Macros can slow processing if there are faults.
* Is the database overloaded?
- A busy system on SDM 12.6 may have accumulated a large number of entries in tables like session_log, not_log_header, call_req etc. which simply aren't needed and which draw unnecessary resources from the virtual database. Archive and Purge can free this up.
Really, a good review of the system is the only way to understand what is actually going on with performance delays. You'll also find guides to tuning performance in the DocOps for SDM. But any system from that time is likely to have outgrown its original planned size. What was a good system setup then may not match what is asked of it now. Or there may have been setup choices made then which were okay when the system was small, but which become an issue as the system gets larger, such as Tomcat memory allocation, number of DB agents, monitor_joins etc.
Thanks, Kyle_R.