Is anybody using CA monitoring tools (Spectrum, APM, UIM, SOI) to monitor the health of their AutoSys installation? I've been tasked with doing this, and have been running into dead ends. Since it's in Java, I can monitor WCC with CA APM, but there's not much I can find for the other components, other than the AutoSys command line tools.
I think I can write a PowerShell script that will parse the chk_auto_up output and push it to Spectrum as a trap, but it has to run in that special shell, which makes it more difficult.
Anybody doing anything interesting?
Just thinking aloud about some options that are available out of the box...
1) AutoSys can forward SNMP traps to any SNMP manager; any alarm generated in AutoSys is also forwarded as an SNMP trap.
2) With AutoSys 11.3 and higher, one can set up process monitor jobs to monitor any process on UNIX/NT systems. The challenge here is that if the scheduler is down, these jobs aren't much use.
3) As for chk_auto_up, it returns different exit codes based on the status of each component: event server, scheduler, and application server. The challenge is that if the application server is down, chk_auto_up doesn't run.
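To make option 3 a bit more concrete, here is a minimal Python sketch that runs a check command and captures its exit code and output so that some other tool can act on them. It deliberately does not interpret the chk_auto_up exit codes (look those up for your release). The names run_check and CheckResult are made up for this example, and it assumes the AutoSys environment is already set up in the process that calls it.

```python
import subprocess
from typing import NamedTuple, Sequence


class CheckResult(NamedTuple):
    ran: bool        # False if the command could not be started or timed out
    exit_code: int   # -1 when ran is False
    output: str      # combined stdout and stderr, or the error text


def run_check(cmd: Sequence[str], timeout: float = 60.0) -> CheckResult:
    """Run a health-check command and capture its exit code and output."""
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into stdout
            text=True,
            timeout=timeout,
        )
    except OSError as exc:
        # Command not found, not executable, and so on.
        return CheckResult(False, -1, str(exc))
    except subprocess.TimeoutExpired:
        return CheckResult(False, -1, "timed out after %s seconds" % timeout)
    return CheckResult(True, proc.returncode, proc.stdout)


if __name__ == "__main__":
    # Assumes chk_auto_up is on PATH in the current environment.
    result = run_check(["chk_auto_up"])
    print(result.ran, result.exit_code)
    print(result.output)
```

A result with ran set to False is worth alerting on by itself, since it covers the case where the check could not run at all.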
I am eager to see what others suggest for options in this thread.
Thanks & Regards,
Using AutoSys tools/commands to monitor AutoSys itself is OK as a start, but it needs external monitoring as well. I've set up a bunch of additional monitoring items for AutoSys using UIM probes for things like processes, log files, and missed HA polling intervals.
To detect whether the scheduler is actually running jobs, what if you schedule a cyclic job every 5 minutes and check its status from a script running in an infinite loop, using a command such as "autorep -J <job> -d | grep SUCCESS"?
Then parse this output and check that the time of the SUCCESS event keeps increasing, by comparing the time of the latest run with that of the previous run of the job.
If the application server is not running, then this script will also give you an error message.
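The heartbeat idea above could be sketched roughly like this in Python. It is only a sketch under assumptions: it assumes the relevant lines of the autorep detail report contain the word SUCCESS and a timestamp in MM/DD/YYYY HH:MM:SS form, which you should verify against your own output. The function names and the 10-minute threshold are invented for the example. Also, instead of storing the previous run's time, it compares the newest SUCCESS time against the current clock, which catches the same kind of stall.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed timestamp layout; adjust to match your autorep output.
TIMESTAMP_RE = re.compile(r"\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2}")
TIMESTAMP_FMT = "%m/%d/%Y %H:%M:%S"


def latest_success(report: str) -> Optional[datetime]:
    """Return the newest timestamp found on any line mentioning SUCCESS."""
    newest = None
    for line in report.splitlines():
        if "SUCCESS" not in line:
            continue
        for match in TIMESTAMP_RE.findall(line):
            try:
                # Collapse repeated whitespace before parsing.
                stamp = datetime.strptime(" ".join(match.split()), TIMESTAMP_FMT)
            except ValueError:
                continue  # looked like a timestamp but is not a valid date
            if newest is None or stamp > newest:
                newest = stamp
    return newest


def heartbeat_is_stale(
    report: str,
    now: datetime,
    max_age: timedelta = timedelta(minutes=10),
) -> bool:
    """True if there is no SUCCESS event, or the newest one is too old."""
    last = latest_success(report)
    return last is None or now - last > max_age
```

Feed it the captured output of the autorep command and datetime.now(). An empty report (for example because the application server is down and autorep printed only an error) counts as stale.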
And finally, you can also run 'chk_auto_up -r 111' from the script and analyze the output automatically.
If there is any problem, raise an alert by sending an email with a tool like sendmail on Linux.
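If the monitoring script ends up in Python rather than shell, the email alert could look something like the sketch below, which uses the standard library instead of calling sendmail directly. The addresses, the function names, and the assumption that a mail relay is listening on localhost port 25 are all placeholders for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text alert message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to an SMTP relay (assumed to be reachable at host:port)."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.send_message(msg)
```

For example, send_alert(build_alert("autosys@example.com", "oncall@example.com", "AutoSys heartbeat stale", "No SUCCESS event seen recently")) would send one alert. Run the monitor from a host other than the AutoSys server if you can, so the alert still goes out when that server is the problem.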