Thought I would add a Linux perspective. We use the open source Xymon (formerly Hobbit, and inspired by Big Brother) for monitoring many different aspects of Automic, as well as various other application tools my team owns. So far we've not seen the zombie process issue with any of the core components except the SNMP agent. For that one we monitor both the process and its log, to make sure the log stays fresh (i.e. is still being written to). The same could be done with any other log.
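For anyone who wants the same "is the log still fresh" check without Xymon, here is a rough sketch of the idea in Python. This is not how Xymon does it internally. The function name, the 600-second threshold and the example path are all made up for illustration.

```python
import os
import time


def log_is_fresh(path, max_age_seconds=600, now=None):
    """Return True if the file at `path` was modified within the last
    `max_age_seconds` seconds. A missing or unreadable file counts as stale.

    `max_age_seconds=600` is an arbitrary example threshold; tune it to how
    often the monitored process normally writes to its log.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # File missing or not readable: treat it as a failed check.
        return False
    return (now - mtime) <= max_age_seconds


if __name__ == "__main__":
    # Hypothetical path, substitute the real log of the agent you care about.
    log_path = "/path/to/snmp_agent.log"
    print("green" if log_is_fresh(log_path) else "red")
```

The point of pairing this with a process check is that a hung or zombie process still shows up in the process list, while its log quietly stops moving.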
We monitor:
- cpu
- memory
- disk
- files (logs)
- http (ECC)
- procs (WP/CP/smgr/core agents/SNMP/Tomcat/etc)
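The procs check in that list boils down to "are at least N processes matching this name running". A minimal sketch of that logic in Python, assuming a Linux box with the standard `ps` command, is below. The pattern strings and minimum counts in the example are placeholders I made up, not real Automic process names, so use whatever shows up in your own `ps` output.

```python
import subprocess


def running_cmdlines():
    """Return the command line of every running process, one string each.
    Uses `ps -eo args`; the first output line is the column header."""
    out = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    ).stdout
    return out.splitlines()[1:]


def check_procs(expected, cmdlines):
    """`expected` maps a substring to the minimum number of processes whose
    command line must contain it. Returns {pattern: (found, minimum)} for
    every pattern that falls short; an empty dict means all is well."""
    problems = {}
    for pattern, minimum in expected.items():
        found = sum(1 for line in cmdlines if pattern in line)
        if found < minimum:
            problems[pattern] = (found, minimum)
    return problems


if __name__ == "__main__":
    # Placeholder patterns and counts, for illustration only.
    expected = {"example-wp": 2, "example-cp": 1, "tomcat": 1}
    problems = check_procs(expected, running_cmdlines())
    print("red: %r" % problems if problems else "green")
```

Note that a count like this only tells you the process exists, not that it is doing anything, which is why we also watch the log for the SNMP agent.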
Xymon has a fairly minimalist GUI, but it's perfect for our needs. While at work, I leave the root Xymon Automic monitoring screen open in a tab, and will usually notice the tab favicon switch from "happy/green" to "angry/red" before an alert goes out (~10 minutes). Off hours it sends email, and in some cases phone texts. We could also hook it into our paging system, but that has not been necessary so far. We also get great trending graphs.
There is also a Windows version of the agent, but Xymon itself is rooted in Linux.