Idea Details

Native Infrastructure Agent (CA IM lite :p)

Last activity 05-26-2016 06:39 PM
chrisjeagles
05-26-2016 06:35 PM

Java monitoring (and generally any other type of monitoring) provided by Introscope monitors the container of an application and all the usage within it, but provides very limited information about the surrounding environment (the underlying OS), which could be a contributory cause of any issues seen.

This information is currently collectable by EP-Agents through plugins and/or third-party scripts, but the data, and therefore the collected metrics, lack context from the co-located agents.

With Introscope's future state in mind, it would make a great deal of sense to offer tooling that provides this context, so that when native issues are a cause, Introscope has the context to identify that.

 

 

For example, three commands come to mind when initially triaging a JVM issue on UNIX (these being things I cannot deduce from Introscope):

1) top/topas to identify CPU usage, memory usage and disk usage in relation to PIDs.

2) netstat -an to identify port states.

3) df -h to identify disk usage.

(I'm sure others know more but hey-ho)
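To make the point about how basic this data is, here is a minimal sketch of turning the disk and port commands above into metrics. It is only an illustration, not how any CA agent works: the function names (`parse_df`, `parse_netstat`, `collect`) are made up for this post, and I have swapped `df -h` for `df -P` because the POSIX output format is easier to parse than the human-readable one.

```python
import subprocess
from collections import Counter


def parse_df(output):
    """Parse `df -P` (POSIX format) output into {mount_point: percent_used}."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # ignore rows without a usable capacity column
        mount = " ".join(fields[5:])  # mount points may contain spaces
        usage[mount] = int(fields[4].rstrip("%"))
    return usage


def parse_netstat(output):
    """Count TCP sockets per state (LISTEN, ESTABLISHED, ...) from `netstat -an`."""
    states = Counter()
    for line in output.splitlines():
        fields = line.split()
        # TCP rows look like: proto recv-q send-q local foreign state
        if len(fields) >= 6 and fields[0].lower().startswith("tcp"):
            states[fields[5]] += 1
    return dict(states)


def collect():
    """Run both commands on the local host (hypothetical helper, UNIX only)."""
    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    return {
        "disk_pct_used": parse_df(run(["df", "-P"])),
        "tcp_states": parse_netstat(run(["netstat", "-an"])),
    }
```

Calling `collect()` on a UNIX host would return one dictionary per polling interval. Column layouts vary a little between platforms, so the parsers skip any row they do not recognise rather than guess.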

 

If I have 30 JVMs on a server and an errant process on that server consumes all the available memory and/or CPU, I will start to see the performance of my JVMs deteriorate in Introscope, but with very little context as to why. As far as Introscope is aware, 30 JVMs have issues, they are on the same host, and I should go investigate. The best case is that I have an EP-Agent installed running custom scripts/plugins to give me some sense of the detail to help me make the call on the process, how very descriptive (because..).
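As a rough sketch of what "identifying the errant process" could look like, the snippet below ranks processes from `ps -e -o pid,pcpu,vsz,comm` output. The names `parse_ps` and `top_consumers` are hypothetical, and the `ps` columns are my own choice for the example; nothing here reflects an existing Introscope or EP-Agent feature.

```python
def parse_ps(output):
    """Parse `ps -e -o pid,pcpu,vsz,comm` output into a list of process dicts."""
    procs = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # the command name may contain spaces
        if len(fields) < 4:
            continue
        pid, pcpu, vsz, comm = fields
        try:
            procs.append({
                "pid": int(pid),
                "cpu_pct": float(pcpu),
                "vsz_kb": int(vsz),
                "command": comm.strip(),
            })
        except ValueError:
            continue  # skip rows that do not hold numeric values
    return procs


def top_consumers(procs, key="cpu_pct", limit=3):
    """Return the `limit` processes with the highest value for `key`."""
    return sorted(procs, key=lambda p: p[key], reverse=True)[:limit]
```

With 30 JVMs on the host, the first entry of `top_consumers(procs)` would point at the one PID worth looking at, whether or not it is a JVM.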

 

EP-Agents providing this data means Introscope can't be improved to provide context to the data in an easily configurable way, and especially not as part of any planned features. At best, without a dedicated infrastructure agent, you could allow Introscope Agents to pull the data from an EP-Agent (a kind of reverse-ACC comms) so the data can be passed back to Introscope in context (a useful feature on its own, I would think).

 

However, by providing an infrastructure agent that collects and processes metrics from data similar to the commands mentioned (very basic stuff not warranting an entire CA IM product), you can provide contextual information on the state of the underlying OS (the PID underpinning the context). This data could be provided in one of two ways: by transmitting the data to the EMs directly (but losing a little context outside Team Center or the like) or, ideally, by allowing the Introscope agents to poll the agent for the metrics, a bit like how ACC exposes a local port. That way you preserve the context in the investigator view as well. You then have a basis for identifying the PID causing a problem and highlighting it to the users, or even identifying the problem application container in the case of memory leaks and again informing the users. Because the agent is native, it gets the most accurate data from the OS, and because it has a specific purpose rather than being a generic EP-Agent, the contextual correlation possibilities of the data can be fully utilised in future product enhancements rather than through custom user configurations.
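The polling option could be sketched roughly as follows: the infrastructure agent serves its latest metrics as JSON on a loopback port, and a co-located agent fetches them. Everything here is an assumption for illustration (plain HTTP, JSON, and the names `start_agent` and `poll_agent`); I don't know the actual ACC protocol and this is not it.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(read_metrics):
    """Build a request handler that serves whatever read_metrics() returns."""
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(read_metrics()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return MetricsHandler


def start_agent(read_metrics, port=0):
    """Start the hypothetical infrastructure agent on loopback only.

    port=0 lets the OS pick a free port; read it from server.server_address.
    """
    server = HTTPServer(("127.0.0.1", port), make_handler(read_metrics))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def poll_agent(port):
    """What a co-located agent might do: fetch the metrics from the local port."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxy
    with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as resp:
        return json.load(resp)
```

Binding to 127.0.0.1 keeps the data on the host, so only co-located agents can read it, and the polling agent can attach its own context (which JVM, which PID) before passing the metrics on.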