In my time with APM, I have seen both sides of the coin: extensive architectural planning for APM deployments, and the business impact of skipping it.
Here are some suggestions and common scenarios on this overlooked topic:
The Proactive Approach
- Takes the time to define APM requirements for both the current environment and forecasted changes.
- Reviews its architecture against present and future requirements one to four times a year.
- Captures only the metrics, defects, etc. that it needs, rather than relying on default APM configurations.
- Ties all of the above to an overall monitoring/APM deployment strategy.
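One question a periodic review can answer is how many more review cycles remain before projected load outgrows the current architecture. The minimal Python sketch below assumes that load is summarized as a single metric count, that it grows by a fixed percentage each quarter, and that you take the capacity figure from the APM Sizing and Performance Guide for your own hardware. The function name and all parameters are hypothetical and are not part of any CA APM tool.

```python
from typing import Optional


def quarters_until_capacity(
    current_metrics: int,
    capacity: int,
    growth_per_quarter: float,
    max_quarters: int = 40,
) -> Optional[int]:
    """Hypothetical helper: count the quarters until projected load exceeds capacity.

    Assumes compound growth at a fixed rate per quarter. Returns 0 if the load
    already exceeds capacity, or None if capacity is not exceeded within
    max_quarters (for example, when the load does not grow at all).
    """
    load = float(current_metrics)
    for quarter in range(max_quarters + 1):
        if load > capacity:
            return quarter
        # Apply one quarter of growth (e.g. new agents or newly monitored applications).
        load *= 1 + growth_per_quarter
    return None


if __name__ == "__main__":
    # Illustrative numbers only; substitute figures from your own environment.
    print(quarters_until_capacity(100_000, 200_000, 0.10))
```

With 10% growth per quarter, a deployment at half its capacity crosses the limit in the eighth quarter. A yearly review would catch that in time, but an architecture that is never reviewed would not.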
The Minimum Administration/Reactive Approach
- Keeps adding Introscope agents, monitored business applications, etc. until one day things break.
- Captures every metric enabled by default, whether needed or not, just because it may be needed some day.
- Has that fateful day when performance and functionality are impacted and the architecture must be reviewed reactively, much like the unwary boiling frog (http://en.wikipedia.org/wiki/Boiling_frog).
- Repeats the above cycle.
- Has no APM deployment/roadmap strategy.
Troublesome Architecture Scenarios
I often see one of the following situations:
* No or minimal architecture planning was performed during deployment, and CA Technologies may not have sanity-checked the final architecture.
* An architecture is selected, but expectations are not adjusted to match the choices made. For example, expecting more from a virtual environment than the deployed architecture can deliver.
* Best practices outlined in the APM Sizing and Performance Guide are not followed. For example, running all of the APM CE services on an undersized or 32-bit collector can lead to various performance issues.
* The architecture was sized correctly at the start, but it is not reviewed periodically to keep up with changes in the environment.
My hope is to see none of the above scenarios in future Support cases.
Time spent on architectural planning now may reduce or eliminate outage time in the future. Please consider the above for present and future deployments.