How I Blew Up My First APM Cluster

By Hallett German posted 03-06-2016 07:32 AM


[Some months ago, I received an oddly shaped, oversized package in the mail with no return address. It contained a barely readable handwritten document and a Betamax video. I reflected for some time on whether to release them. But with April Fools' Day around the corner, it seemed like a good time to publish. Unfortunately, the video had nothing on it, but the manuscript provided the following chilling tale. Read at your own risk.]


This may be the last thing that I ever write. I have been running for such a long time and am about to be cornered.


Oh my company, I am so sorry for what I did. If only I could go back in time and undo my misdeeds.


It started so simply. I was hired as an Application Performance Management (APM) administrator at Glowski's Insurance, a fifty-person company gently nestled somewhere in a western mountain range.

My whole job was to keep the application performance metrics coming in and the dashboards lit up. I wasn't given much training, so I pretty much let things run themselves. That was my first mistake. Monitoring software needs to be managed actively.


I also aggravated the situation by adding many more applications to monitor. That made the application teams happy, since they could graphically depict how their systems were doing at peak times, analyze historical trends, and much more.

That was mistake #2. You should never make a system do more than it is capable of handling.


As I increased the load on the monitoring system, I kept the default settings and failed to adjust them to the new conditions. That was my biggest and last mistake. Capacity monitoring and configuration modifications are always needed when conditions change.


Then one day, everything came to a head. The system became sluggish, dashboards and graphs were blank, and the application went down. The company was running blind. All heads started to turn in my direction with scornful looks. I could not take it anymore and fled.


So if you are an APM admin, I beg and plead with you: don't follow my example. Instead:


- Read the manuals to understand what is going on.
- Tune your settings as the number of metrics and monitored applications grows.
- Check whether later releases can improve performance.
- Review the logs at least twice a week to see what is happening.


Looks like this is it. I see bright lights flooding the windows and the pounding steps of many people heading ever closer to my door. If only I...


[The manuscript ends there. What happened afterwards may never be known.]