I have a stall count metric going off, but the app team believe the method has returned (I don't know how that's happened, but either way we no longer consider it a stall).
Is there a way of telling the agent to disregard that invocation, without restarting the app?
I'm not sure I fully understand the issue you're facing, but you could try configuring how long a response must be delayed before it is considered a stall; the default is 30 seconds.
There is really never a case to change the stall interval. It is a simple measure of application quality.
When something stalls, it simply means it took longer than 30 seconds - even if it does manage to return. Stall counts are especially useful when a transaction does not return, because the transaction trace cannot be completed in that case. Still, the fact that the number is always increasing simply means the application has poor performance characteristics.
The 'job' for APM is to point out the performance issues and bottlenecks. Getting folks to fix them - that is another issue altogether.
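To make the 30-second rule above concrete, here is a toy model of how a stall counter could behave. This is only a sketch under assumptions: `StallTracker` and its methods are hypothetical names, not the agent's actual implementation, and it assumes a stall is an invocation that has been in flight for at least the threshold.

```python
class StallTracker:
    """Toy model of a stall counter (hypothetical, not the agent's real code)."""

    def __init__(self, threshold_seconds=30.0):
        self.threshold_seconds = threshold_seconds
        self._in_flight = {}  # invocation id -> start time in seconds

    def start(self, invocation_id, now):
        """Record that an invocation has begun."""
        self._in_flight[invocation_id] = now

    def finish(self, invocation_id, now):
        """Record a return and give back the elapsed time in seconds."""
        started = self._in_flight.pop(invocation_id)
        return now - started

    def concurrent_invocations(self):
        """Number of invocations that have started but not yet returned."""
        return len(self._in_flight)

    def stall_count(self, now):
        """Number of in-flight invocations at or past the threshold."""
        return sum(
            1
            for started in self._in_flight.values()
            if now - started >= self.threshold_seconds
        )


tracker = StallTracker()
tracker.start("req-1", now=0.0)
at_15s = tracker.stall_count(now=15.0)       # still under the threshold
at_30s = tracker.stall_count(now=30.0)       # threshold reached while in flight
elapsed = tracker.finish("req-1", now=42.0)  # the slow call finally returns
after_return = tracker.stall_count(now=45.0)
```

In this model the count drops as soon as the slow call returns, so a count that stays above zero after a confirmed return would point at the bookkeeping rather than the application.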
I'm not actually talking about the period, that's fine as it is, the issue is that the method HAS returned, so the stall count should be back down to 0 now, but isn't. That said, what makes you think there's never a case to increase the period? What if you have an app that regularly takes longer than 30 seconds to return, or where anything over one second should be considered a stall?
I have seen this behavior recently. Your application experiences a stall, but after things recover, the stall count doesn't appear to decrement. Contact support; there is a known issue. They supplied a patch to a customer I know who experienced the same issue. Out of curiosity, is this a WebSphere environment?
No, it's TCServer (SpringSource) I think. I'll probably just wait to see if the bug's still there when we upgrade to 9.7.
Can you reproduce/isolate this behavior?
Is Concurrent Invocations returning to 0? It should go to 1 two intervals before the Stall Count does.
Do you see Responses Per Interval = 1 and ART > 30s in the interval when the method returns?
If none of that happens, your method really is still running.
If the Stall Count remains > 0 and the method has definitely returned, it is a bug, and you should open a support ticket rather than wait for 9.7.
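The checks above can be sketched as a small decision function. This is a rough illustration, not a supported tool: `IntervalMetrics`, `diagnose`, and the result strings are hypothetical, the metric values are assumed to have been read off the Investigator by hand, ART is assumed to be in milliseconds, and a single slow invocation is assumed (with several responses in one interval the average would be diluted).

```python
from dataclasses import dataclass
from typing import List

STILL_RUNNING = "still running"
POSSIBLE_BUG = "returned but stall count stuck: open a support ticket"
NO_STALL = "no stall reported"


@dataclass
class IntervalMetrics:
    """One reporting interval's values for a single method (hypothetical type)."""
    concurrent_invocations: int
    stall_count: int
    responses_per_interval: int
    average_response_time_ms: float


def diagnose(intervals: List[IntervalMetrics], threshold_seconds: float = 30.0) -> str:
    """Apply the checklist to intervals ordered oldest to newest."""
    latest = intervals[-1]
    if latest.stall_count == 0:
        return NO_STALL
    # Evidence of a return: some interval reported a response slower than the threshold.
    slow_return_seen = any(
        m.responses_per_interval >= 1
        and m.average_response_time_ms > threshold_seconds * 1000
        for m in intervals
    )
    # Evidence of a return: nothing is in flight any more.
    back_to_idle = latest.concurrent_invocations == 0
    if not slow_return_seen and not back_to_idle:
        return STILL_RUNNING
    return POSSIBLE_BUG


# Concurrent Invocations goes to 1 two intervals before Stall Count does.
stuck = [
    IntervalMetrics(1, 0, 0, 0.0),
    IntervalMetrics(1, 0, 0, 0.0),
    IntervalMetrics(1, 1, 0, 0.0),
]
# The method returns after 42 s, yet Stall Count stays at 1.
suspect = stuck + [IntervalMetrics(0, 1, 1, 42000.0)]
# The method returns after 42 s and Stall Count clears as expected.
healthy = stuck + [IntervalMetrics(0, 0, 1, 42000.0)]

stuck_result = diagnose(stuck)
suspect_result = diagnose(suspect)
healthy_result = diagnose(healthy)
```

Only the second case, where the return is visible in the metrics but the Stall Count has not cleared, is the one worth a support ticket.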
Sam, just curious... have you resolved this issue, or is it still ongoing? I too see stalls that persist even after they've released and everything has returned to normal. If you were able to fix it, what was the solution?