If you are an APM Admin/customer or CA partner, we would like you to be the first to respond to this important question. Thanks in advance for helping your neighbor.
Try to find the bottleneck using the following link.
Also try to increase the heap size as much as you can. If you have 16 GB of physical memory, assign a 12 GB heap.
Use wildcards in your metric groupings as little as you can. Reduce calculators. Reduce the number of metrics per agent.
Try deploying the Infrastructure Monitoring management module and see what it shows.
Can you share the following?
Q1: How many collectors do you have in the cluster?
Q2: How many agents do you have in your cluster?
Thanks for your reply.
5 collectors; right now one collector is out of the cluster.
66 agents are nothing for 5 collectors.
Do the following:
In IntroscopeEnterpriseManager.properties set:
Step 2: Max heap size
In Introscope_Enterprise_Manager.lax or EMService.conf set:
Change the -Xms and -Xmx values to at least 6 GB. If you have 16 GB in total, assign a 12 GB heap.
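In case it helps, here is a minimal Python sketch of the heap change described above. It only rewrites the -Xms and -Xmx tokens in a JVM option string. The sample option line is hypothetical, and setting both values to the same size is an assumption, not something stated in this thread, so check the real option line in your own .lax or .conf file before changing anything.

```python
import re


def set_heap(options: str, heap: str) -> str:
    """Return `options` with -Xms and -Xmx both set to `heap` (e.g. "12g")."""
    # Drop any existing -Xms/-Xmx tokens, keep everything else in order.
    kept = [tok for tok in options.split() if not re.match(r"-Xm[sx]\d", tok)]
    # Assumption: initial and maximum heap are set to the same size.
    return " ".join(kept + [f"-Xms{heap}", f"-Xmx{heap}"])


# Hypothetical option line, not copied from a real installation.
example = "-Xms1024m -Xmx1024m -Djava.awt.headless=true"
print(set_heap(example, "12g"))
```

Other options on the line are left untouched; only the two heap flags are replaced.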
If you are still facing the problem, grep for ERROR and WARN in IntroscopeEnterpriseManager.log and share the results with us. Delete the logs before restarting so you will have fresh logs.
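If grep is not available on the EM host, the same filter can be sketched in Python. This is only an illustration: it assumes the log file is in the current directory and that the levels appear as the plain words ERROR and WARN on each line.

```python
from pathlib import Path


def error_and_warn_lines(log_text: str) -> list[str]:
    """Return the lines that contain ERROR or WARN, in their original order."""
    return [line for line in log_text.splitlines()
            if "ERROR" in line or "WARN" in line]


if __name__ == "__main__":
    # Assumed location: run this from the EM logs directory, or adjust the path.
    log = Path("IntroscopeEnterpriseManager.log")
    if log.exists():
        for line in error_and_warn_lines(log.read_text(errors="replace")):
            print(line)
```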
I will review and apply your suggestions. This implementation is six years old, and when we took it over from the previous team it was already performing badly. I think more than 60% of the registered metrics are not used, partly because the agents are not fully customized to the real need.
My previous answers did not mention the cross EM (CE), which for now we keep out of the cluster and which averages more than 3000 metrics per LPAR (based on a default installation for CE and SysView).
Once again, thank you very much for your help.
What EM and agent versions do you have?
Also try to find out how many metrics per agent you are receiving. I believe some of your agents have more than 50K metrics, which is causing this problem.
There is a Supportability MM in your Management Module Editor. Can you send me screenshots of the "metrics by agent" metric grouping?
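Once you have the per-agent counts, a quick way to spot the heavy agents is a small filter like the sketch below. The agent names and counts are made up for illustration, and the 50,000 limit is just the figure mentioned above, not an official threshold.

```python
def agents_over(counts: dict[str, int], limit: int = 50_000) -> list[str]:
    """Return agent names whose metric count exceeds `limit`, largest first."""
    heavy = [name for name, n in counts.items() if n > limit]
    return sorted(heavy, key=lambda name: counts[name], reverse=True)


# Hypothetical counts, typed in by hand from the "metrics by agent" view.
sample = {"agent-a": 3_000, "agent-b": 150_000, "agent-c": 60_000}
print(agents_over(sample))
```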
For now we are on a hybrid platform, and we are upgrading from version 10.0.0.12 to 10.5.1.8.
But the performance problems have been carried over from 9.7.
The collectors are on 10.5; the agents are still on 10.0.
(yellow mark) This agent is an EPAgent, an old custom implementation. It generates many metrics from the analysis of POS messages, but we are not using all of them in the dashboards. Originally these metrics were sent to Datamart, but this information is no longer downloaded today.
I am 100% sure that this agent is the culprit. Try to find the following line in the IntroscopeEPAgent.profile file and uncomment it. CA recommends a clamp value of 5000, but you can set 10000 if you want.
Restart the agent for the changes to take effect. I hope this will solve your problem.
I will do this as soon as I can plan the revision of the EP agent. Unfortunately my customer has no testing environment, so I must coordinate the change and make it in the production environment.
I've reviewed the IntroscopeEPAgent.profile file and it has this definition:
I have never seen an EPAgent with 150K metrics. What are you monitoring with this agent? Can you share the plugins you have enabled in this EPAgent? Also delete a 0 from the clamp value and everything will be good.
Thank you for your participation and helping Fernando. Greatly appreciated!
Our client is a bank that, a few years ago, enabled some banking operations to be carried out from stores, as if they were made at a bank branch. These operations are performed through POS devices.
Without going into more detail regarding communication, these operations are received in an MQ server and processed. To monitor them, we replicate the MQ queue on another MQ server. Some time ago an EPA plugin was implemented that generates metrics from another Java process, which reads the MQ queue and analyzes the messages. In this way, performance and statistics metrics for the transactions are generated according to the bank's requirements.
I do not know what problems the previous support team faced, or the reasons for this design and for the configurations that were applied to the EPA.
From my point of view, this whole process is totally inefficient, and today it generates metrics that are useless.
However, thanks to your guidance I have been able to demonstrate the inefficiency of these processes, so my client has authorized me to redo or improve them after several years of running under these conditions.
Again, thank you very much for your help!!
Thanks Fernando for sharing your questions/experiences. Much appreciated!
Quite the contrary, thank you all very much for clarifying my doubts and sharing your knowledge.
Once you start talking about heap sizes greater than 8-10 GB, you should look at enabling G1GC. You'll want to test the ratios for the configurations in your test environment, of course.
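As a rough illustration of that rule of thumb, the sketch below builds a JVM option string and adds the standard HotSpot flag -XX:+UseG1GC only for larger heaps. The 8 GB cut-off and the function name are assumptions for the example; the right threshold and any G1 tuning ratios need to be tested in your own environment.

```python
def jvm_options(heap_gb: int, g1_threshold_gb: int = 8) -> str:
    """Build heap flags and enable G1GC when the heap exceeds the threshold."""
    opts = [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]
    if heap_gb > g1_threshold_gb:
        # -XX:+UseG1GC is the HotSpot switch that selects the G1 collector.
        opts.append("-XX:+UseG1GC")
    return " ".join(opts)


print(jvm_options(12))
print(jvm_options(6))
```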