I have an environment with one MoM and 8 collectors using CA APM 9.5.2. When I do a recycle of the environment, it seems that the following happens:
1) The collectors that get started up first get handed the majority of the agents by the MoM
2) These collectors get completely overloaded because 5 or 6 of the other collectors are sitting there doing nothing
3) Because these collectors are overloaded, they never respond to the MoM's later load balancing requests, so they sit there holding all the agents, overloaded and throwing errors
4) The other collectors never get handed over any agents via the load balancer
So has anyone else seen this behavior? What can I do to stop it from happening?
We have had a similar issue where one of our collectors takes the largest share of the traffic. There are a couple of things to look at. First, ensure your agents are configured to point to your MOM and not directly to a collector. Second, agents have an affinity for the EM that already holds data for that agent. Because of this affinity, you may need to delete the SmartStor data (either completely or, if you feel lucky, by using the SmartStor tools to remove only some of the agents).
Is it always the same EM? If so, then change the order you start them to see if the EMs that aren't taking traffic will pick up anything.
Check in the Introscope Investigator, while all of the EMs are running, to ensure the MOM can communicate with each of them. Each of the collectors should be listed under:
*SuperDomain*|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual)|Enterprise Manager|MOM|Collectors
Also, look into the loadbalancing.xml file. Within it, you can direct specific agents to a named collector. Double-check this file to ensure you haven't pinned all of your agents to a few specific collectors.
In the IntroscopeEnterpriseManager.properties file, there is a weight property for the EM collectors that might be useful. (I'm on 18.104.22.168)
Read up on the property "introscope.enterprisemanager.loadbalancing.threshold". It sets how far from the average load a collector has to be before the MOM starts to rebalance. For example, with 180k metrics and three collectors, the average is 60k per collector, so with the threshold set to 20,000 a collector can have anywhere from 40k to 80k metrics reporting to it without triggering a rebalance.
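To make the threshold arithmetic concrete, here is a rough Python sketch of the band described above. It is only a simplified model of my description, not the MOM's actual algorithm, and the function names and the sample per-collector loads are made up for illustration.

```python
def balanced_range(total_metrics, collector_count, threshold):
    """Return the (low, high) metric counts a collector can hold
    before it is considered out of balance (simplified model)."""
    average = total_metrics / collector_count
    return average - threshold, average + threshold


def out_of_balance(loads, threshold):
    """Return the names of collectors whose load differs from the
    cluster average by more than the threshold.

    `loads` maps a collector name (hypothetical) to its metric count.
    """
    low, high = balanced_range(sum(loads.values()), len(loads), threshold)
    return [name for name, load in loads.items() if load < low or load > high]


# 180k metrics over three collectors with a 20,000 threshold
low, high = balanced_range(180_000, 3, 20_000)

# Hypothetical lopsided cluster: one collector holds most of the metrics
lopsided = {"collector1": 150_000, "collector2": 20_000, "collector3": 10_000}
offenders = out_of_balance(lopsided, 20_000)
```

In the lopsided example all three collectors fall outside the 40k to 80k band, which is the kind of situation where you would expect the MOM to try to move agents around.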
Hope this helps,