A few minutes ago I received an alert that one of the collectors had disconnected from the MOM. When I checked the logs, I found the following error.
3/28/18 12:00:38.141 PM EDT [ERROR] [Carver data management] [Manager] Uncaught Exception in Enterprise Manager: In thread Carver data management and the message is java.lang.OutOfMemoryError: Java heap space
This error doesn't make sense, because this is our new cluster with 16 GB of physical memory, of which 12 GB is assigned to the heap.
This is not the first time; we hit this problem almost every week. There are no agents connected to this cluster. I built it for our future needs.
Here are the cluster details:
10.5.2 SP2
"2017-10-31 15:15:52,880 [main] INFO com.ca.fix.apm.hotfix.Hotfix - Hotfix 10.5.2-HF16 was applied successfully"
SUSE Linux Enterprise Server 12 (x86_64)
VERSION = 12
PATCHLEVEL = 2
Linux hostname 4.4.74-92.35-default #1 SMP Mon Aug 7 18:24:48 UTC 2017 (c0fdc47) x86_64 x86_64 x86_64 GNU/Linux
Are you seeing a high usage in Garbage Collection?
What are the GC settings you have for the EM? What are the Xms and Xmx settings for the EM?
If there are no agents connected, then something else is causing the heap to be used. On an agentless EM, one culprit would be SmartStor reperiodization.
Another thought is, if there is any antivirus or other outside process hitting the RAM hard, this could affect the heap on the EM and the EM could run out of memory.
If you run the top command, what are your top processes taking up RAM?
Besides Muskaloon's suggestion: if Xmx is properly set to 12 GB and the EM keeps hitting OOM, try to collect a heap dump from the process so you can see exactly what is filling the heap.
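One possible way to collect the heap dump suggested above is the JDK's jmap tool. The sketch below only builds and prints the command line; the PID and the output path are hypothetical placeholders, and it assumes a JDK (not just a JRE) is installed and that the command is run as the same user that owns the EM process. Alternatively, the standard HotSpot options -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=<path> can be added to the JVM options so that a dump is written automatically at the next OutOfMemoryError.

```python
import shlex


def heap_dump_command(pid: int, dump_path: str) -> list:
    """Build a jmap command that writes a binary heap dump of live objects."""
    # -dump:live,format=b,file=<path> is the standard jmap dump syntax.
    return ["jmap", f"-dump:live,format=b,file={dump_path}", str(pid)]


# Hypothetical values for illustration only: replace with the real EM PID
# and a path on a filesystem with enough free space for a 12 GB heap.
cmd = heap_dump_command(21049, "/tmp/em_heap.hprof")
print(shlex.join(cmd))
```

The resulting .hprof file can then be opened in a heap analyzer to see which objects dominate the heap.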
Yes, I can see high usage in Garbage Collection.
The values are -Xms12288m -Xmx12288m, and physical memory is 16 GB.
%Cpu(s): 0.2 us, 0.2 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16387720 total, 3190672 used, 13197048 free, 584968 buffers
KiB Swap: 4194300 total, 273396 used, 3920904 free. 1120280 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21049 apmadm 20 0 14.726g 1.007g 36672 S 0.332 6.442 1:34.78 java
1 root 20 0 185048 5348 3888 S 0.000 0.033 3:32.84 systemd
2 root 20 0 0 0 0 S 0.000 0.000 0:02.11 kthreadd
3 root 20 0 0 0 0 S 0.000 0.000 0:12.82 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.000 0.000 3:15.99 rcu_sched
8 root 20 0 0 0 0 S 0.000 0.000 0:00.00 rcu_bh
9 root rt 0
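As a rough cross-check of the top summary above, the following sketch computes how much host RAM was in use at the time of the snapshot and what share of RAM the configured maximum heap represents. The summary lines and the -Xmx value are copied from this thread; the helper name parse_kib is made up for illustration. A single snapshot says nothing about heap usage at the moment of the OutOfMemoryError, so treat this as a sanity check only.

```python
import re

# Memory summary lines copied from the top output in this thread.
TOP_SUMMARY = """\
KiB Mem: 16387720 total, 3190672 used, 13197048 free, 584968 buffers
KiB Swap: 4194300 total, 273396 used, 3920904 free. 1120280 cached Mem
"""
HEAP_MAX_MIB = 12288  # from -Xmx12288m


def parse_kib(summary: str, label: str) -> dict:
    """Extract the total/used/free KiB values from one top summary line."""
    for line in summary.splitlines():
        if line.startswith(label):
            pairs = re.findall(r"(\d+) (total|used|free)", line)
            return {name: int(num) for num, name in pairs}
    raise ValueError(f"no line starting with {label!r}")


mem = parse_kib(TOP_SUMMARY, "KiB Mem")
used_pct = 100.0 * mem["used"] / mem["total"]
heap_share_pct = 100.0 * (HEAP_MAX_MIB * 1024) / mem["total"]

print(f"host RAM in use: {used_pct:.1f}%")
print(f"max heap as share of RAM: {heap_share_pct:.1f}%")
```

With these figures the host itself does not look short of memory at the time of the snapshot, which points toward something inside the JVM heap rather than an outside process.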
Ok, good that you have 12 GB set for both.
What GC settings are you using? Are you using Java 1.8? For large heaps on Java 1.8, use G1GC in this manner instead of the current GC settings you are using.
So I just replace -XX:+UseParNewGC with -XX:+UseG1GC?
lax.nl.java.option.additional=-Xms12288m -Xmx12288m -Djava.awt.headless=true -Dmail.mime.charset=UTF-8 -Dorg.owasp.esapi.resources=./config/esapi -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xss512k
That is part of it.
Remove -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
And then add -XX:+UseG1GC
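The two steps above can be sketched as a small script that rewrites the lax.nl.java.option.additional line. The property line is the one posted earlier in this thread; the helper name switch_to_g1 is hypothetical, and the sketch only transforms a string, so back up the real .lax file before applying anything like this to it.

```python
# Collector flags to remove and the flag to add, as named in this thread.
CMS_FLAGS = {"-XX:+UseConcMarkSweepGC", "-XX:+UseParNewGC"}
G1_FLAG = "-XX:+UseG1GC"
KEY = "lax.nl.java.option.additional"


def switch_to_g1(line: str) -> str:
    """Return the property line with the CMS/ParNew flags removed and G1 added."""
    key, sep, value = line.partition("=")  # split at the first '=' only
    if key.strip() != KEY or not sep:
        return line  # not the JVM options line: leave it untouched
    options = [opt for opt in value.split() if opt not in CMS_FLAGS]
    if G1_FLAG not in options:
        options.append(G1_FLAG)
    return f"{key}={' '.join(options)}"


original = (
    "lax.nl.java.option.additional=-Xms12288m -Xmx12288m "
    "-Djava.awt.headless=true -Dmail.mime.charset=UTF-8 "
    "-Dorg.owasp.esapi.resources=./config/esapi "
    "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xss512k"
)
updated = switch_to_g1(original)
print(updated)
```

All other options (heap sizes, system properties, -Xss512k) are kept as they were; the EM needs a restart for the new options to take effect.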
Thanks for the help. The lax file has been updated. I will keep watching GC heap usage and will let you know if I see the same kind of problem in the future.