

Multiple agents gets started and runs out of memory

  • 1.  Multiple agents gets started and runs out of memory

    Posted Feb 11, 2009 12:12 AM
    Hi,
    I have an agent running on the same box as the server. The agent starts successfully, but over time more agents are created automatically, each one keeps consuming memory, and the machine finally runs out of memory.

    I have other agents running on other machines. They can communicate with the server, they run perfectly fine, and each machine has only one agent running. Does anyone have an idea why multiple agents get created for the agent on the same machine as the Hyperic server?

    Thanks,
    Kishore.


  • 2.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 11, 2009 12:23 AM
    Hi,

    What OS is this? Which HQ Agent version are you running?

    Cheers,
    Mirko


  • 3.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 11, 2009 04:08 PM
    Hi,
    The OS is: Solaris 10 on Sun SPARC
    Hyperic Agent: hyperic-agent-3.2.3-EE

    Thanks,
    Kishore.


  • 4.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 11, 2009 05:45 PM
    Hi,

    Could you please upgrade your Agent to the latest 3.x version (3.2.6) or to 4.0.3 and report whether the error still occurs?

    Cheers,
    Mirko


  • 5.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 12, 2009 03:16 PM
    Can you please send me the link that describes the step-by-step approach for upgrading the Hyperic client?

    I will give it a try.

    Thanks.


  • 6.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 12, 2009 06:35 PM
    Hi,

    Documentation is available here: http://support.hyperic.com/display/DOC/Upgrade+HQ+Components

    Cheers,
    Mirko


  • 7.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 12, 2009 06:59 PM
    Could this be related to a bug in the JRE that spawns extra HQ Java processes? For me it happened when the Solaris JRE did a fork to run external scripts. There are at least two support cases in JIRA for this issue, with workarounds.

    Hard to say until there are stack dumps from the JRE and the OS, though.


  • 8.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 12, 2009 07:06 PM
    Hi Janne,

    Open source (OS) users do not have access to JIRA support cases, so could you post a workaround?

    Cheers,
    Mirko


  • 9.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 12, 2009 07:19 PM
    This was the situation.
    These are the processes shown by ps:
    root 20175 1 0 Dec 18 ? 21:41 /opt/hyperic/hyperic-hq-agent-4.0.1-EE/wrapper/sbin/../../wrapper/sbin/wrapper-
    root 20176 20175 0 Dec 18 ? 246:46 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
    root 1771 20176 0 Dec 25 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
    root 11942 20176 0 Jan 01 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
    root 15521 20176 0 Jan 03 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
    root 18470 20176 0 Jan 05 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
    root 20349 20176 0 Jan 06 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
    root 24932 20176 0 06:20:16 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
    root 24007 20176 0 17:30:16 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..

    As you can see, the original process (20176) is started by the wrapper. All the other stuck children are forked by this main process, which you can see by comparing the parent IDs.
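
    The parent ID comparison can also be scripted. The following is only a rough sketch, assuming `ps -ef` style output where the second column is the PID and the third is the parent PID. The helper names `parse_pid_ppid` and `children_of` are made up for this illustration, and the sample lines are the first four from the listing above (commands truncated as posted).

```python
def parse_pid_ppid(line):
    """Return (pid, ppid) from one `ps -ef` line, or None for header/blank lines."""
    fields = line.split()
    try:
        # Assumed column order: UID PID PPID ...
        return int(fields[1]), int(fields[2])
    except (IndexError, ValueError):
        return None


def children_of(ps_lines, parent_pid):
    """List the PIDs whose parent PID equals parent_pid, in input order."""
    children = []
    for line in ps_lines:
        parsed = parse_pid_ppid(line)
        if parsed is not None and parsed[1] == parent_pid:
            children.append(parsed[0])
    return children


# First four lines of the ps listing from the post.
PS_SAMPLE = """\
root 20175 1 0 Dec 18 ? 21:41 /opt/hyperic/hyperic-hq-agent-4.0.1-EE/wrapper/sbin/../../wrapper/sbin/wrapper-
root 20176 20175 0 Dec 18 ? 246:46 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
root 1771 20176 0 Dec 25 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
root 11942 20176 0 Jan 01 ? 0:00 /usr/java/bin/java -Djava.compiler=NONE -Djava.security.auth.login.config=../..
"""

if __name__ == "__main__":
    main_agent_pid = 20176  # the java process started by the wrapper
    print(children_of(PS_SAMPLE.splitlines(), main_agent_pid))
```

    A healthy agent should normally show no long-lived java children under the main agent process, so any PIDs listed here are candidates for a jstack/pstack dump before restarting.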

    Snippets from jstack:
    #/usr/jdk/jdk1.5.0_14/bin/jstack 20176
    Thread t@58166: (state = IN_NATIVE)
    - java.lang.UNIXProcess.waitForProcessExit(int) @bci=0 (Interpreted frame)
    - java.lang.UNIXProcess.access$900(java.lang.UNIXProcess, int) @bci=2, line=17 (Interpreted frame)
    - java.lang.UNIXProcess$2$1.run() @bci=17, line=86 (Interpreted frame)

    #/usr/jdk/jdk1.5.0_14/bin/jstack 1771
    Thread t@109: (state = IN_NATIVE)
    - java.lang.UNIXProcess.forkAndExec(byte[], byte[], int, byte[], int, byte[], boolean, java.io.FileDescriptor, java.io.FileDescriptor, java.io.FileDescriptor) @bci=0 (Interpreted frame)
    - java.lang.UNIXProcess.<init>(byte[], byte[], int, byte[], int, byte[], boolean) @bci=62, line=53 (Interpreted frame)
    - java.lang.ProcessImpl.start(java.lang.String[], java.util.Map, java.lang.String, boolean) @bci=182, line=65 (Interpreted frame)
    - java.lang.ProcessBuilder.start() @bci=112, line=451 (Interpreted frame)
    - java.lang.Runtime.exec(java.lang.String[], java.lang.String[], java.io.File) @bci=16, line=591 (Interpreted frame)
    - org.hyperic.util.exec.Execute.execute() @bci=16, line=316 (Interpreted frame)
    - org.hyperic.hq.product.ExecutableProcess.collect() @bci=98, line=202 (Interpreted frame)
    - org.hyperic.hq.product.Collector.run() @bci=41, line=562 (Interpreted frame)
    - edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.runWorker(edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker) @bci=46, line=1061 (Interpreted frame)
    - edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=575 (Interpreted frame)
    - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)

    At that time, the following was found in agent.log:
    2008-12-25 20:17:55,342 INFO [pool-1-thread-12] [Execute] waitFor() interrupted
    2008-12-25 20:17:57,359 ERROR [pool-1-thread-12] [ExecutableProcess] [../../bundles/agent-4.0.1-EE-905/pdk/work/scripts/sendmail/hq-sendmail-stat]: Timeout
    running [../../bundles/agent-4.0.1-EE-905/pdk/work/scripts/sendmail/hq-sendmail-stat ]
    -------------------------------------------------
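
    To see which external scripts are timing out, the log can be scanned for these entries. This is only a sketch: it assumes the log line layout shown in the snippet above (timestamp, level, thread, `[ExecutableProcess]`, script path, `Timeout`), and the function name `find_script_timeouts` is made up for this illustration.

```python
import re

# Pattern modelled on the ERROR line quoted above; the layout is an assumption
# and may differ between agent versions.
TIMEOUT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r"\[([^\]]+)\] \[ExecutableProcess\] \[([^\]]+)\]: Timeout"
)


def find_script_timeouts(log_lines):
    """Return (timestamp, thread, script_path) for each script timeout entry."""
    hits = []
    for line in log_lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            hits.append(match.groups())
    return hits


# The two agent.log lines quoted in the post.
LOG_SAMPLE = """\
2008-12-25 20:17:55,342 INFO [pool-1-thread-12] [Execute] waitFor() interrupted
2008-12-25 20:17:57,359 ERROR [pool-1-thread-12] [ExecutableProcess] [../../bundles/agent-4.0.1-EE-905/pdk/work/scripts/sendmail/hq-sendmail-stat]: Timeout
"""

if __name__ == "__main__":
    for timestamp, thread, script in find_script_timeouts(LOG_SAMPLE.splitlines()):
        print(timestamp, thread, script)
```

    The script paths that show up point at the plugins that run external scripts, which are the ones worth looking at first.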

    The workarounds are:
    - See http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6276483 and apply workaround #2 (jre/lib/security/java.security) to your JRE
    - Add plugins.exclude=ntp,sendmail to your agent.properties (that is, exclude the plugins that run external scripts)

    Modifying java.security resolved my problems.
    -----------------------------------------------------

    This specific issue was on Solaris x86, but it may also happen on SPARC. I've seen these spawned processes on SPARC once as well. Unfortunately I was too quick to restart the agent and forgot to save the jstack and pstack outputs from the processes, so I'm not exactly sure whether this is the same case.

    It's a nasty issue with Java 1.5. It is only fixed in 1.6, and I believe Sun won't backport the fix to older JREs.


  • 10.  RE: Multiple agents gets started and runs out of memory

    Posted Feb 12, 2009 07:26 PM
    Also, simply removing 'security.provider.1=sun.security.pkcs11.SunPKCS11 ${java.home}/lib/security/sunpkcs11-solaris.cfg' from java.security will break the agent.

    The JRE expects to find a default provider, which is the first one. This wasn't that clear in the workaround. So after removing security.provider.1, rename security.provider.2 to security.provider.1, security.provider.3 to security.provider.2, etc.
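
    The removal and renumbering can be sketched in a few lines. This is a rough illustration, not a tested tool: `drop_provider` is a made-up helper name, it assumes the `security.provider.N=` entries appear in ascending order in the file, and apart from the PKCS11 line quoted above the sample entries are only illustrative, so check your own java.security. Back up the original file before changing it.

```python
import re

PROVIDER_RE = re.compile(r"^security\.provider\.(\d+)=(.*)$")


def drop_provider(lines, unwanted_class):
    """Remove the provider whose class is unwanted_class and renumber the rest from 1.

    Non-provider lines (comments, other settings) are passed through unchanged.
    """
    out = []
    next_index = 1
    for line in lines:
        match = PROVIDER_RE.match(line.strip())
        if match is None:
            out.append(line)
            continue
        value = match.group(2)
        # The class name is the first token; anything after it is a provider argument.
        if value.split()[:1] == [unwanted_class]:
            continue
        out.append(f"security.provider.{next_index}={value}")
        next_index += 1
    return out


# Illustrative input: only the first provider line is taken from the post.
SAMPLE = [
    "# List of providers and their preference orders",
    "security.provider.1=sun.security.pkcs11.SunPKCS11 ${java.home}/lib/security/sunpkcs11-solaris.cfg",
    "security.provider.2=sun.security.provider.Sun",
    "security.provider.3=sun.security.rsa.SunRsaSign",
]

if __name__ == "__main__":
    for line in drop_provider(SAMPLE, "sun.security.pkcs11.SunPKCS11"):
        print(line)
```

    Renumbering in file order keeps the remaining providers in the same relative preference, with the former security.provider.2 becoming the new default.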


  • 11.  RE: Multiple agents gets started and runs out of memory

    Posted Mar 02, 2009 05:25 PM
    I finally found that this bug also happens on Solaris SPARC. The process dumps and thread dumps from Solaris SPARC show an exact match when compared to Solaris x86.

    I've applied the same workaround by modifying java.security. We'll see within a few days whether this fix works or not.