  • 1.  HHQ-2676 fix breaks pid checking on Solaris

    Posted May 22, 2009 04:49 PM
    According to http://jira.hyperic.com/browse/HHQ-2676, a fix was made to the 4.1.0 agent to ensure the wrapper script finds the PID on Solaris when the ps "args" output is really long. That part works, but it breaks the testpid() function within the wrapper script, which expects $PSEXE to accept a -p argument. Judging by the errors below, $PSEXE is now /usr/ucb/ps, which does not.

    Here are some test results. The first is a stop while the agent is running. It does stop the agent (most of the time, anyway), but the output suggests something went wrong:

    solaris10sparc% ./bin/hq-agent.sh stop
    Stopping HQ Agent...
    /usr/ucb/ps: illegal option -- p
    usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
    /usr/ucb/ps: illegal option -- p
    usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
    Stopped HQ Agent.

    In this case, the agent is already running, but the wrapper script thinks the PID is missing and starts the agent again. Rut roh, 'raggy.

    solaris10sparc% ./bin/hq-agent.sh start
    Starting HQ Agent.../usr/ucb/ps: illegal option -- p
    usage: ps [ -aceglnrSuUvwx ] [ -t term ] [ num ]
    Removed stale pid file: /opt/hyperic/hq-agent/wrapper/sbin/../../wrapper/hq-agent.pid


    To fix this, I changed the instances of "testpid" to "getpid" in the stopit() function. This way the Solaris-specific 'ps' command is used. I actually don't fully understand why testpid() exists in addition to getpid() in the script, since getpid() was already used to "test" whether the PID existed.
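
    As an aside, a liveness check does not strictly need ps at all. Below is a minimal sketch, assuming a hypothetical pid_alive helper (it is not a function from the wrapper script) and swapping in kill -0 in place of either ps variant:

```shell
#!/bin/sh
# Hypothetical helper, NOT part of hq-agent.sh: report whether a PID is alive
# without depending on which ps flavor is installed.
pid_alive() {
    # Reject empty or non-numeric input so kill is never handed garbage.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only runs the existence/permission check; nothing is delivered.
    kill -0 "$1" 2>/dev/null
}

# Usage: check the current shell's own PID.
if pid_alive "$$"; then
    echo "running"
else
    echo "not running"
fi
```

    One caveat with this approach: kill -0 also fails when the process exists but belongs to another user, so a non-root caller would see such a process as "not running".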

    In my testing, this keeps a second agent from being started and prevents the error messages from being printed on stop. I've attached a diff against the 4.1.2-1053 agent I downloaded for Solaris. Note that it also changes some "echo -n" calls to "printf" to try to get the message "Starting HQ Agent..." to print out correctly.

    This is how it looks without changes:

    solaris10sparc% ./bin/hq-agent.sh start
    -n Starting HQ Agent...
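
    The literal "-n" shows up because this echo does not treat -n as an option. A minimal sketch of the printf replacement, assuming a hypothetical say_n helper (the name is made up for illustration and is not in the wrapper script):

```shell
#!/bin/sh
# Hypothetical helper, NOT part of hq-agent.sh: print a message with no
# trailing newline, regardless of how the local echo handles -n.
say_n() {
    # '%s' keeps any % characters in the message from being read as a format.
    printf '%s' "$1"
}

say_n "Starting HQ Agent..."
echo " done"
# prints: Starting HQ Agent... done
```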

    Even with these changes, it still doesn't print "Successful" after the agent starts up. But I wanted to at least post this to see whether anyone else has seen this behavior on Solaris, or whether there is a more obvious fix :)

    Cheers.


  • 2.  RE: HHQ-2676 fix breaks pid checking on Solaris

    Posted May 25, 2009 07:37 AM
    Hi edan,

    Thanks for your post and your fix. I have a Solaris 9 box running a 4.0.3 agent, and since I'm planning to upgrade to 4.1.2, I will verify what you've described.

    Cheers,
    Mirko