DX Infrastructure Manager

  • 1.  How to Kill a Process

    Posted 05-14-2010 10:02 PM

    What options are available to kill a process that's identified by the processes probe (with PID)?  We have some software which sometimes runs away with CPU, and I can identify this via the processes probe.  What I need next is to be able to kill the process to keep it from cratering the server and impacting overall performance.  The processes will auto-restart naturally.

     

    Thanks!

    Toby



  • 2.  Re: How to Kill a Process

    Posted 05-14-2010 10:07 PM

    Good news--that is pretty easy to do. The processes probe has a callback named kill_process that does the trick. Here is how you would call it in Lua:

     

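    -- process_id and probe_addr are placeholders: the PID to kill and the
    -- address of the processes probe (/domain/hub/robot/processes)
    args = pds.create()
    pds.putInt(args, "pid", process_id)
    nimbus.request(probe_addr, "kill_process", args)
    pds.delete(args)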

     

    If you are using the processes probe to send an alarm when the process uses too much CPU time, you could create an AO profile that calls a script to kill the process when the alarm comes in.

     

    -Keith



  • 3.  Re: How to Kill a Process

    Posted 05-14-2010 10:18 PM

    Fantastic, thanks!  I'm new to scripting Nimsoft, though. Can you elaborate on AO?

     

    Thanks!

    Toby



  • 4.  Re: How to Kill a Process

    Posted 05-14-2010 10:26 PM

    Sure thing! The Auto-Operator (AO) in the NAS (Nimsoft Alarm Server) can be configured to pre-process alarm messages or react to certain alarms. In the NAS GUI, select the Auto-Operator tab to configure it. The Profiles sub-tab lets you configure what the AO does in response to certain alarms. One of the options is to run a script.

     

    I have a script that I used to kill a WMI process when it used too much memory. Here is the entire script:

     

    -- Get alarm that fired script
    al = alarm.get()

    -- Make sure the alarm came from the processes probe
    if al.prid ~= "processes" then
       print("Invalid probe '"..al.prid.."' in alarm message - process not killed")
       return
    end

    -- Get PID from alarm message
    pid = tonumber(string.match(al.message, "%(PID ([%d]+)%)"))
    if pid == nil then
       print("Failed to get PID from alarm message: "..al.message)
       return
    end

    -- Create PDS with 'pid' argument set to process ID
    args = pds.create()
    pds.putInt(args, "pid", pid)

    -- Build robot address from alarm fields
    robot_addr = "/"..al.domain.."/"..al.hub.."/"..al.robot

    -- Kill process
    resp, rc = nimbus.request(robot_addr.."/processes", "kill_process", args)
    pds.delete(args)
    if rc ~= NIME_OK then
       print("Failed to kill process "..pid.." on "..robot_addr.." (rc = "..rc..")")
    else
       print("Killed process "..pid.." on "..robot_addr)
    end

     

    To make this work, I had to define a custom alarm message in the processes probe that put the PID into the alarm and made it easy for me to extract in the script.
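
    The parsing and addressing steps of the script can be tried outside the NAS. The following is only a sketch in plain Lua: extract_pid and probe_address are hypothetical helper names, and the sample alarm table uses made-up values rather than a real alarm. It assumes the alarm text contains "(PID <digits>)", as the script expects.

```lua
-- Sketch only: helper names and sample values are assumptions for
-- illustration, not part of the NAS scripting API.

-- Return the PID embedded as "(PID <digits>)" in an alarm message, or nil.
function extract_pid(message)
   local digits = string.match(message, "%(PID (%d+)%)")
   if digits == nil then
      return nil
   end
   return tonumber(digits)
end

-- Build the processes probe address from the alarm's origin fields.
function probe_address(al)
   return "/" .. al.domain .. "/" .. al.hub .. "/" .. al.robot .. "/processes"
end

-- Made-up alarm fields standing in for what alarm.get() would return.
local sample = {
   prid = "processes",
   domain = "ExampleDomain",
   hub = "ExampleHub",
   robot = "examplerobot",
   message = "wmi: Process wmiprvse.exe (PID 4312) memory usage 512000 KB above 256000 KB",
}

print(extract_pid(sample.message))
print(probe_address(sample))
```

    Keeping these two steps as small functions makes it easy to check the pattern against your own alarm text before wiring the script into an AO profile.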

     

    -Keith



  • 5.  Re: How to Kill a Process

    Posted 05-15-2010 12:43 AM

    This is great, thanks!

     

    Can you post how you modified the alarm message to include the PID?  Can it also show things like the process owner?

     

    Thanks!

    Toby



  • 6.  Re: How to Kill a Process

    Posted 05-15-2010 02:50 AM

    I created a custom version of the MsgProcSize alarm with the following text:

     

    $watcher: Process $process (PID $pid) memory usage $size KB $which $expected_size KB

     

    When you are editing an alarm message in the processes probe GUI, you can type the dollar sign, and the GUI will list all of the variables available. I think $user is what you are looking for.
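
    If the owner is added to the message, the script's pattern has to change as well, because "%(PID ([%d]+)%)" expects the closing parenthesis right after the digits. The following is a sketch under assumptions: it supposes a hypothetical message format of "(PID $pid, user $user)", that $user is indeed the variable the GUI offers, and extract_pid_and_user is a made-up helper name.

```lua
-- Sketch only: assumes the custom alarm text was changed to contain
-- "(PID <digits>, user <name>)". That format is hypothetical.

-- Return the PID (number) and owner (string), or nil if the text does not match.
function extract_pid_and_user(message)
   local pid, user = string.match(message, "%(PID (%d+), user ([^)]+)%)")
   if pid == nil then
      return nil
   end
   return tonumber(pid), user
end

-- Made-up alarm text for illustration.
local pid, user = extract_pid_and_user(
   "wmi: Process wmiprvse.exe (PID 4312, user SYSTEM) memory usage 512000 KB above 256000 KB")
print(pid, user)
```

    The [^)]+ capture takes everything up to the closing parenthesis, so an owner such as DOMAIN\name would be captured whole.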

     

    -Keith