Automic Workload Automation

How to detect a crashed WP service (V9)

  • 1.  How to detect a crashed WP service (V9)

    Posted 12-29-2014 08:06 PM
    I know this has been discussed before, but I've had difficulty finding those discussions.

    We've had some WP services crash now and again because of database deadlock issues, which we are working on.  In the meantime, we would like an automated way to detect and alert us when a WP service stops.  Even a message in the message table (meld) would be useful, but I have not found any.  Currently we are manually checking the WPs several times a day.  What do others do?

    I'm considering writing something to scrape the WP log files periodically to check for the deadlock messages.  There are no "WP IS ENDING" messages in the logs though, so it would only be a partial solution as it would only detect one type of WP failure.
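
    A minimal sketch of that log-scraping idea, in Python. The thread does not quote the deadlock message, so the pattern here is only a placeholder, and the function name is made up for illustration. As the post notes, this would catch only the failures that leave a matching line in the log.

```python
import re
from pathlib import Path

# Assumption: the exact deadlock message text is not given in the thread,
# so this pattern is a placeholder to adapt to the real log lines.
DEADLOCK_PATTERN = re.compile(r"deadlock", re.IGNORECASE)


def find_deadlock_lines(log_path, pattern=DEADLOCK_PATTERN):
    """Return (line_number, text) pairs for log lines matching the pattern."""
    hits = []
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits
```

    A scheduled job could call this for each WP log file and raise an alert whenever the returned list is not empty.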

  • 2.  How to detect a crashed WP service (V9)

    Posted 12-30-2014 06:19 PM

    We run UC4 on a Windows server, so I am implementing a PowerShell script solution.  I'm having the Windows scheduler run the PowerShell script regularly, so this new alarm system will function even if UC4 has no WPs running.  The PowerShell script counts how many CP and how many WP tasks are running, and if the counts don't meet our expectations, it sends email alarms.  (I feel much better now that we don't have to monitor this manually!)
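
    The poster's PowerShell script is not shown, so the following is only a rough Python sketch of the counting step it describes. The image names and expected counts are assumptions to be adjusted per installation, and the function name is hypothetical.

```python
from collections import Counter

# Assumption: expected counts and process image names are placeholders;
# adjust them to match the installation being monitored.
EXPECTED = {"ucsrvwp.exe": 4, "ucsrvcp.exe": 2}


def find_shortfalls(running_names, expected=EXPECTED):
    """Return {image_name: (running, expected)} for each under-count."""
    # Compare case-insensitively, since Windows image names vary in case.
    counts = Counter(name.lower() for name in running_names)
    return {
        image: (counts[image.lower()], wanted)
        for image, wanted in expected.items()
        if counts[image.lower()] < wanted
    }
```

    The list of running names could come from whatever the scheduler-driven script already uses to enumerate processes; a non-empty result is the condition for sending the email alarm.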

  • 3.  How to detect a crashed WP service (V9)

    Posted 12-30-2014 06:57 PM

    Database Method:

    The HOST table lists the WPs as well as the agents.  Join it to the OH table to keep only the "SERV" items (WP, CP, etc.), then LEFT JOIN the AH table to find items that are online (at least one AH record whose AH_TimeStamp4 is still null, as in the query below).  Because it is a LEFT JOIN, the items with no matching AH row are the ones that are missing.  The results are the offline WPs and CPs.

    select *
    from HOST
    JOIN OH
         on OH.OH_Idnr = HOST.HOST_OH_Idnr
         and OH.OH_OType = 'SERV'
         and OH.OH_DeleteFlag = 0
    LEFT JOIN AH
         on AH.AH_OH_Idnr = OH.OH_Idnr
         and AH.AH_OType = 'SERV'
         and AH_TimeStamp4 is null
    where AH.AH_Name is null
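
    To see how the LEFT JOIN plus "is null" filter picks out the offline processes, here is a small self-contained sketch using Python's sqlite3 module. The tables are cut down to the columns the query touches, the OH_Name column and the DEMO#... rows are invented for the demonstration, and the real Automic schema is much larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: simplified stand-ins for the real tables, with made-up rows.
conn.executescript("""
    CREATE TABLE OH   (OH_Idnr INTEGER, OH_Name TEXT, OH_OType TEXT, OH_DeleteFlag INTEGER);
    CREATE TABLE HOST (HOST_OH_Idnr INTEGER);
    CREATE TABLE AH   (AH_OH_Idnr INTEGER, AH_Name TEXT, AH_OType TEXT, AH_TimeStamp4 TEXT);

    INSERT INTO OH VALUES (1, 'DEMO#WP001', 'SERV', 0), (2, 'DEMO#WP002', 'SERV', 0);
    INSERT INTO HOST VALUES (1), (2);
    -- WP001 has an open run (no end time); WP002 only has a finished run.
    INSERT INTO AH VALUES (1, 'DEMO#WP001', 'SERV', NULL),
                          (2, 'DEMO#WP002', 'SERV', '2014-12-30 18:00:00');
""")

# Same join shape as the query above: rows with no open AH record survive.
offline = [row[0] for row in conn.execute("""
    SELECT OH.OH_Name
    FROM HOST
    JOIN OH
      ON OH.OH_Idnr = HOST.HOST_OH_Idnr
     AND OH.OH_OType = 'SERV'
     AND OH.OH_DeleteFlag = 0
    LEFT JOIN AH
      ON AH.AH_OH_Idnr = OH.OH_Idnr
     AND AH.AH_OType = 'SERV'
     AND AH.AH_TimeStamp4 IS NULL
    WHERE AH.AH_Name IS NULL
""")]
print(offline)  # ['DEMO#WP002']
```

    Only the process without an open AH record is returned, which is the condition the query treats as "offline".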

  • 4.  How to detect a crashed WP service (V9)

    Posted 12-30-2014 07:11 PM

    Service Manager Command Line Method:

    With scripting you can use the Automic utility [ucybsmcl.exe]: 

  • 5.  How to detect a crashed WP service (V9)

    Posted 12-30-2014 08:01 PM
    There's also an Automic script function that checks status, called sys_server_alive(). You can use it in a loop to check all the server processes, either from a VARA you set up or from an SQLI query like the one Jeremy used above.

    If your server process names are in a predictable sequence you should be able to check them all by starting at 1 and looping until you find one that doesn't exist:

    :set &i# = 1
    :define &processType#, string, 2
    :set &processType#[1] = "WP"
    :set &processType#[2] = "CP"
    :while &i# <= 2
    :  print "Starting &processType#[&i#] check."
    :  set &count# = 1
    :  set &count# = format(&count#, "000")
    :  set &processName# = "&$SYSTEM##&processType#[&i#]&count#"
    :  set &ret# = sys_server_alive(&processName#)
    !  Return code 20349 indicates process does not exist, so stop loop.
    :  while &ret# <> 20349
    :    if &ret# = "N"
    !    Do something to alert
    :      print " &processName# is down!"
    :    else
    :      print " &processName# is OK!"
    :    endif
    :    set &count# = &count# + 1
    :    set &count# = format(&count#, "000")
    :    set &processName# = "&$SYSTEM##&processType#[&i#]&count#"
    :    set &ret# = sys_server_alive(&processName#)
    :  endwhile
    :  print " &processName# does not exist. &processType#[&i#] check is complete."
    :  print ""
    :  set &i# = &i# + 1
    :endwhile

  • 6.  How to detect a crashed WP service (V9)

    Posted 01-02-2015 11:39 AM
    It's great to have all of these options documented in one thread.  Thanks!

  • 7.  How to detect a crashed WP service (V9)

    Posted 01-05-2015 06:35 AM
    Hi all,
    We are running Automation Engine on Windows, and one issue we have faced on several occasions is that a WP/CP server process is running according to ServiceManager/Windows Task Manager, but inside the GUI the server process is dimmed (= stopped). In these cases we cannot rely on ServiceManager/Windows Task Manager to tell us whether a certain WP/CP is active or not.

    I like the idea of using the Automic script function SYS_SERVER_ALIVE to check the server processes, but instead of querying over and over again, I would prefer to set up some kind of notification that fires as soon as a WP/CP process stops (similar to EXECUTE_ON_END when an Agent ends and EXECUTE_ON_START when an Agent starts).
    -> Question: How do I set up this kind of monitoring for server processes?

  • 8.  How to detect a crashed WP service (V9)

    Posted 01-05-2015 10:01 PM
    I'm confused @Keld Mollnitz , what program's GUI is showing you a greyed out WP/CP?

    EDIT: I think you mean the System Overview > Automation Engine section.  Yeah if EXECUTE_ON_END doesn't work for WP/CP, I'm not sure what else there could be.  I would suggest making sure that the WP/CP names match, as what is displayed in the GUI/Database/Dialog may not all match up.

  • 9.  How to detect a crashed WP service (V9)

    Posted 01-06-2015 03:06 AM
    Yes, Jeremy, I mean the System Overview -> Automation Engine section. Sometimes we have seen that one or more of the WPs/CPs are grayed out, but when you go to the Windows server where the Automation Engine is installed and open the ServiceManager dialog, all server processes are running. This is what I refer to as a zombie process...

  • 10.  How to detect a crashed WP service (V9)

    Posted 01-06-2015 05:18 PM
    Thought I would add a Linux perspective. We use the open source Xymon (formerly Big Brother and Hobbit) for monitoring many different aspects of Automic, as well as various other application tools my team owns. So far we've not seen the zombie process issue with any of the core components, except for the SNMP agent. For that one we monitor the proc and also the log to ensure it stays fresh; the same could be done with any other log.

    We monitor:
    • cpu
    • memory
    • disk
    • files (logs)
    • http (ECC)
    • procs (WP/CP/smgr/core agents/SNMP/Tomcat/etc)
    Xymon has a fairly minimalist GUI, but it's perfect for our needs. While at work, I leave the root Xymon Automic monitoring screen open in a tab, and will usually notice the tab favicon switch from "happy/green" to "angry/red" before an alert goes out (~10 minutes). Off hours it sends email and phone texts in some cases. Could also hook it into our paging system, but has not been necessary so far. We also get great trending graphs.

    There is also a Windows version of the agent, but it's rooted in Linux.
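
    For readers without Xymon, the log-freshness check mentioned above (a process can be present while its log has stopped moving) can be approximated in a few lines of Python. This is a sketch under assumptions: the function name is made up, and the threshold has to be chosen per log.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True if the file is missing or was last modified too long ago."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        # A missing or unreadable log is treated as stale.
        return True
    return age > max_age_seconds
```

    Pairing this with a process check covers the "zombie" case described earlier in the thread, where the process exists but has stopped doing work.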

  • 11.  How to detect a crashed WP service (V9)

    Posted 01-15-2015 06:01 AM

    We are using a Console Event object (EVNT.CONS) which continuously listens on the Windows Event log for a WP error message and, when one appears, sends an email via the !Script. Works like a charm.

  • 12.  How to detect a crashed WP service (V9)

    Posted 01-22-2015 05:05 AM
    MikeBurnham603785, thank you for that; it was exactly what I was looking for.

    The results are shown in the "activation tab" instead of the "report". Is it possible to set up a trigger/escalation based on the result in the "activation tab" in case one of the processes is marked as "down"? Should I use the PostProcess, or is there another way?

  • 13.  How to detect a crashed WP service (V9)

    Posted 07-02-2015 04:10 PM
    I would set this up in a SCRI object rather than a JOBS, since there is nothing to be done by an agent (at least in the code I posted). You could trigger an alert of some kind in many different ways from the SCRI. For example, you could keep a count of down processes and then directly send an email if count > 0 at the end using send_mail(), or you could do an :exit 1 and get an object to activate via the Runtime tab, or store the results in a VARA and put the SCRI in a JOBP with other tasks following it that look at the VARA for results, etc.

  • 14.  How to detect a crashed WP service (V9)

    Posted 08-09-2015 06:46 AM
    Thanks for the script MikeBurnham603785. We have been seeing deadlocks in the DB recently and needed this solution.

    I modified it by adding the MODIFY_SYSTEM line after the "is down" print:

    :      print " &processName# is down!"
    :      SET &ACT# = MODIFY_SYSTEM("STARTUP" , &processName#)
    :    else

    So the process also gets restarted.

  • 15.  How to detect a crashed WP service (V9)

    Posted 08-10-2015 11:13 AM

    We have not had any WP failures since we applied a database deadlock solution that we found in the knowledge base.  See KB0809230.  This is a fairly new KB article, published 04/22/2015.  (This solution is for SQL Server.)

    We are going to continue running our PowerShell solution that monitors our CPs and WPs.  It is a good safety net.