DX Unified Infrastructure Management

  • 1.  LUA script, how to alert on nothing happening between certain times

    Posted May 03, 2016 08:10 AM

    Hi all,

    I have an informational alert that tells me when a job (an AS400 job) has completed.

    I'm looking for a way to alert if the job doesn't start at all. My idea was a script that checks whether the informational alert was received between midnight and 1 am and, if not, raises an alert.

    The problem is that when the job fails to start it leaves no trace, nothing in the logs or anywhere else, so I've found myself asking: how do you alert on nothing happening?

    Is there a way to check for the presence of an alarm and, if the alarm isn't present, raise an alert?

    Thanks in advance,

    Sam
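    A minimal sketch of the midnight-to-1am check described above, in plain Lua. It assumes you can already collect the times at which the informational alert arrived as epoch timestamps; how you fetch them, and the names `alert_times`, `today_at`, `seen_in_window` and `should_alert`, are hypothetical. It only makes the decision; raising the actual alarm is left to whatever runs it.

```lua
-- Sketch: was the "job completed" alert seen between 00:00 and 01:00 today?
-- Assumption: alert_times is a hypothetical list of epoch timestamps
-- (seconds, as returned by os.time) for that alert.

-- Epoch time for hh:00:00 on the same local day as `now`
local function today_at(now, hour)
  local t = os.date("*t", now)
  t.hour, t.min, t.sec = hour, 0, 0
  t.isdst = nil            -- let os.time work out DST for that hour
  return os.time(t)
end

-- True if any alert time falls in [00:00, 01:00) of the day containing `now`
local function seen_in_window(alert_times, now)
  local from, to = today_at(now, 0), today_at(now, 1)
  for _, ts in ipairs(alert_times) do
    if ts >= from and ts < to then
      return true
    end
  end
  return false
end

-- Only judge once the window has closed (after 01:00); before that,
-- a missing alert proves nothing yet.
local function should_alert(alert_times, now)
  now = now or os.time()
  if now < today_at(now, 1) then
    return false
  end
  return not seen_in_window(alert_times, now)
end
```

    The check deliberately returns false before 01:00, so it is safe to schedule it to run any time after 1 am.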



  • 2.  Re: LUA script, how to alert on nothing happening between certain times

    Posted May 03, 2016 03:08 PM

    I would think you could use the NAS scheduler and Lua to look for that alarm in alarm.list: just loop through all the alarms looking for that specific one. Or query the nas_alarms table.
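    A minimal sketch of that loop in plain Lua. The record shape (tables with `message` and `hostname` fields) is an assumption, and the prefix matching simply mirrors the `LIKE 'HOSTNAME%'` / `LIKE 'MESSAGE TEXT%'` filters used later in the thread; `find_alarm` is a hypothetical helper, not a NAS function.

```lua
-- Sketch: look for one specific alarm in a list of alarm records.
-- Assumption: each record is a table with string `hostname` and
-- `message` fields; the prefixes passed in are placeholders.

local function find_alarm(alarms, host_prefix, message_prefix)
  for _, a in pairs(alarms) do
    -- plain prefix comparisons (string.sub) avoid Lua pattern escaping issues
    if a.hostname and a.message
       and a.hostname:sub(1, #host_prefix) == host_prefix
       and a.message:sub(1, #message_prefix) == message_prefix then
      return a        -- first matching alarm
    end
  end
  return nil          -- no such alarm: this is the case to alert on
end
```

    Inside a NAS script you would pass it the table of open alarms (for example from `alarm.list()`; check the NAS Lua reference for the exact call) and raise an alarm with `nimbus.alarm` when it returns nil. Bear in mind the caution in the next reply about scanning every open alarm on a busy NAS.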



  • 3.  Re: LUA script, how to alert on nothing happening between certain times
    Best Answer

    Broadcom Employee
    Posted May 18, 2016 01:58 PM

    Hi,

    Currently there is no out-of-the-box way to do this.

    I would not suggest checking every alarm in the system, i.e. connecting to the NAS, checking every open alarm and then taking action.

    This could cause real problems for your NAS if you have a very large number of open alarms.

    I would suggest raising the alarm externally from a probe such as nsa, the Nimsoft scripting probe. There is an example of its use here:

    http://www.ca.com/us/support/ca-support-online/product-content/knowledgebase-articles/tec000004553.aspx

    Another suggestion might be to use the sqlserver probe to query the nas_alarms table and alert when the count of the expected alarm is 0.

    Hope this helps.



  • 4.  Re: LUA script, how to alert on nothing happening between certain times

    Posted Sep 14, 2017 12:50 PM

    Hi,

    I forgot to post my solution to this one.

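    -- Open the NIS database the NAS is configured to use
    local alarms, rc
    rc = database.open("provider=nis;database=nis;driver=none")

    -- Look for the informational "job completed" alarm from the last 2 hours.
    -- The UNION with a dummy row means at least one row always comes back
    -- (LIMIT 1 keeps only the first), so the loop below always runs.
    -- DATE_SUB, INTERVAL, DUAL and LIMIT are MySQL syntax.
    local query = "SELECT message FROM nas_transaction_summary " ..
                  "WHERE created > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR) " ..
                  "AND message LIKE 'MESSAGE TEXT%' AND hostname LIKE 'HOSTNAME%' " ..
                  "UNION SELECT 'Dummy Record' FROM dual LIMIT 1;"

    alarms, rc = database.query(query)

    -- Details of the alarm to raise if the job has not run
    local message = "ALERT TEXT"
    local SUPPKEY = "*******"
    local SUBSYS  = "1.1"
    local SOURCE  = "**********"

    for _, al in pairs(alarms) do
       if regexp(al.message, "*REGEX FOR MESSAGE*") then
          print("Everything is fine")   -- or print(al.message) to debug
       else
          -- Only the dummy row came back: raise a major (severity 4) alarm
          nimbus.alarm(4, message, SUPPKEY, SUBSYS, SOURCE)
       end
    end

    -- Close the connection once, after the loop, whichever branch ran
    database.close()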

    In my case I wanted to check whether a job ran on an AS400 server. I created a metric in the History probe that raises an informational alert stating that the job has run. This is useful for monitoring jobs that run very briefly, e.g. for 1 second.

    The Jobs probe is not very reliable for monitoring these kinds of jobs.

    The script above looks back over the last 2 hours and checks that the informational "Job completed" alert is present. If it isn't, the script raises an alert.

    You need the dummy record (the UNION SELECT 'Dummy Record' FROM dual part of the query) because the check doesn't work with a blank result: if the query returns no rows, the loop body never runs, so the regex check never fails and no alarm is raised.

    Hope it helps others!
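    If you'd rather not rely on the dummy row, one alternative is to drop the `UNION ... FROM dual LIMIT 1` part of the query and make the decision after the loop instead of inside it. A minimal sketch, assuming the rows come back as a table of tables with a `message` field as in the script above; `job_missing` is a hypothetical helper.

```lua
-- Sketch: decide whether the job is missing from the query result,
-- without needing a dummy row.
-- Assumption: rows is a table of row tables (each with a `message`
-- field); matches is an optional predicate applied to each row.

local function job_missing(rows, matches)
  for _, row in pairs(rows or {}) do
    if matches == nil or matches(row) then
      return false   -- found at least one qualifying row
    end
  end
  -- Reached for an empty (or nil) result as well as when no row
  -- qualified, which is exactly the case the dummy row worked around
  return true
end
```

    In the script you would then call it once after the query, e.g. `if job_missing(alarms, function(r) return regexp(r.message, "*REGEX FOR MESSAGE*") end) then nimbus.alarm(4, message, SUPPKEY, SUBSYS, SOURCE) end`. Because the decision happens after the loop, an empty result simply falls through to `return true`.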