DX Infrastructure Manager


LUA Correlation Example

  • 1.  LUA Correlation Example

    Posted 01-28-2015 02:40 AM

    I was wondering (hoping, really) if someone could post an example snippet of how to do the following with a LUA script: send one alarm when you have more than 10 alarms from the same probe/hub with similar text after an overdue age of one hour. I know how to send the alarm and write the regex statement. What I can't seem to figure out (or find an example of) is how to query the alarm list and count the alarms that I am interested in bundling up. Any suggestions or examples would be greatly appreciated.

     

    Thank you.



  • 2.  Re: LUA Correlation Example

    Posted 01-28-2015 06:57 PM

    Would doing it with SQL be acceptable?

     

    I have this SQL query that I use to report on recurring alarms. I believe it would suit your needs as well if you modify the grouping and the where clause and make it select from NAS_ALARMS.

     

    select 
         count(*) as 'count', origin, robot, prid, hostname 
    from 
         NAS_TRANSACTION_SUMMARY 
    where 
         created > dateadd(day, -7, convert(date, getdate())) 
    group by 
         origin, robot, prid, hostname 
    having 
         COUNT(*) > 20 
    order by 
         COUNT(*) desc

     Here's the basic syntax for querying nis db from nas:

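    -- Open a connection to the nis database from the nas
    rc = database.open("provider=nis;database=nis;driver=none")
    
    -- '***' is a placeholder for the origin to filter on
    query = "select * from nas_alarms where origin = '***'"
    
    -- Run the query; each entry in alarms is one returned row
    alarms, rc = database.query(query)
    
    for _, al in pairs(alarms) do
       print(al.nimid)
    end
    
    database.close()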

     

    -jon



  • 3.  Re: LUA Correlation Example

    Posted 01-28-2015 07:20 PM

    I will give it a try. I really appreciate the reply, thank you as always 



  • 4.  Re: LUA Correlation Example

    Posted 01-28-2015 11:25 PM

    Below is what I ended up using, with the help of one of my colleagues here and examples/posts from the benevolent kkruepke. I hope this helps others.

     

    -- Check and count alarms
    local list = alarm.list()
    local match_count = {}   -- number of matching alarms per hub
    
    for i = 1, #list do
       local alm = list[i]
       local hub = alm.hub
    
       -- Keep only alarms whose message and hub name match the patterns
       local match = regexp(alm.message, "/sometext/")
       match = (match and regexp(hub, "/^somehub-/"))
    
       if match == true then
          -- Start at 0 the first time a hub is seen, then add one
          match_count[hub] = (match_count[hub] or 0) + 1
       end
    end
    
    -- Send one summary alarm per hub once the count reaches the threshold
    for hub, alarm_count in pairs(match_count) do
       local suppression_key = "snmpagent-" .. hub
       if alarm_count >= 20 then
          local message = "Excessive number of snmp agents not responding on " .. hub .. " with a total of " .. alarm_count .. " alarms"
          print(message)
    
          nimbus.alarm(4, message, suppression_key)
       end
    end
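
    The script above does not cover the "older than one hour" part of the original question. Here is a rough sketch of the same per-hub counting with an age filter added, written in plain Lua so it can be tried outside the nas. The sample table stands in for alarm.list(), the `created` field (epoch seconds) and the `count_by_hub` helper are assumptions made for this illustration, and standard Lua patterns replace the nas regexp() function.

```lua
-- Stand-in for alarm.list(); the field names and the fixed "now"
-- value are assumptions made for this sketch.
local now = 1000000
local sample_alarms = {
   { hub = "somehub-1", message = "agent sometext down", created = now - 7200 },
   { hub = "somehub-1", message = "agent sometext down", created = now - 4000 },
   { hub = "somehub-1", message = "agent sometext down", created = now - 60 },   -- too recent
   { hub = "somehub-2", message = "unrelated",           created = now - 7200 }, -- text does not match
   { hub = "otherhub",  message = "agent sometext down", created = now - 7200 }, -- hub does not match
}

-- Count matching alarms per hub, keeping only those at least min_age seconds old.
local function count_by_hub(alarms, text_pattern, hub_pattern, min_age, current_time)
   local counts = {}
   for _, alm in ipairs(alarms) do
      local old_enough = (current_time - alm.created) >= min_age
      if old_enough
         and string.find(alm.message, text_pattern)
         and string.find(alm.hub, hub_pattern) then
         counts[alm.hub] = (counts[alm.hub] or 0) + 1
      end
   end
   return counts
end

-- "%-" escapes the dash, which is a quantifier in Lua patterns.
local counts = count_by_hub(sample_alarms, "sometext", "^somehub%-", 3600, now)
for hub, n in pairs(counts) do
   print(hub, n)
end
```

    In a real nas script the timestamp would have to come from whichever field the alarm entries actually expose, so check the field names in your own environment before relying on this.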

     



  • 5.  Re: LUA Correlation Example

    Posted 02-18-2016 06:44 AM

    Hi Bvloch,

     

    The script runs based on the suppression key. Which do you think would be easier to use: the suppression key or the subsystem? Take a service-down alert as an example. We have many services configured on almost all servers, so any unplanned reboot results in many alerts and eventually many tickets. In that case, do you think it would be easier to use the subsystem or the suppression key? Please help me understand.

     

    -kag



  • 6.  Re: LUA Correlation Example

    Posted 02-26-2016 12:03 PM

    This script looks really useful!

     

    I'm looking into getting Nimsoft to perform an action on receipt of an alarm:

     

    If the number 9 in the example below goes up, I would like an alarm; if it goes down, I'd like it to stay clear and not alarm.

     

    Would the script above be able to do this?

     

    Alert example:

    Jobs waiting to run in job queue: Number of jobs in job queue MIMIXVFY is 9
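
    One possible way to approach the question above is to pull the number out of the alert text and compare it with the last value seen for the same queue. The following is only a sketch in plain Lua under stated assumptions: the `queue_increased` helper and the `last_seen` table are hypothetical names, the message format is taken from the single example above, and keeping the previous value between script runs is not covered here.

```lua
-- Last count seen per queue name (kept in memory only for this sketch).
local last_seen = {}

-- Returns true when the count in the message is higher than the
-- previous count recorded for the same queue.
local function queue_increased(message)
   local queue, count = string.match(message, "job queue (%S+) is (%d+)")
   if not queue then
      return false   -- message is not in the expected format
   end
   count = tonumber(count)
   local previous = last_seen[queue]
   last_seen[queue] = count
   return previous ~= nil and count > previous
end

local prefix = "Jobs waiting to run in job queue: Number of jobs in job queue MIMIXVFY is "
print(queue_increased(prefix .. "9"))    -- first sighting, nothing to compare against
print(queue_increased(prefix .. "12"))   -- count went up
print(queue_increased(prefix .. "7"))    -- count went down
```

    The first message for a queue is treated as "no increase" because there is no earlier value to compare against; whether that is the right choice depends on how you want the very first alert handled.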