DX Unified Infrastructure Management

Correlating multiple alarms into a single alarm


    Posted May 15, 2011 02:47 PM

    Hi all,

    This is about correlating multiple alarms into a single alarm.

    I have a set of hosts being monitored in Nimsoft. At present, I am monitoring CPU, disk and memory alarms. My requirement: when all three of these alarms (CPU, disk and memory) are received from the same host, I want to correlate them into a single correlated alarm for that host.

    The solution I came up with, described below, is not very efficient and might not even work.

    1) Create three triggers, one each for CPU, disk and memory alarms, generic across all hosts.

    2) Create a Lua script (shown below) that runs on every Auto-Operator (AO) interval (e.g. every 5 minutes), so that it processes not only new alarms but also alarms that are already open.
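
    Each trigger's alarms carry the host name only inside the message text, so the script first has to pull it out. A minimal sketch of that step, assuming every message has the form "<Category> Alert from <host> ..." (the message format and the helper name extract_host are assumptions, not part of NAS). Note that Lua's string.match takes the subject string first and the pattern second.

```lua
-- Sketch: extract the host name from an alarm message.
-- ASSUMPTION: messages look like "<Category> Alert from <host> ...".
-- extract_host is a hypothetical helper, not a NAS function.
function extract_host(message, category)
  -- string.match(subject, pattern); %S+ captures the run of
  -- non-space characters (the host name) after "from "
  return string.match(message, category .. " Alert from (%S+)")
end

print(extract_host("CPU Alert from web01 usage 97%", "CPU"))  -- web01
print(extract_host("Disk Alert from db02", "CPU"))            -- nil (category does not match)
```

    The category is concatenated into the pattern, so it must not contain Lua pattern magic characters such as "%", "." or "-"; "CPU", "Disk" and "Memory" are safe.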

    Note: I am just showing the logic here; the syntax might not be accurate.

    function CorrelateAlerts()
      -- Fetch each trigger's alarms once instead of once per loop iteration
      local all_CPU_alerts    = trigger.list("CPU_Alert") or {}
      local all_Disk_alerts   = trigger.list("Disk_Alert") or {}
      local all_Memory_alerts = trigger.list("Memory_Alert") or {}

      for i = 1, #all_CPU_alerts do
        -- Extract the host name; string.match takes (subject, pattern)
        local hostname1  = string.match(all_CPU_alerts[i].message, "CPU Alert from (%S+)")
        local messageid1 = all_CPU_alerts[i].nimid

        for j = 1, #all_Disk_alerts do
          local hostname2  = string.match(all_Disk_alerts[j].message, "Disk Alert from (%S+)")
          local messageid2 = all_Disk_alerts[j].nimid

          -- Check memory alarms only for CPU/disk pairs on the same host;
          -- non-matching pairs are skipped so the scan continues
          if hostname1 and hostname1 == hostname2 then
            for k = 1, #all_Memory_alerts do
              local hostname3  = string.match(all_Memory_alerts[k].message, "Memory Alert from (%S+)")
              local messageid3 = all_Memory_alerts[k].nimid

              if hostname2 == hostname3 then
                -- Close the three source alarms one nimid at a time,
                -- then raise one severity-5 (critical) correlated alarm
                action.close(messageid1)
                action.close(messageid2)
                action.close(messageid3)
                nimbus.alarm(5, "Correlated alert from " .. hostname1)
              end
            end
          end
        end
      end
    end

    if trigger.state("CPU_Alert") and trigger.state("Disk_Alert") and trigger.state("Memory_Alert") then
      CorrelateAlerts()
    end
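
    The nested loops compare every CPU alarm with every disk alarm and, for matching hosts, every memory alarm, so the work can grow with the product of the three list sizes. A sketch of a cheaper alternative, assuming each alarm returned by trigger.list is a table with message and nimid fields (as the script above assumes): index each category by host name once, then keep only the hosts present in all three indexes. The function names and sample alarms are hypothetical; this is plain Lua and does not call any NAS API.

```lua
-- Sketch: correlate CPU, disk and memory alarms per host with table lookups.
-- ASSUMPTION: each alarm is a table with `message` and `nimid` fields.
-- index_by_host and correlate are hypothetical helpers, not NAS functions.

-- Build { [host] = nimid } for one category of alarms.
function index_by_host(alarms, category)
  local by_host = {}
  local pattern = category .. " Alert from (%S+)"
  for _, a in ipairs(alarms or {}) do
    local host = a.message and string.match(a.message, pattern)
    if host and by_host[host] == nil then
      by_host[host] = a.nimid  -- keep the first alarm seen for this host
    end
  end
  return by_host
end

-- Return { host = ..., nimids = { cpu, disk, memory } } for every host
-- that has all three kinds of alarm open.
function correlate(cpu_alarms, disk_alarms, mem_alarms)
  local cpu  = index_by_host(cpu_alarms, "CPU")
  local disk = index_by_host(disk_alarms, "Disk")
  local mem  = index_by_host(mem_alarms, "Memory")
  local result = {}
  for host, cpu_id in pairs(cpu) do
    if disk[host] and mem[host] then
      result[#result + 1] = { host = host, nimids = { cpu_id, disk[host], mem[host] } }
    end
  end
  -- pairs() order is unspecified; sort by host for a stable result
  table.sort(result, function(x, y) return x.host < y.host end)
  return result
end

-- Usage with made-up alarms: only web01 has all three categories open.
local matches = correlate(
  { { nimid = "C1", message = "CPU Alert from web01" },
    { nimid = "C2", message = "CPU Alert from db02" } },
  { { nimid = "D1", message = "Disk Alert from web01" } },
  { { nimid = "M1", message = "Memory Alert from web01" },
    { nimid = "M2", message = "Memory Alert from db02" } })

for _, m in ipairs(matches) do
  print(m.host, table.concat(m.nimids, ","))  -- web01, then C1,D1,M1 (tab-separated)
end
```

    Inside a NAS script, the three arguments would be the lists from trigger.list, and for each match the script would close the nimids in m.nimids and raise the correlated alarm, as in the script above. Keeping only the first alarm per host per category is a choice made in this sketch, so a host with two open disk alarms still produces a single correlated alarm.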

    Please suggest.

    Regards,

    Amit Saxena