Hi,
Now that I have my new UIM lab, I started working today on a complete, high-performance rafale mode probe.
What's rafale mode?
Rafale mode is the name I chose two years ago for a script created to answer a real production need: the customer wanted to catch all logmon alarms and only trigger a new alarm after receiving 'N' occurrences with severity 'S' in less than 'T' seconds.
Some of the services monitored by this customer usually trigger a few alarms, which is not critical. But when a service triggers a lot of alarms, it means there is a real problem. So the purpose was to avoid triggering alarms for nothing.
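The core of the rule ("N occurrences with severity S in less than T seconds") is a sliding-window counter. Below is a minimal sketch of that idea; the class name `RafaleWindow` is hypothetical, and it assumes that severity is a minimum threshold and that the window is reset after it fires, so that one burst produces one new alarm. Neither detail is confirmed by the original script.

```python
from collections import deque


class RafaleWindow:
    """Hypothetical sliding-window counter for one rafale rule (sketch).

    Fires once `required_count` alarms with severity >= `min_severity`
    arrive within less than `interval_s` seconds.
    """

    def __init__(self, required_count: int, interval_s: float, min_severity: int):
        self.required_count = required_count
        self.interval_s = interval_s
        self.min_severity = min_severity
        self.timestamps = deque()  # arrival times still inside the window

    def observe(self, timestamp: float, severity: int) -> bool:
        """Record one alarm; return True when a new alarm should be raised."""
        if severity < self.min_severity:
            return False  # assumption: lower severities never count
        # Evict alarms that are T seconds old or older ("less than T seconds").
        while self.timestamps and timestamp - self.timestamps[0] >= self.interval_s:
            self.timestamps.popleft()
        self.timestamps.append(timestamp)
        if len(self.timestamps) >= self.required_count:
            self.timestamps.clear()  # assumption: one new alarm per burst
            return True
        return False


if __name__ == "__main__":
    window = RafaleWindow(required_count=3, interval_s=60, min_severity=5)
    for ts, sev in [(0, 5), (10, 5), (20, 3), (100, 5), (110, 5), (115, 5)]:
        print(ts, window.observe(ts, sev))
```

Only the last alarm (t=115) fires: the first two severity-5 alarms age out before a third one arrives, and the severity-3 alarm is ignored.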
Old Script : https://github.com/fraxken/rafale_mode
So what's the problem?
- The Lua script blocks the NAS (and SQLite is not a very good solution for performance and high availability).
- Putting a pattern in the alarm message is not very convenient (and not possible for every probe).
The solution!
The solution is to create a probe that works like alarm_enrichment. We attach the probe to a queue (rafale) that subscribes to the subject 'alarm1', we change the alarm_enrichment (AE) route subject to 'alarm1', and our probe posts to 'alarm2' for the NAS.
Alarm_enrichment > rafale > NAS
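To make the rerouting concrete, here is a small simulation of the subject chain. In-memory `queue.Queue` objects stand in for the hub subjects, and the function names are hypothetical; a real probe would subscribe and post through the UIM hub. It also assumes that alarms not covered by any rule pass through to 'alarm2' unchanged.

```python
import queue
import re

# Hypothetical in-memory stand-ins for the hub subjects.
subjects = {"alarm1": queue.Queue(), "alarm2": queue.Queue()}

# Same pattern as rule <100> in the configuration.
RULE = re.compile(r".*Your\salarm\smessage\shere.*")


def alarm_enrichment_post(alarm: dict) -> None:
    """AE, with its route subject changed to alarm1."""
    subjects["alarm1"].put(alarm)


def rafale_step() -> None:
    """Rafale probe: consume one alarm from alarm1, decide what reaches alarm2."""
    alarm = subjects["alarm1"].get()
    if RULE.match(alarm["udata"]["message"]):
        # Matching alarms are held back; the sliding-window logic decides
        # whether a new alarm gets posted (omitted in this sketch).
        return
    subjects["alarm2"].put(alarm)  # assumption: non-matching alarms pass through


def nas_drain() -> list:
    """NAS side: read everything currently posted on alarm2."""
    out = []
    while not subjects["alarm2"].empty():
        out.append(subjects["alarm2"].get())
    return out


if __name__ == "__main__":
    alarm_enrichment_post({"udata": {"message": "disk full"}})
    alarm_enrichment_post({"udata": {"message": "Your alarm message here"}})
    rafale_step()
    rafale_step()
    print(nas_drain())
```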
The probe is multithreaded (a pool of threads). Based on an older, similar probe, I expect each thread to handle around 300 alarms/second (more if I find a way to post alarms in bulk). But we will need a real production benchmark with the whole stack to confirm that!
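With a pool of threads, alarms matching the same rule can be handled by different threads, so the per-rule counters must be shared and protected. A minimal sketch using one global lock follows; the class and function names are hypothetical, and one lock per rule or pinning each rule to a fixed thread are alternatives that may scale better. It only counts occurrences per rule and leaves out the severity check.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class SharedRafaleCounters:
    """Hypothetical per-rule sliding windows shared by all worker threads."""

    def __init__(self, required_count: int, interval_s: float):
        self.required_count = required_count
        self.interval_s = interval_s
        self.windows = {}             # rule_id -> deque of timestamps
        self.lock = threading.Lock()  # serializes every window update

    def observe(self, rule_id: str, timestamp: float) -> bool:
        """Record one matching alarm; return True when the rule fires."""
        with self.lock:
            window = self.windows.setdefault(rule_id, deque())
            while window and timestamp - window[0] >= self.interval_s:
                window.popleft()
            window.append(timestamp)
            if len(window) >= self.required_count:
                window.clear()
                return True
            return False


def process_batch(counters: SharedRafaleCounters, alarms, pool_threads: int = 3) -> int:
    """Dispatch (rule_id, timestamp) pairs to a thread pool.

    pool_threads mirrors the pool_threads key of <setup>.
    Returns how many new alarms would have been posted.
    """
    with ThreadPoolExecutor(max_workers=pool_threads) as pool:
        return sum(pool.map(lambda a: counters.observe(*a), alarms))


if __name__ == "__main__":
    counters = SharedRafaleCounters(required_count=2, interval_s=60)
    print(process_batch(counters, [("100", 0.0)] * 10))
```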
This time there is no SQLite database, for performance and high availability reasons: the data will be hosted directly in the UIM database.
Configuration overview
<setup>
loglevel = 1
logsize = 1024
debug = 0
post_subject = alarm2
pool_threads = 3
</setup>
<rafale-rules>
exclusive_rafale = yes
<100>
match_alarm_field = udata.message
match_alarm_regexp = .*Your\salarm\smessage\shere.*
trigger_alarm_on_match = yes
required_alarm_rowcount = 2
required_alarm_interval = 60
required_alarm_severity = 5
</100>
</rafale-rules>
<database>
provider = MSSQL
connectionString =
</database>
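Here is a sketch of how the nested `<section>` / `key = value` layout above could be parsed and how rule `<100>` could be applied to an alarm. The helper names are hypothetical. It assumes the alarm severity lives in `udata.level` (as in UIM alarm messages) and that `required_alarm_severity` is a minimum rather than an exact value.

```python
import re

CONFIG = r"""
<setup>
   loglevel = 1
   logsize = 1024
   debug = 0
   post_subject = alarm2
   pool_threads = 3
</setup>
<rafale-rules>
   exclusive_rafale = yes
   <100>
      match_alarm_field = udata.message
      match_alarm_regexp = .*Your\salarm\smessage\shere.*
      trigger_alarm_on_match = yes
      required_alarm_rowcount = 2
      required_alarm_interval = 60
      required_alarm_severity = 5
   </100>
</rafale-rules>
<database>
   provider = MSSQL
   connectionString =
</database>
"""


def parse_cfg(text: str) -> dict:
    """Parse nested <section> ... </section> blocks of key = value lines."""
    root: dict = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            stack.pop()  # close the current section
        elif line.startswith("<") and line.endswith(">"):
            section: dict = {}
            stack[-1][line[1:-1]] = section
            stack.append(section)
        else:
            key, _, value = line.partition("=")  # split on the first '=' only
            stack[-1][key.strip()] = value.strip()
    return root


def get_field(alarm: dict, dotted: str):
    """Resolve a dotted path such as 'udata.message' in a nested dict."""
    value = alarm
    for part in dotted.split("."):
        value = value[part]
    return value


def rule_matches(rule: dict, alarm: dict) -> bool:
    """Check the regexp on the configured field, then the severity."""
    field = get_field(alarm, rule["match_alarm_field"])
    if not re.match(rule["match_alarm_regexp"], str(field)):
        return False
    # Assumption: severity is read from udata.level and acts as a minimum.
    return int(alarm["udata"]["level"]) >= int(rule["required_alarm_severity"])


if __name__ == "__main__":
    cfg = parse_cfg(CONFIG)
    rule = cfg["rafale-rules"]["100"]
    alarm = {"udata": {"level": 5, "message": "xx Your alarm message here xx"}}
    print(rule_matches(rule, alarm))
```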
I've seen many integrators/customers with the same kind of need (and people end up writing weird NAS rules to handle it).
Common mistakes to avoid
The biggest mistake to avoid is implementing "custom" cases that are not really needed (they can bring performance issues). The goal is defined, and we have to stick to a fixed implementation (every step is mastered and known).
My goal is to support 1000 alarms/s with 10-20 rules on ~5 threads (maybe fewer).
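A quick sanity check of that target against the ~300 alarms/second per thread estimated from an older probe (an unbenchmarked figure, so only a rough bound):

```latex
\frac{1000\ \text{alarms/s}}{5\ \text{threads}} = 200\ \text{alarms/s per thread} < 300\ \text{alarms/s per thread}
\qquad
\left\lceil \frac{1000}{300} \right\rceil = 4\ \text{threads minimum}
```

At 200 alarms/s, each thread has a budget of about 1/200 s = 5 ms per alarm for matching against the 10-20 rules. With the sample `pool_threads = 3`, the estimate gives 3 × 300 = 900 alarms/s, just under the target, which is consistent with "maybe fewer" than 5 threads meaning 4.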
End
I'm aiming for a beta stage at the beginning of next week. If you have any ideas or something to say, don't hesitate.
Best Regards,
Thomas