Notifier in Fault Tolerance


1. Notifier in Fault Tolerance

GBPopular
Posted Mar 11, 2015 04:58 AM

I have Spectrum 9.3 in a fault-tolerant, distributed environment.
The Notifier runs on one of the primary SpectroSERVERs and sends the alarms for all landscapes. I'm not sure what the best way is to configure the Notifier for the redundancy situation.

I have seen other people put an "if" condition in the scripts on the secondary SS, so that if the hostname in the alarm is the primary SS, the Notifier script does nothing. I think that, for alarms from the other landscapes, this could send the same mail from both the primary and the secondary SS.

I have three SANM applications, with three SANM policies and three SetScripts on the primary SS. On the secondary SS I copied the Notifier directory from the primary SS. I have not started the three Notifier processes on the secondary because, as I said, I think this could send duplicate alarm mails. I need to find a way for the secondary to send mails only when the primary SS is down, and it has to work for the other landscapes and with the three SANM applications...

Any idea?

Regards,
Susana
2. Re: Notifier in Fault Tolerance
Best Answer

Anon Anon
Posted Mar 12, 2015 02:02 AM

Hello Susana,

Maybe you can use the precedence attribute instead of the hostname.

1. The primary Spectrum DB will have precedence 10 (attribute 0x12c0a in every model).
2. Assume your secondary Spectrum has precedence 20.
3. On both servers, add this to $SPECROOT/Notifier/.alarmrc:
EXTRA_ATTRS_AS_ENVVARS=0X12C0A

4. In the SetScript and ClearScript, in the section just after

if [ "$SENDMAIL" = "True" ]
then
RECIPIENTS=$VARFORMAIL
........."
RECIPIENTS="NotificationData/RepairPerson"
fi

On the primary, add this:
if [[ "$SANM_0X12C0A" = "20" ]]
then
echo "SS Secondary is running"
echo "Precedence = $SANM_0X12C0A"
exit 0
fi

On the secondary, add this:

if [[ "$SANM_0X12C0A" = "10" ]]
then
echo "SS Primary is running"
echo "Precedence = $SANM_0X12C0A"
exit 0
fi

Save the SetScript and ClearScript, then recycle (restart) AlarmNotifier.

Outcome:

Whenever an alarm is generated, the model in the DB is checked and attribute 0x12c0a is read. If it is 10 (the primary server's precedence), the primary AlarmNotifier sends the email and the secondary writes a line to the Notifier log file saying the primary is running.

If 0x12c0a is 20, the secondary server's AlarmNotifier sends the mail and the primary writes a line to the Notifier log file saying the secondary is running.
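
The two checks above differ only in the precedence value they compare against, so the same logic could be kept in one helper shared by both servers. This is only a sketch: handled_by_peer and PEER_PRECEDENCE are hypothetical names, and it assumes, as in the steps above, that the alarm's precedence arrives in $SANM_0X12C0A.

```shell
#!/bin/sh
# handled_by_peer PEER_PRECEDENCE ALARM_PRECEDENCE
# Succeeds (exit status 0) when the alarm carries the peer server's
# precedence, meaning the peer's Notifier is expected to send the mail.
# An empty alarm precedence never matches, so the mail is still sent.
handled_by_peer() {
    [ -n "$2" ] && [ "$2" = "$1" ]
}

# Possible use inside SetScript/ClearScript (PEER_PRECEDENCE is hypothetical):
#   PEER_PRECEDENCE=20    # on the primary; use 10 on the secondary
#   if handled_by_peer "$PEER_PRECEDENCE" "$SANM_0X12C0A"; then
#       echo "Peer SS is running, precedence = $SANM_0X12C0A"
#       exit 0
#   fi
```

With this shape the scripts on both servers stay identical apart from the one PEER_PRECEDENCE line, which makes copying the Notifier directory between servers less error-prone.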

HTH

Kalyan
3. Re: Notifier in Fault Tolerance

GBPopular
Posted Mar 16, 2015 11:00 AM

Thank you, your idea has been very helpful for me!!

Regards!

Susana
4. Re: Notifier in Fault Tolerance

Legacy User
Posted Dec 14, 2015 06:25 PM

We handled this a little differently. We wanted to account for the case where the AlarmNotifier process dies or fails even while the SpectroSERVER process is still running (a situation we've seen on numerous occasions, particularly with default-type logging when NOTIFIER.OUT exceeds 2 GB). Also note that we have a very large distributed Spectrum environment (over a dozen primary and over a dozen fault-tolerant SpectroSERVERs).

We configure the same custom Notifier scripts (SetScript, etc.) on the designated Primary and Secondary (Fault-Tolerant) Spectrum systems. Inside the scripts is a check to look for the file "$HOME/Notifier/.SpectrumAlert.stop". If that file exists, they will log the alert, but not actually generate a ticket to our ticketing system.
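
A stop-file check of this kind might look roughly like the sketch below. The file path is the one named above; STOP_FILE, alerting_enabled and handle_alarm are hypothetical names, and the "ticket:" line stands in for whatever site-specific call creates the ticket.

```shell
#!/bin/sh
# Path from the post; the STOP_FILE variable name is an assumption.
STOP_FILE="${STOP_FILE:-$HOME/Notifier/.SpectrumAlert.stop}"

# Succeeds when the stop file is absent, i.e. this server may alert.
alerting_enabled() {
    [ ! -e "$STOP_FILE" ]
}

# handle_alarm MESSAGE
# Always logs the alarm; only "creates a ticket" when alerting is enabled.
handle_alarm() {
    echo "alarm: $1"
    if alerting_enabled; then
        # Placeholder for the real ticketing integration.
        echo "ticket: $1"
    else
        echo "suppressed: $STOP_FILE exists"
    fi
}
```

Because the alarm is logged in both branches, the secondary's log still shows what it would have sent while the stop file is in place.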

The primary system should never have that file, unless we're manually placing it for testing. The secondary will always have that file, unless there is a problem with the primary. We verify it by running a cron job script every 5 minutes that does the following:

- The secondary connects to the primary via SSH and runs a "health check" script.
- If the SSH attempt fails, the .SpectrumAlert.stop file is rm'ed (enabling alerting on the secondary).
- The health check script verifies the following:
  - The AlarmNotifier process is running.
  - The AlarmNotifier process has generated an alarm within the last 180 seconds (we have a big Spectrum environment; we never go more than a minute or so without at least a minor alarm tripping somewhere).
- If the health check comes back with a success, the .SpectrumAlert.stop file is touch'ed (disabling alerting on the secondary).
- If the health check comes back with a fail, the .SpectrumAlert.stop file is rm'ed (enabling alerting on the secondary).
- A ticket is generated through a Spectrum process monitor on the AlarmNotifier process.
- If the remote connection failed or the health check failed, a notification is sent via out-of-band notification (e-mail) to inform us that AlarmNotifier is down on the primary Spectrum server. This is a fail-safe to ensure that we know about any alerting issues between Spectrum and the ticketing system.
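
The secondary-side part of this job could be sketched as follows. This is not the actual script: PRIMARY_HOST, HEALTH_CHECK, STOP_FILE and every function name are hypothetical, the health check script is assumed to exit 0 on success, and notify_admins is a placeholder for the out-of-band e-mail.

```shell
#!/bin/sh
# All names and default values below are assumptions for illustration.
PRIMARY_HOST="${PRIMARY_HOST:-primary.example.com}"
HEALTH_CHECK="${HEALTH_CHECK:-/usr/local/bin/notifier_health_check.sh}"
STOP_FILE="${STOP_FILE:-$HOME/Notifier/.SpectrumAlert.stop}"

# Runs the health check on the primary over SSH. The exit status is
# non-zero if either the SSH connection or the remote check fails.
run_remote_check() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$PRIMARY_HOST" "$HEALTH_CHECK"
}

# Placeholder for the out-of-band notification (e.g. an e-mail command).
notify_admins() {
    echo "AlarmNotifier check failed on $PRIMARY_HOST: $1" >&2
}

# apply_result STATUS
# 0: primary is healthy, so keep the secondary quiet.
# non-zero: primary is unhealthy or unreachable, so enable the secondary.
apply_result() {
    if [ "$1" -eq 0 ]; then
        touch "$STOP_FILE"
    else
        rm -f "$STOP_FILE"
        notify_admins "status $1"
    fi
}

# One pass of the check, intended to be run from cron.
failover_tick() {
    run_remote_check
    apply_result $?
}
```

Calling failover_tick as the last line of the script and scheduling it with a crontab entry such as `*/5 * * * * /path/to/script` would match the five-minute interval described. Treating an SSH failure the same as a failed health check keeps the logic to a single branch, at the cost of briefly enabling both Notifiers if only the SSH path is down.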

Because it runs via cron every 5 minutes, it will automatically enable or disable alerting between the primary AlarmNotifier and the secondary AlarmNotifier without us having to intervene. It also covers every failure scenario we could think of, and helps
