DX NetOps

  • 1.  Notifier in Fault Tolerance

    Posted Mar 11, 2015 04:58 AM

    I have Spectrum 9.3 in a fault-tolerant, distributed environment.

    The Notifier runs on one of the principal SpectroSERVERs and sends the alarms for all landscapes. I'm not sure of the best way to configure the Notifier for this redundant setup.


    I have seen other people put an "if" condition in the scripts on the secondary SS: if the hostname in the alarm is the first SS, the Notifier script does nothing. I think that approach could still send the same mail from both the primary and the secondary SS for alarms in the other landscapes.


    I have three SANM applications, with three SANM policies and three SetScripts on the principal SS. On the secondary SS I copied the NOTIFIER directory from the primary SS. I did not start the three Notifier processes on the secondary because, as I said, I think that would send duplicate alarm mails. I need to find a way for the secondary to send mails only when the principal SS is down, and for it to work for the other landscapes and with the three SANM applications...


    Any idea?




  • 2.  Re: Notifier in Fault Tolerance
    Best Answer

    Posted Mar 12, 2015 02:02 AM

    Hello Susana,


    Maybe you can use the precedence attribute instead of the hostname:


    1. The primary Spectrum DB will have precedence 10 (attribute 0x12c0a on every model).

    2. Assume your secondary Spectrum has precedence 20.

    3. On both servers, add this to $specroot/Notifier/.alarmrc:



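    The post leaves the .alarmrc line itself blank. Since AlarmNotifier hands extra model attributes to the scripts as SANM_<attr> environment variables, the missing line is presumably the extended-attributes resource — a sketch, with the resource name recalled from memory rather than taken from the post:

```shell
# Fragment of $specroot/Notifier/.alarmrc (resource name assumed -- check
# the comments in your own .alarmrc). Listing 0x12c0a here makes the
# attribute available to SetScript/ClearScript as $SANM_0X12C0A.
EXTENDED_ATTRS=0x12c0a
```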
    4. In SetScript and ClearScript, in the bit just after


    if [ "$SENDMAIL" = "True" ]







    On the primary add this -

    if [[ "$SANM_0X12C0A" = "20" ]]
    then
        echo "SS Secondary is running"
        echo "Precedence = $SANM_0X12C0A"
        exit 0
    fi



    On the secondary add this -

    if [[ "$SANM_0X12C0A" = "10" ]]
    then
        echo "SS Primary is running"
        echo "Precedence = $SANM_0X12C0A"
        exit 0
    fi



    Save the scripts and recycle Alarm Notifier.




    Whenever an alarm is generated, the model in the DB is checked and attribute 0x12c0a is read. If it is 10 (the primary server's precedence), the primary Alarm Notifier sends the email and the secondary just writes a line to its Notifier log file saying the primary is running.


    If 0x12c0a is 20, the secondary server's Alarm Notifier sends the mail and the primary writes to its Notifier log file saying the secondary is running.
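    The two per-server blocks can also be collapsed into one guard shared by both hosts — a sketch, assuming you set LOCAL_PRECEDENCE yourself on each host (it is my addition, not a Spectrum-provided variable; SANM_0X12C0A is populated by AlarmNotifier as described above):

```shell
# One guard for both servers: mail only when the alarmed model's
# precedence matches this host's own precedence.
LOCAL_PRECEDENCE="${LOCAL_PRECEDENCE:-10}"   # 10 on the primary, 20 on the secondary
SANM_0X12C0A="${SANM_0X12C0A:-10}"           # supplied by AlarmNotifier via .alarmrc

if [ "$SANM_0X12C0A" != "$LOCAL_PRECEDENCE" ]; then
    echo "Peer SpectroSERVER (precedence $SANM_0X12C0A) is serving this model - nothing to do"
    exit 0
fi
echo "Precedence = $SANM_0X12C0A - sending notification"
```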





  • 3.  Re: Notifier in Fault Tolerance

    Posted Mar 16, 2015 11:00 AM

    Thank you, your idea has been very helpful to me!!





  • 4.  Re: Notifier in Fault Tolerance

    Posted Dec 14, 2015 06:25 PM

    We handled this a little differently.  We wanted to account for the case where the AlarmNotifier process could die or fail even when the SpectroSERVER process was still running (a situation we've seen on numerous occasions, particularly with default type logging when NOTIFIER.OUT exceeds 2GB).  Also note, we have a very large distributed Spectrum environment (over a dozen primary and over a dozen fault-tolerant SpectroSERVERS).


    We configure the same custom Notifier scripts (SetScript, etc.) on the designated primary and secondary (fault-tolerant) Spectrum systems.  Inside the scripts is a check for the file "$HOME/Notifier/.SpectrumAlert.stop".  If that file exists, the scripts log the alert but do not actually generate a ticket in our ticketing system.
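    The stop-file check described above might look like this near the top of SetScript — a sketch only; the standby log path is illustrative, not from the post:

```shell
# Guard: if the stop file exists, this host is in standby -- log the
# alert but skip ticket generation.
STOP_FILE="$HOME/Notifier/.SpectrumAlert.stop"

if [ -f "$STOP_FILE" ]; then
    echo "$(date '+%F %T') alert logged, ticket suppressed (standby)" \
        >> "$HOME/Notifier/standby.log"    # illustrative log location
    exit 0
fi
# ...normal ticket-generation logic continues here...
```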


    The primary system should never have that file, unless we're manually placing it for testing.  The secondary will always have that file, unless there is a problem with the primary.  We enforce this with a cron job script that runs every 5 minutes and does the following:


    1. Secondary connects to the Primary via SSH and runs a "health check" script.
      1. If the SSH attempt fails, the .SpectrumAlert.stop file is rm'ed (enabling alerting on the secondary).
    2. The health check script verifies the following:
      1. The AlarmNotifier process is running
      2. That the AlarmNotifier process has generated an alarm within the last 180 seconds (we have a big Spectrum environment; we never go more than a minute or so without at least a minor alarm tripping somewhere)
      3. If the health check comes back with a success, the .SpectrumAlert.stop file is touch'ed (disabling alerting on the secondary)
      4. If the health check comes back with a fail, the .SpectrumAlert.stop file is rm'ed (enabling alerting on the secondary)
    3. A ticket is generated through a Spectrum process monitor on the AlarmNotifier process
      1. If the remote connection failed or the health check failed, a notification is sent to inform us that AlarmNotifier is down on the primary Spectrum Server via out-of-band notification (e-mail).  This is a fail-safe to ensure that we know about any alerting issues between Spectrum and the ticketing system
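    The cron-driven check above can be sketched roughly as follows. The hostname, mail address, and the NOTIFIER.OUT recency test are assumptions standing in for the poster's actual health-check script; only the 5-minute cadence and the ~180-second alarm-recency window come from the post:

```shell
#!/bin/sh
# Sketch of the secondary's cron job that enables/disables local alerting.
PRIMARY="${PRIMARY:-primary-ss.example.com}"               # placeholder hostname
STOP_FILE="${STOP_FILE:-$HOME/Notifier/.SpectrumAlert.stop}"

# Health check on the primary: AlarmNotifier is running AND NOTIFIER.OUT
# was written within the last ~3 minutes (a stand-in for however the real
# script decides "an alarm was generated recently").
primary_healthy() {
    ssh -o ConnectTimeout=10 "$PRIMARY" '
        pgrep -f AlarmNotifier >/dev/null &&
        [ -n "$(find "$HOME/Notifier/NOTIFIER.OUT" -mmin -3 2>/dev/null)" ]
    '
}

# Enable or disable alerting on this secondary via the stop file.
apply_state() {   # $1 = 0 when the primary passed the health check
    if [ "$1" -eq 0 ]; then
        touch "$STOP_FILE"     # primary healthy: stay quiet here
    else
        rm -f "$STOP_FILE"     # take over alerting on this secondary
        # out-of-band fail-safe notification (mail command assumed available)
        echo "AlarmNotifier health check failed on $PRIMARY" |
            mail -s "Spectrum notifier failover" oncall@example.com
    fi
}

# Run the check only when invoked as the cron script itself:
# crontab entry: */5 * * * * /path/to/notifier_failover_check.sh
case "$0" in
    *notifier_failover_check.sh) primary_healthy; apply_state $? ;;
esac
```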


    Because it runs via cron every 5 minutes, it automatically enables or disables alerting between the primary AlarmNotifier and the secondary AlarmNotifier without us having to intervene.  It also covers every failure scenario we could think of.