DX NetOps


Fault Tolerant Database Synchronization failing??

  • 1.  Fault Tolerant Database Synchronization failing??

    Posted Feb 02, 2015 11:43 AM


    Hello All,

     

    I am running CA Spectrum 9.2.2 H09 in a fault tolerant configuration. It appears that the remote copy process daemon (rcpd) is not running, and when I go to view the rcpd.out log file, it does not exist in the SS folder on either server.

     

    Any ideas what to try out to fix this?

     

    I was able to go to the SS folder in the bash shell and run "../ss-tools/rcpd.exe" successfully. It appears that rcpd has started. I am going to run a database backup and sync and see if it works now. I am not sure what else to do, though. I have also reviewed the processd log, but it did not have any information on a sync failure.

     

    Thank you, Ken



  • 2.  Re: Fault Tolerant Database Synchronization failing??

    Posted Feb 02, 2015 11:47 AM

    Ken,

     

    Any alarms on the online backup model in Oneclick? Is the db sync failing?

     

    Kalyan



  • 3.  Re: Fault Tolerant Database Synchronization failing??

    Posted Feb 02, 2015 12:56 PM

    Hello Kalyan,

     

    Thank you for the quick reply! I have received events related to the backup and its sync failure, but interestingly I don't get the "OnLineBackup" events every time I run the backup. It does, however, fail to sync with the secondary every time.

     

    As you can see below in the event column, there is an alarm indicating that the synchronization has failed because the process daemon is not running on the standby server.

     

     

    Sync Failure.bmp

     

    I think I will try to manually run the rcpd sync backup in the bash shell. Even if it works, it does not help to figure out why the synchronization fails when running an online backup.


    Crazy thing is I don't even have rcpd.out files on either of the servers??? I don't get it.

     

    Help!!!! Thank you, Ken



  • 4.  Re: Fault Tolerant Database Synchronization failing??

    Posted Feb 02, 2015 09:48 PM

    If processd isn't running on the standby server, the sync will fail every time.... I'd check that first (since this is Windows - it should be a Windows Service).

     

    If processd *IS* running on the secondary server, you may have a firewall or something blocking communication between the primary and secondary servers.



  • 5.  Re: Fault Tolerant Database Synchronization failing??

    Posted Feb 02, 2015 09:50 PM

    PS - Your event display would seem to indicate that processd is not running on the standby server (the second major alarm)



  • 6.  Re: Fault Tolerant Database Synchronization failing??

    Posted Feb 03, 2015 09:27 AM

    Hello Matt and thank you for the comments,

     

    All of my Spectrum services are running on both servers including Processd Daemon. After reviewing the events that I attached above I found it interesting and slightly confusing that I received a "process daemon not running on the standby server" because after running some tests, I could see it was running. Further investigation shed light on the fact that it was rcpd, the remote copy process daemon, that was not running.

     

    I ran netstat on a "good working syncing system" and compared the output to that of a "failing synchronization" system. What is missing during "database synchronizing" in the SCP, just after a database backup on the primary, is the rcpd attached to port 51966. I do not have any firewalls enabled at all, and there are no other programs installed or running.
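    A netstat comparison like the one described above can be scripted. The following is a minimal sketch, not part of Spectrum: port_is_listening is a hypothetical helper that reads "netstat -an" style output on standard input and reports whether a given local port is in a listening state. It assumes the local address is the first address:port field on each line, which holds for the common Windows and Linux netstat layouts. Port 51966 is the value reported in this thread; it is also the decimal form of the 0xcafe value that rcpd prints at startup, which is presumably the same port.

```shell
# Hypothetical helper, not a Spectrum tool.
# Reads `netstat -an` style output on stdin and exits 0 if the given
# local port appears on a line in a listening state, 1 otherwise.
port_is_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                # The first field ending in a colon plus digits is taken
                # to be the local address (an assumption about the layout).
                if ($i ~ /:[0-9]+$/) {
                    if ($i ~ (":" port "$")) found = 1
                    break
                }
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (run on each server and compare the results):
#   netstat -an | port_is_listening 51966 && echo "port 51966 is listening"
#
# Converting the hex value printed by rcpd to decimal:
#   printf '%d\n' 0xcafe
```

    Running the same check on the working pair and on the failing pair gives a quick yes/no answer in place of comparing two long netstat listings by eye.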

     

    I am not 100% sure, but since there are no rcpd.out files when I go to review them, I am guessing that rcpd does not start at all and therefore never produces an rcpd.out log.

     

    I will figure this out!! Any other advice is greatly appreciated! I am going to try the kill -TRAP <PID> debug procedure.

     

     

    Where are the CA Top Dog Engineers?



  • 7.  Re: Fault Tolerant Database Synchronization failing??

    Broadcom Employee
    Posted Feb 17, 2015 02:16 PM

    Hi Ken,

     

    What happens if you manually launch the RCPD (Remote Copy Process Daemon) on both machines (primary and secondary)?

    Below I outline the steps to launch the RCPD by hand on both machines at the same time:

     

    On the Primary SpectroSERVER

     

    > cd SS-Tools

    <primary server>%/c/win32app/Spectrum/SS-Tools

    > ./rcpd.exe -send -h <secondary server> -f C:/win32app/Spectrum/SS-DB-Backup/db_20130304_2200.SSdb -compress (run this command)

    (Note:  Do NOT add the .gz extension in the filename above – the “-compress” parameter indicates the file is compressed)

     

    Mar 05 14:51:22 : rcpd started

     

    command: SEND

        host: <secondary server>

        file: C:/win32app/Spectrum/SS-DB-Backup/db_20130304_2200.SSdb

        rcpd: 0xcafe

       procd: 0xfeeb

    compress:  1

     

    Mar 05 14:51:22 : Waiting for remote processd to startup rcpd...

    Mar 05 14:51:52 : Waiting for remote processd to startup rcpd...

    Mar 05 14:52:23 : Waiting for remote processd to startup rcpd...

    Mar 05 14:52:53 : Waiting for remote processd to startup rcpd...

    Mar 05 14:53:23 : Waiting for remote processd to startup rcpd...

    Mar 05 14:53:53 : Successfully connected to remote rcpd.  Initiating file transfer...

    Mar 05 14:53:53 : Starting file transfer using 1048576 byte application buffer, 1048576 byte TCP socket buffer.

    Mar 05 14:53:57 : C:/win32app/Spectrum/SS-DB-Backup/db_20130304_2200.SSdb: has successfully been copied over to <secondary server>

    Mar 05 14:53:57 : Waiting for remote rcpd to process the database file...

    Mar 05 14:55:18 : The remote rcpd has successfully completed processing.

    Mar 05 14:55:18 : Final status is 0

    <primary server>%/c/win32app/Spectrum/SS-Tools

     

     

    On the Secondary SpectroSERVER:

     

    > cd SS

    <secondary server>%/c/win32app/Spectrum/SS

    > ../SS-Tools/rcpd.exe -recv -f .ft_save_file.SSdb (run this command)

    Mar 05 14:51:27 : rcpd started

     

    command: RECV

        file: .ft_save_file.SSdb

        rcpd: 0xcafe

     

    file name: .ft_save_file.SSdb.gz

    file size: 8114778

    compress: 1        (This indicates that the incoming file is compressed)

    peer host: <primary server>

     

    Mar 05 14:53:53 : Starting file transfer using 1048576 byte TCP receive socket buffer.

    Mar 05 14:53:57 : receiving file successfully finished : .ft_save_file.SSdb.gz

    Mar 05 14:53:58 : Uncompressing the database file...success.

    Mar 05 14:54:00 : Stopping the SpectroSERVER...success.

    Mar 05 14:54:06 : Running SSdbload on the uncompressed database file...success.

    Mar 05 14:55:18 : Restarting the SpectroSERVER...success.

    Mar 05 14:55:18 : Exiting.

    <secondary server>%/c/win32app/Spectrum/SS

     

    Note: The RCPD will create a temporary directory and file to save the file: $SPECROOT/SS/rcpd_core/.ft_save_file.SSdb

     

    If it fails, try an alternate port for the RCPD.

     

    If the RCPD runs fine when launched by hand, I am interested in the "$SPECROOT/SS-Tools/MapUpdate.exe -v" output along with the processd_log in debug mode (from both SS machines).

     

    bash -login

    cd lib/SDPM

    ps -f |grep -i processd

    ./kill -TRAP PID (PID of the Spectrum Process Daemon)

     

    Verify that debug is on in the processd_log file.

     

    Then reproduce the OLB backup sync issue.
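    The PID lookup in the steps above can be done without reading the ps listing by eye. This is a rough sketch under assumptions: processd_pids is a made-up helper name, and it assumes "ps -f" prints the PID in the second column (UID PID PPID ...), as in the usual "ps -f" layout. It only prints candidate PIDs; sending the TRAP signal remains the manual step shown above.

```shell
# Hypothetical helper, not shipped with Spectrum.
# Reads `ps -f` output on stdin and prints the PID (assumed to be column 2)
# of every line that mentions processd, skipping any grep line.
processd_pids() {
    awk 'tolower($0) ~ /processd/ && $0 !~ /grep/ { print $2 }'
}

# Example:
#   ps -f | processd_pids
```

    If more than one PID is printed, check the full ps line for each before signalling anything.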

     

    Thanks,

    Silvio



  • 8.  Re: Fault Tolerant Database Synchronization failing??

    Posted Feb 24, 2015 12:17 PM


    Hello Silvio, and thank you for the great detail in your advice. I actually ended up trying out what you suggested before finally figuring out what the problem was.

     

    I am almost embarrassed to say that someone had logged in to the primary SpectroSERVER using lower-case username credentials. The CA installation/Windows user name uses a combination of upper and lower case. What finally got me looking at the user was that when I attempted to view the user profiles through the Spectrum Control Panel, I got an error saying that either the host file did not have proper entries or a "user profile" did not exist.

     

    Not that I am assigning blame, but the system I was helping to troubleshoot was not my own. I have always made a habit of logging into my CA Spectrum installation owner/Windows user account using the mixed-case username, even though I can log in using all lower case. Funny thing is, I do log into other domain administrative accounts using all lower case. I never knew it could cause a problem for Spectrum.

     

    So we learned the hard way.... 20-30 hours of troubleshooting before realizing it was an issue with logging into Windows with the wrong username case.
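    For anyone who wants to catch this before a sync fails, the comparison is easy to script. A minimal sketch, under assumptions: check_login_case is a hypothetical helper, SpecAdmin is a made-up account name, and how the shell exposes the login name as typed (for example through a USERNAME variable) depends on the environment and is not confirmed in this thread.

```shell
# Hypothetical helper, not a Spectrum tool.
# Compares the expected installation-owner name with the name actually in use
# and prints one of: match, case-mismatch, different-user.
check_login_case() {
    expected="$1"
    actual="$2"
    if [ "$actual" = "$expected" ]; then
        echo "match"
    elif [ "$(printf '%s' "$actual" | tr '[:upper:]' '[:lower:]')" = \
           "$(printf '%s' "$expected" | tr '[:upper:]' '[:lower:]')" ]; then
        # Same letters, different case: the situation described in this post.
        echo "case-mismatch"
    else
        echo "different-user"
    fi
}

# Example (SpecAdmin is a placeholder; USERNAME is an assumed variable):
#   check_login_case "SpecAdmin" "$USERNAME"
```

    The same function works for a domain name, since it only compares two strings.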

     

    Thank you again for your detailed suggestion. I have learned a lot while troubleshooting this problem. I now know numerous ports by heart that have to do with SpectroSERVER to SpectroSERVER communications, how to read Wireshark packets during synchronization, and more.

     

    At least I have gained quite a bit more knowledge on how Spectrum works.

     

    Thank you, Take Care, Sincerely, Ken Jefferson



  • 9.  Re: Fault Tolerant Database Synchronization failing??

    Posted Mar 24, 2015 08:55 AM

    Hi All,

     

    I'm sadly having a similar issue; however, it is not a user account one.

     

    I have run through the steps above and I get the following:

     

    on the primary:

    DSYCSSPR01%/e/win32app/Spectrum/SS-Tools

    > ./rcpd.exe -send -h dsycsspr03 -f E:/win32app/Spectrum/SS-DB-Backup/db_1.SSdb -compress

    Mar 24 12:29:22 : rcpd started

     

    command:  SEND

        host:  dsycsspr03

        file:  E:/win32app/Spectrum/SS-DB-Backup/db_1.SSdb

        rcpd:  0xcafe

       procd:  0xfeeb

    compress:  1

     

    Mar 24 12:29:35 : Waiting for remote processd to startup rcpd...

    Mar 24 12:30:05 : Waiting for remote processd to startup rcpd...

    Mar 24 12:30:35 : Waiting for remote processd to startup rcpd...

    Mar 24 12:31:05 : Waiting for remote processd to startup rcpd...

    Mar 24 12:31:35 : Waiting for remote processd to startup rcpd...

    Mar 24 12:32:13 : Successfully connected to remote rcpd.  Initiating file transfer...

    stat: No such file or directory

    Mar 24 12:32:13 : Final status is -1

    DSYCSSPR01%/e/win32app/Spectrum/SS-Tools

    >

     

    on the secondary

     

    bash-3.2$ ../SS-Tools/rcpd.exe -recv -f .ft_save.SSdb

    Mar 24 12:29:52 : rcpd started

     

    command:  RECV

        file:  .ft_save.SSdb

        rcpd:  0xcafe

     

    Mar 24 12:32:13 : get a null parm block

    Mar 24 12:32:13 : Exiting.

    bash-3.2$

     

    I am thinking it is a port issue, as I cannot telnet to port 51966, but it looks like communication does start. Other than the DB sync, all other aspects of the failover look OK.



  • 10.  Re: Fault Tolerant Database Synchronization failing??

    Broadcom Employee
    Posted Mar 31, 2015 05:20 PM

    Hi Dan,

     

    It looks like rcpd.exe did not find the file given in "-f E:/win32app/Spectrum/SS-DB-Backup/db_1.SSdb -compress", based on the following error message:

    stat: No such file or directory

    Mar 24 12:32:13 : Final status is -1

     

    Make sure the db_1.SSdb.gz file exists in the E:/win32app/Spectrum/SS-DB-Backup/ directory.

    If the file does not have the .gz extension, then remove the -compress option from the rcpd.exe command.
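    The check described above can be sketched as a small shell function. This is only an illustration: rcpd_compress_flag is a hypothetical name, and the only rcpd.exe options used in the example are the ones already shown in this thread (-send, -h, -f, -compress).

```shell
# Hypothetical helper, not part of Spectrum.
# Given the backup path WITHOUT the .gz extension:
#   prints "-compress" if "<path>.gz" exists,
#   prints nothing if only "<path>" exists,
#   returns 1 with a message on stderr if neither exists.
rcpd_compress_flag() {
    base="$1"
    if [ -f "${base}.gz" ]; then
        echo "-compress"
    elif [ -f "${base}" ]; then
        :
    else
        echo "neither ${base} nor ${base}.gz exists" >&2
        return 1
    fi
}

# Example:
#   f=E:/win32app/Spectrum/SS-DB-Backup/db_1.SSdb
#   flag=$(rcpd_compress_flag "$f") && ./rcpd.exe -send -h dsycsspr03 -f "$f" $flag
```

    Leaving $flag unquoted in the example is deliberate, so that an empty result adds no argument to the command line.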

     

    Regards,

    Silvio



  • 11.  Re: Fault Tolerant Database Synchronization failing??

    Posted Apr 01, 2015 05:14 AM

    Hi,

     

    I have got this working now. Annoyingly enough, it was similar to Ken's issue; however, it was not the username causing the problem, but Windows defaulting the domain name to capitals when someone logged on to start the SpectroSERVER.

     

    Thanks for looking though.

     

    Dan



  • 12.  Re: Fault Tolerant Database Synchronization failing??

    Posted Apr 01, 2015 05:19 AM

    I feel your pain.

    I've had these issues before.

    Windows doesn't care about the case of the login but Spectrum cares very much.

    I find it quite frustrating....

     

    Lesley