Unfortunately it's not shutting itself down. When I looked through the DA logs for the period of the Vertica backup, I saw failed transactions where the DA is trying to post to the database for the whole 1 hour 15 minutes that the backup takes. What method does the DA use to detect whether Vertica is available? This is obviously where the problem is.
When the DR does its backup, does it halt the database? I am using the standard script that comes with the product, following the instructions in the DR manual. Isn't this script really just doing an rsync? I would have thought that could be done with the DB staying live. If that is the case, then something else is preventing the DA from posting during the backup.
So in summary I guess my questions are:
- when the backup script runs, does it stop the database?
- if it does stop the database, why doesn't the DA recognise this and act appropriately?
- if it doesn't stop the database, why does the DA have problems posting?
The way I would have expected it to work (assuming there is a requirement to stop the db to do a backup):
- backup starts
- stops the database
- DA detects this and goes to sleep
- DC detects the DA has gone to sleep, so caches its data locally
- DR backup finishes and starts the DB as its last step
- DA wakes up (as per cronjob that continually tries to start dadaemon)
- DC detects the DA is available and sends off the cached data
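To make that expected sequence concrete, here is a rough Python sketch of the behaviour I have in mind. It is only an illustration: the `Collector` class and the `simulate` function are names I made up, not anything from the product, and it assumes the DA is awake exactly when the database is up.

```python
from collections import deque


class Collector:
    """Hypothetical stand-in for a DC: caches samples while the DA is asleep."""

    def __init__(self):
        self.cache = deque()   # samples held locally during an outage
        self.delivered = []    # samples that reached the DA, in order

    def submit(self, sample, aggregator_up):
        if not aggregator_up:
            # DA is asleep: keep the sample locally instead of losing it
            self.cache.append(sample)
            return
        # DA is available again: send the cached backlog first, oldest first
        while self.cache:
            self.delivered.append(self.cache.popleft())
        self.delivered.append(sample)


def simulate(db_up_by_tick):
    """Assumption for this sketch: the DA is awake exactly when the DB is up."""
    dc = Collector()
    for tick, db_up in enumerate(db_up_by_tick):
        dc.submit(tick, aggregator_up=db_up)
    return dc
```

For example, `simulate([True, False, False, True])` models a backup that takes the database down for two polling cycles. In this sketch the two samples taken during the outage are cached and then delivered once the database is back, so nothing is lost.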
I do have a support ticket in, as this is a major issue for us. I could accept a five-minute loss of data a day, but an hour and a quarter across all devices every day is a bit much.