Hi,
Sounds mostly good to me - I assume what you mean is shutting down the main site, but then letting the DR site take over for a while between steps 2 and 4, including starting the queues at the DR site for a test, right? Otherwise, what would be the point of the test, methinks ...
Three small notes:
#1 we have occasionally seen scenarios where organizational mechanisms (such as queues) didn't end all jobs as we wanted, or didn't prevent them from starting - either because jobs were stuck or long-running, or because of suspected bugs. So you may want to double-check that no jobs of truly high importance are running before shutting down either site. While jobs on most Automic agents should continue even without an engine as disowned OS processes, and report back to the engine when it (i.e. the engine) comes back, you never know.
#2 also think about how you want to deal with attached systems, at least the important ones. The point of a DR setup (and DR test) with a remote site is usually to stay operational when your main site gets flooded / set on fire / invaded by ninjas at an inconvenient moment. What about your agent systems: do they get mirrored at the DR site as well? If not, you might have an engine with no agents and nothing to do when disaster strikes. If so - well, then a full DR test suddenly gets more complicated.
#3 think about how you replicate your database. It's part of the engine for all intents and purposes. If your engine database is on site, and the site is on fire, the DB needs to have been replicated to the DR site. Depending on your DBMS and vendor, it may need to be brought up, or it may need time to switch over. This should also be tested in a full DR test. And never ever have Automic servers of the same engine connect to databases in different or inconsistent states if at all avoidable (like using an old snapshot at the DR site instead of the actual, current, fully replicated DB). It might work, but it might just as well wreak havoc, so it's best not to.
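For the double-check in #1, something as simple as the sketch below may already help. It is only an illustration: the `ActiveJob` record, its status labels, and the `critical` flag are all made up here, and how you get the list of active jobs out of your system is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ActiveJob:
    """Hypothetical record for one job taken from your own activity export."""
    name: str
    status: str      # made-up labels, e.g. "active", "waiting", "ended"
    critical: bool   # set by you for jobs of truly high importance


def blocking_jobs(jobs):
    """Return the critical jobs that have not ended yet."""
    return [job for job in jobs if job.critical and job.status != "ended"]


def safe_to_shut_down(jobs):
    """True only if no critical job is still pending or running."""
    return not blocking_jobs(jobs)


if __name__ == "__main__":
    # Example data only; replace with whatever your export gives you.
    snapshot = [
        ActiveJob("NIGHTLY_BILLING", "active", True),
        ActiveJob("LOG_CLEANUP", "active", False),
        ActiveJob("PAYROLL_EXPORT", "ended", True),
    ]
    for job in blocking_jobs(snapshot):
        print(f"still running, do not shut down yet: {job.name}")
    print("safe to shut down:", safe_to_shut_down(snapshot))
```

The point is just to look at what is actually running rather than trusting that the stopped queues did their job.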
Also, of course, the most realistic DR test involves yanking out the main server's power cord during the time of heaviest load with no stopping of queues or other preparation - train as you fight, fight as you train, as they say - but maybe don't do that too often / without a good backup / without ample warning :)
Best,
Carsten