Disaster Recovery

  • 1.  SRM - Unplanned Disaster Scenarios ( DR Drill)

    Posted Nov 03, 2025 11:38 AM
    Edited by Mark Koh Nov 12, 2025 09:49 PM

    Hi 

    I have two SRM servers configured between two vCenter sites (Prod and DR), with replication at the storage level.

    The deployment has two SRM servers (one per site), two vCenter servers (one per site), and two storage arrays (one per site). The protected site contains the ESXi hosts and VMs running the production environment, and the DR site contains ESXi hosts in standby mode that have connectivity to the secondary storage. The storage and the replicated VMs will be presented to these hosts after failover.

    I am preparing to run an unplanned disaster scenario.

    1. Yes, I know I can manually bring the VMs up with the DR SRM if I lose the production site.
    2. Are there any risks that I should take into consideration?



    -------------------------------------------



  • 2.  RE: SRM - Unplanned Disaster Scenarios ( DR Drill)

    Broadcom Employee
    Posted Dec 03, 2025 04:40 PM

    Hi Innocent Mapanga,

    I'm not sure what you mean by risks here.
    There are the obvious gotchas associated with data replication: the VMs on the DR site will come up with data from the last sync, as if they had all crashed at the time of the last completed sync.
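
    To put a number on that gotcha, here is a minimal Python sketch. It is not something SRM provides; the function names and timestamps are made up for illustration. It computes how much recent data would be missing if the site failed at a given moment and compares that to an RPO target.

```python
from datetime import datetime, timedelta


def data_loss_window(last_completed_sync: datetime, failure_time: datetime) -> timedelta:
    """Return how much recent data is missing after an unplanned failover.

    Everything written between the last completed sync and the failure
    never reached the DR storage, so it is lost.
    """
    if failure_time < last_completed_sync:
        raise ValueError("failure_time must not precede last_completed_sync")
    return failure_time - last_completed_sync


def meets_rpo(last_completed_sync: datetime, failure_time: datetime,
              rpo_target: timedelta) -> bool:
    """True if the data-loss window stays within the RPO target."""
    return data_loss_window(last_completed_sync, failure_time) <= rpo_target


# Hypothetical timestamps, for illustration only.
last_sync = datetime(2025, 12, 3, 16, 0)
failure = datetime(2025, 12, 3, 16, 25)

print(data_loss_window(last_sync, failure))                  # 0:25:00
print(meets_rpo(last_sync, failure, timedelta(minutes=15)))  # False
```

    In a real drill you would take the last completed sync time from your array's replication status rather than hard-coding it.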

    SRM is a management tool and is designed so that, if one site is missing, it can still complete the recovery operation. However, if both sites are available, it will be more graceful and perform a planned migration: shutting down the VMs on prod, syncing the last changes, then powering them on at DR.
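
    As a rough mental model of that difference, the sketch below lists the steps in each case. It is a simplified illustration in Python, not SRM's API or its actual workflow engine; the function name and step wording are assumptions.

```python
def recovery_steps(protected_site_available: bool) -> list[str]:
    """Return the simplified order of operations for a recovery run.

    Assumption: this only models the high-level steps described above,
    not the real SRM recovery plan.
    """
    steps = []
    if protected_site_available:
        # Planned migration: quiesce production first so nothing is lost.
        steps.append("shut down VMs at the protected site")
        steps.append("sync the last changes to the DR storage")
    # In both cases the VMs end up powered on at the DR site.
    steps.append("power on VMs at the DR site")
    return steps


print(recovery_steps(True))
print(recovery_steps(False))
```

    With the protected site gone, the first two steps are skipped, which is why the VMs start from the last completed sync.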

    The latency of your site-to-site link could be a concern.

    The usual BCDR concerns around RPO and RTO apply as always.

    Please be more specific and I will happily go deeper.

    Thanks,

    Fouad

    -------------------------------------------



  • 3.  RE: SRM - Unplanned Disaster Scenarios ( DR Drill)

    Posted Dec 08, 2025 01:28 PM

    SRM has the ability to do a test failover whereby it tells the storage, via its SRA, to clone the storage and present it to the DR hosts. The DR hosts will then boot up the recovered VMs per the plan. Since this is a test failover, they will be in a sandbox network so they don't interfere with production workloads. Your application owners can still be given access to the VMs in the sandbox so that they can validate that their applications all failed over correctly.

    If you want to do a real test, as Fouad mentioned, you can do either a planned failover or an unplanned failover. An unplanned failover is not preferred because you can only recover up to the most recently replicated data, so anything written after that point is lost. A graceful failover avoids that data loss.

    Whenever possible, I always recommend that the DR site have as similar a configuration as possible to the production site. That way, you can expect similar levels of performance until your main site is repaired and returned to service.

    -------------------------------------------