Hi Everyone,
I'm hoping someone can help me with a problem I am having in a test lab I am using to demo vSphere Replication. I have been asked to implement a proposed design in a lab comprising:
2 sites - connected via redundant GB links - all VLANs stretched between the two locations
3 VLANs deployed - Public (Management and VM traffic), vMotion and Replication
Each site has a two-node cluster (vSphere 6.5)
vCenter residing at a third site (vCenter 6.5)
I have deployed the vCenter, sites, clusters and the vRAs, and can successfully replicate a test VM and fail it back and forth between the two clusters. However, when testing more of a true disaster, I was a bit surprised at the result. I killed the primary site ESXi host along with its vRA and found that I was unable to recover the VM at the DR site. I don't even get the option within vCenter to recover the VM, and when I refresh the incoming/outgoing replications screen I just get a 503 error within vCenter.
My theory at that point was that there was some sort of dependency on the first vRA deployed. To test this, I fired up the host and vRA again, then vMotioned the first vRA deployed to the DR site and the second vRA to the primary site. When I repeated the test, I was able to recover the VM despite the primary site's host and vRA being offline. This suggests to me that there is some sort of dependency on the first vRA deployed. My expectation with two vRAs deployed was that recovery would still be possible, since I should have no SPOF.
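To make the two test runs concrete, here is a rough sketch of my hypothesis in Python. It is only a model of what I observed, not a description of how vSphere Replication actually works internally; the names (`Appliance`, `recovery_possible`, `vRA-1`) are made up for illustration and are not part of any VMware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    name: str  # hypothetical label for the appliance
    site: str  # site the appliance is currently running at


def recovery_possible(first_vra: Appliance, failed_site: str) -> bool:
    """Hypothesis only: recovery works while the first-deployed vRA is still up."""
    return first_vra.site != failed_site


# Test 1: first-deployed vRA at the primary site, primary site killed.
test1 = recovery_possible(Appliance("vRA-1", "primary"), failed_site="primary")

# Test 2: first-deployed vRA vMotioned to the DR site, primary site killed.
test2 = recovery_possible(Appliance("vRA-1", "dr"), failed_site="primary")

print(test1, test2)  # False True, matching what I saw in the lab
```

If the appliances were truly interchangeable, I would have expected both tests to allow recovery.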
My gut tells me that I must be doing something wrong and that there cannot be a SPOF whereby the first vRA is required to recover a VM. Is the scenario I am testing even architecturally possible? Are my expectations of how vSphere Replication works misplaced?
Many thanks in advance for all comments and advice.