We need a better DR solution than the HA probe can provide. There seems to be no way to fully configure redundancy between hubs connected via tunnels.
This is a fundamental component of redundancy, and the fact that the HA probe can't do this is ridiculous to me. There are two core issues with the HA probe: 1) it can't monitor hubs via tunnels, and 2) it can only monitor one hub, while it should be able to monitor TWO -- one in the same environment, and one across the tunnel.
Here's an example. The main hubs communicate with the remote hubs via tunnels:
mainhubA - queue A1 - remotehub1
mainhubA - queue A2 - remotehub2
mainhubB - queue B1 - remotehub1
mainhubB - queue B2 - remotehub2
If mainhubA goes down, the HA probe on mainhubB will enable B1, but remotehub1 will not switch over to B1; it will continue sending data on A1. This is not high availability. The ENTIRE infrastructure will collapse because all of the remote hubs will keep trying to send data to mainhubA.
If remotehub1 goes down, the HA probe on remotehub2 will enable queue A2; however, mainhubA will not receive any data because it will continue using queue A1.
I will probably need to open a case about this issue, but if anyone has any suggestions I would appreciate it.
Maybe I'm missing something?
I am not following you. If mainhubA goes down, why would you want A2 to be enabled by the HA probe on mainhubB? Shouldn't A2 only be used when mainhubA is up?
And I don't think remotehub1 going down should require the HA probe to do anything with the queues. I think you should only need the HA probe on the main hubs. You can use it on the remote hubs if you need it to failover probes, but you should not need it to mess with the queues on the remote hubs.
It might help for you to explain which of the queues are supposed to be active when everything is up. It is not clear if you want everything going to mainhubA normally and then switch to mainhubB if mainhubA fails. Or maybe you want to split the load between mainhubA and mainhubB when everything is up.
Sorry, fixed the mistake in the original post.
Only one queue would be up at a time, and here are the different scenarios:
Everything is up: A1
mainhubA is up, remotehub1 is down: A2
mainhubA is down, remotehub1 is up: B1
mainhubA is down, remotehub1 is down: B2
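If it helps, the scenario table above boils down to one simple selection rule. Here's a quick Python sketch, just to illustrate the logic (this is obviously not something the HA probe itself runs; the hub and queue names are the ones from this thread):

```python
# Illustrative only: which single queue should be active for each
# combination of hub states, per the four scenarios listed above.

def active_queue(mainhubA_up: bool, remotehub1_up: bool) -> str:
    """Return the one queue that should be active in a given state."""
    if mainhubA_up:
        return "A1" if remotehub1_up else "A2"
    return "B1" if remotehub1_up else "B2"

# Walk through all four scenarios from the table:
for a_up in (True, False):
    for r1_up in (True, False):
        print(f"mainhubA up={a_up}, remotehub1 up={r1_up} "
              f"-> {active_queue(a_up, r1_up)}")
```

The hard part, as noted above, is that nothing in the product evaluates this rule across a tunnel automatically.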
If remotehub1 goes down, all of the robots that are on that hub will automatically switch over to remotehub2. remotehub2 will then start sending everything through queue A2. I guess queue A2 can be active on mainhubA all the time.
The bigger problem is that if mainhubA goes down, how do all of the remote hubs know to use the queues to mainhubB?
In my opinion, you should have A1 and A2 active at all times. Then it does not matter if the robots move around between remotehub1 and remotehub2. Either way all messages from both hubs will end up back at mainhubA. If all robots are connected to remotehub1, maybe remotehub2 will not be sending much data, but the queue can still be connected. (And to be honest, it is impossible for remotehub2 to have no robots associated; it always has its own robot.)
Then you just run the HA probe on mainhubB and enable B1 and B2 if mainhubA goes down.
And just to be clear, I think you should be using attach and get queues, not post queues. Based on your last question, you probably have or plan to use post queues. Post queues are great, but I think get queues work better when it comes to redundancy, especially with the HA probe.
Oh I just realized the other part of this: the remote hubs don't actually send data *to* any hub, they just create a queue and that's then pulled by the primary hubs.
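Right -- for anyone reading this later, the attach/get model works roughly like this. A toy Python sketch, not actual hub code (the hub names are just the ones from this thread):

```python
from collections import deque

# Toy model of the attach/get pattern: each remote hub only queues
# messages locally (its attach queue); a main hub's get queue pulls
# them. The remote hub never needs to know which main hub is pulling.

attach_queues = {"remotehub1": deque(), "remotehub2": deque()}

def robot_send(remote_hub: str, msg: str) -> None:
    """A robot delivers a message to its remote hub's attach queue."""
    attach_queues[remote_hub].append(msg)

def main_hub_pull(remote_hub: str) -> list:
    """Whichever main hub is up drains the remote hub's attach queue."""
    q = attach_queues[remote_hub]
    pulled = list(q)
    q.clear()
    return pulled

# Robots moved from remotehub1 to remotehub2: messages still arrive.
robot_send("remotehub2", "alarm-from-robot7")
print(main_hub_pull("remotehub2"))  # ['alarm-from-robot7']
```

This is why keeping both attach queues active all the time is cheap: a queue with nothing in it costs essentially nothing, and failover on the pulling side doesn't require any change on the remote hubs.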
All set, thanks for clarifying!
(I think our previous 2 posts may have crossed paths, but we ended up on the same page by then anyway.)
Okay one more issue then: on smaller hubs we just send everything via queues but on bigger hubs, we have separate nas probes with replication. How do I tell the remote nas to replicate to mainhubB if mainhubA goes down?
This is a very good question. I used nas replication some time ago, but only between two core hubs (mainhubA and mainhubB in your scenario). I am no expert on what happens when you get into more complex replication schemes. (I also had some trouble keeping the two in sync even with that rather simple setup, and I really hope the replication code has improved since then.)
My thought would be that you may need to have each remote hub replicate with both main hubs. But then if the main hubs are also replicating with each other, that may not work well. I think there is an option to not replicate alarms that were received from another hub, so maybe that is how you would handle it. Then each remote hub could be responsible for making sure both main hubs always have the latest alarm info, and the two main hubs should match even though they do not replicate some alarms directly. Note that I have no idea if this would work well in practice, though.
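To illustrate the idea, here is a toy sketch (not nas code; whether the real nas probe actually supports suppressing re-replication of hub-received alarms is something you would need to verify):

```python
# Toy model: each remote NAS forwards its alarms to BOTH main hubs,
# and the main hubs skip re-replicating alarms that arrived from
# another hub, so the main-to-main pair never loops or duplicates.
# Hub names are the ones from this thread.

main_nas = {"mainhubA": set(), "mainhubB": set()}

def remote_replicate(alarm: str) -> None:
    """A remote hub pushes the same alarm to both main hubs."""
    for hub in main_nas:
        main_nas[hub].add(alarm)
        # A real nas would flag this alarm as "received from another
        # hub" so main-to-main replication leaves it alone.

remote_replicate("disk-full-on-remotehub1")
remote_replicate("cpu-high-on-remotehub2")

# Both main hubs converge on the same alarm set without the main
# hubs ever exchanging these alarms directly.
assert main_nas["mainhubA"] == main_nas["mainhubB"]
```

Again, this only shows the convergence idea; the failure cases (a remote hub crashing mid-push, or the "received from another hub" option behaving differently than assumed) are exactly where I would expect trouble in practice.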
How do you have it set up today?