VMware GemFire

 Disaster recovery with multi datacenter setup

Unmesh Joshi posted Apr 09, 2019 09:47 AM

Hi,

We are designing disaster recovery for GemFire with a multi-data-center setup. Multi-DC replication with gateway senders and receivers is eventually consistent. We are considering a single GemFire cluster that spans multiple data centers, with the replication factor set so that data is replicated across all the nodes.

Curious to know if it's recommended to span a single GemFire cluster across data centers with a replication factor of "all"?

 

Thanks,

Unmesh

 

 

 

Rajiv CE

Hi Unmesh,

 

Regarding

 

>Curious to know if it's recommended to span a single GemFire cluster across data centers with a replication factor of "all"?

 

You can technically do it, assuming network latency is not a problem for you. However, the recommendation is to use separate clusters connected via WAN for disaster recovery, rather than relying on a single cluster that spans multiple data centers.

 

Regards,

Rajiv

Unmesh Joshi

Can we create a redundancy zone in a different data center? That way, redundant copies would always be created in a separate data center. Again, I think it's technically possible. And would it guarantee consistency, since the redundant writes are synchronous?

Rajiv CE

Yes, you can do that, but network latency will be a factor here.

Unmesh Joshi

>>Yes, you can do that, but network latency will be a factor here.

If write performance is not a problem (because writes mostly happen as an end-of-day batch), will reads and function executions be guaranteed to go to the servers nearest to the client? I assume the locator returns the IPs of the servers nearest to the client for a particular bucket when reading data or executing functions?

Rajiv CE

>If write performance is not a problem (because writes mostly happen as an end-of-day batch), will reads and function executions be guaranteed to go to the servers nearest to the client?

 

For a replicated region, a read can go to any node; the locator uses a simple load-balancing mechanism to route requests. For partitioned regions, a read is directed to a node holding either the primary or a secondary copy of the bucket, while writes are directed to the nodes hosting the primary buckets. Where a function executes depends on how you invoke it and on whether it is a data-dependent or data-independent function.

 

 

It would be recommended to use separate clusters connected via WAN.

 

Regards,

Rajiv

Unmesh Joshi

For function execution on a specific region, will the function be executed on the servers hosting the bucket that are closest to the client, or will it go to the primary of the bucket, which can be anywhere? I am asking particularly about the case where single hop is enabled.

Rajiv CE

Hi Unmesh,

 

There is no notion of a nearest or closest node. For a partitioned region with optimizeForWrite, function execution tries to run on the node that hosts the primary data. But in a partitioned region, writes to the primary and secondary buckets are synchronous, so network latency has to be taken into account.

 

If you do not want single hop or optimizeForWrite, the call can go to any node. But if the data resides on another node, the call is forwarded there, and the extra network hops will hurt function performance.

 

So you will probably need to run all your performance and failure tests.

 

Regards,

Rajiv

Unmesh Joshi

If we need to figure out whether a GemFire cluster is in a working state, what is the best way to check from the client? We want to build a health check and automatically switch to a cluster in another data center.

Rajiv CE

Hi Unmesh,

 

You can use gfsh to perform a health check. The following commands might be handy:

gfsh>list members //to check on members

gfsh>list regions //to make sure regions are ready

 

Then do some standard get/put/query operations.

 

You can do the same checks programmatically. If you have REST enabled, you can look at the Admin and Region endpoints it exposes. See the link below for more information.

 

http://gemfire.docs.pivotal.io/97/geode/rest_apps/rest_api_reference.html
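
As a rough sketch of the REST route, a client could poll the ping endpoint listed under the administrative endpoints. The base URL used below (host, port, and the /geode/v1 prefix) is an assumption that depends on the GemFire version and on how the HTTP service is configured, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: probes the REST API's ping endpoint. */
final class RestPing {

    /**
     * Returns true only if GET {baseUrl}/ping answers with HTTP 200
     * within the given timeout. Any I/O error or bad URL counts as "down".
     */
    static boolean isReachable(String baseUrl, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            // baseUrl is an assumption, e.g. "http://server-host:8080/geode/v1"
            conn = (HttpURLConnection) URI.create(baseUrl + "/ping").toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException | IllegalArgumentException e) {
            // Connection refused, timeout, malformed or relative URL
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A successful ping only shows that the REST service on that server answers. It does not prove that regions are readable, so it is best combined with a real get/put check.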

 

Regards,

Rajiv

Unmesh Joshi

Is there a way to check that programmatically from the gemfire client?

Unmesh Joshi

Submitted too early..

Is there a way to check that programmatically from the GemFire client with the ClientCache Java API? Or is using the REST API for the locator the only way?

 

Rajiv CE

From the client side, you can get the list of locators and servers using something like the below:

List<String> currentServerNames = pool.getCurrentServerNames();
List<InetSocketAddress> locators = pool.getLocators();

private static PoolImpl getPool(Region r) {
    PoolImpl result = null;
    String poolName = r.getAttributes().getPoolName();
    if (poolName != null) {
        result = (PoolImpl) PoolManager.find(poolName);
    }
    return result;
}

and use the same client for doing region operations.
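
To tie this back to the automatic switch to another data center, here is a minimal sketch of the decision logic only. It assumes the actual probe (for example a region get/put round trip through the client, or the server list check above) is wrapped in a BooleanSupplier. The class name, the threshold, and the sticky failover behaviour are all illustrative choices, not anything GemFire provides.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: decides which cluster a client should use. */
final class ClusterFailover {
    enum Target { PRIMARY, SECONDARY }

    private final BooleanSupplier primaryProbe; // assumption: wraps a real health check
    private final int failureThreshold;
    private int consecutiveFailures = 0;
    private Target active = Target.PRIMARY;

    ClusterFailover(BooleanSupplier primaryProbe, int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.primaryProbe = primaryProbe;
        this.failureThreshold = failureThreshold;
    }

    /** Runs one probe and returns the cluster that should serve traffic. */
    synchronized Target check() {
        if (active == Target.SECONDARY) {
            return active; // sticky: no automatic fail back in this sketch
        }
        boolean healthy;
        try {
            healthy = primaryProbe.getAsBoolean();
        } catch (RuntimeException e) {
            healthy = false; // a probe that throws counts as a failed check
        }
        if (healthy) {
            consecutiveFailures = 0;
        } else if (++consecutiveFailures >= failureThreshold) {
            active = Target.SECONDARY;
        }
        return active;
    }
}
```

Requiring several consecutive failures avoids switching on a single slow call. The switch is kept sticky here because WAN replication is eventually consistent, so failing back is probably better done deliberately than automatically.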

 

Regards,

Rajiv

Unmesh Joshi

And are the semantics the same as "list members" in gfsh? Meaning, if the servers have crashed, they won't appear in the list?

Rajiv CE's profile image
Rajiv CE

Yes, it will only list available servers and will not include crashed servers or locators.

Amar Das

Hi Unmesh,

 

I am trying something very similar, but I am opting for Pivotal Cloud Cache. Reference: https://docs.pivotal.io/p-cloud-cache/1-7/design-patterns.html - Bidirectional Replication Across a WAN.

 

Did you consider this option? How has your experience with the GemFire cluster been so far?

 

Thanks/Amar Das