VMware GemFire

Disaster recovery with multi datacenter setup

Unmesh Joshi posted Apr 09, 2019 09:47 AM

Hi,

We are designing disaster recovery for GemFire with a multi data centre setup. Multi DC replication in GemFire with gateway senders and receivers is eventually consistent. We are considering setting up a single GemFire cluster that spans multiple data centres, with a replication factor that replicates data across all the nodes.

Curious to know whether it is recommended to span a single GemFire cluster across data centres with a replication factor of all?

Thanks,

Unmesh

Rajiv CE posted Apr 09, 2019 09:56 AM

Hi Unmesh,

Regarding

>Curious to know whether it is recommended to span a single GemFire cluster across data centres with a replication factor of all?

You can technically do it, assuming network latency is not a problem for you. But the recommendation is to use separate clusters connected via WAN for disaster recovery, rather than relying on a single cluster spanning multiple data centers.

Regards,

Rajiv

Unmesh Joshi posted Apr 09, 2019 11:22 AM

Can we create a redundancy zone in a different data centre? That way, redundant copies would always be created in a separate data centre. Again, I think it is technically possible. And will it guarantee consistency, since the redundant writes are synchronous?

Rajiv CE posted Apr 09, 2019 11:31 AM

Yes, you can do that, but network latency will be a factor here.

Unmesh Joshi posted Apr 09, 2019 11:40 AM

>>Yes, you can do that, but network latency will be a factor here.

If write performance is not a problem (because writes mostly happen as an end-of-day batch), will reads and function executions be guaranteed to be served by the servers nearest to the client? I assume the locator gives the client the IPs of the servers nearest to it for a particular bucket when reading data or executing functions?

Rajiv CE posted Apr 09, 2019 11:46 AM

>If write performance is not a problem (because writes mostly happen as an end-of-day batch), will reads and function executions be guaranteed to be served by the servers nearest to the client?

In the case of a replicated region, a read can go to any node; the locator uses a simple load-balancing mechanism to route requests. For partitioned regions, a read is directed to a primary or secondary node, but writes are directed to the nodes holding the primary buckets. Function execution depends on how you invoke the function and on whether it is data dependent or data independent.

It would be recommended to use separate clusters connected via WAN.

Regards,

Rajiv

Unmesh Joshi posted Apr 09, 2019 11:54 AM

For function execution on a specific region, will the function be executed on the servers holding the buckets closest to the client, or will it go to the primary of the bucket, which can be anywhere? Particularly when single hop is enabled.

Rajiv CE posted Apr 09, 2019 12:21 PM

Hi Unmesh,

There is no notion of a nearest or closest node. For a partitioned region with optimizeForWrite, function execution tries to run on the node that hosts the primary data. But for a partitioned region, writes to the primary and secondary buckets are synchronous, so network latency has to be taken into consideration.

If you do not use single hop or optimizeForWrite, the call can go to any node. But if the data resides on another node, the call is forwarded there, and the extra network hops will hurt function performance.

So you will probably need to run your own performance and failure tests.

Regards,

Rajiv

Unmesh Joshi posted Apr 10, 2019 12:55 PM

If we need to figure out whether a GemFire cluster is in a working state, what is the best way to check that from the client? We want to build a health check and automatically switch to a cluster in another data centre.

Rajiv CE posted Apr 10, 2019 01:03 PM

Hi Unmesh,

You can use gfsh to perform a health check. The following commands might be handy:

gfsh>list members //to check on members

gfsh>list regions //to make sure regions are ready

Then do some standard get/put/query operations.
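The "check, then switch to another data centre" idea can be sketched in plain Java as below. This is only an illustration, not a GemFire API: ClusterSelector, register and firstHealthy are hypothetical names, and each probe is assumed to wrap whatever check you trust (for example a get/put on a sentinel key through a client connected to that cluster).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

/** Picks the first cluster whose health probe succeeds (hypothetical helper, not a GemFire API). */
final class ClusterSelector {
    // Insertion order is the preference order: register the primary data centre first.
    private final Map<String, BooleanSupplier> probes = new LinkedHashMap<>();

    ClusterSelector register(String clusterName, BooleanSupplier probe) {
        probes.put(clusterName, probe);
        return this;
    }

    /** Returns the name of the first healthy cluster, or empty if every probe fails. */
    Optional<String> firstHealthy() {
        for (Map.Entry<String, BooleanSupplier> entry : probes.entrySet()) {
            if (isHealthy(entry.getValue())) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    // A probe that throws (for example because no server is reachable) counts as unhealthy.
    private static boolean isHealthy(BooleanSupplier probe) {
        try {
            return probe.getAsBoolean();
        } catch (RuntimeException ex) {
            return false;
        }
    }
}
```

A caller would register one probe per data centre and call firstHealthy() on a schedule, switching clients only when the answer changes. Whether a single failed probe should trigger a switch, or only several in a row, is a design choice this sketch leaves open.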

You can do the same checks programmatically. If you have REST enabled, you can look at the Admin and Region endpoints it exposes. See the link below for more information.

http://gemfire.docs.pivotal.io/97/geode/rest_apps/rest_api_reference.html
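As a rough sketch of a REST-based check, the snippet below calls a ping endpoint with the JDK HTTP client (Java 11+). It assumes the REST service exposes a "ping" resource under its base URL and answers 200 when it is up; verify both against the reference above for your version. RestPing, pingUri and isUp are hypothetical names, and the base URL is passed in rather than hard-coded.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Minimal REST availability check (hypothetical helper; endpoint path is an assumption). */
final class RestPing {
    // Builds "<baseUrl>/ping", tolerating a trailing slash on the base URL.
    static URI pingUri(String baseUrl) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(base + "/ping");
    }

    // Returns true only on HTTP 200; timeouts and connection errors count as "down".
    static boolean isUp(String baseUrl, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(pingUri(baseUrl))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A successful ping only shows that the REST service answers. It says nothing about whether regions are ready, so it complements the get/put check rather than replacing it.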

Regards,

Rajiv

Unmesh Joshi posted Apr 10, 2019 01:06 PM

Is there a way to check that programmatically from the gemfire client?

Unmesh Joshi posted Apr 10, 2019 01:16 PM

Submitted too early..

Is there a way to check that programmatically from the GemFire client with the ClientCache Java API? Or is using the REST API for the locator the only way?

Rajiv CE posted Apr 10, 2019 01:19 PM

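From the client side, you can get the list of locators and servers using something like the code below:

// Helper: look up the pool backing a client region.
// PoolImpl is an internal class, so it may change between releases.
private static PoolImpl getPool(Region<?, ?> r) {
    PoolImpl result = null;
    String poolName = r.getAttributes().getPoolName();
    if (poolName != null) {
        result = (PoolImpl) PoolManager.find(poolName);
    }
    return result;
}

// Usage, where "region" is a client region obtained from the ClientCache:
PoolImpl pool = getPool(region);
List<String> currentServerNames = pool.getCurrentServerNames();
List<InetSocketAddress> locators = pool.getLocators();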

and use the same client for doing region operations.

Regards,

Rajiv

Unmesh Joshi posted Apr 10, 2019 01:22 PM

And are the semantics the same as "list members" in gfsh? Meaning, if servers have crashed, they won't appear in the list?

Rajiv CE posted Apr 10, 2019 01:27 PM

Yes, it will only list available servers and will not include crashed servers or locators.

Amar Das posted May 02, 2019 06:59 PM

Hi Unmesh,

I am trying something very similar. But I am opting for Pivotal Cloud Cache. Reference https://docs.pivotal.io/p-cloud-cache/1-7/design-patterns.html - Bidirectional Replication Across a WAN.

Did you consider this option? How has your experience with the GemFire cluster been so far?

Thanks/Amar Das