Idea Details

Data Collector High Availability

Last activity 07-23-2019 07:57 AM
Jose Vicente Espinosa
10-28-2013 03:05 PM

In the current version of the PM solution, DC is a single point of failure in terms of data polling.

It would be very useful to have a device failover policy between DCs in the same IP domain. If a collector in an IP domain goes down, its devices would be failed over (load-balanced) across the remaining live DCs in the same IP domain. When the collector comes back, it should take ownership of its devices again.
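To make the request concrete, here is a minimal Python sketch of the behavior I have in mind. It is not how the product works today; the function names (`fail_over`, `fail_back`) and the dictionaries `assignments` and `home` are made up for illustration. It assumes each device has one "home" DC and that load is simply the number of devices a DC polls.

```python
from collections import defaultdict


def fail_over(assignments, failed_dc, live_dcs):
    """Move every device polled by failed_dc to the least-loaded live DC.

    assignments: dict of device -> DC currently polling it (updated in place).
    live_dcs:    DCs in the same IP domain that are still up.
    """
    candidates = [dc for dc in live_dcs if dc != failed_dc]
    if not candidates:
        raise RuntimeError("no live DC left in this IP domain")

    # Current load per DC, measured as number of assigned devices.
    load = defaultdict(int)
    for dc in assignments.values():
        load[dc] += 1

    orphaned = sorted(d for d, dc in assignments.items() if dc == failed_dc)
    for device in orphaned:
        # Pick the least-loaded candidate; the DC name breaks ties.
        target = min(candidates, key=lambda dc: (load[dc], dc))
        assignments[device] = target
        load[target] += 1
    return assignments


def fail_back(assignments, home, recovered_dc):
    """Return to recovered_dc every device whose home DC it is."""
    for device, owner in home.items():
        if owner == recovered_dc:
            assignments[device] = recovered_dc
    return assignments
```

For example, with three devices homed on `dc1` and one each on `dc2` and `dc3`, calling `fail_over(assignments, "dc1", ["dc2", "dc3"])` spreads the three orphaned devices across `dc2` and `dc3`, and `fail_back(assignments, home, "dc1")` restores the original ownership.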

Regards


Comments

07-20-2017 02:18 PM

Another useful feature might be a configurable heartbeat function, similar to the one available in Spectrum fault tolerance, where the timeout value is user-configurable. Once the timeout is breached, an automated action would migrate that collector's devices to a defined standby collector, which would take over polling of the failed collector's devices.
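A rough Python sketch of the heartbeat idea, just to illustrate the suggestion. The class `HeartbeatMonitor`, its methods, and the `standby_for` mapping are hypothetical names, not an existing Spectrum or CAPC API. The clock is injectable so the timeout logic can be exercised without waiting.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per collector and reports timeouts."""

    def __init__(self, timeout_seconds, standby_for, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # user-configurable timeout
        self.standby_for = standby_for          # collector -> defined standby
        self.clock = clock
        self.last_seen = {}

    def beat(self, collector):
        """Record a heartbeat received from a collector."""
        self.last_seen[collector] = self.clock()

    def expired(self):
        """Collectors whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            dc for dc, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )

    def migrations(self):
        """(failed collector, standby) pairs that should be acted on."""
        return [
            (dc, self.standby_for[dc])
            for dc in self.expired()
            if dc in self.standby_for
        ]
```

With a 30-second timeout and `standby_for={"dc1": "dc-standby"}`, a `dc1` that has been silent for more than 30 seconds shows up in `migrations()` as `("dc1", "dc-standby")`, which is the point where the automated device migration would be triggered.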

07-19-2017 09:14 AM

Note: Fail-over for CAPC Data Collector is a duplicate idea of this one.

07-19-2017 08:48 AM

Jose Vicente, what you've described above is exactly our vision for enhancing HA at the Data Collector tier. We'd like admins to be able to configure policies that control load balancing and failover behavior among multiple DCs within an IP domain.

 

Some of the DC policy parameters we've discussed include:

  • DC Auto Load Balance - Should polling within an IP domain be automatically load balanced across the available DCs, or should it respect the admin's assignment of specific devices to specific DCs?
  • DC Failover mode:
    • Warm standby: Detect when a DC goes down, then load balance its poll load across the other DCs. There might be some limited polling loss during the switchover.
    • Hot standby: Have more than one DC poll values for a given device; the first result to complete is loaded into the system. If a DC becomes unexpectedly unavailable, no polled values are lost.
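The policy parameters above, and the "first one to complete is loaded" rule for hot standby, could be sketched roughly as follows. This is only an illustration for discussion; `DcPolicy`, `FailoverMode`, and `PollStore` are hypothetical names and not part of any shipped API. It assumes each polled value is identified by device, metric, and poll cycle.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverMode(Enum):
    WARM_STANDBY = "warm"  # reassign devices after a DC is detected down
    HOT_STANDBY = "hot"    # several DCs poll each device, first result wins


@dataclass(frozen=True)
class DcPolicy:
    """Per-IP-domain policy (hypothetical shape)."""
    auto_load_balance: bool  # False means respect admin-assigned DCs
    failover_mode: FailoverMode


class PollStore:
    """Keeps the first value reported for each (device, metric, cycle)."""

    def __init__(self):
        self.values = {}
        self.source = {}

    def submit(self, dc, device, metric, cycle, value):
        """Load a polled value; return False if another DC got there first."""
        key = (device, metric, cycle)
        if key in self.values:
            return False  # duplicate from a slower DC, discarded
        self.values[key] = value
        self.source[key] = dc
        return True
```

In hot standby, if `dc1` and `dc2` both submit cycle 1 for the same device and metric, only the first submission is stored. If `dc1` then disappears, the cycle 2 value from `dc2` is still loaded, so no poll cycle is missed.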

 

We'd appreciate any discussion on how you'd like DC failover/load balancing to behave.