Introduction:
LDAP failover performance in general
Question:
What is a reasonable LDAP failover time, and can failover achieve 0-second system downtime?
Environment:
Windows/Unix/All
Answer:
Any high-performance failover setup should expect some delay between the primary server going down and the secondary server coming up and taking over the traffic, so 0-second downtime is not a realistic expectation.
The exact delay varies widely from system to system because it depends on many factors, including vendor, design, network, and machine capability.
SiteMinder can use CA LDAP as a policy store, user store, and session store.
Any failover that recovers in under 60 seconds is considered reasonable, given that the default LDAP ping interval is 30 seconds. 60 seconds is the longest round trip between the last server heartbeat check and the most recent one. On top of this, during failover the Policy Server (PS) must rebind to LDAP, which is a very expensive operation.
If the client uses CA Directory, they should double-check that its configuration is optimized.
In the CA Directory knowledge files, add "dsp-idle-time = 30" to each router DSA and "dsp-idle-time = 40" to each data DSA.
If there is no router DSA, just add "dsp-idle-time = 30" to each data DSA in the knowledge files.
In the CA Directory limits files, set the maximum operation time to 60 seconds with "set max-op-time = 60".
These changes ensure that CA Directory disconnects an idle socket connection, or a dead failover LDAP connection, once the time limit is reached.
By default, the idle time for these connections in CA Directory is 600 seconds, which is too long to wait during a failover incident.
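As a rough illustration, the settings described above might look like the following in the knowledge and limits files. This is a sketch only: the DSA names "router-dsa" and "data-dsa" are hypothetical, the other items in each DSA entry are elided, and the exact layout of your files may differ, so verify it against the existing entries in your own environment before editing.

```
# Knowledge file: router DSA entry (the name "router-dsa" is hypothetical)
set dsa "router-dsa" =
{
    # ... existing items for this DSA stay unchanged ...
    dsp-idle-time = 30
};

# Knowledge file: data DSA entry (the name "data-dsa" is hypothetical)
# Use 30 instead of 40 here if there is no router DSA.
set dsa "data-dsa" =
{
    # ... existing items for this DSA stay unchanged ...
    dsp-idle-time = 40
};

# Limits file: cap the maximum operation time at 60 seconds
set max-op-time = 60;
```

Giving the data DSA a slightly longer idle time than the router DSA follows the values recommended above (30 and 40 seconds).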
Additional Information:
Policy Server Guides -> Policy Server Configuration Guide -> User Directories -> LDAP Load Balancing and Failover