Tuesday Tip by Vijay Masurkar, Principal Support Engineer, for 4-10-2012
A Web Agent can go offline during a Policy Server request; for example, during a network outage or because a network component fails. When that happens, the Web Agent cannot notify the Policy Server of the communication failure, and the Policy Server continues to wait for the Web Agent data. When multiple requests from one or more Web Agents are lost in this manner, the Policy Server can become unresponsive because the worker threads handling those requests are never released. The symptoms appear in the Policy Server logs as failed authentications and authorizations, or as growing connection queues.
In R6 SP6 and R12 SP3, creating and enabling the SiteMinder Enable TCP Keep Alive (SM_ENABLE_TCP_KEEPALIVE) environment variable configures the Policy Server to send KeepAlive packets to Web Agent connections that appear idle to the Policy Server. The initial wait period and the interval at which the Policy Server sends the packets are based on OS-specific, configurable TCP/IP parameters, which determine:
* When the Policy Server must start to send the packets.
* The number of times the Policy Server sends the packets before determining that the Web Agent connection is lost.
* The interval at which the Policy Server sends the packets.
Note: For more information about configuring TCP/IP parameters, see your OS-specific documentation.
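As a rough illustration of the three OS-level settings listed above, the following Python sketch enables TCP keepalive on an ordinary socket. It shows the general operating-system mechanism only; it is not the Policy Server's implementation. The function name enable_keepalive is hypothetical, and the default values (7200 seconds idle, 75 seconds between probes, 9 probes) are the commonly documented Linux defaults, used here as assumptions. The per-socket options are not available on every platform, so the sketch applies only those that the local Python build exposes.

```python
import socket


def enable_keepalive(sock, idle_s=7200, interval_s=75, probes=9):
    """Turn on TCP keepalive for one socket (illustrative helper, not SiteMinder code).

    idle_s     -- idle time before the first probe is sent
    interval_s -- time between unanswered probes
    probes     -- unanswered probes before the connection is considered lost
    Returns a dict of the options that were actually applied.
    """
    # SO_KEEPALIVE is the portable on/off switch.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # The per-socket tuning options are platform-specific, so look them up
    # defensively and skip any that this platform does not provide.
    tunables = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in tunables:
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
        except OSError:
            # The constant exists but the OS rejected it; keep system defaults.
            pass
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

When a peer disappears without closing the connection, the unanswered probes eventually cause a blocked read on that socket to fail, which is what allows a waiting thread to be released.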
To configure the Policy Server to send KeepAlive packets to idle Web Agent connections, log in to the Policy Server host system and do one of the following:
* (Windows) Create the following system environment variable with a value of 1:
SM_ENABLE_TCP_KEEPALIVE
* (UNIX)
a. Create the following system environment variable:
SM_ENABLE_TCP_KEEPALIVE=1
b. Export the environment variable.
Note: The value must be 0 (disabled) or 1 (enabled). If a value other than 0 or 1 is configured, the environment variable is disabled. If the environment variable is disabled, the Policy Server does not send KeepAlive packets to idle Web Agent connections.
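The value rule in the note above can be sketched as a small check. This is only a model of the documented behavior, not Policy Server code: the function name keepalive_enabled is hypothetical, the comparison is assumed to be an exact string match, and treating an unset variable as disabled is also an assumption.

```python
import os


def keepalive_enabled(environ=None):
    """Return True only when SM_ENABLE_TCP_KEEPALIVE is exactly "1".

    Any other value ("0", "2", "yes", an empty string) counts as disabled.
    An unset variable is also treated as disabled (assumption).
    """
    env = os.environ if environ is None else environ
    return env.get("SM_ENABLE_TCP_KEEPALIVE") == "1"


if __name__ == "__main__":
    for value in ("1", "0", "2", "yes"):
        print(value, keepalive_enabled({"SM_ENABLE_TCP_KEEPALIVE": value}))
    print("(unset)", keepalive_enabled({}))
```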
Subsequently, in R6 SP6 CR8 and R12 SP3 CR8, a related fix further improved the connection management mechanism. If a Policy Server thread hangs in a TCP recv() call, the Policy Server can stop responding to requests. This happens because the thread blocked in recv() holds a read lock while another thread waits for the write lock. Because a write lock request is pending, no other thread waiting for a read lock is granted access. The situation resolves when the recv() call returns, at which point the Policy Server recovers.
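The lock interaction described above can be modeled with a toy state machine. The sketch below is a simplified, non-blocking model of a read/write lock that refuses new readers while a writer is queued, which matches the behavior described in this tip. It is not the Policy Server's locking code, and the class and function names are hypothetical.

```python
class WriterPreferringLock:
    """Toy state model of a read/write lock that favors pending writers.

    Nothing here blocks; each method just reports whether the request
    would be granted immediately.
    """

    def __init__(self):
        self.active_readers = 0
        self.writer_active = False
        self.writers_waiting = 0

    def try_acquire_read(self):
        # New readers are refused while a writer is active or queued.
        if self.writer_active or self.writers_waiting > 0:
            return False
        self.active_readers += 1
        return True

    def request_write(self):
        # A writer needs exclusive access; otherwise it joins the queue.
        if self.active_readers == 0 and not self.writer_active:
            self.writer_active = True
            return True
        self.writers_waiting += 1
        return False

    def release_read(self):
        self.active_readers -= 1
        # The last reader out hands the lock to a queued writer, if any.
        if self.active_readers == 0 and self.writers_waiting > 0:
            self.writers_waiting -= 1
            self.writer_active = True

    def release_write(self):
        self.writer_active = False
        # Hand off to the next queued writer, if any.
        if self.writers_waiting > 0:
            self.writers_waiting -= 1
            self.writer_active = True


def simulate():
    """Replay the scenario: reader A hangs in recv(), writer W queues, reader B starves."""
    lock = WriterPreferringLock()
    events = []
    events.append(("reader A takes read lock, then blocks in recv()", lock.try_acquire_read()))
    events.append(("writer W requests write lock", lock.request_write()))
    events.append(("reader B requests read lock", lock.try_acquire_read()))
    lock.release_read()   # recv() finally returns; W is granted the write lock
    lock.release_write()  # W finishes its update
    events.append(("reader B retries read lock", lock.try_acquire_read()))
    return events


if __name__ == "__main__":
    for description, granted in simulate():
        print(f"{description}: {'granted' if granted else 'must wait'}")
```

In this model, reader B is refused for as long as reader A holds its read lock, which is why a single thread stuck in recv() can stall every other request.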