Symantec Access Management

Policy Server High Queue after network outage and AD timeout

1. Policy Server High Queue after network outage and AD timeout

Vivek_S
Posted Mar 18, 2015 09:53 AM

Recently we had an issue in our production environment where, due to a network outage, the policy server could not connect to a subset of our Active Directory servers. This caused the policy server queues to grow until the policy server stopped processing requests altogether.

Although the network outage lasted only 90 seconds, the policy server did not reconnect to the AD servers after the network recovered; it kept timing out while trying to establish connections to them.
We had to restart the policy server, which restored connectivity.

As for socket connections, our policy server generally holds around 2,500, but during the outage this spiked to 8,000.

I need expert guidance on why the policy server would not restore connectivity to the user directories even after the network had recovered. Is it a capacity issue? Our policy server CPU usage stays at around 15-20%, and our typical load is:
Average Throughput: 95 requests/sec
Average Transaction Time: 11.555901 ms
Policy Server version: R12sp3cr07
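As a rough sanity check on these figures, Little's law (average requests in the system = throughput × time each request spends in it) says normal load keeps only about one request in flight at a time. That would explain how low CPU can coexist with a runaway queue once each directory call starts waiting out a timeout. The sketch below is a simplified model, not how the policy server actually schedules work. It assumes every request hits an unreachable directory, which overstates the effect when only a subset is down, and the worker count and 30 s timeout in the usage lines are hypothetical values, not settings from this environment.

```python
"""Back-of-the-envelope queueing check using Little's law (L = lambda * W)."""


def in_flight(throughput_per_s: float, time_in_system_s: float) -> float:
    """Average number of requests in the system at once: L = lambda * W."""
    return throughput_per_s * time_in_system_s


def backlog_growth_per_s(arrival_per_s: float, workers: int, service_time_s: float) -> float:
    """Requests/second that pile up when each worker is tied up for service_time_s.

    The worker pool can complete workers / service_time_s requests per second;
    arrivals beyond that capacity accumulate in the queue. Returns 0.0 if it keeps up.
    """
    if workers <= 0 or service_time_s <= 0:
        raise ValueError("workers and service_time_s must be positive")
    capacity_per_s = workers / service_time_s
    return max(0.0, arrival_per_s - capacity_per_s)


if __name__ == "__main__":
    # Figures reported in the thread: 95 req/s at ~11.56 ms per transaction.
    print(f"normal load: {in_flight(95, 0.011555901):.2f} requests in flight")
    # Hypothetical: 8 worker threads, each stuck waiting out a 30 s directory timeout.
    print(f"backlog growth: {backlog_growth_per_s(95, 8, 30.0):.1f} requests/s")
```

With the reported averages, the first line works out to about 1.10 requests in flight; under the hypothetical timeout, the pool clears well under one request per second while 95 keep arriving.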
2. Re: Policy Server High Queue after network outage and AD timeout

Karmeng
Posted Mar 19, 2015 01:53 AM

Hi,
It is expected that the policy server queue grows during a network outage to the backend store, because the policy server cannot process requests fast enough.
By design, the policy server should restore the connection when the backend store comes back online. If it did not reconnect by itself, I suspect the policy server was in a hung state, which would explain why the restart restored the connection.
If the policy server is on a UNIX system, a pstack captured at the time will help us understand what the policy server was doing. If it is on Windows, Process Monitor might give us some clues.
It is hard to determine whether this is a capacity issue. The policy server trace log will provide additional information if it is.

Regards,
Kar Meng
3. Re: Policy Server High Queue after network outage and AD timeout

Vivek_S
Posted Mar 19, 2015 08:40 AM

Thanks Karmeng,

I agree that without logs it is hard to tell whether it is a capacity problem, but over time I have captured the response times and average transactions using the smtrace analyzer tool. I see 100 requests/sec with 10-11 ms response times from the policy server and 18-20% CPU usage on the policy server process, which looks good to me.

It's a UNIX box, and given the urgency of bringing it back up I had to restart it; next time I will make sure to take a thread dump.

Looking at the stats, my feeling is that because of the high queues, the policy server ping thread did not get a response in time and timed out, but this is only a theoretical explanation. I will generate traces the next time I see this.
4. Re: Policy Server High Queue after network outage and AD timeout

Legacy User
Posted Mar 19, 2015 09:10 AM

Vivek, because later releases added lines reporting the queue size to the trace log, I would recommend minimal tracing to capture the queue size and monitoring that rather than the stats.
5. Re: Policy Server High Queue after network outage and AD timeout

Vivek_S
Posted Mar 19, 2015 09:16 AM

Thanks Josh. We do monitor the queues, and I have never seen our policy server queue up except when these timeouts happen. As suggested, I will get some traces going to see what the policy server reports during those times.
6. Re: Policy Server High Queue after network outage and AD timeout

Karmeng
Posted Mar 22, 2015 09:24 PM

Hi,
In addition to the policy server trace log and the pstack (I suggest capturing 3 pstacks at 1-minute intervals), please set up a cron job to run smpolicysrv -stats every 5 minutes. This prints the policy server statistics to the smps log and will give us better information on the policy server status at that time.
Thanks.

Regards,
Kar Meng
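The capture Kar Meng describes (3 pstacks, 1 minute apart, plus a stats dump) can be scripted so it is ready the next time the queues climb. This is a minimal sketch, assuming `pstack` and `smpolicysrv` are on the PATH with the policy server environment already loaded, and that the operator passes the policy server PID on the command line; the output file naming is made up for illustration. The periodic 5-minute stats run would still be its own cron entry.

```python
"""Collect repeated pstack snapshots and trigger policy server stats (sketch)."""
import subprocess
import sys
import time
from datetime import datetime
from typing import Callable, List, Tuple


def capture_pstacks(
    pid: int,
    count: int = 3,
    interval_s: float = 60.0,
    run: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Run `pstack <pid>` `count` times, `interval_s` seconds apart.

    Returns (timestamp, output) pairs so the snapshots can be compared later:
    threads sitting on the same frames in every snapshot hint at a hang.
    """
    snapshots = []
    for i in range(count):
        if i > 0:
            sleep(interval_s)
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        result = run(["pstack", str(pid)], capture_output=True, text=True, check=False)
        snapshots.append((stamp, result.stdout + result.stderr))
    return snapshots


def request_stats(run: Callable = subprocess.run) -> int:
    """Ask the policy server to write its statistics to the smps log."""
    return run(["smpolicysrv", "-stats"], check=False).returncode


if __name__ == "__main__":
    pid = int(sys.argv[1])  # PID of the policy server process, supplied by the operator
    request_stats()
    for stamp, text in capture_pstacks(pid):
        # Hypothetical naming scheme: one file per snapshot in the current directory.
        with open(f"pstack.{pid}.{stamp}.txt", "w") as fh:
            fh.write(text)
```

The `run` and `sleep` parameters exist only so the loop can be exercised without a live process; in normal use the defaults call the real commands.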
7. Re: Policy Server High Queue after network outage and AD timeout

Vivek_S
Posted Mar 23, 2015 08:32 AM

Hi Karmeng,

We do run the policy server stats every 10 minutes, and I see a policy server queue depth of 0 most of the time, except during the timeouts when the queues start to grow. If you could help me understand how failover works when an IP is marked as bad, that would be a great help.

Thanks
8. Re: Policy Server High Queue after network outage and AD timeout

Karmeng
Posted Mar 23, 2015 07:25 PM

Hi Vivek,
In general, there are two main failover scenarios.
Scenario 1 is when a request fails with a network error. In this case, the connection is re-initialized, and the current request and all subsequent requests are sent to the new server.
Scenario 2 is when the ping thread detects, before a request arrives, that the server is not available. The connection to this server is then marked as bad, the request thread creates a new connection, and all subsequent requests are sent over the new connection.

Regards,
Kar Meng
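To make the two scenarios concrete, here is a toy Python model of them. It is not the policy server's implementation: the class and method names are hypothetical, and the rule that a successful ping clears the "bad" mark is an assumption the thread does not state. What it illustrates is Vivek's theory: if nothing marks a recovered server good again (for example, because the ping thread is itself starved), the server stays excluded, and once every server is marked bad each request fails immediately.

```python
"""Toy model of the two directory failover scenarios described above."""
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, TypeVar

T = TypeVar("T")


class NetworkError(Exception):
    """Stands in for a connection-level failure talking to a directory server."""


@dataclass
class DirectoryConnection:
    servers: List[str]  # failover order, e.g. a list of AD hosts
    bad: Set[str] = field(default_factory=set)
    current: Optional[str] = None

    def _connect(self) -> str:
        """(Re)initialize the connection to the first server not marked bad."""
        for server in self.servers:
            if server not in self.bad:
                self.current = server
                return server
        self.current = None
        raise NetworkError("no directory server available")

    def on_ping(self, server: str, alive: bool) -> None:
        """Scenario 2: the ping thread marks a server bad (or, assumed here, good again)."""
        if alive:
            self.bad.discard(server)
        else:
            self.bad.add(server)
            if self.current == server:
                self.current = None  # the next request builds a new connection

    def request(self, send: Callable[[str], T]) -> T:
        """Send one request over the current connection, failing over once on error."""
        if self.current is None:
            self._connect()
        try:
            return send(self.current)
        except NetworkError:
            # Scenario 1: re-initialize, then send this request (and later ones) to the new server.
            self.bad.add(self.current)
            self._connect()
            return send(self.current)
```

For example, with `DirectoryConnection(["ad1", "ad2"])` and a `send` that raises `NetworkError` for `ad1`, the request is retried on `ad2` and `ad1` stays in `bad` until `on_ping("ad1", alive=True)` is called.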
9. Re: Policy Server High Queue after network outage and AD timeout

Legacy User
Posted Mar 23, 2015 08:46 AM

Hey Kar,

Are the stats just for the queue size?
If so, and the version is 6sp6, 12.0 sp3 or any 12.5, wouldn't it be more efficient to use the trace log, given that the count is displayed there?

Just thinking that if he's in prod, putting less load on the system is desirable.
10. Re: Policy Server High Queue after network outage and AD timeout

Karmeng
Posted Mar 23, 2015 07:28 PM

Hi Josh,

Not only the queue size; I also want to see the connections. The stats provide more straightforward information, and I'm glad that Vivek already has them running every 10 minutes.
11. Re: Policy Server High Queue after network outage and AD timeout

Anon Anon
Posted Mar 27, 2015 01:41 PM

Have you tried exploring the thread pool?

You can tweak the High Priority thread pool; that should improve connection management.
12. Re: Policy Server High Queue after network outage and AD timeout

Anon Anon
Posted Mar 27, 2015 01:51 PM

If it is a one-time event, I'm not sure you can recreate it to test (considering you cannot bring down production and you cannot bring that traffic into DEV or QA).

If this is a recurring event, you might want to use Wily if it is available.
You can also ask CA to assist or to provide scripts for collecting dumps if you do not have them already.
Dump analysis can help identify the cause.
13. Re: Policy Server High Queue after network outage and AD timeout

Vivek_S
Posted Mar 27, 2015 01:57 PM

Thanks for the reply, Santosh.

We currently don't have Wily, and since it was a one-time outage it is hard to reproduce. Moreover, there are no queues on either the High Priority or the Normal Priority pool.

Regarding the High Priority thread pool, can you please elaborate? Are you talking about increasing the number of threads?

Thanks
14. Re: Policy Server High Queue after network outage and AD timeout

Anon Anon
Posted Mar 27, 2015 02:45 PM

Vivek

Quick answer: yes, I was talking about increasing the number of threads.

Long answer: there are many things to consider before you tweak your configuration.

You certainly want a script in place to capture a policy server dump, so that you have data if a dump is generated in the future.

The High Priority threads' job is to ensure the policy server can manage connections, whereas the Normal Priority threads handle authentications.

If you foresee more connection issues in the future, you may want to review OS limits, sockets, and the High Priority thread count.
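On the OS-limit side, each socket on UNIX consumes a file descriptor, so the spike from about 2,500 to 8,000 sockets reported at the start of the thread counts against the process's open-file limit. The sketch below compares an observed peak against a limit; the helper names and the 1.5× margin are hypothetical choices. Note that `resource.getrlimit` reports the limit of the Python process itself: the limit that matters is the one the policy server was started with (check `ulimit -n` in its startup environment, or `/proc/<pid>/limits` on Linux).

```python
"""Compare a process's socket/descriptor usage against an open-file limit (sketch)."""
import os
import resource


def open_fd_count(pid: int) -> int:
    """Descriptors (files + sockets) held by `pid`, read from /proc/<pid>/fd.

    Works where /proc exposes per-process fd directories (e.g. Linux, Solaris);
    inspecting another user's process usually requires that user or root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def own_nofile_soft_limit() -> int:
    """Soft RLIMIT_NOFILE of *this* process, not necessarily the policy server's."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def headroom(peak_in_use: int, soft_limit: int) -> float:
    """How many times the observed peak fits under the limit (inf if unlimited)."""
    if soft_limit == resource.RLIM_INFINITY:
        return float("inf")
    return soft_limit / peak_in_use


def needs_review(peak_in_use: int, soft_limit: int, margin: float = 1.5) -> bool:
    """Flag limits that leave less than `margin` times the observed peak."""
    return headroom(peak_in_use, soft_limit) < margin
```

With the 8,000-socket peak and a hypothetical soft limit of 10,240, `headroom` is 1.28 and `needs_review` returns `True`, while the normal 2,500 would not be flagged.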
15. Re: Policy Server High Queue after network outage and AD timeout

0 Recommend
Anon Anon
Posted Mar 27, 2015 02:57 PM

I assume that:

a) you have multiple policy servers and one of them crashed
b) a certain set of agents and policy servers stopped working as expected
c) the other servers worked as expected
d) the servers support different geographic locations or groups of apps

Is that correct?
16. Re: Policy Server High Queue after network outage and AD timeout

Vivek_S
Posted Mar 27, 2015 03:33 PM

Well, nothing crashed. What happened is that some of our Active Directory servers are in a different geographic location from our policy servers. Due to a network outage, the PS couldn't connect to those AD servers, which is fine; the problem was that even after the network had recovered, SM kept timing out to those Active Directory servers, and I had to restart the PS to bring it back to a functional state.

And yes, I totally agree that tweaking the configuration is a complex activity with many variables.