We have three Apache servers behind an F5 VIP (load balancer), connecting to two Policy Servers. The F5 monitor interval is set to 17 seconds, and the PS Poll Interval in the ACO is set to 30 seconds. The F5 uses a three-heartbeat check, so it waits 52 seconds before marking a member down in the pool. As part of our monthly security patching of our Linux servers, we reboot the Policy Servers once a month, one at a time, 30 minutes apart.
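The 52-second figure matches the commonly recommended F5 monitor timeout of three times the interval plus one second (3 × 17 + 1 = 52). The following is a simplified model of that timing, not F5 internals. It assumes that probes fire exactly every 17 seconds, that a probe fails only while Apache is returning HTTP 500, and that the member is marked down once 52 seconds pass without a successful probe. The `marked_down` helper is hypothetical.

```python
# Simplified model of the F5 monitor timing from this thread (assumptions, not F5 internals):
#   - probes fire every INTERVAL seconds; the member was last seen healthy at t = 0
#   - a probe fails only while Apache returns HTTP 500, i.e. inside [error_start, error_end)
#   - the member is marked down once TIMEOUT seconds pass with no successful probe

INTERVAL = 17               # F5 monitor interval from the thread, in seconds
TIMEOUT = 3 * INTERVAL + 1  # 52 s: the "3 x interval + 1" timeout


def marked_down(error_start: float, error_end: float, horizon: float = 600.0) -> bool:
    """Return True if a 500-error window [error_start, error_end) would take the member down."""
    last_success = 0.0
    n_probes = int(horizon // INTERVAL)
    for i in range(1, n_probes + 1):
        t = i * INTERVAL
        if error_start <= t < error_end:
            continue  # failed probe: last_success is not refreshed
        if t - last_success >= TIMEOUT:
            return True  # the timeout expired before this successful probe arrived
        last_success = t
    # the error window may still be open at the end of the simulated horizon
    return horizon - last_success >= TIMEOUT


if __name__ == "__main__":
    print(TIMEOUT)              # 52
    print(marked_down(16, 52))  # True: probes at 17, 34 and 51 all fail
    print(marked_down(18, 53))  # False: only the probes at 34 and 51 fail
```

Under this model, three consecutive failed probes are needed. Depending on how the errors line up with the probes, a 500-error window of just over 34 seconds can be enough to take a member down, and any window of 51 seconds or more always will be.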
The F5 heartbeat checks a protected resource and waits for an HTTP 200 response before marking the member up. We are seeing that when a Policy Server reboots, the F5 marks one of the Apache servers down because it receives an HTTP 500 (Internal Server Error), even though the other Policy Server is up and running.
Can you please let us know whether there is anything we can do in SiteMinder to avoid this?
Avi, you can start with the Apache web server error log. Examine it and map the time of the error into the Apache agent log/trace and the Policy Server log/trace to see whether any more specific clues can be found. A 500 error is a very general error.
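Once each log line has a timestamp, this correlation amounts to a time-window filter across the three logs. The following is a minimal Python sketch. It assumes that entries from the Apache error log, the Web Agent log/trace and the Policy Server log/trace have already been parsed into (timestamp, source, message) tuples, and that the servers' clocks are in sync. Parsing the real log formats is not shown, and the `correlate` helper and sample entries are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (timestamp, source, message): assumes each log line was already parsed into this shape;
# parsing the real Apache / Web Agent / Policy Server formats is not shown here.
LogEntry = Tuple[datetime, str, str]


def correlate(entries: List[LogEntry], around: datetime, window_s: int = 60) -> List[LogEntry]:
    """Return entries from all logs within +/- window_s seconds of `around`, oldest first."""
    lo = around - timedelta(seconds=window_s)
    hi = around + timedelta(seconds=window_s)
    hits = [e for e in entries if lo <= e[0] <= hi]
    return sorted(hits, key=lambda e: e[0])


if __name__ == "__main__":
    # Placeholder entries for illustration only; not real SiteMinder log text.
    entries = [
        (datetime(2024, 1, 1, 2, 0, 40), "policy_server", "placeholder: PS shutdown message"),
        (datetime(2024, 1, 1, 2, 0, 5), "apache_error", "placeholder: 500 returned to monitor"),
        (datetime(2024, 1, 1, 2, 0, 20), "webagent_trace", "placeholder: agent trace line"),
        (datetime(2024, 1, 1, 3, 0, 0), "apache_error", "placeholder: unrelated later entry"),
    ]
    for ts, source, msg in correlate(entries, around=datetime(2024, 1, 1, 2, 0, 10)):
        print(ts.isoformat(), source, msg)
```

Sorting the merged hits by time puts the Apache 500, the agent trace lines and the Policy Server messages side by side, which is usually where the more specific clue appears.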
Thanks & Rgds, - Vijay
The logs have rolled over, but when we looked at them they did not tell us much. One thing to note is that the LLAWP process ID did not change, and there was no indication in the logs of LLAWP restarting.
What is the version of the Web Agent?
Are the Policy Servers in load-balancing or failover mode (HCO setting)?
Is this happening only when restarting the second PS (as listed in the HCO)?
Please find the answers below:
What is the version of the Web Agent? - R12.5 CR4
Are the Policy Servers in load-balancing or failover mode (HCO setting)? - It is in failover mode.
Is this happening only when restarting the second PS (as listed in the HCO)? - This happens when we restart the first server as well.
It sounds to me like your PS failover is not happening. Was this failover setup tested earlier in this environment?
Are you using the correct HCO in the SmHost.conf file, and does it list both Policy Servers?
By any chance, did you recently add the second PS to the HCO without restarting the Web Agents after the change?
If you have verified all the basic configuration, everything looks good, and you are still facing the issue, I would suggest opening a support ticket, as there could be multiple reasons for this failure. I found the defect below, which may be related to your issue; I just wanted to highlight it here.
Defects Fixed in 12.52 SP1 CR06 - CA Single Sign-On - 12.52 SP1 - CA Technologies Documentation
Web Agent is not failing back to the first Policy Server and requests are not processed successfully when starting the first Policy Server.
Thanks a lot, team. I have also opened a support ticket. This is the failover setup we have:
Policy Server 01,44443;Policy Server 02,44443
Failover Threshold Percent is set to 0%, and it is the same in the lower environment. Should I go ahead and change this? I think this has been happening in the past as well, but we did not have monitoring in place at that time. We set up monitoring for this alert last month, which I think is why it is being noticed now.
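In failover mode, as we understand SiteMinder's behavior, the Web Agent sends requests to the first Policy Server in the HCO list and moves to the next one only when that server is unavailable. Both the presence and the order of the entries therefore matter. The following sketch checks a Policy Server list for both. It assumes the semicolon-separated "host,port" string quoted in this thread rather than the exact HCO storage format. `parse_policy_servers` and `check_failover_list` are hypothetical helpers, not SiteMinder tooling.

```python
from typing import List, Tuple


def parse_policy_servers(spec: str) -> List[Tuple[str, List[int]]]:
    """Split 'host,port;host,port' into (host, [ports]) pairs, preserving order."""
    servers = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        host, *ports = [part.strip() for part in entry.split(",")]
        servers.append((host, [int(p) for p in ports]))
    return servers


def check_failover_list(spec: str, expected_hosts: List[str]) -> List[str]:
    """Return a list of problems; an empty list means all hosts are present in the expected order."""
    hosts = [host for host, _ in parse_policy_servers(spec)]
    problems = [f"missing Policy Server: {h}" for h in expected_hosts if h not in hosts]
    if not problems and hosts[: len(expected_hosts)] != expected_hosts:
        problems.append(f"unexpected order: {hosts}")
    return problems


if __name__ == "__main__":
    spec = "Policy Server 01,44443;Policy Server 02,44443"
    print(parse_policy_servers(spec))
    print(check_failover_list(spec, ["Policy Server 01", "Policy Server 02"]))  # []
```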
I will also go through the defect list. Thanks a lot.