Layer7 Access Management


LLA: SiteMinder Agent Api function failed - intermittent 500 errors

  • 1.  LLA: SiteMinder Agent Api function failed - intermittent 500 errors

    Posted 03-29-2019 04:52 PM

    Hi,

     

    We started receiving intermittent 500 errors on our website, which uses SiteMinder. Restarting the web server instances resolved the problem. While analyzing the root cause, we found the following errors in the SM logs:

     

    [8395/703276800][Tue Mar 26 2019 04:57:05][CSmLowLevelAgent.cpp:546][ERROR][sm-AgentFramework-00520] LLA: SiteMinder Agent Api function failed - 'Sm_AgentApi_IsProtectedEx' returned '-1'.

    [8395/703276800][Tue Mar 26 2019 04:57:05][CSmProtectionManager.cpp:192][ERROR][sm-AgentFramework-00420] HLA: Component reported fatal error: 'Low Level Agent'.

    [8395/703276800][Tue Mar 26 2019 04:57:05][CSmHighLevelAgent.cpp:423][ERROR][sm-AgentFramework-00420] HLA: Component reported fatal error: 'Protection Manager'.
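When triaging entries like the ones above, it can help to split each line into fields so that errors can be filtered by level or message code. The following is a minimal sketch, assuming every line follows the bracketed layout seen in this thread (`[pid/tid][timestamp][source:line][LEVEL][code] text`). The function names are hypothetical and are not part of any SiteMinder tooling.

```python
import re
from datetime import datetime

# Field layout inferred from the log lines quoted in this thread:
# [pid/tid][timestamp][source.cpp:line][LEVEL][message-code] text
LOG_LINE = re.compile(
    r"\[(?P<pid>\d+)/(?P<tid>\d+)\]"
    r"\[(?P<ts>[^\]]+)\]"
    r"\[(?P<src>[^\]:]+):(?P<line>\d+)\]"
    r"\[(?P<level>[A-Z]+)\]"
    r"\[(?P<code>[^\]]+)\]\s*"
    r"(?P<msg>.*)"
)


def parse_agent_log_line(line):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    record["line"] = int(record["line"])
    # Timestamp format assumed from samples such as "Tue Mar 26 2019 04:57:05".
    record["ts"] = datetime.strptime(record["ts"], "%a %b %d %Y %H:%M:%S")
    return record


def errors_only(lines):
    """Keep only the parsed records whose level is ERROR."""
    parsed = (parse_agent_log_line(line) for line in lines)
    return [rec for rec in parsed if rec is not None and rec["level"] == "ERROR"]
```

Passing the lines of an agent log file to `errors_only` yields one record per ERROR entry, which can then be grouped by `code` or `pid`.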

     

    It looks like the problem lies between the policy server and the web agent. We also see the following messages on the server around that time:

    SiteMinder Agent

            SiteMinder agent is enabled.

    SiteMinder Agent

            SiteMinder agent is running.

    It looks like the SM agent restarted due to a policy server error, but some web instances became unresponsive.

     

    What we would like to understand is: will the SM agent restart automatically if it fails due to a policy server error? Or is it possible that, after the policy server failure was resolved, some web agents did not recover and kept returning 500 errors until the web instance was restarted?

     

    Please help to understand.

     

    Thank you

     

    Regards,

    Vinoth Marimuthu



  • 2.  Re: LLA: SiteMinder Agent Api function failed - intermittent 500 errors

    Posted 04-01-2019 10:45 AM

    Hello Vinoth, 

    There are a couple of reasons why an agent might issue the "'Sm_AgentApi_IsProtectedEx' returned '-1'." message. 

     

    First review the following KB for further information regarding the error message. 

    https://comm.support.ca.com/kb/smagentapi-errors/kb000045157

     

    As the article discusses, it could be related to network issues, and normally is. 

     

    However, it could also be related to hitting the "maxconnections" setting on the policy server. 

    https://docops.ca.com/ca-single-sign-on/12-8/en/using/policy-server-management-console/#PolicyServerManagementConsole-ManagementConsole--SettingsTab

     

    Connection Options Group Box

    This group box allows you to specify the maximum number of Policy Server threads, and the idle timeout for a connection to the Policy Server.

    • Max Connections field
      Indicates the maximum number of connections supported by the Policy Server, independent of the number of threads. All connections share the thread pool to fulfill requests.
      Default: The default value is 256. This number can be increased significantly, especially in deployments with the following: Apache Web servers protected by web agents and IIS Web servers using virtual servers protected by web agents.

    Try this:
    Enable the stats command on the policy servers, if it is not already enabled, and monitor the output to confirm that the policy servers are not hitting the configured "max connections" value. 

    https://comm.support.ca.com/kb/policy-server-stats-information/kb000015867

     

    Also enable the connections trace on the web agents in question. 
    Collect Detailed Agent Connection Data with an Agent Connection Manager Trace Log

    https://docops.ca.com/ca-single-sign-on/12-8/en/configuring/web-agent-configuration/logging-and-tracing/how-to-set-up-trace-logging#HowtoSetUpTraceLogging-CollectDetailedAgentConnectionDatawithanAgentConnectionManagerTraceLog

     

    Together with this information, you might be able to determine if this issue is related to network issues or policy server configuration of the max connections. 

     

    One last bit of information: if your policy servers are on Linux, you might want to check entropy as well. 

    Agent connection issues are one of the symptoms of low entropy in a Linux environment. 
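For the entropy check, one rough way to sample the value on a Linux host is to read the kernel's `/proc/sys/kernel/random/entropy_avail` file. The sketch below assumes that file is present and readable. The threshold of 200 bits is an arbitrary placeholder chosen for illustration, not a documented SiteMinder or kernel limit, and the function names are hypothetical.

```python
from pathlib import Path

# Standard Linux procfs location for the kernel's available-entropy estimate.
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")

# Assumed threshold for illustration only; tune it for your environment.
LOW_WATERMARK = 200


def read_entropy(path=ENTROPY_FILE):
    """Return the available entropy in bits, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def entropy_is_low(bits, threshold=LOW_WATERMARK):
    """True when a reading exists and falls below the threshold."""
    return bits is not None and bits < threshold


if __name__ == "__main__":
    bits = read_entropy()
    if bits is None:
        print("entropy_avail not readable on this host")
    else:
        state = "LOW" if entropy_is_low(bits) else "ok"
        print(f"available entropy: {bits} bits ({state})")
```

Sampling this periodically around the time of the connection failures gives a rough picture of whether entropy was depleted when the agents lost contact.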

     

    If you need further assistance in reviewing this type of an issue, please open a new case with Broadcom SSO Support. 

    James Atchley
    Principal Support Engineer  - SSO 
    CA a Broadcom Company



  • 3.  Re: LLA: SiteMinder Agent Api function failed - intermittent 500 errors

    Posted 04-01-2019 11:04 AM

    Hi James,

     

    Thank you for the detailed response. The Policy Server in our case is maintained by a different team; we are working with them to check whether the recommendations are already in place.

     

    There are two questions we are trying to get answered.

     

    1. We noticed from the SM log that the agent restarted. Is that normal behavior?

    2. We have multiple instances, and it looks like one of them could not re-establish its connection; once we restarted the web server instances, everything worked fine. Is there a chance that some instances may not be able to reconnect automatically after the policy server issue is resolved?

     

    These are the questions our technical team is trying to answer to better understand the SM agent's behavior. If you can help us understand, that would be a great help. 

     

    Thank you

     

    Regards,

    Vinoth Marimuthu



  • 4.  Re: LLA: SiteMinder Agent Api function failed - intermittent 500 errors

    Posted 04-01-2019 11:28 AM

    I'll try to answer the remaining questions below. 

     

    1. We noticed from the SM log that the agent restarted. Is that normal behavior?
    I would want to validate that in the logs. Did the agent crash, or did it stop? 

    If the agent is unable to connect to a policy server and all requests have timed out, it would fail to start or come to a graceful stop. 

    However, if it is restarting, a crash condition might have occurred. 

     

    What's the release of the agent? 

     

    2. We have multiple instances, and it looks like one of them could not re-establish its connection; once we restarted the web server instances, everything worked fine. Is there a chance that some instances may not be able to reconnect automatically after the policy server issue is resolved?

    Again, this might be related to configuration of the policy server connections or it might be related to the agent configuration. 

    For instance, you need to have a unique value for "ServerPath" in the webagent.conf file. 

    Also, adding 'agentwaittime=N' to the webagent.conf file (where N is 30 times the number of Policy Servers in the bootstrap, plus 10) might help if the agents are timing out during bootstrap because of network latency. 

    If you have 6 servers defined in the SMHOST.conf file, then (6*30) +10 = 190. agentwaittime=190
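The arithmetic above can be sketched as a small helper. This is only an illustration: it assumes the host configuration file lists one `policyserver=` entry per Policy Server and uses `#` for comments, and the function names and sample addresses are hypothetical.

```python
def agent_wait_time(policy_server_count):
    """Apply the rule from this post: 30 seconds per Policy Server, plus 10."""
    if policy_server_count < 1:
        raise ValueError("at least one Policy Server is required")
    return 30 * policy_server_count + 10


def count_policy_servers(smhost_text):
    """Count non-comment 'policyserver=' entries (assumed file layout)."""
    count = 0
    for raw in smhost_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue
        key = line.partition("=")[0].strip().lower()
        if key == "policyserver":
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = (
        'policyserver="10.0.0.1,44441,44442,44443"\n'
        '#policyserver="10.0.0.9,44441,44442,44443"\n'
        'policyserver="10.0.0.2,44441,44442,44443"\n'
    )
    servers = count_policy_servers(sample)
    print(f"agentwaittime={agent_wait_time(servers)}")
```

With six servers, `agent_wait_time(6)` gives 190, matching the example above.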

    Also, if you have the Policy Server listed by FQDN, consider testing with IP addresses instead. 

     

    In summary, I would:
    1. Check the release notes for code issues within your release of the agent. 

    https://docops.ca.com/ca-single-sign-on/12-52-sp1/en/release-notes/cumulative-releases

    2. Remove DNS resolution from the equation by using IPs for the agent's policy server connections in the SMHOST.conf files. 

    3. Confirm that the value for "ServerPath" is unique for each instance.  

    Note: ServerPath does NOT need to be an actual directory path; you can use any unique value, for instance "ServerPath=MyAppName".

    4. Add "agentwaittime" to the webagent.conf. 

    https://docops.ca.com/ca-single-sign-on/12-52-sp1/en/configuring/web-agent-configuration/basic-agent-setup-and-policy-server-connections#BasicAgentSetupandPolicyServerConnections-AccommodateNetworkLatency
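Step 3 above can be checked mechanically when there are many web server instances. The following is a rough sketch, assuming each instance's webagent.conf holds simple `key=value` lines with `#` comments. The helper names and the instance labels are hypothetical.

```python
from collections import defaultdict


def server_path(conf_text):
    """Return the ServerPath value from webagent.conf text, or None if absent."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "serverpath":
            return value.strip().strip('"')
    return None


def duplicate_server_paths(confs):
    """Map each ServerPath shared by more than one instance to those instances.

    `confs` maps an instance label to the text of its webagent.conf.
    """
    by_path = defaultdict(list)
    for instance, text in confs.items():
        by_path[server_path(text)].append(instance)
    return {
        path: sorted(instances)
        for path, instances in by_path.items()
        if path is not None and len(instances) > 1
    }
```

An empty result means every instance that sets ServerPath uses a distinct value; any entry in the result names the instances that need to be changed.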

     

    Does this help? 

     

    -James 



  • 5.  Re: LLA: SiteMinder Agent Api function failed - intermittent 500 errors

    Posted 04-01-2019 12:44 PM

    Hi James,

    Thanks again,

     

    Here is an extract from one of our agent log

     

    We are running version 12.52

     

    [21327/4160678352][Tue Mar 26 2019 03:57:10][CSmHighLevelAgent.cpp:206][INFO][sm-AgentFramework-00390] HLA: Stopping.
    [21327/4160678352][Tue Mar 26 2019 03:57:11][SmPlugin.cpp:103][INFO][sm-AgentFramework-00180] Agent Framework plug-in 'SM_WAF_HTTP_PLUGIN' shutdown.
    [21327/4160678352][Tue Mar 26 2019 03:57:11][SmAgentAPI.cpp:1703][INFO][sm-AgentFunc-00040] Agent API has been released.
    [31698/4160670528][Tue Mar 26 2019 04:47:42][LLAWorkerProcess.cpp:1893][WARNING][sm-AgentFramework-00700] LLAWP: DoManagement lost connection to Policy Server.
    [13956/62794608][Tue Mar 26 2019 10:20:05][CSmLowLevelAgent.cpp:5207][INFO][sm-AgentFramework-00510] LLA: Logging initialized.
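The extract above contains three different process IDs (21327, 31698, 13956), so the entries were written by different processes. To see the gaps and PID changes at a glance, a small timeline helper can be used. This is a sketch only, assuming the `[pid/tid][timestamp]` prefix shown in the extract; the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches the "[pid/tid][timestamp]" prefix seen in the agent log extract.
HEADER = re.compile(r"\[(\d+)/\d+\]\[([^\]]+)\]")


def pid_and_time(line):
    """Return (pid, timestamp) for a log line, or None if it does not match."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(2), "%a %b %d %Y %H:%M:%S")
    return int(match.group(1)), stamp


def timeline(lines):
    """List (pid, timestamp, gap_since_previous, pid_changed) per matching line."""
    entries = []
    previous = None
    for line in lines:
        parsed = pid_and_time(line)
        if parsed is None:
            continue
        pid, stamp = parsed
        gap = stamp - previous[1] if previous is not None else None
        changed = previous is not None and pid != previous[0]
        entries.append((pid, stamp, gap, changed))
        previous = parsed
    return entries
```

A long gap combined with a PID change between a "Stopping" entry and a "Logging initialized" entry indicates that a new process started, but it does not by itself distinguish a graceful stop from a crash.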

     

    It looks like the agent stopped; I am not sure whether it was a graceful shutdown or a crash. A few minutes later it started receiving requests again.

     

    With the help of our middleware team, I am trying to get the configurations verified.

     

    Regards,

    Vinoth Marimuthu