Layer 7 Access Management

Tech Tip : CA Single Sign-On : SPS Crashing intermittently

    Posted 05-06-2019 03:28 AM

    Issue:

     

    We're running CA Access Gateway (SPS) on Windows. It crashes
    frequently, and the crash report points to SspiCli.dll. An .mdmp file
    is created as well.

     

    Users report a 503 error message in the browser when attempting to
    access an application protected by CA Access Gateway (SPS).

    The crash and trace files show the following:

     

    hs_err_pid1664.log

     

    # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000007fc135e9b5b, pid=1664, tid=0x0000000000001f6c
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_172-b11) (build 1.8.0_172-b11)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode windows-amd64 compressed oops)
    # Problematic frame:
    # C [SspiCli.dll+0x9b5b]

     

    Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)

    J 5848 com.netegrity.proxy.jagent.proxy.CSmJavaAgentFacadeProxyImpl.doJNIProcessRequest(Ljava/lang/String;Lcom/netegrity/proxy/jagent/JavaSerializedAgentData;)I (0 bytes) @ 0x00000000025da0a3 [0x00000000025da040+0x63]

     

    J 7625 C2 com.netegrity.proxy.ProxyValve.processRequest(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;Lcom/netegrity/proxy/VirtualHost;Ljava/lang/String;Z)V (1967 bytes) @ 0x0000000002cb3b3c [0x0000000002cb2c00+0xf3c]

    time: Tue Mar 26 08:46:39 2019

     

    Debug Diag

     

    In hs_err_pid .mdmp the assembly instruction at
    sspicli!AcceptSecurityContext+e6 in C:\Windows\System32\sspicli.dll
    from Microsoft Corporation has caused an access violation exception
    (0xC0000005) when trying to read from memory location 0x00002744 on
    thread 31

     

    Visual Studio

     

    Unhandled exception at 0x00007FFEECA1F586 (sspicli.dll) in
    hs_err_pid1224.mdmp: 0xC0000005: Access violation reading location
    0x0000272700002744. occurred

     

    The process crashes during NTLM authentication:

     

    SPStrace.log :

     

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][IsResourceProtected][Resource is protected from Policy Server.]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][ProcessResponses][Calling SM_WAF_HTTP_PLUGIN->ProcessResponses.]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][CSmHttpPlugin::ProcessResponses][Processing IsProtected responses.]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][ProcessResponses][SM_WAF_HTTP_PLUGIN->ProcessResponses returned SmSuccess.]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][ProcessResponses][Calling SM_WAF_AG_PLUGIN->ProcessResponses.]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][ProcessResponses][SM_WAF_AG_PLUGIN->ProcessResponses returned SmNoAction.]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][CSmCredentialManager::GatherAdvancedAuthCredentials][Calling SM_WAF_HTTP_PLUGIN->ProcessAdvancedAuthCredentials.]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][SmNtc::getCredentials][user-agent received Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][SmNtc::getCredentials][Request for SSPI NTLM Authentication]

    [03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][DeleteCookie][Deleted cookie 'SM_NTLMCTX'.]
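
    One way to tie the crash report to the trace: the third and fourth
    bracketed fields of each SPStrace.log entry appear to be the process ID
    and thread ID (1664 matches hs_err_pid1664.log, and 8044 is the decimal
    form of tid=0x1f6c). The Python sketch below assumes that field layout
    and that each trace entry sits on a single line. The function names are
    illustrative and not part of any CA tooling.

```python
import re

# Assumed layout: [date][time][pid][tid][transaction id][function][message]
ENTRY = re.compile(
    r"\[(?P<date>[^\]]*)\]\[(?P<time>[^\]]*)\]\[(?P<pid>\d+)\]\[(?P<tid>\d+)\]"
    r"\[(?P<txn>[^\]]*)\]\[(?P<func>[^\]]*)\]\[(?P<msg>.*)\]\s*$"
)

def parse_entry(line):
    """Split one trace line into named fields, or return None if it does not match."""
    match = ENTRY.match(line.strip())
    return match.groupdict() if match else None

def entries_for_crashed_thread(lines, pid, tid_hex):
    """Keep only entries written by the process/thread named in the hs_err header."""
    tid = int(tid_hex, 16)  # hs_err prints the thread ID in hex, the trace in decimal
    selected = []
    for line in lines:
        entry = parse_entry(line)
        if entry and int(entry["pid"]) == pid and int(entry["tid"]) == tid:
            selected.append(entry)
    return selected

sample = [
    "[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e]"
    "[SmNtc::getCredentials][Request for SSPI NTLM Authentication]",
    "[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e]"
    "[DeleteCookie][Deleted cookie 'SM_NTLMCTX'.]",
]

# The last entry logged by the crashing thread shows where processing stopped.
last = entries_for_crashed_thread(sample, pid=1664, tid_hex="0x0000000000001f6c")[-1]
print(last["func"], "-", last["msg"])
```

    Filtering on the thread ID keeps entries from other worker threads out of
    the picture, which helps when the trace is busy around the crash time.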

    How can we solve this?

    Environment:
    SPS / Access Gateway 12.7 all service packs, 12.8 all service packs.
    Windows OS.
    Using IWA Authentication or IWA Failover to Forms.

    Cause:

     

    The crash in SspiCli.dll is caused by a problem in Microsoft code, for
    which Microsoft has not provided a fix so far. To work around it,
    configure the sticky bit (session persistence) on the load balancer, add
    the ACO parameter "usentlmmapforntlmauth", and set it to "yes".

    Session persistence prevents the load balancer from forwarding the NTLM
    type 1 authentication request from a browser (or other client) to one
    SPS box and then forwarding the continuation of the authentication
    process, the NTLM type 3 request, to a different SPS box.

     

    About the "usentlmmapforntlmauth=yes" :

     

    When this parameter is set to "yes", the SPS uses an internal map to
    track NTLM request types. If an NTLM type 3 request reaches an SPS that
    did not receive a prior NTLM type 1 request from the same client in this
    authentication flow, the SPS treats the request as type 1. CA SSO
    therefore does not send out-of-sequence messages to the
    AcceptSecurityContext() function, which avoids the crash.
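
    The tracking behavior described above can be pictured with a small
    Python sketch. It is an illustration only and not CA's implementation:
    the class name, the client key, and the way a restarted handshake is
    recorded are all assumptions.

```python
NEGOTIATE, AUTHENTICATE = 1, 3  # NTLM type 1 and type 3 messages

class NtlmFlowMap:
    """Hypothetical per-server map of clients with an NTLM handshake in progress."""

    def __init__(self):
        self._pending = set()  # client keys whose type 1 was seen by this server

    def classify(self, client_key, message_type):
        """Return the message type this server should act on."""
        if message_type == NEGOTIATE:
            self._pending.add(client_key)
            return NEGOTIATE
        if message_type == AUTHENTICATE:
            if client_key in self._pending:
                # In-order flow: the type 1 was handled here, so finish the handshake.
                self._pending.discard(client_key)
                return AUTHENTICATE
            # Out-of-order flow: no type 1 was seen here, so handle it as type 1.
            # Assumption: the restarted handshake is then tracked like any other.
            self._pending.add(client_key)
            return NEGOTIATE
        raise ValueError("unexpected NTLM message type: %r" % (message_type,))

flow = NtlmFlowMap()
print(flow.classify("client-a", NEGOTIATE))     # handshake starts on this server
print(flow.classify("client-a", AUTHENTICATE))  # in order: processed as type 3
print(flow.classify("client-b", AUTHENTICATE))  # out of order: processed as type 1
```

    The point of the map is that a type 3 message is only passed on as type 3
    when this server has already handled the matching type 1.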

    Here is an example of how to troubleshoot and observe this behavior:

    A code review of the crashing frame, SspiCli.dll+0xf586 or
    sspicli!AcceptSecurityContext+e6, shows that the crashing process
    received the NTLM authentication messages out of order.

     

    For example, the Access Gateway server receives the AUTHENTICATE_MESSAGE
    for a request before the NEGOTIATE_MESSAGE.

    The NTLM Authentication Protocol consists of three message types used
    during authentication and one message type used for message integrity
    after authentication has occurred. The authentication messages:

     

    NEGOTIATE_MESSAGE (2.2.1.1)
    CHALLENGE_MESSAGE (2.2.1.2)
    AUTHENTICATE_MESSAGE (2.2.1.3)
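
    Each of these messages starts with the 8-byte signature "NTLMSSP\0"
    followed by a 4-byte little-endian message type (1, 2, or 3), so the type
    of a captured token can be read directly. The Python sketch below assumes
    the token comes from a plain "NTLM <base64>" header value taken from a
    browser or proxy trace; tokens wrapped in the Negotiate scheme are not
    handled, and the function name is illustrative.

```python
import base64
import struct

NTLM_TYPES = {
    1: "NEGOTIATE_MESSAGE",
    2: "CHALLENGE_MESSAGE",
    3: "AUTHENTICATE_MESSAGE",
}

def ntlm_message_type(header_value):
    """Return the NTLM message name carried by an 'NTLM <base64>' header value."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.upper() != "NTLM" or not token:
        raise ValueError("not an NTLM header value")
    raw = base64.b64decode(token)
    if len(raw) < 12 or raw[:8] != b"NTLMSSP\x00":
        raise ValueError("missing NTLMSSP signature")
    # The message type is a 32-bit little-endian integer at offset 8.
    (message_type,) = struct.unpack_from("<I", raw, 8)
    return NTLM_TYPES.get(message_type, "unknown (%d)" % message_type)

# Synthetic type 1 token for demonstration: signature, type, empty flags.
demo = base64.b64encode(b"NTLMSSP\x00" + struct.pack("<I", 1) + b"\x00" * 4)
print(ntlm_message_type("NTLM " + demo.decode("ascii")))
```

    Running this over the Authorization headers of one login attempt shows
    the order in which the message types reached each server.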

     

    This out-of-order flow is a symptom of a network load balancer, or a
    similar device, in front of the Access Gateway server that is not
    configured for sticky sessions.

     

    To troubleshoot this issue, we confirmed that during the NTLM
    authentication flow the requests were sent to more than one SPS / Access
    Gateway in the server farm.

    We made the following changes to each Apache instance within SPS on
    the servers to generate a unique header.

     

    EXAMPLE:

     

    In the httpd.conf file (\CA\secure-proxy\httpd\conf)


    # Load mod_headers for this test only; remove these lines afterwards
    LoadModule headers_module modules/mod_headers.so

    <IfModule headers_module>
    #RequestHeader unset DNT env=bad_DNT
    # Use a different value on each server, e.g. "SPSSVR02" on the second one
    Header set ServerName "SPSSVR01"
    </IfModule>

     

    NOTE: The Access Gateway services need to be restarted after making this change.

    While reproducing the issue, we can see the header set by Apache change during the NTLM authentication flow.

     

    Example:

     

    ServerName: SPSSVR01

    then on the next response we would see

    ServerName: SPSSVR02

    This showed that the load balancer in front of the Access Gateway servers was not maintaining a sticky session for the requests.
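
    A check along these lines can be scripted. The Python sketch below
    assumes the ServerName response header values have been collected, in
    order, for the responses of one NTLM handshake (for example from a
    browser network trace). The function name and the sample values are
    illustrative.

```python
def backend_switches(server_names):
    """Return (index, previous, current) for every change of backend server."""
    switches = []
    for i in range(1, len(server_names)):
        if server_names[i] != server_names[i - 1]:
            switches.append((i, server_names[i - 1], server_names[i]))
    return switches

# ServerName values seen on consecutive responses of a single handshake.
observed = ["SPSSVR01", "SPSSVR02", "SPSSVR02"]

for index, previous, current in backend_switches(observed):
    print("response %d came from %s, the previous one from %s"
          % (index + 1, current, previous))
```

    An empty result means every response in the handshake came from the same
    server, which is what a correctly configured sticky session should give.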

     

    Resolution:

     

    Set the load balancer sticky bit and add the ACO parameter
    usentlmmapforntlmauth=yes to the CA Access Gateway (SPS) agent
    configuration object to solve the issue.

     

    On an F5 load balancer, set Sticky Sessions / Session Persistence / Sticky-bit.
    On ProxySG, set "cookie persistence".

     

    Additional Information:


    https://httpd.apache.org/docs/2.4/mod/mod_headers.html

     

    KB : KB000131594