Issue:
We're running CA Access Gateway (SPS) on Windows, and it crashes
frequently, reporting a problem in SspiCli.dll. An .mdmp file is
also created.
Users report a 503 error message in the browser when attempting to
access applications protected by CA Access Gateway (SPS).
The crash and trace files show the following:
hs_err_pid1664.log
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000007fc135e9b5b, pid=1664, tid=0x0000000000001f6c
#
# JRE version: Java(TM) SE Runtime Environment (8.0_172-b11) (build 1.8.0_172-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [SspiCli.dll+0x9b5b]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 5848  com.netegrity.proxy.jagent.proxy.CSmJavaAgentFacadeProxyImpl.doJNIProcessRequest(Ljava/lang/String;Lcom/netegrity/proxy/jagent/JavaSerializedAgentData;)I (0 bytes) @ 0x00000000025da0a3 [0x00000000025da040+0x63]
J 7625 C2 com.netegrity.proxy.ProxyValve.processRequest(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;Lcom/netegrity/proxy/VirtualHost;Ljava/lang/String;Z)V (1967 bytes) @ 0x0000000002cb3b3c [0x0000000002cb2c00+0xf3c]
time: Tue Mar 26 08:46:39 2019
Debug Diag:
In the hs_err_pid .mdmp file, the assembly instruction at
sspicli!AcceptSecurityContext+e6 in C:\Windows\System32\sspicli.dll
from Microsoft Corporation has caused an access violation exception
(0xC0000005) when trying to read from memory location 0x00002744 on
thread 31.
Visual Studio:
Unhandled exception at 0x00007FFEECA1F586 (sspicli.dll) in hs_err_pid1224.mdmp: 0xC0000005: Access violation reading location 0x0000272700002744. occurred
The process crashes during NTLM authentication:
SPStrace.log:
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][IsResourceProtected][Resource is protected from Policy Server.]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][ProcessResponses][Calling SM_WAF_HTTP_PLUGIN->ProcessResponses.]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][CSmHttpPlugin::ProcessResponses][Processing IsProtected responses.]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][ProcessResponses][SM_WAF_HTTP_PLUGIN->ProcessResponses returned SmSuccess.]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][ProcessResponses][Calling SM_WAF_AG_PLUGIN->ProcessResponses.]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][ProcessResponses][SM_WAF_AG_PLUGIN->ProcessResponses returned SmNoAction.]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][CSmCredentialManager::GatherAdvancedAuthCredentials][Calling SM_WAF_HTTP_PLUGIN->ProcessAdvancedAuthCredentials.]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][SmNtc::getCredentials][user-agent received Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][SmNtc::getCredentials][Request for SSPI NTLM Authentication]
[03/26/2019][08:46:37][1664][8044][68e29997-4d53e723-c3b63f32-afb9a76b-2eba63f9-e][DeleteCookie][Deleted cookie 'SM_NTLMCTX'.]
How can we solve this?
Environment:
SPS / Access Gateway 12.7 all service packs, 12.8 all service packs.
Windows OS.
Using IWA Authentication or IWA Failover to Forms.
Cause:
The crash in SspiCli.dll is caused by a problem in Microsoft code.
As Microsoft has not provided a fix so far, to work around it you have
to enable the sticky bit (session persistence) on the load balancer,
and add the ACO parameter "usentlmmapforntlmauth" with its value set
to "yes".
Session persistence prevents the load balancer from forwarding the NTLM
type 1 authentication request from a browser (or other client) to one
SPS box and then forwarding the continuation of the authentication
process, the NTLM type 3 request, to a different SPS box.
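To illustrate why persistence matters, the Python sketch below routes the two client-sent NTLM legs (type 1, then type 3) through a plain round-robin balancer and through a source-address persistence balancer. The server names, the client IP and the hashing scheme are illustrative assumptions; real devices such as F5 or ProxySG implement persistence in their own ways (for example cookie persistence).

```python
import hashlib
from itertools import cycle

SERVERS = ["SPSSVR01", "SPSSVR02"]  # hypothetical farm members


def round_robin_router(servers):
    """Non-persistent balancing: every request goes to the next server."""
    ring = cycle(servers)
    return lambda client_ip: next(ring)


def sticky_router(servers):
    """Source-address persistence: a given client always maps to one server."""
    def route(client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return servers[digest[0] % len(servers)]
    return route


def handshake_targets(route, client_ip):
    """Route the two client-sent NTLM legs in order: type 1, then type 3."""
    return {"type1": route(client_ip), "type3": route(client_ip)}
```

With a fresh round-robin router the type 1 request lands on SPSSVR01 and the type 3 request on SPSSVR02, which is exactly the split described above; the sticky router sends both legs to the same box.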
About "usentlmmapforntlmauth=yes":
When this is set to yes, the SPS uses an internal map to track NTLM
request types. If an NTLM type 3 request is sent to the SPS, but this
SPS did not receive a prior NTLM type 1 request from the same client
in this authentication flow, it treats the NTLM request as type 1.
Thus, CA SSO does not pass out-of-sequence messages to the
AcceptSecurityContext() function, which avoids the crash.
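The rule above can be modeled as a small per-server state machine. The Python sketch below is a hypothetical illustration, not the SPS implementation: the class name, the `client_key` (standing in for whatever client identifier the SPS actually uses) and the choice to record a reclassified message exactly like a real type 1 are all assumptions.

```python
from enum import IntEnum


class NtlmType(IntEnum):
    NEGOTIATE = 1     # type 1
    CHALLENGE = 2     # type 2
    AUTHENTICATE = 3  # type 3


class NtlmSequenceMap:
    """Hypothetical model of the usentlmmapforntlmauth=yes behavior."""

    def __init__(self):
        # Clients that sent a type 1 to *this* server and have not yet
        # completed the handshake with a type 3.
        self._pending = set()

    def classify(self, client_key, msg_type):
        if msg_type == NtlmType.AUTHENTICATE and client_key not in self._pending:
            # No type 1 seen from this client on this server: treat the
            # message as type 1 rather than hand an out-of-sequence type 3
            # to AcceptSecurityContext().
            msg_type = NtlmType.NEGOTIATE
        if msg_type == NtlmType.NEGOTIATE:
            self._pending.add(client_key)
        elif msg_type == NtlmType.AUTHENTICATE:
            self._pending.discard(client_key)
        return msg_type
```

A type 3 arriving first is therefore handled as a fresh type 1, and the next type 3 from the same client completes the handshake normally.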
Here is an example of how to troubleshoot and observe this behavior:
A code review of the crashing stack frame, SspiCli.dll+0xf586
(sspicli!AcceptSecurityContext+e6), shows that the crashing process
received the NTLM authentication messages out of order.
For example, the Access Gateway server receives the
AUTHENTICATE_MESSAGE for a request before the NEGOTIATE_MESSAGE.
The NTLM Authentication Protocol consists of three message types used
during authentication and one message type used for message integrity
after authentication has occurred. The authentication messages (section
numbers refer to the Microsoft [MS-NLMP] specification) are:
NEGOTIATE_MESSAGE (2.2.1.1)
CHALLENGE_MESSAGE (2.2.1.2)
AUTHENTICATE_MESSAGE (2.2.1.3)
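When capturing traffic, it helps to know which of these messages each request carries. Per [MS-NLMP], every NTLM message starts with the 8-byte signature "NTLMSSP\0" followed by a 32-bit little-endian MessageType field (1, 2 or 3); the client sends types 1 and 3 in the Authorization header and the server sends type 2 in WWW-Authenticate. The Python sketch below decodes that field from a header value; the function name is hypothetical, and SPNEGO-wrapped (Kerberos) tokens are simply reported as None.

```python
import base64
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"
MESSAGE_NAMES = {1: "NEGOTIATE_MESSAGE", 2: "CHALLENGE_MESSAGE", 3: "AUTHENTICATE_MESSAGE"}


def ntlm_message_type(header_value):
    """Return the NTLM MessageType from an 'NTLM <base64>' or
    'Negotiate <base64>' header value, or None if it is not raw NTLM."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() not in ("ntlm", "negotiate") or not token:
        return None
    try:
        raw = base64.b64decode(token.strip(), validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(raw) < 12 or raw[:8] != NTLM_SIGNATURE:
        return None  # e.g. a Kerberos/SPNEGO token rather than NTLM
    (msg_type,) = struct.unpack_from("<I", raw, 8)  # little-endian uint32
    return msg_type
```

Mapping the result through `MESSAGE_NAMES` gives the names listed above, which makes it easy to spot an AUTHENTICATE_MESSAGE arriving at a server that never saw the matching NEGOTIATE_MESSAGE.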
This out-of-order flow is a symptom of a network load balancer or similar device in front of the Access Gateway servers that is not configured for sticky sessions.
While troubleshooting this issue, we saw that during the NTLM authentication flow, the requests were sent to more than one SPS / Access Gateway in the server farm.
We made the following change to the Apache instance within SPS on
each server so that it adds a unique response header.
EXAMPLE:
In the httpd.conf file (\CA\secure-proxy\httpd\conf):
# Load headers_module for testing only; remove after troubleshooting
LoadModule headers_module modules/mod_headers.so
<IfModule headers_module>
#RequestHeader unset DNT env=bad_DNT
# Use a different value on each SPS server (for example SPSSVR02 on the second one)
Header set ServerName "SPSSVR01"
</IfModule>
NOTE: The Access Gateway services need to be restarted after making this change.
While reproducing this issue, we could see that the header added by Apache changed during the NTLM authentication flow.
Example:
ServerName: SPSSVR01
then on the next response we would see
ServerName: SPSSVR02
This showed that the load balancer in front of the Access Gateway servers was not maintaining a sticky session for the requests.
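With a distinct ServerName header on each box, the check can be scripted against a HAR export of the failing session. The Python sketch below assumes a HAR 1.2 file whose entries include the request Authorization header and the ServerName response header configured above, and that handshakes appear in the capture in order; it repeats a small NTLM type decoder so that it runs on its own. All function names are hypothetical.

```python
import base64
import json


def _ntlm_type(auth_value):
    """MessageType of a raw NTLM token in an Authorization value, or None."""
    scheme, _, token = (auth_value or "").partition(" ")
    if scheme.lower() not in ("ntlm", "negotiate"):
        return None
    try:
        raw = base64.b64decode(token.strip(), validate=True)
    except ValueError:
        return None
    if len(raw) < 12 or raw[:8] != b"NTLMSSP\x00":
        return None
    return int.from_bytes(raw[8:12], "little")


def _header(headers, name):
    """Case-insensitive lookup in a HAR name/value header list."""
    for h in headers:
        if h.get("name", "").lower() == name.lower():
            return h.get("value")
    return None


def ntlm_legs_by_server(har_text, server_header="ServerName"):
    """Return (url, NTLM type, answering server) for each NTLM request."""
    legs = []
    for entry in json.loads(har_text)["log"]["entries"]:
        msg_type = _ntlm_type(_header(entry["request"]["headers"], "Authorization"))
        if msg_type is not None:
            server = _header(entry["response"]["headers"], server_header)
            legs.append((entry["request"]["url"], msg_type, server))
    return legs


def split_handshakes(legs):
    """Flag type 3 requests answered by a different server than the
    preceding type 1 request (None means no type 1 was captured)."""
    problems = []
    last_type1_server = None
    for url, msg_type, server in legs:
        if msg_type == 1:
            last_type1_server = server
        elif msg_type == 3 and server != last_type1_server:
            problems.append((url, last_type1_server, server))
    return problems
```

An empty result from `split_handshakes` after enabling persistence is a quick sign that the load balancer now keeps each handshake on one SPS box.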
Resolution:
Enable the sticky bit (session persistence) on the load balancer and
add the ACO parameter usentlmmapforntlmauth=yes to the CA Access
Gateway (SPS) Agent Configuration Object to solve the issue.
On an F5 load balancer, enable session persistence (sticky sessions / sticky bit).
On a ProxySG, enable "cookie persistence".
Additional Information:
https://httpd.apache.org/docs/2.4/mod/mod_headers.html
KB: KB000131594