Summary:
This guide describes the steps a web agent performs during initialization.
It then examines some common agent initialization issues and the approaches for troubleshooting and resolving them.
Environment:
- Web Agent : 12.5 and above
- OS : ANY
For this tech tip, the tests were run on the following platform:
- Web agent version : 12.52 SP1 CR7
- Web Server : Apache 2.2
- Web Server OS : RHEL 6.5 64 bit
Web Agent Startup Process
At a high level, the web agent startup process runs in the following order:
- Read WebAgent.conf
- Locate the path to the SmHost.conf file from WebAgent.conf and read SmHost.conf
- Identify the following details from SmHost.conf:
  - Policy server IP (this policy server is used only for the initial bootstrapping)
  - Shared secret
  - Trusted host name (hostname)
  - Host configuration object (HCO)
#agentname="<AgentName>, <IPAddress>"
HostConfigFile="/opt/CA/webagent/config/SmHost.conf"
AgentConfigObject="aco_rhel65"
EnableWebAgent="YES"
ServerPath="/etc/httpd/conf"
#localconfigfile="/etc/httpd/conf/LocalConfig.conf"
LoadPlugin="/opt/CA/webagent/bin/libHttpPlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libSessionLinkerPlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libAffiliate10Plugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libSAMLAffiliatePlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libeTSSOPlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libIntroscopePlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libSAMLDataPlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libOpenIDPlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libDisambiguatePlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libOAuthPlugin.so"
#LoadPlugin="/opt/CA/webagent/bin/libCertSessionLinkerPlugin.so"
AgentIdFile="/etc/httpd/conf/AgentId.dat"
Figure : WebAgent.conf
hostname="th-rhel65-4"
sharedsecret="{RC2}ovOEr7teMKP9xpKisg157/t4T1tqGXwNT0SWGsfi1QnajkcDjumFEmF9kBbw1d2MZb8CGf2ueSfWkfKmZEWrYVeM3hZ/HRI1F2bh6v4lBQq9uqi0Lp2iIQJb0flOY2oUGLmDE/iZFDQt3ceo7aCij9YQY72YD2iLLtJTKJGlu6e2nJcMzGlf2roiNfd2MwMV"
sharedsecrettime="0"
enabledynamichco="NO"
hostconfigobject="hco"
#Add additional bootstrap policy servers here for fault tolerance.
policyserver="shruj01-i1849.ca.com,44441,44441,44441"
requesttimeout="60"
cryptoprovider="ETPKI"
fipsmode="COMPAT"
Figure : SmHost.conf
- Establish an Agent API connection with the policy server listed in SmHost.conf. This includes a three-way handshake (described in the next section).
- Read the HCO (host configuration object) from the policy server/policy store.
- Establish the Agent API connection with the primary policy server listed in the HCO (this can be the same as, or different from, the bootstrap policy server listed in SmHost.conf).
- Once the connection to the primary policy server is established, read the ACO (the agent configuration object named in WebAgent.conf).
- Initialize the agent log and trace files as per the ACO configuration.
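The first three steps above can be reproduced by hand to check which values the agent will bootstrap with. The following is a minimal shell sketch, not the agent's actual parser: the function names conf_get and show_bootstrap are hypothetical, and it assumes both files use the simple key="value" layout shown in the figures. The shared secret is deliberately not printed.

```shell
# conf_get FILE KEY
# Print the value of the first uncommented KEY="value" line in FILE.
# The key comparison is case-insensitive (a convenience of this sketch).
conf_get() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                        # skip commented-out settings
        {
            eq = index($0, "=")
            if (eq == 0) next                      # not a key=value line
            name = substr($0, 1, eq - 1)
            gsub(/[ \t]/, "", name)
            if (tolower(name) != tolower(key)) next
            value = substr($0, eq + 1)
            gsub(/^[ \t]*"|"[ \t]*$/, "", value)   # strip the surrounding quotes
            print value
            exit
        }
    ' "$1"
}

# show_bootstrap WEBAGENT_CONF
# Follow HostConfigFile from WebAgent.conf to SmHost.conf and print the
# values the agent needs before it can contact a policy server.
show_bootstrap() {
    smhost=$(conf_get "$1" HostConfigFile)
    echo "SmHost.conf path : $smhost"
    echo "ACO name         : $(conf_get "$1" AgentConfigObject)"
    echo "Trusted host     : $(conf_get "$smhost" hostname)"
    echo "HCO name         : $(conf_get "$smhost" hostconfigobject)"
    echo "Bootstrap server : $(conf_get "$smhost" policyserver)"
}
```

Running show_bootstrap with the path to WebAgent.conf prints one line per value. An empty value usually means the setting is missing or commented out.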
Three way agent to policy server handshake
- Agent opens a TCP socket connection with the policy server.
- Agent sends a Hello message, which includes the following information among other details:
  - MD5 hash of the shared secret and trusted host name combined (a stronger algorithm is used in FIPS-only mode)
  - Trusted host name
- Policy server validates the shared secret for the trusted host name that was passed. It may check the shared secret sent by the agent against both the current and the previous shared secret in the policy store. If the validation succeeds, the policy server sends a Hello Reply message, which includes the following information among other details:
  - Session keys
  - New shared secret (optional; sent only if the agent does not hold the current shared secret)
  - New shared secret generation time (optional)
- Agent sends a Hello Confirm message encrypted with the session keys received from the policy server.
- (Optional) Agent updates the SmHost.conf file with the new shared secret.
Figure : Three way agent to Policy server handshake
Web Agent Initialization Error Codes :
Code Meaning
00 00 00 00 Debug version of SiteMinder agent is running.
01 00 00 00 Unable to determine SiteMinder agent configuration file path.
02 00 00 00 Unable to open SiteMinder agent configuration file or file is corrupt.
03 00 00 00 Unable to load SiteMinder host configuration object or host configuration file.
04 00 00 00 Unable to load SiteMinder agent configuration object.
05 00 00 00 Unable to load SiteMinder local agent configuration file or file is corrupt. (For example, the web server user does not have permissions on the web agent directories and files.)
06 00 00 00 SiteMinder agent has encountered initialization errors and is exiting.
07 00 00 00 SiteMinder agent has encountered initialization errors and will not service requests.
08 00 00 00 SiteMinder agent is not enabled.
09 00 00 00 SiteMinder agent is enabled.
10 00 00 00 DefaultUserName configured for agent cannot logon to the web server. Please provide a new user name or password through central agent configuration or in the local configuration file. The current user name configured is shown below.
11 00 00 00 Secure credential cache has failed to start. The data is the error code. Please check the System events for problems with service startup.
12 00 00 00 SiteMinder agent is running.
13 00 00 00 There was an error allocating memory for the base configuration object.
14 00 00 00 Sm_AgentApi_Init Failed.
15 00 00 00 Failed to Start the LLAWP process.
16 00 00 00 Resource cache failed to initialize.
17 00 00 00 Session cache failed to initialize.
18 00 00 00 Failed to send message to the LLAWP.
19 00 00 00 Failed to initialize the message bus.
20 00 00 00 Failed to initialize the log queue.
21 00 00 00 Failed to initialize the configuration manager.
22 00 00 00 Server already running.
23 00 00 00 Unable to open file.
24 00 00 00 Configuration file path:
25 00 00 00 Failed to send close message to LLAWP.
26 00 00 00 LogonUser failed for specified user shown below.
27 00 00 00 Invalid character found in the server path variable. Make sure that alphanumeric values are used. the invalid character shown below.
28 00 00 00 Message bus already initialized.
29 00 00 00 PID Cache error.
30 00 00 00 Resource cache re-initialized.
31 00 00 00 Session cache re-initialized.
32 00 00 00 Web-agent process is exiting...
ff ff ff ff unable to get the HostConfigurationObject from any Policy Server
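When scanning a large web server error log for these codes, a small filter can help. The sketch below assumes that the code is written on a line of its own, as in the Apache log excerpt in this article. sm_init_codes is a hypothetical helper name.

```shell
# sm_init_codes LOGFILE
# Print each distinct agent initialization code that appears on a line of
# its own in LOGFILE (for example "06 00 00 00" or "ff ff ff ff").
sm_init_codes() {
    grep -E '^[[:space:]]*([0-9a-f]{2} 00 00 00|ff ff ff ff)[[:space:]]*$' "$1" |
        tr -d '\t\r' |          # drop tabs and carriage returns
        sed 's/^ *//; s/ *$//' |  # trim leading and trailing spaces
        sort -u
}
```

The codes printed can then be looked up in the table above.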
Basic troubleshooting:
Test connectivity from the web server host to every policy server port (accounting, authentication, and authorization), one port at a time:
telnet <policyserverIP> <policy server port>
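The same connectivity check can be scripted so that every port listed in SmHost.conf is probed in one pass. This is a sketch under stated assumptions: ps_endpoints and check_ps_ports are hypothetical helper names, the policyserver lines follow the "host,port,port,port" layout shown in the SmHost.conf figure, and a netcat binary (nc) that supports the -z and -w options is installed.

```shell
# ps_endpoints SMHOST_CONF
# Print "host port" pairs, one per line, for every uncommented
# policyserver entry. Duplicate pairs are collapsed.
ps_endpoints() {
    sed -n 's/^[[:space:]]*policyserver="\([^"]*\)".*/\1/p' "$1" |
    while IFS=, read -r host p1 p2 p3; do
        for port in $p1 $p2 $p3; do
            echo "$host $port"
        done
    done | sort -u
}

# check_ps_ports SMHOST_CONF
# Try a TCP connection to each endpoint with a 5 second timeout.
check_ps_ports() {
    ps_endpoints "$1" | while read -r host port; do
        # </dev/null keeps nc from consuming the endpoint list on stdin
        if nc -z -w 5 "$host" "$port" </dev/null >/dev/null 2>&1; then
            echo "OK      $host:$port"
        else
            echo "FAILED  $host:$port"
        fi
    done
}
```

Running check_ps_ports with the path to SmHost.conf reports each endpoint as OK or FAILED. A FAILED line points to a firewall, routing, or policy server availability problem rather than a shared secret problem.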
Advanced troubleshooting:
The following logs need to be collected and analyzed for advanced troubleshooting:
Linux :
- web server error/startup log
- policy server logs (smps.log) and trace log (smtracedefault.log)
At a minimum, use the following profiler for the policy server trace log:
components: AgentFunc, Server/Connection_Management, Server/Policy_Server_General, Tunnel_Service
data: Date, Time, Pid, Tid, TransactionID, SrcFile, Function, User, UserDN, Directory, SessionID, SessionSpec, ErrorValue, ErrorString, Realm, Resource, Action, Rule, Policy, Domain, Message, PreciseTime, ReturnValue, Group, AgentName, AgentType, ObjectClass, DomainOID, SearchKey, ObjectOID, Property, IPAddr, IPPort, AuthStatus, AuthReason, AuthScheme, CertSerial, SubjectDN, IssuerDN, CertDistPt, RealmOID, State, ClusterID, HandleCount, FreeHandleCount, BusyHandleCount, ResponseTime, Throughput, MaxThroughput, MinThroughput, Threshold, TransactionName, Data, HexadecimalData, Query, ActiveExpr, CallDetail, RequestIPAddr, Returns, Expression, Result, CacheHits, CacheSize, RefCount, ExecutionTime, Tenant
version: 1.1
- system call trace (strace) of the web server startup:
strace -Ff -t -i -v -o strace.log -s 16384 apachectl start
- network capture (tcpdump):
tcpdump -i <network interface> -s 65535 -w <some-file.pcap>
To find the available network interfaces, run the ifconfig command:
[root@rhel65_3 ~]# ifconfig
eth2 Link encap:Ethernet HWaddr 00:50:56:21:B6:A8
inet addr:155.35.245.220 Bcast:155.35.245.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fe21:b6a8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:432695 errors:0 dropped:0 overruns:0 frame:0
TX packets:3238 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:46333929 (44.1 MiB) TX bytes:926017 (904.3 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:31 errors:0 dropped:0 overruns:0 frame:0
TX packets:31 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2012 (1.9 KiB) TX bytes:2012 (1.9 KiB)
[root@rhel65_3 ~]#
Then run the tcpdump command on that interface, as below:
[root@rhel65_1 Desktop]# tcpdump -i eth2 -s 65535 -w watopsnetworkcapture.pcap
tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
^C244 packets captured
248 packets received by filter
0 packets dropped by kernel
[root@rhel65_1 Desktop]#
- web agent logs and trace (note: if there is an agent initialization issue, these logs will most likely not be created, because they are created only at the end of the init process; it is still worth configuring them just in case)
At a minimum, enable the following profiler for the web agent trace:
components: AgentFramework, HTTPAgent, WebAgent, Agent_Con_Manager
data: Date, Time, Pid, Tid, SrcFile, Function, ResponseTime, IPAddr, IPPort, AgentName, Resource, User, Threshold, Throughput, MinThroughput, MaxThroughput, HandleCount, BusyHandleCount, FreeHandleCount, State, ClusterID, Message
Solaris/AIX :
All the logs listed for Linux, except the strace logs, also apply to Solaris/AIX based systems.
The Solaris equivalent of Linux's strace is truss.
You can capture truss output from the web server startup as below:
truss -a -e -f -D -l -o /tmp/truss.out -rall -wall <command to start webserver>
Windows:
- On Windows, the equivalent of strace is a Process Monitor (procmon.exe, from Windows Sysinternals) log.
- tcpdump can be replaced with a Wireshark network capture.
- The Event Viewer logs are also helpful.
Common Issues :
Problem : The policy server smps.log shows the following error :
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2035][ERROR][sm-Tunnel-00010] Bad security handshake attempt. Handshake error: 3154
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2046][ERROR][sm-Tunnel-00050] Handshake error: Shared secret incorrect for this client
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2207][ERROR][sm-Server-01070] Failed handshake with 155.35.245.220:50711
The web server log (Apache for our test) shows agent initialization errors:
[04/Sep/2017:23:41:45] [Error] SiteMinder Agent
Unable to load SiteMinder host configuration object or host configuration file.
/opt/CA/webagent/config/SmHost.conf
06 00 00 00
[04/Sep/2017:23:41:45] [Error] SiteMinder Agent
Failed to initialize the configuration manager.
LLAWP unable to get configuration, exiting.
nm: '/etc/httpd/bin/httpd': No such file
No Agent Log/Trace is created.
Cause :
If no changes have been made on either the policy server side or the agent side since the last working state, this error indicates a possible change in the hostid (on UNIX-based systems) on the web agent side.
On all non-Windows platforms, the agent code that encrypts and decrypts the shared secret uses a key derived from a hard-coded value (Web Agent Host Key) combined with the result of calling gethostid() on the platform in question. gethostid() is a standard C library function that returns a 32-bit host identifier (as a long).
Different UNIX systems implement this function differently. For example, the system implementation of gethostid() is not the same on Linux, AIX, and Solaris.
As a result, the SiteMinder web agent might not be able to decrypt a shared secret generated on one UNIX system when it is moved to another system. Moreover, if the host ID of the same system changes (due to a change in IP address, hostname, etc.), the web agent will not be able to decrypt the shared secret even though it was originally generated on that system.
Testing :
Set:
- hostname = rhel65_1.ca.com
- IP = 192.168.0.6 (in the hosts file)
The hostid command returns a8c00600.
The agent initializes fine, and the three-way agent to policy server handshake succeeds.
Now change the IP address to 192.168.0.7 in the hosts file, with everything else remaining the same.
This time the hostid command returns a different result: a8c00700.
The three-way agent to policy server handshake now fails with the following error in smps.log :
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2035][ERROR][sm-Tunnel-00010] Bad security handshake attempt. Handshake error: 3154
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2046][ERROR][sm-Tunnel-00050] Handshake error: Shared secret incorrect for this client
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2207][ERROR][sm-Server-01070] Failed handshake with 155.35.245.220:50711
Also note that, in the network capture, the encrypted data preceding the trusted host name is now different.
If the combination of shared secret, trusted host name, and hostid is the same, the encrypted data should remain the same.
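The two hostid values seen in this test can be reproduced by hand. On Linux with glibc, when no /etc/hostid file exists, gethostid() is understood to resolve the system hostname to an IPv4 address and swap the two 16-bit halves of that address as it is stored on a little-endian machine. The sketch below performs the same rearrangement. hostid_from_ipv4 is a hypothetical helper, and the little-endian (x86) assumption is built in.

```shell
# hostid_from_ipv4 A.B.C.D
# Print the hostid that a little-endian glibc system would derive from the
# IPv4 address A.B.C.D when /etc/hostid is absent. Swapping the 16-bit
# halves of the stored address yields the byte order B A D C.
hostid_from_ipv4() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    printf '%02x%02x%02x%02x\n' "$2" "$1" "$4" "$3"
}
```

hostid_from_ipv4 192.168.0.6 prints a8c00600 and hostid_from_ipv4 192.168.0.7 prints a8c00700, matching the hostid output before and after the hosts file change.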
Resolution :
As shown above, a simple change in the IP address changed the hostid on the RHEL system, which in turn invalidated the shared secret stored in SmHost.conf. Other factors can also change the hostid, depending on the platform in use.
The only ways to fix this issue are to re-register the trusted host or to revert to the previous hostid (in this case, by reverting to the previous IP address).
smreghost -i <policyserver_ip>:44441,44442,44443 -u "siteminder" -p <siteminder super user password> -hn <trustedhost> -hc <hco> -cf COMPAT -f <Path_To_SmHost.conf> -o
For example:
[root@rhl65 bin]# pwd
/opt/CA/webagent/bin
[root@rhl65 bin]# smreghost -i shruj01-i1849.ca.com:44441,44442,44443 -u "siteminder" -p "siteminder" -hn th-rhel65-5 -hc hco -cf COMPAT -f /opt/CA/webagent/config/SmHost.conf -o
Host Registration written to '/opt/CA/webagent/config/SmHost.conf'.
[root@rhl65 bin]#
If the web server IP address is expected to change frequently (due to reboots, changes in the network interface, DNS server, etc.), it is recommended to set a static hostid.
On RHEL you can do this by running the genhostid command. The static hostid is then stored in the /etc/hostid file.
(Please refer to your OS documentation on how to set a static hostid.)
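Whether or not a static hostid is used, it can help to record the hostid when the trusted host is registered and compare it before each web server start. The following is a minimal sketch: hostid_matches and the record file are hypothetical and not part of the web agent, and it assumes the record was created with "hostid > <record file>" right after running smreghost.

```shell
# hostid_matches RECORD_FILE
# Succeed (exit status 0) when the current hostid equals the value saved
# in RECORD_FILE, fail otherwise or when the file cannot be read.
hostid_matches() {
    [ -r "$1" ] && [ "$(hostid)" = "$(cat "$1")" ]
}
```

If hostid_matches fails for the record file, the shared secret in SmHost.conf can no longer be decrypted and the trusted host needs to be re-registered.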
To be continued....