I have a server that has been closing network connections since last weekend's VMware suspensions.
Previously, when the servers were rebooted, I would just restart my agent servers and all would be well.
Now, after the app server ExecutionServer was suspended, the software running on it did not recover.
In an effort to fix the issue, I ran ipconfig /flushdns on the app server and the client.
After running flushdns, before rebooting the agent (current state of the agent on AGENTHOST):
2018-05-02 12:43:23,964 [KeepAliveWorker-5647] WARN (com.nolio.nimi.comm.impl.NetworkConnectionManagerImpl:281) - could not create connection to [ExecutionServer/OLDIPADDRESSSOFExecutionSERVER:6600] :
java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
(stack continues)
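To confirm which address a running agent is still dialing, the failing target can be pulled out of warning lines like the one above. A minimal Python sketch, assuming the `host/ip:port` text inside the brackets follows Java's usual `InetSocketAddress` formatting; `failed_targets` is a hypothetical helper name, not part of the product:

```python
import re
from typing import Iterable, List, Tuple

# Matches "could not create connection to [host/ip:port]". The greedy IP group
# backtracks to the LAST colon, so IPv6 literals such as fe80:...%12 also work.
FAILED_TARGET = re.compile(r"could not create connection to \[([^/\]]*)/([^\]]+):(\d+)\]")


def failed_targets(log_lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (hostname, ip, port) for every failed-connection warning."""
    targets = []
    for line in log_lines:
        match = FAILED_TARGET.search(line)
        if match:
            host, ip, port = match.group(1), match.group(2), int(match.group(3))
            targets.append((host, ip, port))
    return targets
```

Running it over the agent log before and after the agent restart would show whether the agent was still dialing the pre-suspension address; the log file's location is not given in this post.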
After running flushdns and rebooting the agent (current state of the agent on AGENTHOST):
2018-05-02 13:35:54,930 [PeriodicChecks] INFO (com.nolio.nimi.appmsg.durability.PeriodicChecks:89) - Connected supernodes (1) : [NodeInfo{nodeId=nid:es_ExecutionServer, hostname='ExecutionServer', addresses=[ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600, /fe80:0:0:0:0:74ff:fef5:2510%12:6600], nodeType=SUPERNODE, version=6.4.0.10011}]
2018-05-02 13:35:55,648 [Timed-out Connections Killer] INFO (com.nolio.nimi.comm.impl.NetworkConnectionManagerImpl:660) - Closing not used connection NimiConnectionImpl{remoteAddress=ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600, localAddress=0.0.0.0/0.0.0.0:6600, connectionID=nid:es_ExecutionServer, channel=[id: 0x00600923, /10.242.58.155:59550 => ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600], closed=false, lastAccessedTime=1525282435543}
now=Wed May 02 13:35:55 EDT 2018, last access time=Wed May 02 13:33:55 EDT 2018
2018-05-02 13:35:55,648 [Timed-out Connections Killer] INFO (com.nolio.nimi.comm.impl.NimiConnectionImpl:133) - connection [NimiConnectionImpl{remoteAddress=ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600, localAddress=0.0.0.0/0.0.0.0:6600, connectionID=nid:es_ExecutionServer, channel=[id: 0x00600923, /10.242.58.155:59550 => ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600], closed=true, lastAccessedTime=1525282435543}] is closed.
2018-05-02 13:35:55,648 [Timed-out Connections Killer] DEBUG (com.nolio.nimi.comm.impl.nettysupport.BasicHandler:101) - ActiveHandshakeHandler disconnected CONNECTED ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600 closing the connection.
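The `lastAccessedTime` value in the "Closing not used connection" entry appears to be Java epoch milliseconds (an assumption, but it matches the "last access time" printed on the following line). Converting it shows how long the connection sat idle before the "Timed-out Connections Killer" closed it. A small sketch, assuming the log's EDT timestamps are a fixed UTC-4 offset:

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC-4 offset for the "EDT" timestamps in the log.
EDT = timezone(timedelta(hours=-4), "EDT")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware datetime using exact integer math."""
    return EPOCH + timedelta(milliseconds=ms)


# Values copied from the "Closing not used connection" entry above.
last_access = from_epoch_ms(1525282435543)
closed_at = datetime(2018, 5, 2, 13, 35, 55, 648000, tzinfo=EDT)  # log line timestamp

last_access_local = last_access.astimezone(EDT)
idle = closed_at - last_access

print(last_access_local)  # 2018-05-02 13:33:55.543000-04:00
print(idle)               # 0:02:00.105000
```

The connection had been idle for about two minutes when it was closed, which suggests (though a single entry cannot prove it) that this particular message is routine idle cleanup rather than the failure itself.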
I have used Automation Studio to remove the agent and then tried to reinstall it, but the app server continues to reject its calls.
2018-05-03 21:45:43,159 [New I/O server worker #1-2] ERROR (com.nolio.nimi.comm.impl.nettysupport.BasicHandler:57) - NimiConnectionImpl{remoteAddress=null, localAddress=null, connectionID=null, channel=null, closed=false, lastAccessedTime=1525364083159}:java.io.IOException: An existing connection was forcibly closed by the remote host
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
(stack continues)
I then requested a completely new image and installed the agent on it.
2018-05-04 19:00:45,579 [PeriodicChecks] INFO (com.nolio.nimi.appmsg.durability.PeriodicChecks:87) - This node info : NodeInfo{nodeId=nid:agenthostname, hostname='agenthostname', addresses=[/AGENTIP:6600, /fe80:0:0:0:0:74ff:fef5:247a%12:6600], nodeType=NODE, version=6.4.0.10011}
2018-05-04 19:00:45,579 [PeriodicChecks] INFO (com.nolio.nimi.appmsg.durability.PeriodicChecks:89) - Connected supernodes (1) : [NodeInfo{nodeId=nid:es_ExecutionServer, hostname='ExecutionServer', addresses=[ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600, /fe80:0:0:0:0:74ff:fef5:2510%12:6600], nodeType=SUPERNODE, version=6.4.0.10011}]
2018-05-04 19:01:46,407 [New I/O client worker #1-1] DEBUG (com.nolio.nimi.comm.impl.nettysupport.BasicHandler:101) - ActiveHandshakeHandler disconnected CONNECTED ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600 closing the connection.
So I would say the problem lies with the ExecutionServer as it existed before the suspension, since even a completely new agent, freshly installed and registered to the Execution Server, gets disconnected the same way.
I have cleared this folder on both the agent and the Execution Server:
CA\ReleaseAutomationAgent\persistency
I have cleared this folder on the Execution Server:
CA\ReleaseAutomationServer\activemq-data
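If these folders need to be cleared again, moving their contents aside instead of deleting them keeps a way back. A minimal sketch, assuming the agent and Execution Server services are stopped first and that you supply the full install path yourself (the post gives only the `CA\...` part). `clear_with_backup` is a hypothetical helper, not something the product provides:

```python
import shutil
import time
from pathlib import Path


def clear_with_backup(folder: Path, backup_root: Path) -> Path:
    """Move everything inside `folder` into a new timestamped directory under
    `backup_root`, leaving `folder` itself in place but empty."""
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = backup_root / f"{folder.name}-{stamp}"
    # exist_ok=False: refuse to mix contents into an existing backup folder.
    backup.mkdir(parents=True, exist_ok=False)
    for entry in list(folder.iterdir()):
        shutil.move(str(entry), str(backup / entry.name))
    return backup
```

The folder itself is kept because the installed services presumably expect it to exist; only its contents are moved.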
I have used the ASAP client to delete the agents, hoping they would re-register, but they do not.
I can telnet in both directions between the agent and the ExecutionServer, so maybe it is not the network?
Telnet from the ExecutionServer to the agent:
DHello! NiMi here. Node type: NODE You come from /NEWIPADDRESSOFExecutionSERVER:61615
Telnet from the agent to the ExecutionServer:
IHello! NiMi here. Node type: SUPER_NODE You come from /AGENTIP:63948
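The telnet check above can be scripted so it is easy to repeat from either host. A minimal sketch that opens a TCP connection to port 6600 and reads the greeting; the greeting format and the "You come from" terminator are taken from the output in this post, not from any documented protocol, and `probe_nimi` is a hypothetical name. Like telnet, it only proves that the port accepts connections and that the listener sends its greeting; it does not exercise the handshake that the logs show being disconnected.

```python
import re
import socket
from typing import Optional

# Node type as it appears in the greeting, e.g. "Node type: SUPER_NODE".
NODE_TYPE = re.compile(rb"Node type: (\w+)")


def probe_nimi(host: str, port: int = 6600, timeout: float = 5.0) -> Optional[str]:
    """Connect, read the greeting, and return the reported node type, or None.

    Raises OSError (including socket.timeout) if the port cannot be reached or
    the greeting does not arrive within `timeout` seconds."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until the part after the node type shows up, the peer closes,
        # or we have read more than any plausible greeting.
        while b"You come from" not in data and len(data) < 4096:
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
    match = NODE_TYPE.search(data)
    return match.group(1).decode("ascii") if match else None
```

From the agent, `probe_nimi("ExecutionServer")` would be expected to return `"SUPER_NODE"`, matching the second telnet session.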
The ExecutionServer home screen shows ALL of my agents as offline with their OLD IP addresses, and it will not let anything new register.
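To check whether the addresses shown on the home screen are simply stale, each agent's recorded IP can be compared with what DNS returns for its hostname now. A minimal sketch, assuming you copy the recorded IPs by hand from the home screen (no product API is used); the function names are hypothetical, and the check should run on the ExecutionServer, since its resolver is the one that matters. Note that `ipconfig /flushdns` clears only the Windows resolver cache, not anything a running Java process may have cached on its own.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return every IPv4 address the local resolver currently gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def recorded_ip_is_stale(hostname: str, recorded_ip: str,
                         resolver: Callable[[str], Set[str]] = resolve_ipv4) -> bool:
    """True if the IP recorded for an agent is no longer among its DNS answers."""
    return recorded_ip not in resolver(hostname)
```

The `resolver` parameter is there so the comparison can be tested without real DNS.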
My question to the ReleaseAutomation team is:
What is the proper way to clean out an agent from the ExecutionServer?
Since I can telnet, there is no firewall blocking the port, so what else would stop the ExecutionServer from accepting requests?