Release Automation


Execution server not accepting agent connections from old or new agents

  • 1.  Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 07, 2018 10:28 AM

    I have a server that has been closing network connections since last weekend's VMware suspensions.

    Previously, when the servers were rebooted, I would just restart my agent servers and all would be well.

    This time, when the app server ExecutionServer was suspended, the software running on it did not recover.

    In an effort to fix the issue, I ran ipconfig /flushdns on the app server and the client.

    After running flushdns, before rebooting the agent (current state of the agent on AGENTHOST):
    2018-05-02 12:43:23,964 [KeepAliveWorker-5647] WARN (com.nolio.nimi.comm.impl.NetworkConnectionManagerImpl:281) - could not create connection to [ExecutionServer/OLDIPADDRESSOFExecutionSERVER:6600] :
    java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    (stack continues)

    After running flushdns and rebooting the agent (current state of the agent on AGENTHOST):

    2018-05-02 13:35:54,930 [PeriodicChecks] INFO (com.nolio.nimi.appmsg.durability.PeriodicChecks:89) - Connected supernodes (1) : [NodeInfo{nodeId=nid:es_ExecutionServer, hostname='ExecutionServer', addresses=[ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600, /fe80:0:0:0:0:74ff:fef5:2510%12:6600], nodeType=SUPERNODE, version=6.4.0.10011}]

    2018-05-02 13:35:55,648 [Timed-out Connections Killer] INFO (com.nolio.nimi.comm.impl.NetworkConnectionManagerImpl:660) - Closing not used connection NimiConnectionImpl{remoteAddress=ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600, localAddress=0.0.0.0/0.0.0.0:6600, connectionID=nid:es_ExecutionServer, channel=[id: 0x00600923, /10.242.58.155:59550 => ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600], closed=false, lastAccessedTime=1525282435543}
    now=Wed May 02 13:35:55 EDT 2018, last access time=Wed May 02 13:33:55 EDT 2018

    2018-05-02 13:35:55,648 [Timed-out Connections Killer] INFO (com.nolio.nimi.comm.impl.NimiConnectionImpl:133) - connection [NimiConnectionImpl{remoteAddress=ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600, localAddress=0.0.0.0/0.0.0.0:6600, connectionID=nid:es_ExecutionServer, channel=[id: 0x00600923, /10.242.58.155:59550 => ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600], closed=true, lastAccessedTime=1525282435543}] is closed.

    2018-05-02 13:35:55,648 [Timed-out Connections Killer] DEBUG (com.nolio.nimi.comm.impl.nettysupport.BasicHandler:101) - ActiveHandshakeHandler disconnected CONNECTED ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600 closing the connection.
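The idle time behind the "Closing not used connection" entry can be checked from the log itself: read as milliseconds since the Unix epoch, lastAccessedTime=1525282435543 is 13:33:55 EDT, which matches the logged "last access time", so the connection was closed after about 120 seconds without traffic. A minimal Python sketch of that arithmetic, assuming the millisecond reading and a fixed UTC-4 offset for EDT (idle_seconds is an illustrative name, not part of Release Automation):

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-4 offset, assumed from the "EDT" suffix in the log lines.
EDT = timezone(timedelta(hours=-4), "EDT")


def idle_seconds(last_accessed_ms: int, log_timestamp: str) -> float:
    """Seconds between lastAccessedTime and the log line's own timestamp.

    last_accessed_ms: lastAccessedTime, read as epoch milliseconds.
    log_timestamp: log prefix such as "2018-05-02 13:35:55,648" (local EDT).
    """
    last = datetime.fromtimestamp(last_accessed_ms / 1000, tz=EDT)
    now = datetime.strptime(log_timestamp, "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=EDT)
    return (now - last).total_seconds()


if __name__ == "__main__":
    last = datetime.fromtimestamp(1525282435543 / 1000, tz=EDT)
    print(last.strftime("%a %b %d %H:%M:%S %Z %Y"))  # Wed May 02 13:33:55 EDT 2018
    print(round(idle_seconds(1525282435543, "2018-05-02 13:35:55,648"), 1))  # 120.1
```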


    I have used Automation Studio to remove the last agent and tried to reinstall it, but the app server continues to reject calls to it.

    2018-05-03 21:45:43,159 [New I/O server worker #1-2] ERROR (com.nolio.nimi.comm.impl.nettysupport.BasicHandler:57) - NimiConnectionImpl{remoteAddress=null, localAddress=null, connectionID=null, channel=null, closed=false, lastAccessedTime=1525364083159}:java.io.IOException: An existing connection was forcibly closed by the remote host
    java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
    (stack continues)


    I requested a whole new image and installed the agent on it.

    2018-05-04 19:00:45,579 [PeriodicChecks] INFO (com.nolio.nimi.appmsg.durability.PeriodicChecks:87) - This node info : NodeInfo{nodeId=nid:agenthostname, hostname='agenthostname', addresses=[/AGENTIP:6600, /fe80:0:0:0:0:74ff:fef5:247a%12:6600], nodeType=NODE, version=6.4.0.10011}
    2018-05-04 19:00:45,579 [PeriodicChecks] INFO (com.nolio.nimi.appmsg.durability.PeriodicChecks:89) - Connected supernodes (1) : [NodeInfo{nodeId=nid:es_ExecutionServer, hostname='ExecutionServer', addresses=[ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600, /fe80:0:0:0:0:74ff:fef5:2510%12:6600], nodeType=SUPERNODE, version=6.4.0.10011}]
    2018-05-04 19:01:46,407 [New I/O client worker #1-1] DEBUG (com.nolio.nimi.comm.impl.nettysupport.BasicHandler:101) - ActiveHandshakeHandler disconnected CONNECTED ExecutionServer/NEWIPADDRESSOFExecutionSERVER:6600 closing the connection.
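To compare what an agent believes about the Execution Server across excerpts like these, the addresses=[...] list in a NodeInfo line can be pulled out and classified. A sketch in Python, assuming the entry format shown above (host/address:port, with an optional %zone on IPv6 entries); node_addresses is a hypothetical helper, and sanitized placeholders such as NEWIPADDRESSOFExecutionSERVER come back as "unparsed":

```python
import ipaddress
import re
from typing import List, Tuple

# The "addresses=[...]" list inside a NodeInfo{...} log entry.
ADDRESSES_RE = re.compile(r"addresses=\[([^\]]*)\]")


def node_addresses(log_line: str) -> List[Tuple[str, int, str]]:
    """Return (address, port, kind) for each entry of the first addresses list."""
    match = ADDRESSES_RE.search(log_line)
    if not match:
        return []
    result = []
    for entry in match.group(1).split(","):
        if not entry.strip():
            continue  # tolerate an empty list "addresses=[]"
        # Entries look like "ExecutionServer/10.0.0.5:6600" or "/fe80:...%12:6600".
        address_port = entry.strip().split("/", 1)[-1]
        address, _, port = address_port.rpartition(":")
        host = address.split("%", 1)[0]  # drop an IPv6 zone index such as %12
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            kind = "unparsed"  # e.g. a sanitized placeholder, not a real IP
        else:
            if ip.version == 6 and ip.is_link_local:
                kind = "ipv6-link-local"
            else:
                kind = f"ipv{ip.version}"
        result.append((address, int(port), kind))
    return result
```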


    So I would say that the problem is with the old (pre-suspension) server ExecutionServer, since even a completely new agent install cannot register with the Execution Server.


    I have cleared this folder on both the agent and the Execution Server:
    CA\ReleaseAutomationAgent\persistency

    I have cleared this folder on the Execution server:
    CA\ReleaseAutomationServer\activemq-data

    I have used the ASAP client to delete agents, hoping they would re-register, but they will not.

    I can telnet from the agent to the ExecutionServer, so maybe it is not the network?

    Telnet from my ExecutionServer to the agent:
    DHello! NiMi here. Node type: NODE You come from /NEWIPADDRESSOFExecutionSERVER:61615

    Telnet from the agent to the Execution Server:
    IHello! NiMi here. Node type: SUPER_NODE You come from /AGENTIP:63948
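The same check as these telnet sessions can be scripted. This Python sketch assumes, based only on the output above, that the NiMi port sends its greeting as soon as a client connects and that the greeting arrives in a single read; parse_banner and probe are hypothetical helper names, and port 6600 is taken from the logs:

```python
import re
import socket
from typing import Dict, Optional

# Greeting format as seen in the telnet output; the stray leading character
# (the "D" / "I" before "Hello!") is skipped because search() is unanchored.
BANNER_RE = re.compile(r"Node type: (\S+) You come from /?(\S+)")


def parse_banner(banner: str) -> Optional[Dict[str, str]]:
    """Extract the peer's node type and the address it saw us connect from."""
    match = BANNER_RE.search(banner)
    if not match:
        return None
    return {"node_type": match.group(1), "seen_as": match.group(2)}


def probe(host: str, port: int = 6600, timeout: float = 5.0) -> Optional[Dict[str, str]]:
    """Connect like telnet does, read the first chunk the peer sends, and parse it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(1024)
    # latin-1 maps every byte to a character, so non-text bytes cannot raise.
    return parse_banner(data.decode("latin-1"))
```

The seen_as value is the source address the peer observed, so it shows which local address a connection actually left from; that matters when a machine has several NICs or both IPv4 and IPv6 enabled.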

    On the ExecutionServer Home screen, it shows ALL of my agents offline with OLD IP addresses, and it will not let anything new register.

    My question to the Release Automation team is:
    What is the proper way to clean out an agent from the ExecutionServer?
    Since I can telnet, there is no firewall; what else would stop the ExecutionServer from accepting requests?



  • 2.  Re: Execution server not accepting agent connections from old or new agents

    Posted May 07, 2018 12:00 PM

    Curious: do we know what TTL is set on the DNS records that are being changed/updated?



  • 3.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 07, 2018 12:06 PM

    I will try to find that out.



  • 4.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 07, 2018 03:48 PM

    Jeremy,

      Also, can you think of anywhere the Execution Server would cache its own IP?

      What can I clean out of the server folders?

     

    Bill



  • 5.  Re: Execution server not accepting agent connections from old or new agents

    Posted May 07, 2018 04:06 PM

    The only places I am aware of where the IP is stored with regard to NiMi are the servers (agents) table and the exec_servers (execution servers) table. If I could get a look at this lab environment, that may be of assistance.
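If rows from the servers and exec_servers tables are exported as hostname/IP pairs, a quick comparison against current name resolution shows which recorded addresses are stale. This is a sketch only: the actual column names of those tables are not given in this thread, so the row shape and the resolve callback are hypothetical, and socket.gethostbyname returns a single IPv4 address, so multi-homed hosts would need a richer check:

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple


def resolve_ipv4(hostname: str) -> Optional[str]:
    """Current IPv4 address for hostname per the local resolver, or None."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def find_stale_entries(
    rows: Iterable[Tuple[str, str]],
    resolve: Callable[[str], Optional[str]] = resolve_ipv4,
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (hostname, recorded_ip, current_ip) for every row that no longer matches."""
    stale = []
    for hostname, recorded_ip in rows:
        current_ip = resolve(hostname)
        if current_ip != recorded_ip:
            stale.append((hostname, recorded_ip, current_ip))
    return stale
```

Passing the resolver in as a parameter keeps the comparison testable without DNS and lets you swap in a lookup against a specific server.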


    Jeremy



  • 6.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 09, 2018 08:29 PM

    Jeremy,

      Ping me on Skype; I work 6 am to 2 pm Central.

     

    Bill



  • 7.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 11, 2018 09:13 AM

    Jeremy,

      Your solution of "I re-added the execution server using the loopback interface" seems to have allowed my agents to reconnect.

     

       If you can expand on what that is exactly, I will happily click on correct answer!

     

    Bill Patton



  • 8.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 07, 2018 02:59 PM

    Hi Jeremy! Hope you're well. I have been working with Bill on any possible networking/infrastructure issues that could be causing the agents to fail to connect to the app server, but as far as I can tell, everything seems OK. Everything is resolvable and pingable between the agent and app servers, and there are no bad entries in Infoblox. We are leaning toward there being an issue with the Release Automation application. If you can think of anything, let us know, and if all three of us need to get together on a WebEx to troubleshoot, I will make myself available. Thanks for your input.

    -John



  • 9.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 09, 2018 01:48 AM

    Hi Bill,

     

    First, please check whether the <supernode> tag in <RA Agent>/conf/nimi_config.xml has the correct IP address of the NES.
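One way to check this without eyeballing the file is to list every value under the <supernode> element and flag whether it is an IP address or a hostname. The exact layout of nimi_config.xml is not shown in this thread, so this Python sketch assumes only that a <supernode> element exists without an XML namespace, and reports its text and attribute values (bare numbers, presumably ports, are skipped); supernode_values and classify are hypothetical names:

```python
import ipaddress
import xml.etree.ElementTree as ET
from typing import List, Tuple


def classify(value: str) -> str:
    """Label a value 'ip' or 'hostname', ignoring a trailing :port."""
    host = value.strip()
    if host.count(":") == 1:  # "host:port" or "IPv4:port" (bare IPv6 has more colons)
        host = host.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return "ip"
    except ValueError:
        return "hostname"


def supernode_values(xml_text: str) -> List[Tuple[str, str]]:
    """(value, kind) for each text/attribute value inside any <supernode> element."""
    root = ET.fromstring(xml_text)
    found = []
    for supernode in root.iter("supernode"):
        for element in supernode.iter():  # the element itself and all descendants
            values = list(element.attrib.values())
            if element.text and element.text.strip():
                values.append(element.text.strip())
            for value in values:
                if value.strip().isdigit():
                    continue  # a bare number is most likely a port
                found.append((value.strip(), classify(value)))
    return found
```

Pass it the text of <RA Agent>/conf/nimi_config.xml; any "hostname" result means the agent depends on DNS to find the NES.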

     

    IP addresses are cached in the persistency folder, so your steps seem correct.

    I believe this situation can generally be recovered with the following steps, so please retry them.

    1. Remove the problematic agents in [Agent Management] using Automation Studio.
    2. Stop the NAC, NES and agents.
    3. Remove the persistency folder on all machines. If the NAC and NES also have an agent, remove not only <RA Server>/persistency but also the <RA Server>/NolioAgent/persistency folder.
    4. Start the NAC and NES and wait for them to start up.
    5. Open Automation Studio and navigate to [Agent Management].
      At this stage, the problematic agent is still not shown.
    6. Start Agent, and then run auto-registration.
    7. Wait for a minute, and refresh the agent list.
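Step 3 is the one most easily done incompletely when an agent is also installed on the NAC/NES machines. A Python sketch that removes the persistency folder under each install root you pass in, defaulting to a dry run; it assumes the folder is literally named persistency directly under each root (as in CA\ReleaseAutomationAgent\persistency earlier in the thread), and it must only be run after step 2 has stopped every service:

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def clear_persistency(install_roots: Iterable[str], dry_run: bool = True) -> List[Path]:
    """Delete the 'persistency' folder under each RA install root (step 3).

    Run only after the NAC, NES and agents are stopped (step 2). With
    dry_run=True (the default) nothing is deleted; the folders that would
    be removed are returned so they can be reviewed first.
    """
    targets = []
    for root in install_roots:
        folder = Path(root) / "persistency"
        if folder.is_dir():
            targets.append(folder)
            if not dry_run:
                shutil.rmtree(folder)
    return targets
```

For a server that also hosts an agent, the roots would be <RA Server> and <RA Server>/NolioAgent; on a plain agent machine, just <RA Agent>.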

     

    If these steps do not resolve your problem, I'd like to ask you to try the following:

    - Disable IPv6 on the machines.

    - If the machines have several NICs, enable only the NIC used for communication among the RA machines.
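Before disabling IPv6 or NICs, it can help to see which addresses a machine's name actually resolves to, since the NodeInfo log lines earlier in the thread advertise an IPv6 link-local address (fe80:...) alongside the IPv4 one. A Python sketch using socket.getaddrinfo; note that this reflects name resolution (DNS and hosts file), not the full list of configured NICs, for which ipconfig remains the tool (addresses_by_family is a hypothetical name):

```python
import socket
from typing import Dict, List


def addresses_by_family(hostname: str) -> Dict[str, List[str]]:
    """Group the addresses a hostname resolves to into IPv4 and IPv6 buckets."""
    groups: Dict[str, List[str]] = {"ipv4": [], "ipv6": []}
    for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
        ip = sockaddr[0]
        key = "ipv4" if family == socket.AF_INET else "ipv6"
        if ip not in groups[key]:  # getaddrinfo repeats addresses per socket type
            groups[key].append(ip)
    return groups


if __name__ == "__main__":
    print(addresses_by_family(socket.gethostname()))
```

More than one IPv4 entry for the machine's own name suggests that other RA machines may resolve it to an address on the wrong NIC.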

     

    I hope it helps you.

     

    Regards

    Yas



  • 10.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 09, 2018 07:02 PM

    Yas,

       I am having trouble with step 6

    Start Agent, and then run auto-registration.

       I cannot find where this is documented.

       I looked in the agent bin directory, nothing there.

       Can you tell me how to do this?

     

    Bill Patton



  • 11.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 09, 2018 10:25 PM

    Hi Bill,

     

    Step 6 means that you only need to start the agent service. The agent will then connect to the NES, and the NES will update the management database on the NAC. The persistency folder is recreated on the agent automatically during this process.

     

    Are there any files under the <RA Agent>/bin folder?

    Do you mean that you tried to install the agent remotely?

    My steps assume that the agent is already installed locally (because you wrote that you installed the agent from the install media).

     

    If the remote install failed, please download nolio_agent_<OS>_<Version>.sh or .exe from <Install Media>/Agents or from the <RA Server>/scripts folder on the NAC, then run the installation.

    Further instructions are in:

    Deploy Agents - CA Release Automation - 6.5 - CA Technologies Documentation 

     

    By the way, was nimi_config.xml correct on the agent?

     

    Thanks

    Yas



  • 12.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 10, 2018 07:31 AM

    Yas,

        I just needed clarification on the agent registration.

        The nimi_config.xml files all have hostnames.

        I cannot disable IPv6. New LOD images will have it enabled and I don't think this is the problem.

     

    Bill



  • 13.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 11, 2018 12:20 AM

    Hi Bill,

     

    Were you able to install the agent properly?

    We need to know whether your problem is with installation or with communication.

     

    If it is a communication problem, it may be caused by DNS. Please use the IPv4 address instead of the hostname for the supernode in nimi_config.xml.

    Also, please try "nslookup <New IP address>" in both directions between the NES and the agent.
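A scripted equivalent of that cross-check, for running on both the NES and the agent. This Python sketch uses the operating system resolver (socket.gethostbyname_ex and socket.gethostbyaddr), which, unlike nslookup, also consults the hosts file and local cache, so a disagreement between this and nslookup is itself a useful clue; dns_round_trip is a hypothetical helper name:

```python
import socket
from typing import Dict, List, Optional, Union


def dns_round_trip(hostname: str, expected_ip: str) -> Dict[str, Union[List[str], bool, Optional[str]]]:
    """Forward- and reverse-resolve, roughly what nslookup checks by hand."""
    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        forward_ips = []
    try:
        reverse_name: Optional[str] = socket.gethostbyaddr(expected_ip)[0]
    except OSError:
        reverse_name = None
    return {
        "forward_ips": forward_ips,
        "forward_ok": expected_ip in forward_ips,
        "reverse_name": reverse_name,
    }
```

A forward_ok of False, or a reverse_name that points at a different host, would indicate the old address is still being handed out somewhere.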

     

    Thanks

    Yas



  • 14.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 22, 2018 10:35 PM

    Hi,

     

    Did your problem resolve?

    If you need additional help, please let us know.

    If you resolved it, please share your solution or mark the correct answer on this thread.

     

    Thanks

    Yas



  • 15.  Re: Execution server not accepting agent connections from old or new agents

    Broadcom Employee
    Posted May 23, 2018 12:22 PM

    Yasuyuki,

      I am happy to mark this answered. As I told Jeremy above, can you get him to tell us more about this resolution?

     

    Jeremy,

      Your solution of "I re-added the execution server using the loopback interface" seems to have allowed my agents to reconnect.

     

       If you can expand on what that is exactly, I will happily click on correct answer!

     

    Bill Patton



  • 16.  Re: Execution server not accepting agent connections from old or new agents
    Best Answer

    Posted May 23, 2018 12:33 PM

    Hey Bill,

     

    From what I could tell (and remember), the problem was not so much that the agents were not reconnecting to the execution server as that the execution server was not connected to the management server, and you will not see any agents connected to a down or missing execution server. I know that when I use the external interface to connect locally, I sometimes run into what seem like routing issues. There is also the case of designing templates for different Nolio environments, where I choose the loopback interface to avoid a lot of scripting to update the IP address and other settings.