AppWorx, Dollar Universe and Sysload Community


frequent connection socket error

  • 1.  frequent connection socket error

    Posted 08-11-2020 06:55 PM

    I need some help: a lot of my agents are going into SRVC_DOWN or NO_SERVICE status.

    I tried troubleshooting by following https://knowledge.broadcom.com/external/article/87537/remote-agent-fails-to-start-receive-awe5.html

    awcomm is definitely unable to view the log, since the rmiserver shows the agent status as SRVC_DOWN or NO_SERVICE.

    I have checked $AW_HOME/data/net_conn.dat and confirmed that the information is correct.

    According to awstat and grepping for the processes, agentservice and watchworx have been running on the agent server for more than 22 days, but the rmiserver has shown the agent service status as SRVC_DOWN for the past 3 days.

    I can also see agentservice running in Windows Task Manager (I have both Unix and Windows agent servers).

    However, netstat returns no results. I only see results after bouncing agentservice with stopso/startso.
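A plain TCP connect probe is one way to check, independently of netstat, whether a listener is accepting connections. This is a minimal sketch, assuming the remote port recorded in the AwE-5103 log line (10010) is the one worth testing; which component listens on that port in a given setup is an assumption. `PortProbe` and `isReachable` are hypothetical names, not part of Application Manager, and the code needs Java 7 or later (the server's Java 8 would do, the bundled Java 6 would not).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper, not part of Application Manager.
public class PortProbe {

    // Returns true if a TCP connection to host:port is accepted within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host name could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PortProbe <host> [port]; 10010 is the port seen in the AwE-5103 log line.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10010;
        System.out.println(host + ":" + port + " reachable=" + isReachable(host, port, 5000));
    }
}
```

Running it periodically from the agent host while the rmiserver reports SRVC_DOWN would show whether the port stops answering or stays open while the session itself is gone.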

    Below is one of the many errors logged by multiple agents:

    ErrorMsg: AwE-5103 network socket error (8/8/20 1:49 AM)
    Details: 39385660[SSL_DH_anon_WITH_RC4_128_MD5: Socket[addr=XXXXXXX,port=10010,localport=45153]]
    java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(Unknown Source)
    at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
    at java.io.ObjectInputStream.<init>(Unknown Source)
    at com.appworx.shared.code.server.B.C(RequestSocket.java:115)
    at com.appworx.server.data.SocketManager$1.run(SocketManager.java:370)
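The bottom of the trace shows where the read fails: `java.io.ObjectInputStream`'s constructor reads the serialization stream header (`0xACED 0x0005`) immediately, so an `EOFException` from `readStreamHeader` means the stream ended before that header arrived, i.e. the other end closed the connection or sent nothing. It points at the connection being dropped (for example by the peer, or by something in between) rather than at corrupt data, but it does not say which side dropped it. The standalone sketch below reproduces the same exception with an in-memory stream; `EmptyStreamDemo` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;

public class EmptyStreamDemo {

    // Returns true if opening an ObjectInputStream over the given bytes fails with EOFException,
    // the same failure seen when a socket is closed before the stream header is sent.
    static boolean failsWithEof(byte[] received) throws IOException {
        try {
            new ObjectInputStream(new ByteArrayInputStream(received)).close();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Nothing received: the constructor hits end-of-stream while reading the header.
        System.out.println("empty stream -> EOFException: " + failsWithEof(new byte[0]));
        // 0xACED 0x0005 is the standard Java serialization stream header.
        byte[] header = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05};
        System.out.println("valid header -> EOFException: " + failsWithEof(header));
    }
}
```

The first line prints `true` and the second `false`.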


    My concern is to get a permanent fix or find the root cause, as the frequent downtime has impacted many jobs and left many agent services unable to process jobs. I can't keep logging in to restart the services. There are about 42 active agents on one rmiserver, and I have 6 rmiservers.



  • 2.  RE: frequent connection socket error

    Posted 08-12-2020 08:32 AM
    Hi Lian, 
    Sorry for the quick answer on this. I don't know the situation: whether this followed an upgrade or is a sudden issue of some sort.

    When I see SRVC_DOWN, it is usually on my remote agents and Banner agents, and it happens when I don't have the agent started.

    From experience, the Java awapi issue was a big problem for us. It turned out we had to change /dev/random to /dev/urandom in the java.security file of the Java being used. Once we did this, it started right away.
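To confirm which entropy source a JVM is actually configured with, its settings can be printed from inside that JVM. The `securerandom.source` security property (set in java.security) and the `java.security.egd` system property, which overrides it, are standard JDK settings. This is a sketch only: it should be run with the same JVM Application Manager uses (here the bundled 1.6.0_31 rather than the server's Java 8), because each JRE reads its own java.security file under `java.home`. It uses only Java 6 APIs; `EntropySourceCheck` is a hypothetical name.

```java
import java.security.Security;

// Hypothetical diagnostic: which seed source is this JVM configured to use?
public class EntropySourceCheck {

    // Value of securerandom.source from this JVM's java.security file (may be null).
    static String configuredSource() {
        return Security.getProperty("securerandom.source");
    }

    // True if seed material is configured to come from /dev/urandom, either via
    // the java.security.egd system property (which takes precedence) or java.security.
    static boolean usesUrandom() {
        String egd = System.getProperty("java.security.egd");
        String source = egd != null ? egd : configuredSource();
        return source != null && source.contains("urandom");
    }

    public static void main(String[] args) {
        System.out.println("java.home=" + System.getProperty("java.home"));
        System.out.println("securerandom.source=" + configuredSource());
        System.out.println("java.security.egd=" + System.getProperty("java.security.egd"));
        System.out.println("uses urandom: " + usesUrandom());
    }
}
```

The printed `java.home` shows which JRE's java.security file the other values came from.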

    We also saw a lot of these AwE-5103 errors until we got the user_keystore files working correctly. Depending on your Java level, you need this in place: Java 8 update 201 and higher. We are on V9.3.1, by the way.
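The log line shows the connection using `SSL_DH_anon_WITH_RC4_128_MD5`, an anonymous RC4 cipher suite, and current Java 8 updates list RC4 and anon suites in the default `jdk.tls.disabledAlgorithms` security property. That may be one reason the Java level matters here, though this is an assumption, not a confirmed cause. The sketch checks whether a given JVM still reports the logged suite as supported and prints its disabled-algorithms list; depending on the JDK, a blocked suite may be dropped from the supported list or only rejected during the handshake, so both are shown. `CipherSuiteCheck` is a hypothetical name; it uses only Java 6 APIs.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical diagnostic: does this JVM still offer the cipher suite seen in the AwE-5103 log?
public class CipherSuiteCheck {

    // Suite name copied from the log line "39385660[SSL_DH_anon_WITH_RC4_128_MD5: Socket[...]]".
    static final String LOGGED_SUITE = "SSL_DH_anon_WITH_RC4_128_MD5";

    // All cipher suites the default SSLContext reports as supported.
    static List<String> supportedSuites() throws NoSuchAlgorithmException {
        return Arrays.asList(SSLContext.getDefault().getSupportedSSLParameters().getCipherSuites());
    }

    static boolean isSupported(String suite) throws NoSuchAlgorithmException {
        return supportedSuites().contains(suite);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("java.version=" + System.getProperty("java.version"));
        System.out.println(LOGGED_SUITE + " supported: " + isSupported(LOGGED_SUITE));
        // Algorithms and suites this JVM refuses for TLS, as set in its java.security file.
        System.out.println("jdk.tls.disabledAlgorithms=" + Security.getProperty("jdk.tls.disabledAlgorithms"));
    }
}
```

Running it under both the bundled Java 6 and the server's Java 8 would show whether the two JVMs disagree about this suite.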

    I will look this over more later; I need to run. Sorry if I missed the mark on this; I only had a little time to look at it.

    Good luck, 

    Rich


  • 3.  RE: frequent connection socket error

    Posted 08-12-2020 09:15 AM

    There was no upgrade; I am still using v9.1.1.

    There have been no changes for almost a year. All of a sudden, this SRVC_DOWN error has started happening very frequently.

    The Java installed on the server is version 1.8.201. However, the Java used by Application Manager is still the one that came with the installer:

    java version "1.6.0_31"
    Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

    This doesn't seem like an agentservice problem. I agree this could be related to the keystore, but could it also be due to the network socket?

    The agent server is RHEL5 but the rmiserver is RHEL7. Could this be the root cause?

    Even though I can restore the service by restarting agentservice, this is definitely not a good solution, and it causes a lot of trouble for the job owners, as their jobs need to run 24x7.

    I have already opened case 32134542, but there is no solution so far. If this is a bug that requires an upgrade, I hope the support engineer can give me clear instructions; otherwise, I really need a permanent fix.

    I am also planning an upgrade to v9.3.2 in the near future.




  • 4.  RE: frequent connection socket error
    Best Answer

    Posted 08-13-2020 09:34 AM
    Hi everyone, 

    It is good that you opened a ticket for this. Do you have server debug on? I like turning it on in awenv.ini, in the default section:

        Debug=true

    You can also turn it on in the GUI through the Help/About/Debug options.

    Since this has been running with Java 6, I can't say much about that. We used Java 6 on AIX for a long time; since moving to Linux, we use Java 8. And it sounds like you didn't need to change /dev/random to /dev/urandom in the java.security file; that change is what fixed our Java awapi issue.

    I like to shut down, wipe all the logs, and start fresh. It makes it easy to find things. Also, copy off the logs before they get cluttered and zipped up.

    Sorry, this is all I have from my experience. I hope support helps.


    Thank you

    Rich 






  • 5.  RE: frequent connection socket error

    Posted 08-13-2020 10:21 AM
    Hi Richard,

    Thank you so much for your input.

    Yes, I already enabled the debug option in the awenv.ini file and restarted the rmiserver. I also enabled debug by checking it in the agent settings.

    I think you are right. I may need to clean up all the logs before starting the agent again in the future.

    Anyway, I submitted all the logs within the time range to support, but they may need some time to filter them, since AM created huge log files after I enabled agent debug.

    Besides, the issue randomly hits other agent servers, so following your suggestion I have now enabled debug to cover all the agents.

