Layer7 API Management

 View Only
  • 1.  One of the node in our 10 node cluster started behaving wierd

    Posted Sep 10, 2021 04:58 PM

    Today, one of the node in our cluster started behaving wierd.  SSG log had the following entries

    2021-09-10T09:31:36.934-0500 SEVERE 57 com.hazelcast.cluster.impl.operations.JoinCheckOperation: [10.202.78.163]:8777 [gateway] [3.6.1] Cannot send response: null to Address[10.202.78.252]:8777. Op: com.hazelcast.cluster.impl.operations.JoinCheckOperation{identityHash=1179682587, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=2054, invocationTime=1631283349820 (Fri Sep 10 09:15:49 CDT 2021), waitTimeout=-1, callTimeout=10000}
    com.hazelcast.spi.exception.ResponseNotSentException: Cannot send response: null to Address[10.202.78.252]:8777. Op: com.hazelcast.cluster.impl.operations.JoinCheckOperation{identityHash=1179682587, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=2054, invocationTime=1631283349820 (Fri Sep 10 09:15:49 CDT 2021), waitTimeout=-1, callTimeout=10000}
    at com.hazelcast.spi.impl.operationservice.impl.RemoteInvocationResponseHandler.sendResponse(RemoteInvocationResponseHandler.java:54)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.sendResponse(OperationRunnerImpl.java:278)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.handleResponse(OperationRunnerImpl.java:251)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:173)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:393)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:184)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:137)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:124)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:99)
    2021-09-10T09:31:53.849-0500 SEVERE 59 com.hazelcast.cluster.impl.operations.JoinCheckOperation: [10.202.78.163]:8777 [gateway] [3.6.1] Cannot send response: JoinMessage{packetVersion=4, buildNumber=20160228, address=Address[10.202.78.163]:8777, uuid='c275136f-d324-4903-9aa3-5697bc263dba', liteMember=false, memberCount=1, dataMemberCount=1} to Address[10.202.78.252]:8777. Op: com.hazelcast.cluster.impl.operations.JoinCheckOperation{identityHash=551967137, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=2055, invocationTime=1631283366748 (Fri Sep 10 09:16:06 CDT 2021), waitTimeout=-1, callTimeout=10000}
    com.hazelcast.spi.exception.ResponseNotSentException: Cannot send response: JoinMessage{packetVersion=4, buildNumber=20160228, address=Address[10.202.78.163]:8777, uuid='c275136f-d324-4903-9aa3-5697bc263dba', liteMember=false, memberCount=1, dataMemberCount=1} to Address[10.202.78.252]:8777. Op: com.hazelcast.cluster.impl.operations.JoinCheckOperation{identityHash=551967137, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=2055, invocationTime=1631283366748 (Fri Sep 10 09:16:06 CDT 2021), waitTimeout=-1, callTimeout=10000}


    These entries were happening intermittently causing this gateway node to perform extremely slow when compared to rest of the nodes.  

    Then started seeing Error allocating socker processor at 13:13:18
    2021-09-10T13:13:00.656-0500 SEVERE 2138 com.l7tech.server.SoapMessageProcessingServlet: RaiseErrorAssertion is stopping execution.
    2021-09-10T13:13:18.302-0500 SEVERE 151 org.apache.tomcat.util.net.JIoEndpoint: Error allocating socket processor
    java.lang.NullPointerException
    at org.apache.catalina.core.StandardThreadExecutor.execute(StandardThreadExecutor.java:150)
    at org.apache.tomcat.util.net.JIoEndpoint.processSocket(JIoEndpoint.java:806)
    at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:369)
    at java.lang.Thread.run(Thread.java:748)
    2021-09-10T13:47:44.278-0500 SEVERE 222 com.l7tech.se



    Noticed several instances of this error in the ssg log


    2021-09-10T13:52:20.053-0500 SEVERE 1643 org.apache.catalina.core.ContainerBase.[ssg].[node7.autozone.com].[/].[SoapMessageProcessingServlet]: Servlet.service() for servlet SoapMessageProcessingServlet threw exception
    java.lang.NullPointerException
    at com.l7tech.server.audit.AuditContextImpl.a(Unknown Source)
    at com.l7tech.server.audit.AuditContextFactory.doWithNewAuditContext(Unknown Source)
    at com.l7tech.server.SoapMessageProcessingServlet.service(Unknown Source)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.AppjlicationDispatcher.invoke(ApplicationDispatcher.java:646)
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
    at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:342)
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
    at com.l7tech.server.transport.http.HttpNamespaceFilter.doFilter(Unknown Source)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.l7tech.server.WsdlFilter.doFilter(Unknown Source)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.l7tech.server.transport.http.ConnectionIdFilter.doFilter(Unknown Source)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.l7tech.server.transport.http.InputTimeoutFilter.doFilter(Unknown Source)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.l7tech.server.log.HybridDiagnosticContextServletFilter.doFilter(Unknown Source)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:234)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:181)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at com.l7tech.server.tomcat.ResponseKillerValve.invoke(Unknown Source)
    at com.l7tech.server.tomcat.ConnectionIdValve.invoke(Unknown Source)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:295)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:610)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:410)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    followed by this error

    2021-09-10T13:52:30.108-0500 SEVERE 1 com.l7tech.server.policy.module.j: Module ApiPortalDataLensAssertion-9.2.00.aar: exception while notifying class com.l7tech.external.assertions.apiportaldatalens.listener.ModuleListener of module unload: java.lang.reflect.InvocationTargetException
    java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.l7tech.server.policy.module.j.onModuleUnloaded(Unknown Source)
    at com.l7tech.server.policy.module.j.a(Unknown Source)
    at com.l7tech.server.policy.module.j.close(Unknown Source)
    at com.l7tech.server.policy.module.ModularAssertionModule.close(Unknown Source)
    at com.l7tech.server.policy.module.ModularAssertionsScanner.onModuleUnloaded(Unknown Source)
    at com.l7tech.server.policy.module.ModularAssertionsScanner.onModuleUnload(Unknown Source)
    at com.l7tech.server.policy.module.ModularAssertionsScanner.onModuleUnload(Unknown Source)
    at com.l7tech.server.policy.module.ModulesScanner.destroy(Unknown Source)
    at com.l7tech.server.policy.module.ScheduledModuleScanner.destroy(Unknown Source)
    at com.l7tech.server.policy.module.ModularAssertionsScanner.destroy(Unknown Source)
    at com.l7tech.server.policy.ServerAssertionRegistry.destroy(Unknown Source)
    at org.springframework.beans.factory.support.DisposableBeanAdapter$1.run(DisposableBeanAdapter.java:178)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:176)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:487)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:463)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:431)
    at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1048)
    at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1022)
    at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:970)
    at com.l7tech.server.boot.GatewayBoot.destroy(Unknown Source)
    at com.l7tech.server.boot.GatewayBoot.runUntilShutdown(Unknown Source)
    at com.l7tech.server.boot.GatewayMain.main(Unknown Source)
    Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentSkipListMap.remove(ConcurrentSkipListMap.java:1991)
    at java.util.concurrent.ConcurrentSkipListSet.remove(ConcurrentSkipListSet.java:259)
    at com.l7tech.server.service.ServiceTemplateManagerImpl.unregister(Unknown Source)
    at com.l7tech.external.assertions.apiportaldatalens.listener.ModuleListener.unregisterServiceTemplates(ModuleListener.java:210)
    at com.l7tech.external.assertions.apiportaldatalens.listener.ModuleListener.onModuleUnloaded(ModuleListener.java:68)
    ... 27 more
    2021-09-10T14:04:28.562-0500 SEVERE 61 com.hazelcast.cluster.impl.operations.JoinCheckOperation: [10.202.78.163]:8777 [gateway] [3.6.1] Cannot send response: null to Address[10.202.78.252]:8777. Op: com.hazelcast.cluster.impl.operations.JoinCheckOperation{identityHash=1307830410, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=2805, invocationTime=1631299721458 (Fri Sep 10 13:48:41 CDT 2021), waitTimeout=-1, callTimeout=10000}
    com.hazelcast.spi.exception.ResponseNotSentException: Cannot send response: null to Address[10.202.78.252]:8777. Op: com.hazelcast.cluster.impl.operations.JoinCheckOperation{identityHash=1307830410, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=2805, invocationTime=1631299721458 (Fri Sep 10 13:48:41 CDT 2021), waitTimeout=-1, callTimeout=10000}
    at com.hazelcast.spi.impl.operationservice.impl.RemoteInvocationResponseHandler.sendResponse(RemoteInvocationResponseHandler.java:54)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.sendResponse(OperationRunnerImpl.java:278)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.handleResponse(OperationRunnerImpl.java:251)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:173)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:393)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:184)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:137)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:124)
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:99)
    2021-09-10T14:14:50.354-0500 SEVERE 161 org.apache.tomcat.util.net.JIoEndpoint: Error allocating socket processor
    java.lang.IllegalStateException: StandardThreadPool not started.
    at org.apache.catalina.core.StandardThreadExecutor.execute(StandardThreadExecutor.java:155)
    at org.apache.tomcat.util.net.JIoEndpoint.processSocket(JIoEndpoint.java:806)
    at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:369)
    at java.lang.Thread.run(Thread.java:748)


    Tried restarting the gateway using reboot option, then the gw status was showing as Starting for a very long time.

    By looking at the process controller log, I found this highlighted line repeating many times
    2021-09-10T15:00:20.115-0500 INFO 1 com.l7tech.server.processcontroller.q: default starting
    2021-09-10T15:00:25.260-0500 INFO 1 com.l7tech.server.processcontroller.ProcessController: Getting API port from /opt/SecureSpan/Gateway/node/default/var/processControllerPort

    then this line also repeated every 5 to 6 seconds many times
    2021-09-10T15:05:18.585-0500 INFO 1 com.l7tech.server.processcontroller.ProcessController: /opt/SecureSpan/Gateway/node/default/var/processControllerPort does not exist yet, will try default port

    Then finally it started, took over 11 mins and 36 seconds.  The longest i have seen is over 20mins on this node after reboot or restart.
    2021-09-10T15:11:28.185-0500 INFO 1 com.l7tech.server.processcontroller.ProcessController: /opt/SecureSpan/Gateway/node/default/var/processControllerPort does not exist yet, will try default port
    2021-09-10T15:11:33.281-0500 INFO 1 com.l7tech.server.processcontroller.ProcessController: Getting API port from /opt/SecureSpan/Gateway/node/default/var/processControllerPort
    2021-09-10T15:11:36.591-0500 INFO 1 com.l7tech.server.processcontroller.q: default started successfully
    2021-09-10T15:11:36.595-0500 INFO 1 com.l7tech.server.processcontroller.ProcessController: default started



    Question:  Why I am noticing the above errors all of sudden today on one of our node?  Any tips/lead on how to resolve this issue permanently would be greatly appreciated.




  • 2.  RE: One of the node in our 10 node cluster started behaving wierd

    Posted Sep 15, 2021 01:02 AM
    Which API Gateway version are you using?