VMware vSphere

  • 1.  Sudden SSL failure in agent when trying to get WebSphere Stats

    Posted Mar 31, 2010 08:32 AM
    I can't explain this in any short way so my apologies!

    We run WAS 6 and 6.1 in a Network Deployment (i.e. managed by a dmgr).

    Two identical nodes suddenly behave completely differently. One of them fails to get metrics, with errors in the agent log like:

    2010-03-31 08:11:00,743 ERROR [ScheduleThread] [WebsphereCollector] ADMC0053E: The system cannot create a SOAP connector to connect to host pndmgrnode01 at port 8879 with SOAP connector security enabled.
    org.hyperic.hq.product.MetricUnreachableException: ADMC0053E: The system cannot create a SOAP connector to connect to host pndmgrnode01 at port 8879 with SOAP connector security enabled.
    at org.hyperic.hq.plugin.websphere.WebsphereUtil.getMBeanServer(WebsphereUtil.java:120)
    at org.hyperic.hq.plugin.websphere.WebsphereCollector.getMBeanServer(WebsphereCollector.java:120)
    at org.hyperic.hq.plugin.websphere.WebsphereCollector.init(WebsphereCollector.java:86)
    at org.hyperic.hq.product.Collector.getValue(Collector.java:512)
    at org.hyperic.hq.product.MeasurementPlugin.getValue(MeasurementPlugin.java:445)
    at org.hyperic.hq.plugin.websphere.WebsphereMeasurementPlugin.getValue(WebsphereMeasurementPlugin.java:51)
    at org.hyperic.hq.product.MeasurementPluginManager.getPluginValue(MeasurementPluginManager.java:176)
    at org.hyperic.hq.product.MeasurementPluginManager.getValue(MeasurementPluginManager.java:274)
    at org.hyperic.hq.measurement.agent.server.ScheduleThread.getValue(ScheduleThread.java:298)
    at org.hyperic.hq.measurement.agent.server.ScheduleThread.collect(ScheduleThread.java:387)
    at org.hyperic.hq.measurement.agent.server.ScheduleThread.collect(ScheduleThread.java:344)
    at org.hyperic.hq.measurement.agent.server.ScheduleThread.collect(ScheduleThread.java:490)
    at org.hyperic.hq.measurement.agent.server.ScheduleThread.run(ScheduleThread.java:512)
    at java.lang.Thread.run(Thread.java:810)
    Caused by:
    com.ibm.websphere.management.exception.ConnectorException: ADMC0053E: The system cannot create a SOAP connector to connect to host pndmgrnode01 at port 8879 with SOAP connector security enabled.
    at com.ibm.websphere.management.AdminClientFactory.createAdminClient(AdminClientFactory.java:414)
    at org.hyperic.hq.plugin.websphere.WebsphereUtil.getMBeanServer(WebsphereUtil.java:118)
    ... 13 more
    Caused by:
    java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:522)
    at com.ibm.websphere.management.AdminClientFactory.createAdminClient(AdminClientFactory.java:305)
    ... 14 more
    Caused by:
    com.ibm.websphere.management.exception.ConnectorNotAvailableException: [SOAPException: faultCode=SOAP-ENV:Client; msg=Error opening socket: javax.net.ssl.SSLHandshakeException: com.ibm.jsse2.util.h: PKIX path building failed: java.security.cert.CertPathBuilderException: unable to find valid certification path to requested target; targetException=java.lang.IllegalArgumentException: Error opening socket: javax.net.ssl.SSLHandshakeException: com.ibm.jsse2.util.h: PKIX path building failed: java.security.cert.CertPathBuilderException: unable to find valid certification path to requested target]
    at com.ibm.ws.management.connector.soap.SOAPConnectorClient.reconnect(SOAPConnectorClient.java:295)
    at com.ibm.ws.management.connector.soap.SOAPConnectorClient.<init>(SOAPConnectorClient.java:190)
    ... 18 more
    Caused by:
    [SOAPException: faultCode=SOAP-ENV:Client; msg=Error opening socket: javax.net.ssl.SSLHandshakeException: com.ibm.jsse2.util.h: PKIX path building failed: java.security.cert.CertPathBuilderException: unable to find valid certification path to requested target; targetException=java.lang.IllegalArgumentException: Error opening socket: javax.net.ssl.SSLHandshakeException: com.ibm.jsse2.util.h: PKIX path building failed: java.security.cert.CertPathBuilderException: unable to find valid certification path to requested target]
    at org.apache.soap.transport.http.SOAPHTTPConnection.send(Unknown Source)
    at org.apache.soap.rpc.Call.invoke(Unknown Source)
    at com.ibm.ws.management.connector.soap.SOAPConnectorClient$2.run(SOAPConnectorClient.java:266)
    at com.ibm.ws.security.util.AccessController.doPrivileged(AccessController.java:118)
    at com.ibm.ws.management.connector.soap.SOAPConnectorClient.reconnect(SOAPConnectorClient.java:259)
    ... 19 more

    Repeated for every metric it is trying to collect.

    The setup is that the dmgr is WAS 6.1.
    It manages a cluster in which 12 servers each host a single WAS 6.0 node, whereas the 2 servers in question each host both a 6.0 and a 6.1 node.

    Between the two identical nodes, the keystores, the soap.client.props files, the ssl.client.props files and the agent.properties files are all identical.

    Yet one works and the other doesn't! They both worked for a long time (6 months plus) until last week, when they were restarted. As nothing differs between the nodes or in their configuration in Hyperic, I am totally stumped!

    The errors suggest it is a SOAP security issue, yet if I execute wsadmin.sh from the problem node (which gets a SOAP connection to the dmgr node just as the agent would do) it works without a problem. I can also run syncNode.sh from the node and sync with the dmgr.

    Soooooo where the hell to look next :)

    - One question, which may or may not be related: the agent.properties file has the following set:
    websphere.installpath=/opt/IBM/WebSphere/AppServer

    However, in this example we of course have 2 versions of WebSphere installed. Do I need to specify both paths somehow?
    websphere.installpath=/opt/IBM/WebSphere/AppServer
    and
    websphere.installpath=/opt/IBM/WebSphere/AppServer61

    It can't be the cause if the other node works, though, right?


  • 2.  RE: Sudden SSL failure in agent when trying to get WebSphere Stats

    Posted Mar 31, 2010 08:38 AM
    I read that again and didn't think I was that clear, so here it is again:

    Server1
    dmgr

    Server2
    was6node - hq metrics working
    was6.1node - hq metrics working

    Server3
    was6node - hq metrics not working
    was6.1node - hq metrics not working


  • 3.  RE: Sudden SSL failure in agent when trying to get WebSphere Stats

    Posted Apr 13, 2010 10:48 AM
    Still banging my head against a brick wall with this problem.

    I have checked EVERYTHING between the two nodes (one working and one not working) and EVERYTHING is byte-identical: soap.client.props, ssl.client.props, the keystores, security.xml, server.xml, agent.properties and just about anything else I can think of in the chain. Of course, this is how it should be when they are members of the same cluster :)

    From the non-working node, I can use wsadmin.sh to connect to the dmgr using SOAP, and I can manually sync the node using syncNode.sh, which also uses SOAP, with no input required from me.

    I have checked the entire directory structure and there are no differences in permissions. I can read all the properties files as the hyperic user.
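
One way to back up a byte-identical claim like the one above is to build a checksum manifest of each node's configuration directory and diff the two. This is only a minimal sketch and not something taken from the thread: the function name `manifest` is made up for illustration, and it relies on nothing beyond POSIX `find`, `sort` and `cksum`.

```shell
#!/bin/sh
# manifest DIR
# Print one "checksum size ./relative/path" line (cksum format) for every
# regular file under DIR, sorted by path, so that manifests produced on two
# machines can be compared with diff.
# Hypothetical helper; DIR is whichever directory holds the files of interest.
manifest() {
    (
        cd "$1" || exit 1
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            cksum "$f"
        done
    )
}
```

Run something like `manifest "$PROFILE_DIR/properties" > node.manifest` on each server (`PROFILE_DIR` is a placeholder for the real profile directory), copy one file across and `diff` the two. A matching manifest only shows that the files are the same; it says nothing about the environment of the process that reads them.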

    If I understand how the plugin works, it uses wsadmin scripting to connect to the dmgr node over SOAP and get the performance MBeans. Is there nothing more to it?

    I enabled debugging some time ago and the same messages arrive in the logs every time:

    2010-03-30 13:05:01,487 ERROR [ScheduleThread] [SystemErr] Mar 30, 2010 1:05:01 PM com.ibm.websphere.management.AdminClientFactory
    WARNING: ADMC0046W

    MEANS IT CAN'T READ THE SOAP FILE

    2010-03-30 13:05:01,488 ERROR [ScheduleThread] [SystemErr] Mar 30, 2010 1:05:01 PM com.ibm.ws.management.connector.soap.SOAPConnectorClient
    WARNING: ADMC0037W

    MEANS IT HASN'T BEEN GIVEN A KEYSTORE (BECAUSE IT COULDN'T READ THE SOAP FILE, I GUESS)

    2010-03-30 13:05:01,489 ERROR [ScheduleThread] [SystemErr] Mar 30, 2010 1:05:01 PM com.ibm.ws.management.connector.soap.SOAPConnectorClient
    WARNING: ADMC0038W

    SAME AS ADMC0037W
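
To see at a glance which ADMC codes recur in an agent log and how often, the codes can be pulled out and counted. The helper below is a rough sketch: the name `admc_summary` is hypothetical, and it assumes a `grep` that supports `-o` (GNU and BSD grep do, although the option is not in POSIX).

```shell
#!/bin/sh
# admc_summary LOGFILE
# Count the occurrences of each WebSphere ADMC message code in LOGFILE and
# print "CODE COUNT" lines sorted by code.
# Hypothetical helper; LOGFILE would be the agent log on the problem node.
admc_summary() {
    grep -o 'ADMC[0-9][0-9][0-9][0-9][A-Z]' "$1" \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Running it against the log of the working node and the log of the failing node gives two short lists that are easy to compare side by side.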

    The soap files are both exactly the same, in the same locations with the same permissions. The hyperic user is a member of the same groups (most importantly the wasuser group).

    Seriously, what else is there left to check?


  • 4.  RE: Sudden SSL failure in agent when trying to get WebSphere Stats

    Posted Apr 16, 2010 05:50 AM
    I guess I am too late. Did you get the solution?

    Are these servers on the same machine and in different zones?

    @naveen


  • 5.  RE: Sudden SSL failure in agent when trying to get WebSphere Stats

    Posted Apr 27, 2010 11:33 AM
    I am currently working with Hyperic support on the issue, and I will of course post the solution if I reach one.


  • 6.  RE: Sudden SSL failure in agent when trying to get WebSphere Stats

    Posted Jun 04, 2010 12:56 PM
    Sorry for getting back so slowly, but I have been distracted :)

    We did get a solution in the end. We realised that, no matter what properties were set, the agent always starts with its own bundled JRE instead of the WAS JRE.

    So we forced the matter by renaming the agent's jre directory (to jre_old or whatever). With the bundled JRE gone, the agent has to fall back on the properties that control which JRE to use.

    So there are several steps to the full solution:
    1. Edit agent.properties so that websphere.installpath is the path of your 6.1 installation
    2. Add export JAVA_HOME=/opt/IBM/WebSphere/AppServer61/java/jre to hq-agent.sh, somewhere near the top
    3. Set set.default.HQ_JAVA_HOME=/opt/IBM/WebSphere/AppServer61/java/jre in wrapper.conf
    4. Rename jre to jre_old in the agent's directory
    5. Restart the agent
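
Steps 1 to 4 above could be scripted roughly as follows. This is a sketch under stated assumptions, not the exact commands used in the thread: the function names `set_prop` and `apply_was_jre` are made up, and every file location is passed in as an argument because the agent's directory layout is not given here.

```shell
#!/bin/sh
# Sketch of steps 1-4. All paths are arguments because the agent's layout is
# an assumption here. Back up the files first; this sketch does not.

# set_prop FILE KEY VALUE
# Drop any existing line starting with "KEY=", then append KEY=VALUE.
set_prop() {
    sp_file=$1 sp_key=$2 sp_value=$3
    awk -v k="$sp_key=" 'index($0, k) != 1' "$sp_file" > "$sp_file.tmp"
    printf '%s=%s\n' "$sp_key" "$sp_value" >> "$sp_file.tmp"
    cat "$sp_file.tmp" > "$sp_file"   # overwrite in place to keep owner and mode
    rm -f "$sp_file.tmp"
}

# apply_was_jre AGENT_PROPS AGENT_SCRIPT WRAPPER_CONF AGENT_JRE_DIR WAS_HOME
apply_was_jre() {
    props=$1 script=$2 wrapper=$3 jre_dir=$4 was_home=$5
    was_jre="$was_home/java/jre"

    # Step 1: websphere.installpath points at the 6.1 installation.
    set_prop "$props" websphere.installpath "$was_home"

    # Step 2: export JAVA_HOME right after the first (shebang) line of the script.
    {
        head -n 1 "$script"
        printf 'export JAVA_HOME=%s\n' "$was_jre"
        tail -n +2 "$script"
    } > "$script.tmp"
    cat "$script.tmp" > "$script"     # keeps the executable bit
    rm -f "$script.tmp"

    # Step 3: the same JRE in the wrapper configuration.
    set_prop "$wrapper" set.default.HQ_JAVA_HOME "$was_jre"

    # Step 4: move the bundled JRE out of the way.
    mv "$jre_dir" "${jre_dir}_old"

    # Step 5 (restarting the agent) is left to the operator.
}
```

A call might look like `apply_was_jre "$AGENT_HOME/conf/agent.properties" "$AGENT_HOME/bin/hq-agent.sh" "$AGENT_HOME/conf/wrapper.conf" "$AGENT_HOME/jre" /opt/IBM/WebSphere/AppServer61`, where `AGENT_HOME` and the sub-paths are placeholders to adjust to the actual installation.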

    It worked for a few weeks and now suddenly we are getting another error for no reason (by which I mean that not a single detail of the configuration has changed, yet it now gives more errors), but that is for another post!