Layer7 API Management

Jarvis Healthcheck KO

  • 1.  Jarvis Healthcheck KO

    Posted 08-10-2017 04:58 AM

    Hi folks,

     

    I am currently setting up the latest API Management platform:

    - gateway in 9.2 

    - OTK in 4.1

    - portal 4.1

     

    I am quite familiar with the gateway and OTK, and that part is working fine. My issue is with the latest Portal version, which requires an external analytics engine, Jarvis.

     

    I followed the DocOps instructions to install and set it up. However, the last step, verifying the installation with jarvisHealthCheck.sh, fails.

     

    It looks like I am hitting an SSL issue. Here is the output of the health check script:

     

    -----------------------------------------------------------------------------------------------------------------------------
    Argument onboarding url is https://<hostname>:8443
    Argument ingestion url is https://<hostname>:8443
    Argument elastic search url is http://<hostname>:9200
    Argument isAvro is true
    Argument wait time is 60

    Avro is set to true
    ***** Starting End to End Jarvis health check *****
    Onboarding product: jarvisheathcheck1502303043 resulted in curl: (58) NSS: client certificate not found (nickname not specified)
    error! Expected 2XX return code but got 000. Please check logs.

    Onboarding tenant: tenant1502303043 resulted in curl: (58) NSS: client certificate not found (nickname not specified)
    error! Expected 2XX return code but got 000. Please check logs.

    Create mapping with doctype: doctype1502303043 resulted in curl: (58) NSS: client certificate not found (nickname not specified)
    error! Expected 2XX return code but got 000. Please check logs.
    Let's wait for ElasticSearch to sync up............................................................
    Ingesting data into Jarvis with POST resulted in curl: (58) NSS: client certificate not found (nickname not specified)
    error! Expected 2XX return code but got 000. Please check logs.
    Let's wait for ElasticSearch to sync up............................................................
    Check for index's existent in Elasticsearch resulted in curl: (52) Empty reply from server
    error! Expected 2XX return code but got 000. Please check logs.
    Documents found in Elastic Search.
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed

    0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
    0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
    curl: (52) Empty reply from server

     

    ***** JARVIS DEPLOYMENT STATUS: broken *****
    ------------------------------------------------------------------------------------------------------------------------------
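
A rough sketch, in case it helps anyone triaging similar output: the script reports two different curl exit codes, and they point at different things. Per the curl manual, 58 is a problem with the local (client) certificate and 52 means the server sent no reply. The snippet below only counts the codes in a pasted output. The function name and the sample variable are made up for illustration, and the sample lines are copied from the output above.

```python
import re
from collections import Counter

# Meanings paraphrased from the curl manual's exit-code list.
# Only the two codes seen in the health check output are listed.
CURL_EXIT_CODES = {
    52: "the server did not reply anything",
    58: "problem with the local (client) certificate",
}

CURL_ERROR = re.compile(r"curl: \((\d+)\)")


def summarize_curl_errors(output: str) -> Counter:
    """Count each curl exit code reported in the health check output."""
    return Counter(int(code) for code in CURL_ERROR.findall(output))


# Hypothetical variable holding three lines copied from the thread.
sample = """\
Onboarding product: jarvisheathcheck1502303043 resulted in curl: (58) NSS: client certificate not found (nickname not specified)
Check for index's existent in Elasticsearch resulted in curl: (52) Empty reply from server
curl: (52) Empty reply from server
"""

for code, count in sorted(summarize_curl_errors(sample).items()):
    meaning = CURL_EXIT_CODES.get(code, "see the curl manual")
    print(f"curl exit code {code} x{count}: {meaning}")
```

In this output every call to port 8443 fails with 58, so curl never gets as far as sending a request there, while the call to port 9200 connects but gets nothing back. That suggests two separate things to check (the client certificate the script passes to curl, and what Elasticsearch expects on 9200), but it is a reading of the exit codes, not a confirmed diagnosis.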

     

    Since Jarvis is not functional, the portal will not start.

     

    Could you help me identify the root cause?

     

    Thank you



  • 2.  Re: Jarvis Healthcheck KO

    Posted 08-10-2017 10:05 PM

    If you want to skip Jarvis, you can try commenting out the Jarvis settings in the deploy configuration file.



  • 3.  Re: Jarvis Healthcheck KO

    Posted 08-11-2017 12:39 PM

    Hello Mark_HE,

    I tried commenting out these lines in the "portal-local.inc" file:

    #export PORTAL_JARVIS_URL="https://<hostname>:8443"
    #export PORTAL_JARVIS_PRODUCT_ID="portal-client"
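
For anyone who wants to double-check which Jarvis settings are actually active in their own file, here is a minimal sketch. It assumes the file uses plain `export NAME="value"` lines like the two shown above. The function name and the sample variable are hypothetical, and only the two variables quoted in this post are known to exist.

```python
import re

# Matches lines such as:  #export NAME="value"   or   export NAME="value"
EXPORT_LINE = re.compile(r'^\s*(#?)\s*export\s+(\w+)="?([^"]*)"?\s*$')


def jarvis_settings(text: str) -> dict:
    """Map each PORTAL_JARVIS_* variable to (value, is_active)."""
    settings = {}
    for line in text.splitlines():
        match = EXPORT_LINE.match(line)
        if match and match.group(2).startswith("PORTAL_JARVIS_"):
            commented, name, value = match.groups()
            # A leading '#' means the shell ignores the line.
            settings[name] = (value, commented == "")
    return settings


# Hypothetical variable holding the two lines quoted in the post.
sample = '''\
#export PORTAL_JARVIS_URL="https://<hostname>:8443"
#export PORTAL_JARVIS_PRODUCT_ID="portal-client"
'''

for name, (value, active) in jarvis_settings(sample).items():
    state = "active" if active else "commented out"
    print(f"{name} = {value} ({state})")
```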

     

    However, the portal still did not start. I think Jarvis is required to start the "apim-portal" service. (What a SPOF...)



  • 4.  Re: Jarvis Healthcheck KO

    Posted 08-14-2017 01:57 AM

    Hello FrederickMiszewski,

    From the information I received, Elasticsearch and SMTP are both required. So if you don't have a Jarvis instance, you will need to connect to an Elasticsearch instance (the same as in Portal 4.0 EE).

     

    Hope this helps.

    Regards,

    Mark



  • 5.  Re: Jarvis Healthcheck KO

    Posted 08-29-2017 04:36 AM

    Hello Mark_HE,

     

    Thank you for your feedback.

    The support team confirmed that Jarvis is mandatory in my case, since I do not have an Elasticsearch server in my environment.

    Since Jarvis is not working correctly, my portal will not start properly.

     

    Regards



  • 6.  Re: Jarvis Healthcheck KO

    Posted 08-10-2017 10:18 PM

    Adding to what Mark said, can you please share the first 10-20 lines of the analyticsInstall.properties file that you used to install Jarvis?

    On the Portal, when you try to start the apim portal service, run journalctl -u <portal service name> to see why it is failing.

    Sometimes, we can see where it fails by looking at that output.

    Hope this helps,

    regards,

    Amit.



  • 7.  Re: Jarvis Healthcheck KO

    Posted 08-11-2017 12:41 PM

    Hello Amit_Aharon,

     

    I'm Nicolas' colleague working on this same issue.

    We were given a new Jarvis installation by the support team (Jarvis 2.1.1). 

     

    However, even though the health check succeeded this time, we cannot connect the portal to Jarvis. We get this error when running "journalctl -f -u apim-portal":

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------

    Aug 11 15:37:36 <portal_hostname> start-portal[4666]: analytics-server_1 | 2017-08-11 13:37:36.833 [INFO ] RestESDriver - Creating ES Rest Client with following properties: url(s): [https://<jarvis_hostname>:9200], clustername: apim-es
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | 2017-08-11 13:37:37.034 [ERROR] RestESDriver RestESDriver.java:177 - Error initializing Elastic Search Client
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | java.security.UnrecoverableKeyException: Cannot recover key
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.security.provider.KeyProtector.recover(KeyProtector.java:328) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.security.provider.JavaKeyStore.engineGetKey(JavaKeyStore.java:146) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.security.provider.JavaKeyStore$JKS.engineGetKey(JavaKeyStore.java:56) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.security.provider.KeyStoreDelegator.engineGetKey(KeyStoreDelegator.java:96) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.security.provider.JavaKeyStore$DualFormatJKS.engineGetKey(JavaKeyStore.java:70) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at java.security.KeyStore.getKey(KeyStore.java:1023) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.security.ssl.SunX509KeyManagerImpl.<init>(SunX509KeyManagerImpl.java:133) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.security.ssl.KeyManagerFactoryImpl$SunX509.engineInit(KeyManagerFactoryImpl.java:70) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at javax.net.ssl.KeyManagerFactory.init(KeyManagerFactory.java:256) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at org.apache.http.ssl.SSLContextBuilder.loadKeyMaterial(SSLContextBuilder.java:187) ~[httpcore-4.4.5.jar!/:4.4.5]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at org.apache.http.ssl.SSLContextBuilder.loadKeyMaterial(SSLContextBuilder.java:208) ~[httpcore-4.4.5.jar!/:4.4.5]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at com.ca.apim.datasource.RestESDriver.createESClient(RestESDriver.java:148) [commons.jar!/:?]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at com.ca.apim.datasource.RestESDriver.initialize(RestESDriver.java:103) [commons.jar!/:?]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at com.ca.apim.datasource.AppESDriver.initializeAppESDriver(AppESDriver.java:43) [analytics-commons.jar!/:?]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_111-internal]
    Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_111-internal]

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
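
A small sketch for reading a trace like this one: it pulls out the exception class and the first stack frame that is not inside the JDK, which is usually the quickest way to see which call triggered the failure. The function name, the sample variable, and the choice of JDK prefixes are illustrative assumptions. The four sample lines are copied from the log above.

```python
import re

EXCEPTION = re.compile(r"\|\s+([\w.$]+(?:Exception|Error)): (.*)$")
FRAME = re.compile(r"\|\s+at ([\w.$<>]+)\(")
# Assumption: frames under these packages belong to the JDK itself.
JDK_PREFIXES = ("java.", "javax.", "sun.")


def first_non_jdk_frame(lines):
    """Return (exception class, first frame outside the JDK packages)."""
    exception, frame = None, None
    for line in lines:
        found = EXCEPTION.search(line)
        if found and exception is None:
            exception = found.group(1)
            continue
        found = FRAME.search(line)
        if found and frame is None and not found.group(1).startswith(JDK_PREFIXES):
            frame = found.group(1)
    return exception, frame


# Hypothetical variable holding four lines copied from the journal output.
sample = [
    "Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | java.security.UnrecoverableKeyException: Cannot recover key",
    "Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at java.security.KeyStore.getKey(KeyStore.java:1023) ~[?:1.8.0_111-internal]",
    "Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at org.apache.http.ssl.SSLContextBuilder.loadKeyMaterial(SSLContextBuilder.java:187) ~[httpcore-4.4.5.jar!/:4.4.5]",
    "Aug 11 15:37:37 <portal_hostname> start-portal[4666]: analytics-server_1 | at com.ca.apim.datasource.RestESDriver.createESClient(RestESDriver.java:148) [commons.jar!/:?]",
]

print(first_non_jdk_frame(sample))
```

Here the failure comes from `KeyStore.getKey`, reached through `SSLContextBuilder.loadKeyMaterial`, which loads a private key from a keystore using a key password. In general, `UnrecoverableKeyException` at that point often means the password supplied for the key entry does not match the one protecting it, but the trace alone does not confirm that this is the cause here.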



  • 8.  Re: Jarvis Healthcheck KO

    Posted 08-14-2017 02:14 AM

    Dear FrederickMiszewski,

    It seems to be a problem with the Elasticsearch key. Did you configure Elasticsearch? It is required. If you installed Jarvis, I believe you should use the same certificate as Jarvis.

     

    It may be better to open a support ticket and check the portal-local.inc file.

     

    Regards,

    Mark



  • 9.  Re: Jarvis Healthcheck KO

    Posted 08-29-2017 05:09 AM

    Hello Mark_HE and Amit_Aharon

    I did configure Elasticsearch properly with Jarvis in one-way SSL (the Jarvis health check is completely successful). We couldn't get the "analytics-server" service to start properly with two-way SSL.

     

    The problem I have now is that the "portal-enterprise" service eventually becomes unhealthy, and the monitor then skips the watchdog notification. Here is what the logs say (I couldn't get more detail than this):

     

    Aug 29 10:29:03 <portal_hostname> start-portal[18140]: apim_1 | INFO: Bundle imported.
    Aug 29 10:29:03 <portal_hostname> start-portal[18140]: apim_1 | Aug 29, 2017 8:29:03 AM
    Aug 29 10:29:03 <portal_hostname> start-portal[18140]: apim_1 | INFO: -4: Portal API sync completed
    Aug 29 10:29:03 <portal_hostname> start-portal[18140]: apim_1 | Aug 29, 2017 8:29:03 AM com.l7tech.server.service.u run
    Aug 29 10:29:03 <portal_hostname> start-portal[18140]: apim_1 | INFO: Created/Updated/Deleted: [cf190bfa80cf44a2bec2f12d992ce16c]
    Aug 29 10:29:04 <portal_hostname> start-portal[18140]: portal-enterprise_1 | 2017-08-29 08:29:04.540 INFO 1 --- [ main] liquibase : Waiting for changelog lock....
    Aug 29 10:29:06 <portal_hostname> start-portal[18140]: monitor | INFO Performing container health checks
    Aug 29 10:29:07 <portal_hostname> start-portal[18140]: monitor | WARNING Container /run_portal-enterprise_1 failing streak: 4 (exceeds threshold)
    Aug 29 10:29:07 <portal_hostname> start-portal[18140]: monitor | INFO Containers are starting
    Aug 29 10:29:07 <portal_hostname> start-portal[18140]: monitor | INFO Notifying watchdog
    Aug 29 10:29:14 <portal_hostname> start-portal[18140]: portal-enterprise_1 | 2017-08-29 08:29:14.565 INFO 1 --- [ main] liquibase : Waiting for changelog lock....
    Aug 29 10:29:17 <portal_hostname> start-portal[18140]: monitor | INFO Performing container health checks
    Aug 29 10:29:17 <portal_hostname> start-portal[18140]: monitor | WARNING Container /run_portal-enterprise_1 failing streak: 5 (exceeds threshold)
    Aug 29 10:29:17 <portal_hostname> start-portal[18140]: monitor | INFO Containers are starting
    Aug 29 10:29:17 <portal_hostname> start-portal[18140]: monitor | INFO Notifying watchdog
    Aug 29 10:29:24 <portal_hostname> start-portal[18140]: portal-enterprise_1 | 2017-08-29 08:29:24.590 INFO 1 --- [ main] liquibase : Waiting for changelog lock....
    Aug 29 10:29:27 <portal_hostname> start-portal[18140]: monitor | INFO Performing container health checks
    Aug 29 10:29:27 <portal_hostname> start-portal[18140]: apim_1 | Aug 29, 2017 8:29:27 AM com.l7tech.external.assertions.ratelimit.server.ServerRateLimitAssertion a
    Aug 29 10:29:27 <portal_hostname> start-portal[18140]: apim_1 | INFO: Removing stale rate limiter 41edb9a1-application-sync
    Aug 29 10:29:28 <portal_hostname> start-portal[18140]: monitor | WARNING Container /run_portal-enterprise_1 failing streak: 5 (exceeds threshold)
    Aug 29 10:29:28 <portal_hostname> start-portal[18140]: monitor | INFO Containers are starting
    Aug 29 10:29:28 <portal_hostname> start-portal[18140]: monitor | INFO Notifying watchdog
    Aug 29 10:29:30 <portal_hostname> start-portal[18140]: solr_1 | 177178 INFO (qtp611437735-21) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={action=STATUS} status=0 QTime=0
    Aug 29 10:29:31 <portal_hostname> start-portal[18140]: apache_1 | ::1 - - [29/Aug/2017:08:29:31 +0000] "GET / HTTP/1.1" 302 220
    Aug 29 10:29:34 <portal_hostname> start-portal[18140]: portal-enterprise_1 | 2017-08-29 08:29:34.614 INFO 1 --- [ main] liquibase : Waiting for changelog lock....
    Aug 29 10:29:38 <portal_hostname> start-portal[18140]: monitor | INFO Performing container health checks
    Aug 29 10:29:38 <portal_hostname> start-portal[18140]: tenant-provisioner_1 | 2017-08-29 08:29:38.876 INFO 31 --- [ taskExecutor-2] o.q.p.h.LoggingTriggerHistoryPlugin : Trigger DEFAULT.provisionerTrigger fired job DEFAULT.runningProvisioningJob at: 08:29:38 08/29/2017
    Aug 29 10:29:39 <portal_hostname> start-portal[18140]: monitor | WARNING Container /run_portal-enterprise_1 failing streak: 5 (exceeds threshold)
    Aug 29 10:29:39 <portal_hostname> start-portal[18140]: monitor | WARNING Containers are unhealthy
    Aug 29 10:29:39 <portal_hostname> start-portal[18140]: monitor | WARNING Skipping watchdog
    Aug 29 10:29:39 <portal_hostname> start-portal[18140]: tenant-provisioner_1 | 2017-08-29 08:29:39.959 INFO 31 --- [ taskExecutor-2] o.q.p.h.LoggingTriggerHistoryPlugin : Trigger DEFAULT.provisionerTrigger completed firing job DEFAULT.runningProvisioningJob at 08:29:39 08/29/2017 with resulting trigger instruction code: DO NOTHING
    Aug 29 10:29:44 <portal_hostname> start-portal[18140]: portal-enterprise_1 | 2017-08-29 08:29:44.640 INFO 1 --- [ main] liquibase : Waiting for changelog lock....
    Aug 29 10:29:49 <portal_hostname> start-portal[18140]: monitor | INFO Performing container health checks
    Aug 29 10:29:49 <portal_hostname> start-portal[18140]: monitor | WARNING Container /run_portal-enterprise_1 failing streak: 6 (exceeds threshold)
    Aug 29 10:29:50 <portal_hostname> start-portal[18140]: monitor | WARNING Containers are unhealthy
    Aug 29 10:29:50 <portal_hostname> start-portal[18140]: monitor | WARNING Skipping watchdog
    Aug 29 10:29:54 <portal_hostname> start-portal[18140]: portal-enterprise_1 | 2017-08-29 08:29:54.666 INFO 1 --- [ main] liquibase : Waiting for changelog lock....
    Aug 29 10:29:58 <portal_hostname> systemd[1]: apim-portal.service watchdog timeout (limit 30s)!
    Aug 29 10:29:58 <portal_hostname> start-portal[18140]: /opt/ca/apim-portal/start-portal: line 6: 18151 Aborted TMP=$TEMP_DIR ${PORTAL_HOME}/watchdog $args -m ${PORTAL_MODE}
    Aug 29 10:29:58 <portal_hostname> systemd[1]: apim-portal.service: main process exited, code=killed, status=6/ABRT
    Aug 29 10:29:58 <portal_hostname> systemd[1]: Unit apim-portal.service entered failed state.
    Aug 29 10:29:58 <portal_hostname> systemd[1]: apim-portal.service failed.
    Aug 29 10:29:59 <portal_hostname> systemd[1]: apim-portal.service holdoff time over, scheduling restart.
    Aug 29 10:29:59 <portal_hostname> systemd[1]: Starting CA On-premise Portal...
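
To make the sequence in this log easier to see, here is a minimal sketch that reduces it to three numbers: the highest failing streak per container, how many times Liquibase reported waiting for the changelog lock, and the systemd watchdog limit. The function name and the sample variable are hypothetical. The sample lines are copied from the log above.

```python
import re

STREAK = re.compile(r"Container (\S+) failing streak: (\d+)")
LOCK_WAIT = "Waiting for changelog lock"
WATCHDOG_TIMEOUT = re.compile(r"watchdog timeout \(limit (\d+)s\)")


def summarize(lines):
    """Return (max streak per container, lock-wait count, watchdog limit in s)."""
    streaks = {}
    lock_waits = 0
    timeout_limit = None
    for line in lines:
        found = STREAK.search(line)
        if found:
            name, streak = found.group(1), int(found.group(2))
            streaks[name] = max(streak, streaks.get(name, 0))
        if LOCK_WAIT in line:
            lock_waits += 1
        found = WATCHDOG_TIMEOUT.search(line)
        if found:
            timeout_limit = int(found.group(1))
    return streaks, lock_waits, timeout_limit


# Hypothetical variable holding five lines copied from the journal output.
sample = [
    "Aug 29 10:29:04 <portal_hostname> start-portal[18140]: portal-enterprise_1 | 2017-08-29 08:29:04.540 INFO 1 --- [ main] liquibase : Waiting for changelog lock....",
    "Aug 29 10:29:07 <portal_hostname> start-portal[18140]: monitor | WARNING Container /run_portal-enterprise_1 failing streak: 4 (exceeds threshold)",
    "Aug 29 10:29:49 <portal_hostname> start-portal[18140]: monitor | WARNING Container /run_portal-enterprise_1 failing streak: 6 (exceeds threshold)",
    "Aug 29 10:29:54 <portal_hostname> start-portal[18140]: portal-enterprise_1 | 2017-08-29 08:29:54.666 INFO 1 --- [ main] liquibase : Waiting for changelog lock....",
    "Aug 29 10:29:58 <portal_hostname> systemd[1]: apim-portal.service watchdog timeout (limit 30s)!",
]

print(summarize(sample))
```

Read this way, the watchdog timeout looks like a symptom: portal-enterprise never gets past "Waiting for changelog lock....", so its health check keeps failing, the monitor stops notifying the watchdog, and systemd kills the service after 30 seconds. Liquibase prints that message while another process holds its changelog lock, and a lock left behind by an interrupted run is a common cause in general. Whether that applies to this portal database is an assumption to verify, not something the log proves.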

     

    Do you have any idea about this timeout error on the watchdog notification?

    Best regards,

    Frederick Mizsewski



  • 10.  Re: Jarvis Healthcheck KO

    Posted 09-01-2017 04:21 AM

    Thank you FrederickMiszewski, Amit_Aharon, and Mark_HE. We will open a new thread for this specific issue.