Apps Manager

 Apps Manager takes forever to load after upgrade to PCF 2.3.5

Yogendra Mummaneni posted Jan 31, 2019 05:40 PM

Hi,

 

We recently upgraded a couple of PCF foundations from version 2.2.5 to 2.3.5.

After the upgrade, on both environments, Apps Manager takes forever to load.

I am seeing lots of 404 errors in access.log on the Gorouter, as below:

 

backend - [2019-01-31T14:28:21.038+0000] "GET /eureka/apps/ HTTP/1.1" 404 0 59 "-" "Java-EurekaClient/v1.4.6" "172.16.7.103:54425" "-" x_forwarded_for:"172.16.4.141" x_forwarded_proto:"-" vcap_request_id:"837c7b44-f544-4f4a-5aad-830ec4b51706" response_time:0.000678569 app_id:"-" app_index:"-" x_b3_traceid:"7a7aeeaa2b922045" x_b3_spanid:"7a7aeeaa2b922045" x_b3_parentspanid:"-"

backend - [2019-01-31T14:28:21.042+0000] "GET /eureka/apps/ HTTP/1.1" 404 0 59 "-" "Java-EurekaClient/v1.4.6" "172.16.7.102:52210" "-" x_forwarded_for:"172.16.4.141" x_forwarded_proto:"-" vcap_request_id:"1a43ed76-649e-4c54-6e07-8295f46771be" response_time:0.000508426 app_id:"-" app_index:"-" x_b3_traceid:"e49593a9f5a8743f" x_b3_spanid:"e49593a9f5a8743f" x_b3_parentspanid:"-"

backend - [2019-01-31T14:28:21.118+0000] "GET /eureka/apps/ HTTP/1.1" 404 0 59 "-" "Java-EurekaClient/v1.4.6" "172.16.7.103:54426" "-" x_forwarded_for:"172.16.4.40" x_forwarded_proto:"-" vcap_request_id:"4c67a72c-ea54-4b59-524c-95302866b4c0" response_time:0.00089113 app_id:"-" app_index:"-" x_b3_traceid:"48ab97e6976d03d4" x_b3_spanid:"48ab97e6976d03d4" x_b3_parentspanid:"-"

backend - [2019-01-31T14:28:21.123+0000] "GET /eureka/apps/ HTTP/1.1" 404 0 59 "-" "Java-EurekaClient/v1.4.6" "172.16.7.102:52211" "-" x_forwarded_for:"172.16.4.40" x_forwarded_proto:"-" vcap_request_id:"d110c4a2-fdd9-4106-61a4-46e43f88b710" response_time:0.000612605 app_id:"-" app_index:"-" x_b3_traceid:"1072ff5534aeb552" x_b3_spanid:"1072ff5534aeb552" x_b3_parentspanid:"-"

I am also seeing loads of the below errors in gorouter.log on the Gorouter:

 

[2019-01-31 14:38:58+0000] {"log_level":1,"timestamp":1548945538.563132,"message":"unknown-route","source":"vcap.gorouter","data":{}}

[2019-01-31 14:38:58+0000] {"log_level":1,"timestamp":1548945538.5632765,"message":"status","source":"vcap.gorouter","data":{"body":"404 Not Found: Requested route ('backend') does not exist."}}

[2019-01-31 14:38:58+0000] {"log_level":1,"timestamp":1548945538.663769,"message":"unknown-route","source":"vcap.gorouter","data":{}}

[2019-01-31 14:38:58+0000] {"log_level":1,"timestamp":1548945538.6638093,"message":"status","source":"vcap.gorouter","data":{"body":"404 Not Found: Requested route ('backend') does not exist."}}

[2019-01-31 14:38:58+0000] {"log_level":1,"timestamp":1548945538.6718276,"message":"unknown-route","source":"vcap.gorouter","data":{}}

[2019-01-31 14:38:58+0000] {"log_level":1,"timestamp":1548945538.6718507,"message":"status","source":"vcap.gorouter","data":{"body":"404 Not Found: Requested route ('backend') does not exist."}}

 

I am also seeing the below errors in the apps-manager-js-green app, under the system org / system space:

 

  2019-01-31T14:37:44.31+0000 [APP/PROC/WEB/2] ERR 2019/01/31 14:37:44 [error] 84#0: *4599 upstream timed out (110: Connection timed out) while connecting to upstream, client: 172.16.4.15, server: localhost, request: "GET /proxy/api/v2/organizations/f333fece-5ad5-462c-8184-d295304198f2 HTTP/1.1", upstream: "https://10.60.146.9:443/v2/organizations/f333fece-5ad5-462c-8184-d295304198f2", host: "apps.system.titan.lab.emc.com"

 

Could someone please help me resolve these issues?

 

Thanks,

Yogendra.

Daniel Mikusa

In your browser, open up the Dev Tools & switch to the Network tab. Clear out the tab and refresh the page. You should see it list all the requests which are sent to load Apps Manager. Dev Tools will tell you how long it takes for each request to load. Isolate the one or ones that are slow.
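
If it helps, you can also time a single request outside the browser with curl. This is just a sketch: I'm guessing the api.system.titan.lab.emc.com hostname from your system domain, and `cf oauth-token` prints a bearer token you can reuse.

# Time one Cloud Controller call; -k skips cert checks in a lab environment
curl -k -s -o /dev/null \
  -H "Authorization: $(cf oauth-token)" \
  -w "status=%{http_code} total=%{time_total}s\n" \
  https://api.system.titan.lab.emc.com/v2/organizations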

 

For what it's worth, I think the 404s are irrelevant. The upstream timeout is probably more relevant. It could mean that your Cloud Controller is not responding fast enough. Again, the Dev Tools output will give you a better picture, from the perspective of your browser, of what is slow. If it turns out that requests to CC are slow, you'll need to look at Cloud Controller and its metrics/logs to see more about why it's slow.

 

Hope that helps!

Daniel Mikusa

OK, so I would suggest checking three things.

 

1.) Look at the Gorouter access.log file. Search for one of the requests above. This will confirm the request made it to the Gorouter. You'll likely see it with a response code of 504.

 

2.) Look at the Cloud Controller Nginx access.log file. Search for one of the requests above. This will confirm the request made it to the Cloud Controller. You'll likely see a valid response code, but perhaps also that the request took a very long time to complete.

 

3.) You can look up the request in the Cloud Controller (CCNG) logs for more details on what happened. I believe the Gorouter access log should include a vcap_request_id; you can search for that in the CCNG logs to find the logging from that specific request.
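
For example, something along these lines on the Gorouter and Cloud Controller VMs (the paths follow the usual /var/vcap/sys/log layout, so adjust if yours differs; substitute a real vcap_request_id from one of the slow requests):

# On the Gorouter VM: confirm the request and its response code
grep '<vcap_request_id>' /var/vcap/sys/log/gorouter/access.log

# On the Cloud Controller VM: find the CCNG logging for the same request
grep -R '<vcap_request_id>' /var/vcap/sys/log/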

Yogendra Mummaneni's profile image
Yogendra Mummaneni

Thanks, Daniel, for your reply!

 

I am seeing loads of 504s. Here are a few requests that return a 504 after waiting for one minute; I have attached an image from the Timing tab.

 

  1. Request URL: https://apps.system.titan.lab.emc.com/proxy/api/v3/apps?space_guids=ecb5f992-bb71-443a-9d14-b9b937023646%2Cd2258adb-3391-48d0-b039-3a9c1423ae46%2C93afa550-bc60-4e46-a442-0f0e6be28828%2C6cb1cb9a-cc39-4893-b212-59bb5375758e&page=1&per_page=50
  2. Request Method: GET
  3. Status Code: 504 Gateway Timeout
  4. Remote Address: 10.60.146.9:443
  5. Referrer Policy: no-referrer

 

  1. Request URL: https://apps.system.titan.lab.emc.com/proxy/api/v3/processes?space_guids=ecb5f992-bb71-443a-9d14-b9b937023646%2Cd2258adb-3391-48d0-b039-3a9c1423ae46%2C93afa550-bc60-4e46-a442-0f0e6be28828%2C6cb1cb9a-cc39-4893-b212-59bb5375758e&page=1&per_page=50
  2. Request Method: GET
  3. Status Code: 504 Gateway Timeout
  4. Remote Address: 10.60.146.9:443
  5. Referrer Policy: no-referrer

 

  1. Request URL: https://apps.system.titan.lab.emc.com/proxy/api/v2/spaces/ecb5f992-bb71-443a-9d14-b9b937023646/routes?page=4&results-per-page=50&order-direction=asc
  2. Request Method: GET
  3. Status Code: 504 Gateway Timeout
  4. Remote Address: 10.60.146.9:443
  5. Referrer Policy: no-referrer

 

I have tried bumping up the VM size for the Gorouters, but that didn't help. Could you please let me know which log files I should be looking at on the Cloud Controller VMs?

 

Thanks,

Yogi.

[Attachment: Timing tab screenshot]
Yogendra Mummaneni

Hi Daniel,

 

I don't see any corresponding message in the nginx or CCNG logs for the Gorouter's 504 log message; I can't even find that vcap_request_id.

 

But I can see a correlation between the logs of the apps-manager-js-green app and the Gorouter logs.

 

For example, the below log from the apps-manager-js-green app shows a response time of more than 60 seconds and has the vcap_request_id 270c4aec-a312-4066-723c-96ae2e78ddde. I can see the corresponding entry in the Gorouter logs, but not in the CC logs.

 

cf logs apps-manager-js-green

 

  2019-02-01T05:39:13.83+0000 [APP/PROC/WEB/1] ERR 2019/02/01 05:39:13 [error] 82#0: *7779 upstream timed out (110: Connection timed out) while connecting to upstream, client: 172.16.4.15, server: localhost, request: "GET /proxy/api/v2/spaces/d2258adb-3391-48d0-b039-3a9c1423ae46/routes?page=1&results-per-page=1&order-direction=asc HTTP/1.1", upstream: "https://10.60.146.9:443/v2/spaces/d2258adb-3391-48d0-b039-3a9c1423ae46/routes?page=1&results-per-page=1&order-direction=asc", host: "apps.system.titan.lab.emc.com"

  2019-02-01T05:39:13.83+0000 [APP/PROC/WEB/1] OUT 172.16.7.101 - - - [01/Feb/2019:05:39:13 +0000] "GET /proxy/api/v2/spaces/d2258adb-3391-48d0-b039-3a9c1423ae46/routes?page=1&results-per-page=1&order-direction=asc HTTP/1.1" 504 562

  2019-02-01T05:39:13.83+0000 [RTR/1] OUT apps.system.titan.lab.emc.com - [2019-02-01T05:38:13.835+0000] "GET /proxy/api/v2/spaces/d2258adb-3391-48d0-b039-3a9c1423ae46/routes?page=1&results-per-page=1&order-direction=asc HTTP/1.0" 504 0 562 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36" "172.16.7.101:53189" "172.16.4.142:61038" x_forwarded_for:"172.16.7.101" x_forwarded_proto:"https" vcap_request_id:"270c4aec-a312-4066-723c-96ae2e78ddde" response_time:60.003015201 app_id:"46703c79-7159-434c-bc3f-d1b7f0532ea0" app_index:"1" x_b3_traceid:"6c04b49a4131770a" x_b3_spanid:"6c04b49a4131770a" x_b3_parentspanid:"-"

 

router/d0f353fc-559e-4e4f-abd9-8dfbe2912318:/var/vcap/sys/log/gorouter# grep 270c4aec-a312-4066-723c-96ae2e78ddde *

access.log:apps.system.titan.lab.emc.com - [2019-02-01T05:38:13.835+0000] "GET /proxy/api/v2/spaces/d2258adb-3391-48d0-b039-3a9c1423ae46/routes?page=1&results-per-page=1&order-direction=asc HTTP/1.0" 504 0 562 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36" "172.16.7.101:53189" "172.16.4.142:61038" x_forwarded_for:"172.16.7.101" x_forwarded_proto:"https" vcap_request_id:"270c4aec-a312-4066-723c-96ae2e78ddde" response_time:60.003015201 app_id:"46703c79-7159-434c-bc3f-d1b7f0532ea0" app_index:"1" x_b3_traceid:"6c04b49a4131770a" x_b3_spanid:"6c04b49a4131770a" x_b3_parentspanid:"-"

 

But on the CC (I reduced the number of CCs from 4 to 1 so that troubleshooting would be easier):

 

cloud_controller/f38de336-3441-4862-baf1-abcd6abdcba8:/var/vcap/sys/log# grep -R 270c4aec-a312-4066-723c-96ae2e78ddde *

cloud_controller/f38de336-3441-4862-baf1-abcd6abdcba8:/var/vcap/sys/log# grep -R 'response_time:60' *

cloud_controller/f38de336-3441-4862-baf1-abcd6abdcba8:/var/vcap/sys/log#

 

Also, I don't see any response that takes 60 seconds in the nginx or CCNG logs on the CC.

 

Thanks,

Yogi.

Daniel Mikusa

OK, that is valuable information. If you know the slow requests are not making it to Cloud Controller, then you'd want to focus on what is happening between the Gorouter & CC. I would suggest looking at the gorouter.log files. These will contain more detail about what the Gorouter is doing when the problem occurs. This page explains a lot of common error messages from that file:

 

https://docs.pivotal.io/pivotalcf/2-4/adminguide/troubleshooting-router-error-responses.html

 

Aside from that, you might need to do a packet capture between Gorouter & CC to see if you can capture one of these slow requests and understand more about what is happening. There have occasionally been issues, like this one [1]. A packet capture can provide a better understanding of what's happening to the traffic and why the request is timing out.
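
If you do go the capture route, a minimal sketch run from the Gorouter VM would be something like the below. 10.60.146.9 is the CC address from your upstream log lines; the interface name is an assumption, so check it on the VM first.

# Capture Gorouter <-> Cloud Controller traffic to a pcap for offline analysis
tcpdump -i eth0 -w /tmp/gorouter-cc.pcap host 10.60.146.9 and port 443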

 

Hope that helps, but if nothing jumps out to you in the logs or you'd like help reviewing, my suggestion would be to open a ticket with Pivotal Support and we can help debug further.

 

[1] - https://community.pivotal.io/s/article/backend-connections-on-the-gorouter-are-prematurely-resulting-in-502-error

Daniel Mikusa's profile image
Daniel Mikusa

There is some sort of mismatch between the stack that your app wants and the stack that's being used and possibly the stack that's supported by the buildpack. Please open a new thread since this is a totally different question and post the output of `cf buildpacks`, your `cf push` command and manifest.yml and this log output again, for full context. Thanks!
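
To gather that, a quick sketch (`cf stacks` and `cf buildpacks` are standard cf CLI commands; whether cflinuxfs3 is actually installed on your foundation is something to verify):

# List the stacks installed on the platform (e.g. cflinuxfs2, cflinuxfs3)
cf stacks

# List the buildpacks; newer CLIs also show the stack each one targets
cf buildpacks

# If cflinuxfs3 is installed, you can try pushing against it explicitly
cf push titan-ui-canary -s cflinuxfs3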

Yogendra Mummaneni

Thanks, Daniel, for your help! I have now logged a ticket with Pivotal to investigate further.

 

In another environment, after the upgrade to 2.3.5, I am seeing the below error with the staticfile buildpack. Any idea what's wrong here?

 

Starting app titan-ui-canary in org titan / space development as admin...

Downloading dicf_staticfile_buildpack_latest...

Downloaded dicf_staticfile_buildpack_latest

Cell 855b00e3-6f61-4801-9ee6-77dcdcd3566b creating container for instance 27fa8bdd-b93a-4a41-b962-178c26172171

Cell 855b00e3-6f61-4801-9ee6-77dcdcd3566b successfully created container for instance 27fa8bdd-b93a-4a41-b962-178c26172171

Downloading app package...

Downloading build artifacts cache...

Downloaded build artifacts cache (215B)

Downloaded app package (941K)

-----> Staticfile Buildpack version 1.4.24

    **ERROR** Stack not supported by buildpack: required stack cflinuxfs3 was not found

Failed to compile droplet: Failed to run all supply scripts: exit status 11

Exit status 223

Cell 855b00e3-6f61-4801-9ee6-77dcdcd3566b stopping instance 27fa8bdd-b93a-4a41-b962-178c26172171

Cell 855b00e3-6f61-4801-9ee6-77dcdcd3566b destroying container for instance 27fa8bdd-b93a-4a41-b962-178c26172171

Cell 855b00e3-6f61-4801-9ee6-77dcdcd3566b successfully destroyed container for instance 27fa8bdd-b93a-4a41-b962-178c26172171

 

FAILED

Error restarting application: BuildpackCompileFailed