I'm trying to resolve an issue that I've had for as long as I can remember and would be glad for any advice from the community! The issue is minor: within vCenter Server the preview of the console of any selected VM doesn't load. Opening the console via either "Launch Web Console" or "Launch Remote Console" works fine, as does the console preview when accessing the ESXi system directly with the ESXi Host Client. So whatever's wrong, it seems to *only* affect the console preview within vCenter Server.
So I finally decided to take a look at it and went down a rabbit-hole. We start with the browser sending a POST request to VCSA with a URL like (content in angle brackets varies): https://<host>/ui/data/propertiesWithParameters/urn:vmomi:VirtualMachine:vm-<id>:<guid>?properties=vmScreenshot.
The request will eventually complete "successfully" with a HTTP 200 response after ~75 seconds, but it has actually timed-out server-side. In VCSA you see something like this:
So what's happening behind the scenes? Trying to trace the request path it looks to be:
- Request is received by the vSphere Client (vsphere-ui) service.
- The vSphere Client forwards the request or sends a separate one (not sure which) to vCenter Server (vpxd).
- The vCenter Server sends a request to the vCenter Agent (vpxa) on the ESXi host where the virtual machine resides.
Here's where things ultimately seem to go wrong. From looking at the vpxa.log on the ESXi host we see something like this (unrelated operations removed for brevity):
2023-01-18T07:36:00.346Z In(166) Vpxa[2099604]: [Originator@6876 sub=vpxLro opID=SWI-2085ce76-58] [VpxLRO] -- BEGIN lro-205016 -- vpxa -- vpxapi.VpxaService.GetVmScreenshot -- 52189394-fa47-1531-ac6d-25895cdad128
2023-01-18T07:37:15.348Z Wa(164) Vpxa[2099591]: [Originator@6876 sub=IO.Connection opID=SWI-2085ce76-58] Failed to connect; <io_obj p:0x000000f550f47588, h:18, <TCP '2400:3740:200:6710::20 : 59627'>, <TCP '2400:3740:200:6710::20 : 443'>>, e: 110(Connection timed out), duration: 75001msec
2023-01-18T07:37:15.349Z In(166) Vpxa[2099591]: [Originator@6876 sub=IO.Connection opID=SWI-2085ce76-58] Failed to shutdown socket; <io_obj p:0x000000f550f47588, h:18, <TCP '2400:3740:200:6710::20 : 59627'>, <TCP '2400:3740:200:6710::20 : 443'>>, e: 104(shutdown: Connection reset by peer)
2023-01-18T07:37:15.349Z Wa(164) Vpxa[2099591]: [Originator@6876 sub=HttpConnectionPool-000000 opID=SWI-2085ce76-58] Failed to get pooled connection; <cs p:000000f5505b3950, TCP:[2400:3740:200:6710::20]:443>, (null), duration: 75002msec, N7Vmacore15SystemExceptionE(Connection timed out)
2023-01-18T07:37:15.351Z Wa(164) Vpxa[2099556]: --> [context]zKq7AVICAgAAAEkBOQEMdnB4YQAAYxwbbGlidm1hY29yZS5zbwAAsFUvAN9PNQCoUDUARVg1AH1eNQADBzAA7oEvAACeLwArGEEBO30AbGlicHRocmVhZC5zby4wAAJ90Q5saWJjLnNvLjYA[/context]
2023-01-18T07:37:15.351Z In(166) Vpxa[2099591]: [Originator@6876 sub=IO.Http opID=SWI-2085ce76-58] Set user agent error; state: 1, (null), N7Vmacore15SystemExceptionE(Connection timed out)
2023-01-18T07:37:15.353Z In(166) Vpxa[2099556]: --> [context]zKq7AVICAgAAAEkBOQEMdnB4YQAAYxwbbGlidm1hY29yZS5zbwAAsFUvAN9PNQCoUDUARVg1AH1eNQADBzAA7oEvAACeLwArGEEBO30AbGlicHRocmVhZC5zby4wAAJ90Q5saWJjLnNvLjYA[/context]
2023-01-18T07:37:15.353Z Er(163) Vpxa[2099591]: [Originator@6876 sub=IO.Http opID=SWI-2085ce76-58] User agent failed to send request; (null), N7Vmacore15SystemExceptionE(Connection timed out)
2023-01-18T07:37:15.355Z Er(163) Vpxa[2099556]: --> [context]zKq7AVICAgAAAEkBOQEMdnB4YQAAYxwbbGlidm1hY29yZS5zbwAAsFUvAN9PNQCoUDUARVg1AH1eNQADBzAA7oEvAACeLwArGEEBO30AbGlicHRocmVhZC5zby4wAAJ90Q5saWJjLnNvLjYA[/context]
2023-01-18T07:37:15.353Z Wa(164) Vpxa[2099604]: [Originator@6876 sub=Vmomi opID=SWI-2085ce76-58] VMOMI activation LRO failed; <<52189394-fa47-1531-ac6d-25895cdad128, <TCP '127.0.0.1 : 8089'>, <TCP '127.0.0.1 : 23184'>>, vpxa, vpxapi.VpxaService.GetVmScreenshot>, N3Vim5Fault8NotFound9ExceptionE(Fault cause: vim.fault.NotFound
2023-01-18T07:37:15.358Z Wa(164) Vpxa[2099556]: --> )
2023-01-18T07:37:15.358Z Wa(164) Vpxa[2099556]: --> [context]zKq7AVICAgAAAEkBOQETdnB4YQAAYxwbbGlidm1hY29yZS5zbwABaCsYdnB4YQABxj4eAXeyIgFSPx4Cy64LbGlidnB4YXBpLXR5cGVzLnNvAAG+oCoDz+QXbGlidm1vbWkuc28AARrIGQH9DSoBQA8qAVceKgGvUykBK/UpAO6BLwAAni8AKxhBBDt9AGxpYnB0aHJlYWQuc28uMAAFfdEObGliYy5zby42AA==[/context]
2023-01-18T07:37:15.358Z In(166) Vpxa[2099604]: [Originator@6876 sub=vpxLro opID=SWI-2085ce76-58] [VpxLRO] -- FINISH lro-205016
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099604]: [Originator@6876 sub=Default opID=SWI-2085ce76-58] [VpxLRO] -- ERROR lro-205016 -- 52189394-fa47-1531-ac6d-25895cdad128 -- vpxa -- vpxapi.VpxaService.GetVmScreenshot: :vim.fault.NotFound
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> Result:
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> (vim.fault.NotFound) {
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> faultCause = (vmodl.MethodFault) null,
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> faultMessage = <unset>
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> msg = ""
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> }
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> Args:
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: -->
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> Arg vmid:
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> 9
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> Arg snapshot:
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: -->
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> Arg options:
2023-01-18T07:37:15.358Z Er(163) Vpxa[2099556]: --> "&h=120&w=160"
So it's like the vCenter Agent is unable to invoke a VMOMI method to obtain a screenshot of the console because the TCP connection to itself never completes.
The following output would seem to confirm this (interesting lines in bold):
esxcli network ip connection list
Proto Recv Q Send Q Local Address Foreign Address State World ID CC Algo World Name
----- ------ ------ --------------------------------------------- ------------------ ----------- -------- ------- ----------
tcp 0 0 [2400:3740:200:6710::20]:443 [2400:3740:200:6710::20]:59627 RCVD 0
tcp 0 0 10.10.16.32:443 10.10.16.128:35194 ESTABLISHED 2098852 newreno rhttpproxy-IO
tcp 0 0 [2400:3740:200:6710::20]:59627 [2400:3740:200:6710::20]:443 SYN_SENT 2099579 newreno vpxa-worker
tcp 0 0 10.10.16.32:443 10.10.16.128:34816 ESTABLISHED 2098852 newreno rhttpproxy-IO
tcp 0 0 10.10.16.32:443 10.50.50.40:52070 ESTABLISHED 2098852 newreno rhttpproxy-IO
tcp 0 0 10.10.16.32:443 10.50.50.40:53845 ESTABLISHED 2098851 newreno rhttpproxy-IO
tcp 0 0 10.10.16.32:443 10.10.16.128:59700 ESTABLISHED 2098852 newreno rhttpproxy-IO
tcp 0 0 0.0.0.0:443 0.0.0.0:0 LISTEN 2098789 newreno rhttpproxy
tcp 0 0 [::]:443 [::]:0 LISTEN 2098789 newreno rhttpproxy
But why is that? I thought maybe it's an edge case the firewall doesn't handle, so tried disabling it with "esxcli network firewall unload" but it had no impact. My only thought at this point is it's something related to IPv6, but the connection is to itself (it's effectively loopback). Has anyone got any clever ideas? Is there something obvious I've overlooked?
The ESXi host is 8.0 (Build 20513097) and vCenter Server is 8.0a (Build 20920323) but this issue was around for as long as I remember on various vSphere 7.x releases before I upgraded.