I'm running a vSphere 7u1 environment in my homelab (VMUG).
vCenter is running on a host I want to decommission, but its network adapter is still connected to a portgroup on the standard vSwitch. I have the same network on my distributed vSwitch, and other VMs on that distributed portgroup work without issue.
When I change the vCenter network binding to the distributed portgroup, vCenter shows all the hosts in a Not Responding state.
In this state, from vCenter I can ping the hosts and curl them on port 902, and the hosts can ping vCenter and other VMs on that distributed portgroup. I can also continue to access the vSphere UI without issue. When I log in to the host and change vCenter back to the standard vSwitch portgroup, the hosts eventually return to a healthy state.
I am seeing these errors (gathered by Runecast):
program Rhttpproxy
message-syslog warning rhttpproxy[1051156] [Originator@6876 sub=Proxy Req 27891] Error reading from client while waiting for header: N7Vmacore15SystemExceptionE(Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem, timeout, or service overload.)
predicate Error reading from client while waiting for header
issue Error
program Rhttpproxy
message-syslog warning rhttpproxy[3168851] [Originator@6876 sub=Proxy Req 18143] Error reading from client while waiting for header: N7Vmacore15SystemExceptionE(Connection timed out)
predicate Error reading from client while waiting for header
issue Error
There is also this in vCenter vpxd.log:
2021-02-09T13:02:54.803-09:00 warning vpxd[05405] [Originator@6876 sub=Vmomi opID=FdmMonitor-domain-c7-4215dae5] [FdmClientAdapter] Got vmacore exception when invoking csi.FdmService.GetDebugManager on smesx02.incendiary.local: Server closed connection after 0 response bytes read; <SSL(<io_obj p:0x00007fd2e43cc308, h:64, <TCP '10.0.10.100 : 45520'>, <TCP '10.0.10.12 : 443'>>)>
2021-02-09T13:08:41.339-09:00 warning vpxd[13659] [Originator@6876 sub=HTTP server] UnimplementedRequestHandler: HTTP method POST not supported for URI /. Request from 10.0.10.100.
2021-02-09T13:08:41.339-09:00 warning vpxd[05359] [Originator@6876 sub=HostGateway] State(ST_CM_LOGIN) failed with: HTTP error response: Bad Request
2021-02-09T13:08:41.339-09:00 warning vpxd[05359] [Originator@6876 sub=HostGateway] Ignoring exception during refresh of HostGateway cache: N7Vmacore4Http13HttpExceptionE(HTTP error response: Bad Request)
I'm not sure what else or where else to look.
I can find no actual connectivity issue: vCenter can reach the hosts on port 902 (validated with curl), and ICMP works in both directions. The standard vSwitch and distributed vSwitch portgroups are on the same VLAN, and other VMs on the distributed portgroup work without issue. Basic connectivity between vCenter and the hosts is clearly there, so some other communication must be failing.
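For anyone who wants to repeat the TCP part of these checks from the vCenter side, this is roughly what they amount to, as a minimal Python sketch. The function names `tcp_port_open` and `check_hosts` are made up for illustration, and the host list is a placeholder (smesx02.incendiary.local is the host named in the vpxd.log excerpt). It only tests whether a TCP handshake completes on 443 and 902, so it says nothing about UDP traffic or about what happens after the connection is established.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_hosts(hosts, ports=(443, 902), timeout=3.0):
    """Map each (host, port) pair to whether the TCP handshake succeeded."""
    return {
        (host, port): tcp_port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }


if __name__ == "__main__":
    # Placeholder host list; replace with the ESXi hosts in the cluster.
    results = check_hosts(["smesx02.incendiary.local"])
    for (host, port), ok in results.items():
        print(f"{host}:{port} {'open' if ok else 'unreachable'}")
```

In my case a check like this passes on both ports while the hosts still show as Not Responding, which is why I don't think plain TCP reachability is the problem.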
Ideas?