Hi,
I posted the same question in the VMware vSphere thread before discovering this one, so sorry if it shows up twice:
I'm having trouble deploying Workload Management on the vSphere cluster in my homelab. I'm using NSX networking plus ALB. As far as I can tell everything is set up according to the documentation, but the installation hangs at host preparation. In vCenter I can see the "starting service" task being repeated for each ESXi host roughly every 3 minutes. The Spherelet service is running and shows no errors in /var/log/spherelet.log.
Each host in the wcp setup shows the warning "Kubernetes Worker Node is schedulable A general system error occurred. Error message: waiting for node esxhost1.example.com to move to ready state.", and the configuration task is stuck at that point.

Configuration of all Supervisor VMs completed successfully.
When I connect to one of the Supervisor VMs and run kubectl get node, all of my ESXi hosts are listed and reported as Ready:
root@421c3a4c505c2409b59c1114ece9025d [ ~ ]# kubectl get node
NAME                               STATUS   ROLES                  AGE   VERSION
421c3a4c505c2409b59c1114ece9025d   Ready    control-plane,master   46m   v1.29.7+vmware.wcp.1
421c77c0e1b71771775219aeb4ecb244   Ready    control-plane,master   42m   v1.29.7+vmware.wcp.1
421cc0a0ef3d95c171cbed8040c021f9   Ready    control-plane,master   42m   v1.29.7+vmware.wcp.1
esx03.*************                Ready    agent                  35m   v1.29.3-sph-c8e42be
esx04.*************                Ready    agent                  35m   v1.29.3-sph-c8e42be
esx05.*************                Ready    agent                  37m   v1.29.3-sph-c8e42be
In /var/log/vmware/wcp/wcpsvc.log I can see the following messages, but I'm not sure whether they are relevant:
2024-08-18T16:35:32.206Z info wcp [eamagency/resolve.go:169] [opID=vCLS] Successfully invoked resolve on agency &{0xc004f0e638 0xc0004c3840}
2024-08-18T16:35:32.277Z info wcp [] W0818 18:35:32.277611 676647 reflector.go:539] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: failed to list clusters: failed to list clusters: the server could not find the requested resource
2024-08-18T16:35:32.277Z info wcp [] W0818 18:35:32.277611 676647 reflector.go:539] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: failed to list clusters: failed to list clusters: the server could not find the requested resource
2024-08-18T16:35:32.277Z info wcp [] E0818 18:35:32.277684 676647 reflector.go:147] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: Failed to watch clusters: failed to list clusters: failed to list clusters: the server could not find the requested resource
2024-08-18T16:35:32.277Z info wcp [] E0818 18:35:32.277684 676647 reflector.go:147] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: Failed to watch clusters: failed to list clusters: failed to list clusters: the server could not find the requested resource
2024-08-18T16:35:32.277Z info wcp [] E0818 18:35:32.277684 676647 reflector.go:147] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: Failed to watch clusters: failed to list clusters: failed to list clusters: the server could not find the requested resource
The same error as in the setup process also appears:
2024-08-18T16:37:03.363Z error wcp [kubelifecycle/node_controller.go:1525] [opID=d06d1726-89c7-4144-891b-06256eff58af-host-70016] Failed to move host esxhost1.example.com to ready state, err:waiting for node esxhost1.example.com to move to ready state
2024-08-18T16:37:03.375Z error wcp [kubelifecycle/node_controller.go:474] [opID=d06d1726-89c7-4144-891b-06256eff58af-host-70016] Failed to realize node {nodeID:host-70016 supervisorID:d06d1726-89c7-4144-891b-06256eff58af} state. Err waiting for node esxhost1.example.com to move to ready state. Will retry.
This one might also be relevant. It shows up as an event on each ESXi host and in the wcp log file:
2024-08-18T16:42:54.504Z debug wcp [vclib/guestop.go:213] [opID=66c210b0-d06d1726-89c7-4144-891b-06256eff58af-SecretUploader-vm-77043] Failed to delete file from /dev/shm/secret.tmp: ServerFaultCode: File /dev/shm/secret.tmp was not found
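To confirm the roughly 3-minute retry loop per host from the log rather than from the vCenter task list, a small script can pull out the "Failed to realize node" retries and measure the gap between them. This is only a sketch: retry_gaps and parse_ts are hypothetical helpers, not part of any VMware tooling, and the line layout (timestamp, level, wcp, [file:line], [opID=...], message) is assumed purely from the excerpts above.

```python
import re
from datetime import datetime, timedelta

# Assumed wcpsvc.log layout, inferred only from the excerpts in this post:
# <ISO-8601 UTC timestamp> error wcp [kubelifecycle/node_controller.go:N] [opID=...] Failed to realize node {nodeID:... ...
RETRY_RE = re.compile(
    r"^(?P<ts>\S+Z) error wcp \[kubelifecycle/node_controller\.go:\d+\] "
    r"\[opID=[^\]]*\] Failed to realize node \{nodeID:(?P<node>[^ ]+) "
)


def parse_ts(ts):
    # Timestamps look like 2024-08-18T16:37:03.375Z (UTC, fractional seconds).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def retry_gaps(lines):
    """Map each nodeID to the list of time gaps between its consecutive retries."""
    last_seen = {}
    gaps = {}
    for line in lines:
        m = RETRY_RE.match(line)
        if not m:
            continue  # skip every line that is not a "Failed to realize node" error
        node = m.group("node")
        ts = parse_ts(m.group("ts"))
        if node in last_seen:
            gaps.setdefault(node, []).append(ts - last_seen[node])
        last_seen[node] = ts
    return gaps
```

Pointed at a copy of the log, for example retry_gaps(open("wcpsvc.log")), this would list one entry per nodeID (such as host-70016); consistent gaps of about three minutes would match the repeated "starting service" tasks seen in vCenter.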
What I have already checked or tried:
- checked that NTP is configured consistently and tested it
- tested DNS on the Supervisor VMs, ALB, ESXi hosts and vCenter
- restarted the wcp service on vCenter
- restarted all services on the ESXi hosts
- reinstalled one of my ESXi hosts
Does anyone have an idea what else I can check or try?