VMware vSphere


Enable workload management hangs on configuring

  • 1.  Enable workload management hangs on configuring

    Posted Jun 29, 2020 01:24 PM

    I'm trying to configure a TKG cluster on vSphere 7 for the first time.

    NSX-T 3.0 is configured and running.

    When I enable workload management with all the required info, it never finishes configuring.

    I can see many messages repeating in a loop in the wcp log.

    Attaching the error messages that show up repeatedly in the log:

    2020-06-29T12:52:51.438Z debug wcp informer.processLoop() lister.List() returned

    2020-06-29T12:52:54.612Z error wcp [opID=5ef9ca68-domain-c8] Unexpected object: &Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:an error on the server ("unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)") has prevented the request from succeeding,Reason:InternalError,Details:&StatusDetails{Name:,Group:,Kind:,Causes:[]StatusCause{StatusCause{Type:UnexpectedServerResponse,Message:unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body),Field:,},StatusCause{Type:ClientWatchDecoding,Message:unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body),Field:,},},RetryAfterSeconds:0,UID:,},Code:500,}

    2020-06-29T12:52:54.612Z error wcp [opID=5ef9ca68-domain-c8] Error watching NSX CRD resources.

    2020-06-29T12:52:54.612Z error wcp [opID=5ef9ca68-domain-c8] Error creating NSX resources. Err: Kubernetes API call failed. Details Error watching NSX CRD resources.

    2020-06-29T12:52:54.612Z error wcp [opID=5ef9ca68-domain-c8] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-1008. Err: Kubernetes API call failed. Details Error watching NSX CRD resources.

    2020-06-29T12:52:54.612Z error wcp [opID=5ef9ca68-domain-c8] Error configuring API server on cluster domain-c8 An error occurred. This operation will be retried.

    2020-06-29T12:52:54.832Z error wcp [opID=5ef9ca68-domain-c8] Unexpected object: &Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:an error on the server ("unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)") has prevented the request from succeeding,Reason:InternalError,Details:&StatusDetails{Name:,Group:,Kind:,Causes:[]StatusCause{StatusCause{Type:UnexpectedServerResponse,Message:unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body),Field:,},StatusCause{Type:ClientWatchDecoding,Message:unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body),Field:,},},RetryAfterSeconds:0,UID:,},Code:500,}

    2020-06-29T12:52:54.832Z error wcp [opID=5ef9ca68-domain-c8] Error watching NSX CRD resources.

    2020-06-29T12:52:54.832Z error wcp [opID=5ef9ca68-domain-c8] Error creating NSX resources. Err: Kubernetes API call failed. Details Error watching NSX CRD resources.

    2020-06-29T12:52:54.832Z error wcp [opID=5ef9ca68-domain-c8] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-1007. Err: Kubernetes API call failed. Details Error watching NSX CRD resources.

    2020-06-29T12:52:54.832Z error wcp [opID=5ef9ca68-domain-c8] Error configuring API server on cluster domain-c8 An error occurred. This operation will be retried.

    2020-06-29T12:52:54.957Z error wcp [opID=5ef9ca68-domain-c8] Unexpected object: &Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:an error on the server ("unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)") has prevented the request from succeeding,Reason:InternalError,Details:&StatusDetails{Name:,Group:,Kind:,Causes:[]StatusCause{StatusCause{Type:UnexpectedServerResponse,Message:unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body),Field:,},StatusCause{Type:ClientWatchDecoding,Message:unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body),Field:,},},RetryAfterSeconds:0,UID:,},Code:500,}

    2020-06-29T12:52:54.957Z error wcp [opID=5ef9ca68-domain-c8] Error watching NSX CRD resources.

    2020-06-29T12:52:54.957Z error wcp [opID=5ef9ca68-domain-c8] Error creating NSX resources. Err: Kubernetes API call failed. Details Error watching NSX CRD resources.

    2020-06-29T12:52:54.957Z error wcp [opID=5ef9ca68-domain-c8] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-1006. Err: Kubernetes API call failed. Details Error watching NSX CRD resources.

    2020-06-29T12:52:54.957Z error wcp [opID=5ef9ca68-domain-c8] Error configuring API server on cluster domain-c8 An error occurred. This operation will be retried.

    2020-06-29T12:52:54.957Z info wcp [opID=5ef9ca68-domain-c8] no single master succeeded - retrying

    2020-06-29T12:52:54.957Z debug wcp Publish change event: &cdc.ChangeLogChangeEvent{Resource:std.DynamicID{Type_:"ClusterComputeResource", Id:"domain-c8"}, Kind:"UPDATE", Properties:[]string{"messages"}, ParentResources:[]std.DynamicID(nil)}

    Has anyone had a similar issue to this?



  • 2.  RE: Enable workload management hangs on configuring

    Posted Jun 29, 2020 02:11 PM

    It looks like you have issues communicating with NSX-T Manager. Please describe the full networking config you supplied to the WCP wizard.



  • 3.  RE: Enable workload management hangs on configuring

    Posted Jul 21, 2020 12:43 PM

    I am running into this same issue. Were you able to get this resolved?

    My NSX Manager, Edge, and Supervisor cluster IPs are all on the same Layer 2 segment, so there shouldn't be any connectivity issues.



  • 4.  RE: Enable workload management hangs on configuring

    Posted Jul 21, 2020 12:52 PM

    Check that the local DVS and the underlay switch are configured with an MTU of 9000.



  • 5.  RE: Enable workload management hangs on configuring

    Posted Jul 21, 2020 01:09 PM

    You do not need an MTU of 9000, just anything at 1600 or higher.
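    If you want to confirm what a host actually has configured (a quick check, assuming ESXi shell access), esxcli will show the MTU each switch is using:

    # On an ESXi host: list the switches and their MTUs as the host sees them.
    esxcli network vswitch standard list      # standard vSwitches
    esxcli network vswitch dvs vmware list    # distributed switches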



  • 6.  RE: Enable workload management hangs on configuring

    Posted Jul 21, 2020 04:45 PM

    Checked the DVS, and all MTUs are set to 9000.



  • 7.  RE: Enable workload management hangs on configuring

    Posted Jul 24, 2020 02:07 PM

    I'm facing the exact same issue. The VCSA, NSX-T Manager, NSX Edge, and the Supervisor VMs are on the same network and can talk to each other.
    Have you solved this issue?


    Thank you!



  • 8.  RE: Enable workload management hangs on configuring

    Posted Oct 13, 2020 07:24 PM

    Hi all

    I am having a similar issue while trying to enable workload management. I'm using VCSA 7 U1, ESXi 7 U1, and NSX-T 3.0.1.1.

    Three supervisor control plane VMs are deployed and powered on. I can also see a new T1 gateway and some new segments, NAT rules, and LBs in NSX-T Manager.

    tail -f /var/log/vmware/wcp/wcpsvc.log shows the following:

    2020-10-13T19:08:04.3Z debug wcp [opID=5f862f9f] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:04.5Z debug wcp [opID=5f862fa0] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:04.699Z debug wcp [opID=5f862fa1] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:04.899Z debug wcp [opID=5f862fa2] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:04.952Z debug wcp [opID=5f86289f-domain-c8] Cluster Network Provider is NSXT Container Plugin. Performing additional NCP-specific configuration.

    2020-10-13T19:08:05.019Z debug wcp [opID=5f862fa3] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:05.019Z warning wcp [opID=5f85ea04-ea1f] Reflector for Resource:virtualmachineclasses, ClusterID:domain-c8 failed. Err: server/kubelifecycle/reflector/reflector.go:118: Failed to list <unspecified>: Failed to list virtualmachineclasses: Unauthorized. Will retry.

    2020-10-13T19:08:05.092Z debug wcp [opID=5f862fa3] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:05.099Z debug wcp [opID=5f86289f-domain-c8] Cluster Network Provider is NSXT Container Plugin. Performing additional NCP-specific configuration.

    2020-10-13T19:08:05.161Z debug wcp [opID=5f862fa3] vcrestlib: requesting new session

    2020-10-13T19:08:05.264Z debug wcp [opID=5f86289f-domain-c8] Cluster Network Provider is NSXT Container Plugin. Performing additional NCP-specific configuration.

    2020-10-13T19:08:05.375Z debug wcp [opID=5f862fa4] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:05.375Z error wcp [opID=5f86289f-domain-c8] Error checking if NSX resources exist. Err: Unauthorized

    2020-10-13T19:08:05.376Z error wcp [opID=5f86289f-domain-c8] Error checking if NSX resources exist for VMs: [vm-2050]. Err: Unauthorized

    2020-10-13T19:08:05.376Z error wcp [opID=5f86289f-domain-c8] Error creating NSX resources. Err: Unauthorized

    2020-10-13T19:08:05.376Z error wcp [opID=5f86289f-domain-c8] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-2050. Err: Unauthorized

    2020-10-13T19:08:05.376Z error wcp [opID=5f86289f-domain-c8] Error configuring cluster NIC on master VM vm-2050: Unauthorized

    2020-10-13T19:08:05.376Z error wcp [opID=5f86289f-domain-c8] Error configuring API server on cluster domain-c8 Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried.

    2020-10-13T19:08:05.451Z debug wcp [opID=5f862fa4] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:05.519Z debug wcp [opID=5f862fa4] vcrestlib: requesting new session

    2020-10-13T19:08:05.714Z debug wcp [opID=5f862fa6] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:05.715Z error wcp [opID=5f86289f-domain-c8] Error checking if NSX resources exist. Err: Unauthorized

    2020-10-13T19:08:05.715Z error wcp [opID=5f86289f-domain-c8] Error checking if NSX resources exist for VMs: [vm-2048]. Err: Unauthorized

    2020-10-13T19:08:05.715Z error wcp [opID=5f86289f-domain-c8] Error creating NSX resources. Err: Unauthorized

    2020-10-13T19:08:05.715Z error wcp [opID=5f86289f-domain-c8] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-2048. Err: Unauthorized

    2020-10-13T19:08:05.715Z error wcp [opID=5f86289f-domain-c8] Error configuring cluster NIC on master VM vm-2048: Unauthorized

    2020-10-13T19:08:05.715Z error wcp [opID=5f86289f-domain-c8] Error configuring API server on cluster domain-c8 Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried.

    2020-10-13T19:08:05.776Z debug wcp [opID=5f862fa6] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:05.847Z debug wcp [opID=5f862fa6] vcrestlib: requesting new session

    2020-10-13T19:08:06.048Z debug wcp [opID=5f862fa7] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:06.049Z error wcp [opID=5f86289f-domain-c8] Error checking if NSX resources exist. Err: Unauthorized

    2020-10-13T19:08:06.049Z error wcp [opID=5f86289f-domain-c8] Error checking if NSX resources exist for VMs: [vm-2049]. Err: Unauthorized

    2020-10-13T19:08:06.049Z error wcp [opID=5f86289f-domain-c8] Error creating NSX resources. Err: Unauthorized

    2020-10-13T19:08:06.049Z error wcp [opID=5f86289f-domain-c8] Failed to create cluster network interface for MasterNode: VirtualMachine:vm-2049. Err: Unauthorized

    2020-10-13T19:08:06.049Z error wcp [opID=5f86289f-domain-c8] Error configuring cluster NIC on master VM vm-2049: Unauthorized

    2020-10-13T19:08:06.049Z error wcp [opID=5f86289f-domain-c8] Error configuring API server on cluster domain-c8 Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried.

    2020-10-13T19:08:06.049Z warning wcp [opID=5f86289f-domain-c8] Error configuring cluster NIC. Err <nil>

    2020-10-13T19:08:06.049Z warning wcp [opID=5f86289f-domain-c8] Error configuring cluster NIC. Err <nil>

    2020-10-13T19:08:06.049Z warning wcp [opID=5f86289f-domain-c8] Error configuring cluster NIC. Err <nil>

    2020-10-13T19:08:06.049Z info wcp [opID=5f86289f-domain-c8] no single master succeeded - retrying

    2020-10-13T19:08:06.049Z debug wcp Publish change event: &cdc.ChangeLogChangeEvent{Resource:std.DynamicID{Type_:"ClusterComputeResource", Id:"domain-c8"}, Kind:"UPDATE", Properties:[]string{"messages"}, ParentResources:[]std.DynamicID(nil)}

    2020-10-13T19:08:06.05Z debug wcp [opID=5f86289f] [ END ] [kubelifecycle.(*Controller).syncClusterState:285] [8.645360187s] cluster=domain-c8

    2020-10-13T19:08:06.165Z debug wcp [opID=5f862fa5] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:06.299Z debug wcp [opID=5f862fa8] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:06.411Z debug wcp [opID=5f862fa9] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:06.558Z debug wcp [opID=5f862faa] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:06.559Z warning wcp [opID=5f85ea04-ea20] Reflector for Resource:limitranges, ClusterID:domain-c8 failed. Err: server/kubelifecycle/reflector/reflector.go:118: Failed to list <unspecified>: Failed to list limitranges: Unauthorized. Will retry.

    2020-10-13T19:08:06.67Z debug wcp [opID=5f862fad] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:06.814Z debug wcp [opID=5f862fac] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:06.932Z debug wcp [opID=5f862fae] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:07.042Z debug wcp [opID=5f862fb0] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:07.043Z warning wcp [opID=5f85ea04-ea23] Reflector for Resource:serviceaccounts, ClusterID:domain-c8 failed. Err: server/kubelifecycle/reflector/reflector.go:118: Failed to list <unspecified>: Failed to list serviceaccounts: Unauthorized. Will retry.

    2020-10-13T19:08:07.153Z debug wcp [opID=5f862faf] Getting HOK signer; store: wcp, alias: wcp

    2020-10-13T19:08:07.267Z debug wcp [opID=5f862fb1] Getting HOK signer; store: wcp, alias: wcp

    Any pointers on resolving this issue?

    Thanks

    Vineeth



  • 9.  RE: Enable workload management hangs on configuring

    Posted Oct 13, 2020 07:45 PM

    Hey, hope you are doing fine.

    You are getting an Unauthorized error:

    Error checking if NSX resources exist. Err: Unauthorized

    2020-10-13T19:08:05.376Z error wcp [opID=5f86289f-domain-c8] Error checking if NSX resources exist for VMs: [vm-2050]. Err: Unauthorized

    2020-10-13T19:08:05.376Z error wcp [opID=5f86289f-domain-c8] Error creating NSX resources. Err: Unauthorized

    Can you check that the permissions are correct on NSX, Kubernetes, and the vSphere compute manager?
    Have you accepted all the certificates?
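    If you want to check the NSX side directly, the compute manager registration is visible via the API (a rough sketch; nsx-mgr.example.com and <cm-id> are placeholders for your environment):

    # List the compute managers and note the id of your vCenter entry:
    curl -k -u admin 'https://nsx-mgr.example.com/api/v1/fabric/compute-managers'
    # Then check its registration and connection status:
    curl -k -u admin 'https://nsx-mgr.example.com/api/v1/fabric/compute-managers/<cm-id>/status'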

    Warm regards



  • 10.  RE: Enable workload management hangs on configuring

    Posted Oct 13, 2020 09:12 PM

    Thanks for the quick response. After finishing the workload management configuration wizard, it was able to deploy the 3 supervisor control plane VMs, and it also configured a new T1 gateway and some new segments, NAT rules, and LBs in NSX-T Manager. So I can't really understand what this Unauthorized error means!



  • 11.  RE: Enable workload management hangs on configuring

    Posted Oct 17, 2020 09:15 AM

    Does anyone have a clue or a hint on how to troubleshoot further? I seem to have exactly the same problem.

    I have upgraded to the latest vCenter. Running all ESXi hosts on VMware Workstation in my home lab.



  • 12.  RE: Enable workload management hangs on configuring

    Posted Oct 24, 2020 07:38 PM

    This issue is now resolved. It was due to a missing configuration in NSX-T. Thanks Hari (@hari5611) for identifying it and helping me fix it.

    On the Tier-0 Gateway, I added route redistribution to allow all overlay traffic. This was missing earlier, and adding it fixed the problem.
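    For anyone who prefers the API over the UI, the change corresponds roughly to the following Policy API call (a sketch only; T0-GW, the "default" locale-services ID, and the manager hostname are placeholders, and the rule below redistributes connected/NAT/LB VIP routes):

    # Redistribute connected, NAT, and LB VIP routes from the Tier-0 and Tier-1s:
    curl -k -u admin -X PATCH \
      'https://nsx-mgr.example.com/policy/api/v1/infra/tier-0s/T0-GW/locale-services/default' \
      -H 'Content-Type: application/json' \
      -d '{"route_redistribution_config": {"redistribution_rules": [
            {"name": "wcp-overlay",
             "route_redistribution_types": ["TIER0_CONNECTED", "TIER0_NAT",
               "TIER1_CONNECTED", "TIER1_NAT", "TIER1_LB_VIP"]}]}}'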

    Thanks

    Vineeth



  • 13.  RE: Enable workload management hangs on configuring

    Posted Dec 18, 2020 06:26 PM

    So what does this mean? Is this step in the docs anywhere? How did you configure it? 



  • 14.  RE: Enable workload management hangs on configuring

    Posted Dec 31, 2020 10:54 PM

    The two most common reasons are:

    1. Trust is not enabled in the Compute Manager for this vCenter in NSX.

    2. Time between vCenter and NSX is not in sync.
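    Both are quick to verify (a rough sketch; the command assumes SSH access to each appliance):

    # 1) Trust: in the NSX UI under System > Fabric > Compute Managers, your
    #    vCenter entry should have "Enable Trust" turned on (as far as I know,
    #    the API field backing that toggle is set_as_oidc_provider).
    # 2) Time: compare UTC clocks on the VCSA and NSX Manager; more than a few
    #    seconds of skew can make token-based auth fail with Unauthorized.
    date -u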



  • 15.  RE: Enable workload management hangs on configuring

    Broadcom Employee
    Posted Jan 05, 2021 02:56 AM

    I get exactly the same thing regardless of whether I configure it using NSX-T or vSphere networking (i.e., no NSX-T).

    3 nodes are up. Only 1 has any appreciable CPU activity (averaging about 50%); the other nodes average 1%.

    All nodes have only a single NIC (so it isn't getting to the point of adding and configuring the 2nd NIC).

    I can SSH to the "master" node. "kubectl get nodes" lists a single node: status is "Ready", role is "master", and the version is v1.18.2.

    "kubectl get cluster --all-namespaces" responds with "No resources found".

    Checking the wcpsvc.log file on vCenter, I can't see any errors that would explain what is happening (or not happening). Then again, I may be looking for the wrong text in the log.
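    In case it helps, this is roughly how I'm filtering it (assuming the default log location; adjust the pattern to taste):

    # Show the most recent error/warning lines from the WCP service log:
    grep -E ' (error|warning) wcp ' /var/log/vmware/wcp/wcpsvc.log | tail -n 50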

    Any suggestions on where to look?

    Thanks.



  • 16.  RE: Enable workload management hangs on configuring

    Posted Jan 25, 2021 05:25 AM

    Hey, I had the same "Error configuring cluster NIC on master VM" issue too.

    Does anyone know whether the ingress and egress CIDRs should be in the same subnet as the Edge uplink? Or can the ingress and egress CIDRs be any routable VLAN subnet?

    I suspect this issue is caused by my NSX-T network settings, but I'm not sure whether additional routing or SNAT needs to be considered.

    I've tried deploying the workload management cluster so many times; it's frustrating...

    (Note that my environment uses static routing for the NSX-T Edge, not BGP.)



  • 17.  RE: Enable workload management hangs on configuring

    Posted Jan 25, 2021 08:30 PM

    I got my system further by giving in and using BGP. You just can't run this stuff without it.

    Now I get TLS errors. When I try to connect via SSL, the TLS handshake hangs while establishing the connection. But if I SSH from the Supervisor VM to the Ingress IP, it works. I have no idea what to do from there.



  • 18.  RE: Enable workload management hangs on configuring

    Broadcom Employee
    Posted Jan 26, 2021 10:36 PM

    I ended up getting mine working.

    It was an issue with the underlying networking: there was ping connectivity to the management K8s cluster IP, but the logs indicated that vCenter was unable to connect to that IP.
    Did some testing. Ping only worked with packet sizes of less than 1400 bytes (even though the MTU was set at 9000).
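    For reference, this is the kind of don't-fragment ping test I mean (the vmk interface, netstack, and sizes below are examples from my setup):

    # From an ESXi host: DF-bit pings between TEPs over the overlay netstack.
    vmkping ++netstack=vxlan -I vmk10 -d -s 1572 <remote-tep-ip>   # ~1600 MTU check
    vmkping ++netstack=vxlan -I vmk10 -d -s 8972 <remote-tep-ip>   # jumbo (9000) check

    # From a Linux shell (e.g., the VCSA): 1372 bytes + 28 bytes of headers = 1400.
    ping -M do -s 1372 <supervisor-cluster-ip>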

    The cause was traced to an issue with 2 of my hosts (this is a home lab environment). 2 of the hosts I'm using are Gen 6 NUCs with a USB NIC as the 2nd adapter, using the VMware Fling driver (the other 2 hosts are Gen 8 NUCs with 2 onboard NICs).
    Even though the MTU at the switch layer was set to 9000, the driver for the USB NICs only supports up to 4000.

    So, I reconfigured the vDS that has the NSX-T transport VLAN to an MTU of 4000 bytes.

    I then redeployed the Workload Management cluster, and it completed successfully (using NSX-T networking).

    FYI: I'm not using BGP as a routing protocol (only static routing between the NSX-T T0 router and the external NBN router).

    Hope this helps someone.



  • 19.  RE: Enable workload management hangs on configuring

    Posted Jan 27, 2021 09:51 PM

    This did the trick. I had all MTUs set to 9000 because in the networking world, that works: if some other component is limited to less, packets at that lower size still go through hassle-free. But apparently in the VMware networking world, setting your MTU too high causes problems. Thanks, VMware.

    I set the NSX-T overlay and the Edge overlay VLANs to an MTU of 1600. Now things work, and I can connect to the K8s interface.



  • 20.  RE: Enable workload management hangs on configuring

    Posted Jan 28, 2021 03:01 AM

    My ESXi infrastructure had the physical switch and VDS MTU set to 9000, the same as the NSX-T transport node profiles, but the hanging issue is still there...

    However, my DNS server is located on another physical switch (VLAN) with an MTU of 1700, behind a separate physical firewall.

    I'm wondering whether the underlying network MTU between all the services (such as DNS and vCenter, even when they are on different VLANs) should be the same or not.



  • 21.  RE: Enable workload management hangs on configuring

    Posted Jan 28, 2021 05:34 PM

    As it turns out, I had all MTUs set to 9000 and K8s didn't work. It deployed (once I set up BGP on the switch), but I couldn't get to the cluster Ingress IP. Once I set the overlay MTU to 1600 and the Edge overlay MTU to 1600, everything worked.



  • 22.  RE: Enable workload management hangs on configuring