We have a TKC cluster in vSphere with Tanzu, and we are trying to run pods that mount volumes provisioned by the default vsphere-csi-driver.
After some infrastructure instability, the TKC nodes were rebooted, and since then some pods fail to remount their volumes.
Running kubectl describe pod on one of the affected pods (graylog-es-data-0) returns the following events:
Events:
  Type     Reason              Age                From                     Message
  ----     ------              ----               ----                     -------
  Normal   Scheduled           31m                default-scheduler        Successfully assigned logging/graylog-es-data-0 to tkc-3rd01-md-0-2rccf-x6d26-v2xfq
  Warning  FailedAttachVolume  31m                attachdetach-controller  Multi-Attach error for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedAttachVolume  23m (x9 over 25m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" : rpc error: code = Internal desc = observed Error: "failed to attach cns volume" is set on the volume "43bf6b28-a5f3-4593-9f8c-e80d5c2643da-b5092e1c-4dc8-4ed1-aab1-660551f2944e" on virtualmachine "tkc-3rd01-md-0-2rccf-x6d26-v2xfq"
  Warning  FailedAttachVolume  15m                attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" : volume attachment is being deleted
  Warning  FailedAttachVolume  11m (x2 over 13m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" : rpc error: code = Internal desc = observed Error: "failed to detach cns volume" is set on the volume "43bf6b28-a5f3-4593-9f8c-e80d5c2643da-b5092e1c-4dc8-4ed1-aab1-660551f2944e" on virtualmachine "tkc-3rd01-md-0-2rccf-x6d26-v2xfq"
  Warning  FailedAttachVolume  52s (x8 over 21m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" : rpc error: code = Internal desc = Watch on virtualmachine "tkc-3rd01-md-0-2rccf-x6d26-v2xfq" timed out
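Read together, these warnings describe several distinct failures: a Multi-Attach conflict, CNS attach and detach errors, a VolumeAttachment being deleted, and a VM watch timeout. As a minimal Python sketch, the event lines can be tallied per underlying cause, using the (xN over ...) repeat counts from the Age column. It assumes the event text is pasted verbatim with whitespace-separated columns; the names EVENT_RE, classify and summarize are hypothetical helpers, not part of kubectl or the vSphere CSI driver.

```python
import re
from collections import Counter

# Event lines copied from the kubectl describe pod output for graylog-es-data-0.
EVENTS = """\
Normal Scheduled 31m default-scheduler Successfully assigned logging/graylog-es-data-0 to tkc-3rd01-md-0-2rccf-x6d26-v2xfq
Warning FailedAttachVolume 31m attachdetach-controller Multi-Attach error for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" Volume is already exclusively attached to one node and can't be attached to another
Warning FailedAttachVolume 23m (x9 over 25m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" : rpc error: code = Internal desc = observed Error: "failed to attach cns volume" is set on the volume "43bf6b28-a5f3-4593-9f8c-e80d5c2643da-b5092e1c-4dc8-4ed1-aab1-660551f2944e" on virtualmachine "tkc-3rd01-md-0-2rccf-x6d26-v2xfq"
Warning FailedAttachVolume 15m attachdetach-controller AttachVolume.Attach failed for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" : volume attachment is being deleted
Warning FailedAttachVolume 11m (x2 over 13m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" : rpc error: code = Internal desc = observed Error: "failed to detach cns volume" is set on the volume "43bf6b28-a5f3-4593-9f8c-e80d5c2643da-b5092e1c-4dc8-4ed1-aab1-660551f2944e" on virtualmachine "tkc-3rd01-md-0-2rccf-x6d26-v2xfq"
Warning FailedAttachVolume 52s (x8 over 21m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e" : rpc error: code = Internal desc = Watch on virtualmachine "tkc-3rd01-md-0-2rccf-x6d26-v2xfq" timed out
"""

# Type, Reason, Age (with an optional "(xN over T)" repeat count), From, Message.
EVENT_RE = re.compile(
    r"^(?P<type>Normal|Warning)\s+(?P<reason>\S+)\s+"
    r"(?P<age>\S+)(?:\s+\(x(?P<count>\d+) over \S+\))?\s+"
    r"(?P<source>\S+)\s+(?P<message>.*)$"
)


def classify(message: str) -> str:
    """Reduce an attach-failure message to its underlying cause."""
    if "Multi-Attach error" in message:
        return "multi-attach"
    # CNS errors are reported as: observed Error: "<cause>" is set on the volume ...
    observed = re.search(r'observed Error: "([^"]+)"', message)
    if observed:
        return observed.group(1)
    # Other gRPC errors carry their cause after "desc = ".
    desc = re.search(r"desc = (.*)$", message)
    if desc:
        return desc.group(1)
    # Fall back to the text after the " : " separator used in these messages.
    return message.split(" : ", 1)[-1]


def summarize(event_lines: list[str]) -> Counter:
    """Count FailedAttachVolume occurrences per cause, honoring repeat counts."""
    causes: Counter = Counter()
    for line in event_lines:
        m = EVENT_RE.match(line.strip())
        if not m or m["reason"] != "FailedAttachVolume":
            continue
        causes[classify(m["message"])] += int(m["count"] or 1)
    return causes


if __name__ == "__main__":
    for cause, count in summarize(EVENTS.splitlines()).items():
        print(f"{count:3d}  {cause}")
```

Counting by cause rather than by event line separates the repeated CNS attach errors and watch timeouts from the one-off Multi-Attach and deletion messages.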
I suspect the error comes from the VolumeAttachment object itself. Describing it shows this status:
Status:
  Attach Error:
    Message:  rpc error: code = Internal desc = Watch on virtualmachine "tkc-3rd01-md-0-2rccf-x6d26-v2xfq" timed out
    Time:     2025-04-04T23:28:54Z
  Attached:   false
Events:       <none>
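Because the Multi-Attach message says the volume is already attached to another node, it may be worth listing every VolumeAttachment that references this PV, not only the one whose status is shown. A minimal Python sketch of that check follows. It assumes the input is the items list from kubectl get volumeattachments -o json and reads the standard storage.k8s.io/v1 VolumeAttachment fields (spec.source.persistentVolumeName, spec.nodeName, status.attached, status.attachError, status.detachError, metadata.deletionTimestamp, metadata.finalizers). The SAMPLE object is reconstructed from the status above, and its name "csi-example" is hypothetical, as is the helper attachments_for_pv.

```python
import json
from typing import Any

PV_NAME = "pvc-b5092e1c-4dc8-4ed1-aab1-660551f2944e"

# Reconstructed from the VolumeAttachment status above.
# "csi-example" is a hypothetical object name used only for illustration.
SAMPLE: dict[str, Any] = {
    "items": [
        {
            "metadata": {"name": "csi-example"},
            "spec": {
                "nodeName": "tkc-3rd01-md-0-2rccf-x6d26-v2xfq",
                "source": {"persistentVolumeName": PV_NAME},
            },
            "status": {
                "attached": False,
                "attachError": {
                    "message": "rpc error: code = Internal desc = Watch on "
                               'virtualmachine "tkc-3rd01-md-0-2rccf-x6d26-v2xfq" timed out',
                    "time": "2025-04-04T23:28:54Z",
                },
            },
        }
    ]
}


def attachments_for_pv(items: list[dict[str, Any]], pv_name: str) -> list[dict[str, Any]]:
    """List every VolumeAttachment that references pv_name, flagging problems."""
    rows = []
    for va in items:
        meta = va.get("metadata", {})
        spec = va.get("spec", {})
        status = va.get("status", {})
        if spec.get("source", {}).get("persistentVolumeName") != pv_name:
            continue  # attachment for some other volume
        attach_err = status.get("attachError") or {}
        detach_err = status.get("detachError") or {}
        deleting = "deletionTimestamp" in meta
        attached = bool(status.get("attached"))
        rows.append({
            "name": meta.get("name"),
            "node": spec.get("nodeName"),
            "attached": attached,
            "deleting": deleting,
            "finalizers": meta.get("finalizers", []),
            "attach_error": attach_err.get("message"),
            "detach_error": detach_err.get("message"),
            # Healthy here means: attached, no recorded errors, not being deleted.
            "healthy": attached and not (attach_err or detach_err or deleting),
        })
    return rows


if __name__ == "__main__":
    rows = attachments_for_pv(SAMPLE["items"], PV_NAME)
    print(json.dumps(rows, indent=2))
    nodes = {row["node"] for row in rows}
    if len(nodes) > 1:
        print(f"{PV_NAME} is referenced by attachments on {len(nodes)} nodes: {sorted(nodes)}")
```

To run it against the cluster instead of SAMPLE, the saved kubectl output could be loaded with json.load. An attachment still marked attached on a node other than the one the pod is now scheduled to, or one flagged as deleting while finalizers remain, would be a plausible place to look next.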
At this point, we are not sure how to troubleshoot these errors. Any advice is welcome.
Best regards.