VMware Cloud Director Container Service Extension

  • 1.  Any plan regarding how to operate a multi-node etcd cluster?

    Posted Aug 08, 2022 10:17 AM

    The new capability to create multiple control plane nodes is good.
    However, how should they be operated?

    I am thinking specifically of etcd:
    https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/

    For example:

    What would be the process to replace a failed etcd member?
    Removing a failed member doesn't seem possible if the nodes are managed by Tanzu.
    I didn't see an option in the GUI to delete a specific node.
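    For reference, the upstream etcd procedure for a failed member is to remove it from the membership and then add a replacement. The sketch below only builds the removal command; it assumes it is run on a healthy control plane node and that the certificates sit at the kubeadm default paths, which is an assumption, not something confirmed for CSE/Tanzu. The helper name `member_remove_cmd` is hypothetical. One detail worth knowing: `etcdctl member list -w json` prints member IDs as decimal numbers, while `etcdctl member remove` expects the hex form. Whether doing this by hand is safe on a Tanzu-managed control plane is exactly the open question here.

```python
import json

# ASSUMPTION: kubeadm-default etcd certificate paths; not confirmed for CSE/Tanzu nodes.
ETCD_FLAGS = [
    "--endpoints=https://127.0.0.1:2379",
    "--cacert=/etc/kubernetes/pki/etcd/ca.crt",
    "--cert=/etc/kubernetes/pki/etcd/server.crt",
    "--key=/etc/kubernetes/pki/etcd/server.key",
]


def member_remove_cmd(member_list_json: str, failed_name: str) -> list:
    """Hypothetical helper: build the etcdctl argv that removes `failed_name`.

    `member_list_json` is the output of `etcdctl member list -w json`.
    """
    members = json.loads(member_list_json)["members"]
    matches = [m for m in members if m.get("name") == failed_name]
    if len(matches) != 1:
        raise ValueError(f"expected one member named {failed_name!r}, found {len(matches)}")
    # JSON output carries the ID as a decimal integer; `member remove` parses hex.
    member_id_hex = format(matches[0]["ID"], "x")
    return ["etcdctl", *ETCD_FLAGS, "member", "remove", member_id_hex]
```

    The returned list can be passed to `subprocess.run(cmd, check=True)`; per the etcd docs, the replacement member is then added with `etcdctl member add` and started with `--initial-cluster-state existing`.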

    Will there be an option to back up etcd directly from the GUI or via the CLI?
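    Until such an option exists, the Kubernetes page linked above describes backing up etcd with `etcdctl snapshot save`. A minimal sketch that builds that command for a timestamped file, assuming it runs on a control plane node and that the certificates are at the kubeadm default paths (an assumption for CSE/Tanzu VMs); `snapshot_save_cmd` is a hypothetical helper name:

```python
import shlex
from datetime import datetime, timezone
from typing import Optional

# ASSUMPTION: kubeadm-default certificate paths; verify on the actual control plane VM.
CACERT = "/etc/kubernetes/pki/etcd/ca.crt"
CERT = "/etc/kubernetes/pki/etcd/server.crt"
KEY = "/etc/kubernetes/pki/etcd/server.key"


def snapshot_save_cmd(backup_dir: str, when: Optional[datetime] = None) -> list:
    """Hypothetical helper: argv for `etcdctl snapshot save` into a UTC-timestamped file."""
    if when is None:
        when = datetime.now(timezone.utc)
    path = f"{backup_dir.rstrip('/')}/etcd-{when:%Y%m%dT%H%M%SZ}.db"
    return [
        "etcdctl",
        "--endpoints=https://127.0.0.1:2379",  # snapshot save needs a single endpoint
        f"--cacert={CACERT}",
        f"--cert={CERT}",
        f"--key={KEY}",
        "snapshot", "save", path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(..., check=True) to actually take the snapshot.
    print(shlex.join(snapshot_save_cmd("/var/backups")))
```

    Building the argv as a list rather than a shell string avoids quoting issues and keeps the command easy to inspect before it is run.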



  • 2.  RE: Any plan regarding how to operate a multi-node etcd cluster?

    Posted Sep 14, 2022 01:53 PM

    I have run some tests on this with a cluster created with 3 master nodes.
    If one control plane node is shut down from vCenter, "kubectl get pods -A" continues to work (as expected).
    If two control plane nodes are shut down, "kubectl get pods -A" stops working (expected).
    After restarting one of the control plane nodes, "kubectl get pods -A" works again (expected).
    So the basic functionality of multiple control plane nodes is working.
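    These results match etcd's quorum rule: a write needs a strict majority of members, so a 3-member cluster keeps working with one member down and stops with two down. A small worked calculation of that rule:

```python
def quorum(members: int) -> int:
    """Number of etcd members that must agree: a strict majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be down while a majority is still available."""
    return members - quorum(members)


for n in (3, 4, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} down")
```

    This prints that 3 members tolerate 1 failure, 4 members still tolerate only 1, and 5 members tolerate 2, which is why odd control plane counts are the usual choice.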

    One issue is that the CSE plugin reports no errors, either in the events or in the cluster status (the status stays "ready").
    The only visible signs are at the load balancer level, which shows some endpoints as down, and in the vApp, which shows some VMs as down.
    Would it be possible to add some kind of health reporting to the CSE plugin (e.g. all control plane nodes up and running, all worker nodes up and running, load balancer for the management IP deployed, etc.)?
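    As a stopgap from the kubectl side, node health can be summarized from the output of `kubectl get nodes -o json` by reading each node's `Ready` condition; a node that stops reporting typically shows `Unknown` there. This is only a sketch: it assumes control plane nodes carry the standard `node-role.kubernetes.io/control-plane` (or older `node-role.kubernetes.io/master`) label, and `node_health` is a hypothetical helper, not part of CSE.

```python
import json

# Standard Kubernetes control plane role labels (newer and older names).
CONTROL_PLANE_LABELS = (
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
)


def is_ready(node: dict) -> bool:
    """True only if the node's Ready condition has status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def node_health(nodes_json: str) -> dict:
    """Hypothetical helper: map role -> (ready count, total count)."""
    counts = {"control-plane": [0, 0], "worker": [0, 0]}
    for node in json.loads(nodes_json)["items"]:
        labels = node.get("metadata", {}).get("labels", {})
        role = "control-plane" if any(l in labels for l in CONTROL_PLANE_LABELS) else "worker"
        counts[role][1] += 1
        if is_ready(node):
            counts[role][0] += 1
    return {role: tuple(c) for role, c in counts.items()}
```

    A result such as `{"control-plane": (2, 3), "worker": (3, 3)}` would flag the degraded state that the plugin currently shows as "ready".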

    Second issue: I deliberately deleted one of the control plane VMs.
    As mentioned above, the CSE plugin reports nothing; it still shows "3 nodes".
    It doesn't recreate the missing node either (there is no "auto-heal", which would be ideal).
    Is there a procedure for replacing a failed node in such a case?