Tanzu

 View Only
  • 1.  vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Jul 28, 2022 03:33 PM

    vCenter version: 7.0.3.00800 Build: 20150588
    k8s: v1.22.9+vmware.1  VMware Photon OS/Linux   4.19.225-3.ph3   containerd://1.5.11
    Cluster size: Control plane: 3 best-effort-2xlarge, Workers: 10 best-effort-2xlarge

When I start deploying applications like elasticsearch-rally, cassandra, fio, vdbench, or pgbench, most of the nodes come under disk pressure, evicting the pods.

    I see the following events on the nodes:

    Events:

      Type     Reason                 Age                   From     Message

      ----     ------                 ----                  ----     -------

      Warning  FreeDiskSpaceFailed    41m                   kubelet  failed to garbage collect required amount of images. Wanted to free 729588531 bytes, but freed 0 bytes

      Warning  FreeDiskSpaceFailed    26m                   kubelet  failed to garbage collect required amount of images. Wanted to free 687059763 bytes, but freed 0 bytes

      Warning  ImageGCFailed          21m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes

      Warning  FreeDiskSpaceFailed    21m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703636275 bytes, but freed 0 bytes

      Warning  FreeDiskSpaceFailed    16m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes

      Warning  ImageGCFailed          16m                   kubelet  failed to garbage collect required amount of images. Wanted to free 703996723 bytes, but freed 0 bytes

      Normal   NodeHasDiskPressure    12m (x9 over 15h)     kubelet  Node tkgs-cluster-1-test-nodes-wtzl5-8d6d65695-2n2pp status is now: NodeHasDiskPressure

      Warning  FreeDiskSpaceFailed    11m                   kubelet  failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes

      Warning  ImageGCFailed          11m                   kubelet  failed to garbage collect required amount of images. Wanted to free 3352056627 bytes, but freed 0 bytes

      Warning  EvictionThresholdMet   7m41s (x29 over 15h)  kubelet  Attempting to reclaim ephemeral-storage

     

By default the root partition disk size is 16 GB. Is there any way to deploy a vSphere with Tanzu (TKGs) cluster with a larger root partition?

    I am able to reproduce the issue consistently in the last three releases of vSphere with Tanzu, including the most recent one.
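For scale, the byte counts in the events above can be converted to GiB to see how much the kubelet was trying to reclaim relative to the 16 GB root partition (a quick sketch using two of the values quoted in the events):

```shell
# Convert the "Wanted to free" byte counts from the kubelet events
# into GiB, to put them in proportion to the 16 GB root disk.
for bytes in 729588531 3352056627; do
  awk -v b="$bytes" 'BEGIN {printf "%.0f bytes = %.2f GiB\n", b, b/(1024^3)}'
done
```

So the kubelet was at times trying to free over 3 GiB, roughly a fifth of the whole root disk, and finding nothing reclaimable.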



  • 2.  RE: vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Jul 28, 2022 06:53 PM

    Hi,

In https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-B1034373-8C38-4FE2-9517-345BF7271A1E.html there is a YAML sample ending with:

    workers:
      count: 3
      class: best-effort-medium
      storageClass: vwt-storage-policy
      volumes:
        - name: containerd
          mountPath: /var/lib/containerd
          capacity:
            storage: 16Gi
     

     

Changing 16Gi to, say, 64Gi might help. I haven't tested it myself and it is not my finding; the original answer was published in another thread.
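For completeness, here is what a full manifest might look like with the enlarged volume. This is an untested sketch: the cluster name is a placeholder, the v1alpha1 API version is assumed to match the sample above, and 64Gi is just an example size.

```yaml
# Hypothetical TanzuKubernetesCluster spec with a dedicated containerd volume.
# Name, classes, and storage class are placeholders; adjust to your environment.
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkgs-cluster-1
spec:
  topology:
    controlPlane:
      count: 3
      class: best-effort-2xlarge
      storageClass: vwt-storage-policy
    workers:
      count: 10
      class: best-effort-2xlarge
      storageClass: vwt-storage-policy
      volumes:
        - name: containerd
          mountPath: /var/lib/containerd
          capacity:
            storage: 64Gi
```

The separate volume keeps container images and writable layers off the 16 GB root partition, so image pulls no longer compete with the OS for root disk space.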



  • 3.  RE: vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Aug 03, 2022 02:16 PM

Thanks. This solution helped me add an additional disk and mount /var/lib/containerd on it. This resolved the disk pressure issue I was hitting.
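For anyone applying the same fix, one way to confirm from a node shell that the new volume is actually in place (a sketch; on a machine without the dedicated mount, df simply reports the filesystem backing that path, typically the root partition):

```shell
# Show the filesystem backing /var/lib/containerd; if the dedicated volume is
# mounted, it appears as its own device rather than the 16 GB root partition.
df -h /var/lib/containerd 2>/dev/null || df -h /
```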



  • 4.  RE: vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Aug 03, 2022 11:10 AM

Is there already a solution to this problem? I have the same problem.



  • 5.  RE: vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Aug 03, 2022 01:04 PM

Hi,

    Can you clarify the issue?

Is it the same issue as described above, i.e. that 16 GB is too low as an initial capacity? Or are you asking whether there is a recipe to enlarge the 16 GB on the fly, or how to provision with a larger initial capacity, e.g. 64 GB?

    Your clarification helps others in the community to contribute to the issue(s). Kind regards, Daniel



  • 6.  RE: vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Nov 25, 2023 04:11 PM

    Hi,

Was your problem solved? If not, you can take the following steps:

    1. Identify the source of disk pressure: Use monitoring tools to identify which workloads or processes consume the most disk space. This can help you determine where to focus your efforts to reduce disk pressure.

    2. Optimize resource usage: Consider optimizing your workloads to reduce their resource usage. For example, you can scale down or remove unnecessary workloads, adjust resource limits, or use more efficient storage solutions.

    3. Manage log files: Make sure that log files are properly managed and rotated. Large log files can quickly consume disk space, so it's important to regularly clean them up or archive them to a separate storage location.

    4. Regularly clean up unused resources: Remove unused resources, such as unused containers or images, to free up disk space.

5. Add more disk space: If none of the above steps are sufficient, you may need to add more disk space to your worker nodes. You can do this by adding disks or expanding the existing disk volumes.



  • 7.  RE: vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Aug 31, 2023 06:34 AM

    If your vSphere with Tanzu (TKGs) worker nodes are experiencing disk pressure, it means that the worker nodes are running out of disk space. This can be caused by a variety of factors, such as running too many workloads on a single node, not properly managing log files, or not regularly cleaning up unused resources.

    To address this issue, you can take the following steps:

    1. Identify the source of disk pressure: Use monitoring tools to identify which workloads or processes are consuming the most disk space. This can help you determine where to focus your efforts to reduce disk pressure.

    2. Optimize resource usage: Consider optimizing your workloads to reduce their resource usage. For example, you can scale down or remove unnecessary workloads, adjust resource limits, or use more efficient storage solutions.

    3. Manage log files: Make sure that log files are properly managed and rotated. Large log files can quickly consume disk space, so it's important to regularly clean them up or archive them to a separate storage location.

    4. Regularly clean up unused resources: Remove any unused resources, such as unused containers or images, to free up disk space.

    5. Add more disk space: If none of the above steps are sufficient, you may need to add more disk space to your worker nodes. You can do this by adding additional disks or expanding the existing disk volumes.

By taking these steps, you can reduce disk pressure on your vSphere with Tanzu (TKGs) worker nodes and ensure that your workloads continue to run smoothly.



  • 8.  RE: vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Jan 15, 2024 02:25 PM

To all humans reading this thread: please report the four posts above containing spam links to the moderators so that they get removed. As a community we should not accept such AI-generated content, which aims to publish spam links for black-hat search engine optimisation.



  • 9.  RE: vSphere with Tanzu (TKGs) worker nodes are having disk pressure

    Posted Apr 07, 2024 07:26 AM

Disk pressure in Kubernetes nodes occurs when the available storage space falls below a certain threshold, leading to eviction of pods to reclaim resources. The events you're seeing indicate that the kubelet is unable to free up the required amount of disk space through garbage collection, which is an automated process to clean up unused images and containers.
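To put numbers on that: assuming TKGs keeps the upstream kubelet defaults (a hard eviction threshold of nodefs.available < 10% and an image GC high threshold of 85% usage; these defaults are an assumption, not confirmed for TKGs), a 16 GiB root disk leaves very little headroom:

```shell
# Assumed upstream kubelet defaults: eviction when free space drops below 10%,
# image garbage collection starting once usage exceeds 85%. On a 16 GiB disk:
awk 'BEGIN {
  disk = 16                                              # root disk size, GiB
  printf "eviction below %.1f GiB free\n", disk * 0.10
  printf "image GC starts above %.1f GiB used\n", disk * 0.85
}'
```

With only ~1.6 GiB of slack before eviction kicks in, a handful of large application images (Elasticsearch, Cassandra, etc.) is enough to trigger the behaviour in the original post.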