Auto Scaling of Kubernetes Workloads using Custom Application Metrics
by Ofer Yaniv, Principal Software Architect
Orchestration platforms such as Kubernetes and OpenShift help customers reduce costs by providing on-demand, scalable compute resources. Customers can manually scale their Kubernetes compute resources out and in as needed.
Autoscaling is the process of automatically adjusting compute resources to meet a system's performance requirements. As workloads grow, systems require additional resources to sustain performance and handle increasing demand. Conversely, when demand decreases, an automatic scaling service can reduce costs by scaling in unutilized compute resources.
Autoscaling eliminates the need for manual intervention in the scaling process, reduces the probability of incorrect resource allocation, and saves cost by automatically allocating exactly the compute power that is required.
In this post, we will show you how to use KEDA to autoscale your Kubernetes or OpenShift pods based on your own custom in-memory application metrics.
KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on custom application metrics that are external to Kubernetes.
KEDA introduces a new Kubernetes Custom Resource Definition (CRD) called ScaledObject. A ScaledObject defines how KEDA should scale your application and what the triggers are.
A ScaledObject with an external trigger can be configured to periodically poll your application (over the gRPC protocol) for the name and value of the custom application metric that should control the number of pods (replicas) of a specific Kubernetes Deployment or StatefulSet.
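For illustration, a minimal ScaledObject of this kind might look like the following sketch. The workload name, service address, and port here are assumptions; the actual manifest is rendered by the example's helm chart (see Step 3 below):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker
  namespace: external-scaler-server
spec:
  scaleTargetRef:
    kind: StatefulSet        # the worker workload to scale
    name: worker             # illustrative name
  pollingInterval: 10        # seconds between polls of the external scaler
  triggers:
    - type: external
      metadata:
        # address of the gRPC server implementing externalscaler.proto
        # (service name and port are illustrative)
        scalerAddress: external-scaler-server.external-scaler-server.svc.cluster.local:9090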
This simple example shows how to add the related gRPC endpoints to your application to support such custom autoscaling.
The gRPC API endpoints that must be added to your application are specified by externalscaler.proto.
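For reference, here is an abridged version of externalscaler.proto, following the definition published in the KEDA documentation (see References); consult the KEDA docs for the authoritative, up-to-date version:

syntax = "proto3";
package externalscaler;

service ExternalScaler {
  // Is there any work to do? Used by KEDA to scale between 0 and 1 replicas.
  rpc IsActive(ScaledObjectRef) returns (IsActiveResponse) {}
  rpc StreamIsActive(ScaledObjectRef) returns (stream IsActiveResponse) {}
  // Which metric(s) drive the scaling, and what is the target size per pod?
  rpc GetMetricSpec(ScaledObjectRef) returns (GetMetricSpecResponse) {}
  // The current value of the metric.
  rpc GetMetrics(GetMetricsRequest) returns (GetMetricsResponse) {}
}

message ScaledObjectRef {
  string name = 1;
  string namespace = 2;
  map<string, string> scalerMetadata = 3;
}

message IsActiveResponse {
  bool result = 1;
}

message GetMetricSpecResponse {
  repeated MetricSpec metricSpecs = 1;
}

message MetricSpec {
  string metricName = 1;
  int64 targetSize = 2;
}

message GetMetricsRequest {
  ScaledObjectRef scaledObjectRef = 1;
  string metricName = 2;
}

message GetMetricsResponse {
  repeated MetricValue metricValues = 1;
}

message MetricValue {
  string metricName = 1;
  int64 metricValue = 2;
}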
The following example builds and runs the external-scaler-server, which serves as the gRPC server of your application.
The external-scaler-server gets the custom metric from the backend external-scaler-web application. Based on the periodic responses from the external-scaler-server, the KEDA external ScaledObject automatically scales the number of worker pods (replicas).
Prerequisites
You will need the following to complete the steps in this post:
- A Kubernetes or OpenShift cluster, with kubectl configured to access it
- Git
- Docker and a JDK (only if you rebuild the docker images in Step 2)
The following diagram shows the complete setup that we will walk through in this post:
KEDA External Scaler Server Setup
Step 1: To get the KEDA External Scaler Server example
Clone the external-scaler-samples git repository:
git clone https://github.com/kubernetes-zookeeper/external-scaler-samples
cd ./external-scaler-samples
Step 2: To build the KEDA External Scaler Server example
This build step is optional and can be skipped.
The example uses the following two docker images, both already available on Docker Hub:
- kuberneteszookeeper/external-scaler-web
- kuberneteszookeeper/external-scaler-server
If you would like to rebuild these two docker images, execute the following:
wget -O gradle.zip https://services.gradle.org/distributions/gradle-7.4-bin.zip && unzip gradle.zip && /bin/mv gradle-7.4 gradle
./gradle/bin/gradle clean docker
This creates the kuberneteszookeeper/external-scaler-web and kuberneteszookeeper/external-scaler-server docker images.
These docker images must be accessible from the Kubernetes cluster.
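If you rebuilt the images and cannot push to the kuberneteszookeeper Docker Hub account, retag and push them to a registry your cluster can reach (the registry name below is a placeholder):

docker tag kuberneteszookeeper/external-scaler-server <your-registry>/external-scaler-server
docker push <your-registry>/external-scaler-server
docker tag kuberneteszookeeper/external-scaler-web <your-registry>/external-scaler-web
docker push <your-registry>/external-scaler-web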
Step 3: To run the KEDA External Scaler Server example
./external-scaler-grpc/helm/install_helm.sh
./external-scaler-grpc/keda/install_keda.sh
helm upgrade --install external-scaler-server ./external-scaler-grpc/helm/external-scaler-server/ --namespace external-scaler-server --create-namespace --values ./external-scaler-grpc/helm/external-scaler-server/values.yaml
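Once these commands complete, you can verify that the KEDA operator and the example pods are up before proceeding:

kubectl -n keda get pods
kubectl -n external-scaler-server get pods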
AutoScaling in Action
The external KEDA ScaledObject now periodically polls the external-scaler-server (over the gRPC protocol) for the name and value of the metric. Based on that value (and the target size for this metric), KEDA scales the related worker Deployment/StatefulSet.
In this specific example, the metricName is "number_of_jobs" and its targetSize is 4; that is, each worker pod should run at most 4 application jobs.
For example, when "number_of_jobs" is 20, KEDA automatically scales the worker StatefulSet to 20 / 4 = 5 pods.
In this specific example, the polling interval (pollingInterval) is set to 10 seconds in the helm values.yaml.
Let’s put some load on the application by running the following command to create 20 application jobs:
kubectl exec -n external-scaler-server -it svc/external-scaler-web -- curl -X POST http://external-scaler-web.external-scaler-web.external-scaler-server.svc.cluster.local:8080/external-scaler-web/api/jobs?number_of_jobs=20
The "number_of_jobs" query parameter is controlling how many application jobs should be created.
You may now run the following commands and watch KEDA autoscaling in action:
kubectl -n external-scaler-server get ScaledObject
kubectl -n external-scaler-server get HorizontalPodAutoscaler
kubectl -n external-scaler-server get pods -w
In a different terminal window, you may run the following command to view the gRPC application server log:
kubectl -n external-scaler-server logs -f -l name=external-scaler-server
This example scales the number of worker pods (replicas) in real time, based on the value of an application metric.
The external-scaler-server pod runs a gRPC server that responds to the periodic GetMetricSpec and GetMetrics requests generated by the external KEDA ScaledObject.
The docker log of the external-scaler-server pod shows the responses to these periodic requests.
The GetMetricSpec response specifies the name of the custom application metric that should control the autoscaling, along with its target size.
The periodic GetMetrics response specifies the current value of that metric.
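For illustration, the shape of these two responses, rendered here as YAML with the field names from externalscaler.proto and the values used in this example, is roughly:

GetMetricSpecResponse:
  metricSpecs:
    - metricName: number_of_jobs
      targetSize: 4

GetMetricsResponse:
  metricValues:
    - metricName: number_of_jobs
      metricValue: 20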
The external KEDA ScaledObject divides the returned value of the custom application metric by its target size (4), rounds up, and sets the number of worker pods (replicas) accordingly; the underlying HorizontalPodAutoscaler computes desiredReplicas = ceil(metricValue / targetSize).
For example, metricValue 20 with targetSize 4 sets the worker replicas to 5: 20 application jobs require 5 worker pods, since each worker pod should run at most 4 (targetSize) application jobs.
The worker pods run each application job for a random period of time. When a job completes, the worker sends a REST call to the external scaler web to delete the job (reducing the value of the "number_of_jobs" metric by 1).
In different terminal windows, you may run the following commands to view the log of any of the worker pods (running the application jobs) or of the external scaler web (managing the number of jobs):
kubectl logs -n external-scaler-server -f worker-0
kubectl logs -n external-scaler-server -f -l name=external-scaler-web
The targetSize (numberOfJobsPerServer), minimumJobDuration, and maximumJobDuration are controlled by the following helm values.yaml section:
externalScalerServer:
numberOfJobsPerServer: "4"
minimumJobDuration: "10"
maximumJobDuration: "20"
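Any of these values can be overridden at install time with the standard helm --set flag; for example, to allow 8 jobs per worker pod (a hypothetical value):

helm upgrade --install external-scaler-server ./external-scaler-grpc/helm/external-scaler-server/ --namespace external-scaler-server --create-namespace --set externalScalerServer.numberOfJobsPerServer=8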
Clean-up
Use the following commands to delete the resources created in this post:
helm uninstall external-scaler-server --namespace external-scaler-server
helm uninstall keda --namespace keda
kubectl delete ns external-scaler-server
kubectl delete ns keda
References
- The documentation of the KEDA External Scalers is here: External scalers
- The source code of the External Scaler Server sample is here: External scaler
- For more information, refer to KEDA.