How to Set Up Automatic Scaling of Kubernetes Pods Using Metrics Server
Introduction
Kubernetes aims to provide both resilience and scalability, deploying multiple pods with configurable resource allocations to give your applications redundancy. While you can adjust your deployments manually as your requirements change, Kubernetes also offers first-class support for on-demand scaling through its Horizontal Pod Autoscaling feature. This closed-loop system automatically adjusts the resources allocated to your applications, i.e., the number of application Pods, based on current demand. You simply create a HorizontalPodAutoscaler (HPA) resource for each application deployment that needs autoscaling, and the HPA handles the rest for you.
At a high level, HPA works as follows:
- It keeps an eye on resource metrics from your application workloads by querying the metrics server.
- It compares the target threshold specified in the HPA definition against the average resource utilization (such as CPU and memory) of your application workloads.
- If the threshold is reached, the HPA scales up your application deployment to meet the increased demand; conversely, when utilization drops below the threshold, it scales the deployment back down. To see exactly what scaling logic the HPA uses, you can refer to the algorithm details page in the official documentation, or the worked example just below this list.
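As a worked example of that algorithm (this follows the formula from the Kubernetes documentation; the numbers are illustrative): if a deployment currently runs 2 replicas averaging 90% CPU utilization against a 50% target, the HPA computes

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
                = ceil[2 * (90 / 50)]
                = 4

so the deployment would be scaled up to 4 replicas (subject to the maxReplicas cap).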
A HorizontalPodAutoscaler is a built-in Kubernetes API resource driven by a dedicated controller within the Control Plane of your cluster. To create an HPA, you write a HorizontalPodAutoscaler YAML manifest that targets your application Deployment, then apply it to your cluster using kubectl.
To work properly, HPA needs a metrics server available in your cluster to gather essential metrics such as CPU and memory usage. One convenient option is the Kubernetes Metrics Server. Metrics Server collects resource metrics from Kubelets and exposes them to the Horizontal Pod Autoscaler via the Kubernetes API Server. If needed, you can also query the Metrics API yourself using kubectl top, as shown below.
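For example, once a metrics server is installed, you can inspect live resource usage directly (the figures shown will of course depend on your cluster):

- kubectl top nodes
- kubectl top pods -A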
In this tutorial, you will:
- Deploy Metrics Server to your Kubernetes cluster.
- Learn how to create Horizontal Pod Autoscalers for your applications.
- Test each HPA setup, using two scenarios: constant and variable application load.
If you are searching for a hosted Kubernetes service, take a look at our straightforward, managed Kubernetes solution designed to facilitate scalability.
Requirements
To follow this guide, you will require:
- A Kubernetes cluster with role-based access control (RBAC) enabled. This setup will use a Silicon Cloud Kubernetes cluster, but you could also create a cluster manually. Your Kubernetes version should be between 1.20 and 1.25.
- The kubectl command-line tool installed in your local environment and configured to connect to your cluster. You can read more about installing kubectl in the official documentation. If you are using a Silicon Cloud Kubernetes cluster, please refer to How to Connect to a Silicon Cloud Kubernetes Cluster to learn how to connect to your cluster using kubectl.
- The version control tool Git available in your development environment. If you are working on Ubuntu, you can refer to installing Git on Ubuntu 22.04.
- The Kubernetes Helm package manager, also available in your development environment. You can refer to how to install software with Helm to install Helm locally.
Step 1 – Installing Metrics Server via Helm
To begin, add the metrics-server repository to your helm package lists using helm repo add:
- helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
Next, refresh the list of available packages with helm repo update:
- helm repo update metrics-server
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "metrics-server" chart repository
Update Complete. ⎈Happy Helming!⎈
Now that the repository is registered with helm, you can add metrics-server to your Kubernetes deployments. While you could write your own deployment configuration, this tutorial uses Silicon Cloud's Kubernetes Starter Kit, which already includes a configuration for metrics-server.
To do so, clone the Kubernetes Starter Kit Git repository:
- git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
The metrics-server configuration is located at Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v3.8.2.yaml. You can view or edit it using nano or your favorite text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v3.8.2.yaml
The set of parameters here is minimal; replicas is set to a fixed value of 2:
## Starter Kit metrics-server configuration
## Ref: https://github.com/kubernetes-sigs/metrics-server/blob/metrics-server-helm-chart-3.8.2/charts/metrics-server
##

# Number of metrics-server replicas to run
replicas: 2

apiService:
  # Specifies if the v1beta1.metrics.k8s.io API service should be created.
  #
  # You typically want this enabled! If you disable API service creation you have to
  # manage it outside of this chart for e.g horizontal pod autoscaling to
  # work with this release.
  create: true

hostNetwork:
  # Specifies if metrics-server should be started in hostNetwork mode.
  #
  # You would require this enabled if you use alternate overlay networking for pods and
  # API server unable to communicate with metrics-server. As an example, this is required
  # if you use Weave network on EKS
  enabled: false
For information on the available metrics-server parameters, please refer to the Metrics Server chart page.
Once you have reviewed the file and made any modifications you need, deploy metrics-server by passing this file to the helm install command:
- HELM_CHART_VERSION="3.8.2"
- helm install metrics-server metrics-server/metrics-server --version "$HELM_CHART_VERSION" \
    --namespace metrics-server \
    --create-namespace \
    -f "Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v${HELM_CHART_VERSION}.yaml"
This deploys metrics-server to your configured Kubernetes cluster:
NAME: metrics-server
LAST DEPLOYED: Wed May 25 11:54:43 2022
NAMESPACE: metrics-server
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
*                        Metrics Server                               *
***********************************************************************
  Chart version: 3.8.2
  App version:   0.6.1
  Image tag:     k8s.gcr.io/metrics-server/metrics-server:v0.6.1
***********************************************************************
After deployment, you can use the helm ls command to verify that metrics-server was added:
- helm ls -n metrics-server
NAME            NAMESPACE       REVISION  UPDATED                                STATUS    CHART                 APP VERSION
metrics-server  metrics-server  1         2022-02-24 14:58:23.785875 +0200 EET   deployed  metrics-server-3.8.2  0.6.1
Next, check the status of all the Kubernetes resources deployed to the metrics-server namespace:
- kubectl get all -n metrics-server
Based on the configuration you used for deployment, both the deployment.apps and replicaset.apps entries should report two available instances:
NAME                                  READY   STATUS    RESTARTS   AGE
pod/metrics-server-694d47d564-9sp5h   1/1     Running   0          8m54s
pod/metrics-server-694d47d564-cc4m2   1/1     Running   0          8m54s

NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/metrics-server   ClusterIP   10.245.92.63   <none>        443/TCP   8m54s

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server   2/2     2            2           8m55s

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-694d47d564   2         2         2       8m55s
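Since the chart was deployed with apiService.create set to true, the aggregated Metrics API should also be registered with the API Server. As an extra sanity check (not part of the original Starter Kit steps), you can confirm the v1beta1.metrics.k8s.io API service reports as available:

- kubectl get apiservice v1beta1.metrics.k8s.io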
You have successfully installed metrics-server in your Kubernetes cluster. Next, you will examine some of the parameters of a HorizontalPodAutoscaler resource.
Step 2 – Getting Familiar with Horizontal Pod Autoscalers
So far, your setup has used a fixed value for the number of ReplicaSet instances to deploy. In this step, you will learn how to define a HorizontalPodAutoscaler manifest so that this value can grow or shrink dynamically.
A typical HorizontalPodAutoscaler manifest looks like this:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
The parameters used in this manifest are as follows:
- spec.scaleTargetRef: A named reference to the resource being scaled.
- spec.minReplicas: The lower limit for the number of replicas to which the autoscaler can scale down.
- spec.maxReplicas: The upper limit for the number of replicas to which the autoscaler can scale up.
- spec.metrics.type: The metric used to calculate the desired replica count. This example uses the Resource type, which tells the HPA to scale the deployment based on average CPU (or memory) utilization; here, averageUtilization is set to a threshold of 50. A memory-based variant is sketched just after this list.
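For instance, if you wanted to scale on memory instead of CPU, only the metrics entry would change. A minimal sketch (the 60 percent threshold is an arbitrary illustration, measured against the Pods' memory requests):

metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60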
You have two options when creating an HPA for your application deployment:

1. Use the kubectl autoscale command on an existing deployment.
2. Create an HPA YAML manifest, then apply the changes to your cluster using kubectl.
You will start with option #1, using another setup provided by the Silicon Cloud Kubernetes Starter Kit. It includes a deployment defined in myapp-test.yaml, which demonstrates Horizontal Pod Autoscaling (HPA) in action by generating some arbitrary CPU load.
You can review that file using nano or your preferred text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/myapp-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-test
spec:
  selector:
    matchLabels:
      run: myapp-test
  replicas: 1
  template:
    metadata:
      labels:
        run: myapp-test
    spec:
      containers:
        - name: busybox
          image: busybox
          resources:
            limits:
              cpu: 50m
            requests:
              cpu: 20m
          command: ["sh", "-c"]
          args:
            - while [ 1 ]; do
                echo "Test";
                sleep 0.01;
              done
Note the last lines of this file. They contain shell syntax that prints the word "Test" one hundred times per second in an endless loop, simulating load. Once you've finished reviewing the file, deploy it into your cluster using kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/myapp-test.yaml
Next, create a HorizontalPodAutoscaler for the myapp-test deployment using kubectl autoscale:
- kubectl autoscale deployment myapp-test --cpu-percent=50 --min=1 --max=3
Note the parameters passed to this command: the deployment will automatically be scaled between 1 and 3 replicas whenever average CPU utilization reaches 50 percent.
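Behind the scenes, kubectl autoscale creates the same kind of HPA object you would otherwise define in YAML. If you're curious, you can dump the generated manifest (a quick inspection step, not required by the tutorial):

- kubectl get hpa myapp-test -o yaml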
To check whether the HPA resource was created, run kubectl get hpa:
- kubectl get hpa
The TARGETS column in the output will eventually display the current usage percentage compared to the target usage percentage.
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
myapp-test   Deployment/myapp-test   240%/50%   1         3         3          52s
Note: To view the events logged by a Horizontal Pod Autoscaler, you can use the kubectl describe command:
- kubectl describe hpa myapp-test
Name:                     myapp-test
Namespace:                default
Labels:                   <none>
Annotations:              <none>
CreationTimestamp:        Mon, 28 May 2022 10:10:50 -0800
Reference:                Deployment/myapp-test
Metrics:                  ( current / target )
  resource cpu on pods (as a percentage of request):  240% (48m) / 50%
Min replicas:             1
Max replicas:             3
Deployment pods:          3 current / 3 desired
...
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  17s   horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  37s   horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
The recommended approach for creating Horizontal Pod Autoscalers (HPA) in a production environment is to utilize a dedicated YAML manifest instead of the kubectl autoscale method. By maintaining the manifest in a Git repository, you can easily monitor changes and make necessary modifications.
In the final step of this tutorial, you will go through an example of this. However, before proceeding, make sure to delete the myapp-test deployment and its corresponding HPA resource.
- kubectl delete hpa myapp-test
- kubectl delete deployment myapp-test
Step 3 – Autoscaling Applications via Metrics Server
In this final step, you will experiment with two different ways of generating server load and scaling via a YAML manifest:

- An application deployment that creates constant load by performing CPU-intensive computations.
- A shell script that emulates external load by making rapid, successive HTTP requests to a web application.
Constant Load Test
In this scenario, you will deploy a sample application written in Python that performs some CPU-intensive computations. Like the shell script used later in this step, this Python code is included in one of the Starter Kit's sample manifests. To inspect it, open constant-load-deployment-test.yaml using nano or any other text editor you prefer:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-deployment-test.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: python-test-code-configmap
data:
  entrypoint.sh: |-
    #!/usr/bin/env python

    import math

    while True:
      x = 0.0001
      for i in range(1000000):
        x = x + math.sqrt(x)
        print(x)
      print("OK!")
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: constant-load-deployment-test
spec:
  selector:
    matchLabels:
      run: python-constant-load-test
  replicas: 1
  template:
    metadata:
      labels:
        run: python-constant-load-test
    spec:
      containers:
        - name: python-runtime
          image: python:alpine3.15
          resources:
            limits:
              cpu: 50m
            requests:
              cpu: 20m
          command:
            - /bin/entrypoint.sh
          volumeMounts:
            - name: python-test-code-volume
              mountPath: /bin/entrypoint.sh
              readOnly: true
              subPath: entrypoint.sh
      volumes:
        - name: python-test-code-volume
          configMap:
            defaultMode: 0700
            name: python-test-code-configmap
The Python code above repeatedly computes square roots, generating constant CPU load. The deployment pulls a Docker image containing the required Python runtime, then mounts the ConfigMap holding the Python script into the application Pod.
To improve observability, first create a dedicated namespace for this deployment, then deploy it using kubectl:
- kubectl create ns hpa-constant-load
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-deployment-test.yaml -n hpa-constant-load
configmap/python-test-code-configmap created
deployment.apps/constant-load-deployment-test created
Verify that the deployment was created successfully and is up and running:
- kubectl get deployments -n hpa-constant-load
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
constant-load-deployment-test   1/1     1            1           8s
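If you'd like to see the raw CPU figures Metrics Server reports for this workload before the autoscaler reacts (the exact numbers will differ on your cluster), you can query it directly:

- kubectl top pods -n hpa-constant-load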
Next, you need to deploy an HPA for this deployment. An example tailored to this scenario is available in constant-load-hpa-test.yaml, which you can open using nano or any text editor of your preference:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-hpa-test.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: constant-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: constant-load-deployment-test
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Apply it using kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-hpa-test.yaml -n hpa-constant-load
This creates an HPA resource targeting the sample Python deployment. You can monitor the state of the constant-load-test HPA using kubectl get hpa:
- kubectl get hpa constant-load-test -n hpa-constant-load
Note the REFERENCE column, which targets constant-load-deployment-test, as well as the TARGETS column, which shows current CPU utilization against the threshold value, just as in the previous example:
NAME                 REFERENCE                                   TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
constant-load-test   Deployment/constant-load-deployment-test   255%/50%   1         3         3          49s
Notice also that the REPLICAS column value for the sample application deployment increased from 1 to 3, within the bounds specified in the HPA manifest. This happened very quickly because the application used in this demonstration generates CPU load almost instantly. As in the previous example, you can inspect the logged HPA events using kubectl describe hpa -n hpa-constant-load.
External Load Test
A more interesting and realistic scenario is to observe how external load gets generated. For this final example, you will use a separate namespace and set of manifests to avoid reusing anything from the previous test.
This example uses the quote of the moment sample server. Every time this server receives an HTTP request, it responds with a different quote. You will generate load on your cluster by sending HTTP requests every 1ms. The deployment for this scenario is contained in quote_deployment.yaml. Review the file using nano or any text editor of your preference:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote_deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quote
spec:
  replicas: 1
  selector:
    matchLabels:
      app: quote
  template:
    metadata:
      labels:
        app: quote
    spec:
      containers:
        - name: quote
          image: docker.io/datawire/quote:0.4.1
          ports:
            - name: http
              containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
            limits:
              cpu: 200m
              memory: 100Mi
---
apiVersion: v1
kind: Service
metadata:
  name: quote
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: quote
Note that the manifest does not include the HTTP query script at this point; it only sets up the application that will serve the queries. When you've finished reviewing the file, create the quote namespace and deployment using kubectl:
- kubectl create ns hpa-external-load
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote_deployment.yaml -n hpa-external-load
Verify that the quote application's deployment and services are up and running:
- kubectl get all -n hpa-external-load
NAME                        READY   STATUS    RESTARTS   AGE
pod/quote-dffd65947-s56c9   1/1     Running   0          3m5s

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/quote   ClusterIP   10.245.170.194   <none>        80/TCP    3m5s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/quote   1/1     1            1           3m5s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/quote-6c8f564ff   1         1         1       3m5s
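Optionally, you can sanity-check the service from inside the cluster before setting up autoscaling. The one-off pod below is a suggestion of this guide rather than part of the Starter Kit; it relies on the quote Service's in-cluster DNS name and should print a single quote:

- kubectl run -i --tty quote-check --rm --image=busybox --restart=Never -n hpa-external-load -- wget -q -O- http://quote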
Next, you will create the HPA for the quote deployment. It is defined in quote-deployment-hpa-test.yaml. Review the file using nano or any other preferred text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote-deployment-hpa-test.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: external-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quote
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20
Note that this HPA uses a different threshold value for the CPU utilization resource metric (20 percent), and that the scaling behavior is customized: scaleDown.stabilizationWindowSeconds is lowered to 60 seconds. That isn't always necessary, but in this case it speeds things up so you can watch the autoscaler perform the scale-down action more quickly. By default, the HorizontalPodAutoscaler uses a cooldown period of 5 minutes, which is usually sufficient and avoids fluctuations in the replica count. A sketch of the corresponding scale-up settings follows below.
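For completeness, the scale-up side can be tuned through the same behavior block. A minimal sketch with illustrative numbers that are not taken from the Starter Kit: this would let the deployment add at most 2 Pods every 15 seconds, with no stabilization delay on the way up.

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Pods
        value: 2
        periodSeconds: 15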
Once you are ready, apply it using kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote-deployment-hpa-test.yaml -n hpa-external-load
Now, verify that the HPA resource is up and running:
- kubectl get hpa external-load-test -n hpa-external-load
NAME                 REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
external-load-test   Deployment/quote   1%/20%    1         3         1          108s
Finally, you will generate the actual HTTP queries by running the quote_service_load_test.sh shell script. The script was deliberately left out of the earlier manifest so that you can watch it run against your cluster and follow the logs directly in your terminal. Review the script using nano or any text editor you prefer:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/scripts/quote_service_load_test.sh
#!/usr/bin/env sh
echo
echo "[INFO] Starting load testing in 10s..."
sleep 10
echo "[INFO] Working (press Ctrl+C to stop)..."
kubectl run -i --tty load-generator \
--rm \
--image=busybox \
--restart=Never \
-n hpa-external-load \
-- /bin/sh -c "while sleep 0.001; do wget -q -O- http://quote; done" > /dev/null 2>&1
echo "[INFO] Load testing finished."
To run this demonstration, open two separate terminal windows. In the first, run the quote_service_load_test.sh shell script:
- Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/scripts/quote_service_load_test.sh
Then, in the second window, run a kubectl watch on the HPA resource using the -w flag:
- kubectl get hpa -n hpa-external-load -w
You should see the load increase gradually and the replica count adjust automatically:
NAME                 REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
external-load-test   Deployment/quote   1%/20%    1         3         1          2m49s
external-load-test   Deployment/quote   29%/20%   1         3         1          3m1s
external-load-test   Deployment/quote   67%/20%   1         3         2          3m16s
As the load rises, you can watch the autoscaler kick in and increase the replica count of the quote server deployment. Once you stop the load generator script, a cooldown period begins, and after roughly 1 minute the replica count drops back to its initial value of 1. To stop the running script, press Ctrl+C in the first terminal window.
Conclusion
In this tutorial, you deployed and observed the behavior of Horizontal Pod Autoscaling (HPA) with Kubernetes Metrics Server under several different scenarios. HPA is an essential component of Kubernetes that lets your infrastructure handle increased traffic on demand.
A notable limitation of Metrics Server is that it can only measure CPU and memory usage; to understand its capabilities fully, refer to the Metrics Server documentation. If you need to scale based on other metrics, such as disk usage or network load, you can use Prometheus together with a dedicated adapter called prometheus-adapter. A sketch of what that unlocks follows below.
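As a rough sketch of where that leads: once prometheus-adapter exposes application-level metrics through the custom metrics API, an HPA can target them with a Pods-type metric instead of a Resource-type one. Both the metric name http_requests_per_second and the target value below are assumptions for illustration; they depend entirely on how the adapter is configured:

metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second    # assumed metric exposed by prometheus-adapter
      target:
        type: AverageValue
        averageValue: "100"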