How to Set Up Automatic Scaling of Kubernetes Pods Using Metrics Server
Introduction
Kubernetes aims to provide both resilience and scalability, deploying multiple pods with configurable resource allocations to give your applications redundancy. While you can adjust your deployments manually as your requirements change, Kubernetes also offers first-class support for on-demand scaling through its Horizontal Pod Autoscaling feature. This closed-loop system automatically adjusts the resources allocated to your applications, i.e., the number of application Pods, based on current demand. You simply create a HorizontalPodAutoscaler (HPA) resource for each application deployment that needs autoscaling, and the HPA handles the rest for you.
At a high level, HPA works as follows:
- It keeps an eye on resource metrics from your application workloads by querying the metrics server.
- It compares the target threshold specified in the HPA definition against the average resource utilization (such as CPU and memory) of your application workloads.
- If the threshold is reached, the HPA scales up your application deployment to meet the increased demand; conversely, when utilization drops below the threshold, it scales the deployment back down. To see exactly what scaling logic the HPA uses, you can refer to the algorithm details page in the official documentation, or the worked example just below this list.
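As a worked example of that algorithm (this follows the formula from the Kubernetes documentation; the numbers are illustrative): if a deployment currently runs 2 replicas averaging 90% CPU utilization against a 50% target, the HPA computes

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
                = ceil[2 * (90 / 50)]
                = 4

so the deployment would be scaled up to 4 replicas (subject to the maxReplicas cap).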
A HorizontalPodAutoscaler is a built-in Kubernetes API resource driven by a dedicated controller within the Control Plane of your cluster. To create an HPA, you write a HorizontalPodAutoscaler YAML manifest that targets your application Deployment, then apply it to your cluster using kubectl.
To work properly, HPA needs a metrics server available in your cluster to gather essential metrics such as CPU and memory usage. One convenient option is the Kubernetes Metrics Server. Metrics Server collects resource metrics from Kubelets and exposes them to the Horizontal Pod Autoscaler via the Kubernetes API Server. If needed, you can also query the Metrics API yourself using kubectl top, as shown below.
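For example, once a metrics server is installed, you can inspect live resource usage directly (the figures shown will of course depend on your cluster):

- kubectl top nodes
- kubectl top pods -A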
In this tutorial, you will:
- Deploy Metrics Server to your Kubernetes cluster.
- Learn how to create Horizontal Pod Autoscalers for your applications.
- Test each HPA setup, using two scenarios: constant and variable application load.
If you are searching for a hosted Kubernetes service, take a look at our straightforward, managed Kubernetes solution designed to facilitate scalability.
Requirements
To follow this guide, you will require:
- A Kubernetes cluster with role-based access control (RBAC) enabled. This setup will use a Silicon Cloud Kubernetes cluster, but you could also create a cluster manually. Your Kubernetes version should be between 1.20 and 1.25.
- The kubectl command-line tool installed in your local environment and configured to connect to your cluster. You can read more about installing kubectl in the official documentation. If you are using a Silicon Cloud Kubernetes cluster, please refer to How to Connect to a Silicon Cloud Kubernetes Cluster to learn how to connect to your cluster using kubectl.
- The version control tool Git available in your development environment. If you are working on Ubuntu, you can refer to installing Git on Ubuntu 22.04.
- The Kubernetes Helm package manager, also available in your development environment. You can refer to how to install software with Helm to install Helm locally.
Step 1 – Installing Metrics Server via Helm
To begin, add the metrics-server repository to your helm package lists using helm repo add:
- helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
Next, refresh the list of available packages with helm repo update:
- helm repo update metrics-server
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "metrics-server" chart repository
Update Complete. ⎈Happy Helming!⎈
Now that the repository is registered with helm, you can add metrics-server to your Kubernetes deployments. While you could write your own deployment configuration, this tutorial uses Silicon Cloud's Kubernetes Starter Kit, which already includes a configuration for metrics-server.
To do so, clone the Kubernetes Starter Kit Git repository:
- git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
The metrics-server configuration is located at Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v3.8.2.yaml. You can view or edit it using nano or your favorite text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v3.8.2.yaml
The set of parameters here is minimal; replicas is set to a fixed value of 2:
## Starter Kit metrics-server configuration
## Ref: https://github.com/kubernetes-sigs/metrics-server/blob/metrics-server-helm-chart-3.8.2/charts/metrics-server
##

# Number of metrics-server replicas to run
replicas: 2

apiService:
  # Specifies if the v1beta1.metrics.k8s.io API service should be created.
  #
  # You typically want this enabled! If you disable API service creation you have to
  # manage it outside of this chart for e.g horizontal pod autoscaling to
  # work with this release.
  create: true

hostNetwork:
  # Specifies if metrics-server should be started in hostNetwork mode.
  #
  # You would require this enabled if you use alternate overlay networking for pods and
  # API server unable to communicate with metrics-server. As an example, this is required
  # if you use Weave network on EKS
  enabled: false
For information on the available metrics-server parameters, please refer to the Metrics Server chart page.
Once you have reviewed the file and made any modifications you need, deploy metrics-server by passing this file to the helm install command:
- HELM_CHART_VERSION="3.8.2"
- helm install metrics-server metrics-server/metrics-server --version "$HELM_CHART_VERSION" \
    --namespace metrics-server \
    --create-namespace \
    -f "Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v${HELM_CHART_VERSION}.yaml"
This deploys metrics-server to your configured Kubernetes cluster:
NAME: metrics-server
LAST DEPLOYED: Wed May 25 11:54:43 2022
NAMESPACE: metrics-server
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
*                        Metrics Server                               *
***********************************************************************
  Chart version: 3.8.2
  App version:   0.6.1
  Image tag:     k8s.gcr.io/metrics-server/metrics-server:v0.6.1
***********************************************************************
After deployment, you can use the helm ls command to verify that metrics-server was added:
- helm ls -n metrics-server
NAME            NAMESPACE       REVISION  UPDATED                                STATUS    CHART                 APP VERSION
metrics-server  metrics-server  1         2022-02-24 14:58:23.785875 +0200 EET   deployed  metrics-server-3.8.2  0.6.1
Next, check the status of all the Kubernetes resources deployed to the metrics-server namespace:
- kubectl get all -n metrics-server
Based on the configuration you used for deployment, both the deployment.apps and replicaset.apps entries should report two available instances:
NAME                                  READY   STATUS    RESTARTS   AGE
pod/metrics-server-694d47d564-9sp5h   1/1     Running   0          8m54s
pod/metrics-server-694d47d564-cc4m2   1/1     Running   0          8m54s

NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/metrics-server   ClusterIP   10.245.92.63   <none>        443/TCP   8m54s

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server   2/2     2            2           8m55s

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-694d47d564   2         2         2       8m55s
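Since the chart was deployed with apiService.create set to true, the aggregated Metrics API should also be registered with the API Server. As an extra sanity check (not part of the original Starter Kit steps), you can confirm the v1beta1.metrics.k8s.io API service reports as available:

- kubectl get apiservice v1beta1.metrics.k8s.io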
You have successfully installed metrics-server in your Kubernetes cluster. Next, you will examine some of the parameters of a HorizontalPodAutoscaler resource.
Step 2 – Getting Familiar with Horizontal Pod Autoscalers
So far, your setup has used a fixed value for the number of ReplicaSet instances to deploy. In this step, you will learn how to define a HorizontalPodAutoscaler manifest so that this value can grow or shrink dynamically.
A typical HorizontalPodAutoscaler manifest looks like this:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
The parameters used in this manifest are as follows:
- spec.scaleTargetRef: A named reference to the resource being scaled.
- spec.minReplicas: The lower limit for the number of replicas to which the autoscaler can scale down.
- spec.maxReplicas: The upper limit for the number of replicas to which the autoscaler can scale up.
- spec.metrics.type: The metric used to calculate the desired replica count. This example uses the Resource type, which tells the HPA to scale the deployment based on average CPU (or memory) utilization; here, averageUtilization is set to a threshold of 50. A memory-based variant is sketched just after this list.
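For instance, if you wanted to scale on memory instead of CPU, only the metrics entry would change. A minimal sketch (the 60 percent threshold is an arbitrary illustration, measured against the Pods' memory requests):

metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60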
You have two options when creating an HPA for your application deployment:

1. Use the kubectl autoscale command on an existing deployment.
2. Create an HPA YAML manifest, then apply the changes to your cluster using kubectl.
You will start with option #1, using another setup provided by the Silicon Cloud Kubernetes Starter Kit. It includes a deployment defined in myapp-test.yaml, which demonstrates Horizontal Pod Autoscaling (HPA) in action by generating some arbitrary CPU load.
You can review that file using nano or your preferred text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/myapp-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-test
spec:
  selector:
    matchLabels:
      run: myapp-test
  replicas: 1
  template:
    metadata:
      labels:
        run: myapp-test
    spec:
      containers:
        - name: busybox
          image: busybox
          resources:
            limits:
              cpu: 50m
            requests:
              cpu: 20m
          command: ["sh", "-c"]
          args:
            - while [ 1 ]; do
                echo "Test";
                sleep 0.01;
              done
Note the last lines of this file. They contain shell syntax that prints the word "Test" one hundred times per second in an endless loop, simulating load. Once you've finished reviewing the file, deploy it into your cluster using kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/myapp-test.yaml
Next, create a HorizontalPodAutoscaler for the myapp-test deployment using kubectl autoscale:
- kubectl autoscale deployment myapp-test --cpu-percent=50 --min=1 --max=3
Note the parameters passed to this command: the deployment will automatically be scaled between 1 and 3 replicas whenever average CPU utilization reaches 50 percent.
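Behind the scenes, kubectl autoscale creates the same kind of HPA object you would otherwise define in YAML. If you're curious, you can dump the generated manifest (a quick inspection step, not required by the tutorial):

- kubectl get hpa myapp-test -o yaml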
To check whether the HPA resource was created, run kubectl get hpa:
- kubectl get hpa
The TARGETS column in the output will eventually display the current usage percentage compared to the target usage percentage.
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
myapp-test   Deployment/myapp-test   240%/50%   1         3         3          52s
Note: To view the events logged by a Horizontal Pod Autoscaler, you can use the kubectl describe command:
- kubectl describe hpa myapp-test
Name:                     myapp-test
Namespace:                default
Labels:                   <none>
Annotations:              <none>
CreationTimestamp:        Mon, 28 May 2022 10:10:50 -0800
Reference:                Deployment/myapp-test
Metrics:                  ( current / target )
  resource cpu on pods (as a percentage of request):  240% (48m) / 50%
Min replicas:             1
Max replicas:             3
Deployment pods:          3 current / 3 desired
...
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  17s   horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  37s   horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
The recommended approach for creating Horizontal Pod Autoscalers (HPA) in a production environment is to utilize a dedicated YAML manifest instead of the kubectl autoscale method. By maintaining the manifest in a Git repository, you can easily monitor changes and make necessary modifications.
In the final step of this tutorial, you will go through an example of this. However, before proceeding, make sure to delete the myapp-test deployment and its corresponding HPA resource.
- kubectl delete hpa myapp-test
- kubectl delete deployment myapp-test
Step 3 – Autoscaling Applications via Metrics Server
In this final step, you will experiment with two different ways of generating server load and scaling via a YAML manifest:

- An application deployment that creates constant load by performing CPU-intensive computations.
- A shell script that emulates external load by making rapid, successive HTTP requests to a web application.
Constant Load Test
In this scenario, you will deploy a sample application written in Python that performs some CPU-intensive computations. Like the shell script used later in this step, this Python code is included in one of the Starter Kit's sample manifests. To inspect it, open constant-load-deployment-test.yaml using nano or any other text editor you prefer:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-deployment-test.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: python-test-code-configmap
data:
  entrypoint.sh: |-
    #!/usr/bin/env python

    import math

    while True:
      x = 0.0001
      for i in range(1000000):
        x = x + math.sqrt(x)
        print(x)
      print("OK!")
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: constant-load-deployment-test
spec:
  selector:
    matchLabels:
      run: python-constant-load-test
  replicas: 1
  template:
    metadata:
      labels:
        run: python-constant-load-test
    spec:
      containers:
        - name: python-runtime
          image: python:alpine3.15
          resources:
            limits:
              cpu: 50m
            requests:
              cpu: 20m
          command:
            - /bin/entrypoint.sh
          volumeMounts:
            - name: python-test-code-volume
              mountPath: /bin/entrypoint.sh
              readOnly: true
              subPath: entrypoint.sh
      volumes:
        - name: python-test-code-volume
          configMap:
            defaultMode: 0700
            name: python-test-code-configmap
The Python code above repeatedly computes square roots, generating constant CPU load. The deployment pulls a Docker image containing the required Python runtime, then mounts the ConfigMap holding the Python script into the application Pod.
To improve observability, first create a dedicated namespace for this deployment, then deploy it using kubectl:
- kubectl create ns hpa-constant-load
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-deployment-test.yaml -n hpa-constant-load
configmap/python-test-code-configmap created
deployment.apps/constant-load-deployment-test created
Verify that the deployment was created successfully and is up and running:
- kubectl get deployments -n hpa-constant-load
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
constant-load-deployment-test   1/1     1            1           8s
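If you'd like to see the raw CPU figures Metrics Server reports for this workload before the autoscaler reacts (the exact numbers will differ on your cluster), you can query it directly:

- kubectl top pods -n hpa-constant-load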
Next, you need to deploy an HPA for this deployment. An example tailored to this scenario is available in constant-load-hpa-test.yaml, which you can open using nano or any text editor of your preference:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-hpa-test.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: constant-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: constant-load-deployment-test
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Apply it using kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-hpa-test.yaml -n hpa-constant-load
This creates an HPA resource targeting the sample Python deployment. You can monitor the state of the constant-load-test HPA using kubectl get hpa:
- kubectl get hpa constant-load-test -n hpa-constant-load
Note the REFERENCE column, which targets constant-load-deployment-test, as well as the TARGETS column, which shows current CPU utilization against the threshold value, just as in the previous example:
NAME                 REFERENCE                                   TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
constant-load-test   Deployment/constant-load-deployment-test   255%/50%   1         3         3          49s
Notice also that the REPLICAS column value for the sample application deployment increased from 1 to 3, within the bounds specified in the HPA manifest. This happened very quickly because the application used in this demonstration generates CPU load almost instantly. As in the previous example, you can inspect the logged HPA events using kubectl describe hpa -n hpa-constant-load.
External Load Test
A more interesting and realistic scenario is to observe how external load gets generated. For this final example, you will use a separate namespace and set of manifests to avoid reusing anything from the previous test.
This example uses the quote of the moment sample server. Every time this server receives an HTTP request, it responds with a different quote. You will generate load on your cluster by sending HTTP requests every 1ms. The deployment for this scenario is contained in quote_deployment.yaml. Review the file using nano or any text editor of your preference:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote_deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quote
spec:
  replicas: 1
  selector:
    matchLabels:
      app: quote
  template:
    metadata:
      labels:
        app: quote
    spec:
      containers:
        - name: quote
          image: docker.io/datawire/quote:0.4.1
          ports:
            - name: http
              containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
            limits:
              cpu: 200m
              memory: 100Mi
---
apiVersion: v1
kind: Service
metadata:
  name: quote
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: quote
Note that the manifest does not include the HTTP query script at this point; it only sets up the application that will serve the queries. When you've finished reviewing the file, create the quote namespace and deployment using kubectl:
- kubectl create ns hpa-external-load
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote_deployment.yaml -n hpa-external-load
Verify that the quote application's deployment and services are up and running:
- kubectl get all -n hpa-external-load
NAME                        READY   STATUS    RESTARTS   AGE
pod/quote-dffd65947-s56c9   1/1     Running   0          3m5s

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/quote   ClusterIP   10.245.170.194   <none>        80/TCP    3m5s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/quote   1/1     1            1           3m5s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/quote-6c8f564ff   1         1         1       3m5s
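Optionally, you can sanity-check the service from inside the cluster before setting up autoscaling. The one-off pod below is a suggestion of this guide rather than part of the Starter Kit; it relies on the quote Service's in-cluster DNS name and should print a single quote:

- kubectl run -i --tty quote-check --rm --image=busybox --restart=Never -n hpa-external-load -- wget -q -O- http://quote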
Next, you will create the HPA for the quote deployment. It is defined in quote-deployment-hpa-test.yaml. Review the file using nano or any other preferred text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote-deployment-hpa-test.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: external-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quote
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20
Note that this HPA uses a different threshold value for the CPU utilization resource metric (20 percent), and that the scaling behavior is customized: scaleDown.stabilizationWindowSeconds is lowered to 60 seconds. That isn't always necessary, but in this case it speeds things up so you can watch the autoscaler perform the scale-down action more quickly. By default, the HorizontalPodAutoscaler uses a cooldown period of 5 minutes, which is usually sufficient and avoids fluctuations in the replica count. A sketch of the corresponding scale-up settings follows below.
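For completeness, the scale-up side can be tuned through the same behavior block. A minimal sketch with illustrative numbers that are not taken from the Starter Kit: this would let the deployment add at most 2 Pods every 15 seconds, with no stabilization delay on the way up.

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Pods
        value: 2
        periodSeconds: 15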
Once you are ready, apply it using kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote-deployment-hpa-test.yaml -n hpa-external-load
Now, verify that the HPA resource is up and running:
- kubectl get hpa external-load-test -n hpa-external-load
NAME                 REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
external-load-test   Deployment/quote   1%/20%    1         3         1          108s
Finally, you will generate the actual HTTP queries by running the quote_service_load_test.sh shell script. The script was deliberately left out of the earlier manifest so that you can watch it run against your cluster and follow the logs directly in your terminal. Review the script using nano or any text editor you prefer:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/scripts/quote_service_load_test.sh
#!/usr/bin/env sh
echo
echo "[INFO] Starting load testing in 10s..."
sleep 10
echo "[INFO] Working (press Ctrl+C to stop)..."
kubectl run -i --tty load-generator \
--rm \
--image=busybox \
--restart=Never \
-n hpa-external-load \
-- /bin/sh -c "while sleep 0.001; do wget -q -O- http://quote; done" > /dev/null 2>&1
echo "[INFO] Load testing finished."
To run this demonstration, open two separate terminal windows. In the first, run the quote_service_load_test.sh shell script:
- Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/scripts/quote_service_load_test.sh
Then, in the second window, run a kubectl watch on the HPA resource using the -w flag:
- kubectl get hpa -n hpa-external-load -w
You should see the load increase gradually and the replica count adjust automatically:
NAME                 REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
external-load-test   Deployment/quote   1%/20%    1         3         1          2m49s
external-load-test   Deployment/quote   29%/20%   1         3         1          3m1s
external-load-test   Deployment/quote   67%/20%   1         3         2          3m16s
As the load rises, you can watch the autoscaler kick in and increase the replica count of the quote server deployment. Once you stop the load generator script, a cooldown period begins, and after roughly 1 minute the replica count drops back to its initial value of 1. To stop the running script, press Ctrl+C in the first terminal window.
Conclusion
In this tutorial, you deployed and observed the behavior of Horizontal Pod Autoscaling (HPA) with Kubernetes Metrics Server under several different scenarios. HPA is an essential component of Kubernetes that lets your infrastructure handle increased traffic on demand.
A notable limitation of Metrics Server is that it can only measure CPU and memory usage; to understand its capabilities fully, refer to the Metrics Server documentation. If you need to scale based on other metrics, such as disk usage or network load, you can use Prometheus together with a dedicated adapter called prometheus-adapter. A sketch of what that unlocks follows below.
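As a rough sketch of where that leads: once prometheus-adapter exposes application-level metrics through the custom metrics API, an HPA can target them with a Pods-type metric instead of a Resource-type one. Both the metric name http_requests_per_second and the target value below are assumptions for illustration; they depend entirely on how the adapter is configured:

metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second    # assumed metric exposed by prometheus-adapter
      target:
        type: AverageValue
        averageValue: "100"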