Scaling Applications in Kubernetes with Horizontal Pod Autoscaler

Introduction

In the world of cloud-native applications, the ability to scale your services dynamically in response to varying loads is crucial. Kubernetes, the de facto standard for container orchestration, provides several tools to help manage scaling, and one of the most powerful is the Horizontal Pod Autoscaler (HPA).

In this post, we'll explore what HPA is, why it’s an essential component in modern applications, and how you can leverage it to scale a Go application automatically. I will guide you through setting up a Kubernetes cluster using Kind, installing the metrics server, and configuring HPA to monitor and scale your pods based on resource usage.

What is Horizontal Pod Autoscaler (HPA)?

The Horizontal Pod Autoscaler (HPA) is a Kubernetes API resource that automatically adjusts the number of pod replicas in a deployment, replication controller, or stateful set based on observed CPU utilization or other select metrics. It works by periodically querying the Kubernetes metrics server, which gathers resource usage data, and adjusting the replica count of the target resource accordingly.
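
Under the hood, the HPA controller computes the desired replica count from the ratio of the current metric value to the target, roughly as follows (this is the algorithm described in the Kubernetes documentation):

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]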

Key Features of HPA

  • Automatic Scaling: HPA dynamically adjusts the number of pods in a deployment based on current workload.
  • CPU/Memory-Based Scaling: HPA can scale based on CPU or memory usage, ensuring that your application runs efficiently even during high traffic.
  • Custom Metrics Support: HPA can be configured to use custom metrics, allowing you to scale your application based on application-specific parameters.
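
The CPU and memory metrics come from the metrics server through the metrics.k8s.io API, while custom and external metrics require an adapter such as prometheus-adapter. You can check which metrics APIs are registered in your cluster with:

kubectl get apiservices | grep metrics.k8s.io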

Why Do You Need HPA?

In a production environment, traffic and resource demands can fluctuate throughout the day. Without autoscaling, you would either have to overprovision resources to handle peak loads (leading to wasted resources during off-peak hours) or risk underprovisioning, which could lead to performance degradation during high traffic periods.

HPA solves this problem by ensuring that your application scales up when needed and scales down when demand is low, optimizing resource usage and maintaining application performance.

Setting Up the Demonstration Environment

To demonstrate HPA in action, we'll use a simple Go application that simulates CPU and memory-intensive tasks. We'll deploy this application on a Kubernetes cluster using Kind (Kubernetes in Docker) and configure HPA to automatically scale the pods based on resource utilization.

Step 1: Set Up a Kubernetes Cluster with Kind

Kind is a tool for running local Kubernetes clusters using Docker containers. It's a great way to create a local Kubernetes environment for development and testing.

Here's how to install Kind on Linux (see the Kind documentation for other operating systems):

curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
sudo install -o root -g root -m 0755 /tmp/kind /usr/local/bin/kind
rm -f /tmp/kind

You should be able to run the following:

kind --version

Next, we write the kind-config.yaml configuration for our Kind cluster. It defines a single control-plane node running Kubernetes v1.29.7:

cat > kind-config.yaml << EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.29.7@sha256:f70ab5d833fca132a100c1f95490be25d76188b053f49a3c0047ff8812360baf
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: \"ingress-ready=true\"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
    listenAddress: \"0.0.0.0\"
EOF

Now we can create our one-node cluster:

kind create cluster --name workshop --config kind-config.yaml

You should be able to see your nodes using:

kubectl get nodes -o wide

Step 2: Deploy Ingress Nginx Controller

This step is optional, but I will be interacting with the Go application through an ingress, so I'll deploy the ingress-nginx controller.

Define the ingress-nginx custom values for our helm release:

cat > nginx-values.yaml << EOF
---
controller:
  admissionWebhooks:
    enabled: false
  hostPort:
    enabled: true
  ingressClass: nginx
  service:
    type: NodePort
EOF

Add the ingress-nginx helm repository:

helm repo add nginx https://kubernetes.github.io/ingress-nginx
helm repo update nginx

Deploy the helm release for ingress-nginx to the kube-system namespace:

helm upgrade --install nginx-ingress nginx/ingress-nginx --version 4.7.3 --namespace kube-system --values nginx-values.yaml

You can view the pods using:

kubectl get pods -n kube-system
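
To wait until the ingress controller has fully rolled out (with the release name above, the deployment is named nginx-ingress-ingress-nginx-controller):

kubectl -n kube-system rollout status deployment/nginx-ingress-ingress-nginx-controller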

Step 3: Install the Metrics Server

The metrics server is a crucial component for HPA to function, as it provides the resource usage data that HPA needs to make scaling decisions.

If we try to view the metrics of our pods:

kubectl top pods -n kube-system
error: Metrics API not available

We can see that metrics-server is not installed. To install it, first add the metrics-server helm repository:

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo update metrics-server

Then deploy the helm release for metrics-server:

helm upgrade --install metrics-server metrics-server/metrics-server --namespace kube-system

Because we are running metrics-server against a local Kind cluster (Docker containers), we will see an issue in the metrics-server logs.
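
You can view these logs with:

kubectl -n kube-system logs deployment/metrics-server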

I0819 05:54:46.505557       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0819 05:54:52.619894       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.48.2:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.48.2 because it doesn't contain any IP SANs" node="workshop-control-plane"

There is a GitHub issue on the metrics-server repository that explains why this happens: the kubelet serving certificates in Kind do not include IP SANs, so metrics-server cannot verify them.

We can work around this by telling metrics-server to skip TLS verification of the kubelet (only do this in test environments):

kubectl -n kube-system patch deployment metrics-server --type=json \
-p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Once the new pod is running, kubectl top should return metrics for our pods:

kubectl top pods -n kube-system

This will show something like the following:

NAME                                                      CPU(cores)   MEMORY(bytes)
coredns-76f75df574-2dck8                                  3m           13Mi
coredns-76f75df574-d8v8h                                  3m           13Mi
etcd-workshop-control-plane                               36m          30Mi
kindnet-rnr84                                             2m           7Mi
kube-apiserver-workshop-control-plane                     90m          193Mi
kube-controller-manager-workshop-control-plane            26m          46Mi
kube-proxy-gmb8j                                          1m           13Mi
kube-scheduler-workshop-control-plane                     6m           23Mi
metrics-server-8549dcfdd6-rq8lw                           6m           15Mi
nginx-ingress-ingress-nginx-controller-5b5fcc6bc9-hjg8h   2m           84Mi

Step 4: Deploy the Go Application

Next, we'll deploy the Go application to the Kubernetes cluster. First, write the manifest containing the Deployment, Service, and Ingress to disk:

cat > myapp-deployment.yaml << EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: ruanbekker/golang-prometheus-task-async:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 32Mi
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: myapp
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: default
  labels:
    app.kubernetes.io/name: myapp
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.127.0.0.1.nip.io
    http:
      paths:
      - backend:
          service:
            name: myapp
            port:
              name: http
        path: /
        pathType: Prefix
EOF

Apply the manifest:

kubectl apply -f myapp-deployment.yaml
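
Before moving on, verify that the deployment rolled out and that the Service and Ingress were created:

kubectl rollout status deployment/myapp -n default
kubectl get pods,svc,ingress -n default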

Step 5: Configure Horizontal Pod Autoscaler

Now, let's configure HPA to scale the application based on resource usage. For this demonstration, we'll target 75% average CPU utilization and 80% average memory utilization. Note that utilization is measured against the pod's resource requests, so with a CPU request of 100m, the 75% target corresponds to about 75m of CPU per pod on average.

I am defining myapp-hpa.yaml with both a CPU and a memory target:

cat > myapp-hpa.yaml << EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: default
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 75
        type: Utilization
  - type: Resource
    resource:
      name: memory
      target:
        averageUtilization: 80
        type: Utilization
EOF

Then we can deploy our HPA resource:

kubectl apply -f myapp-hpa.yaml
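
To inspect the HPA's targets, current utilization, and scaling events, describe the resource:

kubectl describe hpa myapp-hpa -n default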

Step 6: Test Scaling with HPA

To test HPA, simulate load on the application by sending 10 requests to the /task endpoint, each of which triggers a CPU-intensive task inside the container:

for x in {1..10}; do
  curl myapp.127.0.0.1.nip.io/task -d '{"type": "cpu", "duration": 5}'
done
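
While the load generator runs, you can watch the HPA react in a second terminal:

kubectl get hpa myapp-hpa -n default --watch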

Check the status of the HPA:

kubectl get hpa -n default

You should see something like:

NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa   Deployment/myapp   91%/75%   1         10        3          169d

As the load increases, the REPLICAS value goes up, indicating that HPA is scaling your application. Here it scaled to 3 replicas, which we can verify by looking at the pods:

kubectl get pods -n default

We should see something like:

NAME                                              READY   STATUS    RESTARTS      AGE
myapp-685947664-jqbmc                             1/1     Running   0             1d
myapp-685947664-kbqt7                             1/1     Running   0             69s
myapp-685947664-w7dp5                             1/1     Running   0             9s

As the load decreases, you should see the REPLICAS value decrease:

NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa   Deployment/myapp   0%/75%    1         10        1          1d

If we integrate this with Prometheus, we can visualize how our pods scale over time.
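
As a minimal sketch (assuming kube-state-metrics is installed and Prometheus is reachable on localhost:9090), you could query the deployment's replica count over time with:

curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=kube_deployment_status_replicas{namespace="default",deployment="myapp"}'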

Conclusion

The Horizontal Pod Autoscaler is an invaluable tool for ensuring that your Kubernetes applications can handle varying workloads without manual intervention. By automatically adjusting the number of pod replicas based on resource usage, HPA helps maintain application performance while optimizing resource utilization.

In this post, we set up a Kubernetes cluster using Kind, deployed a Go application, and configured HPA to automatically scale the application based on CPU and memory usage. This setup provides a solid foundation for experimenting with HPA and understanding how it can be applied to your own applications.

Thank You

Thanks for reading! If you like my content, feel free to check out my website, subscribe to my newsletter, or follow me at @ruanbekker on Twitter.
