Scaling Applications in Kubernetes with Horizontal Pod Autoscaler
Author: Ruan Bekker (@ruanbekker)
Introduction
In the world of cloud-native applications, the ability to scale your services dynamically in response to varying loads is crucial. Kubernetes, the de facto standard for container orchestration, provides several tools to help manage scaling, and one of the most powerful is the Horizontal Pod Autoscaler (HPA).
In this post, we'll explore what HPA is, why it’s an essential component in modern applications, and how you can leverage it to scale a Go application automatically. I will guide you through setting up a Kubernetes cluster using Kind, installing the metrics server, and configuring HPA to monitor and scale your pods based on resource usage.
What is Horizontal Pod Autoscaler (HPA)?
The Horizontal Pod Autoscaler (HPA) is a Kubernetes API resource that automatically adjusts the number of pod replicas in a deployment, replication controller, or stateful set based on observed CPU utilization or other select metrics. It works by periodically querying the Kubernetes metrics server, which gathers resource usage data, and adjusting the replica count of the target resource accordingly.
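At a high level, the controller runs a control loop (every 15 seconds by default) and computes the desired replica count from the ratio between the observed and target metric values, per the formula in the Kubernetes documentation:
desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
For example, 2 replicas running at 90% CPU against a 50% target gives ceil(2 * 90 / 50) = 4 replicas, so a deployment far above its target scales out quickly, while one hovering near the target barely moves.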
Key Features of HPA
- Automatic Scaling: HPA dynamically adjusts the number of pods in a deployment based on current workload.
- CPU/Memory-Based Scaling: HPA can scale based on CPU or memory usage, ensuring that your application runs efficiently even during high traffic.
- Custom Metrics Support: HPA can be configured to use custom metrics, allowing you to scale your application based on application-specific parameters.
Why Do You Need HPA?
In a production environment, traffic and resource demands can fluctuate throughout the day. Without autoscaling, you would either have to overprovision resources to handle peak loads (leading to wasted resources during off-peak hours) or risk underprovisioning, which could lead to performance degradation during high traffic periods.
HPA solves this problem by ensuring that your application scales up when needed and scales down when demand is low, optimizing resource usage and maintaining application performance.
Setting Up the Demonstration Environment
To demonstrate HPA in action, we'll use a simple Go application that simulates CPU and memory-intensive tasks. We'll deploy this application on a Kubernetes cluster using Kind (Kubernetes in Docker) and configure HPA to automatically scale the pods based on resource utilization.
Step 1: Set Up a Kubernetes Cluster with Kind
Kind is a tool for running local Kubernetes clusters using Docker containers. It's a great way to create a local Kubernetes environment for development and testing.
This is how to install Kind on Linux (see the Kind docs for other operating systems):
curl -Lo /tmp/kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
sudo install -o root -g root -m 0755 /tmp/kind /usr/local/bin/kind
rm -f /tmp/kind
You should be able to run the following:
kind --version
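If the install succeeded, this should print something like the following (your exact version may differ):
kind version 0.20.0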
Next, write the kind-config.yaml configuration for our Kind cluster. It defines a single control-plane node running Kubernetes v1.29.7:
cat > kind-config.yaml << EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.29.7@sha256:f70ab5d833fca132a100c1f95490be25d76188b053f49a3c0047ff8812360baf
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
    listenAddress: "0.0.0.0"
EOF
Now create the single-node cluster:
kind create cluster --name workshop --config kind-config.yaml
You should be able to see your nodes using:
kubectl get nodes -o wide
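You should see the control-plane node in a Ready state, along these lines (wide output trimmed):
NAME                     STATUS   ROLES           AGE   VERSION
workshop-control-plane   Ready    control-plane   60s   v1.29.7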
Step 2: Deploy Ingress Nginx Controller
This step is optional, but I will use an ingress later to interact with the Go application.
Define the ingress-nginx custom values for our helm release:
cat > nginx-values.yaml << EOF
---
controller:
  admissionWebhooks:
    enabled: false
  hostPort:
    enabled: true
  ingressClass: nginx
  service:
    type: NodePort
EOF
Add the ingress-nginx helm repository:
helm repo add nginx https://kubernetes.github.io/ingress-nginx
helm repo update nginx
Deploy the helm release for ingress-nginx to the kube-system namespace:
helm upgrade --install nginx-ingress nginx/ingress-nginx --version 4.7.3 --namespace kube-system --values nginx-values.yaml
You can view the pods using:
kubectl get pods -n kube-system
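Because the controller binds host port 80, a quick sanity check is to curl it directly. With no Ingress resources defined yet, you should get a 404 from the controller's default backend:
curl -si http://localhost/ | head -1
HTTP/1.1 404 Not Found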
Step 3: Install the Metrics Server
The metrics server is a crucial component for HPA to function, as it provides the resource usage data that HPA needs to make scaling decisions.
If we try to view the metrics of our pods:
kubectl top pods -n kube-system
error: Metrics API not available
This tells us that metrics-server is not installed. To install it, first add the metrics-server helm repository:
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo update metrics-server
Then deploy the helm release for metrics-server:
helm upgrade --install metrics-server metrics-server/metrics-server --namespace kube-system
As we are running metrics-server in a local environment with Docker containers, we will see an issue in the metrics-server logs:
I0819 05:54:46.505557 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0819 05:54:52.619894 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.48.2:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.48.2 because it doesn't contain any IP SANs" node="workshop-control-plane"
There is a GitHub issue on the metrics-server repository that explains why this error occurs: the kubelet's serving certificate in Kind does not include the node's IP in its SANs.
We can use a workaround to resolve this (but only do this in test environments):
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
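Alternatively, instead of patching the deployment after the fact, you can pass the flag at install time. Recent versions of the metrics-server chart expose an args value for this (worth verifying against your chart version):
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set 'args={--kubelet-insecure-tls}'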
Then we should see the metrics-server pod transition into a Running state, and when we use kubectl top pods we should see metrics for our pods:
kubectl top pods -n kube-system
Which will show the following:
NAME                                                      CPU(cores)   MEMORY(bytes)
coredns-76f75df574-2dck8                                  3m           13Mi
coredns-76f75df574-d8v8h                                  3m           13Mi
etcd-workshop-control-plane                               36m          30Mi
kindnet-rnr84                                             2m           7Mi
kube-apiserver-workshop-control-plane                     90m          193Mi
kube-controller-manager-workshop-control-plane            26m          46Mi
kube-proxy-gmb8j                                          1m           13Mi
kube-scheduler-workshop-control-plane                     6m           23Mi
metrics-server-8549dcfdd6-rq8lw                           6m           15Mi
nginx-ingress-ingress-nginx-controller-5b5fcc6bc9-hjg8h   2m           84Mi
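Node-level metrics are available now as well:
kubectl top nodes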
Step 4: Deploy the Go Application
Next, we'll deploy the Go application to the Kubernetes cluster. First we will write the manifest containing our Deployment, Service and Ingress to disk:
cat > myapp-deployment.yaml << EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: ruanbekker/golang-prometheus-task-async:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 32Mi
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: myapp
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: default
  labels:
    app.kubernetes.io/name: myapp
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.127.0.0.1.nip.io
    http:
      paths:
      - backend:
          service:
            name: myapp
            port:
              name: http
        path: /
        pathType: Prefix
EOF
Apply the deployment:
kubectl apply -f myapp-deployment.yaml
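Before wiring up autoscaling, it's worth confirming that the rollout succeeded and the service has endpoints behind it:
kubectl rollout status deployment/myapp -n default
kubectl get endpoints myapp -n default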
Step 5: Configure Horizontal Pod Autoscaler
Now, let's configure HPA to scale the application based on resource usage. For this demonstration, we'll target 75% average CPU utilization and 80% average memory utilization.
I am defining myapp-hpa.yaml:
cat > myapp-hpa.yaml << EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: default
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
EOF
Then we can deploy our HPA resource:
kubectl apply -f myapp-hpa.yaml
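To confirm that the HPA can actually read metrics for its target, describe it and check the Conditions and Events sections for errors:
kubectl describe hpa myapp-hpa -n default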
Step 6: Test scaling with HPA
To test HPA, you can simulate load on the application by sending 10 requests, each of which generates CPU load inside the container:
for x in {1..10}; do
curl myapp.127.0.0.1.nip.io/task -d '{"type": "cpu", "duration": 5}';
done
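In a second terminal, you can watch the scaling happen live while the load is running:
kubectl get hpa myapp-hpa -n default --watch
kubectl get pods -n default --watch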
Check the status of the HPA:
kubectl get hpa -n default
You should see something like:
NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa   Deployment/myapp   91%/75%   1         10        3          169d
As the load increases, you should see the REPLICAS value increase, indicating that HPA is scaling your application. Here the replicas scaled to 3, which we can verify by looking at the pods:
kubectl get pods -n default
We should see something like:
NAME                    READY   STATUS    RESTARTS   AGE
myapp-685947664-jqbmc   1/1     Running   0          1d
myapp-685947664-kbqt7   1/1     Running   0          69s
myapp-685947664-w7dp5   1/1     Running   0          9s
As the load decreases, you should see the REPLICAS value decrease:
NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa   Deployment/myapp   0%/75%    1         10        1          1d
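Note that scale-down is intentionally slower than scale-up: the controller applies a stabilization window (300 seconds by default) so the replica count doesn't flap. If you want faster scale-down in a test environment, you can tune this through the spec.behavior field of the HPA, for example by adding the following under spec in myapp-hpa.yaml:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60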
If we integrate this with Prometheus, we can graph how our pods scale over time.
Conclusion
The Horizontal Pod Autoscaler is an invaluable tool for ensuring that your Kubernetes applications can handle varying workloads without manual intervention. By automatically adjusting the number of pod replicas based on resource usage, HPA helps maintain application performance while optimizing resource utilization.
In this post, we set up a Kubernetes cluster using Kind, deployed a Go application, and configured HPA to automatically scale the application based on CPU usage. This setup provides a solid foundation for experimenting with HPA and understanding how it can be applied to your own applications.
Resources
- https://github.com/ruanbekker/golang-prometheus-task-async
- https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
Thank You
Thanks for reading. If you like my content, feel free to check out my website and subscribe to my newsletter, or follow me at @ruanbekker on Twitter.
- Linktree: https://go.ruan.dev/links
- Patreon: https://go.ruan.dev/patreon