Using KEDA for Autoscaling Pods using Prometheus Metrics
Author: Ruan Bekker (@ruanbekker)
In our previous KEDA post we covered an introduction to KEDA. We explored how KEDA can dynamically scale containers based on various triggers, including messages in queues, Kafka topics, HTTP requests, and custom metrics.
In this post, we will leverage this flexibility and integrate Prometheus with KEDA to achieve autoscaling based on application-specific metrics, such as http_requests_per_minute, allowing us to scale our pods based on the number of requests they receive.
What will we be doing?
To get started, we'll deploy a Kubernetes cluster using KinD for demonstration purposes. Then, we'll leverage Helm to deploy both KEDA and Prometheus. Once the environment is set up, we'll deploy a sample application that exposes Prometheus metrics. Finally, we'll define a ScaledObject to automatically scale our deployment pods based on HTTP requests per minute.
Kubernetes Environment Setup
If you don't have KinD installed, you can follow the KinD installation documentation first. Once it's installed, we can define the kind-config.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.29.4@sha256:3abb816a5b1061fb15c6e9e60856ec40d56b7b52bcea5f5f1350bc6e2320b6f8
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
    listenAddress: "0.0.0.0"
This config defines one control-plane node with port 80 exposed. Go ahead and create the cluster:
kind create cluster --name workshop --config kind-config.yaml
Ingress Controller
Now let's deploy Ingress-Nginx as our Kubernetes Ingress Controller to the kube-system namespace:
helm repo add nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm upgrade --install nginx-public nginx/ingress-nginx \
  --version 4.7.3 \
  --namespace kube-system \
  --set controller.admissionWebhooks.enabled=false \
  --set controller.hostPort.enabled=true \
  --set controller.ingressClass=nginx \
  --set controller.service.type=NodePort
KEDA
Next, we need to deploy KEDA; I will deploy it to the keda namespace:
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm upgrade --install keda kedacore/keda --namespace keda --create-namespace --version 2.15.0
You can view the helm chart values for more configuration options.
Prometheus Stack
Next up, we need to deploy Prometheus; I will use the kube-prometheus-stack helm chart from the prometheus-community repository. Save the following as prometheus-values.yaml:
prometheus:
  prometheusSpec:
    serviceMonitorSelector:
      matchLabels:
        release: kube-prometheus-stack
  ingress:
    enabled: true
    ingressClassName: nginx
    pathType: ImplementationSpecific
    hosts:
      - prometheus.127.0.0.1.nip.io
    paths:
      - /
Then we can proceed to deploy Prometheus to the prometheus namespace:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --version 61.7.0 \
  --namespace prometheus \
  --create-namespace \
  --values prometheus-values.yaml
At this point in time, you should have Ingress-Nginx, KEDA and Prometheus deployed to your Kubernetes cluster.
Application Deployment
The application that we are going to deploy exposes Prometheus metrics, which we can use to autoscale on. We will rely on the http_requests_per_minute metric to determine when we want to scale.
I will chain the Deployment, Service, ServiceMonitor and Ingress resources into one application-deployment.yaml manifest:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - env:
        - name: API_VERSION
          value: v5
        image: ruanbekker/golang-prometheus-task-async
        imagePullPolicy: IfNotPresent
        name: myapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 32Mi
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app.kubernetes.io/name: myapp
  name: myapp
  namespace: default
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.127.0.0.1.nip.io
    http:
      paths:
      - backend:
          service:
            name: myapp
            port:
              name: http
        path: /
        pathType: Prefix
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: myapp
    release: kube-prometheus-stack
  name: myapp
  namespace: default
spec:
  endpoints:
  - path: /metrics
    port: http
    scheme: http
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: myapp
      release: kube-prometheus-stack
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp
    release: kube-prometheus-stack
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: myapp
  type: ClusterIP
Once the manifest is written to disk, we can deploy the sample application to the default namespace:
kubectl apply -f application-deployment.yaml
Since we defined an ingress for our application, we can make HTTP requests against it:
- Application Endpoint: http://myapp.127.0.0.1.nip.io/
- Metrics Endpoint: http://myapp.127.0.0.1.nip.io/metrics
We have also defined a ServiceMonitor so that Prometheus registers the application as a scrape target and collects its metrics.
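For intuition on what Prometheus scrapes from that endpoint, here is a small sketch that parses the Prometheus text exposition format. The sample payload and its label sets below are made up for illustration and will differ from the app's real /metrics output:

```python
# Minimal sketch: extract counter samples from Prometheus' text exposition format.
# NOTE: the payload below is illustrative; the real /metrics output of the app
# will have different metric names, labels, and values.
sample = """\
# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{path="/",method="GET"} 1027
http_requests_total{path="/metrics",method="GET"} 52
"""

def parse_counters(payload: str, metric: str) -> dict:
    """Return {metric-with-labels: value} for every sample of the given metric."""
    samples = {}
    for line in payload.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        name_labels, _, value = line.rpartition(" ")
        if name_labels.split("{")[0] == metric:
            samples[name_labels] = float(value)
    return samples

for series, value in parse_counters(sample, "http_requests_total").items():
    print(series, value)
```

Each labelled combination (such as path="/") becomes its own time series, which is why the PromQL queries later in this post filter on the path label.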
AutoScaling our Application
In order to set up autoscaling, we need to define a ScaledObject, which is a Kubernetes CRD from KEDA for defining and managing autoscaling rules for applications based on various events or metrics. It acts as a bridge between the application and the Kubernetes autoscaling mechanism, allowing for more flexible and granular control over scaling behavior. Save the following as application-scaledobject.yaml:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 2
  maxReplicaCount: 10
  pollingInterval: 10
  cooldownPeriod: 30
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-operated.prometheus.svc.cluster.local:9090
      metricName: http_requests_per_minute
      query: sum(rate(http_requests_total{job="myapp", path="/"}[1m]) * 60) by (service)
      threshold: '60'
This ScaledObject will use Prometheus metrics to automatically scale a deployment named myapp based on HTTP requests per minute. Here are some of the configuration parameters in detail:
- scaleTargetRef: Reference to the Kubernetes object to be scaled (deployment named myapp).
- minReplicaCount: Sets the minimum number of replicas for the myapp deployment (2 in this case).
- maxReplicaCount: Defines the maximum number of replicas the deployment can scale to (10 in this case).
- pollingInterval: Specifies the interval (10 seconds) at which KEDA checks the Prometheus metrics.
- cooldownPeriod: The period (30 seconds) KEDA waits after the last trigger reported active before scaling the resource back down.
- triggers: An array containing the trigger definitions for scaling.
- type: Sets the trigger type to prometheus.
- metadata: Additional information for the trigger.
- serverAddress: Specifies the address of the Prometheus server (http://prometheus-operated.prometheus.svc.cluster.local:9090).
- metricName: Defines the human-readable name of the metric to monitor (http_requests_per_minute).
- query: The PromQL query to fetch the actual metric value. This query calculates the rate of HTTP requests per minute for the / path of the myapp service in the last minute.
- threshold: The target value (60) for the metric; KEDA treats this as a per-replica average, scaling out when the average value per pod exceeds it.
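Under the hood, KEDA hands this metric to the Horizontal Pod Autoscaler, which treats the threshold as a desired per-pod average. Here is a simplified sketch of that calculation (the real HPA additionally applies a tolerance and stabilization windows):

```python
import math

def desired_replicas(current_replicas: int, avg_metric_per_pod: float, target: float) -> int:
    """Simplified HPA formula: desired = ceil(current * currentAverage / target)."""
    return math.ceil(current_replicas * (avg_metric_per_pod / target))

# 240 requests/minute spread over 2 pods is an average of 120 per pod;
# against a target of 60, the autoscaler aims for 4 replicas
# (always clamped between minReplicaCount and maxReplicaCount).
total_rpm, pods = 240, 2
print(desired_replicas(pods, total_rpm / pods, 60))  # -> 4
```

This is why doubling the request rate roughly doubles the replica count, until maxReplicaCount caps it.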
Now we can deploy the ScaledObject resource:
kubectl apply -f application-scaledobject.yaml
We can now access the Prometheus frontend on http://prometheus.127.0.0.1.nip.io/ and use the following query to monitor the HTTP requests:
sum(rate(http_requests_total{job="myapp", path="/"}[1m]) * 60) by (service)
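To see what this query computes, here is a toy version of rate(...[1m]) * 60 using two counter samples. It is only an approximation: Prometheus's real rate() extrapolates to the window boundaries and handles counter resets, and the sample values here are made up:

```python
# Toy approximation of: rate(http_requests_total[1m]) * 60
# Two counter samples taken 60 seconds apart (values are made up):
t0, v0 = 0, 1000   # counter at the start of the window
t1, v1 = 60, 1090  # counter at the end of the window

per_second = (v1 - v0) / (t1 - t0)  # rate(): per-second increase of the counter
per_minute = per_second * 60        # the requests-per-minute value compared to the threshold

print(per_minute)  # -> 90.0
```

At 90 requests per minute against a threshold of 60, the ScaledObject above would trigger a scale-out.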
Then we can generate 60 requests per minute:
while true; do sleep 1; curl myapp.127.0.0.1.nip.io; done
We can view the scaledobject resource:
kubectl get scaledobject -w
# NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS AUTHENTICATION READY ACTIVE FALLBACK PAUSED AGE
# prometheus-scaledobject apps/v1.Deployment myapp 2 10 prometheus True True False Unknown 16m
Then when we look at our pods after some time:
kubectl get pods -n default
We can see that the number of pods has increased:
NAME READY STATUS RESTARTS AGE
myapp-fd69cc685-47m9x 1/1 Running 0 67m
myapp-fd69cc685-758m9 1/1 Running 0 61s
myapp-fd69cc685-9nfcl 1/1 Running 0 46s
myapp-fd69cc685-f9tc5 1/1 Running 0 46s
myapp-fd69cc685-h7ng4 1/1 Running 0 61s
myapp-fd69cc685-hkk8r 1/1 Running 0 67m
myapp-fd69cc685-kmb55 1/1 Running 0 46s
myapp-fd69cc685-p88vw 1/1 Running 0 61s
myapp-fd69cc685-qk646 1/1 Running 0 46s
myapp-fd69cc685-x5wxm 1/1 Running 0 4m16s
And if we stop our loop from generating requests, after some time we can look at our pods again:
NAME READY STATUS RESTARTS AGE
myapp-fd69cc685-47m9x 1/1 Running 0 70m
myapp-fd69cc685-758m9 1/1 Running 0 4m30s
myapp-fd69cc685-9nfcl 1/1 Running 0 4m15s
myapp-fd69cc685-f9tc5 1/1 Running 0 4m15s
myapp-fd69cc685-h7ng4 1/1 Running 0 4m30s
myapp-fd69cc685-hkk8r 1/1 Running 0 70m
myapp-fd69cc685-kmb55 1/1 Running 0 4m15s
myapp-fd69cc685-p88vw 1/1 Running 0 4m30s
myapp-fd69cc685-qk646 1/1 Running 0 4m15s
myapp-fd69cc685-x5wxm 1/1 Running 0 7m45s
myapp-fd69cc685-qk646 1/1 Terminating 0 6m16s
myapp-fd69cc685-f9tc5 1/1 Terminating 0 6m16s
myapp-fd69cc685-kmb55 1/1 Terminating 0 6m16s
myapp-fd69cc685-758m9 1/1 Terminating 0 6m31s
myapp-fd69cc685-qk646 0/1 Terminating 0 6m16s
myapp-fd69cc685-kmb55 0/1 Terminating 0 6m16s
myapp-fd69cc685-f9tc5 0/1 Terminating 0 6m16s
We can also view the hpa resource, as KEDA uses the Horizontal Pod Autoscaler (HPA) under the hood:
kubectl get hpa -n default
This will show us the HPA that is being used:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-prometheus-scaledobject Deployment/myapp 0/60 (avg) 2 10 2 29m
Up Next?
In the next post in our KEDA series, we will use the RabbitMQ scaler, so that we can scale applications based on RabbitMQ queues.
Resources
- https://keda.sh/docs/2.15/deploy/
- https://keda.sh/docs/2.15/scalers/prometheus/
- https://github.com/ruanbekker/golang-prometheus-task-async
Thank You
Thanks for reading. If you like my content, feel free to check out my website, subscribe to my newsletter, or follow me at @ruanbekker on Twitter.
- Linktree: https://go.ruan.dev/links
- Patreon: https://go.ruan.dev/patreon