
HTTP Scaling for Ingress-Based Applications

This guide demonstrates how to scale applications exposed through Kubernetes Ingress based on HTTP traffic. You’ll deploy a sample application with an Ingress resource, configure a ScaledObject, and see how Kedify automatically manages traffic routing for efficient load-based scaling—including scale-to-zero when there’s no demand.

Architecture Overview

For applications exposed via Ingress, Kedify automatically rewires traffic using its autowiring feature. When using the kedify-http scaler, traffic flows through:

Ingress -> kedify-proxy -> Service -> Deployment

The kedify-proxy intercepts traffic, collects metrics, and enables informed scaling decisions. When traffic increases, Kedify scales your application up; when traffic decreases, it scales down—even to zero if configured.
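With autowiring enabled, Kedify patches the Ingress so that its backend points at the kedify-proxy Service instead of the application Service. Conceptually, the rewired rule looks something like the sketch below; the exact proxy Service name and port are illustrative assumptions, not guaranteed values:

```yaml
# Illustrative only: how an autowired Ingress rule might look after rewiring.
# The actual Service name and port used by kedify-proxy may differ.
rules:
  - host: application.keda
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: kedify-proxy   # traffic now enters the proxy first
              port:
                number: 8080
```

When the ScaledObject is removed, Kedify restores the original backend, so you never edit the Ingress by hand.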

Prerequisites

  • A running Kubernetes cluster (local or cloud-based).
  • The kubectl command-line utility installed and accessible.
  • Your cluster connected in the Kedify Dashboard.
  • The hey load-testing tool installed, for sending load to the application.

Step 1: Deploy Application and Ingress

Deploy the following application and Ingress to your cluster:

Terminal window
kubectl apply -f application.yaml

The whole application YAML:

application.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: application
  template:
    metadata:
      labels:
        app: application
    spec:
      containers:
        - name: application
          image: ghcr.io/kedify/sample-http-server:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: RESPONSE_DELAY
              value: '0.3'
---
apiVersion: v1
kind: Service
metadata:
  name: application-service
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: http
  selector:
    app: application
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: application-ingress
spec:
  rules:
    - host: application.keda
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: application-service
                port:
                  number: 8080
  • Deployment: Defines a simple Go-based HTTP server that listens for requests, responds with a configurable delay, and exposes metrics.
  • Service: Routes traffic to the application pods within the cluster.
  • Ingress: Exposes the application outside the cluster using the hostname application.keda.

Step 2: Apply ScaledObject to Autoscale

Now, apply the following ScaledObject:

Terminal window
kubectl apply -f scaledobject.yaml

The ScaledObject YAML:

scaledobject.yaml
kind: ScaledObject
apiVersion: keda.sh/v1alpha1
metadata:
  name: application
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application
  cooldownPeriod: 5
  minReplicaCount: 0
  maxReplicaCount: 10
  fallback:
    failureThreshold: 2
    replicas: 1
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 5
  triggers:
    - type: kedify-http
      metadata:
        hosts: application.keda
        pathPrefixes: /
        service: application-service
        port: '8080'
        scalingMetric: requestRate
        targetValue: '1000'
        granularity: 1s
        window: 10s
        trafficAutowire: ingress
  • type (kedify-http): Specifies the Kedify HTTP scaler for monitoring HTTP traffic.
  • metadata.hosts (application.keda): The hostname to monitor for traffic.
  • metadata.pathPrefixes (/): The path prefix to monitor.
  • metadata.service (application-service): The Kubernetes Service associated with the application.
  • metadata.port (8080): The port on the service to monitor.
  • metadata.scalingMetric (requestRate): The metric used for scaling decisions.
  • metadata.targetValue (1000): Target request rate; KEDA scales out when traffic meets or exceeds this value.
  • metadata.granularity (1s): The sampling interval at which request counts are collected, which makes targetValue a rate in requests per second.
  • metadata.window (10s): The sliding window over which the request rate is aggregated before it is compared against targetValue.
  • metadata.trafficAutowire (ingress): Enables Kedify’s ingress autowiring feature.
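With scalingMetric set to requestRate, targetValue 1000, and granularity 1s, the scaler aims for roughly 1000 requests per second per replica, averaged over the 10-second window. As a back-of-the-envelope sketch (this mirrors the HPA's ceiling division, not Kedify's exact internals, and the observed rate below is a made-up example):

```shell
# Rough replica estimate: observed rate divided by the per-replica target, rounded up.
rate=5000      # hypothetical observed requests per second over the window
target=1000    # targetValue from the ScaledObject above
replicas=$(( (rate + target - 1) / target ))   # ceiling division in shell arithmetic
echo "$replicas"
```

So a sustained 5000 req/s would drive the Deployment toward 5 replicas, capped by maxReplicaCount.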

You should see the ScaledObject in the Kedify Dashboard:

Kedify Dashboard With ScaledObject

Step 3: Test Autoscaling

First, let’s verify that the application responds to requests:

Terminal window
# If testing locally with k3d (if testing on a remote cluster, use the Ingress IP or domain)
curl -I -H "Host: application.keda" http://localhost:9080

If everything is working, you should see a successful HTTP response:

Terminal window
HTTP/1.1 200 OK
content-type: text/html
date: Wed, 16 Apr 2025 11:32:30 GMT
content-length: 320
x-envoy-upstream-service-time: 302
server: envoy

Now, let’s test with higher load:

Terminal window
# If testing locally with k3d (if testing on a remote cluster, use the Ingress IP or domain)
hey -n 10000 -c 150 -host "application.keda" http://localhost:9080

After sending the load, you’ll see a response time histogram in the terminal:

Terminal window
Response time histogram:
0.301 [1] |
0.498 [9749] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.695 [0] |
0.892 [0] |
1.090 [0] |
1.287 [0] |
1.484 [0] |
1.681 [53] |
1.878 [0] |
2.075 [53] |
2.272 [44] |
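Whether a run like this actually pushes past the 1000 req/s target depends on how quickly hey completes. A quick sanity check is to divide the total request count by the total run time from hey's summary (the 20-second duration below is a hypothetical example, not taken from the run above):

```shell
# Average request rate = total requests / total run time (integer seconds).
requests=10000
duration=20    # hypothetical total run time in seconds from hey's summary
echo $(( requests / duration ))   # average requests per second
```

If the average rate stays below targetValue, increase -n and -c (or lower targetValue) to see scale-out.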

In the Kedify Dashboard, you can also observe the traffic load and resulting scaling:

Kedify Dashboard ScaledObject Detail

Next steps

You can explore the complete documentation of the HTTP Scaler for more advanced configurations, including other ingress types like Gateway API, Istio VirtualService, or OpenShift Routes.