
HTTP Scaling for In-Cluster Traffic (Kubernetes Service)

This guide demonstrates how to scale applications based on cluster-internal HTTP traffic. These applications do not expose external Ingress resources, relying instead on Kubernetes Services for traffic routing. You’ll deploy a sample application, its matching Services, and a KEDA ScaledObject, and see how Kedify uses its proxy to manage traffic and enable efficient, load-based scaling, including scale-to-zero when there is no demand.

Architecture Overview

To properly enable autoscaling with Kedify, network traffic needs to be automatically rewired by its autowiring feature, which manages Kubernetes Endpoints for specified Services, as detailed in the traffic autowire documentation.

Setting service as the trafficAutowire option excludes all other trafficAutowire options because it effectively replaces them. With this setting, kedify-agent wires the traffic by managing the Kubernetes Endpoints of the Services named in service and fallbackService in the trigger metadata.

This type of traffic autowiring brings two more requirements on the application and autoscaling manifests:

  1. The ScaledObject must define fallbackService: the kedify-agent uses it to inject Endpoints into the Service defined in the trigger metadata, and kedify-proxy uses it for routing.
  2. The application Service must NOT define a selector: the Kubernetes control plane manages Endpoints for Services with selectors, which would collide with the autowire feature. The Service named in the trigger metadata must have no selector, while the fallbackService should carry the original selector you would define on the Service if it weren’t autowired.
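Once everything is deployed (Steps 1 and 2 below), you can verify both requirements from the command line. The Service names here are the ones used later in this guide, and the exact output depends on your cluster:

```shell
# The autowired Service must have no selector (expect empty output):
kubectl get service application-service -o jsonpath='{.spec.selector}'

# The fallback Service carries the original selector (expect {"app":"application"}):
kubectl get service application-service-fallback -o jsonpath='{.spec.selector}'

# Endpoints of the autowired Service are managed by kedify-agent, not by Kubernetes:
kubectl get endpoints application-service -o wide
```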

Prerequisites

  • A running Kubernetes cluster (a local cluster or a cloud-based one such as EKS or GKE).
  • The kubectl command line utility installed and accessible.
  • Your cluster connected to the Kedify Dashboard.

Step 1: Deploy Application

Deploy the following application to your cluster:

Terminal window
kubectl apply -f application.yaml

The whole application YAML:

application.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: application
  template:
    metadata:
      labels:
        app: application
    spec:
      containers:
        - name: application
          image: ghcr.io/kedify/sample-http-server:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: RESPONSE_DELAY
              value: '0.3'
---
apiVersion: v1
kind: Service
metadata:
  name: application-service
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: http
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: application-service-fallback
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: http
  type: ClusterIP
  selector:
    app: application
  • Deployment (application): This defines the application itself. It’s a simple Go-based HTTP server designed for Kubernetes. It listens for HTTP requests, responds (with a configurable delay via the RESPONSE_DELAY environment variable), and exposes metrics.
  • Service (application-service): This is the primary internal access point for the application within the cluster. Clients within the cluster target this Service’s ClusterIP and port 8080 to reach the application.
    • When used with Kedify’s trafficAutowire: service feature (configured in the ScaledObject below), this Service intentionally lacks a selector. Instead of Kubernetes automatically populating its Endpoints with Pod IPs, the kedify-agent takes control.
    • It dynamically updates the Endpoints for this Service to point either to the kedify-proxy (when the scaler is active and healthy) or to the application-service-fallback Service (when the scaler is inactive or unhealthy). This allows Kedify to intercept traffic for metric collection and scaling purposes.
  • Service (application-service-fallback): This is a standard Kubernetes Service that does use a selector (app: application) to directly target the application Pods managed by the Deployment.
    • It acts as a direct pathway to the application, bypassing the kedify-proxy.
    • The kedify-agent uses this Service's endpoint information to populate the main application-service’s Endpoints when the Kedify trigger enters a fallback state (e.g., due to metric collection issues) or when scaling is inactive. This ensures application availability even if the Kedify scaling mechanism encounters problems.
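To see the autowiring in action after both manifests (and the ScaledObject from Step 2) are applied, you can compare the Endpoints of the two Services. When the scaler is active, application-service typically resolves to the kedify-proxy, while the fallback Service always lists the application Pod IPs; exact IPs depend on your cluster:

```shell
# Endpoints managed by kedify-agent (point to kedify-proxy when the scaler is active):
kubectl get endpoints application-service -o jsonpath='{.subsets[*].addresses[*].ip}'

# Endpoints managed by Kubernetes via the selector (application Pod IPs):
kubectl get endpoints application-service-fallback -o jsonpath='{.subsets[*].addresses[*].ip}'
```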

Step 2: Apply ScaledObject to Autoscale and Autowire

Apply the following ScaledObject:

Terminal window
kubectl apply -f scaledobject.yaml

The whole ScaledObject YAML:

scaledobject.yaml
kind: ScaledObject
apiVersion: keda.sh/v1alpha1
metadata:
  name: application
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application
  cooldownPeriod: 5
  minReplicaCount: 1
  maxReplicaCount: 1
  fallback:
    failureThreshold: 2
    replicas: 1
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 5
  triggers:
    - type: kedify-http
      metadata:
        hosts: application.keda
        service: application-service
        port: '8080'
        scalingMetric: requestRate
        targetValue: '1000'
        granularity: 1s
        window: 10s
        trafficAutowire: service
        fallbackService: application-service-fallback
  • type (kedify-http): Specifies the use of the Kedify HTTP scaler, which monitors HTTP traffic.
  • metadata.hosts (application.keda): The scaler monitors traffic directed to this hostname.
  • metadata.service (application-service): The Kubernetes Service name associated with the traffic path being monitored and whose Endpoints will be managed.
  • metadata.port (8080): The port on the specified service to monitor.
  • metadata.scalingMetric (requestRate): The metric used for scaling decisions is the rate of incoming requests.
  • metadata.targetValue (1000): Target value for the scaling metric; KEDA scales out when traffic meets or exceeds this value.
  • metadata.granularity (1s): The time unit for the targetValue (i.e., requests per second).
  • metadata.window (10s): The sliding time window over which the request rate is aggregated.
  • metadata.trafficAutowire (service): This enables Kedify’s service autowiring feature.
  • metadata.fallbackService (application-service-fallback): The name of the Service that kedify-agent points the main application-service Endpoints to when the kedify-http trigger is inactive, in a fallback state (due to errors), or scaled down to zero (if minReplicaCount allowed it).
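After applying the manifest, you can check that the ScaledObject was admitted and that KEDA created the backing HorizontalPodAutoscaler (the HPA name is generated by KEDA and may differ from what you expect):

```shell
# The ScaledObject should report READY=True, and ACTIVE according to current traffic:
kubectl get scaledobject application

# KEDA creates an HPA for the scale target, typically prefixed with keda-hpa-:
kubectl get hpa
```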

You should see the ScaledObject in the Kedify Dashboard:

Kedify Dashboard With ScaledObject

Step 3: Test Autoscaling

First, let’s test that the application responds to requests. Because this application is not exposed outside of the cluster, you can spin up a curl Pod inside it to easily use the Kubernetes cluster DNS. Because the ScaledObject configures hosts: application.keda in the trigger metadata, you also need to tell curl to send a matching Host HTTP header with the request.

Terminal window
kubectl run curl --image=curlimages/curl:latest --restart=Never --command -- /bin/sh -c "sleep infinity"
kubectl exec -it curl -- curl -I -H 'host: application.keda' http://application-service.default.svc.cluster.local:8080

If everything is working, you should see the following output in your terminal:

Terminal window
HTTP/1.1 200 OK
content-type: text/html
date: Wed, 16 Apr 2025 11:32:30 GMT
content-length: 320
x-envoy-upstream-service-time: 302
server: envoy

Now, we can test higher load:

Terminal window
kubectl run hey --image=ghcr.io/kedacore/tests-hey:latest --restart=Never --command -- /bin/sh -c "sleep infinity"
kubectl exec -it hey -- sh
./hey -n 10000 -c 150 -host "application.keda" http://application-service.default.svc.cluster.local:8080

After a while, you will see a response time histogram in the terminal:

Terminal window
Response time histogram:
0.300 [1] |
0.317 [8760] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.334 [962] |■■■■
0.350 [27] |
0.367 [0] |
0.383 [0] |
0.400 [8] |
0.417 [44] |
0.433 [86] |
0.450 [3] |
0.467 [9] |
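While hey is running, you can watch the Deployment scale from a second terminal; the replica counts you see depend on the generated load and the limits set in the ScaledObject:

```shell
# Watch the replica count change as the request rate crosses the target:
kubectl get deployment application -w
```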
In the Kedify Dashboard, you can also see the load:

![Kedify Dashboard ScaledObject Detail](/assets/images/how-to/http-scaling-for-in-cluster-traffic/step-3-load.png)

## Next steps

You can check the full documentation of the [HTTP Scaler](/documentation/scalers/http-scaler).