HTTP Scaler

HTTP Scaler ensures that your service scales based on incoming HTTP requests.

Details

The HTTP scaler is designed specifically for ScaledObject resources to enable scaling based on incoming HTTP traffic; the ScaledJob resource is not supported at the moment. It supports automatic scaling, including scaling to zero, without requiring Prometheus or other external components. The scaler monitors traffic using an interceptor proxy and routes traffic accordingly, caching incoming requests when necessary. Additionally, the scaler automatically configures ingress objects for the specified workload.

The scaler supports multiple ingress implementations including Gateway API, Amazon ALB, and Istio. This allows flexibility in how the traffic is managed and monitored.

By using this scaler, users can define specific hosts and path prefixes that should be monitored for traffic. The scaler uses metrics such as request rate or concurrency to determine the scaling needs of the application, ensuring optimal performance and resource utilization.

With automatic configuration of ingress objects, the HTTP scaler simplifies the setup process, allowing for seamless integration with existing infrastructure and workloads. This makes it an ideal choice for applications that need to scale based on real-time HTTP traffic.

Trigger Specification

This specification describes the kedify-http trigger that scales workloads based on incoming HTTP traffic.

Here is an example of trigger configuration using the HTTP scaler:

triggers:
  - type: kedify-http
    metadata:
      hosts: www.my-app.com
      pathPrefixes: '/'
      service: http-demo-service
      port: '8080'
      scalingMetric: requestRate # or concurrency
      targetValue: '10'
      granularity: '1s'
      window: '1m0s'
      externalProxyMetricKey: 'my_app_com'
      trafficAutowire: 'httproute,ingress,virtualservice'
      healthcheckPath: '/healthz'
      healthcheckResponse: 'passthrough' # or 'static'
      tlsSecretName: tls-http-demo-service

Parameter list:

  • hosts - A comma-separated list of hosts that the scaler will monitor (e.g. www.my-app.com,www.foo.bar)
  • pathPrefixes - A comma-separated list of path prefixes that the scaler will monitor. (e.g. /foo,/bar, Optional)
  • service - The name of a Kubernetes Service fronting the workload specified in ScaledObject.spec.scaleTargetRef. This is the service to which the traffic should be routed.
  • port - The port on which the above-mentioned Kubernetes Service is listening.
  • scalingMetric - The metric used for scaling, either requestRate or concurrency.
  • targetValue - The target value for the scaling metric. When the incoming traffic meets or exceeds this value, KEDA will scale out the deployment. (Default: 100)
  • granularity - The granularity at which the request rate is measured. For example, "1s" means one second. (Only for requestRate, Default: 1s)
  • window - The window over which the request rate is averaged. For example, "1m0s" means one minute. (Only for requestRate, Default: 1m)
  • externalProxyMetricKey - Matching external metric name, used for aggregating metrics from external sources. (e.g. a concrete cluster_name for Envoy, Optional)
  • trafficAutowire - Configures traffic autowiring of ingress resources. The value false disables autowiring completely. To enable autowiring only for specific ingress resource types, use a comma-separated list such as httproute,ingress,virtualservice. (See Traffic Autowiring for more details, Optional)
  • healthcheckPath - Healthcheck path on the scaled application that the interceptor responds to when the application is scaled to 0. (See Scaled Application Healthcheck Configuration for more details, Optional)
  • healthcheckResponse - The response mode to use when the healthcheck response feature is enabled; allowed values are passthrough or static. This value can be set only if healthcheckPath is specified. (See Scaled Application Healthcheck Configuration for more details, Default: passthrough, Optional)
  • tlsSecretName - Reference to a Secret containing the TLS certificate and key under tls.crt and tls.key, used to keep traffic encrypted end-to-end. Not necessary if cluster-internal traffic is plaintext, e.g. when TLS is terminated at the ingress gateway. (Optional)
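For reference, tlsSecretName points at a standard Kubernetes TLS Secret. A minimal sketch, assuming the conventional kubernetes.io/tls key names; the certificate data below is a truncated placeholder, not real material:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: tls-http-demo-service   # matches tlsSecretName in the trigger example
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTi...    # base64-encoded PEM certificate (placeholder)
  tls.key: LS0tLS1CRUdJTi...    # base64-encoded PEM private key (placeholder)
```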

Example ScaledObject with HTTP trigger

Here is a full example of a scaled object definition using the HTTP trigger:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: http-demo-scaledobject
  labels:
    deploymentName: http-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-demo-deployment
  cooldownPeriod: 5
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: kedify-http
      metadata:
        hosts: www.my-app.com
        pathPrefixes: '/'
        service: http-demo-service
        port: '8080'
        scalingMetric: requestRate
        targetValue: '10'
        granularity: '1s'
        window: '1m0s'

Note: Ensure that the hosts, pathPrefixes, service, and port parameters are correctly configured to match the application’s routing requirements.

Traffic Autowiring

Kedify automatically re-wires ingress resources for the following implementations:

  • Ingress
  • Gateway API
  • Istio

In a typical Kubernetes setup, the networking configuration is structured as follows:

  1. Ingress: This resource manages external access to services within a Kubernetes cluster, typically via HTTP or HTTPS. It provides load balancing, SSL termination, and name-based virtual hosting.
  2. Service: This resource defines a logical set of Pods and a policy by which to access them. Services enable the decoupling of work definitions from Pods.
  3. Deployment: This resource provides declarative updates for Pods and ReplicaSets. It defines the desired state for application deployment, scaling, and updates.

The standard flow of traffic in Kubernetes is:

Ingress -> Service -> Deployment
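As an illustration, a conventional (pre-Kedify) wiring for the demo application used throughout this document might look like the following; the Ingress name is illustrative, while the host, service, and port follow the earlier examples:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: http-demo-ingress   # illustrative name
spec:
  rules:
    - host: www.my-app.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: http-demo-service   # traffic goes straight to the app's Service
                port:
                  number: 8080
```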

To enable the automatic scaling of applications based on incoming HTTP traffic, Kedify introduces additional components into this flow:

  1. kedify-proxy: An Envoy-based proxy that routes traffic and collects metrics for scaling decisions.
  2. HTTP Add-on Interceptor: This component ensures that the requests are appropriately routed and cached when the app is scaled to zero.

With Kedify, the traffic flow is modified to include these additional components:

Ingress -> kedify-proxy -> Service -> Deployment

Kedify handles the automatic re-wiring of Ingress resources for specific implementations (e.g., Ingress, Gateway API, Istio). This ensures that the incoming traffic is routed through the kedify-proxy and the interceptor, allowing Kedify to monitor and scale based on real-time HTTP traffic.
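Conceptually, the re-wiring swaps the ingress backend so traffic enters the kedify-proxy first. A sketch of what a rewritten Ingress path might look like; the proxy Service name and port here are illustrative assumptions, as the actual objects are managed by Kedify:

```yaml
# Illustrative only: backend after Kedify's autowiring.
# The real proxy Service name/port are managed by Kedify and may differ.
- path: /
  pathType: Prefix
  backend:
    service:
      name: kedify-proxy   # assumed proxy Service name
      port:
        number: 8080
```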

For finer-grained control over which types of ingress resources are autowired, set trafficAutowire to a comma-separated list of resource types; only the listed resource types that are present will be autowired.

triggers:
  - type: kedify-http
    metadata:
      trafficAutowire: 'httproute,ingress,virtualservice'

Autowiring Fallback

In case of issues with the control plane (kedify-proxy or interceptor), Kedify has a built-in fallback mechanism that automatically rewires the traffic back to the original flow:

Ingress -> Service -> Deployment

This fallback mechanism ensures that there are no outages and the application remains accessible. By default, if Kedify detects problems with the control plane for longer than 5 seconds, the traffic is rewired to bypass the kedify-proxy and interceptor. This duration can be configured using the environment variable HTTP_HEALTHCHECK_DEBOUNCER_SECONDS on the Kedify Agent deployment.
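The debounce interval can be adjusted by setting that environment variable on the Kedify Agent deployment. A sketch of the relevant Deployment fragment; the deployment and container names below are assumptions and should be checked against your installation:

```yaml
# Fragment of the Kedify Agent Deployment spec (names are assumed, not guaranteed).
spec:
  template:
    spec:
      containers:
        - name: agent   # assumed container name
          env:
            - name: HTTP_HEALTHCHECK_DEBOUNCER_SECONDS
              value: "15"   # fall back after 15s of control-plane problems instead of the 5s default
```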

Leaving the trafficAutowire property unset keeps the automatic configuration enabled for the particular ScaledObject; this is the default behavior.

Disabling Traffic Autowiring

To disable the traffic autowiring, you can specify the following trigger setting in your ScaledObject:

triggers:
  - type: kedify-http
    metadata:
      trafficAutowire: 'false'

In this case the user needs to wire the networking traffic manually. The Autowiring Fallback does not apply in this scenario.
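Manual wiring means pointing the ingress backend at the interceptor yourself. A sketch, assuming the interceptor proxy Service name and namespace used by the upstream KEDA HTTP add-on (keda-add-ons-http-interceptor-proxy in the keda namespace) — verify these against your installation before use:

```yaml
# Manual wiring sketch: Ingress created in the interceptor's namespace.
# Service name/namespace follow upstream KEDA HTTP add-on defaults (assumption).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: http-demo-ingress
  namespace: keda
spec:
  rules:
    - host: www.my-app.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keda-add-ons-http-interceptor-proxy
                port:
                  number: 8080
```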

Metrics Aggregation

The Kedify HTTP Scaler uses kedify-proxy (Envoy) to route traffic and collect metrics for applications, improving reliability and performance. This prevents situations where the interceptor becomes a bottleneck: standard reverse proxies such as Envoy, nginx, or HAProxy are better equipped to handle heavy traffic, so all or part of the network traffic can be offloaded from the interceptor to an off-the-shelf reverse proxy. Currently, there is native support for Envoy within the interceptor; other reverse proxy solutions may require additional configuration.

The kedify-proxy is automatically deployed in every namespace in the cluster that contains at least one ScaledObject with a kedify-http trigger whose traffic is correctly autowired, as described in Traffic Autowiring.

Scaled Application Healthcheck Configuration

It is common practice to configure healthchecks for applications so that unhealthy replicas are excluded from load balancing. However, this defeats the purpose of scaling to 0 based on HTTP traffic, because the healthchecks themselves generate HTTP traffic that would scale the application back up.

Kedify can instruct the interceptor to respond to healthchecks on behalf of the scaled application instead of proxying the check to the application and triggering a scale-out. You can define a healthcheck path on the scaled application and, optionally, a response mode. In static mode the interceptor always responds; in passthrough mode (the default if not specified) the interceptor responds only while the application is scaled to 0 and otherwise proxies the request on the same path to the application.

triggers:
  - type: kedify-http
    metadata:
      healthcheckPath: '/healthz'
      healthcheckResponse: 'passthrough' # or 'static'

Requests for the preconfigured healthcheck path are excluded from the metrics counter stats.

Example ScaledObject with Healthcheck Configuration

The following configuration instructs the interceptor to respond to requests for www.my-app.com/healthz when the application is scaled to 0:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: http-demo-scaledobject
  labels:
    deploymentName: http-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-demo-deployment
  cooldownPeriod: 5
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: kedify-http
      metadata:
        hosts: www.my-app.com
        pathPrefixes: '/'
        service: http-demo-service
        port: '8080'
        scalingMetric: requestRate
        targetValue: '10'
        healthcheckPath: '/healthz'
        healthcheckResponse: 'passthrough'