HTTP Scaler

The HTTP Scaler scales your service based on incoming HTTP requests.

Details

The HTTP scaler is designed specifically for ScaledObject resources and scales workloads based on incoming HTTP traffic; ScaledJob resources are not currently supported. It supports automatic scaling, including scaling to zero, without requiring Prometheus or other external components. The scaler monitors traffic through an interceptor proxy, routes it to the workload, and caches incoming requests when necessary, for example while the workload is scaled to zero. Additionally, the scaler automatically configures ingress objects for the specified workload.

The scaler supports multiple ingress implementations including Gateway API, Amazon ALB, Istio, and OpenShift Routes, allowing flexibility in managing and monitoring traffic.

Using this scaler, users can define specific hosts and path prefixes to be monitored for traffic. The scaler uses one of two metrics, request rate or concurrency, to determine the scaling needs of the application.

With automatic configuration of ingress objects, the HTTP scaler simplifies the setup process, allowing for seamless integration with existing infrastructure and workloads. This makes it an ideal choice for applications that need to scale based on real-time HTTP traffic.

Trigger Specification

This specification describes the kedify-http trigger, which scales workloads based on incoming HTTP traffic.

Here is an example of trigger configuration using the HTTP scaler:

triggers:
  - type: kedify-http
    metadata:
      hosts: www.my-app.com
      pathPrefixes: '/'
      service: http-demo-service
      fallbackService: http-demo-service-fallback # used only for 'service' traffic autowiring
      port: '8080' # exclusive with 'portName'
      portName: 'http' # exclusive with 'port'
      scalingMetric: requestRate # or concurrency
      targetValue: '10'
      granularity: '1s'
      window: '1m0s'
      externalProxyMetricKey: 'my_app_com'
      trafficAutowire: 'httproute,ingress,virtualservice,route' # or 'service' on its own
      healthcheckPath: '/healthz'
      healthcheckResponse: 'passthrough' # or 'static'
      tlsSecretName: tls-http-demo-service

Parameter list:

  • hosts: Comma-separated list of hosts to monitor (e.g., www.my-app.com,www.foo.bar).
  • pathPrefixes: Comma-separated list of path prefixes to monitor (e.g., /foo,/bar). (Optional)
  • service: Name of the Kubernetes service for the workload specified in ScaledObject.spec.scaleTargetRef, where traffic should be routed.
  • fallbackService: Name of the Kubernetes service used as a fallback along with service autowiring.
  • port: Port on which the Kubernetes Service is listening. Only one of port or portName can be set.
  • portName: Reference to the port by its name. Only one of port or portName can be set.
  • scalingMetric: Metric used for scaling, either requestRate or concurrency.
  • targetValue: Target value for the scaling metric; KEDA scales out when traffic meets or exceeds this value. (Default: 100)
  • granularity: Granularity at which the request rate is measured (e.g., “1s” for one second). (Only for requestRate, Default: 1s)
  • window: Window over which the request rate is averaged (e.g., “1m0s” for one minute). (Only for requestRate, Default: 1m)
  • externalProxyMetricKey: Metric name used for aggregating external source metrics (e.g., cluster_name for Envoy). (Optional)
  • trafficAutowire: Configures traffic autowiring of ingress resources. Setting false disables autowiring; to autowire only specific resource types, use a comma-separated list (e.g., httproute,ingress,virtualservice,route), or set service on its own for service-level autowiring. (See Traffic Autowiring for more details, Optional)
  • healthcheckPath: Healthcheck path on the scaled application for responses when scaled to zero. (See Scaled Application Healthcheck Configuration for more details, Optional)
  • healthcheckResponse: Response mode for healthchecks; allowed values are passthrough or static. Only set if healthcheckPath is specified. (Default: passthrough, Optional)
  • tlsSecretName: Reference to a Secret containing the TLS certificate and key under tls.crt and tls.key. Not necessary if cluster-internal traffic is plaintext. (Optional)
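
As an illustration of the alternative options in the list above, the following sketch scales on concurrency instead of request rate, monitors two hosts and two path prefixes, and references the port by name. The host names, service name, and target value are placeholders; granularity and window are left out because they apply only to requestRate.

```yaml
triggers:
  - type: kedify-http
    metadata:
      # placeholder hosts and path prefixes; replace with your own
      hosts: www.my-app.com,www.foo.bar
      pathPrefixes: '/foo,/bar'
      service: http-demo-service
      portName: 'http'          # used instead of 'port'; only one of the two may be set
      scalingMetric: concurrency
      targetValue: '20'         # illustrative value, not a recommendation
```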

Example ScaledObject with HTTP trigger

Here is a full example of a scaled object definition using the HTTP trigger:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: http-demo-scaledobject
  labels:
    deploymentName: http-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-demo-deployment
  cooldownPeriod: 5
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: kedify-http
      metadata:
        hosts: www.my-app.com
        pathPrefixes: '/'
        service: http-demo-service
        port: '8080'
        scalingMetric: requestRate
        targetValue: '10'
        granularity: '1s'
        window: '1m0s'

Note: Ensure that hosts, pathPrefixes, service, and port parameters match the application’s routing requirements.

Traffic Autowiring

Kedify automatically re-wires ingress resources for the following implementations:

  • HTTPRoute (Gateway API)
  • Ingress
  • VirtualService (Istio)
  • Route (OpenShift)

In a typical Kubernetes setup, the networking configuration is structured as follows:

Ingress -> Service -> Deployment

To enable automatic scaling based on incoming HTTP traffic, Kedify introduces additional components:

  1. kedify-proxy: An Envoy-based proxy that routes traffic and collects metrics for scaling.
  2. HTTP Add-on Interceptor: Ensures requests are routed and cached when the app is scaled to zero.

With Kedify, the traffic flow includes these additional components:

Ingress -> kedify-proxy -> Service -> Deployment

For finer control over which ingress resources are autowired, set trafficAutowire to a comma-separated list of resource types; use route for OpenShift Routes.

triggers:
  - type: kedify-http
    metadata:
      trafficAutowire: 'httproute,ingress,virtualservice,route'

Autowiring Fallback

If kedify-proxy or the interceptor experiences control plane issues, Kedify rewires traffic back to the original flow:

Ingress -> Service -> Deployment

This fallback avoids outages and keeps the application accessible. By default, traffic is rewired if Kedify detects control plane issues for over 5 seconds. This duration can be configured using HTTP_HEALTHCHECK_DEBOUNCER_SECONDS on the Kedify Agent deployment.
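
A minimal sketch of overriding the debounce duration, assuming HTTP_HEALTHCHECK_DEBOUNCER_SECONDS is set as a container environment variable on the Kedify Agent deployment and accepts a number of seconds as a string. The value 15 is only an illustration.

```yaml
# fragment of the Kedify Agent container spec (assumed location)
env:
  - name: HTTP_HEALTHCHECK_DEBOUNCER_SECONDS
    value: '15'   # illustrative; the default described above is 5 seconds
```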

Disabling Traffic Autowiring

To disable traffic autowiring, specify:

triggers:
  - type: kedify-http
    metadata:
      trafficAutowire: 'false'

In this case, users must manually wire the networking traffic. Note that Autowiring Fallback does not apply here.

Service Autowiring

Applications that are used only within the cluster and exposed only as a Service can still benefit from HTTP traffic autoscaling, including fallback, through service-level autowiring. Setting service as the trafficAutowire option excludes all other trafficAutowire options, because it effectively replaces them. The Kedify Agent wires the traffic by managing the Kubernetes Endpoints that belong to the Services named in service and fallbackService in the trigger metadata.

This type of traffic autowiring imposes two additional requirements on the application and autoscaling manifests:

  1. The ScaledObject must define fallbackService, because the Kedify Agent uses it for injecting Endpoints into the service defined in the trigger metadata and kedify-proxy uses it for routing.
  2. The application Service must NOT have a selector defined: the Kubernetes control plane manages Endpoints for Services with selectors, which would collide with the autowire feature. The service defined in the trigger metadata must have no selector, while the fallbackService should carry the selector you would otherwise define on the service if it were not autowired.
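
The two requirements above could look roughly like the following set of manifests. This is a sketch rather than a verified configuration: the pod label app: http-demo, the targetPort, and the hosts value are hypothetical placeholders, while the resource names reuse those from the earlier examples.

```yaml
# Service referenced by 'service': no selector, so its Endpoints can be managed by the Kedify Agent
apiVersion: v1
kind: Service
metadata:
  name: http-demo-service
spec:
  ports:
    - name: http
      port: 8080
      targetPort: 8080   # hypothetical container port
---
# Service referenced by 'fallbackService': carries the selector the application Service would normally have
apiVersion: v1
kind: Service
metadata:
  name: http-demo-service-fallback
spec:
  selector:
    app: http-demo       # hypothetical pod label
  ports:
    - name: http
      port: 8080
      targetPort: 8080
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: http-demo-scaledobject
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-demo-deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: kedify-http
      metadata:
        hosts: http-demo-service   # placeholder; use the host that in-cluster clients actually send
        service: http-demo-service
        fallbackService: http-demo-service-fallback
        port: '8080'
        scalingMetric: requestRate
        targetValue: '10'
        trafficAutowire: 'service' # not combined with any other autowire option
```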

Kedify Proxy

The Kedify HTTP Scaler uses the kedify-proxy (Envoy) to route traffic and collect metrics for applications, enhancing reliability and performance. This proxy setup helps prevent potential bottlenecks in the interceptor. Currently, Envoy is the only natively supported proxy for the HTTP Scaler; other reverse proxy solutions may require additional configuration.

Deployment Options for Kedify Proxy

There are two main deployment configurations for kedify-proxy: Namespace-Level and Cluster-Wide.

  1. Namespace-Level Deployment (Default): By default, kedify-proxy is deployed in each namespace that contains at least one ScaledObject using the kedify-http trigger. This approach ensures that traffic routing and metric collection are confined within the namespace where the ScaledObject is defined, providing isolation and control.

  2. Cluster-Wide Deployment (Optional): For environments where Istio’s VirtualService is used (currently the only supported configuration), kedify-proxy can be deployed cluster-wide. In this setup, kedify-proxy is deployed in the KEDA installation namespace and shared among all ScaledObjects across namespaces. This configuration allows centralized traffic routing and scaling across all namespaces in the cluster.

    • To enable cluster-wide deployment, set the environment variable KEDIFY_PROXY_CLUSTER_WIDE to true on the Kedify Agent. This will configure the Kedify Agent to deploy a single instance of kedify-proxy for the entire cluster, located in the KEDA installation namespace. For more details, refer to the Kedify Agent documentation.

Note: The cluster-wide setup for kedify-proxy is only compatible with Istio’s VirtualService. Other types of ingress configurations are not supported in this setup.

Configuring Kedify Proxy Replica Count

The kedify-proxy deployment has a default replica count of 1, which can be adjusted to meet specific performance or redundancy requirements within a namespace or across the cluster.

To configure a different default replica count globally, set the environment variable KEDIFY_PROXY_DEFAULT_REPLICA_COUNT to a valid integer N on the Kedify Agent:

env:
  - name: KEDIFY_PROXY_DEFAULT_REPLICA_COUNT
    value: 'N'

This setting applies to each kedify-proxy deployment, adjusting the number of proxy replicas for improved scalability or availability as needed.

Kedify Proxy Traffic Flow

When using Kedify, the traffic flow in Kubernetes is enhanced to include additional components for real-time monitoring and scaling based on HTTP traffic. Typically, Kubernetes follows this traffic pattern:

Ingress -> Service -> Deployment

With Kedify, the traffic flow includes kedify-proxy as an intermediary to monitor and intercept traffic before it reaches the service:

Ingress -> kedify-proxy -> Service -> Deployment

The kedify-proxy intercepts, routes, and caches HTTP requests when necessary. This routing allows the scaler to collect traffic metrics and adjust replica counts based on real-time demands. For more details on autowiring and traffic routing configurations, refer to the Traffic Autowiring section.

Kedify Proxy Environment Variables Summary

Here’s a summary of the key environment variables used for configuring kedify-proxy:

  • KEDIFY_PROXY_CLUSTER_WIDE: Enables the cluster-wide deployment of kedify-proxy in the KEDA installation namespace, shared across all namespaces in the cluster. This variable should be set to true on the Kedify Agent for a cluster-wide configuration. This setup is only compatible with Istio’s VirtualService. For more information, see the Kedify Agent documentation.

  • KEDIFY_PROXY_DEFAULT_REPLICA_COUNT: Sets the default number of replicas for each kedify-proxy deployment. This variable should be set to a valid integer N on the Kedify Agent to adjust the global replica count as per performance or availability requirements.

  • KEDIFY_PROXY_LOG_FORMAT: Specifies the log format for the kedify-proxy fleet. Set it on the Kedify Agent. Two values are supported, plaintext and json; the default is plaintext.
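
For orientation, the three variables summarized above might be set together on the Kedify Agent as sketched below. The chosen values (cluster-wide proxy, 3 replicas, JSON logs) are illustrative only, and where this fragment sits in the Kedify Agent deployment is an assumption.

```yaml
# fragment of the Kedify Agent container spec (assumed location)
env:
  - name: KEDIFY_PROXY_CLUSTER_WIDE
    value: 'true'        # only compatible with Istio's VirtualService
  - name: KEDIFY_PROXY_DEFAULT_REPLICA_COUNT
    value: '3'           # illustrative; the default is 1
  - name: KEDIFY_PROXY_LOG_FORMAT
    value: 'json'        # or 'plaintext' (default)
```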

Scaled Application Healthcheck Configuration

Configuring healthchecks for applications typically excludes unhealthy replicas from load balancing. However, this conflicts with scaling to zero: healthchecks are themselves HTTP traffic, so they trigger scale-up actions.

Kedify’s interceptor can respond to healthchecks on behalf of the scaled application instead of proxying the check to the application and causing a scale-out. A healthcheck path and a response mode (passthrough, the default, or static) can be defined for the scaled application. In passthrough mode, the interceptor responds only if the application is scaled to zero; otherwise, it proxies the request.

triggers:
  - type: kedify-http
    metadata:
      healthcheckPath: '/healthz'
      healthcheckResponse: 'passthrough' # or 'static'

Preconfigured healthcheck paths are excluded from metric counting.

Example ScaledObject with Healthcheck Configuration

The following configuration instructs the interceptor to respond to requests for www.my-app.com/healthz when scaled to 0:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: http-demo-scaledobject
  labels:
    deploymentName: http-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-demo-deployment
  cooldownPeriod: 5
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: kedify-http
      metadata:
        hosts: www.my-app.com
        pathPrefixes: '/'
        service: http-demo-service
        port: '8080'
        scalingMetric: requestRate
        targetValue: '10'
        healthcheckPath: '/healthz'
        healthcheckResponse: 'passthrough'