HTTP Scaler
HTTP Scaler ensures that your service scales based on incoming HTTP requests.
Details
The HTTP scaler is designed specifically for ScaledObject resources to enable scaling based on incoming HTTP traffic; ScaledJob resources are not supported at the moment. It supports automatic scaling, including scaling to zero, without requiring Prometheus or other external components. The scaler monitors traffic using an interceptor proxy and routes traffic accordingly, caching incoming requests when necessary. Additionally, the scaler automatically configures ingress objects for the specified workload.
The scaler supports multiple ingress implementations including Gateway API, Amazon ALB, and Istio. This allows flexibility in how the traffic is managed and monitored.
With this scaler, users define the specific hosts and path prefixes that should be monitored for traffic. The scaler compares a scaling metric, either request rate or concurrency, against a target value to determine how many replicas the application needs.
With automatic configuration of ingress objects, the HTTP scaler simplifies the setup process, allowing for seamless integration with existing infrastructure and workloads. This makes it an ideal choice for applications that need to scale based on real-time HTTP traffic.
Trigger Specification
This specification describes the kedify-http trigger, which scales workloads based on incoming HTTP traffic.
Here is an example of trigger configuration using the HTTP scaler:
Parameter list:
- hosts: A comma-separated list of hosts that the scaler will monitor (e.g. www.my-app.com,www.foo.bar).
- pathPrefixes: A comma-separated list of path prefixes that the scaler will monitor (e.g. /foo,/bar; Optional).
- service: The name of the Kubernetes Service defined for the workload specified in ScaledObject.spec.scaleTargetRef. This is the Service to which the traffic is routed.
- port: The port on which the above-mentioned Kubernetes Service is listening.
- scalingMetric: The metric used for scaling, either requestRate or concurrency.
- targetValue: The target value for the scaling metric. When the incoming traffic meets or exceeds this value, KEDA scales out the workload. (Default: 100)
- granularity: The granularity at which the request rate is measured. For example, "1s" means one second. (Only for requestRate; Default: 1s)
- window: The window over which the request rate is averaged. For example, "1m0s" means one minute. (Only for requestRate; Default: 1m)
- externalProxyMetricKey: The matching external metric name, used for aggregating metrics from external sources (e.g. the concrete cluster_name for Envoy; Optional).
- trafficAutowire: Configures traffic autowiring of ingress resources. The value false disables autowiring completely. To enable autowiring only for specific types of ingress resources, use a comma-separated list of those types (e.g. httproute,ingress,virtualservice). (See Traffic Autowiring for more details; Optional)
- healthcheckPath: The healthcheck path on the scaled application that the interceptor responds to when the application is scaled to 0. (See Scaled Application Healthcheck Configuration for more details; Optional)
- healthcheckResponse: The response mode used by the healthcheck response feature; allowed values are passthrough and static. This value can be set only if healthcheckPath is specified. (See Scaled Application Healthcheck Configuration for more details; Optional, Default: passthrough)
- tlsSecretName: Reference to a Secret containing the TLS certificate and key under cert.tls and key.tls, so that traffic is encrypted end-to-end. Not necessary if cluster-internal traffic is plaintext, e.g. when TLS is terminated at the ingress gateway. (Optional)
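The parameters above are set in the metadata of a kedify-http entry under a ScaledObject's triggers. The fragment below is a sketch, not a reference configuration: the host, path, Service name, and port (www.my-app.com, /, my-app-service, 8080) are hypothetical, and the granularity and window values are chosen only for illustration. Numeric values are quoted because KEDA trigger metadata is a map of strings.

```yaml
triggers:
  - type: kedify-http
    metadata:
      # Hypothetical host and path prefix; they must match how clients reach the app
      hosts: www.my-app.com
      pathPrefixes: /
      # Hypothetical Service fronting the workload from scaleTargetRef, and its port
      service: my-app-service
      port: "8080"
      # Scale on request rate: counted per second, averaged over one minute
      scalingMetric: requestRate
      targetValue: "100"
      granularity: 1s
      window: 1m0s
      # For concurrency-based scaling, use scalingMetric: concurrency instead;
      # granularity and window apply only to requestRate
```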
Example ScaledObject with HTTP trigger
Here is a full example of a scaled object definition using the HTTP trigger:
Note: Ensure that the hosts, pathPrefixes, service, and port parameters are correctly configured to match the application's routing requirements.
Traffic Autowiring
Kedify automatically re-wires ingress resources for the following implementations:
- Ingress
- Gateway API
- Istio
In a typical Kubernetes setup, the networking configuration is structured as follows:
- Ingress: This resource manages external access to services within a Kubernetes cluster, typically via HTTP or HTTPS. It provides load balancing, SSL termination, and name-based virtual hosting.
- Service: This resource defines a logical set of Pods and a policy by which to access them. Services enable the decoupling of work definitions from Pods.
- Deployment: This resource provides declarative updates for Pods and ReplicaSets. It defines the desired state for application deployment, scaling, and updates.
The standard flow of traffic in Kubernetes is:
Ingress -> Service -> Deployment
To enable the automatic scaling of applications based on incoming HTTP traffic, Kedify introduces additional components into this flow:
- kedify-proxy: An Envoy-based proxy that routes traffic and collects metrics for scaling decisions.
- HTTP Add-on Interceptor: This component ensures that the requests are appropriately routed and cached when the app is scaled to zero.
With Kedify, the traffic flow is modified to include these additional components:
Ingress -> kedify-proxy -> Service -> Deployment
Kedify handles the automatic re-wiring of ingress resources for specific implementations (e.g., Ingress, Gateway API, Istio). This ensures that incoming traffic is routed through the kedify-proxy and the interceptor, allowing Kedify to monitor and scale based on real-time HTTP traffic.
For finer-grained control over which types of ingress resources are autowired, set the trafficAutowire trigger property to a comma-separated list of resource types. Only the resource types present in that list are autowired.
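For example, a cluster that routes traffic through both Gateway API HTTPRoutes and plain Ingresses could restrict autowiring to those two types as sketched below. The remaining trigger parameters are omitted for brevity, and which types to list is an assumption that depends on the ingress resources actually in use.

```yaml
triggers:
  - type: kedify-http
    metadata:
      # ...hosts, service, port, and scaling parameters as usual...
      # Autowire only HTTPRoute and Ingress resources; Istio VirtualServices are left untouched
      trafficAutowire: httproute,ingress
```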
Autowiring Fallback
In case of any issues with the control plane (kedify-proxy or interceptor), Kedify has a built-in fallback mechanism. This mechanism automatically rewires the traffic back to the original flow:
Ingress -> Service -> Deployment
This fallback mechanism keeps the application accessible when the kedify-proxy or interceptor has problems, so that a failure in those components does not cause an outage. By default, if Kedify detects problems with the control plane for longer than 5 seconds, it rewires the traffic to bypass the kedify-proxy and interceptor. This duration can be configured with the HTTP_HEALTHCHECK_DEBOUNCER_SECONDS environment variable on the Kedify Agent deployment.
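One way to set the variable is a strategic merge patch on the agent Deployment, applied with something like `kubectl patch deployment kedify-agent -n keda --patch-file debounce-patch.yaml`. This is a sketch under assumptions: the Deployment name kedify-agent, the keda namespace, the container name kedify-agent, the file name, and the 10-second value are all assumed, so check them against your installation. If the agent is managed by Helm or another tool, a manual patch may be overwritten on the next upgrade, and the setting belongs in that tool's configuration instead.

```yaml
# Hypothetical strategic merge patch (debounce-patch.yaml) for the Kedify Agent Deployment
spec:
  template:
    spec:
      containers:
        - name: kedify-agent   # assumed container name; containers are merged by name
          env:
            # Bypass kedify-proxy and interceptor only after 10s of detected problems (default: 5s)
            - name: HTTP_HEALTHCHECK_DEBOUNCER_SECONDS
              value: "10"
```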
If the trafficAutowire property is not set, automatic configuration stays enabled for the ScaledObject; this is the default behavior.
Disabling Traffic Autowiring
To disable traffic autowiring, specify the following trigger setting in your ScaledObject:
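A minimal fragment, with the remaining trigger parameters omitted for brevity. The value is quoted so that YAML keeps it as the string false rather than a boolean, since KEDA trigger metadata values are strings.

```yaml
triggers:
  - type: kedify-http
    metadata:
      # ...hosts, service, port, and scaling parameters as usual...
      # Turn off autowiring entirely; ingress resources must be wired manually
      trafficAutowire: "false"
```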
In this case, you need to wire the networking traffic manually, and the Autowiring Fallback mechanism does not apply.
Metrics Aggregation
The Kedify HTTP Scaler uses kedify-proxy (Envoy) to route traffic and collect metrics for applications, which improves reliability and performance. When the interceptor handles all traffic itself, it may become a bottleneck, and standard reverse proxies such as Envoy, nginx, or HAProxy are better equipped to handle that load. To address this, all or part of the network traffic can be offloaded from the interceptor to an off-the-shelf reverse proxy. Currently, the interceptor natively supports Envoy; other reverse proxy solutions may require additional configuration.
The kedify-proxy is automatically deployed in every namespace in the cluster that contains at least one ScaledObject with a kedify-http trigger, and the traffic is autowired as described in Traffic Autowiring.
Scaled Application Healthcheck Configuration
It's common practice to configure healthchecks for applications so that unhealthy replicas are excluded from load balancing. However, this conflicts with scaling to 0 based on HTTP traffic: the healthchecks themselves generate HTTP traffic, which would scale the application up.
Kedify can instruct the interceptor to respond to healthchecks on behalf of the scaled application instead of proxying them to the application and triggering a scale-out. You define the healthcheck path on the scaled application and, optionally, a response mode.
In static response mode, the interceptor always responds. In passthrough mode (the default if not specified), the interceptor responds only if the application is scaled to 0; otherwise, it proxies the request on the same path to the application.
Requests to the configured healthcheck path are excluded from the metric counters used for scaling.
Example ScaledObject with Healthcheck Configuration
The following configuration instructs the interceptor to respond to requests for www.my-app.com/healthz when the application is scaled to 0:
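A sketch of such a trigger, assuming a hypothetical Service name and port (my-app-service, 8080) and an illustrative requestRate target. With healthcheckResponse at its default, passthrough, the interceptor answers /healthz itself only while the application is scaled to 0 and proxies the check to the application otherwise; setting it to static would make the interceptor always answer.

```yaml
triggers:
  - type: kedify-http
    metadata:
      hosts: www.my-app.com
      pathPrefixes: /
      service: my-app-service   # hypothetical Service name
      port: "8080"              # hypothetical Service port
      scalingMetric: requestRate
      targetValue: "100"
      # Interceptor answers www.my-app.com/healthz while the app is scaled to 0;
      # these requests are not counted toward the scaling metric
      healthcheckPath: /healthz
      healthcheckResponse: passthrough   # the default; "static" would always answer
```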