HTTP Scaler
The HTTP Scaler ensures that your service scales based on incoming HTTP requests.
Details
The HTTP scaler is designed specifically for `ScaledObject` resources to enable scaling based on incoming HTTP traffic; the `ScaledJob` resource is not supported at the moment. It supports automatic scaling, including scaling to zero, without requiring Prometheus or other external components. The scaler monitors traffic using an interceptor proxy and routes traffic accordingly, caching incoming requests when necessary. Additionally, the scaler automatically configures ingress objects for the specified workload.
The scaler supports multiple ingress implementations including Gateway API, Amazon ALB, Istio, and OpenShift Routes, allowing flexibility in managing and monitoring traffic.
Using this scaler, users can define specific hosts and path prefixes to be monitored for traffic. The scaler uses metrics such as request rate or concurrency to determine the scaling needs of the application, ensuring optimal performance and resource utilization.
With automatic configuration of ingress objects, the HTTP scaler simplifies the setup process, allowing for seamless integration with existing infrastructure and workloads. This makes it an ideal choice for applications that need to scale based on real-time HTTP traffic.
Trigger Specification
This specification describes the `kedify-http` trigger, which scales workloads based on incoming HTTP traffic.
Here is an example of trigger configuration using the HTTP scaler:
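A minimal sketch of the trigger block; the host, service name, and port below are illustrative placeholders:

```yaml
triggers:
  - type: kedify-http
    metadata:
      hosts: www.my-app.com        # host(s) to monitor
      service: my-app-service      # Service for the scaleTargetRef workload
      port: "8080"                 # port the Service listens on
      scalingMetric: requestRate   # or "concurrency"
      targetValue: "100"           # scale out when traffic meets or exceeds this
```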
Parameter list:

- `hosts`: Comma-separated list of hosts to monitor (e.g., `www.my-app.com,www.foo.bar`).
- `pathPrefixes`: Comma-separated list of path prefixes to monitor (e.g., `/foo,/bar`, Optional).
- `service`: Name of the Kubernetes Service for the workload specified in `ScaledObject.spec.scaleTargetRef`, where traffic should be routed.
- `fallbackService`: Name of the Kubernetes Service used as a fallback along with `service` autowiring.
- `port`: Port on which the Kubernetes Service is listening. Only one of `port` or `portName` can be set.
- `portName`: Reference to the port by its name. Only one of `port` or `portName` can be set.
- `scalingMetric`: Metric used for scaling, either `requestRate` or `concurrency`.
- `targetValue`: Target value for the scaling metric; KEDA scales out when traffic meets or exceeds this value. (Default: `100`)
- `granularity`: Granularity at which the request rate is measured (e.g., `1s` for one second). (Only for `requestRate`, Default: `1s`)
- `window`: Window over which the request rate is averaged (e.g., `1m0s` for one minute). (Only for `requestRate`, Default: `1m`)
- `externalProxyMetricKey`: Metric name used for aggregating external source metrics (e.g., `cluster_name` for Envoy, Optional).
- `trafficAutowire`: Configures traffic autowiring of ingress resources. Setting `false` disables autowiring; to enable only specific ingress classes, use a comma-separated list (e.g., `httproute,ingress,virtualservice,route` or `service`). (See Traffic Autowiring for more details, Optional)
- `healthcheckPath`: Healthcheck path on the scaled application for responses when scaled to zero. (See Scaled Application Healthcheck Configuration for more details, Optional)
- `healthcheckResponse`: Response mode for healthchecks; allowed values are `passthrough` or `static`. Only set if `healthcheckPath` is specified. (Default: `passthrough`, Optional)
- `tlsSecretName`: Reference to a `Secret` containing the TLS certificate and key under `tls.crt` and `tls.key`. Not necessary if cluster-internal traffic is plaintext. (Optional)
Example ScaledObject with HTTP trigger
Here is a full example of a scaled object definition using the HTTP trigger:
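A minimal sketch with illustrative names and values; adjust the hosts, service, and port to your application:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app                     # illustrative name
spec:
  scaleTargetRef:
    name: my-app                   # Deployment to scale (illustrative)
  minReplicaCount: 0               # allow scale to zero
  maxReplicaCount: 10
  triggers:
    - type: kedify-http
      metadata:
        hosts: www.my-app.com
        pathPrefixes: /
        service: my-app-service    # Service routing to the scale target
        port: "8080"
        scalingMetric: requestRate
        targetValue: "100"
```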
Note: Ensure that the `hosts`, `pathPrefixes`, `service`, and `port` parameters match the application’s routing requirements.
Traffic Autowiring
Kedify automatically re-wires ingress resources for the following implementations:
- Kubernetes Ingress (including Amazon ALB)
- Gateway API HTTPRoute
- Istio VirtualService
- OpenShift Route
In a typical Kubernetes setup, the networking configuration is structured as follows:
Ingress -> Service -> Deployment
To enable automatic scaling based on incoming HTTP traffic, Kedify introduces additional components:
- kedify-proxy: An Envoy-based proxy that routes traffic and collects metrics for scaling.
- HTTP Add-on Interceptor: Ensures requests are routed and cached when the app is scaled to zero.
With Kedify, the traffic flow includes these additional components:
Ingress -> kedify-proxy -> Service -> Deployment
For finer control over ingress resource autowiring, use `trafficAutowire` with a comma-separated list of resources to be autowired, including `route` for OpenShift Routes.
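For example, a hedged sketch that limits autowiring to Ingress and OpenShift Route resources (the other trigger values are illustrative placeholders):

```yaml
triggers:
  - type: kedify-http
    metadata:
      hosts: www.my-app.com            # illustrative host
      service: my-app-service          # illustrative Service name
      port: "8080"
      scalingMetric: requestRate
      trafficAutowire: ingress,route   # autowire only Ingress and Route resources
```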
Autowiring Fallback
In case of control plane issues with `kedify-proxy` or the `interceptor`, Kedify rewires traffic back to the original flow:
Ingress -> Service -> Deployment
This fallback avoids outages and keeps the application accessible. By default, traffic is rewired if Kedify detects control plane issues for over 5 seconds. This duration can be configured using `HTTP_HEALTHCHECK_DEBOUNCER_SECONDS` on the Kedify Agent deployment.
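For example, a sketch of the corresponding env entry on the Kedify Agent container (`30` is an illustrative value):

```yaml
env:
  - name: HTTP_HEALTHCHECK_DEBOUNCER_SECONDS
    value: "30"   # rewire traffic after 30s of detected control plane issues
```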
Disabling Traffic Autowiring
To disable traffic autowiring, specify:
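A minimal sketch of the trigger metadata; other parameters are omitted:

```yaml
triggers:
  - type: kedify-http
    metadata:
      # ...other trigger parameters...
      trafficAutowire: "false"   # disable all traffic autowiring
```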
In this case, users must manually wire the networking traffic. Note that Autowiring Fallback does not apply here.
Service Autowiring
For applications that are used only within the cluster and exposed only as a Service, but that should still benefit from HTTP traffic autoscaling with fallback, there is also service-level autowiring. Configuring `service` as the `trafficAutowire` option excludes setting any other `trafficAutowire` options because it effectively replaces all of them. The Kedify Agent will wire the traffic by managing the Kubernetes Endpoints belonging to the Services defined in `service` and `fallbackService` in the trigger metadata.
This type of traffic autowiring imposes two additional requirements on the application and autoscaling manifests (illustrated in the sketch after this list):
- The ScaledObject must define `fallbackService`: the Kedify Agent uses it for injecting Endpoints into the `service` defined in the trigger metadata and into `kedify-proxy` for routing.
- The application Service must NOT have a selector defined: the Kubernetes control plane manages Endpoints for Services with selectors, which would collide with the autowire feature. The `service` defined in the trigger metadata must be without a selector, while the `fallbackService` should carry the original selector you’d define on the `service` if it wasn’t autowired.
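A minimal sketch of the two Services under these requirements; the names, labels, and ports are illustrative placeholders:

```yaml
# Service referenced by `service` in the trigger metadata: no selector,
# so the Kedify Agent can manage its Endpoints.
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  ports:
    - port: 8080
      targetPort: 8080
---
# Service referenced by `fallbackService`: carries the original selector.
apiVersion: v1
kind: Service
metadata:
  name: my-app-fallback
spec:
  selector:
    app: my-app
  ports:
    - port: 8080
      targetPort: 8080
```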
Kedify Proxy
The Kedify HTTP Scaler uses the `kedify-proxy` (Envoy) to route traffic and collect metrics for applications, enhancing reliability and performance. This proxy setup helps prevent potential bottlenecks in the `interceptor`. Currently, Envoy is the only natively supported proxy for the HTTP Scaler; other reverse proxy solutions may require additional configuration.
Deployment Options for Kedify Proxy
There are two main deployment configurations for `kedify-proxy`: Namespace-Level and Cluster-Wide.
- Namespace-Level Deployment (Default): By default, `kedify-proxy` is deployed in each namespace that contains at least one `ScaledObject` using the `kedify-http` trigger. This approach ensures that traffic routing and metric collection are confined within the namespace where the `ScaledObject` is defined, providing isolation and control.
- Cluster-Wide Deployment (Optional): For environments where Istio’s VirtualService is used (currently the only supported configuration), `kedify-proxy` can be deployed cluster-wide. In this setup, `kedify-proxy` is deployed in the KEDA installation namespace and shared among all `ScaledObjects` across namespaces. This configuration allows centralized traffic routing and scaling across all namespaces in the cluster.
  - To enable cluster-wide deployment, set the environment variable `KEDIFY_PROXY_CLUSTER_WIDE` to `true` on the Kedify Agent. This configures the Kedify Agent to deploy a single instance of `kedify-proxy` for the entire cluster, located in the KEDA installation namespace. For more details, refer to the Kedify Agent documentation.
Note: The cluster-wide setup for `kedify-proxy` is only compatible with Istio’s VirtualService. Other types of ingress configurations are not supported in this setup.
Configuring Kedify Proxy Replica Count
The `kedify-proxy` deployment has a default replica count of 1, which can be adjusted to meet specific performance or redundancy requirements within a namespace or across the cluster.
To configure a different default replica count globally, set the environment variable `KEDIFY_PROXY_DEFAULT_REPLICA_COUNT` to a valid integer `N` on the Kedify Agent:
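For example, a sketch of the env entry on the Kedify Agent container (`3` is an illustrative value):

```yaml
env:
  - name: KEDIFY_PROXY_DEFAULT_REPLICA_COUNT
    value: "3"   # each kedify-proxy deployment runs 3 replicas
```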
This setting applies to each `kedify-proxy` deployment, adjusting the number of proxy replicas for improved scalability or availability as needed.
Kedify Proxy Traffic Flow
When using Kedify, the traffic flow in Kubernetes is enhanced to include additional components for real-time monitoring and scaling based on HTTP traffic. Typically, Kubernetes follows this traffic pattern:
Ingress -> Service -> Deployment
With Kedify, the traffic flow includes `kedify-proxy` as an intermediary to monitor and intercept traffic before it reaches the service:
Ingress -> kedify-proxy -> Service -> Deployment
The `kedify-proxy` intercepts, routes, and caches HTTP requests when necessary. This routing allows the scaler to collect traffic metrics and adjust replica counts based on real-time demands. For more details on autowiring and traffic routing configurations, refer to the Traffic Autowiring section.
Kedify Proxy Environment Variables Summary
Here’s a summary of the key environment variables used for configuring `kedify-proxy`:
- `KEDIFY_PROXY_CLUSTER_WIDE`: Enables the cluster-wide deployment of `kedify-proxy` in the KEDA installation namespace, shared across all namespaces in the cluster. Set this variable to `true` on the Kedify Agent for a cluster-wide configuration. This setup is only compatible with Istio’s VirtualService. For more information, see the Kedify Agent documentation.
- `KEDIFY_PROXY_DEFAULT_REPLICA_COUNT`: Sets the default number of replicas for each `kedify-proxy` deployment. Set this variable to a valid integer `N` on the Kedify Agent to adjust the global replica count per performance or availability requirements.
- `KEDIFY_PROXY_LOG_FORMAT`: Defined on the Kedify Agent, this variable specifies the log format for the `kedify-proxy` fleet. It supports two options, `plaintext` and `json`, with `plaintext` being the default.
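A hedged sketch of how these variables might appear together on the Kedify Agent Deployment (the container name and values are assumptions, not defaults):

```yaml
# Fragment of the Kedify Agent Deployment spec
spec:
  template:
    spec:
      containers:
        - name: kedify-agent   # assumed container name
          env:
            - name: KEDIFY_PROXY_CLUSTER_WIDE
              value: "true"    # single shared proxy in the KEDA namespace
            - name: KEDIFY_PROXY_DEFAULT_REPLICA_COUNT
              value: "2"       # illustrative replica count
            - name: KEDIFY_PROXY_LOG_FORMAT
              value: "json"    # or "plaintext" (default)
```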
Scaled Application Healthcheck Configuration
Configuring healthchecks for applications typically excludes unhealthy replicas from load balancing. However, this conflicts with scaling to zero, as healthchecks generate HTTP traffic, triggering scale-up actions.
Kedify’s `interceptor` can respond to healthchecks on behalf of the scaled application instead of proxying the check to the application and causing a scale-out. The healthcheck path and response mode (default `passthrough`, or `static`) can be defined for the scaled application. `passthrough` mode allows the interceptor to respond only if the application is scaled to zero; otherwise, it proxies the request.
Preconfigured healthcheck paths are excluded from metric counting.
Example ScaledObject with Healthcheck Configuration
The following configuration instructs the interceptor to respond to requests for `www.my-app.com/healthz` when scaled to 0:
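A minimal sketch of such a ScaledObject; the workload name, service, and port are illustrative placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app                         # illustrative name
spec:
  scaleTargetRef:
    name: my-app                       # illustrative Deployment name
  minReplicaCount: 0                   # allow scale to zero
  triggers:
    - type: kedify-http
      metadata:
        hosts: www.my-app.com
        service: my-app-service        # illustrative Service name
        port: "8080"
        scalingMetric: requestRate
        healthcheckPath: /healthz          # interceptor answers this path
        healthcheckResponse: passthrough   # respond only when scaled to zero (default)
```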