Envoy Configuration for Kedify Proxy
At the core of the kedify-http scaler is kedify-proxy, which forms a fleet of Envoy proxies. The fleet is configured over an xDS control plane, implemented as part of the http-add-on interceptor component.
Two parts of the Envoy configuration support overriding the defaults. Both are set as values in the kedify-agent Helm chart:
- cluster (chart, envoy options)
- route (chart, envoy options)
Retry Configuration on Error
By default, kedify-proxy does not retry failed requests, whatever the error code, and returns the error code to the client. With the route configuration, you can enable retries for specific error codes. For example, to retry on 5xx errors, set the following in your kedify-agent values:
    agent:
      kedifyProxy:
        globalEnvoyConfigs:
          route:
            retry_policy:
              retry_on: 5xx # any internal or external 5xx error
              num_retries: 5 # retry up to 5 times
              retry_back_off:
                base_interval: 1s # first retry will be after 1 second
                max_interval: 10s # maximum interval between retries is 10 seconds with exponential backoff
With this Envoy config snippet, kedify-proxy retries requests that fail with 5xx errors up to 5 times, using an exponential backoff with a 1-second base interval that is capped at 10 seconds.
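To make the backoff settings concrete, here is a minimal sketch of how the retry delays grow. It assumes the fully jittered exponential backoff described in Envoy's retry documentation, where the delay before retry N is drawn at random from the range [0, (2^N - 1) * base_interval), capped at max_interval. Under that assumption, the configured intervals act as upper bounds rather than exact waits. The function names are hypothetical and exist only for illustration.

```python
import random


def backoff_upper_bounds(base_interval_s, max_interval_s, num_retries):
    """Upper bound (seconds) of the delay window before each retry.

    Assumption: retry N waits a random time in [0, (2**N - 1) * base),
    and the window is capped at max_interval.
    """
    return [
        min((2 ** n - 1) * base_interval_s, max_interval_s)
        for n in range(1, num_retries + 1)
    ]


def sample_delays(bounds, rng=random):
    """Draw one jittered delay per retry from its window."""
    return [rng.uniform(0.0, bound) for bound in bounds]


# Values from the retry_policy above: base 1s, max 10s, 5 retries.
bounds = backoff_upper_bounds(1.0, 10.0, 5)
print(bounds)  # [1.0, 3.0, 7.0, 10.0, 10.0]
print(sample_delays(bounds))  # random values, each within its window
```

With these values the cap is reached on the fourth retry, so raising num_retries further adds attempts without lengthening the individual waits.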
Slow Start Configuration
The kedify-proxy Envoy uses the ROUND_ROBIN load balancing strategy by default. This means that all endpoints in the cluster are treated equally, no matter how long they have been up. This can cause problems when endpoints are slow to start, because they may receive a high volume of traffic before they are ready.
To mitigate this, you can enable slow start for the cluster by setting slow_start_config in the cluster section of your kedify-agent values:
    agent:
      kedifyProxy:
        globalEnvoyConfigs:
          cluster:
            lb_policy: ROUND_ROBIN
            round_robin_lb_config:
              slow_start_config:
                slow_start_window: 60s # slow start window will take effect for 60 seconds, after that it's ROUND_ROBIN
                min_weight_percent:
                  value: 0.1 # as little as 0.1% of the traffic can be sent to the new endpoint to warm it up
                aggression:
                  default_value: 0.5 # pace of traffic increase during the slow start window, lower number means slower in the beginning
                  runtime_key: slow_start_aggression
This Envoy config snippet instructs kedify-proxy to use a slow start window of 60 seconds. During the window, a new endpoint can receive as little as 0.1% of the traffic, and its share increases gradually. After the slow start window, the endpoint is treated equally with the other endpoints in the service and receives its fair share of the traffic.
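The ramp-up can be sketched numerically. The example below assumes the scaling formula given in Envoy's slow start documentation, new_weight = weight * max(min_weight_percent, time_factor ^ (1 / aggression)), with time_factor being the fraction of the window that has elapsed. The function name and its defaults are hypothetical and simply mirror the values above. This is an illustration of the curve, not Envoy's implementation.

```python
def slow_start_weight(elapsed_s, window_s=60.0, min_weight_percent=0.1,
                      aggression=0.5, weight=1.0):
    """Effective load balancing weight of an endpoint during slow start.

    Assumption: weight is scaled by
    max(min_weight_percent, time_factor ** (1 / aggression)).
    """
    if elapsed_s >= window_s:
        return weight  # window is over: plain ROUND_ROBIN share
    time_factor = max(elapsed_s, 0.0) / window_s
    floor = min_weight_percent / 100.0  # 0.1% expressed as a fraction
    return weight * max(floor, time_factor ** (1.0 / aggression))


# With aggression 0.5 the exponent is 2, so the curve starts flat
# and steepens toward the end of the window.
for t in (0, 15, 30, 45, 60):
    print(f"{t:>2}s -> {slow_start_weight(t):.4f}")
```

Under this formula, an aggression of 1.0 gives a linear ramp, while values below 1.0 (such as the 0.5 used here) keep the endpoint's share lower for longer at the start of the window.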
Preconnecting
Envoy supports preconnecting to endpoints in the cluster, which can reduce request latency by anticipating a request and establishing a TCP session before it is needed. To enable it, set preconnect_policy in the cluster section of your kedify-agent values:
    agent:
      kedifyProxy:
        globalEnvoyConfigs:
          cluster:
            preconnect_policy:
              per_upstream_preconnect_ratio: 2 # preconnect 1 upstream for each request
With per_upstream_preconnect_ratio set to 2, kedify-proxy preconnects one extra connection to an upstream endpoint in the cluster for each request, instead of waiting for the next request to arrive and only then establishing the connection.
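The effect of the ratio can be illustrated with a simplified model. The sketch below assumes one stream per connection (as with HTTP/1.1) and treats the ratio as a multiplier on in-flight streams, rounded up, which follows the wording of Envoy's preconnect documentation. It is a mental model only and does not reproduce Envoy's connection pool logic. The function names are hypothetical.

```python
import math


def anticipated_connections(in_flight_streams, per_upstream_preconnect_ratio=2.0):
    """Connections the proxy aims to have open or connecting (simplified).

    Assumption: one stream per connection, ratio applied to in-flight
    streams and rounded up.
    """
    return math.ceil(in_flight_streams * per_upstream_preconnect_ratio)


def spare_connections(in_flight_streams, per_upstream_preconnect_ratio=2.0):
    """Connections kept ready beyond those already serving a stream."""
    return (
        anticipated_connections(in_flight_streams, per_upstream_preconnect_ratio)
        - in_flight_streams
    )


print(spare_connections(1, 2.0))  # 1: one extra connection per request
print(spare_connections(4, 1.0))  # 0: ratio 1 means no preconnecting
print(spare_connections(3, 1.5))  # 2: ceil(4.5) = 5 anticipated, 3 in use
```

A higher ratio trades idle upstream connections for lower latency on the next request, so it is worth checking that the upstream can handle the additional open connections.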