Envoy Configuration for Kedify Proxy

At the core of the kedify-http scaler is kedify-proxy, a fleet of Envoy proxies. The fleet is configured through an xDS control plane, which is implemented as part of the http-add-on interceptor component.

Two parts of the Envoy configuration support overriding the defaults. Both are set as values in the kedify-agent helm chart:

cluster - chart, envoy options
route - chart, envoy options

Retry Configuration on Error

By default, kedify-proxy does not retry failed requests; it returns the error code to the client. With route configuration, you can enable retries for specific error codes. For example, to retry on 5xx errors, you can set the following in your kedify-agent values:

agent:
  kedifyProxy:
    globalEnvoyConfigs:
      route:
        retry_policy:
          retry_on: 5xx         # upstream 5xx responses, plus disconnects, resets, and read timeouts
          num_retries: 5        # retry up to 5 times
          retry_back_off:
            base_interval: 1s   # base interval for the jittered exponential backoff
            max_interval: 10s   # upper limit on the interval between retries

This Envoy config snippet means kedify-proxy retries requests that fail with 5xx errors up to 5 times. Envoy applies jitter to its exponential backoff, so each delay is randomized: the backoff grows from the 1-second base interval and is capped at 10 seconds.

Slow Start Configuration

The kedify-proxy Envoy uses the ROUND_ROBIN load balancing strategy by default. This means that all endpoints in the cluster are treated equally, no matter how long they have been up. This can cause problems if some endpoints are slow to start, because they may receive a high load of traffic before they are ready. To mitigate this, you can enable slow start for the cluster by setting slow_start_config in the cluster section of your kedify-agent values:

agent:
  kedifyProxy:
    globalEnvoyConfigs:
      cluster:
        lb_policy: ROUND_ROBIN
        round_robin_lb_config:
          slow_start_config:
            slow_start_window: 60s   # slow start applies for the first 60 seconds, then plain ROUND_ROBIN
            min_weight_percent:
              value: 0.1             # weight floor: a new endpoint starts at no less than 0.1% of its normal weight
            aggression:
              default_value: 0.5     # pace of traffic increase during the slow start window; lower values ramp up more slowly at first
              runtime_key: slow_start_aggression

This Envoy config snippet instructs kedify-proxy to use a slow start window of 60 seconds. During the window, a new endpoint's load balancing weight starts as low as 0.1% of its normal weight and gradually increases, so the endpoint receives only a small share of traffic at first. After the slow start window, the endpoint is treated equally with the other endpoints in the service and receives its fair share of the traffic.
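The ramp-up can be illustrated with a short sketch. It assumes the formula from Envoy's SlowStartConfig documentation, new_weight = weight * max(min_weight_percent, time_factor ^ (1 / aggression)), where time_factor is the elapsed time divided by the slow start window. The function name `slow_start_weight_factor` is hypothetical, and runtime overrides of the aggression value are ignored.

```python
def slow_start_weight_factor(
    elapsed_s: float,
    window_s: float = 60.0,
    min_weight_percent: float = 0.1,
    aggression: float = 0.5,
) -> float:
    """Fraction of its normal weight an endpoint gets during slow start.

    Assumption: the formula from Envoy's SlowStartConfig documentation,
    max(min_weight_percent, time_factor ** (1 / aggression)).
    """
    if elapsed_s >= window_s:
        return 1.0  # window is over: the endpoint carries its full weight
    time_factor = elapsed_s / window_s
    floor = min_weight_percent / 100.0  # the config value is a percentage
    return max(floor, time_factor ** (1.0 / aggression))


# Defaults mirror the snippet above: 60s window, 0.1% floor, aggression 0.5.
for t in (0, 15, 30, 45, 60):
    print(t, round(slow_start_weight_factor(t), 4))
```

With aggression set to 0.5 the exponent is 2, so under this assumption the endpoint carries about a quarter of its normal weight halfway through the window rather than half, which is what "slower in the beginning" means in practice.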

Preconnecting

Envoy supports preconnecting to endpoints in the cluster, which can reduce request latency by anticipating a request and establishing a TCP session before it’s needed. To enable it, set preconnect_policy in the cluster section of your kedify-agent values:

agent:
  kedifyProxy:
    globalEnvoyConfigs:
      cluster:
        preconnect_policy:
          per_upstream_preconnect_ratio: 2  # open one extra upstream connection for each incoming request

Setting per_upstream_preconnect_ratio to 2 means that for each incoming request, kedify-proxy opens one additional connection to the upstream in anticipation of a follow-up request, instead of waiting for that request to arrive and only then establishing the connection.
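As a rough illustration of how the ratio translates into connection counts, the sketch below assumes non-multiplexed (HTTP/1.1) upstream connections and the steady-state reading given in Envoy's documentation, where the anticipated number of streams is the active stream count multiplied by the ratio and rounded up. The function name `http1_connection_target` is hypothetical, and Envoy's real connection pool logic takes more state into account (upstream health, connections still being established, multiplexing).

```python
import math


def http1_connection_target(active_streams: int, ratio: float) -> tuple[int, int]:
    """Return (total connections, preconnected connections) for a given ratio.

    Assumption: one connection per stream (HTTP/1.1) and a target of
    ceil(active_streams * ratio) connections in steady state.
    """
    total = math.ceil(active_streams * ratio)
    return total, total - active_streams


print(http1_connection_target(1, 2.0))    # (2, 1)
print(http1_connection_target(100, 1.5))  # (150, 50)
```

Under these assumptions, a ratio of 1 yields no spare connections at all, and higher ratios trade idle upstream connections for lower latency on the next request.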