Kedify OTEL Scaler

The Kedify OTEL Scaler lets KEDA scale workloads based on metrics ingested through the OpenTelemetry (OTEL) Collector. KEDA consumes these metrics in real time, so applications can react promptly to load changes without requiring a Prometheus server.

Details

The kedify-otel scaler can be configured in ScaledObject and ScaledJob resources to scale on specific OTEL metrics. It supports PromQL-like metric queries, which the scaler evaluates against a short-term, in-memory time-series database (TSDB). Because the TSDB retains a short window of recent samples, the scaler can base decisions on trends, such as a rate of change, rather than on a single data point.

The scaler is based on the Kedify OTEL Add-on.
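To make the idea of a short-term, in-memory TSDB concrete, here is a minimal Python sketch of such a store: it keeps samples per series (metric name plus label set), evicts anything older than a retention window, and selects series by label filters. The class `ShortTermTSDB`, its methods, and the fixed retention window are hypothetical illustrations, not the add-on's actual implementation or API. The sketch also assumes samples arrive in timestamp order within each series.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus its full label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


class ShortTermTSDB:
    """Hypothetical sketch of a short-term, in-memory time-series store.

    Samples older than `retention_s` (relative to the newest sample seen)
    are evicted, so memory stays bounded and queries only see recent data.
    """

    def __init__(self, retention_s: float) -> None:
        self.retention_s = retention_s
        self._series: Dict[SeriesKey, Deque[Tuple[float, float]]] = defaultdict(deque)
        self._now = float("-inf")

    def append(self, name: str, labels: Dict[str, str], ts: float, value: float) -> None:
        key = (name, frozenset(labels.items()))
        self._series[key].append((ts, value))
        self._now = max(self._now, ts)
        self._evict()

    def _evict(self) -> None:
        cutoff = self._now - self.retention_s
        for samples in self._series.values():
            # Assumes in-order timestamps, so old samples sit at the left end.
            while samples and samples[0][0] < cutoff:
                samples.popleft()

    def select(self, name: str, matchers: Dict[str, str]) -> List[List[Tuple[float, float]]]:
        """Return in-window samples of every series whose labels contain all `matchers`."""
        result = []
        for (series_name, labels), samples in self._series.items():
            if series_name == name and samples and all(
                (k, v) in labels for k, v in matchers.items()
            ):
                result.append(list(samples))
        return result
```

A linear scan over all series on every append is the simplest correct choice for a sketch; a production store would index series by label and evict lazily.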

Architecture

Figure: OTEL Scaler architecture diagram.

Trigger Specification

This specification describes the kedify-otel trigger, which scales workloads based on metrics collected by OTEL.

Here is an example trigger configuration using the Kedify OTEL scaler:

```yaml
triggers:
  - type: kedify-otel
    metadata:
      metricQuery: 'avg(http_server_request_count{app_id=nodeapp, method=GET, path=/v1.0/state/statestore})'
      targetValue: '5'
      clampMin: '0'
      clampMax: '10'
      operationOverTime: 'rate'
      scalerAddress: 'keda-otel-scaler.${kedaNs}.svc:4318' # optional: overrides the auto-injected add-on address
```

Parameter list:

- metricQuery: Specifies the exact metric and its filters. See Metric Query Syntax for details.
- targetValue: The target value for the selected metric (e.g., 5). Replicas are added or removed to keep the metric near this value.
- clampMin (optional): Lower bound applied to the scaler's result, i.e. the metric value reported to KEDA. Because the replica count is derived from this value, it indirectly sets a lower limit on replicas.
- clampMax (optional): Upper bound applied to the scaler's result. Because the replica count is derived from this value, it indirectly sets an upper limit on replicas.
- operationOverTime (optional): Defines the time-series operation applied over the time window. See Operation Over Time for details.
- scalerAddress (optional): The gRPC endpoint where the OTEL scaler is running. If not set, Kedify injects the correct value automatically (e.g., keda-otel-scaler.${kedaNs}.svc:4318).
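The interaction between targetValue, clampMin, and clampMax can be sketched as follows. This minimal Python sketch assumes KEDA's default AverageValue metric type, for which the HPA computes roughly ceil(metric / targetValue) replicas. It deliberately ignores the HPA tolerance, minReplicaCount/maxReplicaCount, and stabilization windows. The function names `clamp` and `desired_replicas` are hypothetical.

```python
import math
from typing import Optional


def clamp(value: float, lo: Optional[float], hi: Optional[float]) -> float:
    """Bound the scaler's result by clampMin/clampMax when they are set."""
    if lo is not None:
        value = max(value, lo)
    if hi is not None:
        value = min(value, hi)
    return value


def desired_replicas(metric_value: float, target_value: float,
                     clamp_min: Optional[float] = None,
                     clamp_max: Optional[float] = None) -> int:
    """Approximate replica count for an AverageValue target (hypothetical helper).

    Ignores the HPA tolerance, replica-count limits and stabilization
    windows, all of which also affect the final replica count.
    """
    bounded = clamp(metric_value, clamp_min, clamp_max)
    return math.ceil(bounded / target_value)
```

With the example trigger (targetValue 5, clampMax 10), a metric value of 42 is clamped to 10, so this trigger alone never asks for more than ceil(10 / 5) = 2 replicas. This is how a clamp on the metric value ends up limiting the replica count.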

Metric Query Syntax

The metricQuery parameter specifies the exact metric to monitor and resembles a simplified PromQL query. It selects a single metric and filters it by labels. Optionally, the metric can be wrapped in an aggregation function to perform basic calculations across the matching series.

Basic Syntax: op(metricName{label1=val1, label2=val2})
- op is an optional aggregation function that can be one of sum, avg, min, or max.
- metricName refers to the specific metric being tracked.
- Labels can be included in the format {label1=val1, label2=val2} to filter the metric by specific dimensions.
- Note: Label values may optionally be quoted (e.g., "val1").
Supported Aggregation Functions:
- {sum, avg, min, max}
- If an aggregation function is not specified, sum is used by default.
Examples:
- avg(http_requests_total{code=200,handler=targets,instance=example:8080,method=GET})
- up{instance="prod:8080"}
- foobar (single metric without filters or aggregation function)
Limitations:
- Only simple = operators are supported in label selectors. Advanced operators, such as != or =~, are not supported.
- Multiple metric names cannot be combined in a single query.
- No arithmetic operations are supported directly within the query.
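The grammar and limitations above are small enough to parse by hand. The following Python sketch is a hypothetical parser written from the rules in this section, not the scaler's actual implementation. It defaults to sum, accepts optional quotes around label values, and rejects !=, =~, !~, extra metric names, and arithmetic. As a simplification, it does not handle commas inside quoted label values.

```python
import re
from typing import Dict, NamedTuple

AGGREGATIONS = {"sum", "avg", "min", "max"}
_NAME = r"[a-zA-Z_:][a-zA-Z0-9_:]*"


class MetricQuery(NamedTuple):
    op: str
    name: str
    labels: Dict[str, str]


def parse_metric_query(query: str) -> MetricQuery:
    q = query.strip()
    op = "sum"  # default aggregation when none is given
    # Optional aggregation wrapper: op(...)
    m = re.fullmatch(r"(\w+)\((.*)\)", q, re.DOTALL)
    if m:
        if m.group(1) not in AGGREGATIONS:
            raise ValueError(f"unsupported aggregation: {m.group(1)}")
        op, q = m.group(1), m.group(2).strip()
    # Exactly one metric name, optionally followed by {label=value, ...}
    m = re.fullmatch(rf"({_NAME})\s*(?:\{{(.*)\}})?", q, re.DOTALL)
    if not m:
        raise ValueError(f"invalid metric selector: {q!r}")
    name, body = m.group(1), m.group(2)
    labels: Dict[str, str] = {}
    if body and body.strip():
        for part in body.split(","):
            key, sep, value = part.partition("=")
            key, value = key.strip(), value.strip()
            # Reject !=, =~ and !~ : only plain equality matchers are allowed.
            if not sep or not key or key.endswith(("!", "~")) or value.startswith("~"):
                raise ValueError(f"only label=value matchers are supported: {part.strip()!r}")
            labels[key] = value.strip('"')  # quotes around values are optional
    return MetricQuery(op, name, labels)
```

For example, parse_metric_query('up{instance="prod:8080"}') yields op "sum", name "up", and labels {"instance": "prod:8080"}.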

Note: Although the query itself cannot contain arithmetic, the OTEL Collector can apply simple arithmetic to metrics using a processor, so metric data can be customized before it is passed to the scaler. For more details, refer to the OTEL Add-on README.

Operation Over Time

The operationOverTime parameter specifies how time-series data for a selected metric should be processed over a period of time. This enables the scaler to apply transformations such as calculating the rate of change or finding the average, minimum, or maximum values.

Available Options:

- last_one: Returns the most recent metric value.
- min: Returns the minimum value within the time window.
- max: Returns the maximum value within the time window.
- avg: Returns the average of the values within the time window.
- rate: Calculates the per-second rate of change over the time window; useful for counters whose value only increases, such as request totals.
- count: Returns the number of metric samples within the time window.

Example Behaviors:

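Assuming metric values sampled at times t1 through t7:

| `operationOverTime` | t1 | t2 | t3 | t4 | t5 | t6 | t7 | Result |
|---|---|---|---|---|---|---|---|---|
| last_one | 3 | 2 | 1 | 6 | 3 | 2 | 3 | 3 |
| min | 3 | 2 | 1 | 6 | 3 | 2 | 3 | 1 |
| max | 3 | 2 | 1 | 6 | 3 | 2 | 3 | 6 |
| avg | 3 | 2 | 1 | 6 | 3 | 2 | 3 | `round(20/7) = 3` |
| rate | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 1 (assuming one sample per second) |
| count | 3 | 2 | 1 | 6 | 3 | 2 | 3 | 7 |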
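The operations in the table can be reproduced with a few lines of Python. This is a minimal sketch, not the scaler's actual code. The names `over_time` and `OPERATIONS` are hypothetical, and the rate shown here is simply (last - first) / elapsed seconds. Unlike PromQL's rate(), it does not handle counter resets or extrapolate to the window edges.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def _rate(samples: List[Sample]) -> float:
    """Per-second increase between the first and last sample in the window."""
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0) if t1 > t0 else 0.0


OPERATIONS: Dict[str, Callable[[List[Sample]], float]] = {
    "last_one": lambda s: s[-1][1],
    "min": lambda s: min(v for _, v in s),
    "max": lambda s: max(v for _, v in s),
    "avg": lambda s: sum(v for _, v in s) / len(s),
    "rate": _rate,
    "count": lambda s: float(len(s)),
}


def over_time(operation: str, samples: List[Sample]) -> float:
    """Apply an operationOverTime to the samples of one time window."""
    if not samples:
        raise ValueError("no samples in the time window")
    return OPERATIONS[operation](samples)
```

Feeding it the table's series, e.g. list(enumerate([3, 2, 1, 6, 3, 2, 3], start=1)), returns 3, 1, 6, 20/7 (about 2.86, which rounds to 3), and 7 for last_one, min, max, avg, and count; the 1..7 counter sampled once per second gives a rate of 1.0.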

Example ScaledObject with Kedify OTEL Trigger

Here is a full example of a ScaledObject definition using the Kedify OTEL trigger, including an advanced horizontalPodAutoscalerConfig that sets both the scale-up and scale-down stabilization windows to 10 seconds:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: otel-example
spec:
  scaleTargetRef:
    name: nodeapp
  triggers:
    - type: kedify-otel
      metadata:
        scalerAddress: 'keda-otel-scaler.default.svc:4318'
        metricQuery: 'avg(http_server_request_count{app_id=nodeapp, method=GET, path=/v1.0/state/statestore})'
        targetValue: '5'
        clampMin: '0'
        clampMax: '10'
        operationOverTime: 'rate'
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 10
        scaleUp:
          stabilizationWindowSeconds: 10
```

The OTEL Scaler evaluates metrics against a simple in-memory TSDB, so scaling decisions can react quickly to changes in load. This suits applications with highly variable traffic and latency-sensitive workloads, which can scale with minimal delay.