Use events from OpenTelemetry to trigger autoscaling with Kedify and KEDA
Overview of OpenTelemetry Scaler in Kedify
The OpenTelemetry (OTEL) Scaler enables precise, data-driven scaling for Kubernetes workloads. Using OpenTelemetry, it can capture a wide range of metrics, allowing the Kedify Scaler to adjust resources dynamically, improving response times while keeping costs in check. It uses a push-based approach to metric delivery, which offers significant advantages over traditional pull-based models such as Prometheus.
Key Features
- Real-Time Metric Collection: Gathers metrics in real time, enabling timely scaling based on traffic demand.
- Wide Metric Range: Supports a variety of metrics such as request rates and concurrency for granular scaling configurations.
- No Need for Prometheus Server: This approach eliminates the need to deploy a Prometheus server, reducing infrastructure overhead and resource usage.
- Faster Response Times: With a push-based model, metrics are sent directly to KEDA, avoiding the delays inherent in scrape intervals and allowing faster scaling responses (see the collector sketch after this list).
- Flexible Integration Options: OpenTelemetry’s support for multiple protocols and integrations enables streamlined observability setups across diverse environments, with minimal configuration.
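To illustrate the push-based flow, here is a minimal OpenTelemetry Collector configuration sketch that receives OTLP metrics from instrumented workloads and forwards them to the scaler. The exporter endpoint is an assumption for illustration; use the address of the OTEL scaler service in your own installation.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlp:
    # Illustrative endpoint; point this at the OTLP endpoint of your
    # Kedify OTEL scaler deployment.
    endpoint: keda-otel-scaler.keda.svc:4317
    tls:
      insecure: true
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]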
Learn More
- Documentation: Kedify OTEL scaler documentation.
- GitHub Repository: Explore the source code and examples at Kedify OTEL add-on repository.
- Blog Post: For more details about the scaler, see our blog post.
Featured Use Cases
Scenario:
Scale AI/ML training workloads dynamically based on metrics such as tokens per minute and GPU memory usage. Because no Prometheus server sits in the metrics path, scaling stays real-time and adapts quickly to intensive computational loads.
OpenTelemetry Scaler Usage:
Scaling is driven by AI/ML-specific metrics, such as tokens per minute and GPU memory usage, allowing rapid resource adjustments based on model training demands, without the overhead of a Prometheus setup.
KEDA Usage:
The ScaledObject is configured with kedify-otel triggers for high-frequency training metrics. For example, the metricQuery might specify "avg(model_training_tokens{model=my_model, job=training})" to adjust replicas based on real-time usage and performance, which is essential for latency-sensitive AI workloads.
Get Started
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ai-training
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-training-service
  minReplicaCount: 1
  maxReplicaCount: 15
  triggers:
    # Scale on token throughput: 'rate' applies a rate-over-time
    # operation to the queried metric.
    - type: kedify-otel
      metadata:
        metricQuery: 'avg(model_training_tokens{model=my_model, job=training})'
        operationOverTime: 'rate'
        targetValue: '500'
    # Scale on GPU memory usage reported by the training jobs.
    - type: kedify-otel
      metadata:
        metricQuery: 'avg(gpu_memory_usage{model=my_model, job=training})'
        targetValue: '800'
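Save the manifest and apply it with kubectl apply -f scaledobject.yaml (the file name here is illustrative), then check its status with kubectl get scaledobject ai-training. Once the OTEL scaler starts receiving the model_training_tokens and gpu_memory_usage metrics, KEDA scales the ai-training-service Deployment between 1 and 15 replicas according to the two targets above.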