
Use events from OpenTelemetry to trigger autoscaling with Kedify and KEDA

[Diagram: OpenTelemetry Scaler architecture]

Autoscaler for Kubernetes workloads based on real-time metrics from Prometheus, OpenTelemetry, and other compliant sources.

Overview of OpenTelemetry Scaler in Kedify

The OpenTelemetry (OTEL) Scaler enables precise, data-driven scaling for Kubernetes workloads. Using OpenTelemetry, it can capture a wide range of metrics, allowing the Kedify Scaler to adjust resources dynamically, improving response times while keeping costs down. It uses a push-based approach to metric delivery, which offers significant advantages over traditional pull-based systems such as Prometheus.
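
Because metric delivery is push-based, a typical setup routes application metrics through an OpenTelemetry Collector whose pipeline exports straight to the scaler. Below is a minimal Collector configuration sketch; the scaler's in-cluster address (keda-otel-scaler.keda.svc:4317) is an assumption, so substitute the endpoint from your own installation:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    # Assumed in-cluster address of the OTel scaler's OTLP endpoint
    endpoint: keda-otel-scaler.keda.svc:4317
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]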

Key Features

  • Real-Time Metric Collection: Gathers metrics in real time, enabling timely scaling based on traffic demand.
  • Wide Metric Range: Supports a variety of metrics such as request rates and concurrency for granular scaling configurations.
  • No Prometheus Server Required: Eliminates deploying and operating a Prometheus server, reducing infrastructure overhead and resource usage.
  • Faster Response Times: With a push-based model, metrics are sent directly to KEDA, minimizing delays that are typical in scrape intervals, thus allowing faster scaling responses.
  • Flexible Integration Options: OpenTelemetry’s support for multiple protocols and integrations enables streamlined observability setups across diverse environments, with minimal configuration.


Featured Use Cases

Scenario:

Scale AI/ML training workloads dynamically based on metrics such as tokens per minute and GPU memory usage. Bypassing a Prometheus server keeps scaling real-time and adaptive to intensive computational loads.

OpenTelemetry Scaler Usage:

Scaling is driven by AI/ML-specific metrics, such as tokens per minute and GPU memory usage, allowing rapid adjustment of resources to match model training demands without the overhead of a Prometheus setup.

KEDA Usage:

The ScaledObject is configured with kedify-otel triggers for high-frequency training metrics. For example, the metricQuery might specify "avg(model_training_tokens{model=my_model, job=training})" to adjust replicas based on real-time usage and performance, which is essential for latency-sensitive AI workloads.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ai-training
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-training-service
  minReplicaCount: 1
  maxReplicaCount: 15
  triggers:
    # Throughput trigger: the rate of tokens processed over time,
    # averaged across the training job, with a target of 500 per replica.
    - type: kedify-otel
      metadata:
        metricQuery: 'avg(model_training_tokens{model=my_model, job=training})'
        operationOverTime: 'rate'
        targetValue: '500'
    # Memory-pressure trigger: average GPU memory usage, target 800 per replica.
    - type: kedify-otel
      metadata:
        metricQuery: 'avg(gpu_memory_usage{model=my_model, job=training})'
        targetValue: '800'
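
Beyond the triggers themselves, standard KEDA ScaledObject tuning can be layered onto the same spec. The following is a minimal sketch with illustrative values, not recommendations; note that KEDA applies fallback only to triggers whose metric type is AverageValue:

spec:
  pollingInterval: 15    # how often KEDA consults the scaler, in seconds
  fallback:
    failureThreshold: 3  # consecutive failed scaler checks before fallback applies
    replicas: 2          # replica count held while the scaler is failing

For GPU-backed training in particular, a fallback replica count guards against a scaler outage silently collapsing the workload to its minimum.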