

Optimizing Metrics Collection & Serving When Autoscaling LLM Workloads


Speaker: Jiri Kremser & Vincent Hou

Event:     KubeCon Europe 2025


April 03, 2025

Balancing resource provisioning for LLM workloads is critical for maintaining both cost efficiency and service quality. Kubernetes's horizontal autoscaling offers a cloud-native capability to address these challenges, relying on metrics to make autoscaling decisions. However, the efficiency of metrics collection affects how quickly and accurately the autoscaler responds to LLM workload demands. This session explores strategies to enhance metrics collection for autoscaling LLM workloads, covering:

  1. The fundamentals of how horizontal autoscaling works in Kubernetes
  2. The unique challenges of autoscaling LLM workloads
  3. A comparison of existing Kubernetes autoscaling solutions for custom metrics, with their pros and cons
  4. How optimizing metrics collection through push-based approaches can improve scaling responsiveness, demonstrated with an integrated solution combining KServe, the OpenTelemetry Collector, and KEDA to optimize LLM workload autoscaling
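To make the KEDA side of point 4 concrete, the shape of such a setup can be sketched as a ScaledObject. This is a minimal illustrative sketch, not the configuration shown in the session: the resource names, the Prometheus address, and the queue-depth metric query are all assumptions, and it uses KEDA's standard pull-based Prometheus trigger for simplicity rather than a push-based pipeline.

```yaml
# Hypothetical KEDA ScaledObject scaling an LLM inference Deployment on a
# queued-requests metric. All names, addresses, and the query below are
# illustrative placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-scaler
spec:
  scaleTargetRef:
    name: llm-inference            # Deployment backing the model server (assumed name)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus             # pull-based example; the session contrasts push-based collection
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(llm_requests_waiting)   # hypothetical queue-depth metric
        threshold: "5"
```

In a push-based variant, the metrics would instead be forwarded by the OpenTelemetry Collector to a scaler endpoint, shortening the path between the model server emitting a metric and KEDA acting on it.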
