
KEDA and LLM workload autoscaling

Speakers: Jirka Kremser & Josef Karasek
Event: KCD Prague 2026
May 22, 2026

Neither resource overprovisioning nor underprovisioning is ideal in terms of cost or service quality. Horizontal autoscaling is the cloud native capability for tackling these challenges in Kubernetes. Horizontal autoscaling decisions are based on metrics, which can be collected from many sources, and how effectively and efficiently those metrics are collected determines how quickly autoscaling decisions can be made for the target workload. We’ll explore Kubernetes Event-Driven Autoscaling (KEDA) strategies tailored specifically for dynamic real-time traffic, custom metrics for AI workloads, and considerations for managing complex services. We’ll also address the trade-offs involved, point out pitfalls to avoid, and illustrate best practices through real-world examples.
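As a concrete illustration of metrics-driven scaling with KEDA, here is a minimal ScaledObject sketch that scales a workload on a custom request-rate metric. The deployment name, Prometheus address, query, and threshold are illustrative assumptions, not details from the talk:

```yaml
# Sketch: scale a hypothetical "llm-inference" Deployment based on
# request rate pulled from Prometheus. All names and values below
# are placeholders for illustration.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-scaler
spec:
  scaleTargetRef:
    name: llm-inference        # Deployment to scale (assumed name)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{app="llm-inference"}[2m]))
        threshold: "10"        # target requests/sec per replica
```

KEDA evaluates the query on its polling interval and drives the Horizontal Pod Autoscaler toward the configured threshold, including scaling to the minimum when traffic drops.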

This session provides insights and a demo for optimizing the metrics-collection approach behind autoscaling decisions. It walks through how horizontal autoscaling works in Kubernetes, what is special about autoscaling LLM workloads, the current autoscaling solutions in the Kubernetes ecosystem with their pros and cons, and what can be optimized with a push mode for metrics collection and serving, integrating the OTel Collector with KEDA for LLM workload autoscaling.
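For the push-mode idea mentioned above, KEDA's external-push trigger is one way to sketch it: instead of KEDA polling a metrics backend on an interval, a gRPC scaler service streams scaling signals to KEDA as they arrive. How the talk wires the OTel Collector into such a service is not specified here; the service name and port below are assumptions:

```yaml
# Sketch: push-based scaling via KEDA's external-push trigger.
# A gRPC scaler service (e.g. one fed metrics by an OTel Collector
# pipeline) pushes active/scale signals to KEDA, avoiding polling
# latency. Service address is an illustrative placeholder.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-push-scaler
spec:
  scaleTargetRef:
    name: llm-inference        # Deployment to scale (assumed name)
  triggers:
    - type: external-push
      metadata:
        scalerAddress: otel-metrics-scaler.monitoring.svc:8080
```

The trade-off discussed in the abstract maps directly onto this choice: pull-based triggers are simpler to operate, while push-based triggers can shorten the path from a metric changing to a scaling decision.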

Join us for an insightful look at building robust autoscaling strategies optimized for real-time responsiveness, AI efficiency, and cost control in Kubernetes.

