
KEDA and LLM workload autoscaling

Speakers: Jirka Kremser & Josef Karasek
Event: KCD Prague 2026
May 22, 2026

Neither resource overprovisioning nor underprovisioning is ideal in terms of cost or service quality. Horizontal autoscaling is the cloud native capability for tackling these challenges in Kubernetes. Horizontal autoscaling decisions are based on metrics, which can be collected from many sources, and how effectively and efficiently those metrics are collected determines how quickly autoscaling decisions can be made for the target workload. We’ll explore Kubernetes Event-Driven Autoscaling (KEDA) strategies tailored specifically for dynamic real-time traffic, custom metrics for AI workloads, and considerations for managing complex services. We’ll also address the trade-offs involved, point out pitfalls to avoid, and illustrate best practices through real-world examples.
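As a concrete illustration of metrics-driven scaling with KEDA, here is a minimal ScaledObject sketch that scales a workload on a custom request-rate metric. The deployment name, Prometheus address, query, and threshold are illustrative assumptions, not details from the talk:

```yaml
# Sketch: scale a hypothetical "llm-inference" Deployment based on
# request rate pulled from Prometheus. All names and values below
# are placeholders for illustration.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-scaler
spec:
  scaleTargetRef:
    name: llm-inference        # Deployment to scale (assumed name)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{app="llm-inference"}[2m]))
        threshold: "10"        # target requests/sec per replica
```

KEDA evaluates the query on its polling interval and drives the Horizontal Pod Autoscaler toward the configured threshold, including scaling to the minimum when traffic drops.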

This session provides insights and a demo for optimizing the metrics-collection approach behind autoscaling decisions. It walks through how horizontal autoscaling works in Kubernetes, what is special about autoscaling LLM workloads, the current autoscaling solutions in the Kubernetes ecosystem with their pros and cons, and what can be optimized with a push mode for metrics collection and serving, integrating the OTel Collector with KEDA for LLM workload autoscaling.
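For the push-mode idea mentioned above, KEDA's external-push trigger is one way to sketch it: instead of KEDA polling a metrics backend on an interval, a gRPC scaler service streams scaling signals to KEDA as they arrive. How the talk wires the OTel Collector into such a service is not specified here; the service name and port below are assumptions:

```yaml
# Sketch: push-based scaling via KEDA's external-push trigger.
# A gRPC scaler service (e.g. one fed metrics by an OTel Collector
# pipeline) pushes active/scale signals to KEDA, avoiding polling
# latency. Service address is an illustrative placeholder.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-push-scaler
spec:
  scaleTargetRef:
    name: llm-inference        # Deployment to scale (assumed name)
  triggers:
    - type: external-push
      metadata:
        scalerAddress: otel-metrics-scaler.monitoring.svc:8080
```

The trade-off discussed in the abstract maps directly onto this choice: pull-based triggers are simpler to operate, while push-based triggers can shorten the path from a metric changing to a scaling decision.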

Join us for an insightful look at building robust autoscaling strategies optimized for real-time responsiveness, AI efficiency, and cost control in Kubernetes.

