
FREE E-BOOK

From Intent to Impact:
The 2025 Kubernetes Autoscaling Playbook

Scale on the signals that matter, from HTTP intent and push-based telemetry to GPU economics, without DIY glue.

Written by Zbynek Roubalik, Founder & CTO of Kedify and maintainer of the KEDA project.

Get the ebook

What you’ll learn

Scale on the right signals. Use HTTP RPS/concurrency, backlog age, and tail latency, not just CPU (see the sketch after this list).

Push beats scrape: wire OpenTelemetry into autoscaling to cut the “lag chain.”

Master every type of autoscaling on Kubernetes to safely scale out workloads and schedule jobs.

A preview of predictive autoscaling and why you should use it.

GPU-aware scaling: blend in-flight intent with VRAM/SM headroom and hide cold starts.
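To make the first point concrete, here is a minimal sketch of a KEDA ScaledObject that scales a deployment on HTTP request rate instead of CPU. The Deployment name, Prometheus address, query, and threshold are illustrative placeholders, not values from the book, and the scrape-based prometheus trigger stands in for the push-based and HTTP-intent signal sources the playbook covers; it assumes your service already exports an http_requests_total counter.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-api-rps                # hypothetical example name
spec:
  scaleTargetRef:
    name: my-api                  # hypothetical Deployment to scale
  minReplicaCount: 2              # keep a warm floor for latency SLOs
  maxReplicaCount: 30
  triggers:
    - type: prometheus            # stand-in signal source for this sketch
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{app="my-api"}[1m]))
        threshold: "100"          # target requests per second per replica
```

With a trigger like this, KEDA drives the underlying HPA from request rate rather than CPU; replacing the scrape-based trigger with a push-based OpenTelemetry or HTTP-intent source is the “cut the lag chain” idea described in the list above.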

Who’s it for?

DevOps, Platform, and SRE leads operating multi-cluster Kubernetes on AWS, GCP, or Azure, running event-driven or spiky workloads under cost pressure and latency SLOs.

Inside the book

The Evolution of Kubernetes Autoscaling (2014–2025)

A quick history of HPA, VPA, Cluster Autoscaler/Karpenter, and event-driven loops, and what changed by 2025.

Why Traditional Scaling Models Break at Today’s Latency & Cost SLOs

How CPU and memory metrics lag real demand and what “intent-aware” scaling fixes.

Event-Driven Architecture, Simply

How event streams and asynchronous workloads reshape autoscaling beyond request-per-second thinking.

HTTP & gRPC Workloads: What Most Get Wrong

Designing for concurrency, cold starts, and backpressure when scaling real-time APIs.

GPU-Aware Autoscaling for AI & ML

Keeping GPU workloads efficient with pre-warm strategies and VRAM-safe scaling behavior.

Cluster & Node Autoscaling: Provisioning Capacity That Matches Your Workloads

Coordinating KEDA, HPA, and Cluster Autoscaler to balance speed, placement, and efficiency.

Predictive Autoscaling: From Reactive Loops to Forecast-Driven Capacity

Using time-series forecasts and lead times to prepare for demand before it hits.

Build vs. Buy: The True Cost of DIY Autoscaling

Where in-house scaling platforms shine and when managed or enterprise tooling wins on ROI.

Zbynek Roubalik

About Zbynek

Founder and CTO, Kedify

Zbynek Roubalik is the co-creator of KEDA (Kubernetes Event-Driven Autoscaler) and founding maintainer of one of the most widely adopted autoscaling projects in the Kubernetes ecosystem. While at Red Hat, Zbynek helped design and scale KEDA to power thousands of production workloads worldwide, laying the foundation for modern, event-driven autoscaling.

Recognizing that enterprise teams needed more visibility, multi-cluster management, and production-grade reliability, Zbynek partnered with Open Core Ventures (founded by GitLab’s Sid Sijbrandij) to create Kedify, a commercial platform that extends KEDA’s power into enterprise environments.

Why now?

Autoscaling is being rewritten for the AI era. CPU-based heuristics can't keep up with GPU economics, sub-second SLOs, or multi-cluster workloads.

From Intent to Impact shows how to scale on real signals like HTTP intent, push-based OpenTelemetry metrics, and GPU economics, so your systems stay fast, efficient, and cost-smart in 2025.

Who Already Uses the Technology

KEDA powers autoscaling for companies you know, including Microsoft, FedEx, Grab, Qonto, Alibaba Cloud, Red Hat, and many more. Kedify delivers these capabilities turnkey to enterprises that don’t want to build and maintain them themselves.

[Logo strip: Grab, Zapier, Reddit, KPMG, Cisco, Microsoft, FedEx, Xbox]

See Intent-Aware Autoscaling in Production

Whether you’re cutting GPU costs, preparing for your next big launch, or modernizing serverless workloads, Kedify has you covered. Book a live demo or explore the docs to see Kedify in action.