Inside the Book

The Evolution of Kubernetes Autoscaling (2014–2025)

A quick history of HPA, VPA, Cluster Autoscaler/Karpenter, and event-driven scaling loops, and of what changed by 2025.

Why Traditional Scaling Models Break at Today’s Latency & Cost SLOs

How CPU and memory metrics lag behind real demand, and what “intent-aware” scaling fixes.

Event-Driven Architecture, Simply

How event streams and asynchronous workloads reshape autoscaling beyond requests-per-second thinking.

HTTP & gRPC Workloads: What Most Teams Get Wrong

Designing for concurrency, cold starts, and backpressure when scaling real-time APIs.

GPU-Aware Autoscaling for AI & ML

Keeping GPU workloads efficient with pre-warm strategies and VRAM-safe scaling behavior.

Cluster & Node Autoscaling: Provisioning Capacity That Matches Your Workloads

Coordinating KEDA, HPA, and Cluster Autoscaler to balance speed, placement, and efficiency.

Predictive Autoscaling: From Reactive Loops to Forecast-Driven Capacity

Using time-series forecasts and lead times to prepare for demand before it hits.

Build vs. Buy: The True Cost of DIY Autoscaling

Where in-house scaling platforms shine and when managed or enterprise tooling wins on ROI.