by Zbynek Roubalik
December 15, 2025
Prometheus is everywhere. Teams already depend on it for dashboards, alerts, SLOs, and cost reviews. So when they reach for KEDA to autoscale their applications, the temptation is obvious:
“We already have metrics in Prometheus, let’s just use those.”
It works on small clusters. It works for workloads that scale slowly. And then it doesn’t.
Past a certain point, Prometheus-based autoscaling introduces systemic lag, control-plane pressure, and network load that directly contradict the goals of real-time scaling. As traffic patterns become bursty and services become more latency-sensitive, the cracks show quickly.
This post explains why Prometheus, Datadog, Dynatrace, and other pull-based metric stores become unreliable for time-critical autoscaling decisions and why real-time push-based OpenTelemetry signals fundamentally fix the control loop.
If you rely on KEDA today, or if you’ve hit scaling delays you can’t quite explain, read on.
Prometheus was designed for observability, not real-time control. That distinction becomes painful once you use it to drive autoscaling.
Scrape → store → query → scale
This loop sounds reasonable, but each step inserts latency:
- Scrape: Prometheus pulls metrics on an interval, commonly 15–60 seconds.
- Store: samples must land in the TSDB before they are queryable.
- Query: KEDA polls the metric store on its own interval (30 seconds by default).
- Scale: the HPA reconciles roughly every 15 seconds before replicas actually change.
Even in a well-tuned configuration, autoscaling decisions often operate on data that is 30–90 seconds old. For a 15–30 second traffic spike, that means scaling after the spike has already ended. SLOs slip, p95 rises, dashboards look misleading, and the autoscaler appears “slow” despite behaving exactly as designed.
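To see where that range comes from, add up a representative worst case (assumed numbers: a 30-second scrape interval, KEDA's default 30-second pollingInterval, and the HPA's default 15-second sync period): 30 s + 30 s + 15 s ≈ 75 seconds can pass between an event in the workload and the first replica change, before container startup time is even counted.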
Teams try to fix this by tightening Prometheus scrape intervals or KEDA polling intervals. That is when the second set of problems emerges.
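For concreteness, here is a minimal sketch of what that Prometheus-backed setup typically looks like; the serverAddress, query, and threshold values are illustrative assumptions, and pollingInterval is the knob teams usually start shrinking:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments
spec:
  scaleTargetRef:
    name: api
  pollingInterval: 30        # how often KEDA queries Prometheus
  minReplicaCount: 1
  maxReplicaCount: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed in-cluster address
        query: sum(rate(http_requests_total{job="api"}[2m]))   # illustrative query
        threshold: '100'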
Each KEDA ScaledObject performs its own metric queries. With dozens of scaled workloads, these queries stack quickly. Add HPA polling every 15 seconds and you create persistent load on the Prometheus query engine, its storage, and the network path between KEDA and the metrics backend.
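A quick back-of-the-envelope example (assumed numbers): 50 ScaledObjects, each polled every 30 seconds, already send roughly 100 queries per minute to the metrics backend; shrink the polling interval to 5 seconds and that becomes 600 per minute, on top of dashboards, recording rules, and alerting.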
If Prometheus is hosted in a different cluster or in a vendor SaaS (Datadog, Dynatrace), these calls become cross-zone or cross-region traffic. That introduces latency and directly increases costs. Datadog and Dynatrace have rate limits that many teams unknowingly hit the moment they shrink KEDA polling intervals.
The outcome is counterintuitive:
The autoscaler starts failing because the observability system can’t keep up.
Queries time out. KEDA receives stale values. The HPA lags. The control plane becomes noisy. And scaling decisions become both slower and less predictable.
Autoscaling loops work best when the signal path is short:
workload → scaler → decision → replicas
Prometheus adds two intermediaries: a scraper and a query engine. Both add latency, retries, and rate-limit behavior. When teams try to reduce this lag, they overload the metric store.
Autoscaling on stale data is not just inconvenient. It directly harms tail latency during bursts, SLO compliance, and cost efficiency.
The fix is not to tune Prometheus harder. The fix is to remove it from the critical path.
OpenTelemetry is already the emerging standard for distributed telemetry. What it provides for autoscaling is simple but transformative: push-based signals with no scrape delay.
The Kedify OpenTelemetry Scaler takes advantage of this model: applications and the OpenTelemetry Collector push metrics directly to a scaler endpoint running inside the cluster, so KEDA reads fresh values without waiting on a scrape cycle or an external query.
Because metrics flow only inside the cluster (unless you choose to ship OTel data externally), you eliminate the cross-zone cost and rate-limit issues that plague Prometheus-based autoscaling.
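As a minimal sketch of that in-cluster path, an OpenTelemetry Collector pipeline that receives OTLP metrics from your services and forwards them to the scaler might look like this (the scaler endpoint address below is a placeholder assumption; check the Kedify documentation for the actual service name):

receivers:
  otlp:
    protocols:
      grpc: {}

exporters:
  otlp:
    endpoint: keda-otel-scaler.keda.svc:4317   # placeholder in-cluster scaler address
    tls:
      insecure: true                           # plain in-cluster traffic in this sketch

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]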
A typical ScaledObject using the Kedify OTel Scaler looks like this:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments
spec:
  scaleTargetRef:
    name: api
  minReplicaCount: 1
  maxReplicaCount: 60
  triggers:
    - type: kedify-otel
      metadata:
        metricQuery: 'sum(inprogress_tasks{job="app"})'
        targetValue: '40'

Your service emits inflight request count via OTel, the Collector forwards it, and the scaler acts immediately. Prometheus remains optional for visualization, not mandatory for control.
Teams that switch from Prometheus to push-based autoscaling report a consistent pattern: latency becomes a function of workload behavior, not of scrape schedules.
Prometheus, Datadog, Dynatrace, and similar pull-based metric stores were not designed for real-time control loops. As workloads scale and traffic patterns become bursty, the limitations of scrape intervals, query load, and network costs become apparent.
Switching to push-based OpenTelemetry signals for autoscaling eliminates these bottlenecks. By shortening the signal path and reducing load on the metrics backend, teams achieve more reliable, responsive, and cost-effective autoscaling with Kedify.
⸻
Want to see Kedify in action?
Book a call with our team. We’ll show you how Kedify can fix your autoscaling bottlenecks and help you get more from your Kubernetes workloads.