
How Amigo Scaled Reliable AI Workloads in Healthcare with Kedify


The Challenge

Amigo builds AI agents for the medical field, supporting hospitals, clinics, and online medical services. Trust and reliability are central to their mission: their AI systems must produce predictable outcomes even in high-stakes environments.

Amigo runs Kubernetes with one cluster per region and environment. Inside those clusters, they operate a monolithic backend server, multiple asynchronous workers, and custom AI workloads including GPU-based services such as text-to-speech engines. Each of these workloads needs to scale in a different way, and Amigo needed a consistent approach that could handle all of them without introducing complexity or unpredictable behavior.

Their main goal was to guarantee performance during bursty traffic, reacting quickly when demand spikes and ensuring their platform remains responsive.

The Solution

Amigo chose Kedify to unify autoscaling across their Kubernetes workloads using one consistent framework and language, while still being able to scale each service based on the metric that best matches its behavior.

With Kedify, Amigo can define scaling rules in a predictable and repeatable way across very different workload types. Instead of treating every service the same, they can scale based on real demand signals such as HTTP request rate, WebSocket concurrency, queue depth, and Kubernetes workload state. This made it quick for the team to bring new workloads under autoscaling while keeping scaling behavior predictable and reliable.

Autoscaling patterns used at Amigo

Below are examples of the autoscaling patterns Amigo uses with Kedify. This is a subset of the scaling strategies they apply across their stack.

HTTP scaling for backend services

  • Scales based on request rate using Kedify’s HTTP scaler
  • Supports different weighting for heavier vs lighter requests
  • Uses request path structure to separate heavy and light endpoints
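A rule like this can be sketched as a KEDA ScaledObject using Kedify's HTTP scaler. This is an illustrative sketch, not Amigo's actual configuration: the deployment, host, and path names are hypothetical, and the exact metadata fields may vary by Kedify version.

```yaml
# Hypothetical example: scale a backend Deployment on HTTP request rate,
# routing a heavier endpoint group by path prefix so it scales sooner.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: backend-http
spec:
  scaleTargetRef:
    name: backend                    # hypothetical Deployment name
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: kedify-http
      metadata:
        hosts: api.example.com       # hypothetical host
        pathPrefixes: /v1/reports    # hypothetical heavy-endpoint prefix
        service: backend
        port: "8080"
        scalingMetric: requestRate
        targetValue: "50"            # target requests per replica
```

Separate ScaledObjects (or triggers) per path prefix are one way to give heavy and light endpoints different weights, as described above.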

WebSocket scaling

  • Scales based on concurrent WebSocket connections handled at a given moment
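For long-lived connections, the same HTTP scaler can target concurrency rather than request rate. Again a hedged sketch with hypothetical names; field names may differ in the current Kedify API.

```yaml
# Hypothetical example: scale a WebSocket gateway on open connections.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ws-gateway
spec:
  scaleTargetRef:
    name: ws-gateway               # hypothetical Deployment handling WebSockets
  triggers:
    - type: kedify-http
      metadata:
        hosts: ws.example.com      # hypothetical host
        service: ws-gateway
        port: "8080"
        scalingMetric: concurrency # track open connections, not request rate
        targetValue: "200"         # target concurrent connections per replica
```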

Amazon SQS scaling for asynchronous workers

  • Scales workers based on the number of messages in SQS queues
  • Used as a core coordination mechanism for async processing
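Queue-depth scaling of this kind maps to KEDA's standard `aws-sqs-queue` scaler. The queue URL, worker name, and credentials reference below are hypothetical placeholders.

```yaml
# Hypothetical example: scale async workers on SQS queue depth.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: async-worker
spec:
  scaleTargetRef:
    name: async-worker          # hypothetical worker Deployment
  minReplicaCount: 0            # scale to zero when the queue is empty
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-credentials   # hypothetical TriggerAuthentication
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs  # hypothetical queue
        queueLength: "5"        # target messages per replica
        awsRegion: us-east-1
```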

Kubernetes workload-based scaling for multi-component AI services

  • Used for AI workloads that include an API layer and an engine layer
  • Maintains a fixed ratio between API and engine components to ensure consistent throughput
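Ratio-based scaling between components maps to KEDA's `kubernetes-workload` scaler, which sizes one workload from the pod count of another. A sketch with hypothetical names (a text-to-speech service is assumed purely for illustration):

```yaml
# Hypothetical example: keep the API layer sized relative to the engine layer.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tts-api
spec:
  scaleTargetRef:
    name: tts-api                  # hypothetical API Deployment
  triggers:
    - type: kubernetes-workload
      metadata:
        podSelector: app=tts-engine  # hypothetical label on the engine pods
        value: "2"                   # roughly one API replica per two engine pods
```

Because the trigger reads pod counts rather than traffic, the ratio holds even as the engine layer itself scales on its own signals.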


“We chose Kedify because it lets us autoscale everything, from HTTP to SQS to GPU AI engines, in one consistent framework.”

Yi Hong

Member of Technical Staff, Amigo

Why Kedify

Amigo selected Kedify because they wanted one uniform autoscaling framework they could apply across services that behave very differently, while still scaling each workload using the most relevant metric. Kedify gave them predictable scaling behavior and made it quick to turn new services into autoscaled workloads without having to build custom tooling for every use case.

The team also valued how responsive Kedify was to real-world needs. Amigo required autoscaling based on a value stored inside a Kubernetes Secret, and Kedify implemented that feature quickly after the request. That flexibility helped Amigo move faster while keeping scaling logic aligned with how their platform actually operates.

“Our custom feature request was implemented extremely quickly and made a real difference.”

Yi Hong

Member of Technical Staff, Amigo

The Impact

With Kedify, Amigo improved reliability and performance under bursty demand by ensuring workloads scale quickly and predictably based on the right signals for each service.

This shift reduced operational overhead for the team, improved confidence in scaling behavior, and helped ensure consistent performance across backend APIs, async workers, and GPU-based AI services.

Customer

Amigo

https://www.amigo.ai

Industry

AI platform for the healthcare sector delivering trusted and reliable agents for medical use cases

Size

  • Private, growth-stage
  • Region-based Kubernetes clusters
  • Custom AI workloads with GPU support

Challenges

  • Scaling HTTP, WebSocket, and queue workloads in a consistent way
  • Guaranteeing performance during bursty traffic
  • Reducing complexity of managing multiple scaler types

Overview

Kedify helped Amigo implement a unified, metric-driven autoscaling framework that spans multiple workload types and ensures predictable, scalable performance.

  • Predictable Autoscaling Across Workloads

    Amigo unified HTTP, WebSocket, and queue-based autoscaling under one framework

  • Improved Performance Under Bursty Demand

    Faster reaction to load spikes with less overhead

  • Reduced Complexity


Looking to learn more hands-on?

Let a Kedify team member show you what you have been missing

Get Started

Please reach out for more information, to try a demo, or to learn more:
www.kedify.io