by Sharad Regoti
October 30, 2023
Kubernetes has become the de facto standard for deploying microservices, owing to its autoscaling and self-healing capabilities. By default, Kubernetes offers two components for autoscaling:

- Horizontal Pod Autoscaler (HPA): adds or removes pod replicas based on observed metrics such as CPU or memory utilization.
- Vertical Pod Autoscaler (VPA): adjusts the CPU and memory requests and limits of a workload's containers to right-size it.
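To ground the discussion, here is a minimal HPA manifest (the names are placeholders) that keeps an application between 2 and 10 replicas based on average CPU utilization:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: image-api-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: image-api              # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU crosses 70%
```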
Using HPA or VPA is a great starting point and works well for applications under uniform load. However, in today’s cloud-native ecosystem, scaling solely on CPU and memory utilization is inadequate against dynamic traffic patterns and fluctuating workloads. This is where the native autoscaling features of Kubernetes fall short.

Accommodating such unpredictable behavior requires systems that can adjust in real time. This is where KEDA comes in. In this blog post, we will explore how KEDA overcomes the limitations of HPA and provides a user-friendly interface for diverse autoscaling needs.
Consider an image processing application where users upload images via a mobile app. The application must meet a Service Level Agreement (SLA) of processing each image within 30 seconds, yet processing time is directly influenced by the size of the image. To optimize for this, the upload API segregates incoming images by size and routes them to dedicated Kafka queues, one per size category. Each queue is serviced by specialized processing jobs whose algorithms are tuned to handle that category within the 30-second SLA.

In this scenario, relying solely on CPU and RAM metrics for scaling would be insufficient and potentially detrimental to our SLA. CPU and RAM do not reflect image processing times, which depend heavily on image size and algorithm complexity. Kafka queue size, by contrast, directly correlates with the volume of pending work in the pipeline; scaling on it keeps the application responsive and able to adhere to the 30-second SLA.
But by default, HPA can only make scaling decisions based on CPU and RAM, because those are the only metrics the Metrics Server exposes. HPA does support custom metrics, provided some component serves them through the Kubernetes metrics APIs; the default Metrics Server implementation, however, collects nothing beyond CPU and RAM from the cluster.
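In HPA terms, our use case calls for an External metric. The manifest below sketches what that would look like; the metric name is hypothetical and only resolves if some adapter actually serves it:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: image-processor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: image-processor          # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: kafka_consumergroup_lag   # hypothetical; must be served via external.metrics.k8s.io
        target:
          type: AverageValue
          averageValue: "100"             # target ~100 pending messages per replica
```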
To solve our use case, in which we want HPA to scale based on Kafka queue size, we need to expose that queue size as a custom metric that HPA can consume. This can be achieved by creating a custom metrics server that implements the Kubernetes metrics APIs. However, implementing one uncovers a series of complexities and challenges.
Creating custom metrics for HPA is no walk in the park. It demands knowledge of Kubernetes internals, requiring developers to delve deep into API interfaces and intricate code. Below is a brief outline of the steps involved:

- Implement the Custom Metrics API (custom.metrics.k8s.io) or External Metrics API (external.metrics.k8s.io), typically by writing an adapter on top of the Kubernetes API server boilerplate.
- Build the collection logic that pulls the desired metric, such as Kafka queue depth, from the external system.
- Register the adapter with the Kubernetes API aggregation layer via an APIService resource, including TLS certificates and RBAC rules.
- Operate this extra component in production: high availability, upgrades, and monitoring of the metrics pipeline itself.
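The registration step alone involves non-trivial Kubernetes plumbing. As an illustration, registering a hypothetical adapter with the aggregation layer looks roughly like this (service name and namespace are placeholders):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    name: custom-metrics-adapter   # hypothetical Service fronting the adapter
    namespace: monitoring          # placeholder namespace
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true      # acceptable in a sketch; use caBundle in production
  groupPriorityMinimum: 100
  versionPriority: 100
```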
Given these complexities, the path toward flexible autoscaling in Kubernetes takes a significant turn. Instead of navigating the labyrinth of custom metrics servers and their associated challenges, a more elegant solution is to use KEDA.
In the following sections, we will explore how KEDA overcomes these limitations and provides a seamless, user-friendly approach to autoscaling that aligns with the dynamic demands of modern cloud-native applications.
KEDA, which stands for Kubernetes Event-Driven Autoscaler, enables the scaling of pods based on external event sources. This allows Kubernetes applications to scale dynamically in response to events such as:

- a message arriving on a Kafka topic
- the length of an AWS SQS queue
- the result of a Prometheus query
- a cron schedule
The above examples are just the tip of the iceberg: KEDA supports more than 60 predefined event sources (scalers). To consume these event sources, KEDA provides a user-friendly interface built on Kubernetes CRDs. Unlike the complexities of custom metrics servers, KEDA streamlines this process through a well-thought-out architecture.
At its core, KEDA comprises three key components:

- Agent (keda-operator): watches for scaling triggers and activates or deactivates workloads, which is what enables scaling to and from zero replicas.
- Metrics server (keda-operator-metrics-apiserver): exposes event data, such as queue length or stream lag, as external metrics to the HPA, so KEDA drives the HPA rather than replacing it.
- Admission webhooks: validate resource changes to prevent misconfiguration, such as multiple ScaledObjects targeting the same workload.
KEDA introduces the concept of ScaledObjects and ScaledJobs in Kubernetes, which are custom resources that allow users to define how their workloads should scale based on specific events. Apart from defining scaling conditions, these CRDs offer fine-grained control over autoscaling behaviors. Here is an example of scaling an Nginx deployment based on AWS SQS queue size.
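A minimal sketch of such a ScaledObject (queue URL, region, and credential reference are placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-scaledobject
spec:
  scaleTargetRef:
    name: nginx                    # the Deployment to scale
  minReplicaCount: 0               # KEDA can scale all the way down to zero
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue  # placeholder queue
        queueLength: "5"           # target messages per replica
        awsRegion: us-east-1
      authenticationRef:
        name: aws-credentials      # hypothetical TriggerAuthentication resource
```

Behind the scenes, KEDA creates and manages an HPA for this deployment and feeds it the queue length as an external metric.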
KEDA doesn’t stop at just overcoming the limitations of HPA; it goes above and beyond, offering functionalities that redefine autoscaling.
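One example is ScaledJob, which spawns Kubernetes Jobs for pending work instead of scaling a long-running Deployment, a natural fit for our image processor. A sketch, with all names, endpoints, and the container image as placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor-job
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: example/image-processor:latest   # placeholder image
        restartPolicy: Never
  maxReplicaCount: 10               # at most 10 Jobs in flight
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092      # placeholder broker address
        consumerGroup: image-processors
        topic: large-images               # hypothetical per-size-category topic
        lagThreshold: "10"                # one Job per ~10 messages of lag
```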
The range of features KEDA offers is vast and warrants an in-depth exploration; for a comprehensive list, refer to the official documentation.
In the ever-evolving landscape of cloud-native applications, adapting to dynamic workloads is a necessity. While Kubernetes provides native tools like HPA and VPA, their limitations become apparent in scenarios where applications demand responsiveness beyond CPU and RAM metrics.
KEDA not only overcomes these shortcomings but also offers a seamless and user-friendly interface for diverse autoscaling needs. Its ability to scale down to zero, trigger Kubernetes Jobs, emit real-time events for diagnostics, and maintain secure connections through authentication providers sets KEDA apart as a comprehensive solution. Moreover, setting up KEDA is far simpler than building a custom metrics server, removing the typical hurdles users face with Kubernetes custom metrics.