by Sharad Regoti & Zbynek Roubalik, Founder & CTO, Kedify
January 17, 2024
When people talk about autoscaling, it is usually about scaling up. But have you ever asked yourself what impact scaling has on different types of applications?
Let’s say you have written an API or a job that takes a significant amount of time to complete its work, and this application is configured with autoscaling in Kubernetes.
Scaling up is effortless; no problems there. But when Kubernetes scales the application in, what happens to the long-running process inside the container? If it gets killed, what about the inefficiency (wasted processing cycles) introduced by enabling autoscaling?
These are some inherent problems of autoscaling that KEDA solves. In this blog post, we will show how KEDA not only enables event-driven autoscaling but also handles the intricacies associated with autoscaling.
Diagram of Video Encoding Application
Imagine a video streaming application like Netflix where each video has to be encoded into multiple formats: 480p, 720p, 1080p, 4K, etc. The encoding process is time-consuming and takes between 1 to 3 hours, depending on the requested quality. This video data, along with quality specifications, is available as a message in RabbitMQ.
A simple workflow for an application that encodes videos from RabbitMQ would be:
1. Read a message (video data plus quality specification) from the queue.
2. Encode the video into the requested format.
3. Acknowledge the message once encoding succeeds.
4. Repeat with the next message.
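A minimal sketch of that loop in shell; `fetch-message`, `encode-video`, and `ack-message` are hypothetical helpers standing in for a real RabbitMQ client:

```bash
#!/bin/sh
# Naive consumer: pull one message at a time and encode it.
while true; do
  msg=$(fetch-message testqueue) || { sleep 5; continue; }  # poll until a message arrives
  encode-video "$msg"   # long-running step: 1 to 3 hours per message
  ack-message "$msg"    # acknowledge only after encoding succeeds
done
```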
However, this naive implementation has scaling limitations, as it processes only one message at a time. Here are some possible improvements:
- Process several messages concurrently within a single pod.
- Run a fixed number of replicas of the consumer.
- Autoscale the replicas with the Horizontal Pod Autoscaler (HPA) based on CPU or memory usage.
A more effective metric for scaling would be the RabbitMQ queue size, i.e., creating pod replicas based on the queue’s length. But out of the box, Kubernetes can scale only based on CPU and memory usage.
This is where KEDA comes in. KEDA addresses this limitation by enabling scaling based on various external events, one of which is RabbitMQ queue size. Now you can simplify your application to process only one message at a time and offload scaling decisions to KEDA.
To get started, KEDA provides a ScaledObject CRD that enables event-driven autoscaling of Kubernetes workloads. Refer to this blog post to learn more about it.
As discussed earlier, there are some inherent problems with autoscaling, regardless of whether you are using KEDA or HPA.
The Problem with Long-running Processes: Suppose KEDA decides to scale out the pods owing to the large number of messages in RabbitMQ, but after a cooldownPeriod it determines that the load is below the configured threshold and decides to scale in (reduce the number of pods). This scale-in action terminates ongoing video processing, leading to inefficiency. Particularly in the case of RabbitMQ, if the application doesn’t send an acknowledgment that the message was processed, a new pod will start processing the same message from scratch, wasting the previous processing time. In the worst-case scenario, the message might be lost entirely if it was acknowledged before processing completed.
Here are some workarounds for the above problem (a sketch of the first one follows the list):
- Handle SIGTERM in the application and set terminationGracePeriodSeconds higher than the longest expected processing time, so that an in-flight message can finish before the pod is killed.
- Tune the autoscaling timings, for example by increasing the cooldownPeriod or the HPA scale-down stabilization window, so that scale-in happens less aggressively.
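A minimal sketch of the graceful-termination workaround, assuming the consumer script traps SIGTERM and finishes the current message before exiting (the image name is hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consumer-program
spec:
  replicas: 1
  selector:
    matchLabels:
      app: consumer-program
  template:
    metadata:
      labels:
        app: consumer-program
    spec:
      # Give an in-flight message up to 3 hours (the worst-case encoding
      # time) to finish after SIGTERM before the pod is force-killed.
      terminationGracePeriodSeconds: 10800
      containers:
        - name: consumer
          image: ghcr.io/example/video-consumer:latest  # hypothetical image
          command: ["/scripts/consumer-script.sh"]
```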
The above solutions are just workarounds: any timing-based autoscaling configuration is guesswork and can still fail in unexpected ways. This ineffectiveness of autoscaling for long-running jobs is precisely where the concept of ScaledJobs becomes critical.
The scaling problems of long-running processes can be solved by just letting them run until completion and not interrupting them. This is exactly what a Kubernetes Job does.
ScaledJob is a KEDA CRD designed to manage the autoscaling of Kubernetes jobs. It is an extension of Kubernetes Jobs functionality. Kubernetes Jobs are designed to run individual or batch processes that eventually complete on their own. However, they lack the capability to autoscale based on external metrics or events.
KEDA enables this functionality by introducing event-driven scaling. This allows ScaledJobs to start additional Kubernetes jobs based on external triggers, such as the length of a queue. The scale-up behavior is the same as ScaledObject’s, but jobs created by ScaledJobs are not terminated until they complete.
In our scenario, we can use ScaledJobs to monitor the RabbitMQ queue and dynamically deploy jobs to process each item. Once a job is completed, it’s terminated. Each job handles a portion of the workload, reducing the risk of wasted efforts due to mid-process termination in scaling-down scenarios.
Diagram with Video Encoding Application using KEDA
The above diagram depicts our revised architecture for the video encoding application, which contains four components: a producer that publishes encoding requests, the RabbitMQ queue, KEDA with a ScaledJob watching the queue, and the consumer jobs that perform the encoding.
We will first demonstrate the inefficiency of autoscaling long-running processes with ScaledObjects, and then use ScaledJobs to see how they solve the problem.
2. Install RabbitMQ Kubernetes Operator
We will use the RabbitMQ operator to provision and manage our cluster.
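The operator’s documented quickstart install, followed by a minimal single-node RabbitmqCluster (the cluster name `rabbitmq` is an assumption reused in the examples below):

```bash
# Install the RabbitMQ Cluster Operator (official quickstart manifest)
kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"

# Provision a single-node RabbitMQ cluster named "rabbitmq"
kubectl apply -f - <<EOF
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq
spec:
  replicas: 1
EOF
```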
Wait for the cluster to be up and running; this usually takes 1 to 2 minutes.
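One way to watch for readiness (the pod label follows the operator’s naming convention; adjust it if your cluster name differs):

```bash
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=rabbitmq --timeout=5m
```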
Execute the below command and open http://localhost:15672/ in a browser to access the RabbitMQ admin UI. Use the below admin credentials to log in.
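Assuming the RabbitmqCluster is named `rabbitmq`, the operator exposes a Service of the same name that we can port-forward to:

```bash
kubectl port-forward svc/rabbitmq 15672:15672
```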
Username: default_user_hmGZFhdewq65P4dIdx7
Password: qc98n4iGD7MYXMBVFcIO2mtB5voDuV_n
This application is configured to read one message at a time from RabbitMQ and start processing it. As this is a demo application, instead of actually performing any encoding operation, it sleeps for a configurable duration (defaulting to 6 minutes). After the time elapses, it starts processing the next message.
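A sketch of the ScaledObject used here, reconstructed from the description below; the polling interval, replica cap, and RabbitMQ connection handling are assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaled-object
spec:
  scaleTargetRef:
    name: consumer-program          # the Deployment to scale
  pollingInterval: 10               # assumed: check the queue every 10s
  minReplicaCount: 0
  maxReplicaCount: 10               # assumed cap on replicas
  triggers:
    - type: rabbitmq
      metadata:
        queueName: testqueue
        mode: QueueLength
        value: "1"                  # one replica per pending message
        hostFromEnv: RABBITMQ_HOST  # AMQP connection string, e.g. amqp://user:pass@rabbitmq:5672
```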
We have configured our scaled object to target the consumer-program deployment and enabled autoscaling based on the RabbitMQ queue size. The queue size value is set to 1. Refer to the docs to learn more about the RabbitMQ Scaler.
Example: If one pod can handle 10 messages, set the queue length target to 10. If the actual number of messages in the queue is 30, the scaler scales to 3 pods.
Execute the below command to create the scaled object.
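Assuming the manifest above is saved as scaled-object.yaml:

```bash
kubectl apply -f scaled-object.yaml
```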
Below is the response to the above curl request. An event with a `kill count` message indicates that the application was terminated while processing a message, whereas an event with a `processed count` message indicates the message was processed successfully.
From the above response, we can conclude that out of 15 messages, only 5 were processed successfully, while the remaining 10 were terminated mid-processing by the scale-down action.
Let’s analyze how ScaledJob solves this problem.
The ScaledJob configuration is defined as follows:
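(A sketch reconstructed from the field descriptions below; the container image and RabbitMQ connection handling are assumptions.)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: consumer-scaled-job
spec:
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: consumer
            image: ghcr.io/example/video-consumer:latest  # hypothetical; same image as the Deployment
            command: ["/scripts/consumer-script.sh"]
  pollingInterval: 10                 # check the queue every 10 seconds
  successfulJobsHistoryLimit: 100
  failedJobsHistoryLimit: 100
  maxReplicaCount: 100                # at most 100 concurrent jobs
  triggers:
    - type: rabbitmq
      metadata:
        queueName: testqueue
        mode: QueueLength
        value: "1"                    # one job per message
        hostFromEnv: RABBITMQ_HOST    # assumed AMQP connection string
```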
Job Target Reference: The jobTargetRef section specifies the template for the job that KEDA will scale. It uses the same container image and script (/scripts/consumer-script.sh) as defined in the initial consumer deployment, ensuring that the scaled jobs perform the same task as the original consumer pod.
Note: One change was made to the script: instead of continuing to read the next message after processing, the new script exits. This behavior is required for applications running as jobs; otherwise, the job would never complete.
Polling Interval: Set at 10 seconds, this determines how frequently KEDA checks the RabbitMQ queue for new messages.
Jobs History Limits: Both successful and failed jobs history limits are set to 100, providing visibility into the job’s performance while managing resource utilization.
Max Replica Count: Capped at 100, it defines the maximum number of concurrent jobs.
Triggers: The trigger is set on RabbitMQ queue length, with the queueName specified as testqueue. Each job is triggered for a single message (value: “1”), ensuring that each job processes one message at a time.
For detailed information on configuring ScaledJob in KEDA, refer to the official KEDA documentation.
Execute the below command to create the scaled job.
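Assuming the manifest above is saved as scaled-job.yaml:

```bash
kubectl apply -f scaled-job.yaml
```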
Execute the below command to start producing messages; the job below publishes 15 messages to a queue called testqueue:
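A sketch of such a producer Job, publishing through the RabbitMQ management HTTP API; the image choice and payload format are assumptions, and the credentials are the demo ones shown earlier:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: rabbitmq-producer
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: producer
          image: curlimages/curl:latest   # any image with curl and a shell works
          env:
            - name: RABBIT_USER
              value: default_user_hmGZFhdewq65P4dIdx7
            - name: RABBIT_PASS
              value: qc98n4iGD7MYXMBVFcIO2mtB5voDuV_n
          command:
            - /bin/sh
            - -c
            - |
              # Publish 15 messages to testqueue via the default exchange
              for i in $(seq 1 15); do
                curl -s -u "$RABBIT_USER:$RABBIT_PASS" \
                  -H 'Content-Type: application/json' \
                  -X POST "http://rabbitmq:15672/api/exchanges/%2f/amq.default/publish" \
                  -d "{\"properties\":{},\"routing_key\":\"testqueue\",\"payload\":\"video-$i\",\"payload_encoding\":\"string\"}"
              done
```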
From the stats response, we can conclude that all 15 messages were processed successfully and none were terminated mid-processing due to a scale-down action.
In conclusion, ScaledJobs in Kubernetes offer a robust solution for handling long-running workloads, effectively addressing the wasted processing time and potential data loss that ScaledObjects can incur during autoscaling. By ensuring that each job runs to completion without interruption from scale-down actions, ScaledJobs provide a more reliable and efficient processing environment.
We encourage you to tailor these methods to your setup and share your findings. Your feedback and experiences are invaluable to our community’s learning.