by Sharad Regoti & Zbynek Roubalik, Founder & CTO, Kedify
March 05, 2024
Kubernetes has become the de facto standard for deploying microservices, owing to its autoscaling and self-healing capabilities. By default, it provides the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) for scaling applications based on CPU and RAM metrics.
Using these components is a great starting point and works well for applications under uniform load. However, in today’s cloud-native ecosystem, scaling solely on CPU and memory utilization against dynamic traffic patterns and fluctuating workloads is inadequate. This is where the native autoscaling features of Kubernetes fall short.
To accommodate such unpredictable behavior, we require metrics that adjust in real time and closely reflect application behavior, for example: the number of messages waiting in a queue, the rate of incoming HTTP requests, or the lag of a stream consumer.
These metrics offer a more responsive approach to autoscaling compared to just CPU and RAM. In this blog, we’ll explore configuring autoscaling for event-driven applications using AWS SQS with KEDA, and also check out various authentication techniques and additional AWS metrics supported by KEDA for autoscaling.
For foundational understanding about KEDA, refer to this blog post.
Imagine a video streaming application like Netflix where each video has to be encoded into multiple formats: 480p, 720p, 1080p, 4K, etc. The encoding process is time-consuming and takes between 1 to 3 hours, depending on the requested quality. This video data, along with quality specifications, is available as a message in AWS SQS.
A simple workflow for an application that encodes videos from AWS SQS would be:
Read the message from the queue.
Encode the video as per specification.
Store the result and process the next message.
However, this naive implementation has scaling limitations as it processes only one message at a time. Here are some possible improvements:
1. Introduce Concurrent Processing: One can modify the application to process multiple messages concurrently, but with a single replica this will eventually hit the node’s resource limits, forcing you to scale vertically.
2. Fixed Number of Kubernetes Replicas: One can start the application with a fixed number of replicas, regardless of the application’s execution model (synchronous or concurrent). But a fixed replica count will still face scaling issues when a large number of messages are produced.
3. Using Kubernetes HPA: One can configure Kubernetes HPA to scale based on the CPU and RAM utilization of the application. But as discussed in our previous post, this is ineffective in the modern cloud era.
A more effective metric for scaling would be the AWS SQS queue size, i.e., creating pod replicas based on the queue’s length. But out of the box, Kubernetes can scale only on CPU and memory usage.
This is where KEDA comes in. KEDA addresses this limitation by enabling scaling based on various external events, one of which is AWS SQS queue size. Now you can simplify your application to process only one message at a time and offload scaling decisions to KEDA.
To get started, KEDA provides the ScaledObject and ScaledJob CRDs, which enable event-driven autoscaling of Kubernetes workloads. Refer to this blog post to learn more about them.
We will be using the ScaledJob CRD to configure event-driven autoscaling based on SQS queue size, as it protects applications from the autoscaler’s scale-down actions. For details on choosing between ScaledObject and ScaledJob, see our prior article.
The above diagram depicts our revised architecture for the video encoding application, which contains four components.
1. Message Producer: Produces messages in AWS SQS, containing video and encoding quality information.
2. Video Encoder/Consumer: Reads the message from AWS SQS, processes it, and stores the result in an artifact store like the S3 Bucket.
3. KEDA: Handles the autoscaling of consumer applications on the basis of AWS SQS queue size.
4. Result Analyzer: To understand the autoscaling behavior, this component exposes some REST APIs that are consumed by the Video Encoder application. It essentially keeps track of events that occurred while processing the message.
1. Clone the Example Repository
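The clone command would look like the following; the repository URL below is a placeholder, so substitute the actual example repository:

```bash
# Placeholder URL: replace with the actual example repository
git clone https://github.com/<your-org>/<example-repo>.git
cd <example-repo>
```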
2. Create AWS SQS Queue
Execute the below command to create a queue named test-queue. Note down the QueueUrl (e.g. https://sqs.ap-south-1.amazonaws.com/123123123123/test-queue) from the command output.
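The queue can be created with the AWS CLI:

```bash
aws sqs create-queue --queue-name test-queue
```

The QueueUrl appears in the JSON output of this command.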
3. Create IAM Role with Trust Policy for KEDA Operator
Open the keda-operator-trust-policy.json file; it should have the below content:
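For reference, a typical IRSA trust policy for the KEDA operator looks like this sketch; the service account namespace and name (keda/keda-operator) are the Helm chart defaults and may differ in your setup:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::replace-with-your-aws-account-id:oidc-provider/replace-with-your-eks-open-idc"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "replace-with-your-eks-open-idc:sub": "system:serviceaccount:keda:keda-operator"
        }
      }
    }
  ]
}
```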
Replace the replace-with-your-aws-account-id (e.g. 123123123123) and replace-with-your-eks-open-idc (e.g. oidc.eks.ap-south-1.amazonaws.com/id/123AB12332123CEE5C7123FF9D3123) keys in the file with their corresponding values obtained from the AWS console.
Execute the below command to create a role named keda-operator. Note down the RoleARN (e.g. arn:aws:iam::123123123123:role/keda-operator) from the command output.
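A sketch of the create-role command, assuming the trust policy file from the previous step:

```bash
aws iam create-role \
  --role-name keda-operator \
  --assume-role-policy-document file://keda-operator-trust-policy.json
```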
4. Install KEDA
Execute the below command, replacing the replace-with-keda-operator-role-arn key with the RoleARN value obtained in the previous step.
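One possible installation using the official Helm chart; the podIdentity.aws.irsa.* values may vary across chart versions, so check the chart’s values file for your release:

```bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda --create-namespace \
  --set podIdentity.aws.irsa.enabled=true \
  --set podIdentity.aws.irsa.roleArn=<replace-with-keda-operator-role-arn>
```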
To scale based on AWS SQS queue size, KEDA requires authentication credentials for an AWS identity that has the appropriate permissions to operate on the SQS queue.
KEDA provides a few secure patterns to manage authentication flows:
1. Directly configure authentication per ScaledObject or ScaledJob using a ConfigMap or Secret
2. Re-use per-namespace credentials with TriggerAuthentication
TriggerAuthentication allows you to describe authentication parameters separate from the ScaledObject and the deployment containers. It also enables more advanced methods of authentication like “pod identity”, external secrets, authentication re-use or allowing IT to configure the authentication.
3. Re-use global credentials with ClusterTriggerAuthentication
Each TriggerAuthentication is defined in one namespace and can only be used by a ScaledObject in that same namespace. For cases where you want to share a single set of credentials between scalers in many namespaces, you can instead create a ClusterTriggerAuthentication. As a global object, this can be used from any namespace.
In our case, we will be using TriggerAuthentication at the namespace level. The authentication provider to use with TriggerAuthentication depends on where the application is running.
1. Application is running outside of the AWS network
In this case, we can use the Secret provider to store AWS credentials, such as the access key ID and secret access key, in a Kubernetes Secret.
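A minimal sketch of a Secret-backed TriggerAuthentication; the Secret name aws-credentials and its keys are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials-auth
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID      # maps to the scaler's access key parameter
      name: aws-credentials          # hypothetical Kubernetes Secret
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-credentials
      key: AWS_SECRET_ACCESS_KEY
```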
2. Application is running inside of the AWS network
We can use the Secret provider approach as mentioned earlier, but when your applications are running inside the AWS network, the best practice is to use AWS IAM roles to obtain temporary credentials. In this case, we will be using the AWS IAM Roles for Service Accounts (IRSA) Pod Identity Webhook.
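With IRSA, the TriggerAuthentication reduces to a pod identity reference; a minimal sketch, using the aws-eks provider that backs the IRSA Pod Identity Webhook:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-auth
spec:
  podIdentity:
    provider: aws-eks   # IRSA-based pod identity
```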
The below diagram depicts how authorization will work in our case:
Set up AWS Authentication with IRSA for the Producer & Consumer Applications
Create SQS Role with Trust Policy
As seen in the above image, the SQS role has to be assumed by two entities: the Kubernetes service account and the keda-operator role. To accommodate this, in the trust-policy.json file, we will replace the following keys (a sketch of the file follows the list):
Replace replace-with-keda-operator-role-arn with the RoleARN (e.g. arn:aws:iam::123123123123:role/keda-operator)
Replace the replace-with-your-aws-account-id (e.g. 123123123123) and replace-with-your-eks-open-idc (e.g. oidc.eks.ap-south-1.amazonaws.com/id/123AB12332123CEE5C7123FF9D3123) keys in the file with their corresponding values obtained from the AWS console.
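A sketch of what trust-policy.json could look like; the service account name (sqs-app) and namespace (default) are illustrative and must match the service account created in the next step:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::replace-with-your-aws-account-id:oidc-provider/replace-with-your-eks-open-idc"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "replace-with-your-eks-open-idc:sub": "system:serviceaccount:default:sqs-app"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "replace-with-keda-operator-role-arn"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```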
Execute the below command to create a role named sqs-full-access. Note down the RoleARN (e.g. arn:aws:iam::123123123123:role/sqs-full-access) from the command output.
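The command could look like:

```bash
aws iam create-role \
  --role-name sqs-full-access \
  --assume-role-policy-document file://trust-policy.json
```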
Execute the below command to attach a policy to the role.
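Assuming the AWS-managed AmazonSQSFullAccess policy is sufficient for this demo:

```bash
aws iam attach-role-policy \
  --role-name sqs-full-access \
  --policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess
```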
Create Service Account
In the service-account.yaml file, replace the replace-with-your-sqs-role-arn key with the RoleARN (e.g. arn:aws:iam::123123123123:role/sqs-full-access) value obtained in the previous step (the sqs-full-access role).
Execute the below command to create a service account that will be used by the producer and consumer applications.
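A sketch of service-account.yaml and the apply command; the service account name sqs-app is illustrative and must match the sub condition in the trust policy above:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sqs-app   # illustrative; must match the trust policy's service account
  annotations:
    eks.amazonaws.com/role-arn: replace-with-your-sqs-role-arn
```

```bash
kubectl apply -f service-account.yaml
```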
Provide SQS Assume-Role Permission to the KEDA Operator Role
Create a Policy
In the keda-operator-policy.json file, replace put-sqs-role-arn-here with the SQS RoleARN (e.g. arn:aws:iam::123123123123:role/sqs-full-access).
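For reference, the policy file could look like this sketch:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "put-sqs-role-arn-here"
    }
  ]
}
```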
Attach Policy to KEDA Role
Execute the below command to attach the policy (created in the above step) to the keda-operator role (created in step 3).
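One way to do this is with an inline policy; the policy name sqs-assume-role is illustrative:

```bash
aws iam put-role-policy \
  --role-name keda-operator \
  --policy-name sqs-assume-role \
  --policy-document file://keda-operator-policy.json
```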
Create Producer Application
In producer.yaml, replace put-your-sqs-queue-url-here with the queue URL obtained from the previous steps.
Execute the below command to start producing messages. The job produces 10 messages in the queue called test-queue.
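Assuming producer.yaml defines a Kubernetes Job:

```bash
kubectl apply -f producer.yaml
```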
Verify the application by checking its logs using the below command:
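Assuming the Job created above is named producer; adjust the name to match producer.yaml:

```bash
kubectl logs -f job/producer
```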
Note: The producer must create messages error-free, as depicted above. If you are encountering AWS configuration errors, verify that IRSA is correctly set up in EKS.
In scaled-job-consumer.yaml, replace put-your-sqs-queue-url-here with the queue URL obtained from the previous steps, and put-your-aws-region with your AWS region.
The ScaledJob configuration is defined as follows:
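A sketch of what scaled-job-consumer.yaml could look like; the ScaledJob name, container image, and service account name are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: video-encoder
spec:
  jobTargetRef:
    template:
      spec:
        serviceAccountName: sqs-app              # the IRSA-annotated service account
        containers:
          - name: consumer
            image: example/video-encoder:latest  # illustrative image
        restartPolicy: Never
  pollingInterval: 10            # how often KEDA checks the queue (seconds)
  maxReplicaCount: 10            # cap on concurrently running jobs
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-aws-auth      # the TriggerAuthentication created earlier
      metadata:
        queueURL: put-your-sqs-queue-url-here
        queueLength: "1"         # one job per pending message
        awsRegion: put-your-aws-region
```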
For detailed information on configuring ScaledJob in KEDA, refer to the official KEDA documentation.
Execute the below command to deploy a consumer (video processor) application.
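```bash
kubectl apply -f scaled-job-consumer.yaml
```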
As there are already some messages in the queue, KEDA will start creating jobs to handle queue messages.
Execute the below command in a separate terminal to monitor how KEDA scales pods
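```bash
kubectl get pods --watch
```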
You will observe that KEDA created 10 pods, corresponding to the 10 messages in the SQS queue, as we have configured the SQS trigger threshold (queueLength) to 1.
Wait for 10–15 minutes for some messages to be processed, and then execute the below command. This curl request gets the auto-scaling event data from the result analyzer application.
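The service name and endpoint path of the result analyzer are illustrative; adjust them to the example repository’s manifests:

```bash
kubectl port-forward svc/result-analyzer 8080:80 &
curl http://localhost:8080/events
```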
Below is the response to the above curl request. An event with a message kill count indicates that the application was terminated while processing a message, whereas an event with a message processed count indicates the message was processed successfully.
From the above response, we can conclude that all 10 messages were processed successfully.
Apart from AWS SQS, KEDA has built-in scalers for:
DynamoDB - scales a workload based on the results of a specified DynamoDB query.
DynamoDB Streams - the aws-dynamodb-streams trigger scales based on the shard count of AWS DynamoDB Streams.
Kinesis Stream - the aws-kinesis-stream trigger scales based on the shard count of an AWS Kinesis Stream.
If this is not sufficient, KEDA also supports a CloudWatch scaler; with this integration, you can scale your applications based on any AWS metric available in CloudWatch.
In conclusion, our exploration into autoscaling for AWS services with KEDA has demonstrated its potential to enhance the responsiveness and efficiency of Kubernetes deployments. By leveraging event-driven metrics, such as AWS SQS queue length, KEDA allows for more precise scaling decisions that traditional CPU and memory-based metrics cannot provide.
Beyond SQS, KEDA also supports a range of other AWS metrics including DynamoDB, DynamoDB Streams, Kinesis Streams, and CloudWatch, offering a versatile toolkit for scaling based on real-time demand and specific application needs.
We encourage you to experiment with KEDA and share your experiences, as your feedback is invaluable in refining and expanding the capabilities of autoscaling solutions.