by Jirka Kremser, Kedify
December 11, 2024
One of the possible benefits of using Dapr in a Kubernetes environment is its potential integration with autoscaling mechanisms, such as KEDA (Kubernetes Event-driven Autoscaling). Autoscaling enables microservices to dynamically adjust their resource allocation based on incoming traffic or other metrics, ensuring optimal performance while minimizing costs. This is possible because Dapr uses the sidecar model, and each sidecar exposes metrics in a unified format that KEDA can later consume.
In this blog post, we will explore how to leverage Dapr’s service invocation pattern to enable autoscaling in your microservice architecture. However, a similar approach can be used for scaling the pub/sub pattern or even the Actor model that Dapr also supports.
In this example we will set up a microservice architecture using the Dapr middleware. There will be two microservices:

- one written in Node.js called nodeapp
- one written in Python called pythonapp

These services are based on an upstream example, where the Python app calls the Node app using the service invocation pattern.
Both workloads run daprd in a sidecar container, which also exposes metrics. We have modified daprd and its mutating webhook (dapr-sidecar-injector) to push metrics to our OTEL collector. These metrics use OpenCensus, so we need to configure the OTEL collector to accept metrics through the opencensus receiver.
By utilizing the otel-add-on, we create an API bridge between KEDA’s external push contract and the OTEL protocol.
While it’s possible to use Prometheus, Datadog, or similar metrics stores as intermediate solutions, opting for a direct integration with OpenTelemetry offers benefits in speed, latency, and stability. Moreover, it simplifies our system by eliminating the need for an additional component. For more details, refer to our previous blog post.
For this demo, we will be using k3d, which can create lightweight k8s clusters, but any k8s cluster will work. For installation of k3d, please consult k3d.io.
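A small cluster for this walkthrough can be created like so (the cluster name is arbitrary):

```bash
# create a lightweight single-node cluster for the demo
k3d cluster create dapr-demo
```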
Install Dapr into the current k8s cluster. This requires the dapr CLI to be installed.
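With the CLI available, a minimal install looks like this:

```bash
# deploy the Dapr control plane into the current Kubernetes context
dapr init -k

# verify the control-plane pods came up
kubectl get pods -n dapr-system
```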
Dapr uses OpenCensus for metrics, and by default the metrics are only exposed in plain-text format over HTTP for scraping. In order to decrease the delay, we need to migrate from a pull model to a push model for metric gathering. Make sure our version of Dapr is used; this is needed for the daprd sidecars to push the metrics to our Kedify otel-add-on.
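As a rough sketch, assuming the modified build is published as regular container images and the Dapr Helm repo is already added as dapr (the registry and tag below are placeholders, not the actual Kedify images), the Dapr Helm chart lets you override them globally:

```bash
# hypothetical registry/tag: substitute the ones for the modified Dapr build
helm upgrade --install dapr dapr/dapr \
  --namespace dapr-system \
  --set global.registry=ghcr.io/example \
  --set global.tag=1.14.4-otel-push
```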
Now we have two microservices, one calling the other via the Dapr middleware.
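Assuming manifests along the lines of the upstream hello-kubernetes quickstart (the file names below may differ in the modified example), the two workloads can be deployed with:

```bash
# deploy the Node.js receiver and the Python caller
kubectl apply -f deploy/node.yaml
kubectl apply -f deploy/python.yaml
```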
Deploy the scaler and the OTEL collector that forwards one whitelisted metric: runtime_service_invocation_req_recv_total. You may spot a difference in the metric name; this is due to how the OTEL collector internally handles metrics, and you can check the details in the OTEL collector logs.
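The exact values differ between otel-add-on releases, but conceptually the embedded OTEL collector needs an opencensus receiver, a filter that keeps only the whitelisted metric, and an exporter pointing at the scaler. A sketch of that pipeline (the exporter endpoint and the exact metric name on the receiving side are assumptions):

```yaml
receivers:
  opencensus: {}                  # daprd sidecars push their metrics here
processors:
  filter/metrics:
    metrics:
      include:
        match_type: strict
        metric_names:
          - runtime_service_invocation_req_recv_total   # the single whitelisted metric
exporters:
  otlp:
    endpoint: keda-otel-scaler:4317   # assumed address of the otel-add-on scaler service
    tls:
      insecure: true
service:
  pipelines:
    metrics:
      receivers: [opencensus]
      processors: [filter/metrics]
      exporters: [otlp]
```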
The metric runtime_service_invocation_req_recv_total is described in the Dapr docs, and we identified it as a good candidate for scaling.
Remember, we have to use our own Dapr images because the upstream ones are not able to push the metrics to the OTEL collector. This also has to be enabled at the application level using annotations, so we need to patch the deployments:
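A sketch of such a patch (dapr.io/enable-metrics and dapr.io/metrics-port are standard Dapr annotations; any push-specific annotation required by the modified daprd build would be added alongside them):

```bash
for app in nodeapp pythonapp; do
  # changing the pod template annotations triggers a rollout, so the sidecar gets re-injected
  kubectl patch deployment "$app" --type merge -p '{
    "spec": {"template": {"metadata": {"annotations": {
      "dapr.io/enable-metrics": "true",
      "dapr.io/metrics-port": "9090"
    }}}}}'
done
```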
You can create the ScaledObject for nodeapp, which also contains more aggressive timeouts for the HPA, by issuing:
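A sketch of such a ScaledObject, assuming the otel-add-on scaler is reachable at keda-otel-scaler.keda.svc:4318 and following its external-push trigger contract (the address, replica bounds, and HPA behavior values are illustrative):

```bash
kubectl apply -f - <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nodeapp
spec:
  scaleTargetRef:
    name: nodeapp
  minReplicaCount: 1
  maxReplicaCount: 10
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 5   # more aggressive than the 300s default
  triggers:
    - type: external-push
      metadata:
        scalerAddress: keda-otel-scaler.keda.svc:4318
        metricQuery: 'sum(runtime_service_invocation_req_recv_total{app_id=nodeapp, src_app_id=pythonapp})'
        operationOverTime: rate
        targetValue: "1"
EOF
```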
Each replica of the pythonapp microservice makes a call to the nodeapp microservice every second. Check the following aspects of the ScaledObject's trigger configuration:
- The metric query selects only the requests where pythonapp calls nodeapp. As in PromQL, if not all dimensions are specified, multiple metric series will be returned.
- The observed rate of the counter is roughly 1, as we are calling the API every second, so the counter increments by one each second.
- targetValue was set to 1, indicating that one replica of nodeapp can handle this value. This ensures replica parity between the two services. If targetValue was set to 2, it would indicate that if we scale pythonapp (the producer) to N replicas, it would result in nodeapp (the consumer) being scaled to N/2 replicas.

Scale the caller microservice to 3 replicas and observe the node app:
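For example:

```bash
# scale the caller; the ScaledObject should bring nodeapp to a matching replica count
kubectl scale deployment pythonapp --replicas=3
kubectl get deployment nodeapp -w
```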
This should lead to nodeapp being scaled to 3 replicas as well.
Create 100 requests from pythonapp:
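One way to do this (a sketch: the neworder method and the payload shape come from the upstream example, and 3500 is the default Dapr sidecar HTTP port; add -c &lt;container&gt; if the app container is not the default one) is to exec into a pythonapp pod and call nodeapp through the Dapr service-invocation API:

```bash
kubectl exec deploy/pythonapp -- python3 -c '
import json, urllib.request
for i in range(100):
    req = urllib.request.Request(
        "http://localhost:3500/v1.0/invoke/nodeapp/method/neworder",
        data=json.dumps({"data": {"orderId": i}}).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)
'
```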
One may have noticed that we used minReplicaCount: 1 in our ScaledObject, so that there always had to be at least one replica of nodeapp.
However, we can also use the scale-to-zero feature that KEDA provides together with Dapr.
To achieve this, we have a few options:

- use a metric that signals that pythonapp can’t talk to nodeapp
- use a metric that signals that pythonapp can’t talk to nodeapp, together with Dapr’s resiliency mechanisms
- scale based on the HTTP traffic reaching nodeapp (transitively)

Approach 1:
Add a second trigger that uses the error counter as a mechanism for waking up the nodeapp service.
To apply this custom resource, run:
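A sketch of what that could look like, assuming the same ScaledObject grows a second external-push trigger (the error-counter metric name and its labels below are assumptions; use whichever failure counter your daprd build actually exports):

```bash
kubectl apply -f - <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nodeapp
spec:
  scaleTargetRef:
    name: nodeapp
  minReplicaCount: 0              # scale to zero is now allowed
  triggers:
    - type: external-push
      metadata:
        scalerAddress: keda-otel-scaler.keda.svc:4318
        metricQuery: 'sum(runtime_service_invocation_req_recv_total{app_id=nodeapp, src_app_id=pythonapp})'
        operationOverTime: rate
        targetValue: "1"
    - type: external-push         # wake-up trigger based on failed calls from pythonapp
      metadata:
        scalerAddress: keda-otel-scaler.keda.svc:4318
        metricQuery: 'sum(runtime_service_invocation_req_sent_total{src_app_id=pythonapp, status=failure})'  # assumed metric/labels
        operationOverTime: rate
        targetValue: "1"
EOF
```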
The downside of this approach is that the first couple of requests are lost.
Approach 2 - Two OTEL scalers with resiliency enabled:
Dapr provides a way to retry requests between the applications. We can leverage this feature and use its metrics for waking up the nodeapp service.
First, add the dapr_resiliency_activations_total metric to the OTEL collector configuration as an allowed metric.
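In the collector's filter processor this amounts to extending the whitelist, along these lines (assuming the same filter syntax as before):

```yaml
processors:
  filter/metrics:
    metrics:
      include:
        match_type: strict
        metric_names:
          - runtime_service_invocation_req_recv_total
          - dapr_resiliency_activations_total   # newly allowed wake-up metric
```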
Apply a CR from Dapr that’s responsible for enabling the resiliency features.
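A minimal Resiliency resource for this pair of apps could look like the following (the policy name and timings are illustrative):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: nodeapp-resiliency
spec:
  policies:
    retries:
      retryForever:
        policy: constant
        duration: 5s
        maxRetries: -1        # keep retrying until nodeapp has a replica again
  targets:
    apps:
      nodeapp:
        retry: retryForever
```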
Finally, apply the updated ScaledObject that has this new metric as a secondary trigger for scaling.
Now the nodeapp service can be scaled to zero replicas and we shouldn’t be losing any requests.
Approach 3 - Combination of two different ScaledObjects: HTTP Scaler & otel-add-on. This approach assumes the Kedify HTTP Scaler and the Kedify agent are already installed.
Internally, it uses the service-autowiring feature to switch the traffic between the internal Kedify Proxy and the original interceptor, which can hold an HTTP request until the given service has an endpoint.
Unfortunately, we can’t use this approach for waking up the nodeapp service at the moment, because the Dapr operator doesn’t allow changes to the Service that exposes the Dapr sidecar on the destination service. However, we can use this scaler for requests coming from outside the Kubernetes cluster. So it can serve as an entrypoint scaler for the very first service, while the other internal microservices can use Approach 2.
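For completeness, a sketch of such an entrypoint ScaledObject using the Kedify HTTP scaler (the host, service name, port, and target values are placeholders, and the trigger metadata follows Kedify's kedify-http trigger):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: entrypoint
spec:
  scaleTargetRef:
    name: entrypoint            # the first service hit by external traffic
  minReplicaCount: 0
  triggers:
    - type: kedify-http
      metadata:
        hosts: demo.example.com
        service: entrypoint
        port: "8080"
        scalingMetric: requestRate
        targetValue: "10"
```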
The Kedify Proxy, which is based on Envoy, is used here as an interceptor that is more performant in terms of throughput and more lightweight in terms of CPU and memory consumption. The original interceptor from the HTTP add-on is also used, but only for its ability to wait for the very first pod to become ready and forward the first request successfully.
In the demo, we utilized metrics from two different microservices to scale them horizontally based on how rapidly the counters were increasing. Specifically, we used a counter metric from the Dapr ecosystem that tracks how many calls Service A made to Service B. Leveraging this metric allowed us to autoscale Service B efficiently.
By using metrics directly through OpenTelemetry, we gained several advantages, including improved speed, reduced latency, and enhanced stability. Most notably, we achieved the scale to zero scenario for Service B - a capability that wouldn’t have been possible with indirect metrics solutions like Prometheus or Datadog. By eliminating the need for an intermediate metrics store, we simplified the architecture and improved overall performance.
If you’re interested in more examples of this setup in action, feel free to explore: https://github.com/kedify/otel-add-on/tree/main/examples
Here you can watch a recording that demonstrates all the steps outlined in this blog post.
Further advanced ideas: