Optimal Consumption with Adaptive Load

Marcelo Lazaroni

Jul 11, 2024

8 Mins Read Time

This post will discuss Adaptive Load, a technique that allows our customers to ingest data streams while using their infrastructure optimally.


The Problem

When you consume a large data stream, you can process it and save the results, take an action, or both. To avoid incidents, you must consume fast enough to stay caught up but slow enough that downstream resources such as databases and APIs are never overloaded. That balancing act is challenging because other systems might be using those downstream resources simultaneously. So you're left with three options:


  1. Overprovision downstream resources (e.g., a database with more CPU and RAM than it needs).

  2. Set a fixed conservative consumption rate that never overloads downstream resources.

  3. Dynamically adjust consumption rates to maximize throughput, adapting to the load capacity of downstream resources in real-time.


The first and second options waste money by leaving downstream resources underutilized. The second option also risks incidents because a fixed rate may be too slow to keep up. The third option requires too much work and often fails anyway; for example, dynamically adjusting consumption rates based on downstream metrics such as CPU and RAM does not maximize throughput in practice because there are many other bottlenecks (each with its own metric).
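To make the third option concrete, here is a minimal sketch in Python of the kind of single-metric controller teams usually build. The helper names (`get_db_cpu_percent`, `set_consumption_rate`) are hypothetical stand-ins for a monitoring query and a consumer-side rate limiter, not any real API. Because the loop only watches one metric, a bottleneck anywhere else stays invisible to it.

```python
import time

# Hypothetical helpers standing in for your monitoring stack and your
# consumer's rate limiter; they are not part of any real API.
def get_db_cpu_percent() -> float:
    return 55.0  # placeholder reading

def set_consumption_rate(msgs_per_sec: float) -> None:
    print(f"consuming at {msgs_per_sec:.0f} msgs/s")

def metrics_based_controller(target_cpu: float = 70.0) -> None:
    """Naive 'option 3' controller: adjust consumption from a single metric.

    Because it only watches database CPU, a bottleneck in disk IO, locks,
    a downstream API, or the network remains invisible to it.
    """
    rate = 100.0  # messages per second
    for _ in range(10):  # a real controller would loop forever
        cpu = get_db_cpu_percent()
        rate = rate * 1.1 if cpu < target_cpu else rate * 0.7
        set_consumption_rate(rate)
        time.sleep(1)
```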


The Solution: Adaptive Load

But what if you could maximize throughput and consume data at optimal rates? And what if you could do it without investing time and effort? At Ambar, we built precisely this, allowing our customers to delegate the real-time calculation of optimal consumption rates to us. In the rest of this post we will explain how we do this with Adaptive Load, a technique for dynamically adjusting our data delivery rate to a customer based on estimated saturation levels inferred from latency measurements.


Ambar is a data streaming service with end-to-end correctness guaranteed out-of-the-box that can be deployed in 15 minutes with minimal configuration. Instead of a pull-based consumption model, such as the one used in Kafka, Ambar delivers message streams to our customers through push. Our message delivery rate is continuously updated to keep customers' systems saturated without overwhelming them. We call this Adaptive Load.

Adaptive Load provides a backpressure mechanism, regardless of why downstream resources are overloaded. It does not matter if the bottleneck is a downstream API, a database's CPU, a network switch, a database's IO limitations, or anything else. Adaptive Load optimizes for maximum throughput and does not require our customers to install software, use a library, or report health information to Ambar.

Additionally, Ambar's push-based model dramatically simplifies integration. It is compatible with all languages without needing consumer libraries, and it allows our customers to scale their infra or restart their machines without consumer rebalancing pauses.
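To give a feel for what the push model asks of a consumer, here is a minimal sketch of the HTTP endpoint a customer might expose, written with Python's standard library. The request body shape and the `records` field are illustrative assumptions rather than Ambar's actual wire format; the point is only that handling a push is an ordinary HTTP handler.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def process(record: dict) -> None:
    # Your business logic: write to a database, call an API, etc.
    print("handled", record)

class StreamHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the pushed batch (payload shape is an illustrative assumption).
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")

        for record in body.get("records", []):
            process(record)

        # A 2xx response acknowledges the batch; how long this handler takes
        # is the latency signal an adaptive sender can observe.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), StreamHandler).serve_forever()
```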


The Details

Ambar's push-based approach estimates each customer's capacity and sends messages at a rate that respects that limit. An estimate that is too conservative leaves customer resources underutilized; one that is too aggressive overwhelms them and causes incidents. And to top it all off, customer capacity can drop at any moment, so we must adjust send-rates dynamically or, again, risk overwhelming our customers. To meet these challenges, we rely on queuing theory.

From queuing theory, we know that a customer's endpoint will handle a certain number of parallel requests very well, answering them with stable response times as long as we don't cross a threshold. Beyond this threshold, some of the customer's downstream resources will perform poorly, causing response times to increase. If pushed too far, the customer's systems would start dropping our messages altogether. The bottleneck may be anything from the number of CPU cores in the destination machine to disk IO in a downstream database.



Looking at overall throughput, the best performance comes from pushing slightly past the point where response times begin to rise, but only slightly. Push further and the growing response times start to reduce overall throughput.
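A toy model helps make these two curves concrete. It is not Ambar's internal model; it just assumes a bottleneck with a fixed number of parallel slots and a small contention penalty once concurrency exceeds them. Response times stay flat up to the threshold and then climb, while throughput grows linearly, peaks around the threshold, and then falls as contention eats into it.

```python
# Toy queuing model, purely illustrative: a bottleneck with `CAPACITY`
# parallel slots and a contention penalty once concurrency exceeds it.
BASE_SERVICE_TIME = 0.010   # seconds per message with no contention
CAPACITY = 16               # parallel requests the bottleneck handles well
CONTENTION_PENALTY = 0.0005  # extra seconds per request beyond capacity

def response_time(concurrency: int) -> float:
    overload = max(0, concurrency - CAPACITY)
    service = BASE_SERVICE_TIME + CONTENTION_PENALTY * overload
    # Below capacity, requests don't queue; above it, they wait their turn.
    waiting = max(0.0, concurrency / CAPACITY - 1.0) * service
    return service + waiting

def throughput(concurrency: int) -> float:
    # Little's law: in-flight requests / time each one spends in the system.
    return concurrency / response_time(concurrency)

for n in (4, 8, 16, 20, 32, 64):
    print(f"{n:3d} in flight: {response_time(n) * 1000:6.1f} ms, "
          f"{throughput(n):7.0f} msg/s")
```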



The goal is to reach a sweet spot where throughput is as high as possible so that customers use their resources efficiently. However, this sweet spot changes continuously because it is a function of the interaction dynamics between all of the customer's systems. Thus, Ambar updates the sweet spot calculation continuously.

And how does the calculation work? We take the latencies of completed requests, model how we expect customer systems to behave, and use error-corrected differences in response times from each host to continuously recalculate the sweet spot and detect when a client is becoming overloaded.
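The following sketch shows the general family of algorithm, purely as an illustration: track a no-load latency baseline, compare recent response times against it, and widen or shrink the in-flight limit accordingly. Ambar's production controller is considerably more involved (it models expected customer behaviour and error-corrects per-host measurements), and every name and threshold below is an assumption made for the example.

```python
from collections import deque
from statistics import median

class AdaptiveLimiter:
    """Latency-driven concurrency limiter, a simplified illustration only.

    Keeps a baseline of the best response times seen and compares recent
    samples against it: when latency sits close to the baseline there is
    headroom, so the in-flight limit grows; when latency drifts well above
    it, the downstream system is saturating, so the limit shrinks.
    """

    def __init__(self, min_limit: int = 1, max_limit: int = 256):
        self.limit = min_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.baseline = float("inf")    # best latency observed so far
        self.recent = deque(maxlen=20)  # sliding window of samples

    def on_response(self, latency_s: float) -> int:
        self.baseline = min(self.baseline, latency_s)
        self.recent.append(latency_s)
        if len(self.recent) < self.recent.maxlen:
            return self.limit

        current = median(self.recent)
        if current < self.baseline * 1.5:
            self.limit = min(self.max_limit, self.limit + 1)   # additive increase
        elif current > self.baseline * 3.0:
            self.limit = max(self.min_limit, self.limit // 2)  # multiplicative decrease
        return self.limit
```

In use, each completed push would report its measured latency to `on_response`, and the sender would cap the number of in-flight requests at the returned limit.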

In practice, this means that if the bottleneck is database CPU, Ambar will not overwhelm the database. If the bottleneck is a downstream queue, Ambar will keep the queue optimally filled. If the bottleneck is lock contention over IO, Ambar will strike the optimal balance between too much contention and too little contention. Whatever the bottleneck is, Ambar will figure out the sweet spot for send-rates.


The Reward

The combination of Adaptive Load and a push-based model results in multiple quality-of-life improvements for our customers, including:


  • Optimal utilization of our customers' resources, regardless of where bottlenecks live.

  • Overload prevention of our customers' resources (e.g., avoiding downtime on shared databases).

  • First-class support in any language. There is no need for an SDK or libraries to use Ambar. Customers consume data streams by exposing an HTTP endpoint to which we push data.

  • Customers can scale their fleet or restart hosts without rebalances.

  • Lower network traffic when supporting idle consumers across thousands of partitions.


Solving the problem of optimal resource utilization in data streaming consumption takes time and effort. But Ambar's push-based consumption mechanism makes it easy, and unlike metrics-based solutions, it truly maximizes throughput because it detects all downstream bottlenecks.

Ambar's push-based consumption mechanism comes with Adaptive Load and many other benefits out-of-the-box. Read more in our other blog posts, or try Ambar out by signing up for an account.

