Shoutout to the folks over at /r/rust who helped guide some of the initial improvements in this post.
Channels in Rust are a really powerful tool for cross-thread communication. I have used them in just about every project, both professional and personal (e.g., Have I Been Squatted?). They enable a message-passing pattern between threads that is safe, efficient, and, dare I say, fun. This is particularly true when using Tokio `mpsc` channels and utilities such as `select!`. They let you design actor-based systems that are resilient to change while making good use of the underlying hardware.
During a recent livestream, Jon Gjengset raised a point that stuck with me: there are situations where I might be over-relying on `mpsc` channels. These situations typically look like a fan-in pattern, where a large number of producers feed a single consumer. When the number of producers is relatively small and stable, this pattern works well.
Significantly increasing the number of producers, however, can congest the consumer: the overhead of receiving and processing each message leads to a backlog, or to messages being dropped outright. This is particularly dangerous when there is no backpressure and producers keep publishing faster than the consumer can keep up.
This post will explore an alternative method of enabling this fan-in pattern and examine its performance characteristics.
Imagine a situation where we have a large burst of data that we need to process quickly and send to a streaming client. This is what we do with Have I Been Squatted?: you input a domain and we generate a large number of possible permutations using `twistrs`. The permutations are then enriched with all sorts of data (e.g., geolocation, RDAP/WHOIS, server banners) and streamed back to you as quickly as possible.
As a baseline, we'll have a simple socket server that listens for incoming TCP connections from a client. Think of this as a plain client-server connection.
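A minimal sketch of such a server, assuming Tokio and a hypothetical `127.0.0.1:8080` address, could look like the following:

```rust
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;

    loop {
        // Accept a client and handle it on its own task.
        let (socket, _) = listener.accept().await?;
        tokio::spawn(async move {
            // Read the client's messages line by line; a real server
            // would do something useful with each one.
            let mut lines = BufReader::new(socket).lines();
            while let Ok(Some(_line)) = lines.next_line().await {}
        });
    }
}
```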
An actor client can be as simple as the following serial `mpsc` consumer.
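A sketch of that consumer, with a hypothetical `Message` alias and server address standing in for the real types:

```rust
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;
use tokio::sync::mpsc;

// Hypothetical payload produced by the workers.
type Message = String;

pub async fn actor(mut rx: mpsc::Receiver<Message>) -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:8080").await?;

    // Wait for a message from one of our workers and send it to the server
    loop {
        match rx.recv().await {
            Some(msg) => stream.write_all(msg.as_bytes()).await?,
            // All senders have been dropped; nothing left to forward.
            None => return Ok(()),
        }
    }
}
```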
## mpsc
Our first implementation for solving this problem can use an `mpsc` channel: a single consumer sends messages through the socket, while a large number of producers enrich data and push it onto the channel. We name these the `actor` and the `worker` respectively.
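The actor is essentially the serial consumer shown above. The worker, sketched below with a hypothetical enrichment step, produces messages into its `Sender` half; note that `send` awaits when a bounded channel is full, which is what gives us backpressure:

```rust
use tokio::sync::mpsc;

// Hypothetical payload; in practice this would be an enriched permutation.
type Message = String;

pub async fn worker(id: usize, tx: mpsc::Sender<Message>) {
    loop {
        // Stand-in for the real enrichment work (geolocation, RDAP, banners, ...).
        let msg = format!("worker {id}: enriched item");

        // `send` waits for capacity on a bounded channel; if the actor is
        // gone, the channel is closed and we stop producing.
        if tx.send(msg).await.is_err() {
            break;
        }
    }
}
```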
## Arc<Mutex<Vec<_>>>
An alternative approach is to instead rely on an `Arc<Mutex<Vec<_>>>` to act as our buffer. The important aspects are the following:
- We drop the asynchronous Tokio `Mutex` in favour of `std::sync::Mutex`.

Starting with our worker, which will look very similar to the `mpsc` worker.
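A sketch, under the same assumptions as before; the only real change is that the worker pushes into the shared `Vec` instead of sending on a channel:

```rust
use std::sync::{Arc, Mutex};

type Message = String;

pub async fn worker(id: usize, buffer: Arc<Mutex<Vec<Message>>>) {
    loop {
        // Stand-in for the real enrichment work.
        let msg = format!("worker {id}: enriched item");

        // Keep the critical section as short as possible: lock, push, unlock.
        buffer.lock().unwrap().push(msg);

        // Yield back to the runtime so a tight loop cannot starve other tasks.
        tokio::task::yield_now().await;
    }
}
```

Holding a `std::sync::Mutex` is fine inside async code as long as the guard is never held across an `.await` point, which is the case here.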
The actor is where things get interesting. We need to be able to swap the buffers between the worker and the actor. We do this by locking the `std::sync::Mutex` and swapping the shared buffer with an empty local one, keeping the critical section as short as possible to minimise contention.
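A sketch of that actor, again assuming the same hypothetical `Message` type and server address:

```rust
use std::sync::{Arc, Mutex};
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

type Message = String;

pub async fn actor(buffer: Arc<Mutex<Vec<Message>>>) -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:8080").await?;
    let mut local = Vec::new();

    loop {
        // Critical section: exchange the shared buffer with our empty local
        // one. The lock is held only for a cheap pointer-sized swap.
        {
            let mut shared = buffer.lock().unwrap();
            std::mem::swap(&mut *shared, &mut local);
        }

        if local.is_empty() {
            // Nothing was buffered; back off briefly before checking again.
            tokio::time::sleep(std::time::Duration::from_millis(1)).await;
            continue;
        }

        // Drain everything we grabbed in a single pass, outside the lock.
        for msg in local.drain(..) {
            stream.write_all(msg.as_bytes()).await?;
        }
    }
}
```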
This has the benefit of consuming as many messages as possible in a single pass, improving the overall throughput.