A look back at asynchronous Rust

Author: yitno24
Publish Date: 2021-03-20 15:06:30


In 2013, I discovered the Rust programming language and quickly decided to learn it and make it my main programming language.
In 2017, I moved to Berlin and joined Parity as a Rust developer. The task that occupied my first few months was to build rust-libp2p, a peer-to-peer library in asynchronous Rust (~89k lines of code at the moment). Afterwards, I integrated it in Substrate (~400k lines of code), and have since then been the maintainer of the networking part of the code base.
In light of a recent blog post and a Twitter interaction, I thought it could be a good idea to lay out some of the issues I’ve encountered over time through experience.
Please note that I’m not writing this article on behalf of my employer Parity. This is my personal feedback when working on the most asynchronous-heavy parts of Parity-related projects, but I didn’t show it to anyone before publishing and it might differ from what the other developers in the company would have to say.
General introduction
I feel obliged to add an introduction paragraph, given the discussions and controversies that have happened around asynchronous Rust recently and over time.
First and foremost, I must say that asynchronous Rust is generally in a really good shape. This article is addressed mainly towards programmers that are already familiar with asynchronous Rust, as a way to give my opinion on the ways the design could be improved.
If you’re not a Rust programmer and you’re reading this article to get an idea of whether or not you should use Rust for an asynchronous project, don’t get the wrong idea here: I’m strongly advocating for Rust, and no other language that I’m aware of comes close.
I didn’t title this article “Problems in asynchronous Rust”, even though it’s focusing on problems, again to not give the wrong idea.
The years during which asynchronous Rust has been built have seen lots of tensions in the community. I’m very respectful of the people who have spent their energy debating and handling the overwhelmingly massive flow of opinions. I have personally stayed away from the Rust community for the last 4–5 years for this exact reason, and have zero criticism to address here.
I won’t focus too much on the past (futures 0.1 and 0.3) but more on the current state of things, since the objective of this feedback is ultimately to drive things forward.
Now that this is all laid out, let’s go for the problematic topics.
Future cancelling problem
I’m going to start with what I think is the most problematic issue in asynchronous Rust at the moment: knowing whether a future can be deleted without introducing a bug.
I’m going to illustrate this with an example: you write an asynchronous function that reads data from a file, parses it, then sends the items over a channel. For example (pseudo-code):
async fn read_send(file: &mut File, channel: &mut Sender<...>) {
  loop {
    let data = read_next(file).await;   // first await point
    let items = parse(&data);
    for item in items {
      channel.send(item).await;         // second await point
    }
  }
}
Each await point in asynchronous code represents a moment when execution might be interrupted, and control handed back to the user of this future. That user can, at will, decide to drop the future at this point, stopping its execution altogether.
If the user calls the read_send function, then polls it until it reaches the second await point (sending on a channel), then destroys the read_send future, all the local variables (data, items, and item) are silently dropped. By doing so, the user will have extracted data from file, but without sending this data on channel. This data is simply lost.
You might wonder: why would the user do this? Why would the user poll a future for a bit, then destroy it before it has finished? Well, this is exactly what the futures::select! macro might do.
let mut file = ...;
let mut channel = ...;
loop {
    // select! requires fused futures, hence the .fuse() calls
    futures::select! {
        _ = read_send(&mut file, &mut channel).fuse() => {},
        some_data = socket.read_packet().fuse() => {
            // ...
        }
    }
}
In the second code snippet, the user calls read_send, polls it, but if socket receives a packet, then the read_send future is destroyed and recreated at the next iteration of the loop. As explained, this will lead to data being extracted from the file but not being sent on the channel. This is likely not what the user wants to do here.
Let’s be clear: this is all working as designed. The problem is not so much what happens, but the fact that what happens isn’t what the user intended. One can imagine situations where the user simply wants read_send to stop altogether, and such situations should be allowed as well. The problem is that this is not what we want here.
There exist four solutions to this problem that I can see (I’m only providing links to the playground instead of explaining in detail, for the sake of not making this article too heavy):
1. Rewrite the select! to not destroy the future. Example. This is arguably the best solution in that specific situation, but it can sometimes introduce a lot of complexity, for example if you want to re-create the future with a different File when the socket receives a message.
2. Ensure that read_send reads and sends atomically (a sketch follows this list). Example. In my opinion the best solution, but this isn’t always possible, or would introduce an overhead in complex situations.
3. Change the API of read_send and avoid any local variable across a yield point. Example. Real-world example. This is also a good solution, but it can be hard to write such code, as it starts to become dangerously close to manually-written futures.
4. Don’t use select! and spawn a background task to do the reading. Use a channel to communicate with the background task if necessary, as pulling items from channels is cancellation-resilient. Example. Often the best solution, but it adds latency and makes it impossible to access file and channel ever again.
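As an illustration of the second solution, here is a minimal sketch of a cancellation-safe read_send, in the same pseudo-code style as the snippets above. The pending buffer and the Item type are hypothetical additions; the caller owns the buffer, so dropping the future never loses data:

async fn read_send(
    file: &mut File,
    channel: &mut Sender<...>,
    pending: &mut VecDeque<Item>, // owned by the caller, survives cancellation
) {
    loop {
        // Drain items left over from a previous, cancelled call. An item is
        // removed from `pending` only after the send has completed, so a
        // drop at this await point loses nothing.
        while !pending.is_empty() {
            let item = pending.front().unwrap().clone();
            channel.send(item).await;
            pending.pop_front();
        }
        // If the future is dropped here, no data has been read yet.
        let data = read_next(file).await;
        // Purely synchronous: there is no cancellation point between
        // parsing the data and buffering the items.
        pending.extend(parse(&data));
    }
}

Every await point now leaves file, channel and pending in a consistent state, which is exactly the property that makes a future safe to destroy and recreate.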
It is always possible to solve this problem in some way, but what I would like to highlight is an even bigger problem: this kind of cancellation issue is hard to spot and debug. In the problematic example, all you will observe is that on some rare occasions some parts of the file seem to be skipped. There will not be any panic or any clear indication of where the problem could come from. This is the worst kind of bug you can encounter.
It is even more problematic when you have more than one developer working on a code base. One developer might think that a future is cancellation-safe when it is not. Documentation might be obsolete. A developer might refactor some future implementation and accidentally make it cancellation-unsafe, or refactor the select part of the code to destroy the future at a different time than it did before. Writing unit tests to make sure that a future works properly after being destroyed and rebuilt is more than tedious.
I generally give the following guidelines: if you know for sure that your asynchronous code will be spawned on a background task, feel free to do whatever you want. If you aren’t sure, make it cancellation-safe. If it’s too hard to do so, refactor your code to spawn a background task. These guidelines have in mind the fact that the implementation of a future and its users will likely be two different developers, which is a situation often ignored in small examples.
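Concretely, the “spawn a background task” refactoring from the last guideline usually looks something like this. This is a sketch, assuming a Tokio-style task::spawn, the futures mpsc channel, and the read_send function from before (an mpsc::Receiver is already fused, so it can be used in select! directly):

let (mut tx, mut rx) = futures::channel::mpsc::channel(16);

// The cancellation-unsafe logic runs to completion in its own task...
task::spawn(async move {
    read_send(&mut file, &mut tx).await;
});

// ...while the select! below only ever cancels a channel receive, which is
// cancellation-safe: an item is either handed to us or still in the queue.
loop {
    futures::select! {
        item = rx.next() => {
            // ...
        }
        some_data = socket.read_packet().fuse() => {
            // ...
        }
    }
}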
As for improving the Rust language itself, I unfortunately don’t have any concrete solution in mind. A clippy lint that forbids local variables across yield points has been suggested, but it probably couldn’t detect when a task is simply spawned in a long-lived event loop. One could maybe imagine some InterruptibleFuture trait required by select!, but it would likely hurt the approachability of asynchronous Rust even more.
The Send trait doesn’t mean what it used to
The Send trait, in Rust, means that the type it is implemented on can be moved from one thread to another. Most of the types that you manipulate in your day-to-day coding do implement this trait: String, Vec, integers, etc. It is actually easier to enumerate the types that do not implement Send, and one such example is Rc.
Types that do not implement the Send trait are generally faster alternatives. An Rc is the same as an Arc, except faster. A RefCell is the same as a Mutex, except faster. This gives the programmer the possibility to optimize when they know that what they are doing is scoped to a single thread.
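For example, the two lines below do the same thing, except that the first one skips the atomic reference-count updates and therefore produces a value that is !Send:

let local = std::rc::Rc::new(5);     // non-atomic reference count, !Send
let shared = std::sync::Arc::new(5); // atomic reference count, Send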
Asynchronous functions, however, kind of broke this idea.
Imagine that the function below runs in a background thread, and you want to rewrite it to asynchronous Rust:
fn background_task() {
   let rc = std::rc::Rc::new(5);
   let rc2 = rc.clone();
   bar();
}
You might be tempted to just add async and await here and there:
async fn background_task() {
   let rc = std::rc::Rc::new(5);
   let rc2 = rc.clone();
   bar().await;   // `rc` and `rc2` are still alive across this await point
}
But as soon as you try to spawn background_task() in an event loop, you will hit a wall: the future returned by background_task() doesn’t implement Send. The future itself will likely jump between threads, hence the requirement. But in theory, this code is completely sound: as long as the Rc never leaves the task, its clones can only ever be created or destroyed one at a time, which is where the potential unsafety lies.
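To make the wall concrete: with Tokio, for example, spawn requires the spawned future to be Send precisely because the task may migrate between worker threads. Below is a minimal sketch of the error and of the usual workaround, which is to pin the task to a single thread (LocalSet is Tokio-specific; other runtimes have equivalents):

use std::rc::Rc;

async fn bar() {}

async fn background_task() {
    let rc = Rc::new(5);
    let rc2 = rc.clone();
    bar().await; // `rc` and `rc2` live across this await point,
                 // so the returned future is `!Send`
}

fn main() {
    // tokio::spawn(background_task());
    // ^ error[E0277]: `Rc<i32>` cannot be sent between threads safely

    // Pinning the task to a single thread sidesteps the `Send` bound:
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let local = tokio::task::LocalSet::new();
    local.spawn_local(background_task());
    rt.block_on(local);
}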
Arguably the Send trait could be modified to mean “object that can be moved between a thread or task boundary“, but doing so would break code that relies on !Send for FFI-related purposes, where threads are important, such as with OpenGL or GTK.
Flow control is hard
Many asynchronous programs, including Substrate, are designed around what we generally call an actor model. In this design, tasks run in parallel in the background and exchange messages between each other. When there is nothing to do, these tasks go to sleep. I would argue that this is how the asynchronous Rust ecosystem encourages you to design your program.
If task A continuously sends messages to task B using an unbounded channel, and task B is slower to process these messages than task A sends them, the number of items in the channel grows forever, and you effectively have a memory leak.
To avoid this situation, it might be tempting to instead use a bounded channel. If task B is slower than task A, the buffer in the channel will fill up, and once it is full, task A will slow down, as it has to wait for a free slot before it can proceed. This mechanism is called back-pressure. However, if task B also sends messages towards task A, intentionally or not, using two bounded channels in opposite directions will lead to a deadlock if they are both full.
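A minimal sketch of that back-pressure mechanism, using futures’ bounded mpsc channel; expensive_processing is a hypothetical stand-in for whatever makes task B slow:

use futures::channel::mpsc;
use futures::{SinkExt as _, StreamExt as _};

// Bounded: only a handful of items can be in flight at any given time.
// (With futures’ mpsc the real capacity is the argument plus one slot
// per sender, but the principle is the same.)
let (mut tx, mut rx) = mpsc::channel::<u64>(8);

// Task A: `send` returns quickly while there is room in the buffer, and
// waits for a free slot once it is full, so task A can never run
// arbitrarily far ahead of task B.
let task_a = async move {
    for n in 0.. {
        tx.send(n).await.unwrap();
    }
};

// Task B: the slower consumer whose pace, through the channel, ends up
// dictating task A’s pace as well.
let task_b = async move {
    while let Some(n) = rx.next().await {
        expensive_processing(n).await; // hypothetical slow work
    }
};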
More complicated: if task A sends messages to task B, task B sends messages to task C, and task C sends messages to task A, all with bounded channels, the channels can also all fill up and deadlock. Detecting this kind of problem is almost impossible, and the only way to solve it is to have a proper code architecture that is aware of that concern.
The worst part with this


