Async/Await I: Self-Referential Structs
This is the first in a series of blog posts about generators, async & await in Rust. By the end of this series, I will have a proposal for how we could expediently (within the next 12 months) bring generators, async & await to stable Rust, and resolve some of the most difficult ergonomics problems in the futures ecosystem.
But that proposal is still several posts away. Before we can get to a concrete proposition, we need to understand the scope & nature of the problem we need to solve.
Generators and borrowing
Today, on nightly, Rust has a feature called generators. Generators allow you to create functions that can “yield,” maintaining their state so that they can be called again until finally they return. The feature as it exists is not very polished, but here’s an example of a function that returns a generator:
#![feature(generators, generator_trait, conservative_impl_trait)]
use std::ops::Generator;
pub fn up_to(limit: u64) -> impl Generator<Yield = u64, Return = u64> {
move || {
for x in 0..limit {
yield x;
}
return limit;
}
}
This generator will yield every number from 0 to the limit (exclusive), and then return the limit itself.
Generators are a very powerful language extension with two important use cases:
- Iterators: A generator which yields
T
and returns()
can be treated as an Iterator ofT
. - Futures: A generator which yields
()
and returnsResult<T, E>
can be treated as a Future ofT
andE
.
As a result of this, async functions & their await
syntax can be implemented
as sugar on top of the generator feature.
The limitation of borowing
The problem with generators appears quickly when you attempt to use them with references. Here is a small, toy example:
#![feature(generators)]
fn main() {
|| {
let x: u64 = 1;
let ref_x: &u64 = &x;
yield 0;
yield *ref_x;
}
}
If you attempt to compile this, you’ll get an error:
error[E0626]: borrow may still be in use when generator yields
The problem is that borrows cannot be allowed across yield points. What this
means for async functions is that you can never have a borrow for which the
lifetime includes an await
. This is a very serious limitation for this use
case, and one that needs to be resolved before we can consider this feature
fully implemented.
To understand why we cannot allow borrows across yield points, we need to talk about self-referential structs.
Self-referential structs
A feature of Rust that is often desirable is to have self-referential structs - that is, structures that can contain references to other fields within themselves. Here’s an example (using invalid syntax):
struct FooBar {
array: [Baz; 16],
pointer: &'array Baz,
}
This struct contains an array of Baz
structs, as well as a pointer to an
element of that array.
If we consider how this type will be laid out in memory, we know that it will
contain sixteen Baz
objects as well as space for a memory address, which will
be the address of one of those sixteen objects. The problem occurs when we
attempt to move a FooBar
- moving will mempcy the entire struct into a new
location, invalidating the pointer
address, as it will now point to an
address within the previous location that the struct was in.
Fully solving this problem is very challenging. There are several factors to consider:
- What is this lifetime
'array
? How do we conceptualize and represent the notion of a lifetime which is tied to a value within the same type? - What can we do with
array
during the lifetime'array
? In this case, absolutely nothing - we cannot move it or mutate it, both could invalidate thepointer
field. But there are cases where we can move it (becaues the pointer actually points into the heap) but not mutate it. How do we distinguish those cases?
The connection to generators
This applies to the generator problem because of the compiled representation of a generator. A generator essentially creates an anonymous enum type; each of its variants is a state that it could be in it at a particular yield point. In each of those variants, the generator saves all of the variables that it needs to continue to keep working once it is resumed.
In this way, generators save a minimal representation of their “stack” when they yield. Rather than maintaining an entire statically sized stack for each generator, the generator’s size is the size of the largest amount of state it could need to preserve at any of its yield points. This is also how Futures have been designed to work in Rust, and one of the reasons they lead to better performance and lower memory overhead than systems like green threads.
The problem arises when some of that stack state that you’re preserving is a reference to other items you are preserving from the stack. Since all of this state is stored together in the “generator enum,” this becomes a special case of self-referential structs.
It’s worth noting that this problem manifests itself in the futures ecosystem
today, and is one of the major pain points people experience. If you’ve done
much with future combinators, you may have experienced it as having to Arc
some data that you need to use both before and after an and_then
call. This
is because the and_then
call acts like a yield point, and you cannot have a
reference to that data on both sides of the yield point. Instead, you have to
store that data in the heap and reference count it.
This Arc
ing situation is very unergonomic, making code noisy and difficult,
and also just feels wrong - like you’re throwing away a lot of the benefits of
Rust by just storing everything in reference counted heap allocations. If we
could allow references across yield points, we could make the async story feel
significantly nicer while also improving performance.
Conclusion
I’ve laid out the problem in the abstract. Next time, I’m going to dive deeper into how the problem impacts generators, narrow the scope of the immediate problem, and talk about some of the solutions we’ve tried so far.