Async/Await I: Self-Referential Structs

This is the first in a series of blog posts about generators, async & await in Rust. By the end of this series, I will have a proposal for how we could expediently (within the next 12 months) bring generators, async & await to stable Rust, and resolve some of the most difficult ergonomics problems in the futures ecosystem.

But that proposal is still several posts away. Before we can get to a concrete proposition, we need to understand the scope & nature of the problem we need to solve.

Generators and borrowing

Today, on nightly, Rust has a feature called generators. Generators allow you to create functions that can “yield,” maintaining their state so that they can be called again until finally they return. The feature as it exists is not very polished, but here’s an example of a function that returns a generator:

#![feature(generators, generator_trait, conservative_impl_trait)]

use std::ops::Generator;

pub fn up_to(limit: u64) -> impl Generator<Yield = u64, Return = u64> {
    move || {
	for x in 0..limit {
	     yield x;
	}
	return limit;
    }
}

This generator will yield every number from 0 to the limit (exclusive), and then return the limit itself.

Generators are a very powerful language extension with two important use cases:

  • Iterators: A generator which yields T and returns () can be treated as an Iterator of T.
  • Futures: A generator which yields () and returns Result<T, E> can be treated as a Future of T and E.

As a result of this, async functions & their await syntax can be implemented as sugar on top of the generator feature.

The limitation of borowing

The problem with generators appears quickly when you attempt to use them with references. Here is a small, toy example:

#![feature(generators)]

fn main() {
    || {
        let x: u64 = 1;
        let ref_x: &u64 = &x;
        yield 0;
        yield *ref_x;
    }
}

If you attempt to compile this, you’ll get an error:

error[E0626]: borrow may still be in use when generator yields

The problem is that borrows cannot be allowed across yield points. What this means for async functions is that you can never have a borrow for which the lifetime includes an await. This is a very serious limitation for this use case, and one that needs to be resolved before we can consider this feature fully implemented.

To understand why we cannot allow borrows across yield points, we need to talk about self-referential structs.

Self-referential structs

A feature of Rust that is often desirable is to have self-referential structs - that is, structures that can contain references to other fields within themselves. Here’s an example (using invalid syntax):

struct FooBar {
    array: [Baz; 16],
    pointer: &'array Baz,
}

This struct contains an array of Baz structs, as well as a pointer to an element of that array.

If we consider how this type will be laid out in memory, we know that it will contain sixteen Baz objects as well as space for a memory address, which will be the address of one of those sixteen objects. The problem occurs when we attempt to move a FooBar - moving will mempcy the entire struct into a new location, invalidating the pointer address, as it will now point to an address within the previous location that the struct was in.

Fully solving this problem is very challenging. There are several factors to consider:

  • What is this lifetime 'array? How do we conceptualize and represent the notion of a lifetime which is tied to a value within the same type?
  • What can we do with array during the lifetime 'array? In this case, absolutely nothing - we cannot move it or mutate it, both could invalidate the pointer field. But there are cases where we can move it (becaues the pointer actually points into the heap) but not mutate it. How do we distinguish those cases?

The connection to generators

This applies to the generator problem because of the compiled representation of a generator. A generator essentially creates an anonymous enum type; each of its variants is a state that it could be in it at a particular yield point. In each of those variants, the generator saves all of the variables that it needs to continue to keep working once it is resumed.

In this way, generators save a minimal representation of their “stack” when they yield. Rather than maintaining an entire statically sized stack for each generator, the generator’s size is the size of the largest amount of state it could need to preserve at any of its yield points. This is also how Futures have been designed to work in Rust, and one of the reasons they lead to better performance and lower memory overhead than systems like green threads.

The problem arises when some of that stack state that you’re preserving is a reference to other items you are preserving from the stack. Since all of this state is stored together in the “generator enum,” this becomes a special case of self-referential structs.

It’s worth noting that this problem manifests itself in the futures ecosystem today, and is one of the major pain points people experience. If you’ve done much with future combinators, you may have experienced it as having to Arc some data that you need to use both before and after an and_then call. This is because the and_then call acts like a yield point, and you cannot have a reference to that data on both sides of the yield point. Instead, you have to store that data in the heap and reference count it.

This Arcing situation is very unergonomic, making code noisy and difficult, and also just feels wrong - like you’re throwing away a lot of the benefits of Rust by just storing everything in reference counted heap allocations. If we could allow references across yield points, we could make the async story feel significantly nicer while also improving performance.

Conclusion

I’ve laid out the problem in the abstract. Next time, I’m going to dive deeper into how the problem impacts generators, narrow the scope of the immediate problem, and talk about some of the solutions we’ve tried so far.