Async/Await III: Moving Forward with Something Shippable

February 4, 2018

In the first post, we looked at the relationship between generators and a more general notion of self-references. In the second post, we narrowed down exactly what problem we need to solve to make generators work, and talked about some solutions that we’ve considered but don’t feel like we could ship in the near future.

In the original post, I promised that I would have a near term solution by the end of this series. In this post, I reveal the trick: I do not have some groundbreaking, never before seen solution to the problem of immovability. What I have is a proposal to use the powers we’ve had inside us all along to rise to this new challenge.

The true costs of immovability are not high

The whole problem with self-referential generators is that you can’t move them while you’re running them. This is necessarily a restriction on how you can use the generators, but its worth pointing out that most of the ways you would want to use a generator, this isn’t actually a problem.

First, I’d like to make one important distinction: this immovability only matters once you start calling resume on the generator. In the case of iterators, this corresponds to the next method, whereas for futures, it corresponds to poll. Until you start iterating or polling, your generator can be moved as much as you like, because it hasn’t started running yet.

This means that calling combinators (which usually consume self by value) does not invalidate a self-referential generator. You can make a generator, then map over it, filter it, and so on, and it will be fine. You just can’t use a combinator after you’ve started calling next or poll. Having said that, let’s consider how iterators and futures are usually used in practice.

In most cases, iteration occurs in one of two ways:

You use the iterator with a for loop. This consumes the iterator, and never moves it while iterating.
You use the iterator with one of its consuming combinators, like collect or all. These combinators don’t need to move the iterator while consuming it either.

There are exceptions to this of course - maybe the first value of the iterator is special, so you call .next to get that, and then .collect to get the rest of the values. But in general, you don’t need to move the iterator between calls to .next.

Futures are more interesting. Normally, futures are put onto an event loop, over which they are evaluated. Because the event loop can handle many different kinds of futures, internally the event loop will tend to hold a collection of Box<Future>.

Boxing a generator before you start resuming it has a very important impact - once boxed, the generator is stored in the heap. What you move around is not the generator itself, but the pointer to it. This means that you can move a Box<Future> around as much as you like, even if it is self-referential, and it won’t invalidate that future at all (you just can’t move the future out of the box.)

So when we look at the restriction of immovability, we find out that by convention and API design, you are already very unlikely to want to move a self-referential generator in an invalid way:

While you’re building up an iterator or future using combinators, you can move as much as you like.
Once you’ve started processing your generator, its not normal to want to move it all.
If you do need to be able to move it, putting it into the heap is a fine escape hatch.

Its important to remember also that the restrictions of immovability are fundamental. There is no proposal which would allow you to call next on a self-referential iterator, move it somewhere else, and call next again. What makes this whole idea work is that you rarely have a need to do it, and if you do, you can box it up to let you write that code with the performance cost that it necessarily entails.

Could I have a minute of your time to talk about `unsafe`?

One more digression before we get to the proposal.

I want to bring you back to 2014, when Rust was being shaped into the language that it is now. It was realized around this time that the core concept of ownership and borrowing was expressive enough to program without garbage collection.

There was a problem, though: what do we do about concurrency? Throughout Rust’s pre-development, various concepts had been built into the language to deal with concurrent code: Send and Sync were built into the language for a long time, and for a while it had built-in message passing primitives, greenthreads and so on. But it was realized, brilliantly, that none of this complexity in the language was necessary. Using bits of unsafe code, library authors could extend the core concepts of the language to eliminate data races.

It become a point of pride that despite having this data race freedom, the core language had no notion of concurrency. Using unsafe code, the standard library had been able to create a basis for constructing any kind of concurrency primitive while avoiding data races. Nowadays, the standard library provides Send and Sync as the core unsafe “marker traits” of concurrency, as well as several concurrency primitives (Mutex, RwLock, channels, etc). Other libraries, like crossbeam and parking_lot provide their own primitives with different trade offs.

We’ve similarly applied this concept to problems like splitting up contiguous collections (slice’s split_at_mut) and abstracting over OS resources like files and networking primitives. With a bit of carefully written unsafe at the foundation, users can build castles of safe code on top and know that their invariants will hold.

Using unsafe to solve immovable generators

Considering the above comments and the previous posts, here are the facts as they stand today:

We want to have generators and async/await in the language as soon as possible. Preferably on stable by the end of 2018.
References in generators are a subset of the self-referential struct problem, a problem which is large, important, and difficult.
We are unlikely to have a type system based solution to self-referential structs any time in the near future (even 2 or 3 years from now seems very optimistic).
By convention, users are unlikely to run into a situation in which they would even attempt to violate the immovability property.
We can use unsafe to extend the type system through libraries to support constraints we don’t have types for.

Laid out like this, a potential path forward seems apparent: can we support self-referential generators in the near term using unsafe code, eventually replacing that unsafe code with whatever general solution to self-references appears in the future?

As we discussed previously, fundamentally there are two kinds of generators: generators that can borrow across yields and generators that can be moved. So we would propose to have two traits, one for each of these:

trait Generator {
    type Yield;
    type Return;

    // Users *must not* move `self` between the first time they call this
    // method until it returns the Self::Return value.
    unsafe fn unsafe_resume(&mut self) -> CoResult<Self::Yield, Self::Return>;
}

// GeneratorMove extends Generator by supporting moving between calls to
// resume.
//
// All generators that cannot borrow across yield points implement
// GeneratorMove.
trait GeneratorMove: Generator {
    fn resume(&mut self) -> CoResult<Self::Yield, Self::Return>;
}

Every generator implements Generator, and those which have been properly annotated as movable generators implement GeneratorMove as well. We will build abstractions around the use cases for self-referential generators so that in general users will not have to call unsafe_resume directly. Unless you’re implementing a foundational library, you should never need to write unsafe code.

We’ll keep it unstable to implement either of these traits (just like the Fn traits have been kept), and some day, when we have self-referential structs, we’ll deprecate the unsafe API on Generator and transition to a new and completely type-checked API. Until that time, we’ll encourage users to interact with generators through these safe abstractions.

Bridging a `Generator` to a `GeneratorMove`

The easiest and most general way to make a self-referential generator safe to call is to heap allocate it. However, just putting it in a Box is not enough, because you can trivially move a value out of a Box. We need to make sure that it can not be safely removed from wherever it is put.

Even Rc and Arc have this problem, because they support get_mut, which returns an Option<&mut T>. You can move something out of an &mut T in safe code using mem::replace. So we need a new abstraction which says: it is not safe to take an &mut reference or to move out of this.

Fortunately, such an abstraction is not very many lines of code. I would propose to call these abstractions “anchors.” The code looks like this:

pub struct Anchor<T> {
    data: T,
}

impl<T> Anchor<T> {
    fn new(data: T) -> Anchor<T> {
        Anchor { data }
    }

    // By marking this (otherwise safe) function `unsafe`, we make it possible
    // to get mutable access to the data, but only in `unsafe` code.
    pub unsafe fn get_mut(this: &mut Anchor<T>) -> &mut T {
	 &mut this.data
    }
}

// Anchor will deref to a shared reference in safe code just fine, you can't
// move out of that.
impl<T> Deref for Anchor<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.data
    }
}

Now that we have an anchor, we can safely provide an implementation of GeneratorMove for an anchored Generator:

impl<G: Generator + ?Sized> GeneratorMove for Anchor<Box<G>> {
    fn resume(&mut self) -> CoResult<Self::Yield, Self::Return> {
        unsafe {
             Anchor::get_mut(self).resume()
        }
    }
}

// And the same for Anchor<Rc<G>> and Anchor<Arc<G>>

Note that unlike Cell, Anchor goes around the outside of the pointer type to guarantee a stable address; otherwise you could write code like this:

// Call resume once at one address.
let x: Box<Anchor<_>> = make_generator();
x.resume();

// Move the anchor out of the box.
let anchor: Anchor<_> = *x;

// Move the anchor into a new box at a different address.
let evil = Box::new(anchor);

// Call resume again.
evil.resume();

Prior art

It’s worth noting at this point that (uncoordinated with this blog series) a PR was merged into rustc to allow you to create self-referential generators. There are some choices in this PR that I think we’ll probably want to change, but it has fundamentally the same shape as what I’m describing here:

This PR uses an annotation to the distinguish the two kinds of generators - in this case, static is used to anotate generators that are self-referential.
This PR makes dealing with self-referential generators unsafe. In the PR, its unsafe to create a self-referential generator; this seems mistaken: creating the generator is not where the invariant needs to be upheld, resuming it is.

Finally, we should keep in mind that C++ has also been experimenting with having generators and async/await. Their solution has been to “anchor” every generator by heap allocating them every time. What’s exciting about generators in Rust is that we will be able to limit that to the cases where its actually necessary, because we have the ability to distinguish whether or not a generator references itself.

Conclusion

The APIs in this post, or something very close to them, provides a reasonable basis for moving forward with self-referential generators. There is still one question that I haven’t addressed: bridging this API to async/await and futures. We certainly don’t want to have to “anchor” every time we await an async fn.

The truth is that there’s actually a variety of ways this bridging could work - over the past month, we’ve thrown around maybe a half dozen different designs. In my next and final post, I’ll present what is currently my prefered solution.