Async/Await III: Moving Forward with Something Shippable
In the first post, we looked at the relationship between generators and a more general notion of self-references. In the second post, we narrowed down exactly what problem we need to solve to make generators work, and talked about some solutions that we’ve considered but don’t feel like we could ship in the near future.
In the original post, I promised that I would have a near term solution by the end of this series. In this post, I reveal the trick: I do not have some groundbreaking, never before seen solution to the problem of immovability. What I have is a proposal to use the powers we’ve had inside us all along to rise to this new challenge.
The true costs of immovability are not high
The whole problem with self-referential generators is that you can’t move them while you’re running them. This is necessarily a restriction on how you can use the generators, but its worth pointing out that most of the ways you would want to use a generator, this isn’t actually a problem.
First, I’d like to make one important distinction: this immovability only
matters once you start calling resume
on the generator. In the case of
iterators, this corresponds to the next
method, whereas for futures, it
corresponds to poll
. Until you start iterating or polling, your generator can
be moved as much as you like, because it hasn’t started running yet.
This means that calling combinators (which usually consume self
by value)
does not invalidate a self-referential generator. You can make a generator,
then map over it, filter it, and so on, and it will be fine. You just can’t use
a combinator after you’ve started calling next
or poll
. Having said that,
let’s consider how iterators and futures are usually used in practice.
In most cases, iteration occurs in one of two ways:
- You use the iterator with a
for
loop. This consumes the iterator, and never moves it while iterating. - You use the iterator with one of its consuming combinators, like
collect
orall
. These combinators don’t need to move the iterator while consuming it either.
There are exceptions to this of course - maybe the first value of the iterator
is special, so you call .next
to get that, and then .collect
to get the
rest of the values. But in general, you don’t need to move the iterator between
calls to .next
.
Futures are more interesting. Normally, futures are put onto an event loop,
over which they are evaluated. Because the event loop can handle many different
kinds of futures, internally the event loop will tend to hold a collection of
Box<Future>
.
Boxing a generator before you start resuming it has a very important impact -
once boxed, the generator is stored in the heap. What you move around is not
the generator itself, but the pointer to it. This means that you can move a
Box<Future>
around as much as you like, even if it is self-referential, and
it won’t invalidate that future at all (you just can’t move the future out
of the box.)
So when we look at the restriction of immovability, we find out that by convention and API design, you are already very unlikely to want to move a self-referential generator in an invalid way:
- While you’re building up an iterator or future using combinators, you can move as much as you like.
- Once you’ve started processing your generator, its not normal to want to move it all.
- If you do need to be able to move it, putting it into the heap is a fine escape hatch.
Its important to remember also that the restrictions of immovability are
fundamental. There is no proposal which would allow you to call next
on a
self-referential iterator, move it somewhere else, and call next
again. What
makes this whole idea work is that you rarely have a need to do it, and if you
do, you can box it up to let you write that code with the performance cost that
it necessarily entails.
Could I have a minute of your time to talk about unsafe
?
One more digression before we get to the proposal.
I want to bring you back to 2014, when Rust was being shaped into the language that it is now. It was realized around this time that the core concept of ownership and borrowing was expressive enough to program without garbage collection.
There was a problem, though: what do we do about concurrency? Throughout Rust’s
pre-development, various concepts had been built into the language to deal with
concurrent code: Send
and Sync
were built into the language for a long
time, and for a while it had built-in message passing primitives, greenthreads
and so on. But it was realized, brilliantly, that none of this complexity in
the language was necessary. Using bits of unsafe code, library authors could
extend the core concepts of the language to eliminate data races.
It become a point of pride that despite having this data race freedom, the core
language had no notion of concurrency. Using unsafe code, the standard library
had been able to create a basis for constructing any kind of concurrency
primitive while avoiding data races. Nowadays, the standard library provides
Send
and Sync
as the core unsafe “marker traits” of concurrency, as well as
several concurrency primitives (Mutex, RwLock, channels, etc). Other libraries,
like crossbeam and parking_lot provide their own primitives with different
trade offs.
We’ve similarly applied this concept to problems like splitting up contiguous
collections (slice’s split_at_mut
) and abstracting over OS resources like
files and networking primitives. With a bit of carefully written unsafe at the
foundation, users can build castles of safe code on top and know that their
invariants will hold.
Using unsafe to solve immovable generators
Considering the above comments and the previous posts, here are the facts as they stand today:
- We want to have generators and async/await in the language as soon as possible. Preferably on stable by the end of 2018.
- References in generators are a subset of the self-referential struct problem, a problem which is large, important, and difficult.
- We are unlikely to have a type system based solution to self-referential structs any time in the near future (even 2 or 3 years from now seems very optimistic).
- By convention, users are unlikely to run into a situation in which they would even attempt to violate the immovability property.
- We can use
unsafe
to extend the type system through libraries to support constraints we don’t have types for.
Laid out like this, a potential path forward seems apparent: can we support
self-referential generators in the near term using unsafe
code, eventually
replacing that unsafe
code with whatever general solution to self-references
appears in the future?
As we discussed previously, fundamentally there are two kinds of generators: generators that can borrow across yields and generators that can be moved. So we would propose to have two traits, one for each of these:
trait Generator {
type Yield;
type Return;
// Users *must not* move `self` between the first time they call this
// method until it returns the Self::Return value.
unsafe fn unsafe_resume(&mut self) -> CoResult<Self::Yield, Self::Return>;
}
// GeneratorMove extends Generator by supporting moving between calls to
// resume.
//
// All generators that cannot borrow across yield points implement
// GeneratorMove.
trait GeneratorMove: Generator {
fn resume(&mut self) -> CoResult<Self::Yield, Self::Return>;
}
Every generator implements Generator
, and those which have been properly
annotated as movable generators implement GeneratorMove
as well. We will
build abstractions around the use cases for self-referential generators so that
in general users will not have to call unsafe_resume
directly. Unless
you’re implementing a foundational library, you should never need to write
unsafe code.
We’ll keep it unstable to implement either of these traits (just like the Fn
traits have been kept), and some day, when we have self-referential structs,
we’ll deprecate the unsafe API on Generator
and transition to a new and
completely type-checked API. Until that time, we’ll encourage users to
interact with generators through these safe abstractions.
Bridging a Generator
to a GeneratorMove
The easiest and most general way to make a self-referential generator safe to
call is to heap allocate it. However, just putting it in a Box
is not enough,
because you can trivially move a value out of a Box
. We need to make sure
that it can not be safely removed from wherever it is put.
Even Rc
and Arc
have this problem, because they support get_mut
, which
returns an Option<&mut T>
. You can move something out of an &mut T
in safe
code using mem::replace
. So we need a new abstraction which says: it is not
safe to take an &mut
reference or to move out of this.
Fortunately, such an abstraction is not very many lines of code. I would propose to call these abstractions “anchors.” The code looks like this:
pub struct Anchor<T> {
data: T,
}
impl<T> Anchor<T> {
fn new(data: T) -> Anchor<T> {
Anchor { data }
}
// By marking this (otherwise safe) function `unsafe`, we make it possible
// to get mutable access to the data, but only in `unsafe` code.
pub unsafe fn get_mut(this: &mut Anchor<T>) -> &mut T {
&mut this.data
}
}
// Anchor will deref to a shared reference in safe code just fine, you can't
// move out of that.
impl<T> Deref for Anchor<T> {
type Target = T;
fn deref(&self) -> &T {
&self.data
}
}
Now that we have an anchor, we can safely provide an implementation of
GeneratorMove
for an anchored Generator
:
impl<G: Generator + ?Sized> GeneratorMove for Anchor<Box<G>> {
fn resume(&mut self) -> CoResult<Self::Yield, Self::Return> {
unsafe {
Anchor::get_mut(self).resume()
}
}
}
// And the same for Anchor<Rc<G>> and Anchor<Arc<G>>
Note that unlike Cell
, Anchor
goes around the outside of the pointer
type to guarantee a stable address; otherwise you could write code like this:
// Call resume once at one address.
let x: Box<Anchor<_>> = make_generator();
x.resume();
// Move the anchor out of the box.
let anchor: Anchor<_> = *x;
// Move the anchor into a new box at a different address.
let evil = Box::new(anchor);
// Call resume again.
evil.resume();
Prior art
It’s worth noting at this point that (uncoordinated with this blog series) a PR was merged into rustc to allow you to create self-referential generators. There are some choices in this PR that I think we’ll probably want to change, but it has fundamentally the same shape as what I’m describing here:
- This PR uses an annotation to the distinguish the two kinds of generators -
in this case,
static
is used to anotate generators that are self-referential. - This PR makes dealing with self-referential generators
unsafe
. In the PR, itsunsafe
to create a self-referential generator; this seems mistaken: creating the generator is not where the invariant needs to be upheld, resuming it is.
Finally, we should keep in mind that C++ has also been experimenting with having generators and async/await. Their solution has been to “anchor” every generator by heap allocating them every time. What’s exciting about generators in Rust is that we will be able to limit that to the cases where its actually necessary, because we have the ability to distinguish whether or not a generator references itself.
Conclusion
The APIs in this post, or something very close to them, provides a reasonable
basis for moving forward with self-referential generators. There is still one
question that I haven’t addressed: bridging this API to async/await and
futures. We certainly don’t want to have to “anchor” every time we await
an
async fn.
The truth is that there’s actually a variety of ways this bridging could work - over the past month, we’ve thrown around maybe a half dozen different designs. In my next and final post, I’ll present what is currently my prefered solution.