Propane: an experimental generator syntax for Rust
I’ve just released a new crate called propane, which is a library for writing generator functions. It can only run on nightly:
#![feature(generators, generator_trait, try_trait)]

#[propane::generator]
fn fizz_buzz() -> String {
    for x in 1..101 {
        match (x % 3 == 0, x % 5 == 0) {
            (true, true) => yield String::from("FizzBuzz"),
            (true, false) => yield String::from("Fizz"),
            (false, true) => yield String::from("Buzz"),
            (..) => yield x.to_string(),
        }
    }
}
Note that I wrote this library in an afternoon (cribbing from work I’d previously done in fehler and futures-await). It’s very likely that the macro barfs on a lot of input, some exotic and some not. It currently only works at all on free functions. And there are a lot of features to be desired, such as support for async generators. But if you’d like to try out writing generator functions: please go ahead!
The goal of writing this library is to test whether a certain point in the design space is an effective solution. There are two important design decisions I’ve made in implementing propane that I want to test.
Iterators, return types, and the ? operator
Functions constructed using the generator attribute evaluate to an impl Iterator type. This allows them to be neatly used with all the iterator adapters. It also means they do not have a separate “return type” that’s distinct from their “yield type” - they simply yield many values, and then eventually yield None and terminate.
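To make the “it’s just an iterator” point concrete, here is a rough stable-Rust stand-in for the fizz_buzz generator at the top of this post. It is not what the propane macro expands to; it is only a hand-written function with the same item type, and the helper fizz_count is a name I made up for illustration.

```rust
// Stable-Rust stand-in for the generator: same items, written as an ordinary
// function returning `impl Iterator`. This is NOT propane's actual expansion.
fn fizz_buzz() -> impl Iterator<Item = String> {
    (1..101).map(|x| match (x % 3 == 0, x % 5 == 0) {
        (true, true) => String::from("FizzBuzz"),
        (true, false) => String::from("Fizz"),
        (false, true) => String::from("Buzz"),
        (false, false) => x.to_string(),
    })
}

// Hypothetical helper: because the result is a plain iterator, the usual
// adapters (`filter`, `count`, `take`, ...) compose with it directly.
fn fizz_count() -> usize {
    fizz_buzz().filter(|s| s.as_str() == "Fizz").count()
}
```

There is no separate return value to collect at the end: once the range is exhausted, next simply gives back None.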
This is because I want to draw a distinction between generators (functions that yield many values) and a general-purpose coroutine mechanism. I don’t think the need for the latter is best met by making generators into a coroutine Swiss Army knife.
Inside the body of a generator, you can still use the return keyword to halt the function, but it cannot take an expression. The “return” type of a generator must be ().
However, this becomes complicated by the behavior of the ? operator. In general, we would like the ? operator to work inside generators. And so it does work inside propane generators, but with a slightly different desugaring than in normal functions. When the ? operator encounters an error case, instead of “returning” the error, it yields the error, and then terminates the function on the next call. So a generator which yields Results will, on an error case, yield the error and then terminate, just like iterators of Results often do.
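The observable behavior can be sketched by hand on stable Rust. The function below is my own illustration (the name parse_all and the parsing task are assumptions, and this is not propane’s desugaring): it yields Ok values until a parse fails, yields that one Err, and then reports None on the following call.

```rust
use std::num::ParseIntError;

// Hand-written iterator that mimics the described behavior of `?` in a
// generator: yield the error once, then terminate on the next call.
fn parse_all<'a>(
    inputs: &'a [&'a str],
) -> impl Iterator<Item = Result<i32, ParseIntError>> + 'a {
    let mut remaining = inputs.iter();
    let mut failed = false;
    std::iter::from_fn(move || {
        if failed {
            // The call after an error ends the iteration.
            return None;
        }
        let s = match remaining.next() {
            Some(s) => s,
            None => return None, // out of input: normal termination
        };
        match s.parse::<i32>() {
            Ok(n) => Some(Ok(n)),
            Err(e) => {
                failed = true;
                Some(Err(e)) // the error is yielded rather than returned
            }
        }
    })
}
```

Given the inputs "1", "2", "x", "4", a consumer sees Ok(1), Ok(2), one Err, and then the end of the iterator; the "4" is never parsed.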
Self-referential generators and the Iterator interface
There is a problem lurking in the design of generators which we have not solved. The problem is this: Iterator::next does not take a pinned reference to self. This means it is safe to call next, move the iterator to a new location, and then call next again. Users who followed the development of async/await will likely understand the implication: it is not safe for an iterator to contain self-references into its stack state.
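Here is a small stable-Rust illustration of that sequence. The function name next_move_next is made up for this sketch; the point is only that safe code may relocate an iterator between two calls to next.

```rust
// Calls `next`, moves the iterator to a new address, then calls `next` again.
// All of this is safe Rust, because `Iterator::next` takes `&mut self` and
// makes no promise that the iterator stays where it was.
fn next_move_next<I: Iterator>(mut iter: I) -> (Option<I::Item>, Option<I::Item>) {
    let first = iter.next();
    // Moving the value into a `Box` copies its state to the heap. Any pointer
    // into the old location would now dangle, which is why an iterator's
    // state cannot soundly hold references into itself.
    let mut moved = Box::new(iter);
    let second = moved.next();
    (first, second)
}
```

For an ordinary iterator such as 1..4 this is harmless and produces the first two items; it would only be a problem if the iterator’s state pointed into itself.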
Supporting self-references was essential to making async/await ergonomic enough for general use. But in order to do that in a zero-cost manner, we had to include pinning in the interface of Future::poll. Because the interface of Iterator::next is already stable (and has been since 1.0), there is no way to change that interface. So how will we bridge the gap? If we allow self-referential iterator generators, we need some way to pin them in place before next gets called in order for this to be sound. The worst solution, but by far the clearest to implement, is to heap allocate every generator function’s state.
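A minimal sketch of what the heap-allocating approach could look like, on stable Rust. Everything named here is hypothetical: PinnedGenerator is a trait I invented to stand in for “an iterator whose step function takes a pinned reference”, Boxed is an invented adapter, and Countdown is a toy state machine that does not actually contain self-references.

```rust
use std::pin::Pin;

/// Hypothetical trait (not in std or propane): like `Iterator`, except that
/// `resume` takes a pinned reference, the way `Future::poll` does.
trait PinnedGenerator {
    type Item;
    fn resume(self: Pin<&mut Self>) -> Option<Self::Item>;
}

/// Hypothetical adapter that pays one heap allocation to pin the state.
struct Boxed<G>(Pin<Box<G>>);

impl<G: PinnedGenerator> Boxed<G> {
    fn new(generator: G) -> Self {
        Boxed(Box::pin(generator))
    }
}

impl<G: PinnedGenerator> Iterator for Boxed<G> {
    type Item = G::Item;

    fn next(&mut self) -> Option<G::Item> {
        // Moving a `Boxed` moves only the pointer; the state on the heap
        // keeps its address, so pinned access stays valid.
        self.0.as_mut().resume()
    }
}

/// Toy state for demonstration. It holds no self-references; it stands in
/// for compiler-generated generator state that might.
struct Countdown(u32);

impl PinnedGenerator for Countdown {
    type Item = u32;

    fn resume(self: Pin<&mut Self>) -> Option<u32> {
        let this = self.get_mut(); // allowed here because `Countdown: Unpin`
        if this.0 == 0 {
            return None;
        }
        this.0 -= 1;
        Some(this.0 + 1)
    }
}
```

With this, Boxed::new(Countdown(3)) is an ordinary Iterator that can be moved freely between calls to next, at the cost of an allocation per generator.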
However, there is another easy solution which is zero cost: just disallow self-references in generators, unlike async functions. This is what I’ve implemented in propane. It was fortunately very easy because generators which disallow self-references already exist in nightly Rust: the nightly syntax has two versions of generators, one which can be self-referential and one which can’t. propane uses the latter syntax.
The big question is: would generators be useful & ergonomic without self-references? Your immediate answer might be “no, of course not!” After all, didn’t we do all of this work to allow self-references in async fns because they would have been so unergonomic without them? It’s true, but consider this: before async fn, users using futures combinators were constantly bumping up against the problem of self-references. They would try to use a reference to something in one combinator, and then again in a subsequent combinator. The compiler would give them awful error messages, and then they would have to stick that thing in an Arc<Mutex> to be able to use it in both places, even though the combinators sequence those two places.
This was actually a problem with the lack of support for self-references in combinators. Because futures ultimately get passed to an API like spawn, which requires a 'static future in order to spawn it onto an executor, they cannot contain non-self references: all of their state has to be self-contained. So users encountered the problem immediately. However, users using iterator combinators basically never run into this problem. This is because the ways iterators are usually consumed (for loops or methods like collect) don’t require the iterator to contain all of its state: iterators are allowed to be non-'static (and usually are).
And so I have a hypothesis: maybe you don’t actually need self-referential generators. Maybe users can get by with having any referenced state exist outside of the generator, passed into it as an argument. Maybe this restriction is just fine for generators, sidestepping the problem with the Iterator interface completely.
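As a rough illustration of that pattern on stable Rust (the function long_words and its parameters are invented for this sketch, and it uses plain iterator adapters rather than propane): the text being referenced is owned by the caller and passed in by reference, so the returned iterator borrows from outside itself instead of pointing into its own state.

```rust
// The borrowed `text` lives in the caller. The returned iterator is
// non-'static (it is tied to the lifetime 'a), but it contains no
// references into its own state, so moving it between calls is harmless.
fn long_words<'a>(text: &'a str, min_len: usize) -> impl Iterator<Item = &'a str> + 'a {
    text.split_whitespace().filter(move |w| w.len() >= min_len)
}
```

A caller keeps the String in a local variable, calls long_words(&text, 5), and consumes the result with a for loop or collect. That works because those consumers do not demand a 'static iterator the way spawn demands a 'static future.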
I genuinely don’t know if this is the case; that’s why it’s an experiment. Maybe users will immediately run into annoying problems where they have to contort themselves to avoid self-referential generators. In that case, this experiment may be a failure, and we’ll have to come up with a better solution. Let’s find out!