Generators II: The Question Mark Problem

This is my second post on the design of generators. In the first post, I outlined what an MVP of the feature would look like. In this post, I want to take a look at the first design issue for the feature: how it integrates with the ? operator.

To explain exactly what I mean, let’s start with a specific motivating example:

// This generator yields the number of alphanumeric characters in every line
// in some io::Read'able data
// exact sign function declaration syntax left unspecified on purpose
|data| {
    let mut buffered_data = BufReader::new(data);
    let mut string = String::new();
    while buffered_data.read_line(&mut string)? != 0 {
        yield string.matches(|c| c.is_alphanumeric()).count();
        string.clear();
    }
}

To reiterate, this generator takes something that can be read from (like a File or stdin or a TCP connection) and yields repeatedly the number of alphanumeric characters in each line of that readable source. This generator should, as easily as possible, be treatable as an iterator.

The problem is ?. Because this performs IO, this has to deal with io::Error, which it does by using ? in this expression:

buffered_data.read_line(&mut string)? != 0

Under the most straightforward interpretation of both ? and the generator feature, this generator would yield usize and return io::Result<()> (and in fact would need a terminal Ok(()) expression at the end of it). But we want to be able to make this into an Iterator with an Item of io::Result<usize>. This is how iterators expression their fallibility: they yield an error and then stop yielding; they don’t have the ability to have a “terminal” result value.

The problem becomes an acute problem in that the sort of obvious transform through trait impls is, unfortunately, incoherent:

impl<G> Iterator for G where
    G: Generator<Return = ()>
{
    type Item = G::Yield;
}

impl<G, E> Iterator for G where
    G: Generator<Return = Result<(), E>>
{
    type Item = Result<G::Yield, E>;
}

These are both blanket impls for generators, so they are considered by rustc to overlap. I know of three potential solutions.

Solution 1: Function adapters

The first solution is to not use blanket impls at all, and instead - in at least one of these cases - have to write some function call around your generator to get an iterator out of it:

mod iter {
    fn gen<G>() -> impl Iterator<Item = G::Yield> where
        G: Generator<Return = ()>
    { ... }

    fn try_gen<G, E>() impl Iterator<Item = Result<G::Yield, E> where
        G: Generator<Return = Result<(), E>>
    { ... }
}

This would be a rather unpleasant outcome, in my opinion. You’d have to write this very unnecessary seeming iter::try_gen(generator()) wrapper every time you tried to iterate through a generator. Hopefully, we can do better.

Solution 2: Wait for chalk to solve this

The funny thing about this incoherence is that the overlapping set can be trivially proven to contain no actual values (in other words, there is not, in fact, any overlap at all).

Generators can have only one return type, and that type cannot be both () and Result<(), E>. This is just a classic case of a well-known problem with the current implementation of coherence: our compiler cannot reason from disjoint associated types that two trait bounds must also disjoint.

However, this is exactly the sort of extension to the language that chalk would very easily be able to support. So one option, depending on how chalk integration goes, is enhancing our coherence rules to support these kinds of impls, a feature that we’ve already talked about.

(Note that supporting impls which return not just result but any arbitrary type that implements the Try trait is also possible, but would require lattice specialization, a different extension which is also possible under chalk).

Solution 3: Don’t have a return type

This solution, which is a bit more radical, was proposed by Josh Triplett at the Rust All Hands. First, we would disallow generators from having a return type at all. Instead, their return type would always be (). Second, we would change the desugaring of ? in a generator to yield the error type and then, after the yield statement, early return (with no value, because generators do not return values).

In some ways, this solution is simpler: mainly we don’t have to worry about supporting generators which return different types, either in cases like this or in the syntax. However, it has a few downsides:

  • Often - in the non-MVP cases - users do want to return something other than (). For example, the async/await proc macros absolutely wanted this. If we don’t allow that, they will have to build their own enum into their state machine, wrapping yields appropriately, and there will be an extra variant on their return value & state machines for the “actually returned” case, costing them some runtime overhead (though how important this is I have no idea).
  • It’s less orthogonal in some meaningful senses. Generators still do return after all, and can support return expressions, but now in the generator context the return expression can only take an expression of type () (or maybe can’t, syntactically, take an expression at all?). The desugaring of ? also becomes context-dependent.

Conclusion

I think having no return type on generators has some nice advantages (it certainly isn’t obvious to most users on encountering the feature that they can also return things from generators as well as yield things), though I also think the ways it is non-orthogonal present a problem. I’m currently leaning away from that solution, but I’m not completely sold one way or the other.

However, I think that if we do have two separate return types, that necessitates taking solution 2 over solution 1. In other words, generators with a separate return and yield type, as a feature, requires that we make trait bonds with disjoint associated types into disjoint bounds. Depending on the speed with which chalk progresses (and the speed with which generators progress) this might not be such a big problem. We’ll see!

Not sure what I’ll cover in the next post in this series: possibly the requirements on the syntax for generators, possibly generators that are Unpin. It depends on what I manage to work on next.