Iterator, Generator

I have been devoting a lot of my free time in the past month to thinking about structured concurrency, and a blog post about that is coming soon, but first I want to revisit iterators and generators.

In a previous post, I wrote about one of the hardest problems for generators: self-referential generators. Unlike the Future trait when we were designing async functions, the Iterator trait is already stable, and it does not take a pinned reference to itself. This means an Iterator cannot be self-referential.

In my previous post, I said that I think Iterator should have taken a pinned reference, and I listed 3 options as to what to do about the fact that it doesn’t:

  • Disallow self-referential generators.
  • Have generators return something other than Iterator, and require pinning to use generators with for loops.
  • Deprecate Iterator and move the ecosystem to a new trait.

I wrote then that I thought the first of these options was the best option, and I still lean that way. Mainly, I think that it is non-disruptive, immediately shippable, and, if it proves inadequate, forward compatible with “fixing” the problem (e.g. by adding a new modifier for self-referential generators). But I want to explore the alternatives more thoroughly, particularly the third one.

So I want to ask the question: what would it look like to shift the ecosystem from Iterator to a new trait? For the sake of this post, let’s say the new trait is called Generator. It has the same interface as Iterator, except it is pinned:

trait Generator {
    type Item;
    fn next(self: Pin<&mut Self>) -> Option<Self::Item>;
}

Similarly, there would need to be an equivalent of IntoIterator, let’s call it IntoGenerator:

trait IntoGenerator {
    type Item;
    type IntoGen: Generator<Item = Self::Item>;
    fn into_gen(self) -> Self::IntoGen;
}

And based on these two, the desugaring of for loops would change so that after it constructs the iterator/generator, it pins it before the loop begins. All of this works out great; the problems emerge when you try to make it backwards compatible with everything that already exists.
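As a rough sketch of what that desugaring might expand to, here is a hand-written equivalent of a for loop over an IntoGenerator. The traits are local copies of the ones defined above; Countdown and collect_with_desugared_for are hypothetical names used only for illustration, and the real desugaring would be compiler-internal and might differ in detail.

```rust
use std::pin::{pin, Pin};

trait Generator {
    type Item;
    fn next(self: Pin<&mut Self>) -> Option<Self::Item>;
}

trait IntoGenerator {
    type Item;
    type IntoGen: Generator<Item = Self::Item>;
    fn into_gen(self) -> Self::IntoGen;
}

// Every generator is trivially an IntoGenerator, mirroring the
// existing `impl<I: Iterator> IntoIterator for I`.
impl<G: Generator> IntoGenerator for G {
    type Item = G::Item;
    type IntoGen = G;
    fn into_gen(self) -> G {
        self
    }
}

// Hypothetical example generator. It is `Unpin`, so the safe
// `Pin::get_mut` is enough to reach its state.
struct Countdown(u32);

impl Generator for Countdown {
    type Item = u32;
    fn next(self: Pin<&mut Self>) -> Option<u32> {
        let this = self.get_mut();
        if this.0 == 0 {
            return None;
        }
        let current = this.0;
        this.0 -= 1;
        Some(current)
    }
}

// Hand-written stand-in for `for item in source { out.push(item) }`.
fn collect_with_desugared_for<T: IntoGenerator>(source: T) -> Vec<T::Item> {
    let mut out = Vec::new();
    // Construct the generator, then pin it to the stack before the loop begins.
    let mut generator = pin!(source.into_gen());
    // `as_mut` reborrows the pinned reference for each call to `next`.
    while let Some(item) = Generator::next(generator.as_mut()) {
        out.push(item);
    }
    out
}
```

Calling collect_with_desugared_for(Countdown(3)) walks the generator to completion without ever moving it after the pin! call, which is the property a self-referential generator would rely on.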

Bridge impls & coherence

It would be necessary to add bridge impls, so all iterators are also now generators. Similarly, all IntoIterators would need to also be IntoGenerators.

The problem is that all Iterators are also IntoIterators, and presumably all Generators would also be IntoGenerators (for the same reason that impl exists today: so that you can pass both Generators and IntoGenerators to for loops). And this creates a basic diamond incoherence problem:

impl<T: Iterator> IntoIterator for T {}
impl<T: Iterator> Generator for T {}
impl<T: IntoIterator> IntoGenerator for T {}
impl<T: Generator> IntoGenerator for T {}

By which of these two paths does Iterator implement IntoGenerator?

                ┌──────────────────┐                 
                │                  │▒▒               
          ┌─────│     Iterator     │─────┐           
          │     │                  │▒▒   │           
          │     └──────────────────┘▒▒   │           
          ▼       ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒   ▼           
┌──────────────────┐           ┌──────────────────┐  
│                  │▒▒         │                  │▒▒
│   IntoIterator   │▒▒         │    Generator     │▒▒
│                  │▒▒         │                  │▒▒
└──────────────────┘▒▒         └──────────────────┘▒▒
  ▒▒▒▒▒▒▒▒│▒▒▒▒▒▒▒▒▒▒▒           ▒▒▒▒▒▒▒▒│▒▒▒▒▒▒▒▒▒▒▒
          │     ┌──────────────────┐     │           
          │     │                  │▒▒   │           
          └────▶│  IntoGenerator   │◀────┘           
                │                  │▒▒               
                └──────────────────┘▒▒               
                  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒

Fortunately, the impls result in the same code generation, but somehow this would have to be solved. Possibly, the compiler could have a special case coherence exception for these impls.

Bridge impls & pinning

Another problem arises from the change to pinning. Let’s actually look at the impl of Generator for Iterator:

impl<I: Iterator> Generator for I {
    type Item = <I as Iterator>::Item;
    fn next(self: Pin<&mut Self>) -> Option<Self::Item> {
        // Is this sound?
        unsafe { Iterator::next(Pin::get_unchecked_mut(self)) }
    }
}

To call Iterator::next, you need an unpinned mutable reference (this is, after all, the whole problem with Iterator). But an Iterator might be !Unpin, so the only way to access that is by unsafely unpinning the reference. Is that code sound?

At first, it seems like the answer should be no. It’s completely conceivable to define an Iterator that moves out of itself in its next method, and also doesn’t implement Unpin, and has other pinned methods that depend on that fact. However, on reflection, I think that any such method would necessarily contain unsafe code, and therefore the Rust team could declare that code to contain the unsoundness.

This would be similar to the situation with Drop: Drop really ought to be pinned as well, but since it wasn’t, what we did was declare that if you move out of drop, it is your responsibility to make sure your type does not also depend on its pinning guarantee. A similar requirement could be imposed if your type implements Iterator.

However, it’s important to note that imposing this requirement after the fact is technically a breaking change, because it would be further restricting the soundness requirements on Pin. I believe any code that is in violation of it would be totally pathological though, and it’s exactly the sort of technically-breaking soundness change that the language team has been comfortable making in the past.

The key difference, I suppose, is that in the past those changes were to fix holes in existing features, whereas this would be changing the rules to support a new feature. The team should be very cautious about how much they move in that direction.
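For contrast with the unsafe bridge impl earlier in this section, here is a sketch of a narrower bridge that needs no unsafe code at all. The extra Unpin bound is an assumption of this sketch, not part of the proposal discussed here: it covers only iterators that can already be safely unpinned, which is exactly why it sidesteps the soundness question, and also why it would not be enough to make all iterators generators. The function first_two_via_generator is a hypothetical helper for illustration.

```rust
use std::pin::Pin;

trait Generator {
    type Item;
    fn next(self: Pin<&mut Self>) -> Option<Self::Item>;
}

// Narrower than the bridge impl in the post: the `Unpin` bound is an
// assumption added for this sketch.
impl<I: Iterator + Unpin> Generator for I {
    type Item = <I as Iterator>::Item;
    fn next(self: Pin<&mut Self>) -> Option<Self::Item> {
        // `Pin::get_mut` is a safe function, but it is only available
        // when the pointee is `Unpin`.
        Iterator::next(Pin::get_mut(self))
    }
}

// Drives an ordinary iterator through the pinned interface.
fn first_two_via_generator() -> (Option<i32>, Option<i32>) {
    let mut iter = vec![10, 20].into_iter();
    // `Pin::new` is likewise safe only because `vec::IntoIter` is `Unpin`.
    let mut pinned = Pin::new(&mut iter);
    let a = Generator::next(pinned.as_mut());
    let b = Generator::next(pinned.as_mut());
    (a, b)
}
```

The gap between this impl and the unsafe one is precisely the set of Iterator types that do not implement Unpin, which is where the new soundness requirement would have to apply.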

FromIterator and Extend

So far we’ve racked up a special case coherence exception and a technically breaking soundness change. Here’s something where I think the team simply cannot make this a completely smooth transition, and libraries will have to be updated by their authors to be compatible with generators.

Both FromIterator and Extend have the IntoIterator interface in their signature. Since generators do not implement Iterator or IntoIterator, there would need to be new traits: let’s call them FromGenerator and Grow. These would be identical to FromIterator and Extend, except they would take IntoGenerator.

The problem is that there can be no blanket impl from FromIterator to FromGenerator, or from Extend to Grow, because there is no way to pass an IntoGenerator to a function expecting an IntoIterator. It’s completely possible that a FromIterator implementation moves the iterator around between calls to next, and therefore would violate the pinning requirement.

I haven’t investigated, but I would be very surprised if any of the types that impl FromIterator and Extend in the standard library can’t also implement the generator equivalents. But any third party library would need to add implementations of the new traits manually to be compatible with generators.

Update

Giacomo Stevananto on Reddit and Steven Portzer on Twitter have both pointed out to me that there is actually a way to write the blanket impls here.

There should probably be a blanket implementation of Iterator for Pin<&mut impl Generator> - though a generator is not an iterator, a pinned reference to a generator is. Using this, the blanket impls of FromGenerator for FromIterator and Grow for Extend could be implemented by first pinning the generator, and then passing it to the interface.
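A minimal sketch of how that could fit together, under some assumptions: the traits are local copies of the ones in this post, the method name from_gen is made up, and Countdown is a hypothetical example generator. Outside the standard library, the orphan rules do not allow implementing Iterator directly for Pin<&mut G>, so this sketch uses a hypothetical wrapper type, PinnedGen, to stand in for that impl.

```rust
use std::pin::{pin, Pin};

trait Generator {
    type Item;
    fn next(self: Pin<&mut Self>) -> Option<Self::Item>;
}

trait IntoGenerator {
    type Item;
    type IntoGen: Generator<Item = Self::Item>;
    fn into_gen(self) -> Self::IntoGen;
}

impl<G: Generator> IntoGenerator for G {
    type Item = G::Item;
    type IntoGen = G;
    fn into_gen(self) -> G {
        self
    }
}

// Stand-in for `impl Iterator for Pin<&mut impl Generator>`: a pinned
// reference to a generator can be moved freely, so it can be an Iterator.
struct PinnedGen<'a, G: Generator>(Pin<&'a mut G>);

impl<'a, G: Generator> Iterator for PinnedGen<'a, G> {
    type Item = G::Item;
    fn next(&mut self) -> Option<G::Item> {
        Generator::next(self.0.as_mut())
    }
}

// Hypothetical generator-based counterpart of FromIterator.
trait FromGenerator<A>: Sized {
    fn from_gen<T: IntoGenerator<Item = A>>(source: T) -> Self;
}

// Blanket impl: anything that is FromIterator is also FromGenerator.
impl<A, C: FromIterator<A>> FromGenerator<A> for C {
    fn from_gen<T: IntoGenerator<Item = A>>(source: T) -> Self {
        // Pin the generator first, then hand the pinned reference to the
        // existing iterator-based interface.
        let pinned = pin!(source.into_gen());
        C::from_iter(PinnedGen(pinned))
    }
}

// Hypothetical example generator, used only to exercise the impls.
struct Countdown(u32);

impl Generator for Countdown {
    type Item = u32;
    fn next(self: Pin<&mut Self>) -> Option<u32> {
        let this = self.get_mut();
        if this.0 == 0 {
            return None;
        }
        let current = this.0;
        this.0 -= 1;
        Some(current)
    }
}

fn collect_countdown(start: u32) -> Vec<u32> {
    <Vec<u32> as FromGenerator<u32>>::from_gen(Countdown(start))
}
```

The FromIterator implementation is free to move the PinnedGen value around between calls to next, because only the pinned reference moves; the generator itself stays where pin! put it. A blanket impl of Grow for Extend would presumably follow the same pattern.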

Social costs

I think this is a complete overview of the technical costs, but what outweighs them, I think, are the social costs. All existing documentation about iterators (a core language feature) would be out of date. Where possible it would need to be updated, but there would still be a huge amount of documentation that isn’t updated (including printed material) and that would just be wrong. Every Rust user would need to learn about the new trait and the transition and cope with it. In the future, new users would still encounter the old interface in documentation and old code, and have to learn about the transition and what to do.

It would be the biggest change since the 2018 edition, possibly even bigger, and Rust has orders of magnitude more existing users than it did in 2018. Is it worth it? Maybe. One problem is that it’s really difficult with current data to identify how important self-referential generators would be.

What gives me pause is that this is not the only such socially massive shift I’ve seen the team publicly contemplating. Even setting aside proposals to introduce new axes of abstraction, there has been talk about things like introducing a Leak trait and so on. These kinds of changes, which may technically meet the project’s stability guarantees but are still extremely disruptive, cannot be made repeatedly, or they will damage the community’s trust in the project and the language’s external reputation.