Iterator, Generator
I have been devoting a lot of my free time in the past month to thinking about structured concurrency, and a blog post about that is coming soon, but first I want to revisit iterators and generators.
In a previous post, I wrote about one of the hardest problems for generators: self-referential generators. Unlike the Future trait when we were designing async functions, the Iterator trait is already stable, and it does not take a pinned reference to itself. This means an Iterator cannot be self-referential.
In my previous post, I said that I think Iterator
should have taken a pinned reference, and I
listed 3 options as to what to do about the fact that it doesn’t:
- Disallow self-referential generators.
- Have generators return something other than Iterator, and require pinning to use generators with for loops.
- Deprecate
Iterator
and move the ecosystem to a new trait.
I wrote then that I thought the first of these options was the best option, and I still lean that way. Mainly, I think that it is non-disruptive, immediately shippable, and if it proves inadequate, forward compatible with “fixing” the problem (e.g. by adding a new modifier for self-referential generators). But I want to explore the alternatives more thoroughly, particular the third one.
So I want to ask the question: what would it look like to shift the ecosystem from Iterator to a new
trait? For the sake of this post, let’s say the new trait is called Generator
. It has the same
interface as Iterator
, except it is pinned:
trait Generator {
type Item;
fn next(self: Pin<&mut Self>) -> Option<Self::Item>
}
Similarly, there would need to be an equivalent of IntoIterator
, let’s call it IntoGenerator
:
trait IntoGenerator {
type Item;
type IntoGen: Generator<Item = Item>;
fn into_gen(self) -> Self::IntoGen;
}
And based on these two, the desugaring of for
loops would change so that after it constructs the
iterator/generator, it pins it before the loop begins. All of this works out great, the problems
emerge as you try to make it backwards compatible with everything that already exists.
Bridge impls & coherence
It would be necessary to add bridge impls, so all iterators are also now generators. Similarly, all IntoIterators would need to also be IntoGenerators.
The problem emerges with the fact that all Iterators are also all IntoIterators, and presumably all
Generators would also be IntoGenerators (for the same reason that impl exists today: so that you can
pass both Generators and IntoGenerators to for
loops). And this creates a basic diamond
incoherence problem:
impl<T: Iterator> IntoIterator for T { }
Impl<T: Iterator> Generator for T {}
Impl<T: IntoIterator> IntoGenerator for T {}
impl<T: Generator> IntoGenerator for T {}
By which of these two paths does Iterator
implement IntoGenerator
?:
┌──────────────────┐
│ │▒▒
┌─────│ Iterator │─────┐
│ │ │▒▒ │
│ └──────────────────┘▒▒ │
▼ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▼
┌──────────────────┐ ┌──────────────────┐
│ │▒▒ │ │▒▒
│ IntoIterator │▒▒ │ Generator │▒▒
│ │▒▒ │ │▒▒
└──────────────────┘▒▒ └──────────────────┘▒▒
▒▒▒▒▒▒▒▒│▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒│▒▒▒▒▒▒▒▒▒▒▒
│ ┌──────────────────┐ │
│ │ │▒▒ │
└────▶│ IntoGenerator │◀────┘
│ │▒▒
└──────────────────┘▒▒
▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
Fortunately, the impls result in the same code generation, but somehow this would have to be solved. Possibly, the compiler could have a special case coherence exception for these impls.
Bridge impls & pinning
Another problem arises from the change to pinning. Let’s actually look at the impl of Generator for Iterator:
impl<I: Iterator> Generator for I {
type Item = <I as Iterator>::Item;
fn next(self: Pin<&mut Self>) -> Option<Self::Item> {
// Is this sound?
unsafe { Iterator::next(Pin::get_unchecked_mut(self)) }
}
}
To call Iterator::next
, you need an unpinned mutable reference (this is, after all, the whole
problem with Iterator
). But an Iterator might be !Unpin
, so the only way to access that is by
unsafely unpinning the reference. Is that code sound?
At first, it seems like the answer should be no. It’s completely conceivable to define an Iterator
that moves out of itself in its next method, and also doesn’t implement !Unpin
, and has other
pinned methods that depend on that fact. However, on reflection, I think that any such method
would necessarily contain unsafe code, and therefore the Rust team could declare that code to
contain the unsoundness.
This would be similar to the situation with Drop
: Drop
really ought to be pinned as well, but
since it wasn’t, what we did was declare that if you move out of drop, it is your responsibility to
make sure your type also depend on its pinning guarantee. A similar requirement could be imposed if
your type implements Iterator
.
However, it’s important to note that imposing this requirement after the fact is technically a
breaking change, because it would be further restricting the soundness requirements on Pin
. I
believe any code that is in violation of it would be totally pathological though, and it’s exactly
the sort of technically-breaking soundness change that the language team has been comfortable making
in the past.
The key difference, I suppose, is that in the past those changes were to fix holes in existing features, whereas this would be changing the rules to support a new feature. The team should be very cautious about how much they move in that direction.
FromIterator and Extend
So far we’ve racked up a special case coherence exception and a technically breaking soundness change. Here’s something where I think the team simply cannot make this a completely smooth transition, and libraries will have to be updated by their authors to be compatible with generators.
Both FromIterator
and Extend
have the IntoIterator
interface in their signature. Since
generators do not implement Iterator
or IntoIterator
, there would need to be new traits: let’s
call them FromGenerator
and Grow
. These would be identical to FromIterator
and Extend
,
except they would take IntoGenerator
.
The problem is that there can be no blanket impl of FromIterator
to FromGenerator
, or from
Extend
to Grow
, because there is no way to pass an IntoGenerator
to a function expecting an
IntoIterator
. It’s completely possible that a FromIterator
interfaces moves the iterator around
between calls to next, and therefore would violate the pinning requirement.
I haven’t investigated, but I would be very surprised if any of the types that impl FromIterator
and Extend
in the standard library can’t also implement the generator equivalents. But any third
party library would need to add implementations of the new traits manually to be compatible with
generators.
Update
Giacomo Stevananto on Reddit and Steven Portzer on Twitter have both pointed out to me that there is actually a way to write the blanket impls here.
There should probably be a blanket implementation of Iterator
for Pin<&mut impl Generator>
-
though a generator is not an iterator, a pinned reference to a generator is. Using this, the
blanket impls of FromGenerator
for FromIterator
and Grow
for Extend
could be implemented by
first pinning the generator, and then passing it to the interface.
Social costs
I think this is a complete overview of the technical costs, but what overrides that I think is the social costs. Every existing documentation about iterators (a core language feature) would be out of date. If possible it would need to be updated, but there would still be a huge amount of documentation that isn’t updated (including printed material) that would just be wrong. Every Rust user would need to learn about the new trait and the transition and cope with it. In the future, new users would still encounter the old interface in documentation and old code, and have to learn about the transition and what to do.
It would be the biggest change since the 2018 edition, possibly even bigger, and Rust has orders of magnitude more existing users than it did in 2018. Is it worth it? Maybe. One problem is that it’s really difficult with current data to identify how important self-referential generators would be.
What gives me pause is that this is not the only such socially massive shift I’ve seen the team
publicly contemplating. Even setting aside proposals to introduce new axes of abstraction, there has
been talk about things like introducing a Leak
trait and so on. These kinds of changes, which may
technically meet the project’s stability guarantees but are still extremely disruptive, cannot be
made repeatedly, or they will damage the community’s trust in the project and the language’s
external reputation.