Iterators and traversables
This is a brief note about the definition of iterator.
In previous posts, I’ve written about a taxonomic confusion I think has arisen within the Rust project about AsyncIterator, in which it was conceived as “the async-modified Iterator” without also recognizing that it is “the iterative-modified Future;” that is, without recognizing that it represents a coroutine with the combined affordances of Iterator and Future, inheriting equally from both. But there is another definitional confusion that I think underlies this line of thinking, which is a confusion about the definition of “iterator” itself.
In particular, some arguments I’ve seen bring up rayon’s ParallelIterator as “another modified Iterator,” similar to AsyncIterator. Yosh Wuyts for example has used this as an argument in support of the “effects generics” initiative; in this framing “parallel” is also an effect which “modifies” the underlying Iterator concept in a similar manner to async, and should then ideally be abstractable (this strikes me as infeasible on a number of technical grounds, but I’m going to stick to taxonomic comments in this post). In another vein, Niko Matsakis at one point proposed considering internal iteration, rather than external iteration, as the basis for AsyncIterator, in analogy with rayon.
The problem with this line of thinking, in my view, is that it interprets the word “iterator” too broadly. I think a distinction should be made between two classes of abstraction, which I am going to call iterators and traversables (the latter term stolen from Haskell, without particular care as to the monadic laws I may or may not be breaking with this definition):
- Iterator: a state machine yielding a number of values, one after another, as it advances
- Traversable: an object which enables traversing through a number of values in any manner
Anything “traversable” can support combinators like map and reduce, but they will not necessarily have identical signatures or be implemented the same way. In this taxonomy, “internal iteration” is not iteration at all, but rather an alternative form of “traversal.” You can quibble with this terminology: maybe you prefer to call all of what I’ve called “traversables” iterators, and come up with a new name for what I’m calling “iterators.” This isn’t very important; what is important is that we draw a taxonomic distinction between abstractions for “yielding a number of values, one after another” and “traversing through a number of values in any manner.” The former definition is more precise than the latter, so that with my terminology you might say that all iterators are traversables but not all traversables are iterators.
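To make the distinction concrete, here is a minimal Rust sketch of the two shapes of abstraction. The trait names Pull and Traverse are hypothetical, invented for this illustration: Pull corresponds to external iteration (the shape of std’s Iterator), while Traverse is internal iteration, one possible form of traversal. Both support the same reduction, but through different signatures and different control flow.

```rust
// Hypothetical traits for illustration; these are not std APIs.

// An iterator: a state machine that yields values one after another,
// with the caller pulling each value and controlling the loop.
trait Pull {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

// A traversable, here in its "internal iteration" form: the collection
// drives the loop and the caller only supplies a callback.
trait Traverse {
    type Item;
    fn traverse(self, visit: impl FnMut(Self::Item));
}

// A toy sequence 0..limit implementing both.
struct Upto {
    next: u32,
    limit: u32,
}

impl Pull for Upto {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.next < self.limit {
            self.next += 1;
            Some(self.next - 1)
        } else {
            None
        }
    }
}

impl Traverse for Upto {
    type Item = u32;
    fn traverse(mut self, mut visit: impl FnMut(u32)) {
        // The traversable owns the loop; no per-step state machine is exposed.
        while self.next < self.limit {
            visit(self.next);
            self.next += 1;
        }
    }
}

fn main() {
    // External iteration: the caller advances the state machine step by step.
    let mut it = Upto { next: 0, limit: 4 };
    let mut pulled = 0;
    while let Some(v) = it.next() {
        pulled += v;
    }

    // Internal iteration: the traversable calls back for each value.
    let mut pushed = 0;
    Upto { next: 0, limit: 4 }.traverse(|v| pushed += v);

    // Both compute the same reduction: 0 + 1 + 2 + 3.
    assert_eq!((pulled, pushed), (6, 6));
}
```

In this sketch every Pull could be adapted into a Traverse (by looping over next internally), but not vice versa, matching the claim that all iterators are traversables while not all traversables are iterators.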
The difference in affordance between Iterator and traversables more broadly is of course that you know iterators will yield up their values “one after another.” One consequence of this is that they are representationally simpler: an Iterator can be represented by a state machine, whereas some other categories of traversable like ParallelIterator are actually a kind of complex concurrent scheduling infrastructure which balances its tasks across a thread pool, not at all similar in implementation. And as another consequence, the combinators on a ParallelIterator have subtly different signatures from the combinators on Iterator: for example ParallelIterator::map not only requires the closure to be thread-safe, but also requires it to be Fn instead of FnMut, because it will be applied to values concurrently across many threads, instead of one at a time as in the case of Iterator.
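You don’t need rayon to see why the stronger bound arises. The following sketch defines a toy par_map using std::thread::scope; par_map is a hypothetical stand-in for ParallelIterator::map, not rayon’s API, but it exhibits the same constraint: because every worker thread calls the closure through a shared reference, the closure must be Fn + Sync, whereas the sequential Iterator::map happily accepts a stateful FnMut.

```rust
use std::thread;

// A toy parallel map over a slice: one scoped thread per element.
// The closure is invoked concurrently through a shared reference,
// which is why it must be Fn + Sync rather than FnMut.
fn par_map<T, U, F>(items: &[T], f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    thread::scope(|s| {
        let f = &f; // share one closure across all worker threads
        let handles: Vec<_> = items
            .iter()
            .map(|item| s.spawn(move || f(item)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Sequential map: the closure is FnMut, so it may mutate captured state.
    let mut calls = 0;
    let doubled: Vec<i32> = [1, 2, 3].iter().map(|x| { calls += 1; x * 2 }).collect();
    assert_eq!((doubled, calls), (vec![2, 4, 6], 3));

    // Parallel map: the same mutating closure would not compile here,
    // because par_map requires Fn + Sync.
    let squared = par_map(&[1, 2, 3], |x| x * x);
    assert_eq!(squared, vec![1, 4, 9]);
}
```

(Rayon additionally makes no promise about the order in which the closure runs, another signature-level consequence of not yielding values “one after another.”)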
The motivation for adding AsyncIterator to Rust is not “to add the async effect to Iterator” as a sort of box-ticking. There is a clear and compelling use case for a state machine that yields values one after another while also being able to pend awaiting readiness, which is the affordance of the AsyncIterator state machine. A large portion of the tasks I spawn perform the bulk of their work looping over values yielded by a state machine of that character, and better support for this with features like async generators, for await, and merge! will enable implementing tasks like this more easily. Attempting to fold other kinds of traversables into the design of this feature would make it unfit for purpose by imposing other costs.
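The “state machine that can also pend” shape can be sketched in a few lines of std-only Rust. The AsyncIterator trait below mirrors the shape of the poll_next-based design under discussion (assume this definition; it is written here for illustration), and Counter is a trivially always-ready stream, so a no-op waker suffices to drive it to completion; a real executor would park on Poll::Pending and wake on readiness.

```rust
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The shape of the poll_next-based AsyncIterator: a state machine that,
// on each step, either yields the next item, finishes, or pends.
trait AsyncIterator {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

// A toy stream yielding 0..limit; it never actually pends.
struct Counter {
    next: u32,
    limit: u32,
}

impl AsyncIterator for Counter {
    type Item = u32;
    fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        let this = self.get_mut();
        if this.next < this.limit {
            this.next += 1;
            Poll::Ready(Some(this.next - 1))
        } else {
            Poll::Ready(None)
        }
    }
}

// A waker that does nothing: enough for a stream that is always ready.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Drive the state machine synchronously, collecting its yielded values.
fn drain<S: AsyncIterator + Unpin>(mut stream: S) -> Vec<S::Item> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    while let Poll::Ready(Some(item)) = Pin::new(&mut stream).poll_next(&mut cx) {
        out.push(item);
    }
    out
}

fn main() {
    assert_eq!(drain(Counter { next: 0, limit: 3 }), vec![0, 1, 2]);
}
```

The point of the sketch is that this is still recognizably a state machine yielding values one after another; the only new affordance over Iterator is the ability to return Poll::Pending between items.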
Taxonomically, this is another class of traversable (if you define traversable broadly enough to include async abstractions), one which can reasonably be called an “async iterator” because it is like an iterator, but also integrates into the async system. Importantly, this is not a subclass of “iterators,” because of the additional affordance of the async system. I’ve already written at length about how this is different from “an iterator of futures,” and how trying to model that with an “async effect” on the Iterator::next method results in an inferior design.
One can similarly imagine an asynchronous version of ParallelIterator, another subclass of traversables, which allows traversing a collection of values concurrently in parallel and asynchronously, scheduled across a thread pool. This would not be a kind of async iterator in the same way that ParallelIterator is not a kind of iterator; instead, it would be a different abstraction which, being another kind of traversable, affords a similar set of combinators, with slightly different signatures. Indeed, rayon already has a lot in common with a work-stealing task executor like tokio’s multi-threaded runtime: rayon literally uses work-stealing to schedule units of work across multiple threads. Any “async rayon” would similarly ultimately be a sort of async runtime for scheduling tasks, implemented by way of a set of combinators that look similar to the combinators on AsyncIterator, but with some differences in their signatures and a very different implementation.