Iterators and traversables

This is a brief note about the definition of iterator.

In previous posts, I’ve written about a taxonomic confusion I think has arisen within the Rust project about AsyncIterator, in which it was conceived as “the async-modified Iterator” without also recognizing that it is “the iterative-modified Future;” that is, without recognizing that it represents a coroutine with the combined affordances of Iterator and Future, inheriting equally from both. But there is another definitional confusion that I think underlies this line of thinking, which is a confusion about the definition of “iterator” itself.

In particular, some arguments I’ve seen bring up rayon’s ParallelIterator as “another modified Iterator,” similar to AsyncIterator. Yosh Wuyts for example has used this as an argument in support of the “effects generics” initiative; in this framing “parallel” is also an effect which “modifies” the underlying Iterator concept in a similar manner to async, and should then ideally be abstractable (this strikes me as infeasible on a number of technical grounds, but I’m going to stick to taxonomic comments in this post). In another vein, Niko Matsakis at one point proposed considering internal iteration, rather than external iteration, as the basis for AsyncIterator, in analogy with rayon.

The problem with this line of thinking, in my view, is that it interprets the word “iterator” too broadly. I think a distinction should be made between two classes of abstraction, which I am going to call iterators and traversables (the latter term stolen from Haskell, without particular care as to the monadic laws I may or may not be breaking with this definition):

  • Iterator: a state machine yielding a number of values, one after another, as it advances
  • Traversable: an object which enables traversing through a number of values in any manner

Anything “traversable” can support combinators like map and reduce, but they will not necessarily have identical signatures or be implemented the same way. In this taxonomy, “internal iteration” is not iteration at all, but rather an alternative form of “traversal.” You can quibble with this terminology: maybe you prefer to call all of what I’ve called “traversables” iterators, and come up with a new name for what I’m calling “iterators.” This isn’t very important; what is important is that we draw a taxonomic distinction between abstractions for “yielding a number of values, one after another” and “traversing through a number of values in any manner.” The former definition is more precise than the latter, so that with my terminology you might say that all iterators are traversables but not all traversables are iterators.
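To make the distinction concrete, here is a minimal sketch in Rust. The `Traversable` trait below is hypothetical (it is not from any real crate, and I am ignoring questions like early exit); the point is only the difference in shape. External iteration is a state machine the caller advances one step at a time; internal traversal hands a closure to the implementation, which drives the loop however it likes.

```rust
// External iteration: a state machine the caller pulls values from.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            let current = self.0;
            self.0 -= 1;
            Some(current)
        }
    }
}

// Internal traversal: the implementation drives the loop.
// (Hypothetical trait, for illustration only.)
trait Traversable {
    type Item;
    fn traverse(self, f: impl FnMut(Self::Item));
}

impl Traversable for Countdown {
    type Item = u32;
    fn traverse(mut self, mut f: impl FnMut(u32)) {
        // This implementation happens to visit values in order, but a
        // traversable makes no such promise: a parallel implementation
        // could visit them in any order, on any thread.
        while let Some(x) = self.next() {
            f(x);
        }
    }
}

fn main() {
    // Pulling values externally:
    let pulled: Vec<u32> = Countdown(3).collect();
    assert_eq!(pulled, vec![3, 2, 1]);

    // Pushing values internally:
    let mut pushed = Vec::new();
    Countdown(3).traverse(|x| pushed.push(x));
    assert_eq!(pushed, vec![3, 2, 1]);
    println!("ok");
}
```

Note that both traits can support a `map`-like combinator, but only the external form gives the caller control over when (and whether) the next value is produced.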

The difference in affordance between Iterator and traversables more broadly is of course that you know iterators will yield up their values “one after another.” One consequence of this is that they are representationally simpler: an Iterator can be represented by a state machine, whereas some other categories of traversable like ParallelIterator are actually a kind of complex concurrent scheduling infrastructure which balances their tasks across a thread pool, not at all similar in implementation. And as another consequence, the combinators on a ParallelIterator have subtly different signatures from the combinators on Iterator: for example ParallelIterator::map not only requires the closure to be thread-safe, but also requires it to be Fn instead of FnMut, because it will be applied to values concurrently across many threads, instead of one at a time as in the case of Iterator.
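The Fn-versus-FnMut point can be demonstrated with the standard library alone. Below, the parallel case is sketched with plain scoped threads rather than rayon’s actual machinery, but the constraint is the same one rayon’s signatures encode: a closure shared across threads cannot exclusively borrow mutable state, so plain `FnMut`-style mutable capture must become synchronized shared state.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Sequential Iterator::map accepts FnMut: calls happen one at a time,
// so the closure may freely mutate captured state.
fn sequential() -> (Vec<u32>, u32) {
    let mut calls = 0u32; // plain mutable capture is fine here
    let doubled: Vec<u32> = (1u32..=4)
        .map(|x| {
            calls += 1;
            x * 2
        })
        .collect();
    (doubled, calls)
}

// A parallel map must share the closure across threads, so it needs
// Fn (plus Send/Sync bounds), and mutable state must be synchronized.
fn parallel() -> u32 {
    let calls = AtomicU32::new(0);
    thread::scope(|s| {
        for x in 1u32..=4 {
            let calls = &calls;
            s.spawn(move || {
                calls.fetch_add(1, Ordering::Relaxed);
                let _ = x * 2;
            });
        }
    });
    calls.load(Ordering::Relaxed)
}

fn main() {
    let (doubled, calls) = sequential();
    assert_eq!(doubled, vec![2, 4, 6, 8]);
    assert_eq!(calls, 4);
    assert_eq!(parallel(), 4);
    println!("ok");
}
```

Replacing the atomic with a plain `&mut u32` in the parallel version would not compile: that is exactly the difference the two `map` signatures express.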

The motivation for adding AsyncIterator to Rust is not “to add the async effect to Iterator” as a sort of box-ticking. There is a clear and compelling use case for a state machine that yields values one after another while also being able to pend awaiting readiness, which is the affordance of the AsyncIterator state machine. A large portion of the tasks I spawn perform the bulk of their work looping over values yielded by a state machine of that character, and better support for this with features like async generators, for await, and merge! will enable implementing tasks like this more easily. Attempting to fold other kinds of traversables into the design of this feature would make it unfit for purpose by imposing other costs.
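The affordance in question — yielding values one after another while also being able to pend — can be sketched as a poll-based trait. The trait shape below follows the commonly proposed design for AsyncIterator (it is not a stable API), and the `Ticker` type and no-op waker are toy scaffolding so the state machine can be driven by hand:

```rust
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Like Iterator::next, but able to return Poll::Pending when the next
// value is not ready yet. (Trait shape assumed from the proposed design.)
trait AsyncIterator {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

// A toy implementation: pends once before each value, as a real stream
// might while waiting on I/O readiness.
struct Ticker {
    remaining: u32,
    ready: bool,
}

impl AsyncIterator for Ticker {
    type Item = u32;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.remaining == 0 {
            return Poll::Ready(None);
        }
        if !self.ready {
            self.ready = true;
            // A real implementation would register the waker; this demo
            // relies on the caller simply polling again.
            return Poll::Pending;
        }
        self.ready = false;
        let current = self.remaining;
        self.remaining -= 1;
        Poll::Ready(Some(current))
    }
}

// A no-op waker so we can drive the state machine by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// What `for await` would desugar to, roughly: poll until None.
fn drive(mut ticker: Ticker) -> Vec<u32> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut yielded = Vec::new();
    loop {
        match Pin::new(&mut ticker).poll_next(&mut cx) {
            Poll::Ready(Some(x)) => yielded.push(x),
            Poll::Ready(None) => break yielded,
            Poll::Pending => continue, // busy-poll only for this demo
        }
    }
}

fn main() {
    assert_eq!(drive(Ticker { remaining: 3, ready: false }), vec![3, 2, 1]);
    println!("ok");
}
```

The key property is that this is still a single state machine advanced step by step, exactly like Iterator, with pending as an additional possible result of each step.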

Taxonomically, this is another class of traversable (if you define traversable broadly enough to include async abstractions), one which can reasonably be called an “async iterator” because it is like an iterator, but also integrates into the async system. Importantly, this is not a subclass of “iterators,” because of the additional affordance of the async system. I’ve already written at length about how this is different from “an iterator of futures,” and how trying to model that with an “async effect” on the Iterator::next method results in an inferior design.

One can similarly imagine an asynchronous version of ParallelIterator, another subclass of traversables, which allows traversing a collection of values both in parallel and asynchronously, scheduled across a thread pool. This would not be a kind of async iterator in the same way that ParallelIterator is not a kind of iterator; instead, it would be a different abstraction which, being another kind of traversable, affords a similar set of combinators, with slightly different signatures. Indeed, rayon already has a lot in common with a work-stealing task executor like tokio’s multi-threaded runtime: rayon literally uses work-stealing to schedule units of work across multiple threads. Any “async rayon” would similarly ultimately be a sort of async runtime for scheduling tasks, implemented by way of a set of combinators that look similar to the combinators on AsyncIterator, but with some differences in their signatures and a very different implementation.