The registers of Rust

It’s been nearly two and a half years since I was an active contributor to the Rust project. There are some releases I’ve been very excited about since then, and I’m heartened by Niko’s recent blog post emphasizing stability and polish over grand new projects. But I’ve also felt a certain apprehension at many of the directions the project has taken, and this has often occupied my thoughts. From that preoccupation this blog post has emerged, hopefully the first in a series over the next few weeks outlining my thoughts on the design of Rust in 2023, especially in connection to async. I hope its impact will be chiefly positive.

One vision of the design of Rust that I held, and tried to express to others before I stopped working on it, but seem to have failed to evangelize well enough, is that a coherent but incomplete design around control-flow effects has somewhat naturally emerged. This is not a general “effect system” but specifically a pattern of language and library support for a set of common control-flow sequences that appear over and over again in user code. There are three effects I am talking about in particular:

  1. Fallibility: The procedure may short-circuit in some error cases. This is supported by the Result and Option types and the ? operator.
  2. Asynchrony: The procedure may yield control while it waits for some condition to be true. This is supported by the Future trait and the async and await operators.
  3. Iteration: The procedure may repeatedly operate over values in sequence. This is supported by the Iterator trait and the for loop operator.
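
To make the analogy concrete, here is a minimal sketch (the function names are mine, purely illustrative) showing each effect in the form most users reach for first:

```rust
// Fallibility: `?` short-circuits out of the function on the None case.
fn parse_pair(s: &str) -> Option<(u32, u32)> {
    let (a, b) = s.split_once(',')?;
    Some((a.trim().parse().ok()?, b.trim().parse().ok()?))
}

// Asynchrony: `.await` yields control until the future completes.
async fn fetch_len(fut: impl std::future::Future<Output = String>) -> usize {
    fut.await.len()
}

// Iteration: the `for` loop drives an Iterator to completion.
fn sum_squares(n: u32) -> u32 {
    let mut total = 0;
    for square in (1..=n).map(|x| x * x) {
        total += square;
    }
    total
}
```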

A number of things obscure and complicate the analogy between these patterns of use for these three control-flow effects in Rust, but the biggest problem is that Rust’s support for each of these is an incomplete portion of the full pattern, so the analogy is missing something, like trying to translate two overlapping but incomplete fragments of an ancient text.

While trying to articulate and justify this notion of natural analogies between these features, and the notion that each of them is incomplete, I developed an idea that I think is more generally useful, and that is the notion of programming language registers.

Programming languages have registers

It’s commonly understood that programming languages can have dialects - strikingly similar, overlapping, but ultimately mutually incompatible versions of the same language - and also idioms - common patterns and techniques for expressing the same sort of algorithm - but I contend that programming languages also have registers. In natural language, a register is a variety of the language considered appropriate to a specific social context. Individual users will speak in different registers depending on the context. These registers are not thematically contingent: you can discuss the same subject in any register, though some registers are considered more suited to certain subjects, and a mismatch between register and subject is a major source of irony.

Programming languages are similar. When writing a piece of code, even with the language selection made and the intended effect understood, the user must also select the register they will use. In Rust, this distinction in register can look like this:

  • Will I use a lot of “batteries included” dependencies and frameworks, or write my own implementations?
  • Will I use references and borrowing, or will I liberally use clone to avoid it?
  • Will I use blocking IO or async IO?
  • Will I accept a runtime cost, or the responsibility of writing unsafe code to avoid it when no safe abstraction exists?
  • Will I use combinators or matches and loops for this piece of code?
  • Will I use std or must my code be no_std?

There are many more register distinctions than this. I think this is a key part of the notion that “there is more than one way to do it” - the same task can be performed in different registers. When language designers talk about there being “one way to do it,” I think what they mean is that the language strives to have only one register. Of course, having multiple registers introduces a cognitive burden for the user in selecting the appropriate register, so there should be a “safe default” register that users can fall back on when a question is beyond their concern, shifting register only when they know they will benefit from it. This is not always the case, unfortunately.

In Rust in particular, the trade-off between registers is often a choice between “getting it done” and having fine-grained control over the runtime behavior of your system, for example when it’s performance critical. In general, Rust strives to have an obvious register in which to operate that is still going to be performant enough (and hopefully accessible enough!), while allowing users to switch registers when they need to. Here I am really just rearticulating the idea of “zero cost abstractions” in new terms.

I think this concept is generally useful. I think it was always implicit in our thinking about the design of Rust, but naming and describing it brings structure to the analysis of the design. Now I want to turn back to this notion of “control-flow effects,” and examine specifically in what registers these idioms can be expressed.

The four registers of control-flow effects

I think asynchrony is the effect with the most well-rounded set of registers. This might seem to contradict the fact that async/await was released as an MVP, and most recently of the three, and that it has a reputation for having rough edges and being incomplete. But I think these facts are not in contradiction; they are actually positively correlated: because async/await has the most complete set of registers, it butts up against other language features (like traits) in ways that the other effects, being incomplete in their support, don’t even have the opportunity to.

Like all of these control-flow effects, the different registers of asynchrony are unified by way of a shared interface which is the reification of that control-flow effect into a type. In this case, it’s the Future trait. Each register is a different way of interacting with the Future trait:

  • You can implement Future by hand. You will have explicit and exact control over the memory layout of the Future and how it is polled, but you need to understand everything that comes with and write all of the necessary code by hand.
  • You can poll a future on an executor. This is the final operation by which every future must ultimately be consumed; most code is not concerned with spawning and stays “within” the future, but at least a little bit of this is necessary.
  • You can use Future combinators to construct your future out of these building blocks, linked together with higher-order functions. Hardly anyone does this anymore, but it was originally the default way to create futures.
  • You can use async/await syntax to instruct the compiler to generate a future from annotated code which uses ordinary control flow constructs. This is the main way futures are defined in Rust.
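
To illustrate the extremes, here is a minimal sketch contrasting the first and last of these registers (the type and function names are mine, for illustration only):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Implementing Future by hand: exact control over the layout of the
// state and over how it responds to being polled.
struct Ready<T>(Option<T>);

impl<T: Unpin> Future for Ready<T> {
    type Output = T;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<T> {
        // This future is always immediately ready; a real implementation
        // would sometimes return Poll::Pending and arrange a wakeup.
        Poll::Ready(self.0.take().expect("polled after completion"))
    }
}

// async/await: the compiler generates the state machine from
// ordinary-looking code.
async fn ready_then_double(x: u32) -> u32 {
    Ready(Some(x)).await * 2
}
```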

Each of these registers has a use case to which it is particularly well suited. Implementing futures by hand has a high cognitive burden (and requires careful consideration of potential bugs) but gives the user greater control; it is suited for specific use cases which are fundamental to the ecosystem or in which the user cares a great deal about control.

Executing a future is a necessary step for it to actually run to completion, and forms the boundary between the code “inside the future” and “outside of it.” We’ll see that all of the effects described have a similar boundary point.

Futures combinators are not used much anymore because the combinator model suffered some specific problems with sharing state between futures, but they remain a way of writing code that can be more elegant in some cases than using control flow. It’s sad that they’re still not included in the standard library, and I don’t know why this is. I have certainly written async blocks with a single await that would have been clearer as a simple call to map, for example.
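
For example (a sketch assuming the futures crate’s FutureExt::map, since these combinators live out of tree), both of these produce the same future:

```rust
use futures::FutureExt; // combinator methods from the `futures` crate
use std::future::Future;

async fn fetch() -> String {
    "hello".to_string()
}

// Control-flow register: an async block wrapping a single await.
fn len_via_async() -> impl Future<Output = usize> {
    async { fetch().await.len() }
}

// Combinatoric register: arguably clearer as a simple call to map.
fn len_via_map() -> impl Future<Output = usize> {
    fetch().map(|s| s.len())
}
```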

Finally, using async/await and ordinary control flow is just the obvious way to create a future, and allows users to put the fact that they’re inside the effect into the background, noting only when necessary to make explicit the actual control flow that is happening. (Some users would like it put even further into the background and made completely invisible, but Rust did not do that for a host of reasons.)

These registers can be analogized for each of the control-flow effects I described above. In the abstract, the registers can be described like this:

  • Core register: defining the reified interface by hand with complete control and minimal compiler or library defined abstractions.
  • Consuming register: actually consuming the reified type of the effect, closing the monad and finishing the effect.
  • Combinatoric register: using library-defined combinators on the interface to operate over the reification with closures.
  • Control-flow register: using compiler-expanded syntactic sugar to turn an ordinary looking function into one which constructs the effect-reified type.

Applying this framework also to iteration and fallibility as they exist today (in the language + std) is instructive:

                      │  ASYNCHRONY   │  ITERATION         │  FALLIBILITY
──────────────────────┼───────────────┼────────────────────┼─────────────────────
          REIFICATION │  Future       │  Iterator          │  Result/Option
        CORE REGISTER │  impl Future  │  impl Iterator     │  Ok/Err & Some/None
   CONSUMING REGISTER │               │  for loop/collect  │  match/unwrap
COMBINATORIC REGISTER │               │  Iterator methods  │  methods
CONTROL-FLOW REGISTER │  async/await  │                    │  ? operator
Some cells are merely exempli gratia - there are more consuming methods on Iterator than just collect, for example - but the pattern is clear to see. If a method returns the reified type again, it is in the combinatoric register, whereas if it consumes it and returns something else, it is in the consuming register. And while some cells are missing for asynchrony, they are provided by third party libraries - the consuming register (spawn and block_on) being the most obviously critical, and the one the project has most frequently considered bringing into the standard library.
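
Iterator makes that distinction easy to see in code. In this small sketch, map stays within the effect while sum leaves it:

```rust
fn registers_of_iteration() {
    let values = vec![1, 2, 3];

    // Combinatoric register: `map` returns the reified type (an Iterator) again.
    let doubled = values.iter().map(|x| x * 2);

    // Consuming register: `sum` consumes the iterator and returns a plain
    // value, closing out the effect.
    let total: i32 = doubled.sum();
    assert_eq!(total, 12);
}
```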

More interestingly, though, the final row is incomplete for both iteration and fallibility. They’ve both got the first three registers well filled out, but it’s in this last one that they are lacking in comparison to asynchrony. We do have obvious answers for the control-flow register of each of these effects; they just aren’t stable yet: generators and try blocks.

The missing control-flow register of iteration and fallibility

When considering iteration, the user is faced with a dilemma. The two obvious and easy ways to perform iteration are the consuming and combinatoric registers. And indeed, these adequately cover the large majority of cases. But there is a class of cases in which users are stuck between them, switching back and forth unhappily, because the control-flow register is absent.

Specifically, users might want to abstract a particular iterative operation into a function without leaving the effect. This was one of the main motivations for stabilizing -> impl Iterator. But in order to do that, they must use combinators. And here they run into an issue: it can be quite difficult to construct complex control flow paths with combinators. Users often find this inadequate and realize they would be better off using a for loop. In the worst case, what users end up doing is collecting into an intermediate allocation, only to immediately begin iterating over it again, because trying to structure their code with combinators is either impossible or renders it unreadable.
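
As a sketch of that worst case (a toy run-length encoder, my own example): the loop body is straightforward, but there is no way to return it as a lazy iterator without either contorting it into combinators or paying for a Vec:

```rust
// The control flow is trivial with a loop but awkward with combinators,
// so to return "an iterator" we fall back to collecting into an
// intermediate allocation and iterating over that.
fn run_lengths(xs: &[u8]) -> impl Iterator<Item = (u8, usize)> {
    let mut runs: Vec<(u8, usize)> = Vec::new();
    for &x in xs {
        match runs.last_mut() {
            Some((prev, count)) if *prev == x => *count += 1,
            _ => runs.push((x, 1)),
        }
    }
    runs.into_iter() // the intermediate allocation the text describes
}
```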

A specific case of this problem arises when users want to combine effects - for example, mapping a fallible function over an Iterator, or mapping an asynchronous function over a Result. In some cases this is possible with some contortion (for example, now you have an iterator of Results, and every subsequent combinator needs to short-circuit on the error case); in other cases it’s simply impossible.
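
A sketch of that contortion: every combinator after the fallible map must thread the error case through, though collect at least knows how to short-circuit a Result out of an iterator:

```rust
use std::num::ParseIntError;

fn parse_and_double(lines: &[&str]) -> Result<Vec<u32>, ParseIntError> {
    lines
        .iter()
        .map(|s| s.trim().parse::<u32>()) // now an Iterator of Results
        .map(|r| r.map(|n| n * 2))        // every later step must re-handle Err
        .collect()                        // collect short-circuits on the first Err
}
```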

One alternative approach would be to provide generators - functions that compile to iterators because they yield values instead of returning them. Generators with yield statements would be the missing control-flow register, fitting nicely and naturally next to async/await for asynchrony. It’s important to note that this is different from a general-purpose coroutine mechanism, with which generators are often conflated (because they have been conflated in other languages). Maybe a more general coroutine, which can receive values as well as return them each time it yields, is useful for other purposes; I think an argument could be made for it. But what I think Rust absolutely needs is a syntax for functions that evaluate to iterators, and these two purposes should not be bundled into a single feature, just as async functions were not conflated with coroutines.
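
To make the shape concrete: the first function below uses hypothetical generator syntax (nothing like it is stable in Rust today), and the second is a runnable approximation of the same thing using std::iter::from_fn, which stores the suspended state in a closure instead of a compiler-generated type:

```rust
// Hypothetical syntax, not valid Rust today: a generator function would
// compile to an Iterator, just as an async fn compiles to a Future.
//
// gen fn countdown(mut n: u32) -> u32 {
//     while n > 0 {
//         yield n;
//         n -= 1;
//     }
// }

// A runnable approximation with std::iter::from_fn.
fn countdown(mut n: u32) -> impl Iterator<Item = u32> {
    std::iter::from_fn(move || {
        if n > 0 {
            let current = n;
            n -= 1;
            Some(current)
        } else {
            None
        }
    })
}
```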

Years ago, I implemented a macro library for generators based on this approach called propane. I still think this is the way Rust should go, but I’ve seen basically no movement in this direction, or serious contention with the idea from the project, since I wrote and published this library.

The fallibility case is less incomplete, because we do have the ? operator. But having ? without Ok-wrapping is like having await without async. I’ve written before about why I think Ok-wrapping functions would be a very good addition to Rust, and I even implemented them. The only way my view has evolved since then (and here I think I am now more in line with the Rust project) is that a syntax like try fn that doesn’t change the return type, so you still write Option or Result explicitly, is probably the right approach for Rust. But overall, having a syntax that keeps the user fully inside the effect (which is what Ok-wrapping really means), rather than halfway, would be not only more convenient but also more consistent with asynchrony and iteration.
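
A sketch of the difference; the try fn half is written as a comment because no such syntax exists in Rust today:

```rust
use std::num::ParseIntError;

// Today: the body is only halfway inside the effect. `?` unwraps errors,
// but the happy path must still wrap its value in Ok by hand.
fn double(s: &str) -> Result<u32, ParseIntError> {
    let n: u32 = s.parse()?;
    Ok(n * 2)
}

// Hypothetical `try fn` (not valid Rust): the signature still names the
// Result type explicitly, but the body stays fully inside the effect and
// the Ok-wrapping is implicit.
//
// try fn double(s: &str) -> Result<u32, ParseIntError> {
//     let n: u32 = s.parse()?;
//     n * 2
// }
```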

Asynchronous iteration and AsyncIterator

One essential aspect of the design is that it must be possible to combine different control flow effects in one piece of code. Asynchronous and iterative code is often fallible, for example, and we have an easy way of handling that in the type system by creating Futures and Iterators of Results or Options. But combining the asynchronous and iterative effects is not so trivial.

For this reason, the futures library has always had the Stream interface, which combines asynchrony and iteration, and has been imported into std under the perhaps clearer name of AsyncIterator. I don’t object to the name change, but I do think it has been bundled up with an ideological commitment I do object to - namely the consideration of AsyncIterator as “just” the async version of Iterator. We shouldn’t forget that it’s also the iterative version of Future.

This confusion has led the project down a path that I think dead-ends: trying to simply “asyncify” the Iterator interface, specifically through the explicit goal of redefining AsyncIterator to have an async fn next as its required manner of implementation. But using the register analysis I have laid out here, we can see that this is a mixing of registers: the asynchrony is in the control-flow register, whereas the iteration is in the core register. The problem is that because the iteration feature is incomplete, the pattern that “asyncifying” it implies is simply wrong: without looking forward, you can’t even see that iteration has a control-flow register at all.

If the project pursues this path, the problem will emerge that users have no core register for the asynchrony effect of an AsyncIterator. If they need the fine-grained control the core register provides them, they will be left with no options. For a systems language this is fully inadequate. But if poll_next stays, then there is a core register for the combination of asynchrony and iteration. And if generators are added to the language, they can be made async just as well as normal functions can - and an async generator would compile to an AsyncIterator, written fully in the control-flow register rather than halfway.
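
Here is a minimal sketch of that core register, using the futures crate’s Stream (the type and its behavior are my own illustration; nightly std’s AsyncIterator has the same poll_next shape):

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::Stream; // std's AsyncIterator has the same shape

// Core register for the combined effect: poll_next exposes both the
// asynchrony (Poll) and the iteration (Option) to the implementer.
struct Countdown {
    remaining: u32,
}

impl Stream for Countdown {
    type Item = u32;

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.remaining == 0 {
            Poll::Ready(None) // the iteration is finished
        } else {
            self.remaining -= 1;
            // Always immediately ready; a real stream would sometimes
            // return Poll::Pending and arrange a wakeup.
            Poll::Ready(Some(self.remaining))
        }
    }
}
```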

More broadly, it feels like the project holds the core register of asynchrony and the control-flow register of iteration in low regard. Along these lines, I similarly do not think that AsyncRead should simply be the “asyncified” version of Read: the poll methods matter when users need control! I think this comes from over-indexing on how easy the control-flow register is to use for asynchrony - so, the thinking seems to go, everyone should use it for everything - and on the fact that the control-flow register doesn’t exist for iteration - so the idea of using it does not even occur.

Control flow and combinators, imperative and functional

When we look at these four registers, a pattern of use emerges which informs which register to choose. The core register is verbose and possibly difficult to use, but it gives users the most control; the trade-off is clear. The consuming register is necessary for setting boundaries on the effect. But when it comes to the combinatoric and control-flow registers, we are left with a question: what is really the difference between them? Both are touted as easy ways to achieve the end result. Both involve the acceptance of abstraction by the user (and thus the loss of explicit control over layout) as the trade-off for getting things done. So what is the distinction between them?

One obvious distinction is that one is written in an imperative style (control flow) and one in a functional style (combinatoric). Similarly, one is more naturally suited to blocks of statements (control flow) and one to expression-oriented programming (combinatoric), though neither completely excludes either way of writing. Indeed, to some extent the difference between these two registers is paradigmatic and stylistic. Rust is a multi-paradigm language, after all.

But I find, in my own experience, that too complicated an assemblage of combinators can negatively impact readability. There is a subjective limit below which writing in the combinatoric style makes code clearer, and above which it makes it less clear.

The one hard limitation of the combinatoric style is that it is not possible to return early out of a combinator, because control flow cannot escape the closure passed to it. This limits the ability of an effect to “pass through” the combinator - for example, awaiting inside a map, or throwing an error from a filter. This has motivated some libraries to provide try_ or async_ alternatives to their combinators, and has been a key motivator of keyword generics.
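
For example, std’s try_fold is one such try_ alternative. A ? inside a closure passed to plain fold could only return from the closure, not from the caller, so the fallibility effect cannot pass through without it:

```rust
use std::num::ParseIntError;

// `?` inside a closure short-circuits the closure, not sum_parsed, so
// plain `fold` cannot pass the error out. std's answer is a try_ variant
// of the combinator, which threads the Result through each step:
fn sum_parsed(lines: &[&str]) -> Result<u32, ParseIntError> {
    lines
        .iter()
        .try_fold(0u32, |acc, s| Ok(acc + s.trim().parse::<u32>()?))
}
```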

But maybe there is another framing to this limitation in which it is not a negative but a positive. When I see a pile of combinators, one thing I know about them is that control flow will not jump outside of that expression, which I don’t know when I see an ordinary block. Isn’t the absence of early return a hallmark of the functional style? Perhaps this should be embraced as a cognitive advantage of the combinatoric style, rather than a lack of expressiveness. And maybe users should be taking the fact that they do need that more complex control flow as a sign they should switch registers in this instance.

I’m not saying this with absolute confidence. I think there’s an argument to be made that especially early returning with await or ? (or, I suppose, yield) is different from general control flow. But given other considerations, I would say that once you’ve fleshed out the control-flow register, maybe supporting combinators “abstracted over effects” is not the necessity it seems without it.

Let’s end it here for now

This blog post has already gotten quite long and in that last section I verged on opening a real can of worms: keyword generics. I think this is enough for now. If I successfully continue to write in the near future, I will open with that discussion next time.

I really hope this framework of registers and control-flow effects will resonate with others, and can provide guidance for disentangling the design questions that face Rust today. But even if you think this particular application has some fatal flaw, I hope the concept of registers can be more generally useful to everyone trying to design and analyze programming languages.

If, in contrast to the past two years, anyone from the Rust project would like to reach out to discuss this or anything else directly, my email is available in the footer of this website.