Let futures be futures

In the early-to-mid 2010s, there was a renaissance in languages exploring new ways of doing concurrency. In the midst of this renaissance, one abstraction for achieving concurrent operations that was developed was the “future” or “promise” abstraction, which represented a unit of work that will maybe eventually complete, allowing the programmer to use this to manipulate control flow in their program. Building on this, syntactic sugar called “async/await” was introduced to take futures and shape them into the ordinary, linear control flow that is most common. This approach has been adopted in many mainstream languages, a series of developments that has been controversial among practitioners.

There are two excellent posts from that period which do a very good job of making the case for the two sides of the argument: Marius Eriksen’s “Futures aren’t ersatz threads” and Bob Nystrom’s “What Color is Your Function?” I couldn’t more strongly recommend reading each of these posts in full.

The thesis of Eriksen’s post is that futures provide a fundamentally different model of concurrency from threads. Threads provide a model in which all operations occur “synchronously,” because the execution of the program is modeled as a stack of function calls, which block when they need to wait for concurrently executing operations to complete. In contrast, by representing concurrent operations as asynchronously completing “futures,” the futures model enabled several advantages cited by Eriksen. These are the ones I find particularly compelling:

  1. A function performing asynchronous operations has a different type from a “pure” function, because it must return a future instead of just a value. This distinction is useful because it lets you know if a function is performing IO or just pure computation, with far-reaching implications.
  2. Because they create a direct representation of the unit of work to be performed, futures can be composed in multiple ways, both sequentially and concurrently. Blocking function calls can only be composed sequentially without starting a new thread.
  3. Because futures can be composed concurrently, concurrent code can be written which more directly expresses the logic of what is occurring. Abstractions can be written which represent particular patterns of concurrency, allowing business logic to be lifted from the machinery of scheduling work across threads. Eriksen gives examples like a flatMap operator to chain many concurrent network requests after one initial network request.

Nystrom takes the counter-position. He starts by imagining a language in which all functions are “colored,” either BLUE or RED. In his imaginary language, the important difference between the two colors of function is that RED functions can only be called from other RED functions. He posits this distinction as a great frustration for users of the language, because having to track two different kinds of functions is annoying and in his language RED functions must be called using an annoyingly baroque syntax. Of course, what he’s referring to is the difference between synchronous functions and asynchronous functions. Exactly what Eriksen cites as an advantage of futures - that functions returning futures are different from functions that don’t return futures - is for Nystrom its greatest weakness.

Some of the remarks Nystrom makes are not relevant to async Rust. For example, he says that if you call a function of one color as if it were a function of the other, dreadful things could happen:

When calling a function, you need to use the call that corresponds to its color. If you get it wrong … it does something bad. Dredge up some long-forgotten nightmare from your childhood like a clown with snakes for arms hiding under your bed. That jumps out of your monitor and sucks out your vitreous humour.

This is plausibly true of JavaScript, an untyped language with famously ridiculous semantics, but in a statically typed language like Rust, you’ll get a compiler error which you can fix and move on.

One of his main points is also that calling a RED function is much more “painful” than calling a BLUE function. As Nystrom later elaborates in his post, he is referring to the callback-based API commonly used in JavaScript in 2015, and he says that async/await syntax resolves this problem:

[Async/await] lets you make asynchronous calls just as easily as you can synchronous ones, with the tiny addition of a cute little keyword. You can nest await calls in expressions, use them in exception handling code, stuff them inside control flow.

Of course, he also says this, which is the crux of the argument about the “function coloring problem”:

But… you still have divided the world in two. Those async functions are easier to write, but they’re still async functions.

You’ve still got two colors. Async-await solves annoying rule #4: they make red functions not much worse to call than blue ones. But all of the other rules are still there.

Futures represent asynchronous operations differently from synchronous operations. For Eriksen, this provides additional affordances which are the key advantage of futures. For Nystrom, this is just another hurdle to calling functions which return futures instead of blocking.

As you might expect if you’re familiar with this blog, I fall pretty firmly on the side of Eriksen. So it has not been easy for me to find that Nystrom’s views have been much more popular with the sort of people who comment on Hacker News or write angry, over-confident rants on the internet. A few months ago I wrote a post exploring the history of how Rust came to have the futures abstraction and async/await syntax on top of that, as well as a follow-up post describing the features I would like to see added to async Rust to make it easier to use.

Now I would like to take a step back and re-examine the design of async Rust in the context of this question about the utility of the futures model of concurrency. What has the use of futures actually gotten us in async Rust? I would like us to imagine that there could be a world in which the difficulties of using futures have been mitigated or resolved & the additional affordances they provide make async Rust not only just as easy to use as non-async Rust, but actually a better experience overall.

Async tasks aren’t ersatz threads

Usually the benefit of futures is explained to users in terms of a performance improvement: starting threads is expensive and so is switching between them, so being able to multiplex many concurrent operations on a single thread will allow you to perform more concurrent operations on a machine. Like Eriksen, I think this focus on the dichotomy of performance between “thread-based IO” and “event-based IO” is a red herring. All of Eriksen’s points about the benefits of futures for structuring your code hold true for Rust as well.

Eriksen was writing in the context of languages which use continuation-passing style as the underlying basis for their future abstraction. As I wrote in my post on the history of async Rust, this is not the approach that Rust took. Uniquely, Rust adopted a system that inverts the continuation-based approach: instead of futures calling a continuation when they finish, they are externally polled to completion. To understand the relevance of this point, we will need to take a step back and talk about “tasks.”

When I write task I don’t just mean “a unit of work.” In async Rust, “task” is a specific term of art. The fundamental abstraction for async work is a future, a type that implements the Future trait, which is most often implemented with an async function or an async block. But in order to execute any asynchronous code, we also need to use an “executor” which can execute “tasks.” Usually, this is provided for us by a “runtime,” which also provides other things like types for doing asynchronous IO. The most widely used runtime is tokio.

These definitions are a bit winding. A “task” is just any future that has been scheduled on an executor. Most executors, like tokio’s, can run multiple tasks concurrently. They might use a single thread to do so, or they might use more than one and balance the tasks between those threads (which to prefer has been the subject of some additional controversy). Executors that can run multiple tasks at once usually expose an API with a name like spawn, which “spawns a task.” Other executors exist which only run a single task at a time (like pollster): these usually expose an API with a name like block_on.
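
As a minimal sketch of these two kinds of API (using tokio as the multi-task executor; the work performed is invented for illustration):

    use std::time::Duration;

    // `spawn` hands a future to the executor as a new task, which then
    // runs concurrently with the task that spawned it.
    #[tokio::main]
    async fn main() {
        let handle = tokio::spawn(async {
            tokio::time::sleep(Duration::from_millis(10)).await;
            "done"
        });
        // Awaiting the JoinHandle retrieves the spawned task's output.
        assert_eq!(handle.await.unwrap(), "done");
    }

A single-task executor works the other way around: something like pollster’s block_on takes one future and blocks the calling thread until it completes.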

All tasks are futures, but not all futures are tasks. Futures become tasks when they are passed to an executor to execute. Most often, a task will be composed of many smaller futures combined together: when you await a future inside of an async scope, the state of the awaited future is combined directly into the future that the async scope evaluates to. Normally, you’ll do this many more times than you’ll use spawn, so most of your futures will not be tasks in the sense that I mean. But there’s no type-level distinction between a future and a task: any future can be made into a task by running it on an executor.

The important thing about this distinction between futures and tasks is that all of the state needed to execute a task will be allocated as a single object, laid out together in memory; each individual future used within a task will not require a separate allocation. We’ve often described this state machine as the “perfectly sized stack” of the task; it is exactly large enough to contain all of the state this task could ever need when it yields.

(One other implication of this design is that it’s not possible to write a recursive async function without explicitly boxing the recursive call. This is for the same reason it’s not possible to write a recursive struct definition without boxing the recursively embedded type.)
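
For instance, a minimal sketch of the boxed recursion (the function is invented for illustration):

    use std::future::Future;
    use std::pin::Pin;

    // Without the Box, this future would have to contain itself, and its
    // size could not be statically determined.
    fn countdown(n: u64) -> Pin<Box<dyn Future<Output = ()>>> {
        Box::pin(async move {
            if n > 0 {
                countdown(n - 1).await;
            }
        })
    }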

This all has some interesting implications for representing concurrent operations in async Rust. I want to introduce a distinction between two kinds of concurrency that can be achieved with the task model: multi-task concurrency, in which concurrent operations are represented as separate tasks, and intra-task concurrency, in which a single task performs multiple operations concurrently.

Multi-task concurrency

If you want two operations to occur concurrently, one way to implement this is to spawn a separate task for each operation. This is “multi-task concurrency,” concurrency achieved by using multiple concurrent tasks.

For many users, multi-task concurrency is the most accessible approach to concurrency in async Rust, because it is the most similar to thread-based concurrency approaches. Just like how you can spawn threads for concurrency in non-async Rust, in async Rust you can spawn tasks. This makes it very familiar to users who are already used to concurrency with threads.

Once you have multiple asynchronous tasks, you probably need some way to transfer information between them. This sort of “inter-task communication” is achieved by using some sort of synchronization primitive, such as a lock or a channel. For async tasks, there is an asynchronous equivalent of every sort of blocking synchronization primitive available: an async Mutex, an async RwLock, an async mpsc channel, and so on. Many runtimes even provide async synchronization primitives without an analog in the standard library. When an analog does exist though, there is usually a very strong similarity between the interface of the two primitives: in terms of affordance, an async Mutex is really just like a blocking Mutex, except that the lock method is async, instead of blocking. This was the conceptual basis for the async-std runtime.
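
As a sketch of what inter-task communication looks like in practice (using tokio’s mpsc channel; the messages are invented for illustration):

    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        let (tx, mut rx) = mpsc::channel::<u32>(16);

        // One task produces values...
        tokio::spawn(async move {
            for i in 0..3 {
                // `send` is async: if the channel is full, it yields
                // instead of blocking the thread.
                tx.send(i).await.unwrap();
            }
        });

        // ...and another consumes them. `recv` likewise yields while the
        // channel is empty, rather than blocking.
        while let Some(i) = rx.recv().await {
            println!("got {i}");
        }
    }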

It’s worth noting, however, that the implementations of each of these things are completely different. The code that runs when you spawn an async task is nothing like spawning a thread, and the definition and implementation (for example) of an async lock is very different from a blocking lock: usually they will use an atomics-based lock under the hood with the addition of a queue of tasks that are waiting for the lock. Instead of blocking the thread, they put this task into that queue and yield; when the lock is freed, they wake the first task in the queue to allow it to take the lock again.

For users of these APIs, multi-task concurrency is very similar to multi-threaded concurrency. However, it is not the only kind of concurrency enabled by the futures abstraction.

Intra-task concurrency

While multi-task concurrency has the same API surface as multi-threaded concurrency (modulo the scattering of async & await keywords), the futures abstraction also enables another kind of concurrency that has no analog in the context of threads: “intra-task concurrency.” This means that the same task is concurrently performing multiple asynchronous operations. Rather than having to allocate separate tasks for each of your concurrent operations, you can perform those operations using the same task object, improving memory locality, saving allocation overhead & increasing the opportunity for optimization.

To be concrete about this, what I mean is that when you use an intra-task concurrency primitive (like select! for example), the state of two futures being operated on will be embedded directly in the parent future which is operating on them. The set of widely known intra-task concurrency primitives corresponds to the table of async primitives I’ve discussed in previous posts: they are select and join for Future and merge and zip for AsyncIterator:

                   │    SUM      │  PRODUCT
    ───────────────┼─────────────┼──────────
                   │             │     
            FUTURE │    select!  │  join!
                   │             │     
     ASYNCITERATOR │    merge!   │  zip!
                   │             │      

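As a sketch of what intra-task concurrency looks like in use (the durations are invented for illustration):

    use tokio::time::{sleep, Duration};

    #[tokio::main]
    async fn main() {
        // Both sleeps are embedded in this one task's state machine and
        // polled concurrently; no new task or allocation is created.
        let (a, b) = tokio::join!(
            async { sleep(Duration::from_millis(10)).await; 1u32 },
            async { sleep(Duration::from_millis(20)).await; 2u32 },
        );
        assert_eq!((a, b), (1, 2));
    }
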
With threads, you can provide APIs like this, but only by spawning new threads and using channels or join handles to communicate their result back to the parent thread. This introduces a lot of overhead, whereas the intra-task implementation of these primitives is as cheap as possible in Rust.

In fact, eliminating the overhead of these combinators was the entire reason that Rust moved from a continuation-passing style to a readiness-based approach for the Future abstraction. When Aaron Turon wrote about futures that would need a heap allocation in a continuation-passing style, it is not a coincidence that his example was join. It is exactly those futures which embed concurrent operations which would need shared ownership of the continuation (to call the continuation whenever any of the concurrent operations completes as necessary). So it is exactly these combinators for intra-task concurrency that readiness-based futures were designed to optimize.

As Rain has compellingly argued in the past, “heterogeneous select is the point of async Rust.” Specifically, the fact that you can select a variety of futures of different types and await whichever of them finishes first & from within a single task, without additional allocations, is a unique property of async Rust compared to non-async Rust & one of its most powerful features.

A common architecture for an async Rust server is to spawn a task for each socket. These tasks often internally multiplex inbound and outbound reads and writes over that socket along with messages from other tasks intended for the service on the other end of the socket. To do so, they might select between some futures or merge streams of events together, depending on the exact details of their life cycle. This can have a very high-level appearance, and in many ways it resembles the actor model for asynchronous concurrency, but thanks to intra-task concurrency it will compile into a single state machine per socket, which is a runtime representation very similar to hand-written asynchronous servers in a language like C.
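
A heavily simplified sketch of such a per-socket task (the protocol handling is elided; the names are invented for illustration):

    use tokio::io::{AsyncReadExt, AsyncWriteExt};
    use tokio::net::TcpStream;
    use tokio::sync::mpsc;

    async fn connection(mut socket: TcpStream, mut rx: mpsc::Receiver<Vec<u8>>) {
        let mut buf = [0u8; 4096];
        loop {
            tokio::select! {
                // Inbound data from the peer.
                result = socket.read(&mut buf) => {
                    match result {
                        Ok(0) | Err(_) => break, // peer hung up
                        Ok(n) => { let _inbound = &buf[..n]; /* handle it */ }
                    }
                }
                // Outbound messages from other tasks in the program.
                msg = rx.recv() => {
                    let Some(msg) = msg else { break };
                    if socket.write_all(&msg).await.is_err() { break; }
                }
            }
        }
    }

Both branches live in the state of this one future: the select! compiles into polling each of them in turn, not into additional tasks.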

This architecture (and others like it) combines multi-task concurrency for the situation in which it is most appropriate & intra-task concurrency for the situation in which it is the better approach. Recognizing the difference between these scenarios is a key skill for mastering async Rust. There are a few limitations to intra-task concurrency: if your algorithm can abide these limitations, it is probably a good fit.

The first limitation is that it is only possible to achieve a static arity of concurrency with intra-task concurrency. That is, you cannot join (or select, etc) an arbitrary number of futures with intra-task concurrency: the number must be fixed at compile time. This is because the compiler needs to be able to lay out the state of each concurrent future in the state of the parent future, and each future needs to have a statically determined maximum size. This is really exactly the same as how you can’t have a dynamically sized collection of objects on the stack, but need to use something like a heap allocated Vec to have a dynamic number of objects.
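
A sketch of the contrast, using the futures crate (the bodies are trivial placeholders):

    use futures::stream::{FuturesUnordered, StreamExt};

    // Static arity: two futures, laid out inline in the parent future.
    async fn fixed() -> (u32, u32) {
        futures::join!(async { 1u32 }, async { 2u32 })
    }

    // Dynamic arity: a heap-backed collection of futures, analogous to
    // reaching for a Vec when the stack won't do.
    async fn dynamic(n: u32) -> u32 {
        let futs: FuturesUnordered<_> = (0..n).map(|i| async move { i }).collect();
        futs.fold(0, |acc, i| async move { acc + i }).await
    }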

The second limitation is that these concurrent operations do not execute independently of one another or of their parent that is awaiting them. By this I mean two things. First, intra-task concurrency achieves no parallelism: there is ultimately a single task, with a single poll method, and multiple threads cannot poll that task concurrently. (All the widely used async runtimes are also a poor fit for compute-bound work; I don’t think this is essential to the async model, but it is a fact about the libraries currently available for using async.) Secondly, if the user cancels interest in this task, all child operations are necessarily canceled, because they were all part of the same task. So if you want these operations to continue even if this work gets canceled, they must be separately spawned tasks.

The function non-coloring problem

I want to diverge for a moment to return to Nystrom’s post, and introduce a completely different thread to this discussion. I promise these threads will be re-joined in the future & I even hope that they will cohere.

I propose that we continue the thought experiment of the language with colored functions & imagine that the designer of the language has read Nystrom’s critique and tried to mitigate the pain of RED and BLUE functions. In the classic tendency of language designers who don’t know when to stop, they’ve added a third color of functions called GREEN functions, which they hope will end everyone’s complaints. Of course, these come with their own set of rules.

1. GREEN functions can be called like BLUE functions.

Unlike RED functions, there’s no special syntax for GREEN functions: you can call them anywhere using exactly the same syntax as BLUE functions. In fact, from looking at their signature and how they’re used, there’s no way to tell that there’s any difference at all. There will just be a note in the documentation that the function is GREEN, or maybe not if the author didn’t think to include this information.

This is great! No longer do you have to worry about what color a function is, as long as you stick to BLUE and GREEN functions, it’s all the same to you.

2. There is a GREEN equivalent for every primitive RED function.

Of course, to actually achieve that, you’ll need to be able to implement your program without calling RED functions. So the language authors have added to their standard library a GREEN function for each operation that otherwise would only have been available with a RED function.

The implementations differ in some way having something to do with performance, which may or may not be material to your use case, but we’ve decided to ignore things like the actual semantics of our code in this thought experiment, so at least for now we won’t dwell on this.

3. There is a GREEN function that wraps any RED function and calls it.

Despite the existence of GREEN functions in the standard library, users could still encounter libraries that are written using RED functions. So the language designers cleverly came up with a workaround for this: there is a higher-order GREEN function that takes a RED function as an argument. It basically just calls the RED function, technical details notwithstanding. Because GREEN functions can be called from anywhere, it resolves the problem of not being able to call RED functions from inside BLUE ones.

4. Calling a GREEN function from inside a RED function is very bad.

Of course, there always has to be a downside. You should never call a GREEN function from inside a RED function. It’s not “nasal demons” undefined behavior bad, or even “clowns with snakes for arms” JavaScript bad, but it will certainly slow down your program and in the worst case it could even cause a deadlock. Users absolutely should not do this. Programmers who are happy using RED functions must avoid GREEN functions at all cost.

But here’s the problem with how this language added these functions: because they’re identical to BLUE functions, there’s no way to tell when you call them! You just have to know from the documentation what all of the GREEN functions are, and you must make sure never to call them from within a RED function.

Blocking functions have no color

Now that I’ve lit my blog up like a Christmas tree, let’s talk about Rust again. Probably you’ve guessed what GREEN functions are: GREEN functions are any functions that block the current thread. There’s no special syntax or types to distinguish a function which blocks the thread waiting for something to concurrently occur: this is exactly what Nystrom argues is so great about blocking functions. Unlike many languages with asynchronous functions, Rust supports blocking functions as well: there’s an API to perform any kind of IO or thread synchronization by blocking the thread, and there’s a block_on API that takes any Future and blocks this thread until it is ready, so you can call asynchronous libraries as if they were blocking.
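
For example, a minimal sketch of wrapping an asynchronous library in a blocking API (the names are invented for illustration; pollster or futures::executor::block_on would both serve as the executor):

    // A blocking ("GREEN") wrapper: callers can't tell from the signature
    // that a future is being driven to completion underneath.
    fn fetch_blocking(url: &str) -> String {
        futures::executor::block_on(fetch_async(url))
    }

    async fn fetch_async(url: &str) -> String {
        // stand-in for a real asynchronous request
        format!("response from {url}")
    }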

Languages that don’t support blocking operations don’t have this problem: instead, they have the problem that Nystrom complained about that you have to know the difference between asynchronous and non-asynchronous functions. But since in Rust all things are possible, users who don’t want to use futures can avoid them almost entirely: their only problem is that sometimes open source libraries (provided to them without charge or warranty!) will use async Rust, and they will need to use block_on to work with them from their code. Some will still complain frequently and fervently about this state of affairs.

The people who get the worst end of the deal here are the users of async Rust, who not only have to deal with async Rust, but also have to deal with the fact that they must never call a blocking function inside their async code. Yet blocking functions are completely indistinguishable from normal functions! That’s what’s supposed to be so good about them, according to Nystrom.

A long time ago (right after async/await came out), I proposed adding an attribute that could be put on blocking functions to try to introduce some sort of linting against calling them in async context, helping users catch any mistakes of this nature. This idea has not been pursued by the Rust project, for reasons I do not know. I would love to see more done to help users catch this error.

The most insidious of the blocking APIs in async code is the blocking Mutex. Using a blocking Mutex inside an async function is fine under some specific but still quite common circumstances:

  1. It is only ever locked for brief periods of time in all of its use cases.
  2. It is never held over an await point.

However, and here is where it gets really bad, if a Mutex is held over an await point, it could easily deadlock your thread as other tasks running on the same thread try to take the lock while a pending task is holding it (the standard library Mutex is not re-entrant). This means it’s both perfectly fine to use sometimes, and not just bad but absolutely devastating to use other times. Not a great outcome!
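
A sketch of the hazardous version (the IO is a stand-in):

    use std::sync::Mutex;

    async fn bad(state: &Mutex<Vec<u8>>) {
        let guard = state.lock().unwrap();
        // The guard is held across this await. If the task yields here,
        // another task on this same thread that calls `lock` will block
        // the whole thread, and this task can never resume to release it.
        some_io().await;
        drop(guard);
    }

    async fn some_io() { /* stand-in for real asynchronous IO */ }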

“I don’t want fast threads, I want futures”

The previous two sections explore two fairly independent ideas:

  • In the first, I argued that Rust’s futures model enables a specific kind of highly optimal “intra-task concurrency” not enabled by the thread model.
  • In the second, I argued that blocking functions are insidiously indistinguishable from normal functions, causing problems for async Rust.

What unites these two discussions is the fact that the difference between async functions and blocking functions is the additional affordance of the Future trait. This is what allows an asynchronous task to perform multiple operations concurrently, whereas a thread cannot. And it’s the lack of this affordance that makes blocking functions problematic to call from async code, because they cannot yield, they can only block. My design principle for async Rust is this: we implemented this affordance for a very good reason & should leverage it to its full potential. As Henry de Valence wrote on Twitter: “I don’t want fast threads, I want futures.”

This idea is not new at all. In the RFC that removed the green threading library from Rust, Aaron Turon argued that trying to make the API for asynchronous IO and blocking IO the same limited the potential of async Rust:

With today’s design, the green and native threading models must provide the same I/O API at all times. But there is functionality that is only appropriate or efficient in one of the threading models.

For example, the lightest-weight M:N task models are essentially just collections of closures, and do not provide any special I/O support. This style of lightweight tasks is used in Servo, but also shows up in java.util.concurrent’s executors and Haskell’s par monad, among many others. These lighter weight models do not fit into the current runtime system.

Turon went on to develop the readiness based futures API that exists in Rust today, and the origins of it can be seen in these remarks. I think as we layered async/await syntax on top of the future abstraction (and also as contributors to Rust churned) this idea has been de-emphasized and somewhat lost. Now, the thinking goes, async Rust and blocking Rust should be as alike as possible. But this abandons async Rust’s extra affordance, except in terms of a possible performance improvement from userspace scheduling.

It’s important to understand how async/await fits into this world & that it isn’t the whole picture. Futures give the option, but not the requirement, to multiplex concurrent operations within a task. This option is critical for the few times you need to exercise it, but most of the time you are happy to let your code proceed, “one damn thing after another.” The await operator lets you do exactly that, without highly nested callbacks or combinator chaining. This reduces the cost of the optionality of futures to the fact that they divide the world into asynchronous and non-asynchronous functions, without the additional difficulties of use. But it’s exactly those points where you do exercise that option - where you don’t await a future - that matter the most!

Futures give you the ability to multiplex arbitrarily many perfectly-sized tasks on a single thread and to multiplex a static number of concurrent operations within a single task. In doing so they enable users to structure concurrent code logically instead of everywhere needing to include concurrency-related boilerplate about spawning threads. And they do so with much better performance characteristics, which can be critical in scenarios with a very high degree of concurrency. This alone would be worth the price of admission in my view, but we can also imagine other advantages.

Let’s return to the problem of holding locks over await points. One pattern some users will use is to make sure they give up the lock before they perform any potentially long-running asynchronous operation like IO, so that other concurrent operations can take the lock instead of waiting. (This requires care: you need to ensure that your code is resilient to potential changes to the protected state that occur while you were performing IO.) Async/await already makes this easier than blocking IO, because the points at which your task could be performing long-running work are marked by the await keyword. For blocking IO, nothing syntactically indicates blocking, making it easier to miss a point where the lock needs to be given up. But async Rust could do even better than that.
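
A sketch of that pattern (compute_remotely is a stand-in for some long-running asynchronous operation):

    use std::sync::Mutex;

    async fn update(state: &Mutex<u64>) {
        // Copy what we need and let the guard drop before awaiting.
        let snapshot = *state.lock().unwrap();
        let result = compute_remotely(snapshot).await;
        // Re-take the lock afterwards; the protected state may have
        // changed while we were performing IO.
        *state.lock().unwrap() += result;
    }

    async fn compute_remotely(n: u64) -> u64 { n * 2 }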

David Barsky has proposed what he calls a “lifecycle” trait: an interface analogous to Drop, but which executes when the future holding the object yields and resumes. He was interested in this concept specifically for tracing, which includes information about the task being executed in all log messages and therefore needs to know when that changes. It could also be used to enable a locking primitive which automatically gives up its lease whenever the future yields control, and re-takes it whenever it restarts. This would ensure a user never accidentally fails to give up the lock when awaiting, and it would even be more optimal than the manual version: when your task doesn’t actually yield (because the future was immediately ready), you wouldn’t need to give up the lock and re-take it.
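
No such trait exists today, but a sketch of the shape it might take (all names here are invented for illustration):

    // Hooks that the runtime or generated future would call around yields.
    trait Lifecycle {
        /// Called when the enclosing future yields control to the executor.
        fn on_yield(&mut self);
        /// Called when the enclosing future is polled again.
        fn on_resume(&mut self);
    }

A yield-aware lock guard could implement on_yield by releasing its lease and on_resume by re-acquiring it.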

maybe(async)

I would be remiss if I didn’t mention one feature that’s been under discussion within the Rust project which I think runs completely counter to my line of thinking here: the idea of maybe(async). This is a feature (syntax to be determined) for writing code which is abstract over whether or not it is async. In other words, any maybe(async) function could be instantiated to two variants: one in which it is async (and awaits all futures within it) and one in which it is not async (and presumably those functions which return futures in the async version would instead block).

The biggest problem with this idea is that it could only work for multi-task concurrency. As I’ve already written, there is a direct analogy between code written with multi-task concurrency and code written with multi-threaded concurrency. But intra-task concurrency has no equivalent in thread-based concurrency systems, because it depends on the futures affordance. Thus, any attempt to use maybe(async) would be limited to the sections of code which strictly use multi-task concurrency. The problem is that for any sufficiently significant piece of code, there will be key sections which take advantage of intra-task concurrency, and which therefore would not be suitable for abstraction with maybe(async).

Recently, Mario Ortiz Manero wrote about the difficulty of trying to write a library which supports usage either with blocking or asynchronous IO. This blog post seems to me to be the strongest case I could think of for maybe(async), so I want to analyze it more thoroughly.

Their use case was a wrapper which translates Rust method calls into HTTP requests to the Spotify API. They want to support both blocking and asynchronous versions of their library from the same source code, using reqwest as an asynchronous HTTP client and ureq as a blocking HTTP client. They wrote about how difficult this is right now, which is certainly true.

First, it’s interesting to note that the reqwest library actually contains its own blocking HTTP client as well as its asynchronous one. To implement this, it spawns a background thread on which all requests to that client will be made asynchronously, multiplexing them on the same thread. Ortiz Manero rejected this approach for this reason:

Unfortunately, this solution still has quite the overhead. You pull in large dependencies like futures or tokio, and include them in your binary. All of that, in order to… actually end up writing blocking code. So not only is it a cost at runtime, but also at compile time. It just feels wrong to me.

Here, by “overhead,” Ortiz Manero seems to mean the build-time overhead of these dependencies, and not runtime overhead. But we should ask why reqwest pulls in these dependencies, even if it “feels wrong.” In blocking reqwest, tokio is used to multiplex all requests to the same client on a single thread. This architectural difference between blocking reqwest and ureq (which instead performs blocking IO from the thread that makes the request) seems more important to me than the fact that one depends on tokio and one does not. I’d like to see benchmarks comparing the two approaches for different workloads, rather than excluding one just because of what’s in its dependency tree.

One feature that reqwest supports and ureq does not is HTTP/2. HTTP/2 is designed to allow users to multiplex different requests over the same TCP connection. ureq by contrast provides only (non-pipelined) HTTP/1. And it has no way of supporting this with its current architecture, because whenever a user makes a request over a TCP connection, it blocks the thread until that request completes. Thus, with ureq the number of concurrent network requests you can make to a service is limited by the number of open TCP connections that service will allow you to make, and for each new connection you’ll need to perform a new TCP (and probably TLS) handshake.

If ureq wanted to support HTTP/2 and its multiplexing, it would find it needs to implement that multiplexing over a single TCP connection somehow. It might not do so using async Rust, but if it wanted to use blocking IO for this & still provide an API like the one it has now, it would still need to run a background thread and use channels so that concurrent requests from multiple threads can be multiplexed over a single TCP connection. In other words, the architecture would come to look just like the architecture that reqwest has. By using async Rust, reqwest is more easily able to abstract over the difference between multiplexing requests across multiple connections with HTTP/1 and multiplexing requests over the same connection with HTTP/2. This is a huge advantage, since users frequently don’t know whether the service they want to communicate with supports HTTP/2.

Even still, you might say that maybe(async) would have some utility to this author even if they switched from ureq to reqwest’s blocking API, because it would allow them to save the boilerplate of implementing the async version of their library and a blocking API on top of it. But because of the limitations of what can be abstracted by maybe(async), this is really only true for a specific kind of library which is strictly a stateless “mapping” over the semantics of a lower-level library. This could be, as in this example, a library that translates HTTP RPC calls into Rust objects & methods, or it could be a library that defines a wire protocol in terms of a bytewise interface like TCP. As soon as the library has its own evolving state to manage (as the HTTP or IO libraries underneath them do), the two implementations would meaningfully diverge and no longer be implementable from the same source with maybe(async).

Since for those libraries maintaining two versions is just boilerplate, there are perhaps better ways to support this than adding a new abstraction. One approach would be to use the macro system, which could be used to generate something like reqwest’s blocking interface from an async interface (generating the code which spawns a background thread and maps blocking functions into messages to that thread). Libraries like the Spotify client could use that macro to avoid the boilerplate of supporting their blocking API, at the expense of using an async runtime on a background thread for their implementation. But this would apply equally well to stateless and stateful libraries, unlike maybe(async).
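
A sketch of the code such a macro might generate (the names are invented; reqwest’s real blocking client instead keeps the runtime on a dedicated background thread, a refinement of the same idea):

    // The blocking client owns a small single-threaded runtime and drives
    // the async client on it.
    struct BlockingClient {
        rt: tokio::runtime::Runtime,
        inner: AsyncClient,
    }

    impl BlockingClient {
        fn new() -> Self {
            let rt = tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .unwrap();
            BlockingClient { rt, inner: AsyncClient }
        }

        // Each blocking method drives the corresponding async method.
        fn get(&self, url: &str) -> String {
            self.rt.block_on(self.inner.get(url))
        }
    }

    struct AsyncClient;

    impl AsyncClient {
        async fn get(&self, url: &str) -> String {
            format!("response from {url}") // stand-in for a real request
        }
    }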

Another approach is what’s called “sans-IO.” The author of ureq, for example, also maintains a WebRTC library called str0m written in this style, which avoids the problem of blocking and non-blocking IO by not handling the actual IO in the library at all. A similarly written library is Cloudflare’s quiche, which implements the state machine for QUIC, but without performing IO. Building on this concept, we could imagine a way to “lift” the problem of IO completely out of these libraries, to instead write them against an abstract interface that allows them to be executed against any implementation of UDP, TCP, HTTP, or whatever they depend on. Exactly how this would be generalized remains to be determined.
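
A sketch of the sans-IO shape (all types here are invented for illustration):

    use std::time::Duration;

    // Inputs the caller feeds into the protocol state machine.
    enum Event<'a> {
        Received(&'a [u8]),
        TimerExpired,
    }

    // Outputs telling the caller what IO to perform next.
    enum Action {
        Transmit(Vec<u8>),
        SetTimer(Duration),
        Done,
    }

    struct Protocol { /* protocol state */ }

    impl Protocol {
        // A pure state transition: no sockets, no clocks, no blocking, no
        // futures. Blocking IO, async IO, and test harnesses can all
        // drive the same state machine.
        fn handle(&mut self, event: Event<'_>) -> Action {
            match event {
                Event::Received(bytes) => Action::Transmit(bytes.to_vec()),
                Event::TimerExpired => Action::Done,
            }
        }
    }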

A final digression about coroutines

This post is already too long, but I know that it might gain traction outside of the Rust community, and I can predict a certain negative response: the affordances of futures I cite are not only achievable with futures! These affordances can be provided by any kind of coroutine. Rust uses stackless coroutines, which have some unpleasant limitations, but a language with stackful coroutines could also provide the same affordance with less headache.

I actually agree. Returning to the world of made-up languages, one could imagine a language in which all functions are coroutines, meaning that all functions can yield. No function coloring! A “pure” function would be a function that yields Never (meaning it doesn’t actually yield at all), whereas an “impure” function would be a function that yields Pending (or some other magic type from your runtime meaning you’re waiting on an external event). Impure functions would be the default and all functions are coroutines, so the call operator would automatically forward Pending values outward. You could still mark pure functions somehow for when you want to ensure you are not doing any sort of IO or synchronization.

The language would also want a way to instantiate the coroutine object and resume it, instead of calling it to completion. Using that operator, you could implement concurrency combinators like select and join. And the language would need some way of spawning coroutines as entirely new, concurrent tasks. All of this without any need for async/await: that’s what stackful coroutines get you.

You might even extend this coroutine feature to also represent other things. For example, iterables could be represented as coroutines yielding a meaningful value. for loops would take that coroutine object and process each of those values in turn. Asynchronous iterables would just yield that value or Pending. And you could model exceptions in the same way, yielding an error (probably you would have a separate “pathway” from yield and return, in recognition of the fact that a function that has thrown an exception cannot be restarted). I don’t have the whole language mapped out, but all of this sounds plausible.

(And indeed, this doesn’t have to be done with coroutines. You could also model this in an inverted manner, so that instead you register a point in the stack to return to for each of these things: one for pending IO operations, one for thrown exceptions, one for items yielded from an iterable, and so on. You could call that point in the stack a “handler” for those “effects,” in other words a kind of “algebraic effect handler.” What I’m saying is that these two language concepts, effect handlers and coroutines, are at least partially isomorphic to one another.)

I also believe, but I am not certain, that such a language could achieve the same guarantee as Rust that references are not simultaneously mutable and aliased without adding lifetimes to the surface syntax. As long as coroutines can yield and be resumed with references, references could become modifiers that cannot be embedded in object types, and their lifetimes could be entirely inferred. It would not allow exactly as optimal a code representation as Rust (in the terminology of a previous post, there would be no access to the “low-level register”), but it would still give the same correctness guarantees.

Why didn’t Rust do something like this? It did, at first! But it gave way to other requirements. There was a really great comment on lobste.rs the other week that said it better than I could:

Async style language features are a compromise between your execution model being natively compatible with the 1:1 C ABI, C standard library, and C runtime and a M:N execution model. C++ async suffers from the same issues, except it’s not as strict in terms of lifetime safety (not a good thing). The cost for the native compatibility with the C/system runtime is the “function coloring” problem.

Rust has a prior commitment to be compatible with the existing C runtime. This means Rust code is made up of a stack of subroutines, and the address of items in the stack can be taken, and stored not only in that stack but also in other areas of program memory. Rust chose this approach to get zero-cost FFI to the enormous amounts of existing C and C++ code written using that model, and because the C runtime is the shared minimum of all mainstream platforms. But this runtime model is incompatible with stackful coroutines, so Rust needed to introduce a stackless coroutine mechanism instead. Every major language with async/await is similarly beholden to an existing runtime with a similar inability to represent stackful coroutines, if not C’s then some virtual machine runtime. The only thing special about the C runtime is that it is so ubiquitous that many programmers don’t even realize it exists & isn’t a naturally occurring phenomenon.

One more remark along these lines:

If you were a language designer of some renown, you might convince a large and wealthy technology company to fund your work on a new language which isn’t so beholden to the C runtime, especially if you had a sterling reputation as a systems engineer with a deep knowledge of C and UNIX and could leverage that (and the reputation of the company) to get rapid adoption of your language. Having achieved such an influential position, you might introduce a new paradigm, like stackful coroutines or effect handlers, liberating programmers from the false choice between threads and futures. If Leibniz is right that we live in the best of all possible worlds, surely this is what you would do with that once in a generation opportunity.

(If you did this, I hope you at least wouldn’t go on stage and say the reason you’ve adopted this approach is that your users are “not capable of understanding a brilliant language”!)

In a less than optimal world, you might decide to do something less inspired. You might take that break from the C runtime and then just implement threads again, with basically the same semantics, except that they are scheduled in userspace. Your users would be required to implement concurrency in terms of threads, locks and channels, just like they had always been in the past. You might also decide your language should have other classic features like null pointers, default constructors, data races and GOTO, for reasons known only to you. Maybe you would also drag your feet for years on adding generics, despite frequent user requests. You might go do that, in a less than optimal world.

Alas. When I’m feeling pessimistic, I think our industry is mired in a certain stagnation, so that every decade we shall re-write new programs with the same behavior in new languages with the same semantics, having only mild differences in performance characteristics more suited to present hardware considerations. A sad fate, but perhaps soon to be the lot of large language models and not programmers. The reason I am an enthusiastic promoter of Rust is that it makes me feel optimistic about programming: Rust is committed to the belief that mainstream programming languages can meaningfully evolve for the better.

Despite being an advancement, Rust is not the language that has broken from the C runtime. It is that language’s lower level & more difficult cousin: you can get the same guarantees, “with some assembly required.” We should do everything we can, given our hard requirements, to reduce the necessary assembly & with async/await we have already laid the foundation for that. We should aspire not to simplify the system by hiding the differences between futures and threads, but instead to find the right set of APIs and language features that build on the affordances of futures to make more kinds of engineering achievable than before. Right now we only have the foundation, but this is already a huge leap forward from the previous world of hand-rolled state machines and directly managing your event loop. If we let futures be futures and build on that foundation, even more will be possible.