The Scoped Task trilemma

This is the first post in a series of posts about concurrency in Rust, and the different APIs that can exist to support it. Unlike my recent series on control-flow effects, this series isn’t driving toward any particular vision of what I think the Rust project should do. Instead, I am just trying to publicly explore the problem space and build tools for thinking about the issues involved. I’m not sure what the “right” concurrency API is.

It’s well known in certain circles that there was a sign hung up at the Mozilla office in San Francisco: “You must be this tall to write multi-threaded code”. The sign was hung about 10 feet off the ground. This little joke has an almost mythological relevance in the origin story of Rust, which was designed by a Mozilla team to make writing parallel and concurrent systems software more tractable. And indeed, Rust has resolved many of the problems users encounter when writing multi-threaded systems. But there is a problem.

Recently, Tyler Mandry wrote a great blog post about “scoped tasks” - an attempt to “port” the scoped thread API to async Rust - and found that several desirable properties of the API are ultimately incompatible with one another. I want to contend that, stripping away the specifics of the API, what he has encountered is a fundamental inability of Rust’s existing type system to support APIs with all three of a certain set of desirable properties.

I’ll call it the scoped task trilemma. It’s not specific to scoped tasks, though: it’s come up again and again. You want to use the Rust type system to implement an API for concurrent programming. Conceptually, in this API you have a “parent task” and one or more “child tasks;” if there is more than one child task, they can run concurrently with one another. Any sound API can only provide at most two of the following three desirable properties:

  1. Concurrency: Child tasks proceed concurrently with the parent.
  2. Parallelizability: Child tasks can be made to proceed in parallel with the parent.
  3. Borrowing: Child tasks can borrow data from the parent without synchronization.

Note that when I say “concurrent” I do mean concurrent with the parent - if there are multiple child tasks they are inherently concurrent with one another. And when I say “parallelizable” I don’t necessarily mean in parallel - they could be run sequentially in some cases, but in principle they are allowed to run in parallel.

The fact that all three cannot be provided at once is exactly why the “scoped task” API isn’t sound, it’s why the original scoped thread API wasn’t sound, and it’s also why certain seemingly obvious APIs for things like io-uring are not possible.

Every sound concurrency API in Rust ultimately provides two of these three properties. It’s instructive to examine which APIs align with which resolution of the trilemma.

Concurrency + Parallelizability

(Examples: thread::spawn, task::spawn.)

These APIs allow the child task(s) to proceed both independently and concurrently with the parent task. The work of the child task can be executed on another thread, possibly in parallel, and the work of the parent task can proceed at the same time uninterrupted. On the other hand, the child task is not able to borrow from the parent task - that’s why both of these APIs have a ’static bound on their argument.

These APIs have an affordance for joining the child task, thus waiting for it to complete, but it’s not required that you do so. And with the async task::spawn API, while you’re waiting for the child task to finish, you can run other work from outside of this scope concurrently on the same thread.
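
To make the ’static requirement concrete, here’s a minimal sketch using std’s thread::spawn (my example, not from Tyler’s post): the child can’t borrow from the parent, so the shared data is moved in behind an Arc.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // The spawned closure must be 'static: it cannot borrow `data` from the
    // parent, so the data is reference-counted and a clone is moved in.
    let data = Arc::new(vec![1, 2, 3]);

    let child = {
        let data = Arc::clone(&data);
        thread::spawn(move || data.iter().sum::<i32>())
    };

    // The parent keeps running concurrently (and possibly in parallel) with
    // the child. Joining is available but optional.
    println!("parent still has access to {data:?}");
    println!("child computed {}", child.join().unwrap());
}
```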

Parallelizability + Borrowing

(Examples: thread::scope, rayon::join.)

These APIs allow the child task(s) to proceed in parallel with the parent task while borrowing from it. The work inside the child task can be executed on another thread while borrowing from the parent context. But what you lose is the ability to run code concurrently between the parent scope and the child scope - the parent cannot execute code from outside of the scope while this work is going on (and so its thread must either be put to work running the child tasks or block).

These APIs mandate that you join the child task(s) before they return control to the caller; that’s how they guarantee that the cross-thread borrowing conforms to the lifetime signature.
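
As a concrete illustration, here’s a minimal sketch using std’s thread::scope: the children can run in parallel and borrow directly from the parent, but the scope joins them before it returns.

```rust
use std::thread;

fn main() {
    let data = vec![1, 2, 3, 4];

    // Child threads borrow `data` without any Arc or other synchronization,
    // and may run in parallel. `scope` joins both before it returns, so the
    // parent cannot run unrelated code while they execute.
    let (left, right) = thread::scope(|s| {
        let left = s.spawn(|| data[..2].iter().sum::<i32>());
        let right = s.spawn(|| data[2..].iter().sum::<i32>());
        (left.join().unwrap(), right.join().unwrap())
    });

    println!("{left} + {right} = {}, data still usable: {data:?}", left + right);
}
```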

Borrowing + Concurrency

(Examples: select!, FuturesUnordered.)

These APIs allow the child task(s) to proceed concurrently with the parent task while borrowing from it. Code inside the child tasks (such as a future that’s part of the FuturesUnordered stream or the select! macro) can run concurrently with code from the parent context (such as whatever runs when those constructs are awaiting something). And the child tasks can borrow freely from the parent task without any synchronization.

The child tasks cannot be run independently of the parent task, and these APIs have no concept of “joining.” The futures being selected or streamed over are run as part of the same work unit as the parent, and cannot be moved to another thread during execution except as a single unit with the parent.
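
For concreteness, here’s a minimal sketch of the FuturesUnordered flavor of this (assuming the futures crate): the child futures borrow from the parent and run concurrently with it, but only as part of the parent’s own poll.

```rust
use futures::stream::{FuturesUnordered, StreamExt};

async fn process(data: &[u32]) -> u32 {
    // Each child future borrows `data` directly; no Arc and no 'static bound.
    let mut children: FuturesUnordered<_> =
        data.iter().map(|n| async move { n * 2 }).collect();

    let mut total = 0;
    while let Some(v) = children.next().await {
        // This parent code runs interleaved with the children, but everything
        // here is polled as one unit: none of it can be moved to another
        // thread independently of the parent.
        total += v;
    }
    total
}
```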

Why not all three?

Of course, this raises the question: why isn’t it possible for a single API to provide all three of these desirable properties?

The truth is that to some extent I’ve been a little dishonest in describing the scoped thread and rayon APIs as being both parallelizable and allowing borrowing without synchronization. Of course, in reality they do require synchronization - this synchronization occurs when the APIs join any subthreads before returning to the caller. However, this synchronization is amortized to a single waiting operation at the end of the function, rather than requiring some sort of synchronization to share or pass ownership of all the state between the different tasks.

Famously, the older scoped thread API was structured differently, so that this join occurred whenever a join handle object was dropped. Shortly before the release of 1.0 it was realized that every object in Rust can be leaked without ever running its destructor, which means the join could be skipped entirely. In an event called the “leakpocalypse,” it was decided that leaking would remain safe, and so that API was unsound.

There’s been some discussion recently of revisiting this decision and adding the ability to define types that cannot be leaked. Yoshua Wuyts wrote a good summary with links to previous writing. I won’t get into the details in this post, but I’ll say that I think the backward compatibility problems with changing this decision now would be enormous and disruptive, and I’m not personally optimistic about this changing.

But let’s say it did change, or the decision about leaking had gone the other way. Would that resolve the trilemma, would we have an API that has all three desirable properties? Yes and no. Comparing the old and new scoped thread APIs is instructive.

In the new scoped thread API, the “interior” scope (which can be executed concurrently with parallelized tasks) is lexically defined by a closure passed to the scope function, and the “exterior” scope is anything outside of that closure. In the old scoped thread API, the “interior” scope was defined more implicitly, by the section of code in which the child thread’s join handle was still live. At the time the leakpocalypse decision was reached, these two APIs were equally expressive. But since then, additional features have been added to Rust which would make the destructor-based approach more appealing.

The first of these is non-lexical lifetimes (my use of the word “lexical” in the last paragraph might have tipped you off to this). In particular, it’s possible to hold the join handle longer in one branch than in another, though some of the most interesting cases of this still haven’t been stabilized for compiler performance reasons. Sometimes this control flow can be expressed in other ways with lexical scopes, but in some cases there’s a genuine expressive difference that non-lexical lifetimes enable and that the forced-lexical scoped thread API will never be able to express.

The other of these - and here we get at the heart of the problem of the async scoped task API - is self-referential coroutine types, like the futures generated from an async function. These types “close over” lifetimes that essentially refer to sections of the stack within the coroutine. In essence, this creates an outer bound on the lifetime being borrowed by the scoped task, and it’s possible to run code from outside of that bound concurrently with the child tasks. In practice this means that when all the child tasks have no further work, the thread can run work from siblings of the parent task.
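
A small example of what “closing over” the coroutine’s own stack looks like (my illustration, nothing library-specific): the future returned by parent owns local and also holds a reference into it across an await point, which is exactly what makes the compiled state machine self-referential.

```rust
async fn child(n: &u32) -> u32 {
    n * 2
}

// The future returned by `parent` owns `local` and, across each `.await`,
// also stores a reference into it - a borrow of the coroutine's own "stack."
async fn parent() -> u32 {
    let local = vec![1u32, 2, 3];
    let mut total = 0;
    for n in &local {
        total += child(n).await;
    }
    total
}
```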

Why not just two?

Assuming that we are not adding linear types to Rust, we should ask ourselves: given that each API must choose a “horn” of the scoped task trilemma, are the APIs available to users complete and consistent? Is there a flaw with how this functionality has been provided?

When it comes to “Borrowing + Concurrency” we see that async APIs exist, but they are very inconsistent with the other APIs - none of them look anything like “spawning” child tasks explicitly. These APIs also tend to be the most problematic for users, having many ways to use them incorrectly. Niko Matsakis has written a library called moro that provides an API that looks more like the scoped thread API. Would an API like this be more intuitive for users? That’s a question I’ve been trying to get a grip on over the last few months, and one I hope to return to after exploring more fundamental questions about async Rust.

When it comes to “Parallelizability + Borrowing” we see that no async APIs exist at all. This maybe isn’t surprising, because async is all about enabling concurrency. But there’s a missing API here that seems obvious to me: an API which starts an executor across a thread pool, and blocks until all of the tasks you spawn onto it are complete. This would basically function as an alternative to “async main” that lets you store some long-lived state on the stack of your main thread. There’s no way to recreate this with tokio right now, but it’s possible to combine smol’s async-executor library with the scoped thread API to achieve it.
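
To sketch what I mean - and this is only a rough sketch of my own, assuming the async-executor, async-channel, and futures-lite crates, not an established API - scoped worker threads drive a borrowing executor in parallel, and the whole thing blocks until the tasks spawned onto it are complete:

```rust
use async_executor::Executor;
use futures_lite::future;

fn main() {
    // Long-lived state lives on the main thread's stack and is borrowed by tasks.
    let data = vec![1u64, 2, 3, 4];
    let ex = Executor::new();
    // Closing this channel is how we tell the worker threads to shut down.
    let (stop_tx, stop_rx) = async_channel::unbounded::<()>();

    std::thread::scope(|s| {
        // Worker threads drive the executor in parallel until shutdown.
        for _ in 0..2 {
            let stop_rx = stop_rx.clone();
            let ex = &ex;
            s.spawn(move || {
                future::block_on(ex.run(async {
                    let _ = stop_rx.recv().await;
                }))
            });
        }

        // Spawn tasks that borrow `data`, wait for all of them, then stop the
        // workers so the scope can join them and return.
        let sum: u64 = future::block_on(ex.run(async {
            let tasks: Vec<_> = data
                .iter()
                .map(|n| ex.spawn(async move { n * 2 }))
                .collect();
            let mut total = 0;
            for task in tasks {
                total += task.await;
            }
            total
        }));
        drop(stop_tx); // closing the channel wakes the workers so the scope can end
        println!("sum = {sum}");
    });
}
```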

When it comes to “Concurrency + Parallelizability,” both async Rust and non-async Rust provide essentially the same spawn API. This sort of API has been criticized in the context of some other languages by advocates of what is called “structured concurrency,” which requires spawns to be paired with a clear join to mark the end of the concurrent child task. In my next post, I want to review this notion of “structured concurrency” and see how it applies in the context of Rust.