The Waker API II: waking across threads
In the previous post, I provided a lot of background on what the waker API is trying to solve. Toward the end, I touched on one of the tricky problems the waker API has: how do we handle thread safety for the dynamic `Waker` type? In this post, I want to look at that in greater detail: what we've been doing so far, and what I think we should do.
Restating the problem
The goal of this portion of the API is to ensure we can support all of the kinds of waker implementations that are necessary. In particular, we want to be able to support implementations that have special behavior when called from the same thread the waker was originally constructed on. There are two variations on this:
- The more common variation is to have an optimization specific to waking from the original thread, though you do support waking from different threads as well.
- A more niche use case is to only support waking from the same thread. In this implementation, the executor is designed for programs that use no multithreading at all, and it’s tightly coupled to a particular reactor design.
We've gone through a couple of iterations on this API. The design currently implemented on nightly has two waker types: `Waker` and `LocalWaker`. The difference between them is that the latter is not `Send` or `Sync`, and will call a specialized `wake_local` function when it is woken, instead of the default `wake` function. However, you can always convert a `LocalWaker` into a `Waker` using the `into_waker` method.
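To make that shape concrete, here is a rough sketch of the two types as I've just described them. This is only an illustration: the real nightly definitions are built on an unsafe, type-erased trait, and the bodies here are elided or trivial.

```rust
// Simplified sketch of the nightly design described above (illustrative only;
// the real std types are defined in terms of an unsafe, type-erased trait).
pub struct Waker { /* type-erased pointer to the waker implementation */ }
pub struct LocalWaker { /* same data; the real type is !Send and !Sync */ }

impl Waker {
    /// Thread-safe wake, callable from any thread.
    pub fn wake(&self) { /* ... */ }
}

impl LocalWaker {
    /// Calls the implementation's specialized `wake_local` path.
    pub fn wake(&self) { /* ... */ }

    /// Always available: gives up the same-thread specialization.
    pub fn into_waker(self) -> Waker {
        Waker { /* same underlying data, thread-safe interface */ }
    }
}
```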
This is perfectly designed to support the first use case I described above, but the second is a bit trickier. As I outlined in my previous blog post, there are three ways to implement a waker. One is only suitable for no-std embedded environments and isn't relevant here, so I'll reiterate the other two (both are sketched in code after this list):
- In the first, the waker is a `TaskId`, which is used to identify the task to be woken.
- In the second, the waker is a reference-counted pointer to the task itself, which is then put back into the queue of tasks to be woken next.
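To make these two strategies concrete, here is a hypothetical sketch of what each kind of waker might hold and do. The `IdWaker` and `RefWaker` names (and everything about them) are invented for illustration, not taken from any real executor.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

struct Task { /* the future plus its bookkeeping */ }

// Strategy 1: the waker is an ID naming the task; waking re-enqueues that ID.
struct IdWaker {
    id: usize,
    ready: Arc<Mutex<VecDeque<usize>>>,
}

impl IdWaker {
    fn wake(&self) {
        self.ready.lock().unwrap().push_back(self.id);
    }
}

// Strategy 2: the waker is a reference-counted pointer to the task itself;
// waking clones that pointer back onto the run queue.
struct RefWaker {
    task: Arc<Task>,
    ready: Arc<Mutex<VecDeque<Arc<Task>>>>,
}

impl RefWaker {
    fn wake(&self) {
        self.ready.lock().unwrap().push_back(self.task.clone());
    }
}
```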
The API I just described does not support the second case using a non-atomic `Rc`. This is because you could construct a `Waker`, move it to another thread, and clone or drop it. This introduces a data race in access to the reference count.
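To spell the race out: if a `Waker` backed by an `Rc` were allowed to cross threads, the non-atomic reference count could be mutated from two threads at once. A tiny hypothetical illustration (the `RcWaker` type is made up for this sketch):

```rust
use std::rc::Rc;

// Hypothetical waker built on a non-atomic reference count.
struct RcWaker {
    task: Rc<()>, // stand-in for the reference-counted task
}

fn main() {
    let waker = RcWaker { task: Rc::new(()) };

    // Fine: cloning on the thread that created it.
    let _same_thread_clone = waker.task.clone();

    // If this waker were wrapped in a Send + Sync `Waker`, code like the
    // following would be allowed. Cloning or dropping there bumps the same
    // non-atomic counter from two threads at once -- a data race. The
    // compiler rejects it today only because Rc (and thus RcWaker) is !Send:
    //
    //     std::thread::spawn(move || drop(waker.task.clone()));
}
```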
For that reason, the RFC currently proposes to change the API, getting rid of `wake_local` and using a different strategy instead. In this strategy, there's an `into_waker` hook that the implementation can use to either change its wake implementation (in the case where it just has a same-threaded optimization) or panic (in the case where it is not meant to be called from multiple threads).
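As a rough sketch of what that hook allows, using stand-in types (not the RFC's actual definitions): an executor that merely has a same-thread optimization can hand back a fully thread-safe waker from the hook, while a strictly single-threaded executor can refuse the conversion outright.

```rust
// Stand-in types for illustration only; not the RFC's actual definitions.
struct Waker;      // the thread-safe handle
struct LocalWaker; // the !Send, !Sync handle

impl LocalWaker {
    fn into_waker(self) -> Waker {
        // Option A (executor with a same-thread optimization only): return a
        // Waker whose wake() takes the slower, thread-safe path.
        //
        // Option B (strictly single-threaded executor): refuse entirely.
        panic!("this executor does not support waking from another thread");
    }
}
```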
From an end user's perspective, the API is largely unchanged: there are two waker types, `LocalWaker` and `Waker`, with the same conversions between them. But we now support one additional implementation strategy, so that seems like a win. The problem is this: it is exactly this unchanged portion of the API that carries a lot of costs for users of the API.
The high costs of distinguishing `Waker` from `LocalWaker`
I had the opportunity to use the waker API extensively recently (in creating the [romio][romio] crate). The distinction between `Waker` and `LocalWaker` had not existed the last time I had dealt with the futures API, so I was experiencing it very much as a newcomer. And I'm afraid I must admit: I was, at first, quite baffled. A lot of strangeness conspires to make this API exceptionally confusing:
- You receive a `LocalWaker` from the executor, rather than a `Waker`. It's unclear without a lot of explanation whether you're supposed to convert it to a `Waker` (the thing you probably really want) early or not.
- `LocalWaker` is not `Send` or `Sync`, but `Waker` is, and there's a conversion from `LocalWaker` to `Waker`. This looks very odd: it makes it hard to understand why `LocalWaker` isn't threadsafe, given that it can be converted directly to a threadsafe version.
- The `AtomicWaker` API in the futures library receives an `&LocalWaker` argument. Internally, it converts that immediately to a `Waker`. But this means that a library like romio is exclusively dealing with `&LocalWaker`, never directly seeing the `Waker` type. And yet, because the API it uses makes the conversion to `Waker`, it is incompatible with a local-only executor. This is unintuitive and surprising.
- Having more versions of things is just inherently more confusing. Especially with multiple ways to construct a `Waker`/`LocalWaker` (from `Wake` or `UnsafeWake`/`RawWaker`), there's now a grid of combinations between different API components, and understanding how they all relate (or don't) is hard to learn, on top of learning how to use the APIs properly.
It would be much simpler if the API that a future used could just look like this:
```rust
struct Waker { ... }

impl Send for Waker { ... }
impl Sync for Waker { ... }

impl Waker {
    fn wake(&self) { ... }
}
```
This is what the API looked like for years (under different names in earlier periods). It seemed to work well. So I asked myself: what are we really getting for this additional complexity, and is it worth it?
What are we getting for this?
I talked my concerns through with cramertj, and ultimately we reached these conclusions:
- The first use case - the optimization for the same thread - can easily just use TLS: either literally checking that it's on the same thread or (more likely) storing its thread-local queue in TLS and checking whether that thread-local queue exists or not. In other words, the first use case really needs no additional support from the API; `LocalWaker` isn't necessary for it to be supported. (A sketch of this follows the list.)
- The second use case is more interesting. There is one thing that indeed cannot be supported without the API distinction: using an `Rc`, non-atomically reference counted task. There are still other ways to implement a single-threaded event loop, however:
  - Using the task ID technique instead of the reference counting technique. Panic when you wake from another thread, instead of when you move to it. This strategy works completely fine.
  - Using atomic reference counts. Since your application is single threaded, on x86 at least this should have essentially no overhead over using non-atomic reference counts.
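Here is the sketch promised above: a minimal, hypothetical illustration of the TLS approach, assuming an executor whose wakers carry task IDs and which keeps a locked fallback queue for wakes that arrive from other threads. None of these names come from a real executor.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

type TaskId = usize;

thread_local! {
    // Present (Some) only on the executor's own thread.
    static LOCAL_QUEUE: RefCell<Option<VecDeque<TaskId>>> = RefCell::new(None);
}

struct MyWaker {
    id: TaskId,
    // Fallback path for wakes arriving from other threads.
    shared_queue: Arc<Mutex<VecDeque<TaskId>>>,
}

impl MyWaker {
    fn wake(&self) {
        LOCAL_QUEUE.with(|q| {
            match &mut *q.borrow_mut() {
                // Fast path: we are on the executor thread, no synchronization needed.
                Some(local) => local.push_back(self.id),
                // Slow path: some other thread woke us; go through the lock.
                None => self.shared_queue.lock().unwrap().push_back(self.id),
            }
        });
    }
}
```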
So I had to ask myself: is forcing every author of a manual future to deal with this complexity and unintuitiveness worth it, just to allow one particular implementation strategy (among several possible) for a niche executor use case? To me the answer was clear: we're paying a cost in API ergonomics that doesn't actually buy us very much.
cramertj agreed. We talked about this before the holidays. When I came back I started this blog series, whereas he just wrote a PR to the RFC. This PR would be the last major change to the futures API before stabilization. By eliminating the distinction between `Waker` and `LocalWaker`, I think the waker API becomes much more comprehensible.