Changing the rules of Rust

September 17, 2023

In Rust, there are certain API decisions about what is and isn’t sound that impact all Rust code. That is, a decision was made to allow or not allow types which have certain safety requirements, and now all users are committed to that decision. They can’t just use a different API with different rules: all APIs must conform to these rules.

These rules are determined through certain “marker” traits. If a safe API could do something to a value that some types don’t support, the API must be bounded by that marker trait, so that users cannot pass values of types which don’t support that behavior to that API. In contrast, if Rust allows APIs to perform that behavior on any type, without any sort of marker trait bound, then types which don’t support that behavior cannot exist.

I’m going to give three examples to show what I mean, each of which Rust has considered at different points, though only the first one actually exists in Rust.

Rules of Rust

`Send`

Let’s say you want Rust to support types which can’t be sent across threads. There are a couple of examples of why you would want this:

The type provides shared ownership without synchronizing interior mutable writes to the reference count (e.g. Rc)
The type may wrap an OS API that doesn’t guarantee thread safety (e.g. Args, MutexGuard)

To support this, you would include a marker trait called Send, which is the set of types which can be sent across threads. Any API which might send a value to another thread needs to include a Send bound, such as:

thread::spawn, which spawns a thread
rayon::join, which runs two tasks on a thread pool
tokio::spawn, which may move this task to another thread of the executor

Of course, Rust chose to support types which can’t be sent across threads, and so it has a Send bound. But an alternative Rust could have just as easily chosen that all types in Rust must support sending across threads, and effectively all interior mutability would need to be synchronized. Indeed, the Send trait is actually a decision fully enforced by the standard library: someone could release an “alternative libcore” which imposes this requirement with no change to rustc, though it wouldn’t be compatible with any of the Rust code that exists in the world.
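
As a small illustration of how the bound plays out, here is a sketch using only the standard library. The function names `sum_on_thread` and `sum_locally` are invented for this example; the point is just that `thread::spawn` accepts a closure capturing an `Arc` but would reject one capturing an `Rc`.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Sums a shared vector on another thread. `Arc` synchronizes its reference
/// count, so `Arc<Vec<i32>>` is `Send` and satisfies the bound on `thread::spawn`.
fn sum_on_thread(data: Arc<Vec<i32>>) -> i32 {
    let handle = thread::spawn(move || data.iter().sum::<i32>());
    handle.join().expect("worker thread panicked")
}

/// `Rc` uses an unsynchronized reference count, so it has to stay on one thread.
fn sum_locally(data: Rc<Vec<i32>>) -> i32 {
    // The next line would be rejected at compile time, because
    // `Rc<Vec<i32>>` is not `Send`:
    // thread::spawn(move || data.iter().sum::<i32>());
    data.iter().sum()
}
```

Nothing in the language itself distinguishes the two cases; the difference comes entirely from the `Send` bound in the signature of `thread::spawn` and from which types implement `Send`.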

`Move`

Let’s say you want Rust to support types which can’t be invalidated without running their destructor once their address has been witnessed. This is a sort of wonky and specific definition of “immoveable type,” but it happens to fit perfectly for what stackless coroutines and intrusive data structures require.

To support this, you would include a marker trait called Move, which is the set of types that can be moved freely. Unlike Send, Move requires some language support: I think the simplest way to implement it would be to say that operations that take the address of something take ownership of types if they don’t implement Move (so let x = &mut y; takes ownership of y, effectively preventing you from ever moving it again.) And the magic behavior of Box which lets you move out of it would need to be bound by Move as well.

Additionally, certain APIs would need to be bounded by Move, which let you move out of a reference, such as:

mem::swap lets you swap values behind two mutable references
mem::replace lets you replace the value behind a mutable reference with another

You’ll notice that Rust doesn’t have a Move trait; instead, it provides the same guarantees using the Pin wrapper around pointer types. Even though the Move trait would probably have been a much easier API to use, it proved difficult to add in a backward compatible way (I’ll explain why in a moment), so instead the Pin API was added and used only in the new interfaces that required these semantics.
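
A rough sketch of the difference, using only real standard library APIs: with a plain `&mut T`, safe code can move the value out with `mem::replace` or `mem::swap`, while `Pin<&mut T>` withholds the `&mut T` whenever `T` does not implement `Unpin`. The type `Immovable` and the function names are made up for illustration.

```rust
use std::marker::PhantomPinned;
use std::mem;
use std::pin::Pin;

/// A type that opts out of `Unpin`, standing in for a self-referential value.
struct Immovable {
    label: String,
    _pin: PhantomPinned,
}

/// With a plain `&mut String`, safe code can move the old value out and
/// leave another one in its place.
fn take_label(slot: &mut String) -> String {
    mem::replace(slot, String::from("moved out"))
}

/// Likewise, two values behind `&mut` can trade places.
fn swap_labels(a: &mut String, b: &mut String) {
    mem::swap(a, b);
}

/// With `Pin<&mut Immovable>`, safe code only gets shared access through
/// `Deref`. `Pin::get_mut` requires `Unpin`, so `mem::swap` and
/// `mem::replace` cannot be reached without `unsafe`.
fn read_label(pinned: Pin<&mut Immovable>) -> usize {
    // let raw: &mut Immovable = pinned.get_mut(); // error: `Immovable` is not `Unpin`
    pinned.label.len()
}
```

A `Pin<&mut Immovable>` can be obtained safely with `Box::pin(value).as_mut()`, which is roughly how the new interfaces hand pinned values around without changing `mem::swap` or `mem::replace` themselves.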

`Leak`

Let’s say you want Rust to support types which can’t go out of scope without running their destructor. This is one of the two different definitions of “linear types.” It is less expressive than the other (which would also prevent a destructor from running, requiring the type to be destructured as its final end), but it is the easier of the two to add to the language (because it works better with generics), and it supports all of the most compelling use cases for linear types.

To support this, you would include a marker trait called Leak, which is the set of types that can go out of scope without running their destructors. Like Send and unlike Move, this would require no language support at all: it’s not possible to “leak” a value in the core language of Rust, you have to use standard library APIs to do it.

Certain APIs would have to be bound by Leak:

APIs that always leak a value (mem::forget)
APIs that make it your responsibility to run the destructor (ManuallyDrop::new)
APIs that allow cyclic shared ownership and can accidentally leak values (Rc::new, Arc::new)
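
To see these three kinds of API at work, here is a minimal sketch in today's Rust that counts how many destructors actually run. `Noisy`, `Node` and `count_drops` are names invented for this example; the standard library calls are the ones listed above.

```rust
use std::cell::{Cell, RefCell};
use std::mem::{self, ManuallyDrop};
use std::rc::Rc;

/// Increments a shared counter when dropped, so the sketch can observe
/// whether a destructor ran.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// A node that can point at itself, forming an `Rc` cycle.
struct Node {
    _payload: Noisy,
    next: RefCell<Option<Rc<Node>>>,
}

/// Creates four `Noisy` values and returns how many destructors ran.
fn count_drops() -> u32 {
    let drops = Rc::new(Cell::new(0));

    // 1. Ordinary scope exit: the destructor runs.
    {
        let _n = Noisy(drops.clone());
    }

    // 2. `mem::forget` always leaks its argument.
    mem::forget(Noisy(drops.clone()));

    // 3. `ManuallyDrop` leaves running the destructor to the caller,
    //    and this caller never does.
    let _md = ManuallyDrop::new(Noisy(drops.clone()));

    // 4. A reference cycle keeps the strong count above zero forever.
    let node = Rc::new(Node {
        _payload: Noisy(drops.clone()),
        next: RefCell::new(None),
    });
    *node.next.borrow_mut() = Some(node.clone());
    drop(node);

    drops.get()
}
```

Only the first value is dropped, so `count_drops()` returns 1. All of this is safe code, which is why each of these APIs would need a Leak bound.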

Of course, Rust doesn’t have the Leak trait, but it almost did. This discussion came to a head in early 2015, when the scoped thread API that Rust was using was found to be unsound, because its safety depended on its guard type never leaking. It was decided (in some haste, because the 1.0 release was scheduled within a few months of the controversy) that Rust would not support types that can’t be leaked, and so the Leak trait would not be added.

Changing the rules

There’s been a renewed interest in supporting linear types in Rust, especially because of what I called the scoped task trilemma, which holds only because destructors cannot be guaranteed to run. Unlike immoveable types, there is no isolated API addition that could support guaranteeing that a destructor will run, the way there was with Pin. (You can guarantee a destructor will run if you never give up ownership of the object and use a kind of closure passing style, but this isn’t adequate for the “scoped task” use case). So some users would like to see Rust add a Leak trait.
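
For reference, the closure passing style mentioned in the parenthetical looks roughly like this. The sketch uses `std::thread::scope` from the standard library as one instance of that style; `parallel_sum` is a name made up for the example.

```rust
use std::thread;

/// Sums two halves of a borrowed slice on two threads. `thread::scope` keeps
/// ownership of the scope object and only lends it to the closure, so the
/// caller never holds a guard it could forget; every spawned thread is joined
/// before `thread::scope` returns.
fn parallel_sum(values: &[i32]) -> i32 {
    let (left, right) = values.split_at(values.len() / 2);
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<i32>());
        let b = s.spawn(|| right.iter().sum::<i32>());
        a.join().expect("left worker panicked") + b.join().expect("right worker panicked")
    })
}
```

This works for threads because the caller can simply block inside `thread::scope` until the workers finish.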

There are two possible ways a marker trait like Leak could be added to Rust:

Auto trait: you could add a new auto trait, like Send and Sync
?Trait: you could add a new “?Trait,” like Sized

Each of these presents certain challenges regarding backward compatibility.

Auto traits

At first glance, adding auto traits might seem like a backward compatible change. You add a new trait, Leak, which says that a type can be leaked. Types that don’t implement this trait cannot go out of scope without running their destructor. Because all types in Rust today necessarily can be leaked (this is a consequence of the decision not to have a Leak trait), it’s perfectly fine for all types to implement Leak. This is the semantics of an auto trait, so it sounds like it should work great.

The problem comes when you go to add bounds to the APIs that can be used to leak values, like mem::forget. If you want to make it so that types that don’t implement Leak cannot be leaked, you need to add a bound to mem::forget. But there are two ways in which this is not backward compatible.

First, it does not work with generics. This code is legal today, but would break if you add a Leak bound to mem::forget:

use std::mem;

// `T` has no bounds at all, yet `mem::forget` accepts it today.
pub fn forget_generic<T>(value: T) {
    mem::forget(value);
}

This is because there is no Leak bound on the type parameter for this function. Adding such a bound to the API of mem::forget (or any other API that can forget values) would be a breaking change.

Another way in which it is not backward compatible is that trait object types will not implement Leak unless they add + Leak. Trait object types do not inherit impls by way of auto traits, because you don’t actually know what type the trait object is. So a trait object like dyn Future or dyn Display does not implement Leak. For example, this function is legal today, but would also break:

use std::fmt::Display;
use std::mem;

pub fn forget_trait_object(object: Box<dyn Display>) {
    mem::forget(object);
}
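
Since Leak does not exist, both breakages can be reproduced on stable Rust by letting an existing auto trait, `Send`, stand in for it. This is only an analogy: `consume` is a hypothetical function playing the role of a `mem::forget` that had gained an auto trait bound, and the commented-out functions show the shapes of code that would stop compiling.

```rust
use std::fmt::Display;

/// Stand-in for a `mem::forget` with an auto trait bound.
/// `Send` plays the role of the hypothetical `Leak` here.
fn consume<T: Send>(value: T) {
    drop(value);
}

// An unbounded generic caller is rejected:
//
// fn consume_generic<T>(value: T) {
//     consume(value); // error: `T` cannot be sent between threads safely
// }

/// Generic callers have to repeat the bound in their own signature.
fn consume_generic<T: Send>(value: T) {
    consume(value);
}

// A bare trait object does not implement the auto trait either:
//
// fn consume_object(object: Box<dyn Display>) {
//     consume(object); // error: `dyn Display` cannot be sent between threads safely
// }

/// The trait object type has to opt in explicitly with `+ Send`.
fn consume_object(object: Box<dyn Display + Send>) -> String {
    let text = object.to_string();
    consume(object);
    text
}
```

Every existing function shaped like the commented-out ones would need its signature edited, which is exactly the kind of change a backward compatible addition cannot require.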

?Traits

So if adding an auto trait is not backward compatible, we are left turning to the ?Trait solution. But there are problems here as well.

In the strictest sense, adding a new trait like ?Leak is backward compatible. Instead of adding new bounds to APIs like mem::forget, you would be relaxing the bounds on other APIs. So all of the code above would be fine, because to make a generic function that takes a linear type, you would have to write a ?Leak bound.

The first problem with this for something like Leak is that the vast majority of generic APIs in Rust cannot possibly forget their value; after all, even though memory leaks are not undefined behavior, they are still undesirable and mostly avoided. This is quite different from Sized: because passing something by value requires Sized, the large majority of generic APIs in Rust require Sized, and so ?Sized bounds are relatively rare. In contrast, adding ?Leak would create a permanent scar across the ecosystem, as the vast majority of generics would gain a T: ?Leak bound.
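
For readers who haven't run into it, this is how the one existing ?Trait works today. The sketch below is ordinary stable Rust; `describe_sized` and `describe_any` are made-up names. A `?Leak` bound would presumably be written the same way, but on far more functions.

```rust
use std::fmt::Display;

/// Every type parameter carries an implicit `Sized` bound, so this function
/// cannot be instantiated with `str` or `dyn Display`.
fn describe_sized<T: Display>(value: &T) -> String {
    format!("<{}>", value)
}

/// `?Sized` removes the implicit bound, so unsized types are accepted too.
/// This is allowed here because the value is only used behind a reference.
fn describe_any<T: Display + ?Sized>(value: &T) -> String {
    format!("<{}>", value)
}
```

Calling `describe_any("hi")` instantiates `T` with the unsized type `str`, while `describe_sized("hi")` would be rejected because `str` is not `Sized`.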

The second problem is bigger, though: the interaction between ?Traits and associated types means that it is a breaking change to add a ?Trait bound to an associated type. This means that any stable associated type in the standard library cannot gain a ?Leak bound.

Consider this example:

use std::mem;

pub fn forget_iterator(iter: impl Iterator) {
    iter.for_each(mem::forget);
}

This will forget every element of the iterator, even though the Iterator::Item associated type is never mentioned. Therefore, Iterator::Item must implement Leak, always. The compiler is allowed to assume that the item of every iterator implements Leak, and it would be a breaking change to invalidate that assumption.
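
The same mechanism can be observed today with Sized, the existing ?Trait. In this sketch (function names invented), `Iterator::Item` carries the implicit Sized bound, so generic code may rely on it without naming it, whereas `Deref::Target` was declared `?Sized` from the start, so generic code has to ask for more before using it by value.

```rust
use std::ops::Deref;

/// `Iterator::Item` is implicitly `Sized`, so it can be returned by value
/// without any bound on `I::Item` appearing in the signature.
fn first_item<I: Iterator>(mut iter: I) -> Option<I::Item> {
    iter.next()
}

/// `Deref::Target` is declared as `type Target: ?Sized`, so returning it by
/// value needs an explicit bound. `Copy` is used here; it implies `Sized`.
/// Without the `where` clause on `D::Target`, this function does not compile.
fn copy_target<D>(pointer: &D) -> D::Target
where
    D: Deref,
    D::Target: Copy,
{
    **pointer
}
```

Relaxing `Iterator::Item` to `?Sized` now would break functions like `first_item`, and relaxing it to `?Leak` would break functions like `forget_iterator` for the same reason.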

The implications of this are far reaching. If Leak were added as a ?Trait, all of these things would not be possible with linear types:

Iterator: You couldn’t construct an iterator of linear types.
Future: You couldn’t construct a future that evaluates to a linear type (and so you couldn’t return a linear type from an async function).
Deref: You couldn’t dereference a Box containing a linear type, or dereference a vector of linear types to a slice of linear types, or any other smart pointer type.
Index: You couldn’t index a collection of a linear type, so you couldn’t index into a slice of linear types or into a map with linear values.
Add/Sub/Mul/Div: You couldn’t have a linear type as the output value of any overloaded arithmetic operator.

One special set of associated types is the return value of the Fn traits. The Rust project has specifically made it difficult to refer to the Output associated type of these traits, so that it would have flexibility to change this in the future. Still, certain issues were encountered in the past when experimenting with a Move trait. It’s not clear to me if these issues would have been encountered with other traits (like Leak) or if they were specific to the built-in semantics of Move.

Basically, there is a very steep trade off in choosing to add a new ?Trait - especially one that wouldn’t be relevant to very many bounds, like Leak - because you will add confusing new syntax to a huge variety of generic interfaces in exchange for getting a very limited new feature. I would consider adopting something like ?Leak a poor trade in cost-benefit terms, even if we do accept that the current Rust rules are “wrong” and we wish Rust had linear types.

Editions

The final thing that needs to be considered is the edition mechanism. Is it possible, using the edition mechanism, to introduce one of these traits? Maybe.

Each edition in effect forms a “dialect” of Rust, all of which are supported by the same compiler. So at first glance it sounds plausible that in one dialect of Rust, all types can be forgotten, and in another dialect, the Leak trait exists. The problem is that a hard requirement on editions is that crates from one edition can depend on crates from another edition, so that the upgrade from one edition to the next is seamless and voluntary.

Suppose that the Rust project decided to add the Leak trait in the 2024 edition. All of the code in the 2021 edition needs to still work - including code like the examples I showed above. Sure, you can use a tool like cargo fix to add Leak bounds everywhere and expect users to relax them on their own as they move into the 2024 edition, but the code from the 2021 edition needs to work without modification.

A way to possibly do this would be to make trait coherence depend on edition, so that in pre-2024 editions, every type meets the Leak bounds, even types that absolutely shouldn’t, like unbounded generics, and trait object types that don’t mention Leak. If, in the pre-2024 editions, all types implement Leak, then the added bounds will never fail.

However - and this is the big, enormous caveat of this section - this would rely on the compiler effectively implementing some kind of unbreakable firewall, to prevent a type that does not implement Leak from ever getting used with pre-2024 code. This would mean:

Any time post-2024 code instantiates a generic from pre-2024 code, it needs to check that the type it instantiates it with implements Leak, even though the pre-2024 code doesn’t have such a bound.
Any time pre-2024 code calls post-2024 code, it needs to check that the types it gets from that API implement Leak (under the 2024 edition rules). Note that the standard library would be considered post-2024 code, so every call to a std API would involve checking types for Leak in the pre-2024 editions.

I don’t know if this is even possible to implement. It would probably have a pretty bad effect on compile times, at least for pre-2024 edition code, and it would likely create a very difficult transition at first. But in the longer term, at least, it would leave Leak in the “right” state (as an auto trait, which can be used with old associated types correctly, and not as a ?Trait).

What should be done

If I could go back in time to 2015, I think I would probably add both Move and Leak to Rust. There are certain downsides, which were highlighted by the team at the time in their decision not to add Leak: any trait object type which needs to meet these bounds needs to add + Move or + Leak to its definition. Having two auto traits of such global significance (Send and Sync) was already considered enough of a burden.

But if I’m honest, I have the impression (I wasn’t there) that the decision to exclude Leak was at least partly a matter of expediency: my understanding is that the Rust team at Mozilla was under incredible pressure from their management to ship a 1.0 on the deadline they set, and this probably influenced their decision not to make any last minute changes to the rules that could impact meeting that deadline.

In a language with Leak, the scoped task trilemma wouldn’t exist, the simpler scoped thread API would be safe, GC-integration would possibly be easier, and I’ve gotten the impression many systems APIs would be easier to wrap safely (though I don’t know the details of this).

In a language with Move, the Pin type wouldn’t need to exist, users would therefore not have such annoyance dealing with it, and so-called “pin projections” wouldn’t be an issue requiring macros to solve, and making self-referential generators would present no complication for the Iterator trait.

However, making these changes now is a much thornier problem. I think the edition-based technique is the only viable solution for adding a new, globally relevant marker trait (except for certain unique exceptions, like DynSized, not discussed here). And I think there are all kinds of reasons to think that wouldn’t work - that it would be too difficult to implement, that the implementation would have too many soundness holes, that the transition would be too disruptive, that it’s actually totally impossible because I’ve missed something important.

This is why I’m very glad we found the Pin solution for immoveable types, and were able to ship self-referential futures and async/await syntax on top of them, in a reasonable time frame without major disruption to all existing users. When it comes to Leak and linear types, I just despair.