Changing the rules of Rust
In Rust, there are certain API decisions about what is and isn’t sound that impact all Rust code. That is, a decision was made to allow or not allow types which have certain safety requirements, and now all users are committed to that decision. They can’t just use a different API with different rules: all APIs must conform to these rules.
These rules are determined through certain “marker” traits. If a safe API could do something to a value of a type which some types don’t support, the API must be bound by that marker trait, so that users can not pass values of those types which don’t support that behavior to that API. In contrast, if Rust allows APIs to perform that behavior on any type, without any sort of marker trait bound, then types which don’t support that behavior cannot exist.
I’m going to give three examples to show what I mean, each of which Rust has considered at different points, though only the first one actually exists in Rust.
Rules of Rust
Send
Let’s say you want Rust to support types which can’t be sent across threads. There are a couple of examples of why you would want this:
- The type provides shared ownership without synchronizing interior mutable writes to the reference count (e.g. Rc)
- The type may wrap an OS API that doesn’t guarantee thread safety (e.g. Args, MutexGuard)
To support this, you would include a marker trait called Send, which is the set of types which can be sent across threads. Any API which might send a value to another thread needs to include a Send bound, such as:
- thread::spawn, which spawns a thread
- rayon::join, which runs two tasks on a thread pool
- tokio::spawn, which may move this task to another thread of the executor
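As a minimal sketch of how that bound plays out with the standard library today (the helper name sum_on_another_thread is made up for illustration): Arc synchronizes its reference count and so is Send, while the same code written with Rc is rejected at compile time.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: sums a shared vector on a freshly spawned thread.
/// `thread::spawn` requires the closure, and everything it captures, to be `Send`.
pub fn sum_on_another_thread(data: Arc<Vec<u32>>) -> u32 {
    let handle = thread::spawn(move || data.iter().sum::<u32>());
    handle.join().expect("worker thread panicked")
}

// The same function written with `std::rc::Rc` does not compile, because `Rc`
// updates its reference count without synchronization and so is not `Send`:
//
// pub fn sum_with_rc(data: std::rc::Rc<Vec<u32>>) -> u32 {
//     thread::spawn(move || data.iter().sum::<u32>()).join().unwrap()
// }
```

Calling sum_on_another_thread(Arc::new(vec![1, 2, 3])) moves the Arc into the new thread and returns the sum computed there.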
Of course, Rust chose to support types which can’t be sent across threads, and so it has a Send bound. But an alternative Rust could have just as easily chosen that all types in Rust must support sending across threads, and effectively all interior mutability would need to be synchronized. Indeed, the Send trait is actually a decision fully enforced by the standard library: someone could release an “alternative libcore” which imposes this requirement with no change to rustc, though it wouldn’t be compatible with any of the Rust code that exists in the world.
Move
Let’s say you want Rust to support types which can’t be invalidated without running their destructor once their address has been witnessed. This is a sort of wonky and specific definition of “immoveable type,” but it happens to fit perfectly for what stackless coroutines and intrusive data structures require.
To support this, you would include a marker trait called Move, which is the set of types that can be moved freely. Unlike Send, Move requires some language support: I think the simplest way to implement it would be to say that operations that take the address of a value take ownership of it if its type doesn’t implement Move (so let x = &mut y; takes ownership of y, effectively preventing you from ever moving it again). And the magic behavior of Box which lets you move out of it would need to be bound by Move as well.
Additionally, certain APIs which let you move out of a reference would need to be bound by Move, such as:
- mem::swap lets you swap values behind two mutable references
- mem::replace lets you replace the value behind a mutable reference with another
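For reference, this is how those two functions move values out from behind mutable references in today’s Rust. Under the hypothetical Move rule, both calls would need a T: Move bound. The helper name rotate is only illustrative.

```rust
use std::mem;

/// Illustrative helper: shuffles values through two mutable references and
/// returns the value that was originally stored in `a`.
pub fn rotate(a: &mut String, b: &mut String, fresh: String) -> String {
    // `swap` moves each value to the other's address.
    mem::swap(a, b);
    // `replace` moves the old value out of `b` and moves `fresh` in.
    mem::replace(b, fresh)
}
```

Neither call runs a destructor or requires ownership of the values, which is exactly why a type that must stay at one address cannot be handed to them.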
You’ll notice that Rust doesn’t have a Move trait; instead, it provides the same guarantees using the Pin wrapper around pointer types. Even though the Move trait would probably have been a much easier-to-use API, it proved difficult to add in a backward compatible way (I’ll explain why in a moment), so instead the Pin API was added and used only in the new interfaces that required these semantics.
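A small sketch of what the Pin approach looks like in practice, using only std::pin::Pin and std::marker::PhantomPinned from the standard library. The Immovable type and the pinned and read helpers are hypothetical names for this illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical type that must not move once it has been pinned.
pub struct Immovable {
    pub value: u32,
    _pin: PhantomPinned, // opts the type out of `Unpin`
}

/// Allocates the value on the heap and pins it there.
pub fn pinned(value: u32) -> Pin<Box<Immovable>> {
    Box::pin(Immovable { value, _pin: PhantomPinned })
}

/// Shared access through the pin is always allowed.
pub fn read(p: &Pin<Box<Immovable>>) -> u32 {
    p.value
}

// Moving the value out is rejected in safe code, because obtaining a
// `&mut Immovable` from the pin requires `Immovable: Unpin`:
//
// pub fn steal(mut p: Pin<Box<Immovable>>, other: &mut Immovable) {
//     std::mem::swap(&mut *p, other); // does not compile
// }
```

The restriction lives on the pointer wrapper rather than on the type itself, which is what let it be added without changing any existing bounds.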
Leak
Let’s say you want Rust to support types which can’t go out of scope without running their destructor. This is one of the two different definitions of “linear types.” It is less expressive than the other (which would also prevent a destructor from running, requiring the type to be destructured as its final end), but it is the easier of the two to add to the language (because it works better with generics), and it supports all of the most compelling use cases for linear types.
To support this, you would include a marker trait called Leak, which is the set of types that can go out of scope without running their destructors. Like Send and unlike Move, this would require no language support at all: it’s not possible to “leak” a value in the core language of Rust, you have to use standard library APIs to do it.
Certain APIs would have to be bound by Leak:
- APIs that always leak a value (mem::forget)
- APIs that make it your responsibility to run the destructor (ManuallyDrop::new)
- APIs that allow cyclic shared ownership and can accidentally leak values (Rc::new, Arc::new)
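To make the three categories concrete, here is a sketch in today’s Rust that leaks a value in each of these ways using only safe code. Noisy, Node and leak_three_ways are names invented for this example; the counter records how many destructors actually ran.

```rust
use std::cell::{Cell, RefCell};
use std::mem::{self, ManuallyDrop};
use std::rc::Rc;

/// Illustrative type that counts how many times its destructor runs.
pub struct Noisy<'a>(pub &'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Node that can point at itself, forming a reference cycle.
struct Node<'a> {
    _payload: Noisy<'a>,
    next: RefCell<Option<Rc<Node<'a>>>>,
}

/// Leaks three `Noisy` values in safe code and drops a fourth one normally.
pub fn leak_three_ways(drops: &Cell<u32>) {
    // 1. Always leaks the value.
    mem::forget(Noisy(drops));
    // 2. Running the destructor is now our job, and we never do it.
    let _kept = ManuallyDrop::new(Noisy(drops));
    // 3. A cycle keeps the strong count above zero forever.
    let node = Rc::new(Node { _payload: Noisy(drops), next: RefCell::new(None) });
    *node.next.borrow_mut() = Some(Rc::clone(&node));
    // For comparison: an ordinary drop does run the destructor.
    drop(Noisy(drops));
}
```

After calling leak_three_ways with a fresh counter, only the one explicit drop has been recorded.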
Of course, Rust doesn’t have the Leak trait, but it almost did. This discussion came to a head in early 2015, when the scoped thread API that Rust was using was found to be unsound, because its safety depended on its guard type never leaking. It was decided (in some haste, because the 1.0 release was scheduled within a few months of the controversy) that Rust would not support types that can’t be leaked, and so the Leak trait would not be added.
Changing the rules
There’s been a renewed interest in supporting linear types in Rust, especially because of what I called the scoped task trilemma, which holds only because destructors cannot be guaranteed to run. Unlike immoveable types, there is no isolated API addition that could support guaranteeing that a destructor will run, the way there was with Pin. (You can guarantee a destructor will run if you never give up ownership of the object and use a kind of closure passing style, but this isn’t adequate for the “scoped task” use case.) So some users would like to see Rust add a Leak trait.
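The closure passing style mentioned in the parenthetical can be sketched like this. Session and with_session are hypothetical names; the point is only that the caller never owns the value, so it has no way to forget it.

```rust
use std::cell::RefCell;

/// Hypothetical resource whose cleanup must always run.
pub struct Session<'a> {
    log: &'a RefCell<Vec<&'static str>>,
}

impl Session<'_> {
    pub fn work(&self) {
        self.log.borrow_mut().push("work");
    }
}

impl Drop for Session<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("cleanup");
    }
}

/// The closure only ever sees `&Session`, so it cannot move, swap or forget
/// the session. Ownership stays inside this function.
pub fn with_session<R>(
    log: &RefCell<Vec<&'static str>>,
    f: impl FnOnce(&Session<'_>) -> R,
) -> R {
    let session = Session { log };
    let result = f(&session);
    drop(session); // cleanup runs here, before we return to the caller
    result
}
```

std::thread::scope in the standard library uses the same shape for scoped threads. The cost is that everything using the resource has to happen inside the closure.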
There are two possible ways a marker trait like Leak could be added to Rust:
- Auto trait: you could add a new auto trait, like Send and Sync
- ?Trait: you could add a new “?Trait,” like Sized
Each of these presents certain challenges regarding backward compatibility.
Auto traits
At first glance, adding auto traits might seem like a backward compatible change. You add a new trait, Leak, which says that a type can be leaked. Types that don’t implement this trait cannot go out of scope without running their destructor. Because all types in Rust today necessarily can be leaked (this is a consequence of the decision not to have a Leak trait), it’s perfectly fine for all types to implement Leak. This is the semantics of an auto trait, so it sounds like it should work great.
The problem comes when you go to add bounds to the APIs that can be used to leak values, like mem::forget. If you want to make it so that types that don’t implement Leak cannot be leaked, you need to add a bound to mem::forget. But there are two ways in which this is not backward compatible.
First, it does not work with generics. This code is legal today, but would break if you add a Leak bound to mem::forget:
use std::mem;

pub fn forget_generic<T>(value: T) {
    // Legal today: T has no Leak bound, yet the value is forgotten.
    mem::forget(value);
}
This is because there is no Leak bound on the type parameter for this function. Adding such a bound to the API of mem::forget (or any other API that can forget values) would be a breaking change.
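The breakage can be approximated in stable Rust. The sketch below is not a real proposal: Leak here is an ordinary marker trait with a manual impl standing in for the hypothetical auto trait, and forget is a local wrapper standing in for a mem::forget that has gained the bound.

```rust
use std::mem;

/// Stand-in for the hypothetical `Leak` marker. A real version would be an
/// auto trait; this ordinary trait with manual impls is only an approximation.
pub trait Leak {}
impl Leak for String {}

/// What `mem::forget` would look like with the new bound (an assumption).
pub fn forget<T: Leak>(value: T) {
    mem::forget(value);
}

/// Existing generic code has to add the bound to keep compiling.
pub fn forget_generic<T: Leak>(value: T) {
    forget(value);
}

// The original signature, with no bound on `T`, is rejected:
//
// pub fn forget_generic_old<T>(value: T) {
//     forget(value); // error: the trait bound `T: Leak` is not satisfied
// }
```

Every downstream generic function that reaches forget, directly or through another generic, would need the same edit.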
Another way in which it is not backward compatible is that trait object types will not implement Leak unless they add + Leak. Trait object types do not inherit impls by way of auto traits, because you don’t actually know what type the trait object is. So a trait object like dyn Future does not implement Leak. For example, this code would also break:
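use std::{fmt::Display, mem};

pub fn forget_trait_object(object: Box<dyn Display>) {
    // dyn Display does not mention Leak, so this call would stop compiling.
    mem::forget(object);
}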
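The same behavior can be observed today with Send, which is an existing auto trait. In this sketch (render_on_thread is a made-up name), the trait object only crosses threads because its type spells out + Send.

```rust
use std::fmt::Display;
use std::thread;

/// Accepts only trait objects that explicitly promise `Send`.
pub fn render_on_thread(object: Box<dyn Display + Send>) -> String {
    thread::spawn(move || object.to_string())
        .join()
        .expect("worker thread panicked")
}

// A plain `Box<dyn Display>` is rejected, even if the concrete value inside
// happens to be `Send`, because the trait object type has erased that fact:
//
// pub fn render_plain(object: Box<dyn Display>) -> String {
//     thread::spawn(move || object.to_string()).join().unwrap()
// }
```

A new Leak auto trait would behave the same way, so every existing dyn Trait type would start out as not Leak.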
?Traits
So if adding an auto trait is not backward compatible, we are left turning to the ?Trait solution. But there are problems here as well.
In the strictest sense, adding a new trait like ?Leak is backward compatible. Instead of adding new bounds to APIs like mem::forget, you would be relaxing the bounds on other APIs. So all of the code above would be fine, because to make a generic function that takes a linear type, you would have to write a ?Leak bound.
The first problem with this for something like Leak is that the vast majority of generic APIs in Rust cannot possibly forget their value; after all, even though memory leaks are not undefined behavior, they are still undesirable and mostly avoided. This is quite different from Sized: because passing something by value requires Sized, the large majority of generic APIs in Rust require Sized, and so ?Sized bounds are relatively rare. In contrast, adding ?Leak would create a permanent scar across the ecosystem, as the vast majority of generics would gain a T: ?Leak bound.
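For comparison, this is what the existing ?Sized relaxation looks like. The two function names are illustrative; the bounds themselves are standard Rust.

```rust
use std::fmt::Display;

/// Implicitly `T: Sized`: the value is passed by value, so its size must be
/// known at compile time.
pub fn show_owned<T: Display>(value: T) -> String {
    value.to_string()
}

/// `?Sized` relaxes the implicit bound, so `T` may be `str` or `dyn Display`.
/// This only works because the value stays behind a reference.
pub fn show_ref<T: Display + ?Sized>(value: &T) -> String {
    value.to_string()
}
```

Only the second signature needs the extra syntax, and only because it never handles a T by value. A ?Leak bound would be the opposite case: nearly every generic function would qualify for it.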
The second problem is bigger, though: the interaction between ?Traits and associated types means that it is a breaking change to add a ?Trait bound to an associated type. This means that any stable associated type in the standard library cannot gain a ?Leak bound.
Consider this example:
use std::mem;

pub fn forget_iterator(iter: impl Iterator) {
    // Each item is passed by value to mem::forget, so no destructor runs.
    iter.for_each(mem::forget);
}
This will forget every element of the iterator, even though the Iterator::Item associated type is never mentioned. Therefore, Iterator::Item must implement Leak, always. The compiler is allowed to assume that the item of every iterator implements Leak, and it would be a breaking change to invalidate that assumption.
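To see that this really does skip every destructor, the sketch below runs forget_iterator on values with an observable Drop. Tracked and drops_after_forgetting are names I made up for the illustration.

```rust
use std::cell::Cell;
use std::mem;

/// Illustrative item type that counts how many times its destructor runs.
pub struct Tracked<'a>(pub &'a Cell<u32>);

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Same function as in the text: the item type is never named.
pub fn forget_iterator(iter: impl Iterator) {
    iter.for_each(mem::forget);
}

/// Feeds three tracked items through `forget_iterator` and reports how many
/// destructors ran afterwards.
pub fn drops_after_forgetting(drops: &Cell<u32>) -> u32 {
    let items = vec![Tracked(drops), Tracked(drops), Tracked(drops)];
    forget_iterator(items.into_iter());
    drops.get()
}
```

The count stays at zero: the vector’s buffer is freed, but none of the three items is ever dropped.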
The implications of this are far reaching. If Leak were added as a ?Trait, none of these things would be possible with linear types:
- Iterator: You couldn’t construct an iterator of linear types.
- Future: You couldn’t construct a future that evaluates to a linear type (and so you couldn’t return a linear type from an async function).
- Deref: You couldn’t dereference a Box containing a linear type, or dereference a vector of linear types to a slice of linear types, or any other smart pointer type.
- Index: You couldn’t index a collection of a linear type, so you couldn’t index into a slice of linear types or into a map with linear values.
- Add/Sub/Mul/Div: You couldn’t have a linear type as the output value of any overloaded arithmetic operator.
One special set of associated types is the return value of the Fn traits. The Rust project has specifically made it difficult to refer to the Output associated type of these traits, so that it would have flexibility to change this in the future. Still, certain issues were encountered in the past when experimenting with a Move trait. It’s not clear to me if these issues would have been encountered with other traits (like Leak) or if they were specific to the built-in semantics of Move.
Basically, there is a very steep trade-off in choosing to add a new ?Trait - especially one that wouldn’t be relevant to very many bounds, like Leak - because you will add confusing new syntax to a huge variety of generic interfaces in exchange for getting a very limited new feature. I would consider adopting something like ?Leak a poor cost-benefit trade, even if we do accept that the current Rust rules are “wrong” and we wish Rust had linear types.
Editions
The final thing that needs to be considered is the edition mechanism. Is it possible, using the edition mechanism, to introduce one of these traits? Maybe.
Each edition in effect forms a “dialect” of Rust, all of which are supported by the same compiler. So at first glance it sounds plausible that in one dialect of Rust, all types can be forgotten, and in another dialect, the Leak trait exists. The problem is that a hard requirement on editions is that crates from one edition can depend on crates from another edition, so that the upgrade from one edition to the next is seamless and voluntary.
Suppose that the Rust project decided to add the Leak trait in the 2024 edition. All of the code in the 2021 edition needs to still work - including code like the examples I showed above. Sure, you can use a tool like cargo fix to add Leak bounds everywhere and expect users to relax them on their own as they move into the 2024 edition, but the code from the 2021 edition needs to work without modification.
A way to possibly do this would be to make trait coherence depend on edition, so that in pre-2024 editions, every type meets the Leak bounds, even types that absolutely shouldn’t, like unbounded generics, and trait object types that don’t mention Leak. If, in the pre-2024 editions, all types implement Leak, then the added bounds will never fail.
However - and this is the big, enormous caveat of this section - this would rely on the compiler effectively implementing some kind of unbreakable firewall, to prevent a type that does not implement Leak from ever getting used with pre-2024 code. This would mean:
- Any time post-2024 code instantiates a generic from pre-2024 code, it needs to check that the type it instantiates it with implements Leak, even though the pre-2024 code doesn’t have such a bound.
- Any time pre-2024 code calls post-2024 code, it needs to check that the types it gets from that API implement Leak (under the 2024 edition rules). Note that the standard library would be considered post-2024 code, so every call to a std API would involve checking types for Leak in the pre-2024 editions.
I don’t know if this is even possible to implement. It would probably have a pretty bad effect on compile times, at least on pre-2024 edition code, and it would likely create a very difficult transition at first. But in the longer term at least it would leave Leak in the “right” state (as an auto trait, which can be used with old associated types correctly, and not as a ?Trait).
What should be done
If I could go back in time to 2015, I think I would probably add both Move and Leak to Rust. There are certain downsides, which were highlighted by the team at the time in their decision not to add Leak: any trait object type which needs to meet these bounds needs to add + Move or + Leak to its definition. Having two auto traits of such global significance (Send and Sync) was already considered enough of a burden.
But if I’m honest, I have the impression (I wasn’t there) that the decision to exclude Leak was at least partly a matter of expediency: my understanding is that the Rust team at Mozilla was under incredible pressure from their management to ship a 1.0 on the deadline they set, and this probably influenced their decision not to make any last minute changes to the rules that could impact meeting that deadline.
In a language with Leak, the scoped task trilemma wouldn’t exist, the simpler scoped thread API would be safe, GC-integration would possibly be easier, and I’ve gotten the impression many systems APIs would be easier to wrap safely (though I don’t know the details of this).
In a language with Move, the Pin type wouldn’t need to exist, users would therefore not have the annoyance of dealing with it, so-called “pin projections” wouldn’t be an issue requiring macros to solve, and making self-referential generators would present no complication for the Iterator trait.
However, making these changes now is a much thornier problem. I think the edition-based technique is the only viable solution for adding a new, globally relevant marker trait (except for certain unique exceptions, like DynSized, not discussed here). And I think there are all kinds of reasons to think that wouldn’t work - that it would be too difficult to implement, that the implementation would have too many soundness holes, that the transition would be too disruptive, that it’s actually totally impossible because I’ve missed something important.
This is why I’m very glad we found the Pin solution for immoveable types, and were able to ship self-referential futures and async/await syntax on top of them, in a reasonable time frame without major disruption to all existing users. When it comes to Leak and linear types, I just despair.