Unsafe Abstractions
Unsafety in Rust is often discussed in terms of the primitive operations that can only be performed inside unsafe blocks (such as dereferencing raw pointers and accessing mutable statics). I want to look at it from a different angle and instead focus on the capability to produce unsafe abstractions.
The general concept of unsafe abstractions
An unsafe abstraction is a new abstraction which requires the unsafe keyword to apply to some context (this is an intentionally “abstract” definition, because as we will see there are several highly divergent forms of unsafe abstraction supported in Rust). The unsafe keyword is required to apply the abstraction because the abstraction introduces some invariant which cannot be type checked and which the rest of the program is allowed to assume is maintained in order to assume type safety.
To give a single concrete example, the slice::from_raw_parts function is an unsafe abstraction which allows users to create a slice from a raw pointer and a length. This function has several untyped invariants which must be maintained:
- The pointer must refer to an array of type T of at least the length of the second argument.
- The array must be valid to dereference into for the entire lifetime 'a.
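As a minimal sketch of what upholding those invariants looks like at a call site: the helper name first_n below is hypothetical, and it sidesteps most of the difficulty by deriving the pointer from an existing slice, so both invariants follow from checks the compiler and the assert already give us.

```rust
use std::slice;

/// Hypothetical helper (not part of std): returns the first `n` elements of
/// `data` by round-tripping through a raw pointer.
fn first_n(data: &[u32], n: usize) -> &[u32] {
    // Establish the length invariant before the unsafe call.
    assert!(n <= data.len());
    let ptr = data.as_ptr();
    // SAFETY: `ptr` points to `data.len()` initialized `u32` values and `n`
    // is at most that length. The returned slice borrows from `data`, so it
    // cannot outlive the memory it refers to.
    unsafe { slice::from_raw_parts(ptr, n) }
}
```

Neither invariant is visible in the types of the pointer and the length; the only thing connecting them to the resulting slice is the promise made inside the unsafe block.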
In any unsafe abstraction, there are always three components, which I am going to assign (possibly arbitrary) names to:
- The abstractive: A component which introduces a new untyped invariant.
- The applicative: A component which upholds that invariant.
- The assumptive: A component which is correct so long as that invariant has been correctly upheld.
It’s worth noting that the assumptive component can only rely on the applicative component upholding the particular invariant introduced by the abstractive component. You cannot just assume additional invariants will be upheld. If additional invariants are necessary, they can be introduced, but doing so is a breaking change to the API.
Finally, it is very important that the applicative component - in which the user asserts that they are upholding the invariant - involve the unsafe keyword. The assumptive component, in contrast, does not necessarily need to use unsafe - it just assumes that an invariant is upheld; it does not claim to uphold any invariant.
Now I want to go through some of the kinds of unsafe abstractions that can be introduced in Rust.
Unsafe functions
Functions and inherent methods can be marked unsafe in Rust. The slice::from_raw_parts function mentioned earlier is one example, but there are many.
An unsafe function is broken up like this:
- The abstractive: The function signature, in using the unsafe keyword, introduces a new untyped invariant, which should be documented.
- The applicative: Any caller of the function applies that abstraction and guarantees that it upholds the invariant the function requires.
- The assumptive: The body of the function, and any other code that relies on state controlled by that function (e.g. something that uses its return value) assumes that the invariant is upheld by the function’s caller.
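The three components of an unsafe function can be sketched roughly like this. The names get_at and last_or_zero are hypothetical, chosen only for illustration; the invariant is written in the doc comment because the type system has no place to put it.

```rust
/// Reads the element at `index` without a bounds check.
///
/// # Safety
///
/// Abstractive: the `unsafe` signature plus this documentation introduce the
/// invariant. The caller must guarantee that `index < data.len()`.
unsafe fn get_at(data: &[i32], index: usize) -> i32 {
    // Assumptive: the body relies on the documented invariant and nothing else.
    unsafe { *data.get_unchecked(index) }
}

/// A safe wrapper that takes on the obligation itself.
fn last_or_zero(data: &[i32]) -> i32 {
    if data.is_empty() {
        return 0;
    }
    // Applicative: the caller asserts, with `unsafe`, that it upholds the invariant.
    // SAFETY: `data` is non-empty, so `data.len() - 1` is in bounds.
    unsafe { get_at(data, data.len() - 1) }
}
```

Note that last_or_zero is itself a safe function: once the invariant has been discharged locally, nothing leaks out to its own callers.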
Unsafe traits
Another kind of abstraction which is quite different from function abstraction is the unsafe trait. The Send trait, for example, is an unsafe trait.
Here the breakdown is quite different:
- The abstractive: The trait definition introduces an invariant which must be true of any implementation of this trait (for example, Send’s invariant is that the type it is implemented for can be passed between threads).
- The applicative: Any implementation of this trait must uphold the invariant introduced by that trait.
- The assumptive: Any time the bound T: Send is used, the assumption is made that the invariant - that T can be passed between threads - is upheld by every implementation of that trait.
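Send is hard to demonstrate in a few lines, so here is a sketch of the same division using a hypothetical marker trait, Zeroable, whose invariant is that the all-zero bit pattern is a valid value of the implementing type. The trait and the zeroed function are assumptions for illustration; only std::mem::zeroed is a real library call.

```rust
/// Hypothetical marker trait.
///
/// # Safety
///
/// Abstractive: implementers must guarantee that the all-zero bit pattern is
/// a valid value of `Self`.
unsafe trait Zeroable: Sized {}

// Applicative: each `unsafe impl` asserts that the invariant holds for that type.
// SAFETY: zero is a valid value for both integer types.
unsafe impl Zeroable for u32 {}
unsafe impl Zeroable for u8 {}

/// Assumptive: a safe function that relies on the bound `T: Zeroable`.
fn zeroed<T: Zeroable>() -> T {
    // SAFETY: `T: Zeroable` promises that all-zero bytes form a valid `T`.
    unsafe { std::mem::zeroed() }
}
```

Here the unsafe keyword sits on the implementation rather than on the call: anyone can call zeroed from safe code, and the burden falls on whoever writes unsafe impl.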
Unsafe associated items (e.g. unsafe trait methods)
The most subtle form of unsafe abstraction is probably unsafe associated items. This refers to methods and associated functions primarily. Within a trait, it is possible to tag a particular function declaration as unsafe. The breakdown of the components of this unsafe abstraction is similar to unsafe functions:
- The abstractive: The function signature in the trait introduces a new untyped invariant.
- The applicative: Any caller of any instance of that trait function must uphold the invariant introduced by the trait definition.
- The assumptive: All implementers of that trait can assume that the invariant is upheld, but only the invariant introduced by the trait declaration. Implementations must not introduce new invariants.
The most surprising aspect of unsafe trait methods is the distance between introducing the invariant and relying on it. It might seem natural, when implementing an unsafe method, to think that you can introduce invariants of your own. But if you’re in a trait, this is not correct - in order for generic method calls to work, every implementation must rely on the same invariants, not new ones of their own.
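A small sketch of that distance, using a hypothetical trait UncheckedGet and a hypothetical implementer Bytes. The invariant is stated once, in the trait declaration; the implementation may lean on it, and the generic caller upholds it without knowing which implementation it will reach.

```rust
/// Hypothetical trait with one unsafe method.
trait UncheckedGet {
    /// Number of elements available.
    fn count(&self) -> usize;

    /// # Safety
    ///
    /// Abstractive: the caller must guarantee that `index < self.count()`.
    unsafe fn get_unchecked_at(&self, index: usize) -> u8;
}

struct Bytes(Vec<u8>);

impl UncheckedGet for Bytes {
    fn count(&self) -> usize {
        self.0.len()
    }

    unsafe fn get_unchecked_at(&self, index: usize) -> u8 {
        // Assumptive: this may rely on `index < self.count()` and on nothing
        // more. It cannot demand, say, that `index` is even.
        unsafe { *self.0.get_unchecked(index) }
    }
}

/// Applicative: a generic caller upholds the trait's invariant for every implementer.
fn first<T: UncheckedGet>(items: &T) -> Option<u8> {
    if items.count() == 0 {
        return None;
    }
    // SAFETY: `count()` is greater than zero, so index 0 is in range.
    Some(unsafe { items.get_unchecked_at(0) })
}
```

If Bytes required something extra of its callers, first would have no way to know about it, which is why the implementation has to make do with the trait's invariant.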
Safe implementations of unsafe methods
What becomes even more frustrating, though, is when your particular implementation doesn’t actually rely on the invariant that the trait has introduced. What you’d like to be able to do here is drop the unsafe keyword, asserting that your particular implementation is safe. Then, in a concrete context, others can call this method from safe code without upholding the invariant or using an unsafe block. You’ll also get all the checking a safe function would have in your implementation, helping you assure that your implementation actually is safe.
This doesn’t seem like a particularly difficult feature to add to the language - just allow safe implementations of unsafe methods - but it runs into the problem that some code in the wild is currently relying (incorrectly, in my opinion) on the requirement that every implementation be marked unsafe.
A particular example of what I mean is tokio’s AsyncRead trait. This trait has one unsafe method, but as it is clearly documented, that method is actually safe:
“This function isn’t actually unsafe to call but unsafe to implement. The implementer must ensure that either the whole buf has been zeroed or read_buf() overwrites the buffer without reading it and returns the correct value.”
Here, the division of the unsafe abstraction is somewhat different from what I outlined above:
- The abstractive: The method signature introduces an invariant.
- The applicative: Every implementation upholds that invariant.
- The assumptive: Every caller can assume that invariant is upheld.
Because the applicative is the implementation, it is not acceptable to ever allow safe implementations of this method.
Given the set of tools we have available, this actually most clearly follows the pattern of an unsafe trait, not an unsafe method, and probably the unsafe keyword “should” be moved to unsafe trait AsyncRead instead of on this particular method. However, this also works today (as long as we don’t allow safe implementations of unsafe methods), so there currently isn’t an impetus to change it.
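To make the “unsafe to implement, safe to call” shape concrete, here is a sketch of the unsafe trait form. FillBuffer, Repeat and read_filled are hypothetical stand-ins and are not tokio’s API; the invariant is also a simpler one than tokio’s (it concerns the returned count rather than uninitialized memory), chosen so the example stays small.

```rust
/// Hypothetical trait, not tokio's actual API.
///
/// # Safety
///
/// Abstractive: implementers must guarantee that `fill` returns a count no
/// greater than `buf.len()`.
unsafe trait FillBuffer {
    /// Safe to call: the caller promises nothing.
    fn fill(&mut self, buf: &mut [u8]) -> usize;
}

/// Hypothetical implementer that writes one byte value everywhere.
struct Repeat(u8);

// Applicative: the implementation asserts that it upholds the invariant.
// SAFETY: `fill` returns exactly `buf.len()`.
unsafe impl FillBuffer for Repeat {
    fn fill(&mut self, buf: &mut [u8]) -> usize {
        buf.fill(self.0);
        buf.len()
    }
}

/// Assumptive: the caller relies on the returned count being in bounds.
fn read_filled<'a, R: FillBuffer>(reader: &mut R, buf: &'a mut [u8]) -> &'a [u8] {
    let n = reader.fill(buf);
    // SAFETY: every `FillBuffer` implementation promises `n <= buf.len()`.
    unsafe { buf.get_unchecked(..n) }
}
```

With the keyword on the trait, the method itself is an ordinary safe function, so the question of a “safe implementation of an unsafe method” never comes up for it.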
It is also possible that there is a way to be more expressive in declaring the components of this unsafe abstraction, while keeping the relatively simple mental model that the current unsafe keyword has (e.g. some way to say clearly that this method is unsafe to implement, not unsafe to call).