Unsafe Abstractions

January 4, 2018

Unsafety in Rust is often discussed in terms of the primitive operations that can only be performed inside of unsafe blocks (such as dereferencing raw pointers and accessing mutable statics). I want to look at it from a different angle than these primitive operations, and instead focus on the capability to produce unsafe abstractions.

The general concept of unsafe abstractions

An unsafe abstraction is a new abstraction which requires the unsafe keyword to apply to some context (this is an intentionally “abstract” definition, because as we will see there are several highly divergent forms of unsafe abstraction supported in Rust). The unsafe keyword is required to apply the abstraction because the abstraction introduces some invariant which cannot be type checked, and which the rest of the program is allowed to assume is maintained in order to remain type safe.

To give a single concrete example, the slice::from_raw_parts function is an unsafe abstraction which allows users to create a slice from a raw pointer and a length. This function has several untyped invariants which must be maintained:

The pointer must refer to an array of type T of at least the length of the second argument.
The array must be valid to dereference for the entire lifetime 'a of the returned slice.
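As a minimal sketch of what upholding these two invariants can look like, the hypothetical helper below (first_n is a name invented for this illustration) rebuilds a prefix of an existing slice from its raw pointer. The length check covers the first invariant, and tying the returned borrow to the input covers the second.

```rust
use std::slice;

/// Hypothetical helper: returns the first `n` elements of `values` by
/// round-tripping through a raw pointer.
fn first_n(values: &[u32], n: usize) -> &[u32] {
    assert!(n <= values.len(), "n must not exceed the slice length");
    let ptr = values.as_ptr();
    // SAFETY: `ptr` points to `values.len()` initialized `u32`s and `n` is at
    // most that length. The returned slice borrows from `values`, so the
    // memory stays valid for the whole returned lifetime.
    unsafe { slice::from_raw_parts(ptr, n) }
}

fn main() {
    let data = vec![10, 20, 30, 40];
    let head = first_n(&data, 2);
    println!("{:?}", head);
}
```

Neither check is visible to the type system: remove the assert and the code still compiles, which is exactly why the call has to sit inside an unsafe block.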

In any unsafe abstraction, there are always three components, which I am going to assign (possibly arbitrary) names to:

The abstractive: A component which introduces a new untyped invariant.
The applicative: A component which upholds that invariant.
The assumptive: A component which is correct so long as that invariant has been correctly upheld.

It’s worth noting that the assumptive component can only rely on the applicative component upholding the particular invariant introduced by the abstractive component. You cannot just assume additional invariants will be upheld. If additional invariants are necessary, they can be introduced, but doing so is a breaking change to the API.

Finally, it is very important that the applicative component - in which the user asserts that they are upholding the invariant - involve the unsafe keyword. The assumptive component, in contrast, does not necessarily need to use unsafe - it just assumes that an invariant is upheld, it does not claim to uphold any invariant.

Now I want to go through some of the kinds of unsafe abstractions that can be introduced in Rust.

Unsafe functions

Functions and inherent methods can be marked unsafe in Rust. The slice::from_raw_parts function mentioned earlier is one example, but there are many.

An unsafe function is broken up like this:

The abstractive: The function signature, in using the unsafe keyword, introduces a new untyped invariant, which should be documented.
The applicative: Any caller of the function applies that abstraction and guarantees that it upholds the invariant the function requires.
The assumptive: The body of the function, and any other code that relies on state controlled by that function (e.g. something that uses its return value) assumes that the invariant is upheld by the function’s caller.
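The three roles can be sketched with a small, hypothetical unsafe function (element_at and checked_element_at are names made up for this example, not standard library APIs). The comments mark which part of the code plays which role.

```rust
/// Returns the element at `index` without a bounds check.
///
/// # Safety
///
/// `index` must be less than `values.len()`.
// Abstractive: the `unsafe` signature plus the documented contract above.
unsafe fn element_at(values: &[i32], index: usize) -> i32 {
    // Assumptive: the body relies on the documented invariant and nothing else.
    // SAFETY: the caller guarantees `index < values.len()`.
    unsafe { *values.as_ptr().add(index) }
}

/// A safe wrapper that checks the bound before applying the abstraction.
fn checked_element_at(values: &[i32], index: usize) -> Option<i32> {
    if index < values.len() {
        // Applicative: this caller asserts, with `unsafe`, that it upholds
        // the invariant. The bound was checked on the line above.
        Some(unsafe { element_at(values, index) })
    } else {
        None
    }
}

fn main() {
    let values = [1, 2, 3];
    println!("{:?}", checked_element_at(&values, 1));
    println!("{:?}", checked_element_at(&values, 3));
}
```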

Unsafe traits

Another kind of abstraction, quite different from function abstraction, is the unsafe trait. Send, for example, is an unsafe trait.

Here the breakdown is quite different:

The abstractive: The trait definition introduces an invariant which must be true of any implementation of this trait (for example, Send’s invariant is that the type it is implemented for can be passed between threads).
The applicative: Any implementation of this trait must uphold the invariant introduced by that trait.
The assumptive: Any time the bound T: Send is used, the assumption is made that the invariant - that T can be passed between threads - is upheld by every implementation of that trait.
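Send is built into the language, so to show all three roles in one place here is a sketch using a hypothetical unsafe trait of my own, Zeroable, whose invariant is that the all-zero bit pattern is a valid value of the implementing type. Note that the assumptive component, the generic function, is itself a safe function.

```rust
use std::mem;

/// Hypothetical marker trait for types where all-zero bytes are a valid value.
///
/// # Safety
///
/// Implementers must guarantee that a `Self` consisting entirely of zero
/// bytes is a valid value.
// Abstractive: the trait definition introduces the invariant.
unsafe trait Zeroable: Sized {}

// Applicative: each `unsafe impl` asserts the invariant for one type.
unsafe impl Zeroable for u32 {}
unsafe impl Zeroable for [u8; 4] {}

// Assumptive: a safe function that relies on the bound `T: Zeroable`.
fn zeroed<T: Zeroable>() -> T {
    // SAFETY: `T: Zeroable` promises that all-zero bytes form a valid `T`.
    unsafe { mem::zeroed() }
}

fn main() {
    let number: u32 = zeroed();
    let bytes: [u8; 4] = zeroed();
    println!("{} {:?}", number, bytes);
}
```

Writing `unsafe impl Zeroable for &u8 {}` would compile just as well, and would make the safe function zeroed produce a null reference. The responsibility lies entirely with whoever writes the impl.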

Unsafe associated items (e.g. unsafe trait methods)

The most subtle form of unsafe abstraction is probably unsafe associated items. This refers to methods and associated functions primarily. Within a trait, it is possible to tag a particular function declaration as unsafe. The breakdown of the components of this unsafe abstraction is similar to unsafe functions:

The abstractive: The function signature in the trait introduces a new untyped invariant.
The applicative: Any caller of any instance of that trait function must uphold the invariant introduced by the trait definition.
The assumptive: All implementers of that trait can assume that the invariant is upheld, but only the invariant introduced by the trait declaration. Implementations must not introduce new invariants.

The most surprising aspect of unsafe trait methods is the distance between introducing the invariant and relying on it. It might seem natural, when implementing an unsafe method, to think that you can introduce invariants of your own. But if you’re in a trait, this is not correct - in order for generic method calls to work, every implementation must rely on the same invariants, not new ones of their own.
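A sketch of this, using a hypothetical Buffer trait invented for the example: the trait declaration fixes the only invariant, the implementation may rely on that invariant and no other, and a generic caller upholds it without knowing which implementation it is calling.

```rust
trait Buffer {
    fn len(&self) -> usize;

    /// Returns the byte at `index` without a bounds check.
    ///
    /// # Safety
    ///
    /// `index` must be less than `self.len()`.
    // Abstractive: the declaration in the trait introduces the invariant.
    unsafe fn byte_at(&self, index: usize) -> u8;
}

struct Bytes(Vec<u8>);

impl Buffer for Bytes {
    fn len(&self) -> usize {
        self.0.len()
    }

    // Assumptive: this body may rely on `index < self.len()`, but it cannot
    // demand anything more (say, that `index` is even), because generic
    // callers only know about the trait's contract.
    unsafe fn byte_at(&self, index: usize) -> u8 {
        // SAFETY: the trait contract guarantees `index < self.len()`.
        unsafe { *self.0.get_unchecked(index) }
    }
}

// Applicative: a generic caller upholds the trait's invariant.
fn last_byte<B: Buffer>(buffer: &B) -> Option<u8> {
    let len = buffer.len();
    if len == 0 {
        return None;
    }
    // SAFETY: `len - 1 < len`, which is what the trait requires.
    Some(unsafe { buffer.byte_at(len - 1) })
}

fn main() {
    println!("{:?}", last_byte(&Bytes(vec![1, 2, 3])));
}
```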

Safe implementations of unsafe methods

What becomes even more frustrating, though, is when your particular implementation doesn’t actually rely on the invariant that the trait has introduced. What you’d like to be able to do here is drop the unsafe keyword, asserting that your particular implementation is safe. Then, in a concrete context, others can call this method from safe code without upholding the invariant or using an unsafe block. You’ll also get all the checking a safe function would have in your implementation, helping you ensure that your implementation actually is safe.
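To make the frustration concrete, here is a sketch with a hypothetical Buffer trait and a Zeros type (both invented for this example) whose implementation never touches memory through the index. As the language stood when this was written, the implementation must still be declared unsafe, and even a caller that knows the concrete type still needs an unsafe block.

```rust
trait Buffer {
    fn len(&self) -> usize;

    /// # Safety
    ///
    /// `index` must be less than `self.len()`.
    unsafe fn byte_at(&self, index: usize) -> u8;
}

/// Hypothetical buffer whose every byte is zero.
struct Zeros {
    len: usize,
}

impl Buffer for Zeros {
    fn len(&self) -> usize {
        self.len
    }

    // The body performs no unsafe operation and ignores the index entirely,
    // yet the signature has to repeat `unsafe` to match the trait.
    unsafe fn byte_at(&self, _index: usize) -> u8 {
        0
    }
}

fn first_byte_of_zeros() -> u8 {
    let zeros = Zeros { len: 8 };
    // The concrete type is known here, but the call still needs `unsafe`.
    // SAFETY: 0 is less than `zeros.len()`, which is 8.
    unsafe { zeros.byte_at(0) }
}

fn main() {
    println!("{}", first_byte_of_zeros());
}
```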

This doesn’t seem like a particularly difficult feature to add to the language - just allow safe implementations of unsafe methods - but it runs into the problem that some code in the wild is currently relying (incorrectly, in my opinion) on the requirement that every implementation be marked unsafe.

A particular example of what I mean is tokio’s AsyncRead trait. This trait has one unsafe method, but as it is clearly documented, that method is actually safe:

This function isn’t actually unsafe to call but unsafe to implement. The implementer must ensure that either the whole buf has been zeroed or read_buf() overwrites the buffer without reading it and returns the correct value.

Here, the division of the unsafe abstraction is somewhat different from what I outlined above:

The abstractive: The method signature introduces an invariant.
The applicative: Every implementation upholds that invariant.
The assumptive: Every caller can assume that invariant is upheld.

Because the applicative is the implementation, it is not acceptable to ever allow safe implementations of this method.

Given the set of tools we have available, this actually most clearly follows the pattern of an unsafe trait, not an unsafe method, and probably the unsafe keyword “should” be moved to unsafe trait AsyncRead instead of on this particular method. However, this also works today (as long as we don’t allow safe implementations of unsafe methods), so there currently isn’t an impetus to change it.
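For illustration only, here is a sketch of what an “unsafe to implement, safe to call” contract looks like when it is expressed as an unsafe trait. This is not tokio’s API: the Fill trait, the Repeat type and the read_four function are hypothetical, and the sketch uses MaybeUninit from the current standard library to model an uninitialized buffer.

```rust
use std::mem::MaybeUninit;

/// Hypothetical trait for sources that write into an uninitialized buffer.
///
/// # Safety
///
/// `fill` must initialize the first `n` bytes of `buf`, where `n` is its
/// return value, and `n` must not exceed `buf.len()`.
// Abstractive: the invariant is attached to the trait, not to the method.
unsafe trait Fill {
    // Safe to call: callers have nothing to uphold.
    fn fill(&mut self, buf: &mut [MaybeUninit<u8>]) -> usize;
}

struct Repeat(u8);

// Applicative: the implementer asserts, with `unsafe impl`, that the
// invariant holds.
unsafe impl Fill for Repeat {
    fn fill(&mut self, buf: &mut [MaybeUninit<u8>]) -> usize {
        for slot in buf.iter_mut() {
            slot.write(self.0);
        }
        buf.len()
    }
}

// Assumptive: the caller relies on the first `n` bytes being initialized.
fn read_four<F: Fill>(source: &mut F) -> Vec<u8> {
    let mut buf = [MaybeUninit::<u8>::uninit(); 4];
    let n = source.fill(&mut buf);
    buf[..n]
        .iter()
        // SAFETY: `F: Fill` promises the first `n` bytes were initialized.
        .map(|slot| unsafe { slot.assume_init_read() })
        .collect()
}

fn main() {
    println!("{:?}", read_four(&mut Repeat(7)));
}
```

With this shape the unsafe keyword appears on the impl, where the promise is made, and callers of fill need no unsafe block at all.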

It also is possible that there is a way to be more expressive in declaring the components of this unsafe abstraction, while keeping the relatively simple mental model that the current unsafe keyword has (e.g. some way to say clearly that this method is unsafe to implement, not unsafe to call).