References are like jumps

In a high-level language, the programmer is deprived of the dangerous power to update his own program while it is running. Even more valuable, he has the power to split his machine into a number of separate variables, arrays, files, etc.; when he wishes to update any of these he must quote its name explicitly on the left of the assignment, so that the identity of the part of the machine subject to change is immediately apparent; and, finally, a high-level language can guarantee that all variables are disjoint, and that updating any one of them cannot possibly have any effect on any other.

Unfortunately, many of these advantages are not maintained in the design of procedures and parameters in ALGOL 60 and other languages. But instead of mending these minor faults, many language designers have preferred to extend them throughout the whole language by introducing the concept of reference, pointer, or indirect address into the language as an assignable item of data. This immediately gives rise in a high-level language to one of the most notorious confusions of machine code, namely that between an address and its contents. Some languages attempt to solve this by even more confusing automatic coercion rules. Worst still, an indirect assignment through a pointer, just as in machine code, can update any store location whatsoever, and the damage is no longer confined to the variable explicitly named as the target of assignment…

Unlike all other values (integers, strings, arrays, files, etc.) references have no meaning independent of a particular run of a program. They cannot be input as data, and they cannot be output as results. If either data or references to data have to be stored on files or backing stores, the problems are immense. And on many machines they have a surprising overhead on performance, for example they will clog up instruction pipelines, data lookahead, slave stores, and even paging systems. References are like jumps, leading wildly from one part of a data structure to another. Their introduction into high-level languages has been a step backward from which we may never recover.

— C.A.R. Hoare, Hints on Programming Language Design, 1974

How embarrassing it is for the practice of software development that, when it comes to the subject of references, we have spent half a century assembling an enormous body of evidence that Sir Tony was correct. Null pointers may have been his billion-dollar mistake, but the decision to ignore his remarks about the problem of pointers in general was a trillion-dollar mistake that everyone else made.

What Tony Hoare was writing about when he said that references are like jumps was the problem of mutable, aliased state. If you have in a language the ability to alias two variables so that they refer to the same location in memory, and also the ability to assign values to variables as execution progresses, your ability to locally reason about the behavior of a component of your system becomes badly inhibited. Depriving the user of the ability to mutate aliased state by accident is critical to enabling the user to easily create correctly functioning systems.

When I write things along these lines on various scurrilous Internet forums, I sometimes receive the most incredulous response: there’s no way we could get anything done without mutable, aliased state! This is very similar to the response of some programmers when it was suggested that they should reduce the use of the GOTO operator in their programs, which is further evidence that Tony Hoare was right: references are like jumps. The incredulity does not faze me much, because this is something I know that I am right about.

Rust’s entire claim to fame is that it has shipped a type system intended to solve this problem. In Rust, references are mutable or aliased, but never both, and the entire system of lifetime variables and borrow analysis is designed to ensure that this remains true. But Rust only reached 1.0 in 2015; what were we doing in the forty years before that?

The first major attempt to solve this problem was object-orientation. Objects would relate code and data together, and one of the promises made about object-orientation was that it would enable users to create abstractions which “encapsulate” mutating state. At first, this must have seemed like a promising solution. Tony Hoare himself endorsed it in the 1972 compilation Structured Programming, which collected Edsger Dijkstra’s famous text, Hoare’s own notes on data structuring, and O.J. Dahl’s notes on Simula, the first object-oriented programming language. Hoare’s own remarks in the preface speak to his optimism about this technique:

The third monograph [on Simula] provides a synthesis of the previous two, and expounds the close theoretical and practical connections between the design of data and the design of programs. It introduces useful additional methods for program and data structuring which may be unfamiliar to many programmers.

The problem, though, is that an encapsulated aliased, mutable reference is still an aliased, mutable reference. You have objects with methods that update their state, and those objects are composed of aliased references to other objects, on which they call methods at will. Hence Joe Armstrong’s comment:

the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.

Because an object can be updated through a method called on one aliased reference to it, while another aliased reference might depend on some invariant violated by that method, you get what has been called “spooky action at a distance.”

Software development being at best protoscientific, the failure of object-oriented abstraction to achieve its goal did not stop it from becoming the dominant practice. But its limitations were recognized early on in academia, which contributed to the growth of interest in pure functional programming. This solved the problem of shared mutable state by eliminating mutation from the language entirely: you could alias state as much as you liked, but without assignment you could never mutate it. This enabled techniques like call-by-need, in which expressions were evaluated only when their values were actually needed (especially useful for cool pseudo-code demonstrations of infinite data structures), and was actually valuable in the context of concurrent systems, because without mutation many race conditions are eliminated automatically.
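Rust’s iterators have a similar lazy flavor, and give a taste of the technique for readers who don’t write a lazy functional language: an infinite sequence is perfectly fine, so long as only finitely many of its elements are ever demanded. A trivial sketch:

fn main() {
    // A conceptually infinite sequence of squares. Nothing is computed
    // here; iterators are lazy.
    let squares = (1u64..).map(|n| n * n);

    // Values are only produced when they are actually demanded:
    let first_five: Vec<u64> = squares.take(5).collect();
    assert_eq!(first_five, vec![1, 4, 9, 16, 25]);
}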

For many years, though, functional language researchers struggled with the problem of how to write a whole program which actually performs IO and interacts with the system outside of itself. Early attempts were based on lazy streams, in which the program was modeled as a function from a stream of inputs to a stream of outputs. It was in this period that the famous Structure and Interpretation of Computer Programs (SICP) was written. Its chapter 3, “Modularity, Objects, and State,” constructs a system based on stateful objects and a system based on stateless streams to compare and contrast them. This text was very influential on my own development as a writer of computer programs.

Toward the end of the chapter, SICP highlights that even though the streams approach modeled the program as pure functions over stateless, infinitely unfolding streams, the problem of state emerges once again when the programmer tries to merge two streams together:

The trouble with this formulation is in the notion of merge. It will not do to merge the two streams by simply taking alternately one request from Peter and one request from Paul. Suppose Paul accesses the account only very rarely. We could hardly force Peter to wait for Paul to access the account before he could issue a second transaction. However such a merge is implemented, it must interleave the two transaction streams in some way that is constrained by “real time” as perceived by Peter and Paul, in the sense that, if Peter and Paul meet, they can agree that certain transactions were processed before the meeting, and other transactions were processed after the meeting. This is precisely the same constraint that we had to deal with in section 3.4.1, where we found the need to introduce explicit synchronization to ensure a “correct” order of events in concurrent processing of objects with state. Thus, in an attempt to support the functional style, the need to merge inputs from different agents reintroduces the same problems that the functional style was meant to eliminate.

We began this chapter with the goal of building computational models whose structure matches our perception of the real world we are trying to model. We can model the world as a collection of separate, time-bound, interacting objects with state, or we can model the world as a single, timeless, stateless unity. Each view has powerful advantages, but neither view alone is completely satisfactory. A grand unification has yet to emerge.

The authors of SICP were not alone in recognizing this limitation of the lazy stream abstraction. Most significantly, Simon Peyton Jones and Philip Wadler would publish an alternative mechanism for IO, the IO Monad. Unlike lazy streams, the monadic interface for IO was more effectively compositional, and it was adopted by the community around the primary pure functional programming language, Haskell, with great eagerness. Ultimately, the concept of using a monadic interface to represent “side effects” was applied in other ways as well, such as the use of the State monad to represent explicitly the sequencing of updates to mutable state.
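To give a flavor of the idea without writing Haskell, here is a minimal State-monad-like sketch in Rust (the names State, pure, and bind are mine, echoing their Haskell counterparts): a stateful computation is just a function from a state to a value paired with a new state, and bind sequences two computations by threading the state through explicitly.

// A minimal sketch of a State "monad" in Rust, for illustration only.
struct State<S, A> {
    run: Box<dyn FnOnce(S) -> (A, S)>,
}

impl<S: 'static, A: 'static> State<S, A> {
    // Wrap a plain value in a computation that leaves the state alone.
    fn pure(a: A) -> Self {
        State { run: Box::new(move |s| (a, s)) }
    }

    // Sequence two stateful computations, threading the state through;
    // this is what Haskell's >>= does.
    fn bind<B: 'static>(self, f: impl FnOnce(A) -> State<S, B> + 'static) -> State<S, B> {
        State {
            run: Box::new(move |s| {
                let (a, s2) = (self.run)(s);
                (f(a).run)(s2)
            }),
        }
    }
}

// Read the counter and "update" it, without any actual mutation.
fn get_and_increment() -> State<u32, u32> {
    State { run: Box::new(|s| (s, s + 1)) }
}

fn main() {
    let program = get_and_increment()
        .bind(|first| get_and_increment().bind(move |second| State::pure((first, second))));
    let ((first, second), final_state) = (program.run)(0);
    assert_eq!((first, second, final_state), (0, 1, 2));
}

The updates are sequenced by the structure of the program itself, which is exactly the property that makes the monadic interface compositional.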

In adopting monads, the pure functional programming community was finally able to construct practical systems in a language which prevented the problem of shared mutable state. One might imagine that this breakthrough would lead quickly to world domination by pure functional programming and monadic state and IO management. That is not what happened. There are several reasons we could posit as to why.

One reason would be that this really doesn’t matter; who really cares about aliased, mutable state? This would be the answer of my critics from earlier. To this I can only reply: are you satisfied with the quality of the software that you use on a daily basis? Do you never encounter odd bugs or poor performance that clearly - even to you as a user without knowledge of the system’s internals - arise from the problem of state management? If so, you have a very different experience from me.

Another and more credible answer is that, for better or worse, no one can understand what a monad is or what they’re supposed to do with one. To some extent this is the fault of the advocates of monads, who rather than trying to make their designs clearer to the uninitiated have fallen into a cult-like worship of the awesome power of category theory that convinces none but the true believer. And to some extent one might contend that despite the appealing elegance of the recursive, higher-order function, such code is actually harder for most people to understand than code that updates state. And all of this is without considering the performance cost of implementing a pure functional language on inherently imperative, stateful machines.

It was this last aspect (achieving high performance) that motivated Rust’s design, of course. The history here is somewhat roundabout. The early design of Rust had a different performance target than the version of Rust that was ultimately shipped as 1.0, but it always had a commitment to limiting the use of shared mutable state. I’ve heard from Graydon that Tony Hoare’s essay quoted at the beginning of this post was in his binder of papers that influenced the direction of Rust. The system that Rust ultimately developed - lifetimes, borrowing, ownership, etc. - was the result of Mozilla shifting Rust’s focus to act as a replacement for C++, the primary language used at Mozilla. It turns out that if you can prevent shared, mutable state at compile time, it becomes a lot easier to prevent data races, implement “RAII,” and avoid the use of garbage collection.

A quirk of this history is that Rust has adopted a system to type-check state while still being an impure language with plenty of untyped IO side effects. Another quirk of this history is that Rust has robust facilities for runtime-checked aliased mutable state, under names like RefCell and Mutex. These types integrate with Rust’s type system to make shared state viable while still ensuring exclusive access whenever the state is actually updated. But their existence means that Rust cannot prevent every bug in this area: in the case of RefCell, runtime panics might still occur, and in the case of Mutex there is still the possibility of deadlock.
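A small sketch of what the runtime checking looks like with RefCell (variable names mine, for illustration):

use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);

    // Aliasing is allowed: any number of shared handles to the state...
    let alias_a = &cell;
    let alias_b = &cell;

    // ...and so is mutation, checked at runtime rather than at compile time.
    alias_a.borrow_mut().push(4);

    // But aliased mutation is still rejected: while a shared borrow is
    // live, a conflicting mutable borrow fails.
    let reader = alias_b.borrow();
    assert!(cell.try_borrow_mut().is_err());
    // calling cell.borrow_mut() here instead would panic at runtime
    drop(reader);
}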

It’s worth pausing here to reflect on one of the most unfortunate episodes in Rust’s history, the so-called leakpocalypse. Prior to 1.0, Rust’s standard library had an interesting function, which spawned a second thread that was allowed to take references to values in the stack of the thread which spawned it:

fn scoped<'a, T>(f: impl FnOnce() -> T + Send + 'a) -> JoinGuard<'a, T>

This API worked by blocking in the destructor of JoinGuard to ensure that this thread never accessed values being borrowed by the other thread in a way that would violate Rust’s guarantees about mutable and aliased state. The problem was that Rust’s type system does not guarantee that JoinGuard’s destructor will be run, so it was completely possible to “leak” the JoinGuard and never run its destructor, allowing the user to violate Rust’s guarantees. To make matters worse, this hole in the soundness of the API was discovered about a month before the planned release of the 1.0 version of Rust.
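If it isn’t obvious that destructors can simply be skipped, here is a tiny demonstration in entirely safe Rust (Noisy is a hypothetical type, for illustration):

use std::mem;

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("destructor ran");
    }
}

fn main() {
    let guard = Noisy;
    mem::forget(guard); // safe code; the destructor never runs
    // nothing is printed
}

Anything that relies on a destructor running for soundness, as JoinGuard did, is therefore unsound.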

The team was presented with two paths forward: one was to declare this API unsound, and say that indeed it was not possible to have a type which could not be leaked in Rust. The other was to add a new marker trait for types which cannot be safely leaked, similar to the marker traits Send (types which can be sent to other threads) and Sync (types which can be shared between threads). The team ultimately went with the option that must have seemed less invasive and easier to implement under pressure (removing this API). What helped motivate this decision was the fact that there was a known alternative API for scoped threads based on closures, which became the API that is now available in std.
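That closure-based design survives today as std::thread::scope, which is used roughly like this (the data here is made up for illustration):

use std::thread;

fn main() {
    let mut counts = vec![0u32; 3];
    let label = String::from("worker");

    // Every thread spawned on `s` is joined before `scope` returns, so
    // the threads may borrow from the parent's stack without a 'static bound.
    thread::scope(|s| {
        for (i, slot) in counts.iter_mut().enumerate() {
            let label = &label; // shared borrow, aliased across threads
            s.spawn(move || {
                *slot += i as u32; // exclusive borrow of a single element
                println!("{label} thread {i} done");
            });
        }
    });

    assert_eq!(counts, vec![0, 1, 2]);
}

Because the scope itself guarantees the join, no destructor needs to run for the API to be sound.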

Unfortunately, the trick used to make scoped threads sound did not also work for scoped async tasks, a flaw which has led to much consternation. On this basis it seems clear now that the decision made under the pressure of resolving “leakpocalypse” was the wrong one. But no one knew that at the time. I want to highlight, in a historical sense, what seems to me now like a very misguided aspect of the decision-making process. There was a sense in which scoped threads were framed as a “neat trick,” whereas dynamic locking mechanisms like Mutex were perceived as the primary way to share state between threads. I think this thinking was wrong: scoped threads were the most interesting part of Rust’s type system, and scoped async tasks would be even more interesting. With them, the user could effectively eliminate dynamic locks (and with them the potential for deadlock). This is exactly what is now called structured concurrency, and with Rust’s type system it would be a powerful way to implement systems. It seems imperative that Rust fix this mistake to unlock (as it were) its full potential as a systems language.

This brings me to my final point. Though Rust’s rules around lifetimes are usually framed in the public consciousness as “a way to avoid garbage collection,” the reality is that they are a much deeper and more significant construct than that. They are a way to make programming tractable in a language which allows both mutable state and aliased state, by guaranteeing that state is not aliased while it is being mutated. This is an incredibly powerful tool because you can analyze the behavior of your system locally: you never need to worry about “spooky action at a distance.” And you can do so without manually reconstructing imperative control flow with monads. As I’ve written several times before: monads are a clever way to show you can program without mutation; lifetimes are an even cleverer way to show you can just use mutation.
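To make the rule concrete, here is a minimal sketch (variable names mine):

fn main() {
    let mut total = 0;

    let shared_a = &total; // aliased: any number of shared references...
    let shared_b = &total;
    println!("{shared_a} {shared_b}");

    let exclusive = &mut total; // ...or one exclusive reference, never both
    *exclusive += 1;

    // println!("{shared_a}");
    // ^ ERROR: cannot borrow `total` as mutable because it is also
    //   borrowed as immutable
}

The aliased borrows end before the mutation begins, so this compiles; uncommenting the last line would extend the aliased borrows across the mutation, and the compiler would reject the program.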

Unfortunately, most people seem to have taken the wrong lesson from Rust. They see all of this business with lifetimes and ownership as a dirty mess that Rust has had to adopt because it wanted to avoid garbage collection. But this is completely backwards! Rust adopted rules around shared mutable state and this enabled it to avoid garbage collection. These rules are a good idea regardless.

The most promising alternative to Rust’s techniques that has been widely deployed is the concept of “value semantics.” This has been adopted by Swift and Mojo, for example, both developed by Chris Lattner. In this model, any time you mutate a value that is aliased, a copy of the value is made so that the aliased value is not changed by your mutation. This has a certain performance overhead, though it can be reduced by the clever design of your data types. Lattner’s languages have both adopted some kind of function-argument modifier to try to reduce the overhead further in specifically optimized code. On the basis of questionable benchmarks, Lattner’s current company, which promotes Mojo (and has raised an eye-watering 130 million dollars), has released articles claiming that Mojo is both easier and faster than Rust, claims that I must say I profoundly doubt. That’s not to say there aren’t features of Mojo I’d love to see Rust steal (like its better integration of data-parallel programming techniques such as SIMD and GPU programming).
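Rust can express a rough analogue of this copy-on-write model in library code with Rc::make_mut; this is only an illustration of the idea, not how Swift or Mojo implement it:

use std::rc::Rc;

fn main() {
    let mut account = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&account); // a second handle to the same value

    // make_mut sees that the value is aliased and clones it before
    // mutating, so the mutation is never observed through the alias.
    Rc::make_mut(&mut account).push(4);

    assert_eq!(*account, vec![1, 2, 3, 4]);
    assert_eq!(*alias, vec![1, 2, 3]);
}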

But there is a deeper problem with value semantics than just the performance overhead. Suppose I call a method on an object which mutates it while I also hold a reference to it: performing a deep copy of that object may be the right thing, or it may not. Maybe I actually do want shared mutable state, and I want to rearchitect my code to use cells or locks to ensure that it is safe. Or maybe this reflects a flaw in my conception of the algorithm, and the compiler informing me of the conflict in my thinking will send me back to the drawing board. In other words, I do not want the compiler to just insert code to uphold the bare minimum guarantees; I want the compiler to check my work for me and assist me in developing an algorithm I can confidently assert is right.

To give a clear example, the classic case of shared mutable state causing a bug is the problem of iterator invalidation. Suppose I am iterating over a dynamically sized array of objects, and I want to push a new object into that array. In a language without memory safety, this could cause the array to be reallocated, making the continued iteration of the array a use-after-free. In a language with value semantics, the push will copy the array and create a new one, essentially erasing my insertion. That’s better than undefined behavior, but it’s almost certainly not my intention. Maybe I am trying to use the array as a ring buffer, in which case I should use a type which has those semantics. Maybe I meant to push into a different variable. Maybe my thinking is just muddied.
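Rust takes the third path: it surfaces the conflict at compile time and sends me back to the drawing board. A minimal sketch:

fn main() {
    let mut items = vec![1, 2, 3];

    for item in &items {
        if *item == 2 {
            // items.push(4);
            // ^ ERROR: cannot borrow `items` as mutable because it is
            //   also borrowed as immutable (by the iterator)
        }
    }
}

The program is rejected rather than silently copied around, and I am forced to decide what I actually meant.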

I’m not saying that Rust’s solution is perfect. Indeed, lifetime syntax is arcane and confusing to users. But I think any solution that claims to beat Rust should provide the same level of guarantee, or indeed a better one: ideally, new languages would allow for imperative programming without any shared mutable state at all. Along these lines, a language that I find very interesting is Hylo, which also guarantees the avoidance of shared mutable state, but without allowing references to be first-class values. There are pros and cons to this approach; I think for any chance of success it will need some sort of effect or coroutine system to represent what in Rust are represented as lifetime-parameterized types. Whether in Hylo or in another language, I eagerly await the development of additional techniques to provide this guarantee, so that, contra Tony Hoare, the introduction of references, like jumps, shall be a step backward from which we have well and truly recovered.