
Memory Consistency Models

Modern CPUs almost exclusively consist of several cores, each exposing one or more hardware threads (harts, in RISC-V parlance), with a complex hierarchy of caches between them and the system’s memory.

This doesn’t seem all that surprising, but the reality is that reasoning about these systems is very hard. I like to view it as a battle between CPU designers1, programmers, and compilers, given that each of these groups has conflicting interests: hardware wants the freedom to buffer and reorder memory operations for performance, compilers want the freedom to optimize, and programmers want behavior they can easily reason about.

Quoting [1]:

A memory consistency model, or, more simply, a memory model, is a specification of the allowed behavior of multithreaded programs executing with shared memory. For a multithreaded program executing with specific input data, it specifies what values dynamic loads may return. Unlike a single-threaded execution, multiple correct behaviors are usually allowed.
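
To make that concrete, here is a minimal sketch of my own (not taken from [1]) of the classic “store buffering” pattern using C11 relaxed atomics. Every combination of (r0, r1) is an allowed outcome, including r0 == 0 && r1 == 0; in practice you may have to run it in a loop many times before the weaker outcomes show up.

```c
/* Store buffering (SB) sketch: each thread stores to one variable and then
 * loads the other. With memory_order_relaxed, the memory model allows all
 * four (r0, r1) outcomes, including r0 == 0 && r1 == 0. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int x, y;      /* shared variables, zero-initialized */
static int r0, r1;           /* per-thread results, read after join */

static void *t0(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *t1(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, t0, NULL);
    pthread_create(&b, NULL, t1, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("r0=%d r1=%d\n", r0, r1);   /* which pairs you see is up to the model */
    return 0;
}
```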

There has been a lot of progress in the study of memory models in the last couple of decades. Most notably, the formalization of consistency models and the use of tools to reason about them have gained a lot of traction. Early work [2] already demonstrated absolute mastery of the subject, but it seems to me2 that the triumph of x86 and its very strict consistency model discouraged work in the field for a while.

Types of models

Memory models can be classified on a spectrum from weak to strict. The dominant computing platform (x86) features a very strict memory model, which is probably why the field isn’t so popular. Alpha3 and Power are examples of weak models that are not intuitive at all.

ARMv7 featured a relaxed model, but the architecture has deliberately become stricter (ARMv8 is multicopy atomic [8]) after careful consideration by its architects. In my opinion, the people at ARM have pushed the field forward by an unquantifiable amount.
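
To get a feel for what “stricter” buys you, consider the classic message-passing pattern below (a minimal sketch of my own, with made-up names and values). On a strict model such as x86 the hardware already preserves this ordering for ordinary loads and stores (the compiler is still free to reorder, which is why the atomics matter), while on weaker models such as ARMv7 or Power the release/acquire pair is compiled into explicit ordering instructions.

```c
/* Message passing (MP) sketch: a producer publishes data behind a flag. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int data;                /* plain, non-atomic payload */
static atomic_int ready;        /* synchronization flag */

static void *producer(void *arg) {
    (void)arg;
    data = 42;
    /* release: everything written before this store is visible to a thread
     * that reads the flag with acquire and observes 1 */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    /* acquire: once we observe ready == 1, we are guaranteed to see data == 42 */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    printf("data=%d\n", data);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```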

Academic research

It is my impression that ARM’s popularity is what revived the field, and absolutely impressive work has happened in academia since. Luc Maranget from INRIA is a very important author, and Jade Alglave is another important figure; Jade’s thesis [3] is a very interesting piece of work.

Hackers gonna hack

In parallel with academic research, a lot of effort happened on the “hacker” side. Most notably, the Linux kernel people developed their own in-house memory model (the LKMM), along with tools and documentation for reasoning about concurrent code. It’s nothing short of impressive.

Paul McKenney (who hopefully doesn’t need much of an introduction) is perhaps the most important figure from the “non-academic” standpoint4. He is a very important Linux contributor, and he and his great collaborators have produced a magnificent book [4]. The Linux kernel sources also contain a lot of documentation worth reading.

Will Deacon is another relevant actor; his talk Uh-oh, It’s I/O Ordering! is worth watching.

JF Bastien has also spent some of his time working on related topics (more on the compiler and language side). He isn’t strictly speaking a “memory model” person, but he is definitely worth checking out: he’s an incredible speaker and a very smart person. His CppCon 2016 talk “No Sane Compiler Would Optimize Atomics” is worth watching.

Tools

Tools are absolutely fundamental for working with memory models. Memory models were formalized for a reason (they used to be specified with a handful of examples and textual descriptions), so let’s take advantage of that.

herdtools is perhaps the most important tool suite. You can play with herd7 online: ARM hosts an instance, and so does INRIA. These tools are not that straightforward to use, but they are highly effective once you are familiar with them, and they produce great diagrams that help you reason about code. RISC-V’s unprivileged specification is a great example of this (check Appendix A: RVWMO Explanatory Material). At some point I plan on writing about how I first used them for an assignment.
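
As a taste of what these tools consume, the store-buffering pattern from earlier might be written as a litmus test for herd7’s C frontend roughly as follows (this is a sketch; the exact invocation and the model/cat file you pass depend on your installation, or you can simply paste it into one of the online instances):

```
C SB
(* Store buffering: can both threads read 0? *)
{}

P0(atomic_int *x, atomic_int *y) {
  int r0;
  atomic_store_explicit(x, 1, memory_order_relaxed);
  r0 = atomic_load_explicit(y, memory_order_relaxed);
}

P1(atomic_int *x, atomic_int *y) {
  int r1;
  atomic_store_explicit(y, 1, memory_order_relaxed);
  r1 = atomic_load_explicit(x, memory_order_relaxed);
}

exists (0:r0=0 /\ 1:r1=0)
```

herd7 then enumerates the executions allowed by the chosen model, reports whether the exists condition is reachable, and can draw the diagrams mentioned above.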

Another very important set of tools are sanitizers. Their use is widespread, and they are available in both GCC and Clang. There are sanitizers specifically designed to identify multithreading issues (ThreadSanitizer, TSAN), and similar tools exist for kernel development (KCSAN).
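
As a rough sketch of how you would use TSAN (the file name race.c is made up), the deliberately racy program below, built with -fsanitize=thread in either GCC or Clang (for example gcc -g -fsanitize=thread -pthread race.c && ./a.out), should make ThreadSanitizer print a data-race report at runtime.

```c
/* Deliberately racy: two threads increment a plain, non-atomic counter.
 * Compiled with -fsanitize=thread, ThreadSanitizer should flag the race. */
#include <pthread.h>
#include <stdio.h>

static int counter;              /* shared, unsynchronized */

static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        counter++;               /* racy read-modify-write */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, work, NULL);
    pthread_create(&b, NULL, work, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter=%d\n", counter);   /* often less than 200000 */
    return 0;
}
```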

Resources

[1] V. Nagarajan, D. J. Sorin, M. D. Hill, D. A. Wood, and N. E. Jerger, A Primer on Memory Consistency and Cache Coherence, 2nd ed. Morgan & Claypool Publishers, 2020

[2] K. Gharachorloo, “Memory Consistency Models for Shared-Memory Multiprocessors,” Stanford University, Stanford, CA, USA, 1995.

[3] J. Alglave, “A Shared Memory Poetics,” 2010. Available: http://www0.cs.ucl.ac.uk/staff/j.alglave/these.pdf

[4] P. E. McKenney, “Is Parallel Programming Hard, And, If So, What Can You Do About It?” Available: https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

[5] Who’s afraid of a big bad optimizing compiler? https://lwn.net/Articles/793253/

[6] Daniel Lustig, Sameer Sahasrabuddhe, and Olivier Giroux. 2019. A Formal Analysis of the NVIDIA PTX Memory Consistency Model. https://doi.org/10.1145/3297858.3304043

[7] Daniel Lustig, RISC-V memory model presentation. Available: https://youtu.be/QkbWgCSAEoo

[8] “Simplifying ARM concurrency: multicopy-atomic axiomatic and operational models for ARMv8” https://www.cl.cam.ac.uk/~pes20/armv8-mca/

If you are reading this, perhaps you will enjoy my post on how the memory model affects the instruction decoder of a CPU core. Note that it assumes familiarity with the topic and goes straight to the point.


  1. We could differentiate between core designers and memory hierarchy designers (coherence protocols, etc.). ↩︎

  2. I might be wrong. This is only my impression looking back at the literature. ↩︎

  3. Alpha’s memory model was so interesting that it is still being discussed despite the architecture being dead. ↩︎

  4. Note that Paul has collaborated a lot with academics but he comes more from a “hands-on” approach. ↩︎