On this page:
20.1 What is Memory Management?
20.2 Mark and Sweep
20.3 A Taste of Static Garbage Collection
20.3.1 Move Semantics
20.3.2 Borrowing
8.15

20 Memory Management🔗

We continue on to a very important component of many programming languages: garbage collection. First we will define what garbage is in the context of our CESK machine. Then, we will explore the simple mark-and-sweep algorithm for finding and reclaiming garbage from the CESK machine. Finally, we will discuss the core concepts of more sophisticated garbage collection strategies found in realistic languages and discuss their design tradeoffs.

20.1 What is Memory Management?🔗

So far we have not explored languages that free memory. Obviously this will not do: computers do not have infinite RAM, so we need a way to reclaim allocated memory to use for different things later on.

Allocated memory that is no longer needed by the program is called garbage. There are three general strategies that programming languages deploy to manage garbage:

First we will discuss automated garbage collection, then touch on static garbage collection.

20.2 Mark and Sweep🔗

The core idea of automated garbage collection (GC) is to inspect the state of the program to dynamically locate which portions of memory are or are not garbage. For us, the state of the program is nicely represented by the state of the CESK machine. So, we can use the SafeRefLang to explore garbage collection:

(define-type SafeRefLang
  (letE [id : Symbol]
        [e1 : SafeRefLang]
        [e2 : SafeRefLang])
  (identE [id : Symbol])
  (boxE [e : SafeRefLang])
  (addE [e1 : SafeRefLang])
  (unboxE [e : SafeRefLang])
  (numE [n : Number])
  (locE [l : Number]))
 
(define-type Frame
  ;;; frame for (let x e1 e2) when e1 is not a value
  (let1F [E : Env] [id : Symbol] [e : RefLang])
  (boxF [E : Env])
  (unboxF [E : Env])
  ;;; frame for (addE e1 e2) when e1 is not a number
  (add1E [E : Env] [e2 : RefLang])
  ;;; frame for (addE n e2) when n is a number
  (add2E [E : Env] [n : Number]))
 
(define-type State
  (state [C : RefLang]
         [E : Env]
         [S : (Vectorof Number)]
         [K : (Listof Frame)]))

Garbage collectors are usually run in a fully-automated fashion: at heuristically-chosen points during the program’s evaluation they will trigger and automatically run. The simplest kind of garbage collection strategy is called mark and sweep. The idea is relatively simple and has two stages:

Implementing this also requires a few other small changes, which you can explore in the implementation here.

20.3 A Taste of Static Garbage Collection🔗

There is an alternative approach to memory management used in Rust that relies (essentially) on scope to determine when it is safe to collect memory. This may seem intuitive at first, but it is surprisingly tricky to get right!

Let’s see some small examples of Rust programs to get a feel for how it deals with memory management. Here is a tiny Rust program that implements one of our favorite examples, the calculator language:

// this #[derive(Debug)] mumbu-jumbo makes it so that a data structure can be

// easily printed in Rust. Ignore it for now.

#[derive(Debug)]

enum Calc {

  Num(i64),

  Add(Box<Calc>, Box<Calc>)

}

 

fn interp(c : Calc) -> i64 {

   match c {

      Calc::Num(n) => n,

      Calc::Add(l, r) => {

        let l_v = interp(*l);   // Rust uses the `*` operator to unbox / dereference a dynamically allocated value

        let r_v = interp(*r);

        return l_v + r_v

      }

   }

}

 

fn main() {

     let my_calc = Calc::Num(10);

     // we use the {:?} to trigger a debug print

     println!("simple interp: {:?}", interp(my_calc));

 

     let my_calc_2 = Calc::Add(Box::new(Calc::Num(10)), Box::new(Calc::Num(20)));

     println!("simple interp 2: {:?}", interp(my_calc_2));

}

This example shows that Rust has many of the features we’re familiar with in Plait, albeit with syntax that looks quite similar to C, C++, or Java. As usual we define a definitional interpreter by casing on the abstract syntax tree.

The above program compiles and runs with no problem. But, here’s a surprise: what if we modify the main function to call interp twice?

fn main() {

           let my_calc = Calc::Num(10);

           // we use the {:?} to trigger a debug print

           println!("simple interp: {:?}", interp(my_calc));

 

           // let's try printing the same thing again!

           println!("simple interp: {:?}", interp(my_calc));

 

           let my_calc_2 = Calc::Add(Box::new(Calc::Num(10)), Box::new(Calc::Num(20)));

           println!("simple interp 2: {:?}", interp(my_calc_2));

           }

 

Quite surprisingly, this raises a compile error!

   Compiling playground v0.0.1 (/playground)

error[E0382]: use of moved value: `my_calc`

  --> src/main.rs:26:44

   |

21 |     let my_calc = Calc::Num(10);

   |         ------- move occurs because `my_calc` has type `Calc`, which does not implement the `Copy` trait

22 |     // we use the {:?} to trigger a debug print

23 |     println!("simple interp: {:?}", interp(my_calc));

   |                                            ------- value moved here

...

26 |     println!("simple interp: {:?}", interp(my_calc));

   |                                            ^^^^^^^ value used here after move

   |

note: consider changing this parameter type in function `interp` to borrow instead if owning the value isn't necessary

  --> src/main.rs:9:15

   |

9  | fn interp(c : Calc) -> i64 {

   |    ------     ^^^^ this parameter takes ownership of the value

   |    |

   |    in this function

Rust provides a very interesting and informative error message here: it suggests that we "change the parameter in interp to borrow if owning the value isn’t necessary". What on earth does that mean? The answer has to do with how Rust performs automatic safe memory management without garbage collection.

How does Rust know when to collect heap-allocated values like those created by Box::new? There are three rules:

Notice that when a value is dropped in rust is a lexical property of its owner: this means it can be determined by the compiler.

Let’s see what this means by inspecting a smaller example. Consider the following snippet:

fn main() {

    let x = Box::new(10);

    println!("my boxed value: {}", *x); // prints 10

    // x is freed at the end of the function because it goes out of scope

}

In the above example x is the owner of the store-allocated boxed value 10.

20.3.1 Move Semantics🔗

Suppose we want to refer to the same data with more than 1 name. We need to do this all the time, for example when calling functions. What are our options?

Rust allows us to decide which of the above two strategies to use. Passing the responsibility to collect is called moving: you move the owner from one identifier to another. Once you move a value, you can no longer use it with the old name. This makes sense; the allocated memory may have been collected. See how the compiler raises an error for the following code, and think about where memory is allocated and freed:

fn my_func(y : Box<i64>) -> () {

    println!("gobble! {}", *y);

    // at this point y is freed

}

 

fn main() {

    let x = Box::new(10);

    // pass ownership of x to y in my_func

    my_func(x);

    println!("my boxed value: {}", *x);  // we tried to access x, but it was freed by `my_func`! This is a memory error.

}

If we try to run this program, Rust will complain and fail to compile with a very helpful message. Why did Rust complain? Let’s break down what happened:

Suppose we want to keep x around after calling my_func. There are two things we can do: (1) Make a copy of the heap-allocated value and pass that as an argument to my_func, or (2) Retain ownership of the location. The first approach is easy:

fn my_func(y : Box<i64>) -> () {

    println!("gobble! {}", *y);

    // at this point y is freed

}

 

fn main() {

    let x = Box::new(10);

    let x2 = Box::new(*x); // create a copy of the value

    // pass ownership of x to y in my_func

    my_func(x2);

    println!("my boxed value: {}", *x);

    // at this point x is freed

}

Implementing option (2) requires a new language feature that is unique to Rust called borrowing.

20.3.2 Borrowing🔗

Borrowing lets you preserve the original owner while having access to the value that it owns. How does it work? We need to change the signature for my_func and write our code slightly differently:

fn my_func(y : &i64) -> () {

    println!("gobble! {}", *y);

    // at this point y is *not freed* since `y` does not own the location

}

 

fn main() {

    let x = Box::new(10);

    // this function call **does not** transfer ownership: x still owns the location allocated by Box::new(10)

    my_func(&x);

    println!("my boxed value: {}", *x);

    // at this point x is freed

}

The type signature &i64 is a reference to an i64; think of it like a pointer. It can be dereferenced. However, passing a reference does not transfer ownership: this means you can create multiple references to the same value How do we know that the above code is safe? We know that the borrowed context does not outlive the owner’s scope. The Rust compiler performs fairly sophisticated analysis to enforce this liveness requirement

What is an example where a reference does not live long enough? Here is one, which if you’ve programmed in C or C++ will look familiar:

fn broken(x: int) -> &i64 {

    return &x;

}

This is an example of returning a reference to a local variable. Since x has type int, it is owned by this function call, meaning that it cannot be referred to outside of the scope of this function. The rust compiler will reject this program with the following error message:

this function's return type contains a borrowed value, but there is no value

for it to be borrowed from

So, you can see how static garbage collection requires a fair bit more effort than automated garbage collection, but is not error-prone like manual memory management is: the compiler will ensure that memory unsafe operations are prevented (at least in the portions of rust that do not use unsafe code; google that if you’re interested!).