Lecture 8: State

Goals for today:

  • Build a language for handling state and references

Logistics:

  • Homework due Friday, quiz tomorrow morning
  • Recall the rules for the quiz: please do not ask quiz questions during office hours; ask them only as private Piazza questions. Work on the quiz by yourself.

State and mutation

  • The λ-calculus has a very particular property: if you call a function with the same arguments, it will always return the same value. We say that functions in the λ-calculus are pure (this property is also sometimes called referential transparency).
  • Purity is a very nice property: it means that a function’s behavior is determined solely by its arguments, which makes it very easy to reason locally about the behavior of your code
  • This is not how a lot of programming languages work! Consider the following Python code:
x = 10
def inc(y):
  global x
  x = x + 1
  return x + y
  • What is the semantics of inc? Clearly it is not pure! We can call inc many times with the same argument and get different results every time:
>>> inc(0)
11
>>> inc(0)
12
>>> inc(0)
13
  • The semantics of inc involves mutation: the value of the global variable x changes on every call to the function
  • How do we “run this program by hand”? Intuitively, there must exist some global heap that keeps track of the current values that variables have. The value contained in the heap cell for x is mutated whenever x is incremented
# Initially heap has x = 10
>>> inc(0)    
11
# Now heap has x = 11
>>> inc(0)  
12
# Now heap has x = 12
>>> inc(0)
13
# Now heap has x = 13
  • Python has very complicated rules about state and mutation. Consider the following program:
def addone(x):
    x = x + 1
    return x
  • What is the semantics of addone(10)? Let’s see:
>>> y = 10
>>> addone(y)
11
>>> addone(y)
11
>>> addone(y)
11
  • Somewhat surprisingly, the value of y never changes. In Python, integers behave as if they were passed to functions by value: integers are immutable, and the assignment x = x + 1 only rebinds the local name x inside addone, so the function cannot mutate the y in the outer scope of the function call. So, the function addone is pure.
  • Mutable datatypes behave differently. In particular, lists behave as if they were passed by reference: the function receives a reference to the caller's list object and can mutate it in place:
def app1(l):
  l.append(0)
  return l
  • Now we can call this function app1 and observe its behavior:
>>> l = []
>>> app1(l)
[0]
>>> app1(l)
[0, 0]
>>> app1(l)
[0, 0, 0]
  • The function app1 is impure: each time it is called with the same argument (the identifier l) it returns a different value.
  • You might ask: what happens if you want to pass an integer by reference in Python? In this situation, the most highly upvoted answer on Stack Overflow is to put the integer in a list. I hope we can all agree that this is not very elegant.
  • Why do we care about mutation? What does it buy us?
    • Runtime efficiency: many algorithms rely on mutation
    • Code clarity: some programs are easier to express using mutation than functionally
    • Hardware compatibility: the underlying assembly language that ultimately runs our code has finite memory and relies on mutation
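
The list trick mentioned above can be sketched as follows. This is only a minimal illustration, not a recommended idiom; the names add_one_in_place and cell are made up for this example and do not come from the lecture.

```python
def addone(x):
    # Rebinds the local name x only; the caller's variable is untouched.
    x = x + 1
    return x

def add_one_in_place(cell):
    # cell is a one-element list used as a mutable container for an integer
    # (hypothetical helper for this example).
    cell[0] = cell[0] + 1
    return cell[0]

y = 10
addone(y)
# y is still 10 here: addone could not change it

cell = [10]
add_one_in_place(cell)
add_one_in_place(cell)
# cell[0] is now 12: the mutation is visible to the caller
```

The one-element list plays the role of a heap cell: the caller and the callee share the list, so an update made by one is seen by the other.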

State and mutation in Plait

  • So far we have been using only the pure subset of Plait (except for your homework exercise), but Plait has elegant built-in features for handling state and mutation
  • The expression (box e) creates a boxed value on the heap and returns a location (or pointer) that points to that value:
> (box 10)
- (Boxof Number)
'#&10
  • We can’t do the usual things we do with numbers with boxed numbers:
> (+ (box 10) 1)
typecheck failed: Number vs. (Boxof Number) in:
  +
  box
  (box 10)
  • So, how do we get the value out of a box? We use unbox (which can be thought of as dereferencing the pointer):
> (unbox (box 10))
- Number
10
  • So far so good, but what about mutation? That’s the interesting part!
  • We can use set-box! to change the value in a box:
> (define my-counter (box 0))
> (set-box! my-counter (+ (unbox my-counter) 1))
- Void
> my-counter
- (Boxof Number)
'#&1
  • Now, we can implement a pass-by-reference increment function in Plait (where we are using the begin construct to sequence expressions):
> (define (add-one-by-ref x)
    (begin 
     (set-box! x (+ (unbox x) 1))
     (unbox x)))
  • The function add-one-by-ref is impure: I can pass it the same argument (the identifier my-counter, which always holds the same location) many times and get a different result each time
> (define my-counter (box 0))
> (add-one-by-ref my-counter)
- Number
1
> (add-one-by-ref my-counter)
- Number
2
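
For readers more familiar with Python, the three box operations can be approximated with a small class. This is a rough analogy under the assumption that a box is just a single mutable cell; the Box class and the function names below are invented for this sketch and are not part of Plait or of the lecture code.

```python
class Box:
    """A single mutable cell, loosely modeled on Plait's box (illustrative only)."""
    def __init__(self, value):
        self.value = value

def unbox(b):
    # Read the current contents of the cell.
    return b.value

def set_box(b, value):
    # Overwrite the contents of the cell; returns nothing, like set-box!.
    b.value = value

def add_one_by_ref(b):
    # Mirrors the Plait function: mutate first, then read back the new value.
    set_box(b, unbox(b) + 1)
    return unbox(b)

my_counter = Box(0)
first = add_one_by_ref(my_counter)
second = add_one_by_ref(my_counter)
```

As in the Plait session, the two calls receive the same argument (the same box) but return different results, because the contents of the box changed in between.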

statelang: A tiny language for state

  • Continuing with the theme of this module, our goal now is to implement a tiny language with state in order to better understand how state works and how it is implemented, and to explore the design space

  • We will use the following core syntax for statelang:

<e> ::= (let1 (x <e>) <e>)
       | <num>
       | <id>
       | (box <e>)
       | (unbox <e>)
       | (set! <e> <e>)
  • Written as Plait syntax:
(define-type Exp
  (let1E (id : Symbol) (assignment : Exp) (body : Exp))
  (numE (n : Number))
  (unboxE (e : Exp))
  (varE (s : Symbol))
  (boxE (e : Exp))
  (setE (b : Exp) (v : Exp)))
  • Continuing with the usual process, now we want to give a semantics to statelang. To do that, let’s run some programs by hand and see how we want them to behave.
  • Let’s start with the simplest case, (box 10). How should we run this? It’s already tricky!
  • In English, its semantics should be: “allocate a new heap cell that contains the value 10, and evaluate to the location that points to that heap cell”
  • So, we will need to add a notion of a heap and locations to our semantics in order to run this by hand.
  • The heap will be a map from locations to values; the empty heap is denoted []
  • To run an expression, the evaluator must take a heap as input and produce a (possibly updated) heap as output alongside the resulting value
  • It looks like this:
(box 10), [] --> 0x0, [0x0 |-> 10]
  • The address 0x0 is a heap location (we write them using hexadecimal notation to distinguish them from numeric values); to allocate a new value on the heap, we need to get a fresh heap location to put that value into.
  • Then, (unbox l) looks up the value stored in the heap at location l (and fails if that location is not found in the heap):
(unbox l), [l |-> v] --> v, [l |-> v]
  • Here is a more interesting example where we allocate multiple references. In this case, we will need to get a fresh location each time:
(let1 (x (box 1))
  (let1 (y (box 2))
    (unbox y))), []
--> (let1 (y (box 2)) (unbox y)), [0x0 |-> 1]
--> (unbox 0x1), [0x0 |-> 1, 0x1 |-> 2]
--> 2, [0x0 |-> 1, 0x1 |-> 2]
  • Finally, the semantics of set! l v should simply assign v to the heap-cell at location l (and fail if the location is not in the heap):
(let1 (x (box 10))
  (set! x 20)), []
--> (set! 0x0 20), [0x0 |-> 10]
--> 0, [0x0 |-> 20]
  • Here there is an interesting design decision: we arbitrarily decided to make set! evaluate to the numeric value 0. This is just for simplicity; we could introduce another value to denote the result of evaluating set!

A statelang interpreter

  • Now we are ready to implement an interpreter for statelang
  • Our value type should contain both numeric values and locations, which will both be represented as numbers:
(define-type Value
  [numV (n : Number)]
  [locV (l : Number)])

The heap datatype

  • To represent the heap, we will use a datatype that keeps track of (1) what the next fresh address is, and (2) a map that maps addresses to values

(define-type-alias Heap (Number * (Hashof Number Value)))
(define mt-heap (pair 0 (hash '())))

(define (lookup-heap (l : Number) (h : Heap))
  (type-case (Optionof Value) (hash-ref (snd h) l)
             [(none) (error 'runtime "invalid location")]
             [(some v) v]))

(test (lookup-heap 0 (pair 1 (hash (list (pair 0 (numV 10)))))) (numV 10))
(test (lookup-heap 0 (pair 1 (hash (list (pair 0 (locV 10)))))) (locV 10))
(test/exn (lookup-heap 23 (pair 1 (hash (list (pair 0 (locV 10))))))
  "invalid location")

; (extend-heap h v) returns a pair (new-address, new-heap) where 
; new-heap is equal to h with [new-address |-> v]
(extend-heap : (Heap Value -> (Number * Heap)))
(define (extend-heap h value)
  (letrec [(cur-loc (fst h))
           (cur-heap (snd h))
           (new-heap (hash-set cur-heap cur-loc value))]
    (pair cur-loc (pair (+ 1 cur-loc) new-heap))))

(test (extend-heap (pair 1 (hash (list (pair 0 (numV 10))))) (locV 20))
  (pair 1 (pair 2 (hash (list (pair 0 (numV 10)) (pair 1 (locV 20)))))))
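
The interpreter below also calls a set-heap operation that is not shown in these notes. As a hedged sketch of how the three heap operations fit together, here is a Python model of the same (next address, map) representation. The argument order of set_heap follows the call in the interpreter, and its failure on unallocated locations follows the semantics described earlier; the rest is an assumption made for illustration rather than the course's implementation.

```python
def empty_heap():
    # A heap is a pair: (next fresh location, mapping from location to value).
    return (0, {})

def extend_heap(heap, value):
    """Return (new_location, new_heap); the input heap is left unchanged."""
    next_loc, cells = heap
    new_cells = dict(cells)
    new_cells[next_loc] = value
    return next_loc, (next_loc + 1, new_cells)

def lookup_heap(loc, heap):
    """Return the value stored at loc, or fail if loc was never allocated."""
    _, cells = heap
    if loc not in cells:
        raise RuntimeError("invalid location")
    return cells[loc]

def set_heap(heap, loc, value):
    """Return a heap in which loc maps to value; fail if loc was never allocated."""
    next_loc, cells = heap
    if loc not in cells:
        raise RuntimeError("invalid location")
    new_cells = dict(cells)
    new_cells[loc] = value
    return (next_loc, new_cells)

# Mirrors the hand-run trace: box 10, then overwrite the cell with 20.
loc0, h1 = extend_heap(empty_heap(), 10)
h2 = set_heap(h1, loc0, 20)
```

Each operation copies the mapping instead of mutating it, which matches the functional style of hash-set in the Plait code: older heaps such as h1 remain valid after h2 is built.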

The statelang interpreter

  • Now we are ready to make our interpreter, which we will go through together in class:
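; the store will store local variables (e.g. ones in let-binds)
; the heap will store boxed variables
; Store, extend-store, lookup-store, set-heap, and mt-env (the empty store
; used in the tests below) are not shown in these notes; they are assumed
; to be defined elsewhere
(interp : (Exp Store Heap -> (Value * Heap)))
(define (interp e store heap)
  (type-case Exp e
             [(numE n) (pair (numV n) heap)]
             ; look the identifier up in the store; the heap is unchanged
             [(varE s) (pair (lookup-store s store) heap)]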
             ; (1) evaluate assignment to (assgn-value, assgn-heap)
             ; (2) evaluate body with the extended store and heap
             [(let1E id assignment body)
              (letrec [(eval-assgn (interp assignment store heap))
                       (assgn-value (fst eval-assgn))
                       (assgn-heap (snd eval-assgn))
                       (new-store (extend-store store id assgn-value))]
                (interp body new-store assgn-heap))]

             ; (1) evaluate e to (value, e-heap)
             ; (2) extend e-heap with value
             ; (3) return the new location and new heap
             [(boxE e)
              (letrec [(eval-e (interp e store heap))
                       (value (fst eval-e))
                       (e-heap (snd eval-e))
                       (heap-insertion (extend-heap e-heap value))]
                (pair (locV (fst heap-insertion)) (snd heap-insertion)))]

             ; (1) run body to get (body-v, body-heap)
             ; (2) extract the location from body-v and look it up in body-heap
             ; (3) return (looked up value, body-heap)
             [(unboxE body)
              (letrec [(eval-body (interp body store heap))
                       (body-v (fst eval-body))
                       (body-heap (snd eval-body))]
                (type-case Value body-v
                           [(numV n) (error 'runtime "invalid value for unbox")]
                           [(locV l) (pair (lookup-heap l body-heap) body-heap)]))]

             ; (1) evaluate body to get (l, body-heap)
             ; (2) evaluate arg with body-heap to get (eval-v, eval-heap)
             ; (3) return (0, updated eval-heap with l |-> eval-v)
             [(setE body arg)
              (letrec [(eval-body (interp body store heap))
                       (body-v (fst eval-body))
                       (body-heap (snd eval-body))
                       (eval-arg (interp arg store body-heap))
                       (eval-v (fst eval-arg))
                       (eval-heap (snd eval-arg))]
                (type-case Value body-v
                           [(numV n) (error 'runtime "invalid value for set!")]
                           [(locV l) (pair (numV 0) (set-heap eval-heap l eval-v))]))]))


; let x = box 0 in unbox x
(test (interp (let1E 'x (boxE (numE 0)) (unboxE (varE 'x))) mt-env mt-heap)
  (values (numV 0) (values 1 (hash (list (pair 0 (numV 0)))))))

; let x = box 0 in let y = set! x 10 in unbox x
(test (interp
  (let1E 'x (boxE (numE 0))
  (let1E 'y (setE (varE 'x) (numE 10))
  (unboxE (varE 'x)))) mt-env mt-heap)
  (values (numV 10) (values 1 (hash (list (pair 0 (numV 10)))))))

; let x = box 0 in let y = box 1 in unbox x
(test (interp 
  (let1E 'x (boxE (numE 0))
  (let1E 'y (boxE (numE 1))
  (unboxE (varE 'x)))) mt-env mt-heap)
  (values (numV 0) (values 2 (hash (list (pair 0 (numV 0)) (pair 1 (numV 1)))))))

; let x = box 0 in let y = box 1 in let z = set! y 10 in unbox x
(test (interp 
  (let1E 'x (boxE (numE 0))
  (let1E 'y (boxE (numE 1))
  (let1E 'z (setE (varE 'y) (numE 10))
  (unboxE (varE 'x))))) mt-env mt-heap)
  (values (numV 0) (values 2 (hash (list (pair 0 (numV 0)) (pair 1 (numV 10)))))))

; let x = box 0 in let x = box 10 in unbox x
(test (interp 
  (let1E 'x (boxE (numE 0))
  (let1E 'x (boxE (numE 10))
  (unboxE (varE 'x)))) mt-env mt-heap)
  (values (numV 10) (values 2 (hash (list (pair 0 (numV 0)) (pair 1 (numV 10)))))))
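
To make the heap-threading pattern easier to see in one place, here is a compact Python sketch of the same interpreter. It is an approximation under several assumptions that are not from the lecture: expressions and values are encoded as tagged tuples, the store is a plain dict, and the error message for unbound identifiers is invented.

```python
def interp(e, store, heap):
    """Evaluate e and return (value, heap).

    store maps identifiers to values; heap is (next_location, cells).
    """
    tag = e[0]
    if tag == "num":
        return ("numV", e[1]), heap
    if tag == "var":
        if e[1] not in store:
            raise RuntimeError("unbound identifier")
        return store[e[1]], heap
    if tag == "let1":
        _, name, assignment, body = e
        # Evaluate the assignment first, then run the body in its heap.
        value, heap1 = interp(assignment, store, heap)
        return interp(body, {**store, name: value}, heap1)
    if tag == "box":
        # Allocate a fresh location for the value of the sub-expression.
        value, (next_loc, cells) = interp(e[1], store, heap)
        return ("locV", next_loc), (next_loc + 1, {**cells, next_loc: value})
    if tag == "unbox":
        target, heap1 = interp(e[1], store, heap)
        if target[0] != "locV" or target[1] not in heap1[1]:
            raise RuntimeError("invalid value for unbox")
        return heap1[1][target[1]], heap1
    if tag == "set":
        # Evaluate the box expression, then the new value in the resulting heap.
        target, heap1 = interp(e[1], store, heap)
        value, (next_loc, cells) = interp(e[2], store, heap1)
        if target[0] != "locV" or target[1] not in cells:
            raise RuntimeError("invalid value for set!")
        return ("numV", 0), (next_loc, {**cells, target[1]: value})
    raise ValueError("unknown expression")

MT_HEAP = (0, {})

# let x = box 0 in let y = set! x 10 in unbox x
prog = ("let1", "x", ("box", ("num", 0)),
        ("let1", "y", ("set", ("var", "x"), ("num", 10)),
         ("unbox", ("var", "x"))))
result = interp(prog, {}, MT_HEAP)
```

Every recursive call receives the heap produced by the previous step and returns a new one, which is the same discipline the Plait interpreter follows with its (Value * Heap) results.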