Logistics:

Quiz released this upcoming Thursday at 8 AM
Homework due this Wednesday
We won’t be releasing homework over Spring Break

Goals for today:

See more examples of implementing common type-system features found in languages, including mutable references, pairs, and variants
See more examples of how to design type systems to prevent runtime errors. In particular, we will see examples of more interesting types that are made out of other types.

Typed references

Recall references in Plait, which we discussed in detail in Lecture 8:

> (define my-counter (box 10))

> my-counter
- (Boxof Number)
'#&10

> (unbox my-counter)
- Number
10

> (set-box! my-counter 25)
- Void

> (unbox my-counter)
- Number
25

Recall our untyped language for references, now extended with Booleans and Numbers:

; Language has: let bindings, variables, numbers, Booleans, box, unbox, and set
; AtomicExp are expressions that cannot change the heap
(define-type AtomicExp
  (varE (id : Symbol))
  (boolE (b : Boolean))
  (numE (n : Number))
  (addE (l : AtomicExp) (r : AtomicExp))
  (andE (l : AtomicExp) (r : AtomicExp)))

; Exp are either AtomicExp or expressions that can change the heap
(define-type Exp
  (let1E (id : Symbol) (assignment : Exp) (body : Exp))
  (atom (a : AtomicExp))
  (unboxE (e : AtomicExp))
  (boxE (e : AtomicExp))
  (setE (b : AtomicExp) (v : AtomicExp)))

; values can be numbers, Booleans, or locations (addresses in the heap)
(define-type Value
  [numV (n : Number)]
  [boolV (b : Boolean)]
  [locV (l : Number)])

Recall the semantics for a heap-manipulating language, which we illustrate here by running a small program:

(let1E 'x (boxE (numE 10)) (unboxE (varE 'x))), {}            ; using {} notation to denote the empty heap
--> (let1E 'x (locV 0x0) (unboxE (varE 'x))), {0x0 |-> 10}    ; evaluate boxE to a fresh location, heap now maps 0x0 to 10
--> (unboxE (varE 'x))[x |-> (locV 0x0)], {0x0 |-> 10}        ; normal semantics for let
--> (unboxE (locV 0x0)), {0x0 |-> 10}                         ; normal semantics for substitution
--> 10                                                        ; lookup address 0x0 in the heap, get the value 10

The interpreter for this language is basically the same as the interpreter for references we covered in Lecture 8, so we will not cover it in detail here; see here for an implementation.
For simplicity, we assume that the semantics of (setE l v) returns a number (0)
Some example programs:

; the interp function is of type (interp : (Exp Store Heap -> (Value * Heap)))
; the store is the usual environment in an environment-passing semantics: it maps variable names to Values
; a heap has type (Number * (Hashof Number Value)); it is a pair whose first component has the next 
;   fresh address
; the interpreter returns a pair (Value * Heap): the first component is what the program returns, and 
;   the second component is an updated heap (which may contain newly allocated values)
> (interp (let1E 'x (boxE (numE 0))
 (let1E 'y (setE (varE 'x) (numE 10))
        (unboxE (varE 'x)))) mt-env mt-heap)

- (Value * (Number * (Hashof Number Value)))
(values (numV 10) (values 1 (hash 0 (numV 10))))
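
The heap representation described in the comments above can be sketched as follows. This is a minimal sketch, not the course implementation: the helper names heap-alloc, heap-lookup, and heap-update, the Heap alias, and the choice to start fresh addresses at 0 are all assumptions made for illustration.

```racket
#lang plait

(define-type Value
  [numV (n : Number)]
  [boolV (b : Boolean)]
  [locV (l : Number)])

; Assumed alias: a heap is a pair of the next fresh address and the address-to-value map
(define-type-alias Heap (Number * (Hashof Number Value)))

; Assumed empty heap: the first fresh address is 0 and nothing is stored yet
(define mt-heap : Heap (pair 0 (hash empty)))

; heap-alloc (hypothetical helper): store v at the next fresh address and
; return the new location together with the updated heap
(define (heap-alloc [h : Heap] [v : Value]) : (Value * Heap)
  (let ([addr (fst h)])
    (pair (locV addr)
          (pair (+ addr 1) (hash-set (snd h) addr v)))))

; heap-lookup (hypothetical helper): read the value stored at addr
(define (heap-lookup [h : Heap] [addr : Number]) : Value
  (type-case (Optionof Value) (hash-ref (snd h) addr)
    [(none) (error 'runtime "invalid address")]
    [(some v) v]))

; heap-update (hypothetical helper): overwrite the value at addr,
; leaving the next fresh address unchanged
(define (heap-update [h : Heap] [addr : Number] [v : Value]) : Heap
  (pair (fst h) (hash-set (snd h) addr v)))
```

Under these assumptions, boxE corresponds to heap-alloc, unboxE to heap-lookup, and setE to heap-update; each heap operation returns a new heap rather than mutating the old one, which matches the (Value * Heap) return type of interp.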

Designing a type system

Let’s design a type system to prevent runtime errors in the interpreter above.
How can we make the interpreter for this language “go wrong”? The type system should enforce the following requirements:
1. All the usual requirements for STLC (plus expects both arguments to be numbers, etc.)
2. You can only unbox a boxed value
3. You cannot use an unboxed value in a way that is inconsistent with the type of the value stored in the heap
4. When you update a boxed value, you cannot change its type (this one might be a bit surprising; we’ll see why we want this)
Let’s see some examples of causing our interpreter to have a runtime error:

; unboxing something that is not an address
> (interp (unboxE (numE 10)) mt-env mt-heap)
(runtime error)

What kinds of types should we have? Remember, types are collections of values, and there are 3 kinds of values (numbers, Booleans, and locations), so we have 3 types:

(define-type Type
  [TNum]
  [TBool]
  [TRef (t : Type)])

The interesting new typing rules involve references, which we will include here:

\[\dfrac{Γ ⊢ \texttt{e} : τ}{Γ ⊢ \texttt{box e} : \texttt{Ref}~τ}~\text{(T-Box)} \quad\quad \dfrac{Γ ⊢ \texttt{e} : \texttt{Ref}~τ}{Γ ⊢ \texttt{unbox e} : τ}~\text{(T-Unbox)} \quad\quad \dfrac{Γ ⊢ \texttt{e1} : \texttt{Ref}~τ \quad\quad Γ ⊢ \texttt{e2} : τ}{Γ ⊢ \texttt{set e1 e2} : \texttt{Num}}~\text{(T-Set)}\]

Example typing derivation:

                                               {x↦Ref Num}(x) = Ref Num
-------------------- T-Num                 --------------------------------- T-Var
{} ⊢ (numE 10) : Num                      {x↦Ref Num} ⊢ (varE 'x) : Ref Num
------------------------------- T-Box     ---------------------------------------- T-Unbox
{} ⊢ (boxE (numE 10)) : Ref Num           {x↦Ref Num} ⊢ (unboxE (varE 'x)) : Num
----------------------------------------------------------------------------------- T-Let
{} ⊢ (let1E 'x (boxE (numE 10)) (unboxE (varE 'x))) : Num

The typechecker included in the code derives types like this automatically. For example, on the earlier program that uses set:

> (type-of mt-env (let1E 'x (boxE (numE 0))
 (let1E 'y (setE (varE 'x) (numE 10))
        (unboxE (varE 'x)))))
- Type
(TNum)
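
The typing rules for references translate fairly directly into code. The following is a minimal sketch of such a typechecker, not the one included with the course code: the TEnv alias, the mt-tenv name, the type-of-atom helper, and the error messages are assumptions made for illustration. The Type, AtomicExp, and Exp definitions are copied from above so the block is self-contained.

```racket
#lang plait

(define-type Type
  [TNum]
  [TBool]
  [TRef (t : Type)])

(define-type AtomicExp
  (varE (id : Symbol))
  (boolE (b : Boolean))
  (numE (n : Number))
  (addE (l : AtomicExp) (r : AtomicExp))
  (andE (l : AtomicExp) (r : AtomicExp)))

(define-type Exp
  (let1E (id : Symbol) (assignment : Exp) (body : Exp))
  (atom (a : AtomicExp))
  (unboxE (e : AtomicExp))
  (boxE (e : AtomicExp))
  (setE (b : AtomicExp) (v : AtomicExp)))

; Assumed type environment: maps variable names to their types
(define-type-alias TEnv (Hashof Symbol Type))
(define mt-tenv : TEnv (hash empty))

; Types of atomic expressions (hypothetical helper)
(define (type-of-atom [env : TEnv] [a : AtomicExp]) : Type
  (type-case AtomicExp a
    [(varE id)
     (type-case (Optionof Type) (hash-ref env id)
       [(none) (error 'type-error "unbound variable")]
       [(some t) t])]
    [(boolE b) (TBool)]
    [(numE n) (TNum)]
    [(addE l r)
     (if (and (equal? (type-of-atom env l) (TNum))
              (equal? (type-of-atom env r) (TNum)))
         (TNum)
         (error 'type-error "add expects two numbers"))]
    [(andE l r)
     (if (and (equal? (type-of-atom env l) (TBool))
              (equal? (type-of-atom env r) (TBool)))
         (TBool)
         (error 'type-error "and expects two Booleans"))]))

(define (type-of [env : TEnv] [e : Exp]) : Type
  (type-case Exp e
    [(let1E id e1 e2)
     (type-of (hash-set env id (type-of env e1)) e2)]
    [(atom a) (type-of-atom env a)]
    ; box rule: if e has type t, then (box e) has type Ref t
    [(boxE a) (TRef (type-of-atom env a))]
    ; unbox rule: the argument must have type Ref t; the result has type t
    [(unboxE a)
     (type-case Type (type-of-atom env a)
       [(TRef t) t]
       [else (error 'type-error "unbox expects a reference")])]
    ; set rule: the new value must have the type already stored in the box
    [(setE b v)
     (type-case Type (type-of-atom env b)
       [(TRef t)
        (if (equal? t (type-of-atom env v))
            (TNum)
            (error 'type-error "set cannot change the type stored in a box"))]
       [else (error 'type-error "set expects a reference")])]))
```

With this sketch, (type-of mt-tenv (unboxE (numE 10))) raises a type error instead of reaching the interpreter, and so does a program that boxes a number and then tries to set the box to a Boolean.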

What are the consequences of Requirement #4 above? Let’s see how Plait handles references to understand it a bit.
First, observe that Plait’s reference types do satisfy requirement #4:

> (define my-box (box 10))
> (set-box! my-box 'hello)
. typecheck failed: Number vs. Symbol in:
  set-box!
  my-box
  (quote hello)

There is nothing inherently broken about this program (i.e., it would not cause a runtime error in our interpreter for statelang), so why does Plait reject it?
This goes back to the efficiency requirement on type-systems. Consider the following (hypothetical, ill-typed) Plait program:

> (define my-box (box 10))
> (if (very-complex-function) (set-box! my-box 'hello) (set-box! my-box 25))

In this example, we cannot determine the type of the value stored in my-box without determining whether (very-complex-function) evaluates to #t or #f. Determining whether a function call evaluates to a particular value can be very expensive (and is undecidable in general), so this would make typechecking very expensive.
So, this is an example of a design decision about which kinds of programs to reject: this type system for references will reject some valid programs for the sake of performance.

Product types

A pair is a value that packages up two values into a single value. These can be used to model structs in languages like C, or handle multiple return values from a function.
Pairs are constructed using the syntax (pair e1 e2), and pairs are destructed using the syntax (fst e) and (snd e), which get the first and second component of the pair respectively.
Let’s see some examples of pairs in Plait again:

> (pair 1 2)
- (Number * Number)
(values 1 2)
> (fst (pair 1 'hello))
- Number
1
> (snd (pair 1 'hello))
- Symbol
'hello

Let’s build a tiny language for manipulating pairs:

(define-type Exp
  [let1E (x : Symbol) (e1 : Exp) (e2 : Exp)]
  [varE (x : Symbol)]
  [numE (n : Number)]
  [fstE (e : Exp)]
  [sndE (e : Exp)]
  [pairE (e1 : Exp) (e2 : Exp)])

(define-type Value
  [numV (n : Number)]
  [pairV (fst : Value) (snd : Value)])

(define-type-alias Env (Hashof Symbol Value))
(define mt-env (hash empty)) ;; "empty environment"

(define (lookup (n : Env) (s : Symbol))
  (type-case (Optionof Value) (hash-ref n s)
             [(none) (error 'runtime "not bound")]
             [(some v) v]))

(extend : (Env Symbol Value -> Env))
(define (extend old-env new-name value)
  (hash-set old-env new-name value))

(interp : (Env Exp -> Value))
(define (interp env e)
  (type-case Exp e
             [(let1E x e1 e2)
              (interp (extend env x (interp env e1)) e2)]
             [(varE s) (lookup env s)]
             [(numE n) (numV n)]
             [(fstE e1)
              (type-case Value (interp env e1)
                         [(pairV fst snd) fst]
                         [else (error 'runtime "invalid")])]
             [(sndE e1)
              (type-case Value (interp env e1)
                         [(pairV fst snd) snd]
                         [else (error 'runtime "invalid")])]
             [(pairE e1 e2) (pairV (interp env e1) (interp env e2))]
             ))

Making a typechecker for a language with pairs

What kinds of programs cause runtime errors for the above interpreter?

; calling `fst` or `snd` on a non-pair
> (interp mt-env (fstE (numE 10)))
- Value
. . runtime: invalid

; treating the outcome of `fst` or `snd` as the wrong type
> (interp mt-env (fstE (fstE (pairE (numE 10) (numE 20)))))
- Value
. . runtime: invalid

Let’s design a typechecker to prevent these runtime errors
What are our possible types? Looking at our values, there are two kinds of values:
- Numbers
- Pairs that combine two values; we will call this a product of two values
This tells us we should use the following types to capture these possible sets of values:

(define-type Type
  [TNum]
  [TProd (t1 : Type) (t2 : Type)])

A simple type system:

\[\dfrac{\texttt{Γ ⊢ e : T1 * T2}}{\texttt{Γ ⊢ fst e : T1}} \quad\quad \dfrac{\texttt{Γ ⊢ e : T1 * T2}}{\texttt{Γ ⊢ snd e : T2}} \quad\quad \dfrac{\texttt{Γ ⊢ e1 : T1} \quad\quad \texttt{Γ ⊢ e2 : T2}}{\texttt{Γ ⊢ (e1, e2) : T1 * T2}}\]
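
These three rules, together with the usual rules for numbers, variables, and let, can be turned into a typechecker. The following is a minimal sketch under some assumptions: the TEnv alias, the mt-tenv name, the type-of function, and the error messages are illustrative choices, not taken from the course code. The Exp and Type definitions are copied from above so the block is self-contained.

```racket
#lang plait

(define-type Exp
  [let1E (x : Symbol) (e1 : Exp) (e2 : Exp)]
  [varE (x : Symbol)]
  [numE (n : Number)]
  [fstE (e : Exp)]
  [sndE (e : Exp)]
  [pairE (e1 : Exp) (e2 : Exp)])

(define-type Type
  [TNum]
  [TProd (t1 : Type) (t2 : Type)])

; Assumed type environment: maps variable names to their types
(define-type-alias TEnv (Hashof Symbol Type))
(define mt-tenv : TEnv (hash empty))

(define (type-of [env : TEnv] [e : Exp]) : Type
  (type-case Exp e
    [(let1E x e1 e2)
     (type-of (hash-set env x (type-of env e1)) e2)]
    [(varE x)
     (type-case (Optionof Type) (hash-ref env x)
       [(none) (error 'type-error "unbound variable")]
       [(some t) t])]
    [(numE n) (TNum)]
    ; fst rule: the argument must have type T1 * T2; the result has type T1
    [(fstE e1)
     (type-case Type (type-of env e1)
       [(TProd t1 t2) t1]
       [else (error 'type-error "fst expects a pair")])]
    ; snd rule: the argument must have type T1 * T2; the result has type T2
    [(sndE e1)
     (type-case Type (type-of env e1)
       [(TProd t1 t2) t2]
       [else (error 'type-error "snd expects a pair")])]
    ; pair rule: the type of a pair is the product of its component types
    [(pairE e1 e2) (TProd (type-of env e1) (type-of env e2))]))
```

Under this sketch, both programs that caused runtime errors above, (fstE (numE 10)) and (fstE (fstE (pairE (numE 10) (numE 20)))), are rejected with a type error before they are ever interpreted, because in each case fst is applied to an expression of type TNum.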