The simply-typed λ-calculus (STLC)

Logistics: Next homework released, due next Wednesday
Goals for today:
- Design a type-checker for STLC
- Be able to draw a typing derivation for a term in STLC
Today we will continue our development of type systems by developing a type system for the λ-calculus.
Recall the syntax for the (untyped) λ-calculus with numbers:

e ::= (λx. e)
     | (e e)
     | x
     | num

Let’s recall our simple interpreter for the untyped λ-calculus (we’ve made a few minor changes from last time; our interpreter now has type LExp -> LExp for simplicity):

(define-type LExp
  [varE (s : Symbol)]
  [numE (n : Number)]
  [lamE (arg : Symbol) (body : LExp)]
  [appE (e : LExp) (arg : LExp)])

; perform e1[x |-> e2]
(subst : (LExp Symbol LExp -> LExp))
(define (subst e1 x e2)
  (type-case LExp e1
    [(varE s) (if (symbol=? s x)
                  e2
                  (varE s))]
    [(numE n) (numE n)]
    [(lamE id body)
     (if (symbol=? x id)
         (lamE id body)              ; shadowing case
         (lamE id (subst body x e2)))]
    [(appE e1App e2App)
     (appE (subst e1App x e2)
           (subst e2App x e2))]))

(define (interp e)
  (type-case LExp e
    [(varE s) (error 'runtime "unbound symbol")]
    [(lamE id body) (lamE id body)]
    [(numE n) (numE n)]
    [(appE e1 e2)
     ; run e1 to get (lambda (id) body)
     ; run e2 to get a value argV
     ; run body[id |-> argV]
     (letrec [(e1V (interp e1))
              (body (lamE-body e1V))
              (id (lamE-arg e1V))
              (argV (interp e2))]
       (interp (subst body id argV)))]))

What are the ways that this interpreter can “go wrong” (i.e., what are some terms that I can give it that cause a runtime error)?
Here are a few:

; refer to a variable that is not in scope
> (interp (varE 'x))
runtime: unbound symbol

; call a function with the wrong type of argument
; ((λ x. (x 10)) 10)
> (interp (appE (lamE 'x (appE (varE 'x) (numE 10))) (numE 10)))
lamE-body: contract violation

Let’s design a type system to eliminate these kinds of silly runtime errors.

Syntax and type-checking STLC

First, what are the possible types we can have?
Recall:
- Types correspond to collections of values
- If a term runs to a value, then the type of the term is the type of that value
With this intuition, we can think about what types we should have. Let’s see some examples:
- (numE 10): this is a value of type Number
- (lamE 'x (numE 10)): already this is quite tricky! There are a few options here:
  - We could say all λ abstractions have type FunctionT, but this seems too imprecise: it would make two terms like (λ x. 10) and (λ x . (x x)) look identical
  - The type of 'x could be anything at all and the program would still be valid, since the body never uses 'x; handling that generality seems tricky.
To make our lives simpler, let’s consider simple types!
- Base types Number, or functions involving numbers like Number -> Number, (Number -> Number) -> (Number -> Number), etc…
To handle this situation, we will syntactically annotate the arguments of λ abstractions with their type: this way, we know which type of argument a function is expecting
- We will explore alternatives to this annotation strategy later
This leads us to the syntax of our simply-typed λ-calculus:

τ ::= Number | τ → τ
e ::= (λx:τ. e)
     | (e e)
     | x
     | num

We can implement this type in Plait:

(define-type LType
  [NumT]
  [FunT (arg : LType) (body : LType)])

(define-type LExp
  [varE (s : Symbol)]
  [numE (n : Number)]
  [lamE (arg : Symbol) (typ : LType) (body : LExp)]
  [appE (e : LExp) (arg : LExp)])
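
As a small illustration (a sketch that uses only the constructors defined above), the term (λ x:Number -> Number. (x 10)) can be written as the following Plait value. The name example-term is hypothetical and is not used elsewhere in these notes:

; (λ x:Number -> Number. (x 10))
(define example-term
  (lamE 'x (FunT (NumT) (NumT))
        (appE (varE 'x) (numE 10))))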

Typing judgments for STLC

Let’s design some typing judgments to specify which terms are well-typed in the STLC
What is the type of this program: (λ x:Number. x)?
- It is a function that takes as input a number and returns a number, i.e. Number -> Number
What about this program: (λ x:Number -> Number. (x 10))?
- A: (Number -> Number) -> Number
In English, what should the typing rules be?
- If e has type τ’, then (λ x:τ. e) has type τ → τ’
- If e1 has type τ → τ’ and e2 has type τ, then (e1 e2) has type τ’
With these intuitions, we can try to make a simple set of inference rules (note, these rules aren’t quite right yet):

\[\dfrac{}{\texttt{num : Number}}~\text{(T-Num)} \qquad \qquad \dfrac{\texttt{e : τ'}}{\texttt{λ x:τ. e : τ → τ'}}~\text{(T-Lambda)}\] \[\dfrac{\texttt{e1 : τ → τ'} \qquad \texttt{e2 : τ}}{\texttt{(e1 e2) : τ'}}~\text{(T-App)}\]

Let’s try to apply these rules to the term (λ x:Number . x) and see what happens:

\[\dfrac{???}{\texttt{(λ x:Number . x)}}\]

Oops, we get stuck right away! The T-Lambda rule asks us to typecheck the body x, but no rule applies to it: T-Num only applies to numbers, and nothing in our rules records that x was declared to have type Number. In order to typecheck x, we need to know its type. This brings us to the concept of a type environment.

The typing context

One way to fix the above issue is to extend our typing rules with a type environment that tells us the type of each identifier. We’ll add some new notation to our inference rules for handling this.
We denote type environments as Γ: these associate identifiers with types
- Let {} be the empty type environment
- The notation “Γ ∪ {x ↦ τ}” denotes adding a new variable x with type τ to the context; if x is already in Γ, the new binding replaces the old one
- We denote looking up a variable’s type in the context as Γ(x)
- We write x ∈ Γ when Γ contains a type for x
Then, the notation “Γ ⊢ e : τ” says “context Γ proves that e has type τ”
- The symbol “⊢” is called the “turnstile”, and is read “proves”
- Things to the left of the turnstile are assumed true, and things to the right of the turnstile are things to be proven
Now, all our type judgments will contain context Γ. Let’s see how these look:

\[\dfrac{}{\texttt{Γ ⊢ num : Number}}~\text{(T-Num)} \qquad \dfrac{x ∈ Γ \qquad Γ(x) = τ}{\texttt{Γ ⊢ x : }τ}~\text{(T-Var)}\] \[\dfrac{\texttt{Γ ∪ \{x ↦ τ\} ⊢ e : τ'}} {Γ ⊢ \texttt{λ x:τ. e : τ → τ'}}~\text{(T-Lambda)} \qquad \qquad \dfrac{\texttt{Γ ⊢ e1 : τ → τ'} \qquad \texttt{Γ ⊢ e2 : τ}}{\texttt{Γ ⊢ (e1 e2) : τ'}}~\text{(T-App)}\]

Some example typing judgments

The best way to understand these rules is to use them to draw some derivation trees.
Let’s get a feel for how to use these rules by giving some typing derivations for STLC terms.
First, let’s start with our simple example above, (λ x : Number . x). By default, we want to show that a term is well-typed in the empty context:

  x ∈ {x ↦ Number}   {x ↦ Number}(x) = Number
  ------------------------------------------- (T-Var)
          {x ↦ Number} ⊢ x : Number
--------------------------------------------- (T-Lambda)
  {} ⊢ (λ x : Number . x) : Number -> Number

Great, that type-checked. Let’s see another example, this time involving application: let’s make a derivation tree for ((λ x: Number . x) 10):

  x ∈ {x ↦ Number}   {x ↦ Number}(x) = Number
  ------------------------------------------- (T-Var)
          {x ↦ Number} ⊢ x : Number
------------------------------------------ (T-Lambda)        ---------------- (T-Num)
{} ⊢ (λ x: Number . x) : Number -> Number                    {} ⊢ 10 : Number
----------------------------------------------------------------------------- (T-App)
{} ⊢ ((λ x: Number . x) 10) : Number
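
As one more worked example, here is a sketch of a derivation for the earlier term (λ x : Number -> Number . (x 10)), built only from the four rules above. To keep the tree readable we abbreviate Γ = {x ↦ Number -> Number}; this abbreviation is ours:

  x ∈ Γ   Γ(x) = Number -> Number
  ------------------------------- (T-Var)        --------------- (T-Num)
     Γ ⊢ x : Number -> Number                    Γ ⊢ 10 : Number
  -------------------------------------------------------------- (T-App)
                        Γ ⊢ (x 10) : Number
--------------------------------------------------------------------- (T-Lambda)
{} ⊢ (λ x : Number -> Number . (x 10)) : (Number -> Number) -> Number

Reading bottom-up: T-Lambda moves the annotation on x into the context, T-App splits the body into the function x and the argument 10, and T-Var and T-Num close the two leaves.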

Implementing a type checker

Now we are ready to implement a typechecker. This looks a lot like the environment-passing semantics that we have been working with, and we will work through it in class:

(define-type-alias TEnv (Hashof Symbol LType))
(define mt-env (hash empty)) ;; "empty environment"

(define (lookup (n : TEnv) (s : Symbol))
  (type-case (Optionof LType) (hash-ref n s)
             [(none) (error 'type-error "unrecognized symbol")]
             [(some v) v]))

(extend : (TEnv Symbol LType -> TEnv))
(define (extend old-env new-name value)
  (hash-set old-env new-name value))


(define (type-of env e)
  (type-case LExp e
             [(varE s) (lookup env s)]
             [(numE n) (NumT)]
             [(lamE arg typ body)
              (FunT typ (type-of (extend env arg typ) body))]
             [(appE e1 e2)
              (let [(t-e1 (type-of env e1))
                    (t-e2 (type-of env e2))]
                (type-case LType t-e1
                           [(FunT tau1 tau2)
                            (if (equal? tau1 t-e2)
                                tau2
                                (error 'type-error "invalid function call"))]
                           [else (error 'type-error "invalid function call")]))]))

(test (interp (appE (lamE 'x (NumT) (varE 'x)) (numE 10))) (numE 10))
(test (type-of mt-env (lamE 'x (NumT) (varE 'x))) (FunT (NumT) (NumT)))
(test (type-of mt-env (appE (lamE 'x (NumT) (varE 'x)) (numE 10))) (NumT))
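
It is also worth checking that ill-typed terms are rejected. The following tests are a sketch: they assume they are placed in the same module as the definitions above, and they use Plait's test/exn form, which passes when the expression raises an error whose message contains the given string:

; x is not bound in the empty environment, so lookup fails
(test/exn (type-of mt-env (varE 'x)) "unrecognized symbol")
; (10 10): the term in function position has type Number, not a function type
(test/exn (type-of mt-env (appE (numE 10) (numE 10))) "invalid function call")
; ((λ x:Number -> Number. (x 10)) 10): the argument has type Number,
; but the function expects Number -> Number
(test/exn (type-of mt-env
                   (appE (lamE 'x (FunT (NumT) (NumT))
                               (appE (varE 'x) (numE 10)))
                         (numE 10)))
          "invalid function call")
; (λ x:Number -> Number. (x 10)) on its own is well-typed
(test (type-of mt-env (lamE 'x (FunT (NumT) (NumT))
                            (appE (varE 'x) (numE 10))))
      (FunT (FunT (NumT) (NumT)) (NumT)))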

Attempting to typecheck Omega

Clearly, our type system eliminates obviously broken terms like calling a function with the wrong type of argument.
However, it is worth pausing to ask: is it really the case that “well-typed terms do not go wrong” for the STLC?
Recall the Omega term, a term that ran forever in the untyped λ-calculus:

\[Ω = ((λ x . (x~x))~~(λ x . (x~x)))\]

Is this a well-typed term according to the rules of the simply-typed λ-calculus?
Well, it’s not syntactically valid yet: we have to provide a type annotation for x in both lambda abstractions.
And there’s the problem: what type should we give to x? Well, it must be a function; we can try Number -> Number: ((λ x : Number->Number. (x x)) (λ x: Number->Number. (x x)))
We can try type-checking omega with these type annotations:

> (define om (lamE 'x (FunT (NumT) (NumT)) (appE (varE 'x) (varE 'x))))
> om
- LExp
(lamE 'x (FunT (NumT) (NumT)) (appE (varE 'x) (varE 'x)))
> (define omega (appE om om))
> omega
- LExp
(appE
 (lamE 'x (FunT (NumT) (NumT)) (appE (varE 'x) (varE 'x)))
 (lamE 'x (FunT (NumT) (NumT)) (appE (varE 'x) (varE 'x))))
> (type-of mt-env omega)
- LType
. . type-error: invalid function call

Oh no! We got a type error. What went wrong?
Let’s look at the type of (λ x : Number -> Number . (x x)). Does this typecheck?
No! Why? x is a function with type Number -> Number, so it expects an argument of type Number, but it is being called with an argument of type Number -> Number.
It is not possible in the simply-typed λ-calculus to give a type to x here! Why?
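- Suppose x has type T.
- Then, for (x x) to be well-typed, the T-App rule requires the function x to have type T -> T’ for some result type T’, while the argument x has type T
- So T would have to be equal to T -> T’, but there are no such types in the STLC: every type is a finite tree, and T -> T’ is strictly larger than T.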
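
One way to make this precise is a size argument. This is a sketch; the function size is introduced here for illustration and does not appear elsewhere in these notes. Define

\[\text{size}(\texttt{Number}) = 1 \qquad \qquad \text{size}(τ_1 → τ_2) = 1 + \text{size}(τ_1) + \text{size}(τ_2)\]

If x has type T and the application (x x) is well-typed, then T-App forces T = T → T' for some T'. But then

\[\text{size}(T) = \text{size}(T → T') = 1 + \text{size}(T) + \text{size}(T') > \text{size}(T)\]

which is impossible, since every type is built from finitely many uses of Number and → and therefore has a finite size.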
Very surprising fact: all well-typed programs in STLC terminate. This property is called normalization.