Logistics:

  • First assignment due tonight
  • Second assignment released tomorrow

Lecture 3: Conditionals and local variables

Key concepts and goals for today:

  • ite with Booleans
  • Scope and shadowing
  • Substitution and how it can be used to implement local variables
  • Natural semantics and inference rules

Conditionals: the ite language

  • Let’s revisit our ite interpreter from last time. This time, we will implement it using Booleans.
  • Here is the abstract syntax datatype for the ite language with Booleans:
;;; type expr =
;;;   | add of expr * expr
;;;   | mul of expr * expr
;;;   | num of number
;;;   | bool of bool
;;;   | ite of expr * expr * expr
(struct eadd (e1 e2) #:transparent)
(struct emul (e1 e2) #:transparent)
(struct enum (n) #:transparent)
(struct ebool (b) #:transparent)
(struct eite (guard thn els) #:transparent)
  • Note that we have an ebool struct: this denotes a Boolean constant
  • This language has two kinds of values: numbers and Booleans. We will represent these two kinds of values in a particular datatype:
;;; type value =
;;;    | vnum of number
;;;    | vbool of bool
(struct vbool (b) #:transparent)
(struct vnum (n) #:transparent)

;;; to-num : value -> number
;;; converts a value to a number or raises a runtime error
(define (to-num v)
  (match v
    [(vnum n) n]
    [_ (error "runtime")])) ; recall that the "_" case is a default case

;;; to-bool : value -> bool
;;; converts a value to a bool or raises a runtime error
(define (to-bool v)
  (match v
    [(vbool b) b]
    [_ (error "runtime")]))
  • Now, we give a semantics to ite: each ite term evaluates to a value.
  • Semantics of ite:
    • enum n evaluates to vnum n
    • ebool b evaluates to vbool b
    • eadd e1 e2:
      1. Evaluate e1 to v1
      2. Evaluate e2 to v2
      3. If v1 and v2 are both numbers then evaluate eadd e1 e2 to (+ v1 v2). Otherwise, raise an error.
    • emul e1 e2:
      1. Evaluate e1 to v1
      2. Evaluate e2 to v2
      3. If v1 and v2 are both numbers then evaluate emul e1 e2 to (* v1 v2). Otherwise, raise an error.
    • eite guard thn els:
      1. Evaluate guard to v. If v is not a vbool, then raise an error.
      2. If v is #t, then evaluate thn. Otherwise, evaluate els.
  • We can implement these semantics in the following interpreter:
;;; interp : expr -> value
;;; evaluates an expression to a value
(define (interp e)
  (match e
    [(eadd e1 e2)
     (let [(n1 (to-num (interp e1)))
           (n2 (to-num (interp e2)))]
       (vnum (+ n1 n2)))]
    [(emul e1 e2)
     (let [(n1 (to-num (interp e1)))
           (n2 (to-num (interp e2)))]
       (vnum (* n1 n2)))]
    [(ebool b) (vbool b)]
    [(eite guard thn els)
     (let [(vguard (to-bool (interp guard)))]
       (if vguard (interp thn) (interp els)))]
    [(enum n) (vnum n)]))

(check-equal? (interp (enum 1)) (vnum 1))
(check-equal? (interp (eadd (enum 10) (enum 20))) (vnum 30))
(check-equal? (interp (emul (enum 10) (enum 20))) (vnum 200))
(check-equal? (interp (eadd (emul (enum 1) (enum 2)) (enum 3))) (vnum 5))
(check-equal? (interp (eite (ebool #t) (enum 2) (enum 3))) (vnum 2))
(check-equal? (interp (eite (ebool #f) (enum 2) (enum 3))) (vnum 3))
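  • The tests above only cover well-behaved programs. The following is a minimal sketch of the error cases described in the semantics, repeating a cut-down copy of the definitions (emul omitted) so that it runs on its own. It also illustrates that only the selected branch of an ite is evaluated:

```racket
#lang racket
(require rackunit)

;; Expression and value structs, repeated so the sketch is self-contained.
(struct eadd (e1 e2) #:transparent)
(struct enum (n) #:transparent)
(struct ebool (b) #:transparent)
(struct eite (guard thn els) #:transparent)
(struct vnum (n) #:transparent)
(struct vbool (b) #:transparent)

(define (to-num v)
  (match v [(vnum n) n] [_ (error "runtime")]))
(define (to-bool v)
  (match v [(vbool b) b] [_ (error "runtime")]))

;; interp restricted to the cases exercised below
(define (interp e)
  (match e
    [(enum n) (vnum n)]
    [(ebool b) (vbool b)]
    [(eadd e1 e2) (vnum (+ (to-num (interp e1)) (to-num (interp e2))))]
    [(eite guard thn els)
     (if (to-bool (interp guard)) (interp thn) (interp els))]))

;; Adding a Boolean to a number is a runtime error.
(check-exn exn:fail? (lambda () (interp (eadd (ebool #t) (enum 1)))))
;; A guard that is not a Boolean is a runtime error.
(check-exn exn:fail? (lambda () (interp (eite (enum 0) (enum 2) (enum 3)))))
;; Only the selected branch is evaluated, so the ill-typed else branch
;; never gets a chance to raise an error.
(check-equal? (interp (eite (ebool #t) (enum 2) (eadd (ebool #f) (enum 1))))
              (vnum 2))
```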

Local variables and scope

  • Let’s continue growing our little language by adding another important feature: local variables
  • You’ve programmed with local variables before. For instance, in Python we can create a local variable:
> x = 5
> y = x + 10
> print(x + y)
20
  • Similarly, in Racket we create a local variable using the let syntax:
> (let [(x 10)] (+ x 20))
30
  • Terminology:
    • The name of a variable is its identifier. In the above example, x is an identifier.
    • The expression associated with an identifier is the assignment. In the above example, 10 is the assignment to x.
    • Assigning the identifier x to its assignment is called a declaration.
    • If an identifier x is assigned to a particular value by some declaration, we say it is bound to that value
  • The thing that makes “local variables” local is that they are not accessible to the entire program. For instance, in the following Racket program, we see x is not visible outside of the let expression:
> (let [(x 10)] (+ x 20))
30
> x
x: undefined;
 cannot reference an identifier before its definition
  • Definition: The scope of a declaration is the portion of the program for which that declaration can be used.
    • In the above example, the scope of x is the sub-expression (+ x 20), which is called the body of the let expression.
  • There are a variety of rules for scope, and different languages have different rules: scoping rules are one of the key design decisions that distinguish different programming languages.
  • An important kind of scope is lexical (or static) scope, which says that the scope of a declaration can be determined without running the program. Most (but not all!) widely-used languages use lexical scope.
  • An important property of local variables is that there can be multiple declarations for the same identifier. For instance, this is a valid Racket program:
> (let [(x 10)]
    (let [(x 20)] x))
20
  • In the above program the inner-most declaration of x is the one that takes precedence. This is a typical design choice in many programming languages, and can be summarized as “identifiers are always bound to their inner-most declaration”.
  • Definition: An outer declaration is called shadowed if there is some inner declaration of that same identifier.
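  • Shadowing only lasts for the body of the inner declaration. The small sketch below (the names after-inner and inner-assignment are made up for illustration) shows the outer x becoming visible again once the inner let is done, and shows that the assignment of an inner let is still in the scope of the outer declaration:

```racket
#lang racket
(require rackunit)

;; The inner x shadows the outer x only inside the inner let's body;
;; afterwards, x refers to the outer declaration again.
(define after-inner
  (let [(x 10)]
    (+ (let [(x 20)] x) ; inner x is 20
       x)))             ; outer x is 10

;; The inner assignment (+ x 1) is not part of the inner let's body,
;; so the x it mentions is the outer one.
(define inner-assignment
  (let [(x 10)]
    (let [(x (+ x 1))] x)))

(check-equal? after-inner 30)
(check-equal? inner-assignment 11)
```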

The let language

  • Now we want to extend our calculator language with the ability to introduce local variables
  • We will use similar scoping rules to Racket, and the following abstract syntax data structure:
;;; type expr =
;;;   | add of expr * expr
;;;   | mul of expr * expr
;;;   | num of number
;;;   | let of string * expr * expr
;;;   | ident of string
(struct eadd (e1 e2) #:transparent)
(struct emul (e1 e2) #:transparent)
(struct enum (n) #:transparent)
(struct elet (id assignment body) #:transparent)
(struct eident (id) #:transparent)
  • The semantics of our let language again evaluates programs to numbers.
  • All the rules for the semantics are the same as calc except for the new terms elet and eident.
  • To give a semantics to elet we will introduce a new idea: substitution
  • The goal of substitution is to replace an identifier with an expression while respecting scope. Think of it like “find and replace”: we want to find all instances of x in some expression body and replace them with a new expression assignment.
    • We denote this as body[x |-> assignment]
  • Now we can give a semantics of let in terms of substitution:
  • Semantics of let:
    • (elet id assignment body) evaluates to:
      1. evaluate assignment to v
      2. evaluate body[id |-> (enum v)] to v2
      3. return v2
    • (eident id) raises an error if evaluated
  • Now for the tricky part: how do we define substitution?
  • Different choices will result in different scoping rules
  • To achieve our goal of “identifiers are always bound to their inner-most declaration”, we will give our substitution function the following implementation:
;;; subst : expr -> string -> expr -> expr
;;; performs the substitution expr[id |-> e]
;;; i.e., substitutes e for id in expr
(define (subst expr id e)
  (match expr
    [(eadd e1 e2) (eadd (subst e1 id e)
                        (subst e2 id e))]
    [(emul e1 e2) (emul (subst e1 id e)
                        (subst e2 id e))]
    [(enum num) (enum num)]
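    [(elet letid assignment body)
     (if (equal? letid id)
         ;; shadowing case: leave body alone, but still substitute into
         ;; assignment, which is evaluated outside the scope of letid
         (elet letid (subst assignment id e) body)
         (elet letid (subst assignment id e) (subst body id e)))] ; not shadowing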
    [(eident x)
     ;; if x = id, then we perform substitution. otherwise, do nothing
     (if (equal? id x) e (eident x))]
    ))
  • Now we are ready to implement and test our interpreter:
;;; interp : expr -> number
;;; evaluates an expression to a number
(define (interp expr)
  (match expr
    [(eadd e1 e2) (+ (interp e1) (interp e2))]
    [(emul e1 e2) (* (interp e1) (interp e2))]
    [(eident x) (error "runtime error: unbound identifier")]
    [(elet id binding body)
     (let* [(vbinding (enum (interp binding)))
            (substbody (subst body id vbinding))]
       (interp substbody))]
    [(enum n) n]))

(check-equal? (interp (enum 1)) 1)
(check-equal? (interp (eadd (enum 10) (enum 20))) 30)
(check-equal? (interp (emul (enum 10) (enum 20))) 200)
(check-equal? (interp (eadd (emul (enum 1) (enum 2)) (enum 3))) 5)

;;; check basic case
(check-equal? (interp (elet "x" (enum 2) (eident "x"))) 2)
;;; check shadowing
(check-equal?
 (interp (elet "x" (enum 2)
               (elet "x" (enum 3)
                     (eident "x")))) 3)
;;; check multiple bindings
(check-equal?
 (interp (elet "x" (enum 2)
               (elet "y" (enum 3)
                     (eadd (eident "x") (eident "y"))))) 5)
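  • A few more cases are worth checking by hand. The sketch below repeats cut-down definitions (emul omitted) so that it runs on its own. It makes one labeled assumption: when an inner let shadows id, subst still substitutes into the inner assignment (but not the body), which matches how Racket's let treats that position. It shows what subst returns directly, a nested let whose assignment mentions the outer x, and the error for an unbound identifier:

```racket
#lang racket
(require rackunit)

(struct eadd (e1 e2) #:transparent)
(struct enum (n) #:transparent)
(struct elet (id assignment body) #:transparent)
(struct eident (id) #:transparent)

;; subst : expr -> string -> expr -> expr
(define (subst expr id e)
  (match expr
    [(enum n) (enum n)]
    [(eident x) (if (equal? id x) e (eident x))]
    [(eadd e1 e2) (eadd (subst e1 id e) (subst e2 id e))]
    [(elet letid assignment body)
     ;; ASSUMPTION: the assignment is always substituted into, because it is
     ;; evaluated outside the scope of letid; only the body can be shadowed.
     (elet letid
           (subst assignment id e)
           (if (equal? letid id) body (subst body id e)))]))

;; interp : expr -> number
(define (interp expr)
  (match expr
    [(enum n) n]
    [(eadd e1 e2) (+ (interp e1) (interp e2))]
    [(eident x) (error "runtime error: unbound identifier")]
    [(elet id assignment body)
     (interp (subst body id (enum (interp assignment))))]))

;; Substitution replaces occurrences of "x" and leaves "y" alone.
(check-equal? (subst (eadd (eident "x") (eident "y")) "x" (enum 2))
              (eadd (enum 2) (eident "y")))
;; A shadowing let protects its body from the outer substitution.
(check-equal? (subst (elet "x" (enum 3) (eident "x")) "x" (enum 2))
              (elet "x" (enum 3) (eident "x")))
;; The inner assignment still sees the outer x: 2 + 1 = 3.
(check-equal? (interp (elet "x" (enum 2)
                            (elet "x" (eadd (eident "x") (enum 1))
                                  (eident "x"))))
              3)
;; An identifier with no enclosing declaration is a runtime error.
(check-exn exn:fail? (lambda () (interp (eident "z"))))
```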

Static vs. Dynamic Scope

  • A key property of our scoping rules so far is that they are static
  • This might seem like an obvious requirement, since most languages you have used satisfy it, but in some languages which variables are in scope can depend on the runtime behavior of the program
  • This is an actual Common Lisp program that prints out 5 5:
(defvar x 100)

(defmethod fun1 (x)
  (print x)
  (fun2))

(defmethod fun2 ()
  (print x))

(fun1 5)
  • What is happening!?
    • The defvar command introduces a special global variable x.
    • Then, when fun1 is called, it introduces a variable x into scope, and binds it to the value 5
    • Then, it prints x, which has value 5 so the first 5 gets printed and seems normal.
    • This is where things get really weird. Next, fun2 is called, which takes no arguments. It also prints x, which outputs 5, even though we would surely expect 100 to be printed!
    • This is because variables introduced with defvar are dynamically scoped in Common Lisp: a reference to x refers to the most recent binding of x that is still active while the program runs. The call to fun1 has not returned when fun2 runs, so its binding of x to 5 is the one that fun2 sees.
    • (Note: Common Lisp also supports lexical scope: variables that are not declared special, such as those bound by let, are lexically scoped.)
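  • For comparison, here is a sketch of the same program shape in Racket. Racket is lexically scoped, so the direct translation yields 5 and then 100. Racket does offer an opt-in mechanism with dynamic behavior, parameters (make-parameter and parameterize), and the second half uses it to reproduce the 5 5 result. The names px, pfun1 and pfun2 are made up for this illustration, and the functions return lists instead of printing so the results can be checked:

```racket
#lang racket
(require rackunit)

;; Lexical scope: the x inside fun2 always refers to the top-level x,
;; no matter who calls fun2.
(define x 100)
(define (fun2) x)
(define (fun1 x) (list x (fun2)))
(check-equal? (fun1 5) '(5 100))

;; Opt-in dynamic behavior: a parameter's value is looked up at run time,
;; and parameterize rebinds it for the duration of its body.
(define px (make-parameter 100))
(define (pfun2) (px))
(define (pfun1 v)
  (parameterize ([px v])
    (list (px) (pfun2))))
(check-equal? (pfun1 5) '(5 5))

;; The rebinding ends when parameterize returns.
(check-equal? (px) 100)
```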
  • Dynamic scope is quite unintuitive and almost certainly a bad design choice
    • Ponder: what are some reasons why dynamic scope is undesirable?
    • Ponder: What properties of our evaluator ensure that our scope cannot be dynamic?
  • Historical note: many early Lisp dialects were dynamically scoped, and Emacs Lisp was for most of its history, but few modern languages use dynamic scope by default
  • A nice blogpost on scope for more reading: https://prl.khoury.northeastern.edu/blog/2019/09/05/lexical-and-dynamic-scope/