3 Scope and Substitution
This lecture we will continue to grow our small language. Last time we ended with the if-then-else language with numbers. Today we will add Booleans and local variables to this language.
3.1 IfLang with Booleans
Recall the IfLang interpreter from last lecture. Let’s continue to grow this language by extending it with Booleans, and give the usual semantics for if with Booleans. Here is the abstract syntax:
> (define-type IfLang (numI [value : Number]) (boolI [value : Boolean]) (addI [e1 : IfLang] [e2 : IfLang]) (ifI [guard : IfLang] [thn : IfLang] [els : IfLang]))
Let’s give a semantics to this by a definitional interpreter. Definitional interpreters run programs to values. However, now we have two kinds of values – Booleans and numbers – instead of just one. So, let’s add a value datatype that captures the type of values that IfLang programs can run to:
> (define-type Value (numV [v : Number]) (boolV [v : Boolean]))
Then we can define an interpreter that runs programs to values. However, we have to be careful: there are now invalid operations that this program can perform, like adding a number and a Boolean. When these occur, our interpreter should raise a runtime error. Sometimes we will specify what kind of error should be raised; other times, we can rely on our metalanguage (Plait) to raise a runtime error. Here, we opt to let Plait raise a runtime error:
> (interp-ite : (IfLang -> Value))
> (define (interp-ite e) (type-case IfLang e [(numI n) (numV n)] [(boolI b) (boolV b)] [(addI e1 e2) (let ([v1 (numV-v (interp-ite e1))] [v2 (numV-v (interp-ite e2))]) (numV (+ v1 v2)))] [(ifI guard thn els) (if (boolV-v (interp-ite guard)) (interp-ite thn) (interp-ite els))]))
Notice how if we try to run ((addI (numI 10) (boolI 20))) we get a runtime error.
3.2 Let Bindings
Now let’s move onto adding local variables. Our goal is for local variables to behave somewhat like Plait’s local variables, so let’s see how they behave. In Plait, you can introduce a local variable using the let function, also called a let-binding:
> (let [(myvar 10)] (+ myvar 10)) - Number
20
First, some terminology. The first argument to let is a list of binding expressions that associates identifiers with values. The name of a variable is called an identifier; in the above, myvar is an identifier. We say the identifier myvar is bound to the value 10. The body is the second argument to the let function: in the above, (+ myvar 10) is the body.
What makes a local variable local is that it is not visible outside of the body of the let expression. For instance, the following proram will raise a runtime error if you try to run it:
(let [(myvar 10)] (+ myvar 10)) (+ myvar 20)
The scope of an identifier is the portion of the program where it is visible. In Plait, the scope of an identifier for a let binding is the body of the let expression.
3.2.1 Shadowing
What happens when you try to redefine a local variable that has already been defined? Let’s see:
> (let [(myvar 10)] (let [(myvar 20)] myvar)) - Number
20
What happpens is that the we prefer the inner-most binding in the abstract syntax tree: this is called lexical scope, since the scope of a name depends on its position in the abstract syntax tree. We say that the inner binding code{(myvar 20)} shadows the outer binding (myvar 10). Most (but not all!) languages you’ve used have lexical scope rules that determine the scope of an identifier.
3.3 Substitution
Now let’s move on to understanding how we might go about implementing local variables. First, let’s define an abstract syntax tree for LetLang, a small language with let expressions:
> (define-type LetExpr [numE (n : Number)] [addE (l : LetExpr) (r : LetExpr)] [varE (s : Symbol)] [letE (var : Symbol) (assignment : LetExpr) (body : LetExpr)])
Substitution replaces an identifier in one expression with another expresion; think of it like "find-and-replace". We will give an intuitive semantics for let that involves substituting an identifier for its assignment. Let’s step some LetExpr terms to see this. Assume that we are following the same rules for stepping addE that we saw last lecture, so we can focus on how to step letE.
First example:
(letE 'x (numE 10) (addE (varE 'x) (varE 'x))) -- substitute 'x with (enum 10) --> (addE (numE 10) (numE 10)) -- simplify addE --> 20
In the above, to simplify the let, we have substituted the identifier 'x for the expression (numE 10).
The choice of whether or not to simplify a let binding before substitution has to do with eager versus lazy evaluation. Most languages you use are eager, meaning that expressions are fully simplified to values before they are substituted. But, not all languages behave this way; for instance, Haskell is famously a call-by-need langauge that does not deploy eager evaluation before substitution. We will return to these alternative evaluation schemes in a few lectures.
(letE 'x (addE (numE 10) (numE 20)) (addE (varE 'x) (varE 'x))) --> (letE 'x (numE 30) (addE (varE 'x) (varE 'x))) --> (addE (numE 30) (numE 30)) --> 60
Next, we need to carefully consider how we step in the presence of shadowing so that our lexical scoping rules are respected by the runtime behavior of the program. Concretely, the following sequence of steps should occur:
(letE 'x (numE 10) (letE 'x (numE 20) (varE 'x))) --> (letE 'x (numE 20) (varE 'x)) --> 20
Notice that in the above, after the first step of simplification, we do not replace the innermost (varE 'x) with (numE 10). This is critical so that our LetLang programs follow the lexical scoping convention of referring to the inner-most binding.
3.3.1 Implementing Substitution
Now we are ready to implement a funtion called subst that performs substitution in a way that respects our scoping rules:
In class, we will spend quite a bit of time discussing the various design decisions in this function and how they affect the scoping rules of your program. You should make sure to understand how changing this function affects the scoping rules of your resulting interpreter.
> (subst : (Symbol LetExpr LetExpr -> LetExpr))
> (define (subst id assignment body) (type-case LetExpr body [(numE n) (numE n)] [(addE l r) (addE (subst id assignment l) (subst id assignment r))] [(varE s) (if (symbol=? s id) assignment (varE s))] [(letE innervar innerassignment innerbody) (letE innervar (subst id assignment innerassignment) (if (symbol=? innervar id) innerbody (subst id assignment innerbody)))]))
> (test (subst 'x (varE 'x) (numE 10)) (numE 10))
- Void
good (subst (quote x) (varE (quote x)) (numE 10)) at line 11
expected: (numE 10)
given: (numE 10)
> (test (subst 'x (numE 10) (addE (varE 'x) (varE 'x))) (addE (numE 10) (numE 10)))
- Void
good (subst (quote x) (numE 10) (addE (varE (quote x)) (varE (quote x)))) at line 12
expected: (addE (numE 10) (numE 10))
given: (addE (numE 10) (numE 10))
> (test (subst 'x (numE 10) (letE 'x (numE 10) (varE 'x))) (letE 'x (numE 10) (varE 'x)))
- Void
good (subst (quote x) (numE 10) (letE (quote x) (numE 10) (varE (quote x)))) at line 13
expected: (letE 'x (numE 10) (varE 'x))
given: (letE 'x (numE 10) (varE 'x))
3.4 A Definitional LetLang Interpreter
Now that we have implemented substitution it is straightforward to implementa definitional interpreter that runs LetLang programs to values (i.e., numbers):
> (interp-let : (LetExpr -> Number))
> (define (interp-let e) (type-case LetExpr e [(numE n) n] [(addE l r) (+ (interp-let l) (interp-let r))] [(varE s) (error 'runtime "uninitialized")] [(letE id assignment body) (interp-let (subst id (numE (interp-let assignment)) body))]))
3.5 Dynamic Scope
A key property of our scoping rules so far is that they are static, meaning that the scope of an identifier does not depend on the runtime behavior of a program. This might seem like an obvious requirement – most languages you have used satisfy this requirement – but there are examples of languages where which variables are in-scope can depend on the runtime behavior of a program. This is an actual common Lisp program that prints out 5 5:
(defvar x 100) |
|
(defmethod fun1 (x) |
(print x) |
(fun2)) |
|
(defmethod fun2 () |
(print x)) |
|
(fun1 5) |
What is happening!? The defvar function introduces a special global variable x. Then, when fun1 is called, it introduces a variable x into scope, and binds it to the value 5 Then, it prints x, which has value 5 so the first 5 gets printed and seems normal. This is where things get really weird. Next, fun2 is called, which takes no arguments. It also prints x, which outputs 5, but we surely expect 100 to be printed!
This is because the scoping rules in Common Lisp are dynamic: once introduced, a variable never leaves scope, and hence variables always refers to the most recently declared identifier encountered while running the program! (Note: Common Lisp also has a local-like facility that supports lexical scope) Dynamic scope is quite unintuitive and almost certainly a bad design choice
A nice blogpost on scope for more reading: https://prl.khoury.northeastern.edu/blog/2019/09/05/lexical-and-dynamic-scope/