Lecture 2: More Racket, Calculator Lang

Key concepts and goals for today:

Finish up basics of Racket that we will use throughout the class: lists, structs, first-class functions
Implement our first programming language: calc
- Understand abstract syntax vs. surface syntax
- Be able to draw a parse tree
- Understand host semantics

Lists

Lists are built out of two constructors:
- '(), the empty list value
- (cons hd tl), the list constructor that prepends the element hd to the list tl
For example, we can construct a list of elements 1, 2, 3 by applying cons three times:

> (cons 1 (cons 2 (cons 3 '())))
'(1 2 3)

Note the syntax '(1 2 3), which is read “quote one two three”. This is how Racket renders lists; more on that later.
It is tedious to type cons all the time, so there are a number of shorthand ways to describe lists in Racket:

> (list 1 2 3)
'(1 2 3)
> '(1 2 3)
'(1 2 3)

There are a number of useful built-in functions for lists; the Racket documentation on pairs and lists has the full set.
Here are a few useful ones:

> (define my-list '(1 2 3))
> (empty? my-list)
#f
> (length my-list)
3
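
A few more built-in list functions, as a small sketch. It assumes a #lang racket module with rackunit available for check-equal?; the particular functions shown (first, rest, append, reverse) are just a sample, not a list the course prescribes.

```racket
#lang racket
(require rackunit) ; provides check-equal?

(define my-list '(1 2 3))

; first and rest take a non-empty list apart
(check-equal? (first my-list) 1)
(check-equal? (rest my-list) '(2 3))

; append and reverse build new lists; my-list itself is unchanged
(check-equal? (append my-list '(4 5)) '(1 2 3 4 5))
(check-equal? (reverse my-list) '(3 2 1))
(check-equal? my-list '(1 2 3))
```

Note that first and rest line up with the hd and tl arguments of cons.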

Now that we’ve built lists, we need a way of destructuring them. To do this, we will use the built-in match form:

> (define my-list '(1 2 3))
> (match my-list 
   ['() "empty!"]
   [(cons hd tl) "not empty!"])
"not empty!"

Now we can define some interesting functions involving lists! Here is one that sums all of the elements of a list:

; sum-list: int list -> int
; returns the sum of all elements in the list
(define (sum-list l)
  (match l
    ['() 0]
    [(cons hd tl) (+ hd (sum-list tl))]))

(require rackunit) ; provides check-equal?
(check-equal? (sum-list '()) 0)
(check-equal? (sum-list '(1 2 3)) 6)
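
The same recursive pattern works for other list functions. As a sketch, here is a hand-written length function; the name my-length is hypothetical (Racket's built-in is length), and the pattern _ matches anything without binding a name.

```racket
#lang racket
(require rackunit) ; provides check-equal?

; my-length: list -> int   (hypothetical name; the built-in is length)
; returns the number of elements in the list
(define (my-length l)
  (match l
    ['() 0]                                ; the empty list has no elements
    [(cons _ tl) (+ 1 (my-length tl))]))   ; one element plus the rest

(check-equal? (my-length '()) 0)
(check-equal? (my-length '(1 2 3)) 3)
```

As in sum-list, there is one match clause per list constructor.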

User-defined data-types

Most interesting programs implement their own custom data-types.
An example you will see in the homework is a binary tree. We can build binary trees in Racket as follows:

;;; type tree =
;;;   | node of tree * tree
;;;   | leaf of number
(struct node (l r) #:transparent)
(struct leaf (x) #:transparent)

Note the structure of the comment that we used to describe this tree type. You read this as: “A tree is either:
1. a node that is a pair of trees; or
2. a leaf that is a number.”
We will see more examples of writing these kinds of comments.
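The #:transparent syntax is boilerplate: it tells Racket to show the struct's fields when the struct is printed (for example, in the DrRacket REPL), and it lets equal? compare two structs field by field. If you’re curious, see the Racket documentation on structure types.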
Now we can build binary trees:

> (leaf 10)
(leaf 10)
> (node (leaf 20) (leaf 30))
(node (leaf 20) (leaf 30))

To destructure your structs and manipulate them, you should use pattern matching:

> (define my-tree (node (leaf 10) (leaf 20)))
> (match my-tree
    [(leaf n) n]
    [(node l r) l])
(leaf 10)

Experiment with matching to get a feel for it.
The pattern matching section of the Racket documentation has the details and many more examples.
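
Pattern matching combines naturally with recursion over trees. The following is a minimal sketch: count-leaves is a hypothetical helper, not something defined in the course materials, and it has one match clause per case of the tree type.

```racket
#lang racket
(require rackunit) ; provides check-equal?

;;; type tree =
;;;   | node of tree * tree
;;;   | leaf of number
(struct node (l r) #:transparent)
(struct leaf (x) #:transparent)

;;; count-leaves : tree -> number   (hypothetical helper)
;;; returns the number of leaves in the tree
(define (count-leaves t)
  (match t
    [(leaf _) 1]
    [(node l r) (+ (count-leaves l) (count-leaves r))]))

(check-equal? (count-leaves (leaf 10)) 1)
(check-equal? (count-leaves (node (leaf 10) (node (leaf 20) (leaf 30)))) 3)
```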

Local variables

Local variables are declared with the built-in let form:

> (let ([x 10]) (+ x 20))
30

There are a few different syntactic forms of let that offer different conveniences while programming; we will introduce those later as needed. If you are curious, see the section on local binding in the Racket reference.
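
As a small preview, here is a sketch of two of those forms: a let with several bindings, and let*, which binds in sequence. The variable names are arbitrary.

```racket
#lang racket
(require rackunit) ; provides check-equal?

; one let can bind several variables at once
(check-equal? (let ([x 10] [y 20]) (+ x y)) 30)

; let* binds in sequence, so a later binding can refer to an earlier one
(check-equal? (let* ([x 10] [y (* x 2)]) (+ x y)) 30)

; lets can be nested; the inner x shadows the outer x inside the inner body
(check-equal? (let ([x 1]) (let ([x 2]) x)) 2)
```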

First-class functions

We saw last time that every Racket program is either:
- A value, which is a number, Boolean, or string
- A function call, which is written (func-name arg1 arg2 ... argn).

There is another kind of Racket value that we will use quite often: functions. We declare a function value as follows:

> (lambda (x) (+ 1 x))
#<procedure>

We will refer to these as λ-terms or λ-expressions. We can call a λ-term in the usual way we call functions in Racket:

> ((lambda (x) (+ x 1)) 5)
6

Functions can be passed as arguments to other functions, just like any other Racket value. For example, the following defines a function call-twice that takes a function f and applies it twice, starting from an initial argument k:

> (define (call-twice f k) (f (f k)))
> (call-twice (lambda (x) (* 2 x)) 2) ; computes (* 2 (* 2 2))
8

Functions can also be returned. Let’s make a function make-adder k that makes a function that adds k to whatever it is called with:

> (define (make-adder k) (lambda (x) (+ x k)))
> (define add-5 (make-adder 5))
> (add-5 10)
15
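
Passing functions as arguments pairs well with the list recursion from earlier. As a sketch, the hypothetical function apply-to-all below applies f to every element of a list; Racket's built-in map does the same job for a single list.

```racket
#lang racket
(require rackunit) ; provides check-equal?

; apply-to-all: (a -> b) * a list -> b list   (hypothetical name)
; returns a new list with f applied to each element of l
(define (apply-to-all f l)
  (match l
    ['() '()]
    [(cons hd tl) (cons (f hd) (apply-to-all f tl))]))

(define (make-adder k) (lambda (x) (+ x k)))

(check-equal? (apply-to-all (lambda (x) (* 2 x)) '(1 2 3)) '(2 4 6))
(check-equal? (apply-to-all (make-adder 5) '(1 2 3)) '(6 7 8))
; the built-in map gives the same result here
(check-equal? (map (make-adder 5) '(1 2 3)) '(6 7 8))
```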

There are a few more Racket features we will use in this class, but not too many more. They will be explained as they are encountered.

Calculator Lang: your first programming language

Abstract syntax trees

Recall: A programming language consists of two components:
- Syntax: text that describes programs
- Semantics: the meaning of the program
There are many different kinds of syntax, even for simple operations like addition:
- s-expressions, such as Racket: (+ 1 2)
- infix, like in Python: 1 + 2
- postfix, like in Forth: 1 2 +
All of these syntactic forms are essentially equivalent: they represent the same operation, adding 1 and 2.
Our first step on our journey to defining the calculator language calc is to abstract our notion of syntax.
- To keep things clear, we will typically refer to the textual version of syntax as surface syntax.
Definition: An abstract syntax tree (AST) is a tree-like data structure for representing syntax.
- The internal nodes of the tree are called non-terminal nodes (or production nodes).
- The leaf nodes of the tree are called terminal nodes.
The process of converting surface syntax into abstract syntax is called parsing.
Example: we want to parse all of the above ways of writing 1 + 2 into the same AST, with internal node + and two terminal nodes 1 and 2:

  +
 / \
1   2

Example: ASTs are useful for disambiguating the order of operations. For instance, under the usual precedence rules, the expression 2 * 3 + 4 can be unambiguously written as an AST:

    +
   / \
  *   4
 / \
2   3

Note that ASTs don’t require the use of parentheses to disambiguate the order of operations.

Syntax of `calc`

Now we can describe the syntax of our calculator language.
Goal: Design a small programming language that can add and multiply numbers
Example surface-syntax programs in infix notation:
1. 2
2. 1 + 2
3. 2 * (3 + 4)
We will represent the abstract syntax of these programs using the following AST data structure in Racket:

;;; type expr =
;;;   | add of expr * expr
;;;   | mul of expr * expr
;;;   | num of number
(struct add (e1 e2) #:transparent)
(struct mul (e1 e2) #:transparent)
(struct num (n) #:transparent)

Now we can write example ASTs for each of the above programs:
1. (num 2)
2. (add (num 1) (num 2))
3. (mul (num 2) (add (num 3) (num 4)))
We will be working directly with ASTs for now; we will return to the problem of parsing later.
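
To see the relationship between abstract and surface syntax from the other direction, here is a sketch that renders an AST back into infix surface syntax. The helper expr->infix is hypothetical and always emits full parentheses, so it never has to reason about precedence.

```racket
#lang racket
(require rackunit) ; provides check-equal?

(struct add (e1 e2) #:transparent)
(struct mul (e1 e2) #:transparent)
(struct num (n) #:transparent)

;;; expr->infix : expr -> string   (hypothetical helper)
;;; renders an AST as fully parenthesized infix surface syntax
(define (expr->infix e)
  (match e
    [(num n) (number->string n)]
    [(add e1 e2) (string-append "(" (expr->infix e1) " + " (expr->infix e2) ")")]
    [(mul e1 e2) (string-append "(" (expr->infix e1) " * " (expr->infix e2) ")")]))

(check-equal? (expr->infix (num 2)) "2")
(check-equal? (expr->infix (add (num 1) (num 2))) "(1 + 2)")
(check-equal? (expr->infix (mul (num 2) (add (num 3) (num 4)))) "(2 * (3 + 4))")
```

A similar function could emit s-expression or postfix text from the same AST, which is the sense in which the AST abstracts over surface syntax.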

Semantics of `calc`

The goal of semantics is to describe what programs mean.
What does this program mean: (add (num 1) (num 2))?
- Intuitively you might say it means “add 1 to 2”: the meaning of this program is to run the program to evaluate it to a particular value. This is called interpreting the program.
- But what do “add”, “1”, and “2” mean? Is it binary addition? Real-number addition?
We are left with a circularity problem: to assign a meaning to our program, we need to use some external language to define its meaning.
Definition: The host language is the language used to assign meaning to programs.
We will use Racket as our host language for calc: we will use Racket’s definition of numbers to interpret numbers, and Racket’s definition of addition to interpret add.
Semantics of calc:
- (num n): evaluates to the Racket number n
- (add e1 e2):
  1. evaluate e1 to a Racket number v1
  2. evaluate e2 to a Racket number v2
  3. return the Racket addition (+ v1 v2)
- The semantics of mul is similar to add, except with Racket multiplication instead of addition.
We can implement the above semantics as a program called an interpreter:

;;; interp : expr -> number
;;; evaluates a calc expression to a number
(define (interp e)
  (match e
    [(add e1 e2) (+ (interp e1) (interp e2))]
    [(mul e1 e2) (* (interp e1) (interp e2))]
    [(num n) n]))

(check-equal? (interp (num 1)) 1)
(check-equal? (interp (add (num 10) (num 20))) 30)
(check-equal? (interp (mul (num 10) (num 20))) 200)
(check-equal? (interp (add (mul (num 1) (num 2)) (num 3))) 5)

This interpreter defines the semantics for calc: it gives a meaning to all calc programs in terms of Racket programs.
There are many other ways we could have programmed this interpreter: we have chosen just one possible implementation.
Ponder: What are the consequences of our choice of host language?
- What if we had chosen C instead of Racket to implement our interpreter? What are some programs that would behave differently?
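
One possible starting point for this question, as a sketch: because interp reuses Racket's + and *, calc inherits Racket's numbers, including arbitrarily large integers and exact fractions. The specific test values below are illustrative choices, not examples from the lecture.

```racket
#lang racket
(require rackunit) ; provides check-equal?

(struct add (e1 e2) #:transparent)
(struct mul (e1 e2) #:transparent)
(struct num (n) #:transparent)

;;; interp : expr -> number
(define (interp e)
  (match e
    [(add e1 e2) (+ (interp e1) (interp e2))]
    [(mul e1 e2) (* (interp e1) (interp e2))]
    [(num n) n]))

; 2^32 * 2^32 = 2^64, which Racket represents exactly
(check-equal? (interp (mul (num 4294967296) (num 4294967296)))
              18446744073709551616)

; Racket also has exact rational numbers
(check-equal? (interp (add (num 1/2) (num 1/3))) 5/6)
```

The first result does not fit in a 64-bit integer, so an interpreter that stored calc numbers in a fixed-width C integer type could not produce it.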

Parsing s-expressions

Now we will develop an improved surface syntax for calc that is easier to use than manually writing AST nodes.
We will use s-expressions as the basis for our new surface syntax.
Definition: An s-expression is either:
1. An atom, such as a number (written 1) or a symbol (written 'a, read “quote a”)
2. A list of s-expressions, written '(a b c)
Racket’s syntax is based on s-expressions, which makes them very easy to work with. For example, we can easily generate s-expressions representing different calc programs:

> '(+ 1 2)
'(+ 1 2)
> '(* (+ 1 2) 3)
'(* (+ 1 2) 3)

Note that these s-expressions are not yet calc AST nodes!
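
To make this concrete, here is a sketch that inspects a quoted s-expression with ordinary list functions. It shows that the quoted program is just nested list data made of symbols and numbers; the variable name s is arbitrary.

```racket
#lang racket
(require rackunit) ; provides check-equal?

(define s '(* (+ 1 2) 3))

; the whole thing is a three-element list
(check-equal? (list? s) #t)
(check-equal? (length s) 3)

; the operator is a symbol, not a function
(check-equal? (first s) '*)
(check-equal? (symbol? (first s)) #t)

; quoting prevents evaluation: the second element is the list (+ 1 2), not 3
(check-equal? (second s) '(+ 1 2))
(check-equal? (number? (third s)) #t)
```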
To translate these s-expressions into calc AST, we need a parser:

;;; parse-sexpr: sexpr -> expr
;;; parses an s-expression into an expression
;;;   this comment describes the surface syntax of our language using Backus-Naur Form (BNF).
;;;   we will discuss BNF a bit more later on once we have more complicated languages.
;;;   <sexpr> ::= <num> | (+ <sexpr> <sexpr>) | (* <sexpr> <sexpr>)
(define (parse-sexpr s)
  (match s
    [(list op s1 s2)
     (cond
       [(equal? op '+) (add (parse-sexpr s1) (parse-sexpr s2))]
       [(equal? op '*) (mul (parse-sexpr s1) (parse-sexpr s2))]
       [else (error "parse error: invalid operation")])]
    [v (if (number? v) (num v) (error "parse error: not a number"))]))

(check-equal? (parse-sexpr '(+ 1 2)) (add (num 1) (num 2)))
(check-equal? (parse-sexpr '(+ (* 3 4) 2)) (add (mul (num 3) (num 4)) (num 2)))

;;; parse-and-run : sexpr -> number
;;; parses an s-expression into an expr and then runs it
(define (parse-and-run s)
  (interp (parse-sexpr s)))

(check-equal? (parse-and-run '(+ 1 2)) 3)
(check-equal? (parse-and-run '(+ (* 3 4) 2)) 14)
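
It is also worth checking what the parser does on input outside the grammar. The sketch below repeats the parser (with else in place of #t in the cond, which behaves the same) and uses rackunit's check-exn to confirm that malformed input raises an error; the specific bad inputs are illustrative choices.

```racket
#lang racket
(require rackunit) ; provides check-equal? and check-exn

(struct add (e1 e2) #:transparent)
(struct mul (e1 e2) #:transparent)
(struct num (n) #:transparent)

;;; parse-sexpr: sexpr -> expr
(define (parse-sexpr s)
  (match s
    [(list op s1 s2)
     (cond
       [(equal? op '+) (add (parse-sexpr s1) (parse-sexpr s2))]
       [(equal? op '*) (mul (parse-sexpr s1) (parse-sexpr s2))]
       [else (error "parse error: invalid operation")])]
    [v (if (number? v) (num v) (error "parse error: not a number"))]))

; an operator that is not in the grammar
(check-exn exn:fail? (lambda () (parse-sexpr '(- 1 2))))
; a bare symbol is not a number
(check-exn exn:fail? (lambda () (parse-sexpr 'x)))
; the wrong number of arguments falls through to the last match clause
(check-exn exn:fail? (lambda () (parse-sexpr '(+ 1))))
; a bare number is a valid program
(check-equal? (parse-sexpr 7) (num 7))
```

Note that '(+ 1) is reported as “not a number” because it does not match the three-element list pattern; a more careful parser could give a more specific message.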

Next time

Conditionals (if-then-else)
Let language, scope, and substitution