Lecture 2: More Racket, Calculator Lang

Key concepts and goals for today:

  • Finish up basics of Racket that we will use throughout the class: lists, structs, first-class functions
  • Implement our first programming language: calc
    • Understand abstract syntax vs. surface syntax
    • Be able to draw a parse tree
    • Understand host semantics

Lists

  • Lists are built out of two constructors:
    • '(), the empty list value
    • (cons hd tl), the list constructor that prepends the element hd to the list tl
  • For example, we can construct a list of elements 1, 2, 3 by applying cons three times:
> (cons 1 (cons 2 (cons 3 '())))
'(1 2 3)
  • Note the syntax '(1 2 3), which is read “quote one two three”. This is how Racket renders lists; more on that later
  • It is tedious to type cons all the time, so there are a number of shorthand ways to write lists in Racket:
> (list 1 2 3)
'(1 2 3)
> '(1 2 3)
'(1 2 3)
  • There are a number of useful built-in functions for lists; the Racket reference section on pairs and lists describes all of them
  • Here are some examples of some useful ones:
> (define my-list '(1 2 3))
> (empty? my-list)
#f
> (length my-list)
3
  • Now that we’ve built lists, we need a way of destructuring them. To do this, we will use the built-in match form:
> (define my-list '(1 2 3))
> (match my-list
   ['() "empty!"]
   [(cons hd tl) "not empty!"])
"not empty!"
  • Now we can define some interesting functions involving lists! Here is one that sums all of the elements of a list:
; sum-list: int list -> int
; returns the sum of all elements in the list
(define (sum-list l)
  (match l
    ['() 0]
    [(cons hd tl) (+ hd (sum-list tl))]))

(require rackunit) ; provides check-equal?
(check-equal? (sum-list '()) 0)
(check-equal? (sum-list '(1 2 3)) 6)
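  • The same match-based recursion pattern used in sum-list covers many other list functions. The sketch below is illustrative only: my-length and double-all are hypothetical names introduced here, and the block assumes it is run as its own file starting with #lang racket. It also exercises a few of the built-in list functions (first, rest, append).

```racket
#lang racket
(require rackunit)

; my-length: list -> int
; counts the elements of a list (hand-written version of the built-in length)
(define (my-length l)
  (match l
    ['() 0]
    [(cons _ tl) (+ 1 (my-length tl))]))

; double-all: int list -> int list
; returns a new list in which every element is doubled
(define (double-all l)
  (match l
    ['() '()]
    [(cons hd tl) (cons (* 2 hd) (double-all tl))]))

(check-equal? (my-length '(1 2 3)) (length '(1 2 3)))
(check-equal? (double-all '(1 2 3)) '(2 4 6))

; a few built-ins: first element, everything after it, and concatenation
(check-equal? (first '(1 2 3)) 1)
(check-equal? (rest '(1 2 3)) '(2 3))
(check-equal? (append '(1 2) '(3)) '(1 2 3))
```

  • Note that both functions have one match clause per list constructor: one for '() and one for cons.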

User-defined data-types

  • Most interesting programs implement their own custom data-types.
  • An example you will see in the homework is a binary tree. We can build binary trees in Racket as follows:
;;; type tree =
;;;   | node of tree * tree
;;;   | leaf of number
(struct node (l r) #:transparent)
(struct leaf (x) #:transparent)
  • Note the structure of the comment that we used to describe this tree type. You read this as: “A tree is either:
    1. A node that is a pair of trees; or
    2. A leaf that is a number.”
  • We will see more examples of writing these kinds of comments.
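  • The #:transparent syntax is boilerplate: it makes the struct’s fields visible, so the DrRacket REPL can print them and equal? (and therefore check-equal?) compares two structs field by field. If you’re curious, see the Racket documentation on structs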
  • Now we can build binary trees:
> (leaf 10)
(leaf 10)
> (node (leaf 20) (leaf 30))
(node (leaf 20) (leaf 30))
  • To destructure your structs and manipulate them, you should use pattern matching:
> (define my-tree (node (leaf 10) (leaf 20)))
> (match my-tree
    [(leaf n) n]
    [(node l r) l])
(leaf 10)
  • Experiment with matching to get a feel for it
  • The Racket documentation for pattern matching covers match in detail and has many more examples.
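  • Pattern matching combines naturally with recursion over trees. As a minimal sketch (sum-tree and count-leaves are hypothetical names introduced for illustration, not functions from the course materials), each function below has one clause per case of the tree type:

```racket
#lang racket
(require rackunit)

;;; type tree =
;;;   | node of tree * tree
;;;   | leaf of number
(struct node (l r) #:transparent)
(struct leaf (x) #:transparent)

;;; sum-tree: tree -> number
;;; adds up the numbers stored in all leaves
(define (sum-tree t)
  (match t
    [(leaf n) n]
    [(node l r) (+ (sum-tree l) (sum-tree r))]))

;;; count-leaves: tree -> number
;;; counts how many leaves the tree has
(define (count-leaves t)
  (match t
    [(leaf _) 1]
    [(node l r) (+ (count-leaves l) (count-leaves r))]))

(define example-tree (node (leaf 10) (node (leaf 20) (leaf 30))))
(check-equal? (sum-tree example-tree) 60)
(check-equal? (count-leaves example-tree) 3)
```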

Local variables

  • Local variables are declared with the built-in let form:
> (let ([x 10]) (+ x 20))
30
  • There are a few different syntactic forms of let that offer different conveniences while programming; we will introduce those later as needed. If you are curious, see the section on local binding in the Racket reference

First-class functions

We saw last time that every Racket program is either:

  • A value, which is a number, Boolean, or string
  • A function call, which is written (func-name arg1 arg2 ... argn).

There is another kind of Racket value that we will use quite often: functions. We create a function value as follows:

> (lambda (x) (+ 1 x))
#<procedure>

We will refer to these as λ-terms or λ-expressions. We can call a λ-term in the usual way we call functions in Racket:

> ((lambda (x) (+ x 1)) 5)
6
  • Functions can be passed as arguments to other functions, just like any other Racket value. For example, the following defines a function call-twice that takes a function f and applies it twice, starting from an initial argument k:
> (define (call-twice f k) (f (f k)))
> (call-twice (lambda (x) (* 2 x)) 2) ; computes (* 2 (* 2 2))
8
  • Functions can also be returned. Let’s make a function make-adder k that makes a function that adds k to whatever it is called with:
> (define (make-adder k) (lambda (x) (+ x k)))
> (define add-5 (make-adder 5))
> (add-5 10)
15
  • There are a few more Racket features we will use in this class, but not too many more. They will be explained as they are encountered.
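  • Passing and returning functions can be combined. The following sketch is illustrative only: compose-two is a hypothetical helper introduced here (Racket also ships a built-in compose), while map and filter are standard list functions that take a function as an argument.

```racket
#lang racket
(require rackunit)

; compose-two: (b -> c) (a -> b) -> (a -> c)
; returns a new function that applies g first and then f
(define (compose-two f g)
  (lambda (x) (f (g x))))

; add 1, then double the result
(define add-1-then-double
  (compose-two (lambda (x) (* 2 x))
               (lambda (x) (+ x 1))))

(check-equal? (add-1-then-double 5) 12)

; built-in functions that take functions as arguments
(check-equal? (map (lambda (x) (* x x)) '(1 2 3)) '(1 4 9))
(check-equal? (filter even? '(1 2 3 4)) '(2 4))
```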

Calculator Lang: your first programming language

Abstract syntax trees

  • Recall: A programming language consists of two components:
    • Syntax: text that describes programs
    • Semantics: the meaning of the program
  • There are many different kinds of syntax, even for simple operations like addition:
    • s-expressions, such as Racket: (+ 1 2)
    • infix, like in Python: 1 + 2
    • postfix, like in Forth: 1 2 +
  • All of these syntactic forms are essentially equivalent: they represent the same operation, adding 1 and 2.
  • Our first step on our journey to defining the calculator language calc is to abstract our notion of syntax.
    • To keep things clear, we will typically refer to the textual version of syntax as surface syntax.
  • Definition: An abstract syntax tree (AST) is a tree-like data structure for representing syntax.
    • The internal nodes of the tree are called non-terminal nodes (or production nodes).
    • The leaf nodes of the tree are called terminal nodes.
  • The process of converting surface syntax into abstract syntax is called parsing.
  • Example: we want to parse all three of the above forms of 1 + 2 into the same AST, with internal node + and two terminal nodes 1 and 2:
  +
 / \
1   2
  • Example: ASTs are useful for disambiguating the order of operations. For instance, the expression 2 * 3 + 4 can be unambiguously written as an AST:
    +
   / \
  *   4
 / \
2   3
  • Note that ASTs don’t require the use of parentheses to disambiguate the order of operations.

Syntax of calc

  • Now we can describe the syntax of our calculator language
  • Goal: Design a small programming language that can add and multiply numbers
  • Example surface-syntax programs in infix notation:
    1. 2
    2. 1 + 2
    3. 2 * (3 + 4)
  • We will represent the abstract syntax of these programs using the following AST data structure in Racket:
;;; type expr =
;;;   | add of expr * expr
;;;   | mul of expr * expr
;;;   | num of number
(struct add (e1 e2) #:transparent)
(struct mul (e1 e2) #:transparent)
(struct num (n) #:transparent)
  • Now we can write example ASTs for each of the above programs:
    1. (num 2)
    2. (add (num 1) (num 2))
    3. (mul (num 2) (add (num 3) (num 4)))
  • We will be working directly with ASTs for now; we will return to the problem of parsing later.
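  • Because ASTs are ordinary Racket structs, we can build, compare, and traverse them like any other data. The sketch below is illustrative: the names e-a, e-b, and expr-size are hypothetical and introduced only for this example. It shows that the two ways of parenthesizing 2 * 3 + 4 produce different trees, and that a function over ASTs has one match clause per kind of node.

```racket
#lang racket
(require rackunit)

;;; type expr =
;;;   | add of expr * expr
;;;   | mul of expr * expr
;;;   | num of number
(struct add (e1 e2) #:transparent)
(struct mul (e1 e2) #:transparent)
(struct num (n) #:transparent)

; (2 * 3) + 4: the root of the tree is add
(define e-a (add (mul (num 2) (num 3)) (num 4)))
; 2 * (3 + 4): the root of the tree is mul
(define e-b (mul (num 2) (add (num 3) (num 4))))

; the two trees are different values
(check-false (equal? e-a e-b))
(check-true (add? e-a))
(check-true (mul? e-b))
; struct accessors pull out sub-trees
(check-equal? (add-e2 e-a) (num 4))

;;; expr-size: expr -> number
;;; counts the nodes (terminal and non-terminal) in an AST
(define (expr-size e)
  (match e
    [(num _) 1]
    [(add e1 e2) (+ 1 (expr-size e1) (expr-size e2))]
    [(mul e1 e2) (+ 1 (expr-size e1) (expr-size e2))]))

(check-equal? (expr-size e-a) 5)
```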

Semantics of calc

  • The goal of semantics is to describe what programs mean
  • What does this program mean: (add (num 1) (num 2))?
    • Intuitively you might say it means “add 1 to 2”: to find the meaning of the program, we run it and see which value it evaluates to. This is called interpreting the program.
    • But what do “add”, “1”, and “2” mean? Is it binary addition? Real-number addition?
  • We are left with a circularity problem: to assign a meaning to our program, we need to use some external language to define its meaning.
  • Definition: The host language is the language used to assign meaning to programs.
  • We will use Racket as our host language for calc: we will use Racket’s definition of numbers to interpret numbers, and Racket’s definition of addition to interpret add.
  • Semantics of calc:
    • (num n): evaluates to the Racket number n
    • (add e1 e2):
      1. evaluate e1 to a Racket number v1
      2. evaluate e2 to a Racket number v2
      3. return the Racket addition (+ v1 v2)
    • The semantics of mul is similar to add, except with Racket multiplication instead of addition.
  • We can implement the above semantics as a program called an interpreter:
;;; interp : expr -> number
;;; evaluates a calc expression to a number
(define (interp e)
  (match e
    [(add e1 e2) (+ (interp e1) (interp e2))]
    [(mul e1 e2) (* (interp e1) (interp e2))]
    [(num n) n]))

(check-equal? (interp (num 1)) 1)
(check-equal? (interp (add (num 10) (num 20))) 30)
(check-equal? (interp (mul (num 10) (num 20))) 200)
(check-equal? (interp (add (mul (num 1) (num 2)) (num 3))) 5)
  • This interpreter defines the semantics for calc: it gives a meaning to all calc programs in terms of Racket programs.
  • There are many other ways to have programmed this interpreter: we have chosen just one possible implementation
  • Ponder: What are the consequences of our choice of host language?
    • What if we had chosen C instead of Racket to implement our interpreter? What are some programs that would behave differently?
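  • To illustrate the point above that the interpreter could have been programmed in other ways, here is one possible alternative, offered only as a sketch. The name interp-cond is hypothetical; it uses the predicates (num?, add?, mul?) and field accessors (num-n, add-e1, ...) that struct generates automatically, instead of match. It implements the same semantics as the match-based interp.

```racket
#lang racket
(require rackunit)

(struct add (e1 e2) #:transparent)
(struct mul (e1 e2) #:transparent)
(struct num (n) #:transparent)

;;; interp-cond : expr -> number
;;; evaluates a calc expression to a number using predicates and accessors
(define (interp-cond e)
  (cond
    [(num? e) (num-n e)]
    [(add? e) (+ (interp-cond (add-e1 e)) (interp-cond (add-e2 e)))]
    [(mul? e) (* (interp-cond (mul-e1 e)) (interp-cond (mul-e2 e)))]
    [else (error "interp-cond: not a calc expression")]))

(check-equal? (interp-cond (num 1)) 1)
(check-equal? (interp-cond (add (num 10) (num 20))) 30)
(check-equal? (interp-cond (mul (num 10) (num 20))) 200)
(check-equal? (interp-cond (add (mul (num 1) (num 2)) (num 3))) 5)
```

  • The match version is shorter and names the sub-expressions directly in each pattern, which is one reason to prefer it.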

Parsing s-expressions

  • Now we will develop an improved surface syntax for calc that is easier to use than manually writing AST nodes
  • We will use s-expressions as the basis for our new surface syntax
  • Definition: An s-expression is either:
    1. An atom: a symbol, written 'a (read “quote a”), or a literal such as the number 1
    2. A list of s-expressions, written '(a b c)
  • Racket’s syntax is based on s-expressions, which makes them very easy to work with. For example, we can easily write s-expressions that represent different calc programs:
> '(+ 1 2)
'(+ 1 2)
> '(* (+ 1 2) 3)
'(* (+ 1 2) 3)
  • Note that these s-expressions are not yet calc AST nodes!
  • To translate these s-expressions into calc AST, we need a parser:
;;; parse-sexpr: sexpr -> expr
;;; parses an s-expression into an expression
;;;   this comment describes the surface syntax of our language using Backus-Naur Form (BNF).
;;;   we will discuss BNF a bit more later on once we have more complicated languages.
;;;   <sexpr> ::= <num> | (+ <sexpr> <sexpr>) | (* <sexpr> <sexpr>)
(define (parse-sexpr s)
  (match s
    [(list op s1 s2)
     (cond
       [(equal? op '+) (add (parse-sexpr s1) (parse-sexpr s2))]
       [(equal? op '*) (mul (parse-sexpr s1) (parse-sexpr s2))]
       [else (error "parse error: invalid operation")])]
    [v (if (number? v) (num v) (error "parse error: not a number"))]))

(check-equal? (parse-sexpr '(+ 1 2)) (add (num 1) (num 2)))
(check-equal? (parse-sexpr '(+ (* 3 4) 2)) (add (mul (num 3) (num 4)) (num 2)))

;;; parse-and-run : sexpr -> number
;;; parses an s-expression into an expr and then runs it
(define (parse-and-run s)
  (interp (parse-sexpr s)))

(check-equal? (parse-and-run '(+ 1 2)) 3)
(check-equal? (parse-and-run '(+ (* 3 4) 2)) 14)
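
  • It is also worth checking how the parser behaves on malformed input. The sketch below repeats the struct and parse-sexpr definitions from these notes so that it runs on its own, and uses rackunit’s check-exn to confirm that each bad input raises an error. The specific bad inputs are examples chosen for illustration.

```racket
#lang racket
(require rackunit)

(struct add (e1 e2) #:transparent)
(struct mul (e1 e2) #:transparent)
(struct num (n) #:transparent)

;;; parse-sexpr: sexpr -> expr (same logic as in the notes above)
(define (parse-sexpr s)
  (match s
    [(list op s1 s2)
     (cond
       [(equal? op '+) (add (parse-sexpr s1) (parse-sexpr s2))]
       [(equal? op '*) (mul (parse-sexpr s1) (parse-sexpr s2))]
       [else (error "parse error: invalid operation")])]
    [v (if (number? v) (num v) (error "parse error: not a number"))]))

; nested expressions parse recursively
(check-equal? (parse-sexpr '(* (+ 1 2) (* 3 4)))
              (mul (add (num 1) (num 2)) (mul (num 3) (num 4))))

; unknown operator: matches the three-element list pattern, then fails in cond
(check-exn exn:fail? (lambda () (parse-sexpr '(- 1 2))))
; wrong number of arguments: falls through to the second clause, which expects a number
(check-exn exn:fail? (lambda () (parse-sexpr '(+ 1))))
; a bare symbol is not a number
(check-exn exn:fail? (lambda () (parse-sexpr 'x)))
```

  • Note that '(+ 1) is reported as “not a number” because it does not match the three-element list pattern; a more careful parser could report such cases separately.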

Next time

  • Conditionals (if-then-else)
  • Let language, scope, and substitution