Logistics:
- If you’re having trouble installing OCaml and/or don’t like using the online virtual environment, you can run a virtual machine by following these instructions
- Homework is due Wednesday; the next homework will be released Wednesday and is due the following Wednesday (March 27)
This week: making something that looks a lot like Python
Big question: how does language design affect what kinds of bugs you can have or programs you can write?

Dynamic typechecking and compiling to tiny assembly

So far we’ve been discussing static typechecking, where types are determined without running the program
Some programming languages offer dynamic typechecking instead of, or in addition to, static typechecking: dynamic typechecking checks at runtime that values have the correct type
An example is Python, which will determine at runtime whether or not you can perform an operation on two pieces of data:

>>> 1 + "two"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

How is dynamic typechecking implemented in practice? Let’s see!
Let’s consider the tiny calculator language:

type calc =
  | Num of int
  | AddInt of calc * calc
  | Bool of bool
  | And of calc * calc

In the semantics of calc, adding non-numbers or conjoining non-Booleans causes a runtime error.
Now, let’s imagine we want to implement a compiler that compiles calc programs into a tiny assembly language that does not have Booleans
- Terminology: A compiler translates programs written in a source language into a target language, typically in a way that preserves the semantics of the source language.
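To make these semantics concrete, here is a minimal sketch of a reference interpreter for calc. The value type calcv with a VNum constructor appears later in these notes; the VBool constructor, the Runtime_error exception, and the function name interp are assumptions made for illustration.

```ocaml
(* the calc syntax from above, repeated so this sketch is self-contained *)
type calc =
  | Num of int
  | AddInt of calc * calc
  | Bool of bool
  | And of calc * calc

(* result of evaluating a calc program; VBool is an assumed constructor *)
type calcv =
  | VNum of int
  | VBool of bool

(* hypothetical exception standing in for "runtime error" *)
exception Runtime_error of string

let rec interp (c : calc) : calcv =
  match c with
  | Num n -> VNum n
  | Bool b -> VBool b
  | AddInt (e1, e2) ->
    (* addition is only defined on two numbers *)
    (match interp e1, interp e2 with
     | VNum n1, VNum n2 -> VNum (n1 + n2)
     | _ -> raise (Runtime_error "AddInt expects two numbers"))
  | And (e1, e2) ->
    (* conjunction is only defined on two Booleans *)
    (match interp e1, interp e2 with
     | VBool b1, VBool b2 -> VBool (b1 && b2)
     | _ -> raise (Runtime_error "And expects two Booleans"))
```

Under this sketch, interp (AddInt (Num 1, Bool true)) raises Runtime_error, which is the behavior a compiler for calc should preserve.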
- This situation is just like in Python: Python works by compiling programs into a bytecode that looks a lot like this language we are about to make
- Sometimes we refer to compilation targets like asm as abstract machines: they look very similar to a von-Neumann-style CPU architecture
Here is the syntax of asm:

type asm =
  (* load heap[addr] into register[reg] *)
  | Load of { reg: int; addr: int }
  (* store register[reg] into heap[addr] *)
  | Store of { reg : int; addr: int }
  (* set register[reg] = value *)
  | Setreg of { reg: int; value: int }
  (* trap v: gives a runtime error if register[0] != v *)
  | Trap of int
  (* set register[0] = register[1] + register[2] *)
  | AddInt
  (* set register[0] = register[1] * register[2] *)
  | MulInt
  (* terminate *)
  | Ret

asm programs look radically different from any programs we’ve seen so far: they are a list of asm instructions that get executed in sequence.
asm is a tiny idealized subset of a typical x86 assembly language; this is fairly close to what your CPU actually runs when it is executing code
The semantics of TinyAsm consists of three components:
- Registers, which act as working memory and facilitate operations like addition.
- Heap, which maps addresses to numbers.
- Instruction counter, which tracks which instruction to execute next.
These components are wrapped up in a record:

type state = {
  reg: int array;
  heap: int array;
  program: asm array;
  (* instruction counter *)
  insn: int ref
}

The array type is a mutable array (see the documentation); we will see some examples of how to use it
An asm program runs each instruction in sequence until it encounters Ret, at which point it returns the contents of register 0
All memory and registers are initialized to -1
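As a rough sketch of how such a state might be built with OCaml's mutable arrays: the helper name init_state and the sizes num_regs and heap_size are assumptions (the notes do not fix how many registers or heap cells exist).

```ocaml
(* asm and state as defined above, repeated so this sketch is self-contained *)
type asm =
  | Load of { reg : int; addr : int }
  | Store of { reg : int; addr : int }
  | Setreg of { reg : int; value : int }
  | Trap of int
  | AddInt
  | MulInt
  | Ret

type state = {
  reg : int array;
  heap : int array;
  program : asm array;
  insn : int ref;
}

(* hypothetical sizes: three registers are enough for the instructions above *)
let num_regs = 3
let heap_size = 64

(* build a fresh state: every register and heap cell starts at -1,
   and the instruction counter starts at the first instruction *)
let init_state (prog : asm list) : state =
  { reg = Array.make num_regs (-1);
    heap = Array.make heap_size (-1);
    program = Array.of_list prog;
    insn = ref 0 }
```

Array.make n x creates an array of n copies of x; Array.get a i reads a cell and Array.set a i v overwrites it in place, which is how the interpreter updates registers and the heap.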
Let’s see some examples of running asm programs to get a feel for how they look
Let’s consider running the following tiny program that adds 1 and 2 and returns the result:

[Setreg { reg=1; value=1 };
 Setreg { reg=2; value=2 };
 AddInt;
 Ret]

We can run this program as follows:

instruction pointer
|   empty reg   
|   |   initial heap (all -1, not shown) 
v   v   v
0, [], []
--> 1, [1 ↦ 1], []
--> 2, [1 ↦ 1, 2 ↦ 2], []
--> 3, [0 ↦ 3, 1 ↦ 1, 2 ↦ 2], []
--> 3

Now let’s run an example program that manipulates the heap:

[Setreg { reg=1; value=1 };
 Store { reg = 1; addr=4 };  (* store register 1 into heap[0x4] *)
 Load { reg=0; addr=4 };     (* load address 0x4 into register 0 *)
 Ret]

0, [], []
--> 1, [1 ↦ 1], []
--> 2, [1 ↦ 1], [0x4 ↦ 1]
--> 3, [1 ↦ 1, 0 ↦ 1], [0x4 ↦ 1]
--> 1

Notice something: assuming that you have infinite memory, the only way to cause a runtime error in an asm program according to these semantics is an explicit Trap; every other instruction always succeeds
Now, we can implement an interpreter that implements these semantics:

let rec interp_insn (state:state) : unit =
  match Array.get state.program (!(state.insn)) with
  | Load { reg=r; addr=addr} ->
    Array.set state.reg r (Array.get state.heap addr);
    state.insn := !(state.insn) + 1;
    interp_insn state
  | Store { reg=r; addr=addr } ->
    let v = Array.get state.reg r in
    Array.set state.heap addr v;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | Setreg { reg=r; value=v } ->
    Array.set state.reg r v;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | Trap(i) ->
    if (Array.get state.reg 0) = i then () else raise Trap;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | AddInt ->
    let v = (Array.get state.reg 1) + (Array.get state.reg 2) in
    Array.set state.reg 0 v;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | MulInt ->
    let v = (Array.get state.reg 1) * (Array.get state.reg 2) in
    Array.set state.reg 0 v;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | Ret -> ()
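To try programs end to end, something has to build the initial state and read register 0 after Ret. Here is a compact, self-contained sketch of one way to do that. The name run_asm, the optional size parameters, and the exception name Trap_error are assumptions; this variant keeps the instruction counter as a function argument instead of a ref, but otherwise follows the semantics above.

```ocaml
type asm =
  | Load of { reg : int; addr : int }
  | Store of { reg : int; addr : int }
  | Setreg of { reg : int; value : int }
  | Trap of int
  | AddInt
  | MulInt
  | Ret

(* hypothetical name for the runtime error raised by a failed Trap *)
exception Trap_error

(* run a program from instruction 0 and return register 0 at Ret *)
let run_asm ?(num_regs = 3) ?(heap_size = 64) (prog : asm list) : int =
  let reg = Array.make num_regs (-1) in
  let heap = Array.make heap_size (-1) in
  let program = Array.of_list prog in
  let rec go pc =
    match program.(pc) with
    | Ret -> reg.(0)
    | insn ->
      (* execute one instruction, then move to the next one *)
      (match insn with
       | Load { reg = r; addr } -> reg.(r) <- heap.(addr)
       | Store { reg = r; addr } -> heap.(addr) <- reg.(r)
       | Setreg { reg = r; value } -> reg.(r) <- value
       | Trap v -> if reg.(0) <> v then raise Trap_error
       | AddInt -> reg.(0) <- reg.(1) + reg.(2)
       | MulInt -> reg.(0) <- reg.(1) * reg.(2)
       | Ret -> ());
      go (pc + 1)
  in
  go 0
```

With this sketch, the two example programs traced earlier return 3 and 1 respectively.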

Compiling unsafely

Now, how do we compile calc to asm?
First, we need a way to interpret calc values as numbers
- For numbers, it’s easy: they map to themselves. For Booleans, let’s treat true as 1 and false as 0
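A small sketch of this encoding, assuming a value type calcv with constructors VNum and VBool (VBool and the function name encode are illustrative assumptions):

```ocaml
(* calc values; VBool is an assumed constructor *)
type calcv =
  | VNum of int
  | VBool of bool

(* interpret a calc value as a number: numbers map to themselves,
   true maps to 1 and false maps to 0 *)
let encode (v : calcv) : int =
  match v with
  | VNum n -> n
  | VBool true -> 1
  | VBool false -> 0
```

Note that the encoding loses information: encode (VBool true) and encode (VNum 1) are the same number, so the assembly program cannot tell them apart on its own.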
In this compiler, we won’t be checking to see if valid calculator operations are being performed: we will simply generate assembly and hope for the best!
Our goal is to make a function unsafe_calc_to_asm that generates an asm program with “identical semantics” to the input calc program
So, how do we start? There are many ways to achieve this, so here is one proposal: we store the result of evaluating each sub-expression in an address
- Note: Our goal here is not efficiency. We may perform many unnecessary computations here.
Example:

> unsafe_calc_to_asm_prog (Num(10));;
- : asm list =
[Setreg {reg = 0; value = 10};  (* set reg[0] = 10 *)
 Store {reg = 0; addr = 0};     (* store result of evaluating Num(10) in address 0x0 *)
 Load {reg = 0; addr = 0};      (* load result into register 0 *)
 Ret] 

(* a more interesting example: adding two numbers *)
> unsafe_calc_to_asm_prog (AddInt(Num(10), Num(20)));;
- : asm list =
[ (* first, store 10 in address 0x0 *)
  Setreg {reg = 0; value = 10}; 
  Store {reg = 0; addr = 0};
  (* then, store 20 in address 0x1 *)
  Setreg {reg = 0; value = 20}; 
  Store {reg = 0; addr = 1};
  (* perform addition *)
  Load {reg = 1; addr = 0}; 
  Load {reg = 2; addr = 1}; 
  AddInt;
  (* store result in 0x2 *)
  Store {reg = 0; addr = 2}; 
  (* load result of addition and return it*)
  Load {reg = 0; addr = 2}; 
  Ret]

How do we handle conjunction? Simple: we multiply the two arguments in asm, since with true as 1 and false as 0 the product is 1 exactly when both arguments are 1
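As a quick sanity check of the multiplication trick, this sketch compares multiplication on encoded Booleans against && for all four input combinations. The names encode_bool and mul_agrees_with_and are made up for this example.

```ocaml
(* encode a Boolean as in the notes: true is 1, false is 0 *)
let encode_bool (b : bool) : int = if b then 1 else 0

(* check every row of the truth table: multiplying the encodings
   gives the encoding of the conjunction *)
let mul_agrees_with_and : bool =
  List.for_all
    (fun (b1, b2) -> encode_bool b1 * encode_bool b2 = encode_bool (b1 && b2))
    [ (false, false); (false, true); (true, false); (true, true) ]
```

This only holds when both inputs really are 0 or 1; multiplying an arbitrary integer by an encoded Boolean produces a number with no meaning as a calc value.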
How do we implement this?
An implementation:

(**
   compiles calculator lang to TinyAsm

   returns a pair (Number, Listof TinyAsm) where the first component is the
   address that holds the result.
*)
let rec unsafe_calc_to_asm (counter: int ref) (c: calc) : (int * asm list) =
  match c with
  | Num(n) ->
    (* store n in a fresh location *)
    let new_loc = fresh counter in
    (new_loc, [Setreg {reg=0; value=n};
               Store {reg=0; addr=new_loc}])
  | Bool(b) ->
    (* store b (encoded as 1 or 0) in a fresh location *)
    let new_loc = fresh counter in
    (new_loc, [Setreg {reg=0; value=if b then 1 else 0};
               Store {reg=0; addr=new_loc}])
  | AddInt(e1, e2) ->
    let (addr1, prog1) = unsafe_calc_to_asm counter e1 in
    let (addr2, prog2) = unsafe_calc_to_asm counter e2 in
    let result_addr = fresh counter in
    let new_prog = [Load { reg=1; addr=addr1 };
                    Load { reg=2; addr=addr2 };
                    AddInt;
                    Store {reg=0; addr=result_addr}] in
    (result_addr, List.concat[prog1; prog2; new_prog])
  | And(e1, e2) ->
    let (addr1, prog1) = unsafe_calc_to_asm counter e1 in
    let (addr2, prog2) = unsafe_calc_to_asm counter e2 in
    let result_addr = fresh counter in
    let new_prog = [Load { reg=1; addr=addr1 };
                    Load { reg=2; addr=addr2 };
                    MulInt;
                    Store {reg=0; addr=result_addr}] in
    (result_addr, List.concat[prog1; prog2; new_prog])

let unsafe_calc_to_asm_prog (c:calc) : asm list =
  let (addr, asm) = unsafe_calc_to_asm (ref 0) c in
  let asm = List.concat [asm; [Load{reg = 0; addr=addr}; Ret]] in
  asm
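The compiler above calls a helper named fresh whose definition is not shown. A minimal sketch, assuming it returns the current counter value and then increments the counter (which matches the example output, where the first stored value lands at address 0):

```ocaml
(* assumed behavior: hand out the next unused heap address.
   returns the current value of the counter, then bumps it by one *)
let fresh (counter : int ref) : int =
  let loc = !counter in
  counter := loc + 1;
  loc
```

Because every recursive call shares the same counter, each sub-expression gets its own address and no result overwrites another.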

Why do we call this unsafe? It will happily run bad programs without warning us! This is bad for two reasons:
1. It violates the original semantics of calc programs, which are supposed to error when invalid operations are performed.
2. It is difficult to debug and diagnose because we are not explicitly warned when errors occur; errors can propagate to strange faraway places in code and be difficult to localize or, even worse, go undetected.
Concretely, we can add Booleans and integers without issue in this unsafe compiler:

> run_calc_unsafe (AddInt(Num(10), Bool(true)));;
- : calcv = VNum 11

Compiling safely

The principle of safety says that we should raise an error when it occurs rather than silently continue to run the program.
- For instance, in C/C++, you can dereference an illegal pointer into uninitialized memory. This is unsafe behavior because no runtime error is raised.
- See here for a great blog post on this topic
The principle of dynamic safety is that the system should detect and fail when bad behavior is performed rather than continue to run and produce garbage
- Think of static safety as ensuring nothing bad happens without running the program, and dynamic safety as ensuring nothing bad happens at runtime by raising an explicit error whenever bad behavior is detected.
Similar to Python, we should perform runtime checks to see if values are being used in illegal ways. If they are, we should raise an explicit error.
The key idea of enforcing dynamic safety during calc compilation is to insert a tag alongside each piece of data that is stored in the heap that holds the type of that data
- This tag is typically an arbitrarily chosen number, one for each type. For instance, we will use the tag 1924 for Booleans and 9418 for integers.
- Then, during every read that expects a particular value, the tag is checked to ensure that data is not misused.
- We will make use of the Trap instruction in asm: Trap v raises a runtime error if the value in register 0 is not equal to v
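The tagging idea can be sketched directly on an OCaml array standing in for the heap, before looking at the generated assembly. The tag values come from the notes; the helper names store_tagged and load_checked, the exception name Trap_error, and the layout (tag in the cell right after the value) are assumptions for this illustration.

```ocaml
(* tag values from the notes *)
let bool_tag = 1924
let int_tag = 9418

(* hypothetical name for the runtime error raised on a tag mismatch *)
exception Trap_error

(* write a value at addr and its type tag in the next cell *)
let store_tagged (heap : int array) (addr : int) (value : int) (tag : int) : unit =
  heap.(addr) <- value;
  heap.(addr + 1) <- tag

(* read the value at addr, but only if the adjacent tag matches
   the type the reader expects; otherwise fail explicitly *)
let load_checked (heap : int array) (addr : int) (expected_tag : int) : int =
  if heap.(addr + 1) <> expected_tag then raise Trap_error;
  heap.(addr)
```

Reading a cell tagged bool_tag with load_checked heap addr int_tag raises Trap_error instead of silently returning 1, which is the behavior the Load and Trap pairs in the compiled code are meant to provide.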
Example:

> safe_calc_to_asm_prog (Num(10));;
- : asm list = [
  (* store 10 in address 0x0 and the integer tag in 0x1 *)
  Setreg {reg = 0; value = 10}; 
  Setreg {reg = 1; value = 9418};
  Store {reg = 0; addr = 0}; 
  Store {reg = 1; addr = 1};
  Load {reg = 0; addr = 0}; 
  Ret]

(* a more interesting example: trying to add an integer and a Boolean *)
> safe_calc_to_asm_prog (AddInt(Num(10), Bool(true)));;
- : asm list =
[
  (* store 10 and true (interpreted as 1) onto the heap, along with their tags *)
  Setreg {reg = 0; value = 10}; 
  Setreg {reg = 1; value = 9418};
  Store {reg = 0; addr = 0}; 
  Store {reg = 1; addr = 1};
  Setreg {reg = 0; value = 1};
  Setreg {reg = 1; value = 1924};
  Store {reg = 0; addr = 2}; 
  Store {reg = 1; addr = 3};
  (* at this point, the heap looks like: 
    [10; 9418; 1; 1924; -1; -1; ...]
    Now, perform addition, which is the same as before except tags are checked to 
    be the integer tag: *)
  Load {reg = 0; addr = 1}; 
  Trap 9418; 
  Load {reg = 0; addr = 3}; 
  Trap 9418;
  Load {reg = 1; addr = 0}; 
  Load {reg = 2; addr = 2}; 
  AddInt;
  (* finally, store the result in 0x4 along with an integer tag in 0x5 *)
  Store {reg = 0; addr = 4}; 
  Setreg {reg = 1; value = 9418};
  Store {reg = 1; addr = 5}; 
  Load {reg = 0; addr = 4}; 
  Ret]

An implementation:

(**
   compiles calculator lang to TinyAsm

   returns a pair (Number, Listof TinyAsm) where the first component is the
   address that holds the result; the tag is stored at that address + 1.
*)
let rec safe_calc_to_asm (counter: int ref) (c: calc) : (int * asm list) =
  match c with
  | Num(n) ->
    (* store n in a fresh location *)
    let addr_v = fresh counter in
    let addr_tag = fresh counter in
    (addr_v, [Setreg {reg=0; value=n};
              Setreg {reg=1; value=int_tag};
              Store {reg=0; addr=addr_v};
              Store {reg=1; addr=addr_tag}])
  | Bool(b) ->
    (* store b (encoded as 1 or 0) in a fresh location *)
    let addr_v = fresh counter in
    let addr_tag = fresh counter in
    (addr_v, [Setreg {reg=0; value=if b then 1 else 0};
              Setreg {reg=1; value=bool_tag};
              Store {reg=0; addr=addr_v};
              Store {reg=1; addr=addr_tag}])
  | AddInt(e1, e2) ->
    let (addr1, prog1) = safe_calc_to_asm counter e1 in
    let (addr2, prog2) = safe_calc_to_asm counter e2 in
    let result_addr = fresh counter in
    let tag_addr = fresh counter in
    let new_prog = [
      (* add a check to ensure e1 and e2 have the correct tag;
         each tag lives in the cell right after its value *)
      Load {reg = 0; addr = addr1 + 1};
      Trap(int_tag);
      Load {reg = 0; addr = addr2 + 1};
      Trap(int_tag);
      (* now load the desired values in and add them *)
      Load { reg=1; addr=addr1 };
      Load { reg=2; addr=addr2 };
      AddInt;
      Store {reg=0; addr=result_addr};
      (* store the tag *)
      Setreg {reg = 1; value = int_tag};
      Store {reg = 1; addr=tag_addr};
    ] in
    (result_addr, List.concat[prog1; prog2; new_prog])
  | And(e1, e2) ->
    let (addr1, prog1) = safe_calc_to_asm counter e1 in
    let (addr2, prog2) = safe_calc_to_asm counter e2 in
    let result_addr = fresh counter in
    let tag_addr = fresh counter in
    let new_prog = [
      (* add a check to ensure e1 and e2 have the correct tag;
         each tag lives in the cell right after its value *)
      Load {reg = 0; addr = addr1 + 1};
      Trap(bool_tag);
      Load {reg = 0; addr = addr2 + 1};
      Trap(bool_tag);
      (* now load the desired values in and multiply them *)
      Load { reg=1; addr=addr1 };
      Load { reg=2; addr=addr2 };
      MulInt;
      Store {reg = 0; addr=result_addr};
      (* store the tag *)
      Setreg {reg = 1; value = bool_tag};
      Store {reg = 1; addr=tag_addr};
    ] in
    (result_addr, List.concat[prog1; prog2; new_prog])

let safe_calc_to_asm_prog (c:calc) : asm list =
  let (addr, asm) = safe_calc_to_asm (ref 0) c in
  (* add ret to the end *)
  List.concat [asm; [Load{reg = 0; addr=addr}; Ret]]

Now, if we run a program that tries to add Booleans and integers, we get a Trap exception:

> run_calc_safe (AddInt(Num(10), Bool(true)));;
Exception: Trap.