- Logistics:
- If you’re having trouble installing OCaml and/or don’t like using the online virtual environment, you can run a virtual machine by following these instructions
- Homework is due Wednesday; the next homework will be released Wednesday and is due the following Wednesday (March 27)
- This week: making something that looks a lot like Python
- Big question: how does language design affect what kinds of bugs you can have or programs you can write?
Dynamic typechecking and compiling to tiny assembly
- So far we’ve been discussing static typechecking, where types are determined without running the program
- Some programming languages offer dynamic typechecking instead of, or in addition to, static typechecking: it checks at runtime that values have the correct type
- An example is Python, which will determine at runtime whether or not you can perform an operation on two pieces of data:
>>> 1 + "two"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
- How is dynamic typechecking implemented in practice? Let’s see!
- Let’s consider the tiny calculator language:
type calc =
| Num of int
| AddInt of calc * calc
| Bool of bool
| And of calc * calc
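- One way to read this type is through a small direct interpreter. The sketch below is an illustration, not code from these notes: the VBool constructor, the Calc_runtime_error exception, and the function name eval are assumptions (VNum matches the output shown later in the notes). Its two error branches are where the dynamic checks live.

```ocaml
(* The calc type from the notes. *)
type calc =
  | Num of int
  | AddInt of calc * calc
  | Bool of bool
  | And of calc * calc

(* Result values. VBool and Calc_runtime_error are names assumed for this sketch. *)
type calcv =
  | VNum of int
  | VBool of bool

exception Calc_runtime_error of string

let rec eval (c : calc) : calcv =
  match c with
  | Num n -> VNum n
  | Bool b -> VBool b
  | AddInt (e1, e2) ->
    (* dynamic check: both operands must evaluate to numbers *)
    (match eval e1, eval e2 with
     | VNum n1, VNum n2 -> VNum (n1 + n2)
     | _ -> raise (Calc_runtime_error "AddInt expects two numbers"))
  | And (e1, e2) ->
    (* dynamic check: both operands must evaluate to Booleans *)
    (match eval e1, eval e2 with
     | VBool b1, VBool b2 -> VBool (b1 && b2)
     | _ -> raise (Calc_runtime_error "And expects two Booleans"))
```

- With these definitions, eval (AddInt (Num 1, Num 2)) gives VNum 3, while eval (AddInt (Num 10, Bool true)) raises Calc_runtime_error.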
- In the semantics of calc, adding non-numbers or conjoining non-Booleans causes a runtime error.
- Now, let’s imagine we want to implement a compiler that compiles calc programs into a tiny assembly language that does not have Booleans
- Terminology: A compiler translates programs written in a source language into a target language, typically in a way that preserves the semantics of the source language.
- This situation is just like in Python: Python works by compiling programs into a bytecode for a virtual machine, which is similar in spirit to the language we are about to make
- Sometimes we refer to compilation targets like asm as abstract machines: they look very similar to a von-Neumann-style CPU architecture
- Here is the syntax of asm:
type asm =
  (* load heap[addr] into register[reg] *)
  | Load of { reg: int; addr: int }
  (* store register[reg] into heap[addr] *)
  | Store of { reg: int; addr: int }
  (* set register[reg] = value *)
  | Setreg of { reg: int; value: int }
  (* Trap v: gives a runtime error if register[0] != v *)
  | Trap of int
  (* set register[0] = register[1] + register[2] *)
  | AddInt
  (* set register[0] = register[1] * register[2] *)
  | MulInt
  (* terminate *)
  | Ret
- asm programs look radically different from any programs we’ve seen so far: they are a list of asm instructions that get executed in sequence.
- asm is a tiny, idealized model of a typical assembly language such as x86; this is fairly close in spirit to what your CPU actually runs when it is executing code
- The semantics of asm consists of three components:
- Registers, which act as working memory and facilitate operations like addition.
- Heap, which maps addresses to numbers.
- Instruction counter, which tracks which instruction to execute next.
- These components are wrapped up in a record:
type state = {
  reg: int array;
  heap: int array;
  program: asm array;
  (* instruction counter *)
  insn: int ref
}
- The array type is a mutable array (see the documentation); we will see some examples of how to use it
- An asm program runs each instruction until it encounters Ret, at which point it returns the contents of register 0
- All memory and registers are initialized to -1
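- As a quick illustration of the array operations the interpreter relies on, here is a small sketch using Array.make, Array.get, and Array.set from the standard library. The sizes (3 registers, 8 heap cells) are arbitrary choices for this example, not values fixed by the notes.

```ocaml
(* Array.make n x allocates a mutable array of n cells, each holding x. *)
let reg : int array = Array.make 3 (-1)   (* 3 registers: an assumption *)
let heap : int array = Array.make 8 (-1)  (* 8 heap cells: an assumption *)

let () =
  (* Array.set mutates a cell in place; Array.get reads a cell. *)
  Array.set reg 1 10;
  Array.set heap 4 (Array.get reg 1);
  (* a.(i) and a.(i) <- v are shorthand for Array.get and Array.set. *)
  heap.(5) <- reg.(1) + 1
```

- After this runs, register 1 holds 10, heap cells 4 and 5 hold 10 and 11, and every other cell still holds the initial -1.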
- Let’s see some examples of running asm programs to get a feel for how they look
- Let’s consider running the following tiny program that adds 1 and 2 and returns the result:
[Setreg { reg=1; value=1 };
Setreg { reg=2; value=2 };
AddInt;
Ret]
- We can run this program as follows:
instruction pointer
|  empty reg
|  |   initial heap (all -1, not shown)
v  v   v
0, [], []
--> 1, [1 ↦ 1], []
--> 2, [1 ↦ 1, 2 ↦ 2], []
--> 3, [0 ↦ 3, 1 ↦ 1, 2 ↦ 2], []
--> 3
- Now let’s run an example program that manipulates the heap:
[Setreg { reg=1; value=1 };
Store { reg = 1; addr=4 }; (* store register 1 into heap[0x4] *)
Load { reg=0; addr=4 }; (* load address 0x4 into register 0 *)
Ret]
0, [], []
--> 1, [1 ↦ 1], []
--> 2, [1 ↦ 1], [0x4 ↦ 1]
--> 3, [1 ↦ 1, 0 ↦ 1], [0x4 ↦ 1]
--> 1
- Notice something: assuming that you have infinite memory, the only instruction that can cause a runtime error in asm programs according to these semantics is Trap; Load, Store, Setreg, AddInt, and MulInt happily operate on any numbers
- Now, we can implement an interpreter that implements these semantics:
(* raised when a Trap check fails *)
exception Trap

let rec interp_insn (state:state) : unit =
  match Array.get state.program (!(state.insn)) with
  | Load { reg=r; addr=addr} ->
    Array.set state.reg r (Array.get state.heap addr);
    state.insn := !(state.insn) + 1;
    interp_insn state
  | Store { reg=r; addr=addr } ->
    let v = Array.get state.reg r in
    Array.set state.heap addr v;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | Setreg { reg=r; value=v } ->
    Array.set state.reg r v;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | Trap(i) ->
    (* continue only if register 0 holds the expected value *)
    if (Array.get state.reg 0) = i then () else raise Trap;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | AddInt ->
    let v = (Array.get state.reg 1) + (Array.get state.reg 2) in
    Array.set state.reg 0 v;
    state.insn := !(state.insn) + 1;
    interp_insn state
  | MulInt ->
    let v = (Array.get state.reg 1) * (Array.get state.reg 2) in
    Array.set state.reg 0 v;
    state.insn := !(state.insn) + 1;
    interp_insn state
  (* stop; the caller reads the result out of register 0 *)
  | Ret -> ()
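- To try the interpreter on the two example programs above, something has to build the initial state and read back register 0. The sketch below is one possible driver under stated assumptions: the function name run, the choice of 3 registers and a 16-cell heap, and the exception name Trap_failed (picked to keep it visually distinct from the Trap instruction) are not from the notes. The interpreter is restated in condensed form only so that the block runs on its own.

```ocaml
type asm =
  | Load of { reg : int; addr : int }
  | Store of { reg : int; addr : int }
  | Setreg of { reg : int; value : int }
  | Trap of int
  | AddInt
  | MulInt
  | Ret

type state = {
  reg : int array;
  heap : int array;
  program : asm array;
  insn : int ref;
}

(* Assumed name for the runtime error raised by a failing Trap. *)
exception Trap_failed

let rec interp_insn (s : state) : unit =
  (* advance the instruction counter, then keep executing *)
  let next () = s.insn := !(s.insn) + 1; interp_insn s in
  match s.program.(!(s.insn)) with
  | Load { reg; addr } -> s.reg.(reg) <- s.heap.(addr); next ()
  | Store { reg; addr } -> s.heap.(addr) <- s.reg.(reg); next ()
  | Setreg { reg; value } -> s.reg.(reg) <- value; next ()
  | Trap v -> if s.reg.(0) = v then next () else raise Trap_failed
  | AddInt -> s.reg.(0) <- s.reg.(1) + s.reg.(2); next ()
  | MulInt -> s.reg.(0) <- s.reg.(1) * s.reg.(2); next ()
  | Ret -> ()

(* Hypothetical driver: everything starts at -1, and the answer is register 0. *)
let run (prog : asm list) : int =
  let s = {
    reg = Array.make 3 (-1);
    heap = Array.make 16 (-1);
    program = Array.of_list prog;
    insn = ref 0;
  } in
  interp_insn s;
  s.reg.(0)

(* The two example programs from the notes. *)
let add_prog =
  [ Setreg { reg = 1; value = 1 }; Setreg { reg = 2; value = 2 }; AddInt; Ret ]

let heap_prog =
  [ Setreg { reg = 1; value = 1 };
    Store { reg = 1; addr = 4 };
    Load { reg = 0; addr = 4 };
    Ret ]
```

- Here run add_prog evaluates to 3 and run heap_prog evaluates to 1, matching the traces above.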
Compiling unsafely
- Now, how do we compile calc to asm?
- First, we need a way to interpret calc values as numbers
- For numbers, it’s easy: they map to themselves. For Booleans, let’s treat true as 1 and false as 0
- In this compiler, we won’t be checking to see if valid calculator operations are being performed: we will simply generate assembly and hope for the best!
- Our goal is to make a function unsafe_calc_to_asm that generates an asm program with “identical semantics” to the input calc program
- So, how do we start? There are many ways to achieve this, so here is one proposal: we store the result of evaluating each sub-expression at its own heap address
- Note: Our goal here is not efficiency. We may perform many unnecessary computations here.
- Example:
> unsafe_calc_to_asm_prog (Num(10));;
- : asm list =
[Setreg {reg = 0; value = 10}; (* set reg[0] = 10 *)
 Store {reg = 0; addr = 0}; (* store result of evaluating Num(10) in address 0x0 *)
 Load {reg = 0; addr = 0}; (* load result into register 0 *)
 Ret]
(* a more interesting example: adding two numbers *)
> unsafe_calc_to_asm_prog (AddInt(Num(10), Num(20)));;
- : asm list =
[ (* first, store 10 at address 0x0 *)
Setreg {reg = 0; value = 10};
Store {reg = 0; addr = 0};
(* then, store 20 at address 0x1 *)
Setreg {reg = 0; value = 20};
Store {reg = 0; addr = 1};
(* perform addition *)
Load {reg = 1; addr = 0};
Load {reg = 2; addr = 1};
AddInt;
(* store result in 0x2 *)
Store {reg = 0; addr = 2};
(* load result of addition and return it*)
Load {reg = 0; addr = 2};
Ret]
- How do we handle conjunction? Simple: we multiply the two arguments in asm, since with true as 1 and false as 0 the product is 1 exactly when both arguments are 1
- How do we implement this?
- An implementation:
(**
   compiles calculator lang to TinyAsm
   returns a pair (Number, Listof TinyAsm) where the first component is the
   address that holds the result.
*)
let rec unsafe_calc_to_asm (counter: int ref) (c: calc) : (int * asm list) =
  match c with
  | Num(n) ->
    (* store n in a fresh location *)
    let new_loc = fresh counter in
    (new_loc, [Setreg {reg=0; value=n};
               Store {reg=0; addr=new_loc}])
  | Bool(b) ->
    (* store b (encoded as 1 or 0) in a fresh location *)
    let new_loc = fresh counter in
    (new_loc, [Setreg {reg=0; value=(if b then 1 else 0)};
               Store {reg=0; addr=new_loc}])
  | AddInt(e1, e2) ->
    let (addr1, prog1) = unsafe_calc_to_asm counter e1 in
    let (addr2, prog2) = unsafe_calc_to_asm counter e2 in
    let result_addr = fresh counter in
    let new_prog = [Load { reg=1; addr=addr1 };
                    Load { reg=2; addr=addr2 };
                    AddInt;
                    Store {reg=0; addr=result_addr}] in
    (result_addr, List.concat [prog1; prog2; new_prog])
  | And(e1, e2) ->
    let (addr1, prog1) = unsafe_calc_to_asm counter e1 in
    let (addr2, prog2) = unsafe_calc_to_asm counter e2 in
    let result_addr = fresh counter in
    (* conjunction is multiplication on the 1/0 encoding *)
    let new_prog = [Load { reg=1; addr=addr1 };
                    Load { reg=2; addr=addr2 };
                    MulInt;
                    Store {reg=0; addr=result_addr}] in
    (result_addr, List.concat [prog1; prog2; new_prog])

let unsafe_calc_to_asm_prog (c:calc) : asm list =
  let (addr, asm) = unsafe_calc_to_asm (ref 0) c in
  (* load the final result into register 0 and return it *)
  List.concat [asm; [Load{reg = 0; addr=addr}; Ret]]
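- The compiler above relies on a helper fresh that is not shown in these notes. A minimal sketch, assuming fresh returns the counter’s current value and then increments it; this assumption matches the addresses 0x0, 0x1, 0x2 in the example output above, but the actual definition used in lecture may differ.

```ocaml
(* Hypothetical definition of fresh: hand out the counter's current value as
   the new address, then advance the counter so the next call gets a new one. *)
let fresh (counter : int ref) : int =
  let addr = !counter in
  counter := addr + 1;
  addr

(* Three consecutive calls on the same counter give three distinct addresses. *)
let counter = ref 0
let a0 = fresh counter
let a1 = fresh counter
let a2 = fresh counter
```

- Threading one shared counter through every recursive call is what keeps the sub-expressions from overwriting each other’s results.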
- Why do we call this unsafe? It will happily run bad programs without warning us! This is bad for two reasons:
- It violates the original semantics of calc programs, which are supposed to error when invalid operations are performed.
- It is difficult to debug and diagnose because we are not explicitly warned when errors occur; errors can propagate to strange faraway places in code and be difficult to localize or, even worse, go undetected.
- Concretely, we can add Booleans and integers without issue in this unsafe compiler:
> run_calc_unsafe (AddInt(Num(10), Bool(true)));;
- : calcv = VNum 11
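- To see how far this can go wrong, it helps to model the number that an unsafely compiled program leaves in register 0. The function unsafe_value below is a hypothetical shortcut for illustration: it computes that number directly from the encoding described above (numbers map to themselves, true and false to 1 and 0, AddInt to addition, And to multiplication) instead of generating and running asm.

```ocaml
type calc =
  | Num of int
  | AddInt of calc * calc
  | Bool of bool
  | And of calc * calc

(* Hypothetical model of the unsafe pipeline: no checks anywhere. *)
let rec unsafe_value (c : calc) : int =
  match c with
  | Num n -> n
  | Bool b -> if b then 1 else 0
  | AddInt (e1, e2) -> unsafe_value e1 + unsafe_value e2
  | And (e1, e2) -> unsafe_value e1 * unsafe_value e2

(* On well-typed inputs, multiplication agrees with conjunction. *)
let and_agrees =
  List.for_all
    (fun (a, b) ->
       unsafe_value (And (Bool a, Bool b)) = (if a && b then 1 else 0))
    [ (false, false); (false, true); (true, false); (true, true) ]

(* On ill-typed inputs, the result is silently computed anyway. *)
let mixed = unsafe_value (AddInt (Num 10, Bool true))
let odd_and = unsafe_value (And (Num 2, Num 3))
```

- Here mixed is 11, as in the run above, and odd_and is 6, which is not the encoding of any Boolean: the error has turned into a plausible-looking number.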
Compiling safely
- The principle of safety says that we should raise an error as soon as it occurs rather than silently continue to run the program.
- For instance, in C/C++, you can dereference an illegal pointer into uninitialized memory. This is unsafe behavior because no runtime error is guaranteed to be raised.
- See here for a great blog post on this topic
- The principle of dynamic safety is that the system should detect and fail when bad behavior is performed rather than continue to run and produce garbage
- Think of static safety as ensuring nothing bad happens without running the program, and dynamic safety as ensuring nothing bad happens at runtime by raising an explicit error whenever bad behavior is detected.
- Similar to Python, we should perform runtime checks to see if values are being used in illegal ways. If they are, we should raise an explicit error.
- The key idea of enforcing dynamic safety during calc compilation is to store a tag alongside each piece of data in the heap; the tag records the type of that data
- This tag is typically an arbitrarily chosen number, one for each type. For instance, we will use the tag 1924 for Booleans and 9418 for integers.
- Then, during every read that expects a particular type of value, the tag is checked to ensure that data is not misused.
- We will make use of the Trap instruction in asm: Trap v raises a runtime error if the value in register 0 is not equal to v
- Example:
> safe_calc_to_asm_prog (Num(10));;
- : asm list = [
(* store 10 in address 0x0 and the integer tag in 0x1 *)
Setreg {reg = 0; value = 10};
Setreg {reg = 1; value = 9418};
Store {reg = 0; addr = 0};
Store {reg = 1; addr = 1};
Load {reg = 0; addr = 0};
Ret]
(* a more interesting example: adding an integer and a Boolean *)
> safe_calc_to_asm_prog (AddInt(Num(10), Bool(true)));;
- : asm list =
[
(* store 10 and true (interpreted as 1) onto the heap, along with their tags *)
Setreg {reg = 0; value = 10};
Setreg {reg = 1; value = 9418};
Store {reg = 0; addr = 0};
Store {reg = 1; addr = 1};
Setreg {reg = 0; value = 1};
Setreg {reg = 1; value = 1924};
Store {reg = 0; addr = 2};
Store {reg = 1; addr = 3};
(* at this point, the heap looks like:
[10; 9418; 1; 1924; -1; -1; ...]
Now, perform addition, which is the same as before except tags are checked to
be the integer tag: *)
Load {reg = 0; addr = 1};
Trap 9418;
Load {reg = 0; addr = 3};
Trap 9418;
Load {reg = 1; addr = 0};
Load {reg = 2; addr = 2};
AddInt;
(* finally, store the result in 0x4 along with an integer tag in 0x5 *)
Store {reg = 0; addr = 4};
Setreg {reg = 1; value = 9418};
Store {reg = 1; addr = 5};
Load {reg = 0; addr = 4};
Ret]
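- Because every value sits next to its tag, the heap alone is enough to recover what kind of value a result is. The helper decode below is hypothetical, as is the VBool constructor; it only assumes the layout shown in the comment above, with each tag stored at the address right after its value.

```ocaml
type calcv =
  | VNum of int
  | VBool of bool

(* Tag values taken from the text. *)
let bool_tag = 1924
let int_tag = 9418

(* Hypothetical helper: read the value at addr and its tag at addr + 1. *)
let decode (heap : int array) (addr : int) : calcv =
  let v = heap.(addr) and tag = heap.(addr + 1) in
  if tag = int_tag then VNum v
  else if tag = bool_tag then VBool (v <> 0)
  else failwith "decode: unknown tag"

(* The heap from the example above, just before the addition is attempted. *)
let example_heap = [| 10; 9418; 1; 1924; -1; -1 |]
let first = decode example_heap 0
let second = decode example_heap 2
```

- With this layout, first is VNum 10 and second is VBool true; decoding address 4 fails because the cells there still hold the initial -1 rather than a known tag.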
- An implementation:
(**
   compiles calculator lang to TinyAsm
   returns a pair (Number, Listof TinyAsm) where the first component is the
   address that holds the result; the tag of the result is stored at the
   next address.
*)
let rec safe_calc_to_asm (counter: int ref) (c: calc) : (int * asm list) =
  match c with
  | Num(n) ->
    (* store n in a fresh location, followed by the integer tag *)
    let addr_v = fresh counter in
    let addr_tag = fresh counter in
    (addr_v, [Setreg {reg=0; value=n};
              Setreg {reg=1; value=int_tag};
              Store {reg=0; addr=addr_v};
              Store {reg=1; addr=addr_tag}])
  | Bool(b) ->
    (* store b (encoded as 1 or 0) in a fresh location, followed by the Boolean tag *)
    let addr_v = fresh counter in
    let addr_tag = fresh counter in
    (addr_v, [Setreg {reg=0; value=(if b then 1 else 0)};
              Setreg {reg=1; value=bool_tag};
              Store {reg=0; addr=addr_v};
              Store {reg=1; addr=addr_tag}])
  | AddInt(e1, e2) ->
    let (addr1, prog1) = safe_calc_to_asm counter e1 in
    let (addr2, prog2) = safe_calc_to_asm counter e2 in
    let result_addr = fresh counter in
    let tag_addr = fresh counter in
    let new_prog = [
      (* add a check to ensure e1 and e2 have the correct tag;
         each tag lives right after its value, at addr + 1 *)
      Load {reg = 0; addr = addr1 + 1};
      Trap(int_tag);
      Load {reg = 0; addr = addr2 + 1};
      Trap(int_tag);
      (* now load the desired values in and add them *)
      Load { reg=1; addr=addr1 };
      Load { reg=2; addr=addr2 };
      AddInt;
      Store {reg=0; addr=result_addr};
      (* store the tag *)
      Setreg {reg = 1; value = int_tag};
      Store {reg = 1; addr=tag_addr};
    ] in
    (result_addr, List.concat [prog1; prog2; new_prog])
  | And(e1, e2) ->
    let (addr1, prog1) = safe_calc_to_asm counter e1 in
    let (addr2, prog2) = safe_calc_to_asm counter e2 in
    let result_addr = fresh counter in
    let tag_addr = fresh counter in
    let new_prog = [
      (* add a check to ensure e1 and e2 have the correct tag *)
      Load {reg = 0; addr = addr1 + 1};
      Trap(bool_tag);
      Load {reg = 0; addr = addr2 + 1};
      Trap(bool_tag);
      (* now load the desired values in and multiply them *)
      Load { reg=1; addr=addr1 };
      Load { reg=2; addr=addr2 };
      MulInt;
      Store {reg = 0; addr=result_addr};
      (* store the tag *)
      Setreg {reg = 1; value = bool_tag};
      Store {reg = 1; addr=tag_addr};
    ] in
    (result_addr, List.concat [prog1; prog2; new_prog])

let safe_calc_to_asm_prog (c:calc) : asm list =
  let (addr, asm) = safe_calc_to_asm (ref 0) c in
  (* load the final result into register 0 and add ret to the end *)
  List.concat [asm; [Load{reg = 0; addr=addr}; Ret]]
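- The AddInt and And cases above differ only in the tag they check and the instruction they emit. As a sketch of how the shared shape could be factored out, the helpers check_tag and binop below are hypothetical and not part of the lecture code; the tag values come from the text, and each tag is assumed to live at the address right after its value, as in the implementation above.

```ocaml
type asm =
  | Load of { reg : int; addr : int }
  | Store of { reg : int; addr : int }
  | Setreg of { reg : int; value : int }
  | Trap of int
  | AddInt
  | MulInt
  | Ret

(* Tag values taken from the text. *)
let bool_tag = 1924
let int_tag = 9418

(* Hypothetical helper: instructions that trap unless the value stored at
   addr carries the given tag (the tag is read from addr + 1). *)
let check_tag (addr : int) (tag : int) : asm list =
  [ Load { reg = 0; addr = addr + 1 }; Trap tag ]

(* Hypothetical helper: check both operands, apply op, then store the result
   at result_addr and its tag at result_addr + 1. *)
let binop (op : asm) (tag : int) (addr1 : int) (addr2 : int)
    (result_addr : int) : asm list =
  List.concat [
    check_tag addr1 tag;
    check_tag addr2 tag;
    [ Load { reg = 1; addr = addr1 };
      Load { reg = 2; addr = addr2 };
      op;
      Store { reg = 0; addr = result_addr };
      Setreg { reg = 1; value = tag };
      Store { reg = 1; addr = result_addr + 1 } ]
  ]

(* The addition step from the earlier example: operands at 0x0 and 0x2,
   result at 0x4. *)
let add_step = binop AddInt int_tag 0 2 4
```

- With such helpers, the AddInt case would reduce to binop AddInt int_tag addr1 addr2 result_addr and the And case to binop MulInt bool_tag addr1 addr2 result_addr, which makes it harder for the two cases to drift apart.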
- Now, if we run a program that tries to add Booleans and integers, we get a Trap exception:
> run_calc_safe (AddInt(Num(10), Bool(true)));;
Exception: Trap.