Lab 8: Accumulative and Generative Recursion, Tries

Part 1: Accumulative and Generative Recursion (30 mins)

Generative Recursion refers to recursion that does not follow the data definition. Consider the data definition of (non-negative) integers:

;; An Integer is one of:
;; - 0
;; - (add1 Integer)

Following the template, the recursive case should call (int-template (add1 Integer)). The gcd function, which computes the greatest common divisor of two integers, does not follow the data definition of numbers at all:

;; my-gcd: Integer Integer -> Integer
(define (my-gcd x y)
  (cond
    [(= y 0) x]
    [else (my-gcd y (remainder x y))]))

There is a theorem related to gcd (reference: Wolfram Mathworld):

If $a, b$ are integers and not both equal to 0, then there exists two integers $u$ and $v$ such that

\[gcd(a, b) = ua + vb\]

There is an algorithm that computes $\text{bezout}(a, b) = (u, v)$:

\[\text{bezout}(a,b) = (v', u' - qv')\]

where

$(u', v') = \text{bezout}(b, a \text{ mod } b)$ and $a = qb + (a \text{ mod } b)$

and $x \text{ mod } y$ is the remainder of $x$ divided by $y$.

Implement the bezout function:

;; bezout: Integer Integer -> (list Integer Integer)

(check-expect (bezout 7 0) (list 1 0))
(check-expect (bezout 21 14) (list 1 -1))
(check-expect (bezout 525 182) (list -9 26))

The Fibonacci Sequence can be defined the following set of equations:

\[f_0 = 0\] \[f_1 = 1\] \[f_n = f_{n-1} + f_{n-2}\]

where $f_i$ is the $i^{th}$ fibonacci number.

Consider the following definition for (fib n), producing the nth fibonacci number:

;; fib : Integer -> Integer
(define (fib n)
  (cond [(zero? n) 0]
        [(= 1 n) 1]
        [else (+ (fib (- n 2)) (fib (- n 1)))]))

(check-expect (fib 1) 1)
(check-expect (fib 2) 1)
(check-expect (fib 3) 2)
(check-expect (fib 35) 9227465) ;; quite slow!

Each application of fib for n > 1 invokes two other calls to fib. This means we will have a total of exponential number of application of the same function!

Re-write (fib n) using accumulative recursion. Hint: you will need to write an helper with some accumulated value(s) as argument!

Part 2: Trie and Spell Checking (70 mins)

First of all, we are using the existing hash tables implementation in Lab 7, along with the hash-update function. For your convenience, every function is in this updated library file. Make sure that it is included in your implementation:

(require "./hashtable-extras-2.rkt")

Note: Do not use this file for your Homeworks! You may encounter issues if you do so.

In this lab, you will be developing a set of operations for working with tries. Tries are an efficient way to store a set of words by their prefix.

;; a Trie is [Hash-table-of 1String Trie]
;; The key for the Trie is one of:
;; "a" to "z", representing the letters
;; "" (the empty string), representing the termination marker.
;; The value represents the sub-trie (also called child) of the current Trie. 
;; The corresponding value for key "" should be an empty Trie (i.e. empty hash table).

;; empty trie is the base case. i.e. no strings are represented in trie.
(define empty-trie (make-hash empty))
;; This is a Trie that contains only the empty string ("").
(define empty-string-trie (make-hash (list (list "" empty-trie))))
;; A trie containing strings "cat" and "car".
(define TRIE-1
  (make-hash (list (list "c"
                         (make-hash
                          (list (list "a" (make-hash
                                           (list (list "t" empty-string-trie)
                                                 (list "r" empty-string-trie))))))))))
;; A trie containing strings "at" and "ate".
(define TRIE-2
  (make-hash (list (list "a"
                         (make-hash
                          (list (list "t" (make-hash
                                           (list (list "" empty-trie)
                                                 (list "e" empty-string-trie))))))))))

There are a few distinct features of tries comparing to other tree-like data structure you have seen. First, the trees you have seen store data in their nodes, but nodes themselves in tries does not contain any information. In fact, each edge (i.e. parent-child relationship) is associated by a character. Also, to distinguish the strings represented with any prefix , a special edge is needed to indicate termination. For example, TRIE-2 only represents string "at" and "ate", and "a" is not considered as contained in TRIE-2. Note the position of the termination marker ((list "" empty-trie)). If a Trie has an association with key "", then the “chain” from root to the Trie represents a valid string, with nth edge in the chain representing nth character.

(TA Demo: pictorial representation of tries.)

Write a few more examples of Trie.

Write trie-member? which determines if a trie contains a given string.

;; trie-member? : Trie String -> Boolean
;; Determine if the trie contains the given string 

Write trie-strings which produces all strings in a trie in lexicographical order.

;; trie-strings : Trie -> [List-of String]
;; Generate a sorted list of all of the strings present in the trie

Write trie-insert which inserts all strings in a list into a trie.

;; trie-insert : Trie String -> Trie
;; Given a previous trie, generate a new trie which contains the new string 

Use the following definition of trie-build to test your function:

;; trie-build: [list-of String] -> Trie
(define (trie-build los) (trie-insert empty-trie los))

(check-expect (trie-build (list "")) empty-string-trie)
(check-expect (trie-build (list "cat" "car")) TRIE-1)
(check-expect (trie-build (list "at" "ate")) TRIE-2)

Write the following function:

;; trie-search-subst-dist: Trie String Integer -> [list-of String]
;; Given a trie, a string, and an integer distance, return a list of all strings in the trie that are
;; within n substitution distance of the given string. 
;; The substitution distance is defined as the minimum number of single-character replacements.

(check-expect (sort (trie-search-subst-dist TRIE-1 "cas" 1) string<?) (list "car" "cat"))
(check-expect (sort (trie-search-subst-dist TRIE-1 "cup" 2) string<?) (list "car" "cat"))
(check-expect (sort (trie-search-subst-dist TRIE-1 "cup" 1) string<?) (list))
(check-expect (sort (trie-search-subst-dist TRIE-1 "cap" 0) string<?) (list))
(check-expect (trie-search-subst-dist TRIE-1 "ab" 3) (list))

(An optional challenge) Extend your implementation of trie-search-subst-dist to find all strings within n edit distances in the trie.

;; trie-search: Trie String Integer -> [list-of String]
;; Given a trie, a string, and an integer distance, return a list of all strings in the trie that are
;; within n edit distance of the given string. 
;; Edit distance is defined as the minimum number of single-character substitutions, insertions, or
;; deletions.

(check-expect (trie-search TRIE-1 "parrot" 4) (list "car" "cat"))
;; Explanation:
;; "parrot" 
;; -> (substitute p -> c) "carrot"
;; -> (delete t) "carro"
;; -> (delete r) "caro"
;; -> (delete o) "car" 
;; 
;; "parrot" 
;; -> (substitute p -> c) "carrot"
;; -> (delete r) "carot"
;; -> (delete r) "caot"
;; -> (delete o) "cat"