A note on memoization

Memoization, a term coined by Donald Michie, is a technique for optimizing the runtime of a function call. It is a specific case of caching: the return value of a function call is saved so that subsequent calls with the same arguments can look up that value instead of recomputing it. Memoization requires that the function being memoized is deterministic and referentially transparent.

A deterministic function always returns the same value, given a fixed input value (or set of input values).

Referentially transparent means that a function call can be replaced directly by its return value without changing the behaviour of the program – thus the entire calculation can be skipped. Put another way, the function does not introduce side effects.
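To make the distinction concrete, here is a small sketch (the function names are illustrative, not from any particular library):

```python
import random

# Deterministic and referentially transparent: the same input always
# yields the same output, and nothing outside the function is touched.
# A safe candidate for memoization.
def square(x):
    return x * x

# Not deterministic: two calls with the same input can return different
# values, so caching the first return value would be incorrect.
def noisy_square(x):
    return x * x + random.random()
```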

A good example of where memoization may be applied successfully is recursive functions. A recursive function refers to itself. The Fibonacci sequence is a classic example of a number sequence expressed in recursive form: each number is the sum of the two previous numbers in the sequence.

For a given position n in the Fibonacci sequence, the number at that position is therefore the sum of the numbers at positions n-1 and n-2.

# The first 5 integers of the sequence
1, 1, 2, 3, 5

# base cases for the first two numbers
n1 = 1
n2 = 1

# Expressing the next number in the sequence at position n
n = n-1 + n-2

n3 = n2 + n1 => 1 + 1 = 2
n4 = n3 + n2 => 2 + 1 = 3
n5 = n4 + n3 => 3 + 2 = 5 


We can write this relation for each number's position as a recursive function like this

# python 3.x simple fibonacci recursive function

def fib(n):
    if n == 1 or n == 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)

Note how we immediately return the value of the base cases if our input is 1 or 2. Otherwise fib(n) is recursively called to find the two previous numbers in the sequence. As n grows large, this technique breaks down, at least in Python (see the note at the end).

Let's see it in action by counting the number of times the function is called with the same argument when fib(5) is evaluated.

def fib(n):
    if n == 1 or n == 2:
        return 1
    else:
        print('called fib ', n)
        return fib(n-1) + fib(n-2)

fib(5)

# running the above program on my laptop results in
> called fib 5
> called fib 4
> called fib 3
> called fib 3

Note how the third Fibonacci number is calculated twice. As fib(5) is calculated, both fib(4) and fib(3) must be calculated before the final result of fib(5) is found. However, fib(4) also calculates fib(3) when finding its intermediate result, creating an unnecessary calculation. This only gets worse as n grows: for fib(10), fib(3) is called 21 times in the intermediate steps.

As hinted, memoization can greatly speed up the runtime as n increases. However, it comes at a cost of memory. When an intermediate result is returned, we store it for later reuse. Depending on the size of each result and the total number of intermediate results, the memory consumption may rise.
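The 21 redundant calls can be verified with a small sketch; the calls counter below is added purely for illustration:

```python
from collections import Counter

calls = Counter()

def fib(n):
    calls[n] += 1  # record every invocation of fib with this argument
    if n == 1 or n == 2:
        return 1
    return fib(n - 1) + fib(n - 2)

fib(10)
print(calls[3])  # fib(3) is entered 21 times while computing fib(10)
```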

# fibonacci with memoization

# introduce an array holding the intermediate fib(n) results
# the n'th position holds the fib(n) result

intermediate = [1,1]

def fib(n):
    if len(intermediate) >= n:
        return intermediate[n-1] # 0-indexed
    else:
        intermediate.append(fib(n-1) + fib(n-2))
        return intermediate[n-1]

As fib(n) is called, the intermediate array grows with the results it hasn't yet calculated. The 21 redundant calculations of the third Fibonacci number in fib(10) are now reduced to a single calculation, thanks to the array lookup.
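In everyday Python the same effect is usually achieved with the standard library's functools.lru_cache decorator, which memoizes results per argument without a hand-rolled array; a minimal sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every distinct argument; no eviction
def fib(n):
    if n == 1 or n == 2:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # 55
```

Passing maxsize=None makes the cache unbounded, which mirrors the growing intermediate array above; a bounded maxsize trades speed for memory.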

Note:
The default maximum recursion depth in Python is 1000. Python does not optimize recursion. Other languages do: some have meta-annotations telling the underlying compiler how to optimize a certain code block. The Scala compiler, for instance, optimizes tail-recursive functions automatically, and its @tailrec annotation makes the compiler verify that the optimization can actually be applied.
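The limit can be inspected at runtime, and exceeding it raises a RecursionError rather than crashing the interpreter; a quick sketch:

```python
import sys

print(sys.getrecursionlimit())  # 1000 on a standard CPython build

def fib(n):
    if n == 1 or n == 2:
        return 1
    return fib(n - 1) + fib(n - 2)

try:
    fib(5000)  # the leftmost call chain alone exceeds the limit
except RecursionError:
    print('recursion limit exceeded')
```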