Session 32: Random-access lists

The problem

Regular lists in Haskell have stack-like behavior: You can add to the list's front, or you can remove from the list's front. Both of these operations take O(1) time. Any other operations on the list, however, take O(n) time.

Suppose we want more array-like behavior, however, where we can access elements in the list rapidly? We want a persistent data structure that handles the following operations efficiently.

Add to the front of the list. (O(1) with a regular list)
Remove from the front of the list. (O(1) with a regular list)
Access the kth element of the list. (O(k) with a regular list)

Skew binary representation

Skew binary representation is a variation of binary representation. In binary representation, the places are worth 1, 2, 4, 8, ... (powers of 2). In this representation, however, the places are worth 1, 3, 7, 15, ... (powers of 2, minus 1). In each place, we can put a 0, 1, or 2, but every place after a 2 must be a 0. For example, we might represent the number 13₍₁₀₎ as 0120:

 0  1  2  0
__ __ __ __        0*15 + 1*7 + 2*3 + 0*1 = 13
15  7  3  1

Here are the first few numbers represented in this representation.

0  0000       0*1       = 0
1  0001       1*1       = 1
2  0002       2*1       = 2
3  0010       1*3       = 3
4  0011       1*3 + 1*1 = 4
5  0012       1*3 + 2*1 = 5
6  0020       2*3       = 6
7  0100       1*7       = 7
8  0101       1*7 + 1*1 = 8

Though we won't prove it here, it's a fact that each number can be represented in this representation in exactly one way.

What's the advantage of this? You'll basically have to play along with me for a while. The useful fact is that, as we increment numbers in this representation, each increment changes at most two places in the number, and those two places would be next to each other. For example, when we go from 13 to 14, 0120 goes to 0200; from 14 to 15, 0220 goes to 1000. In regular binary representation, when we increment, any number of places can change; going from 7 (0111) to 8 (1000) involves changing four places.

We can define a Haskell type for representing numbers in this representation. (I'm not claiming any usefulness to this particular representation - I'm building up to something else.)

type SkewBinary = [Integer]

In this representation, we list the place values of each nonzero place in increasing order; if a 2 is in the place, we duplicate it in our list. For example, 13₍₁₀₎ (which is 0120) would be represented as [3,3,7]. Here are the first several numbers.

0  0000    []
1  0001    [1]
2  0002    [1,1]
3  0010    [3]
4  0011    [1,3]
5  0012    [1,1,3]
6  0020    [3,3]
7  0100    [7]
8  0101    [1,7]

Note that each time, the numbers in the list representation naturally add up to the original list.

Now let's write functions for incrementing and decrementing each of the numbers.

inc (x0:x1:rest) | x0 == x1  =  (x0 + x1 + 1) : rest
                 | x0 /= x1  =  1 : x0 : x1 : rest
inc rest  =  1 : rest

dec (1:rest) =  rest
dec (x:rest) =  let half = (x - 1) `div` 2
                in half : half : rest

Skew binary list

Back to our problem of the random-access list. What we'll do is use a variation of the skew binary representation as a list, where we replace each number in the list with a tree of that size. For example, 13 would be reprepresented as a list as [3,3,7]. To represent the list of the 13 numbers 10, 20, 30, ..., 130, we would have a list of trees.

The 11 elements 10, 20, ..., 110 would be as follows, based on the list [1,3,7].

To define our data type for the list, we'll first define a Tree type.

data Tree a = Nil | Node a (Tree a) (Tree a)

We represent the list actually as a list of tuples, in which the first element of each tuple is an integer, and the second element is a tree of that size.

type RAList a = [(Integer, Tree a)]

The code to insert into the tree will look very similar to the inc function from before.

consTree info [] = [(1, Node info Nil Nil)]
consTree info (node1 : []) = [(1, Node info Nil Nil), node1]
consTree info all@((size0, tree0) : (size1, tree1) : rest)
    = if size0 == size1
      then (size0 + size1 + 1, Node info tree0 tree1) : rest
      else (1, Node info Nil Nil) : all

Note that this takes O(1) time: We never have to go beyond the first two elements in the list, and we never delve into the trees within these first two elements.

Remove the first element is analogous to the dec function of before.

tailTree ((1, Node ret Nil Nil) : rest) = rest
tailTree ((size, Node ret left right) : rest)
    = let half = size `div` 2
      in (half, left) : (half, right) : rest

Again, it is O(1) time: We only look at the first element of the list, and we never go more than one level deep into the tree.

Searching to the kth element is a matter of traversing the list. First we write a function to find the kth element within a tree, where the elements are ordered as we saw in the list above (with the first element at top, the first half ordered recursively on the left, and the second half ordered ordered recursively on the right).

findT k tsize (Node top lt rt) | k == 0         =  top
                               | 2 * k < tsize  =  findT k (tsize `div` 2) lt
                               | True           =  findT k (tsize `div` 2) rt

Note that this takes O(log n) time, since it goes down the tree, which would have a depth of log n since it is a complete tree. Now we can write our find function.

find k ((hsize, htree) : tail) | k < hsize = findT k hsize htree
                               | True      = find (k - hsize) tail

Here, if k indicates that the value can be found in the first tree, then we use findT to locate it. Otherwise, we continue forward to the next element, acknowledging that we have passed hsize elements already. Since the number will have at most log n non-zero digits (after this, the digits will begin to represent more than n), this will entail searching through at most log n elements of the list. Then we pay O(log n) for searching for the tree found there, for a total of O(log n) time.

In summary, we get the following.

	regular	skew binary
operation	list	list
Add to front	`O`(1)	`O`(1)
Remove from front	`O`(1)	`O`(1)
Access `k`th item	`O`(`k`)	`O`(log `n`)