Haskell tuples & lists

by Carl Burch, Hendrix College, August 2012

Haskell Lists by Carl Burch is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Based on a work at www.cburch.com/books/hslist/.

1. Tuples
2. Lists
3. Strings
4. Applying functions to lists
4.1. Primality
4.2. Polygon area
5. Folds
5.1. Built-in folds
5.2. Other folding examples
6. List comprehensions

When we want to process a collection of data, the imperative programmer is accustomed to turning to an array. Unfortunately, like loops, arrays don't translate easily to the functional paradigm: They depend on the capability to alter values in particular array locations, and such memory manipulations are verboten in functional programming. Instead, functional languages like Haskell commonly support collections of data via tuples and lists.

1. Tuples

A tuple is a fixed-length coupling of values, written in parentheses with the values separated by commas. One way to use this is to pass all parameters into a function as one value, rather than the curried functions we've seen so far.

max3 :: (Double, Double, Double) -> Double

max3 (x, y, z) = max (max x y) z

To call this max3 function, we'd need to pass the full tuple of three values as a parameter: “max3 (12, 5, 13)”. Since max3 isn't curried, we can no longer partially call the function with parameters.

Of course, this brings up the question: Why would you prefer tuples as parameters rather than currying the function? There is no definitive answer to this; while most Haskell programmers prefer currying, some indeed prefer tuples. There are a few reasons for preferring tuples: it enhances readability (in the opinion of some), it enables a compiler to catch errors with passing the wrong number of arguments more immediately, and it could improve performance marginally — though one should never underestimate the power of a compiler to figure out the best way to execute any code. None of these reasons are so strong, though, as to convince a majority of Haskell programmers that they should avoid curried functions.

But tuples are useful in other contexts, too. One such use is for returning multiple values from a function. The following function, for instance, finds the minimum and maximum values of a function f within a range (a, b) by simply trying different x-values delta apart. It returns both the minimum and the maximum encoded in a tuple.

minMax :: (Double -> Double) -> Double -> Double -> Double -> (Double, Double)

minMax f a b delta | a + delta > b = (fa, fa)

                   | otherwise = (min fa mn, max fa mx)

                         where fa = f a

                               (mn, mx) = minMax f (a + delta) b delta

In the above examples, the tuples have multiple values of the same type. But tuples can combine unrelated types together as well: The tuple “(5, True)” is fine, for example.

Haskell provides a couple of built-in functions that are useful for pairs (tuples of length 2).

fst pair: Returns the first value from a tuple with two values.
snd pair: Returns the second value from a tuple with two values.

2. Lists

A list is a singly linked list like one sees in an imperative language, but with one important difference: We cannot change any values within a list, including the pointers from one list node to another. The only operation we have available is to insert a node at the beginning of the list.

A list in Haskell can be written using square brackets with commas separating the list's individual values. Thus, the expression “[2,3,5]” represents a list with three values, of which the first is 2, the second is 3, and the third is 5. Haskell also allows expressing a list of successive values, as in “[10..20]” containing the eleven integers from 10 to 20. Naturally, the empty list would be written “[].”

To write functions working with lists, we can use four fundamental operations:

null lst: Returns true if lst is empty.
head lst: Returns the first value of lst.
tail lst: Returns the list of values from lst following its first value.
v : lst: Returns a list beginning with v and followed with the values of lst.

We can easily use these to write our own list functions that iterate through the list to arrive at their result. For example, we might want to create a function that counts the number of times a particular integer occurs in a list.

occurs :: Integer -> [Integer] -> Integer

occurs value lst | null lst           = 0

                 | head lst == value  = 1 + occurs value (tail lst)

                 | otherwise          = occurs (tail lst)

Note in the type signature that the type of a list of Integers is written by placing brackets around the value type (“[Integer]”).

Suppose we want a function that returns all the positive integers in a list. In this case we need to use the colon operator (:) — sometimes called the cons operator for the name of the equivalent LISP function for constructing new list nodes.

positives :: [Integer] -> [Integer]

positives lst | null lst      = []

              | head lst > 0  = head lst : positives (tail lst)

              | otherwise     = positives (tail lst)

Note that this function creates a new list entirely. The old list is still around, intact — as it must be if the function doesn't change any existing memory.

Note, however, that the above definitions of occurs and positives are not how Haskell programmers generally write list-iterating functions. Instead, they rely on Haskell's pattern-matching capability, which is quite flexible.

occurs value [] = 0

occurs value (x:xs) = (if value == x then 1 else 0) + occurs value xs



positives [] = []

positives (x:xs) = if x > 0 then x : positives xs else positives xs

Given a value and a list, occurs will try first to match the list to the empty list in the first case's pattern. If that doesn't match, then it attempts to match the list to the pattern “(x:xs)” of the second case. The non-empty list will match this, with x being matched to the first value and xs being matched to the list of succeeding values.

Thus far, we have seen Haskell's four pre-defined functions for dealing with lists. These are the only four functions you ever absolutely need. But because lists are so widely used in Haskell programs, the Prelude provides many more functions. For example, there is a length function and the analogues to head and tail but dealing with the end of the list. These functions all take O(n) time, compared to O(1) for the fundamental operations listed above.

length lst: Returns the number of values in lst.
last lst: Returns the final value of lst.
init lst: Returns a list of all but the final value of lst.

There are a couple of infix operators for lists, too.

lst !! k: Returns the kth value of lst (indexed from 0).
lst₀ ++ lst₁: Returns a list containing the values of lst₀ followed by the values of lst₁ (concatenation).

Be careful with these functions. Most especially, avoid accessing lists like you would an array. For example, one could write occurs by defining a helper function to iterate through the indices of the list.

occurs :: Integer -> [Integer] -> Integer

occurs value lst = sub 0              -- an inefficient implementation

    where sub i | i == length lst      = 0

                | (lst !! i) == value  = 1 + sub (i + 1)

                | otherwise             = sub (i + 1)

You need to remember, though, that all the functions are based on the basic four — null, head, tail, and (:). A function like length will take time proportional to the list's length (O(n)), and the ‘!!’ operator takes time proportional to the index accessed. Thus, the above function is much more inefficient than our earlier tries: It takes O(n²) time, whereas the others take O(n) time.

You should similarly be careful with using ++. Suppose you wanted a function squares to produce the a list of the first n perfect squares. You might be tempted to write the following.

squares :: Integer -> [Integer]        -- an inefficient implementation

squares 0 = [0]

squares n = squares (n - 1) ++ [n ** 2]

But this is an inefficient implementation: The concatenation operator is implemented using the basic four functions, essentially as defined below.

(++) :: [a] -> [a] -> [a]             -- as pre-defined in Prelude

  []   ++ rest = rest

(x:xs) ++ rest = x : (xs ++ rest)

You can see that the function ends up creating a copy of the first argument list and placing the other list after it. As a result, our squares function would take O(n) time with each recursive call, for a total of O(n²) time. A much more efficient implementation would be one using the cons operator ‘:’; this approach constructs the list starting from the tail (rather than from the head as before). Admittedly, the definition is a bit more complex since it involves a helper function.

squares :: Integer -> [Integer]

squares n = sub 0

    where sub i | i > n      = []

                | otherwise  = (i * i) : sub (i + 1)

3. Strings

A Haskell string is simply a list of character values, and in fact the built-in type String is simply another name for the type [Char]. Consequently, we process strings the same way up we process lists. Following, for instance, is a function for counting the words in a string.

countWords :: String -> Integer

countWords str = sub str True

       where sub [] _ = 0

             sub (' ':cs) _ = sub cs True

             sub (_:cs) True = 1 + sub cs False

             sub (_:cs) False = sub cs False

As you can see, this relies on a helper function sub, which takes two parameters: The first parameter is a list of the unprocessed characters in the string, while the second parameter is a Boolean indicating whether we've just reached the end of a word (and so seeing a non-space character will indicate a new word to be counted).

As it happens, Haskell has a built-in function words that takes a String and breaks it into a list of Strings. We can naturally build a much simpler definition using this.

countWords :: String -> Integer

countWords str = length (words str)

4. Applying functions to lists

The built-in list functions also include some higher-order functions that are quite often useful. Indeed, each time you want to do something with a list, I recommend first asking yourself whether you can use these functions. If you can (and the answer is typically yes), it's generally easier to use it than to step through the desired list using recursion.

map f lst: Returns a list containing the results of applying f to each value of lst.
filter f lst: Returns a list of the values of lst for which f returns true.
zipWith f lst₀ lst₁: Returns a list containing the results of applying f to corresponding values in lst₀ and lst₁.

For example, we could easily rewrite squares using map and end up with a much simpler version.

squares :: [Integer] -> [Integer]

squares n = map (**2) [1..n]

We could similarly rewrite positives and occurs using filter.

positives list = filter (>0) list



occurs value lst = length (filter (== value) lst)

The zipWith function looks more contrived, but it is simply like map except that it uses a function taking two arguments. Given this function and two lists, zipWith steps through both of the lists in synch, passes their corresponding values into the function, and adds the result into its return list. In the following example, we take a list of first names and a list of last names and paste corresponding pairs together.

names = zipWith (\x -> \y -> x ++ " " ++ y) firsts lasts

             where firsts = ["Bill", "George", "Barack"]

                   lasts = ["Clinton", "Bush", "Obama"]

4.1. Primality

Using list processing can be useful even when we're dealing with problems that don't obviously have anything to do with lists. Recall that we previously defined isPrime for testing whether an integer is prime.

isPrime n = testFactorsFrom 2 n

    where testFactorsFrom i n | i * i > n = True

                              | n `rem` i == 0 = False

                              | otherwise = testFactorsFrom (i + 1) n

But suppose we instead create a list of all the values of i that the above is meant to iterate through; then we can filter out those that divide into n exactly and test whether any were found. In the implementation below, we define the helper list is to list the values of i to be tested.

isPrime :: Integer -> Bool

isPrime n = null (filter (\i -> n `rem` i == 0) is)

               where is = [2..floor (sqrt (fromInteger n))]

The most difficult part of this is the computation of the upper bound for is. The parameter n is an integer, but sqrt only deals with floating-point types, so we use the built-in function fromInteger that takes n and returns its floating-point equivalent. However, we need the range's upper bound to be an integer so that the list contains integers for rem, so we use floor to round the floating-point value given by sqrt down to the first integer.

Looking at this, you may object that it seems needlessly inefficient: It says to construct a list of all numbers in is, to test each of them for a remainder, and only then to see if the list is empty or not. Doesn't this seem like overkill if 2 or 3 divides into n (as happens quite often)?

But it turns out that this inefficiency is not as bad as it initially looks. Haskell uses lazy evaluation, which essentially means that it evaluates expressions only when they're needed. In this case, it will build the list only as far as it needs to determine the return value for null: As soon as filter yields a value, null will stop requesting that filter go further through its list. So if indeed n is divisible by 2, no other numbers will be attempted. (If, however, we had written “length (filter…) > 0” instead of “null (filter…)”, then it would need to evaluate the entire list to compute its exact length.) Lazy evaluation is a very different approach to executing programs, and we'll study it more intensively later; but for now, rest assured that our modified isPrime actually only goes as far as we need.

4.2. Polygon area

Suppose we have a polygon defined by a list of n points (x_i, y_i) in the two-dimensional Cartesian plane. Gauss discovered a simple formula for computing the area of this polygon:

½ ∑_i = 0^n − 1 (x_i − x_i + 1) (y_i + y_i + 1)

In this formula, we have to suppose a “wrap-around” effect: x_n is the same as x₀, and y_n is the same as y₀.

area :: [(Double,Double)] -> Double

area points = 0.5 * sum (zipWith (*) 1 xdiffs ysums)

    where xdiffs = zipWith (-) xs (tail xs ++ [head xs])

          ysums  = zipWith (+) ys (tail ys ++ [head ys])

          xs = map fst points

          ys = map snd points

(This idea for an example comes from Damir Medak and Gerhard Navratil's Haskell Tutorial. Incidentally, the formula used applies only when the polygon not intersect itself.)

5. Folds

A related category of list functions are those that perform a fold, where we take a two-argument function and essentially insert it between all values in a list. For example, perhaps we want to add a list's values. Fundamentally, we want to “insert” an addition between each successive value.

`[`	`2`	`,`	`3`	`,`	`4`	`,`	`5`	`]`
	2	+	3	+	4	+	5

There are two functions for folding, depending on whether we want to apply our function left-associatively (starting with 2 + 3 above) or right-associatively (starting with 4 + 5). For an associative operation like addition, it doesn't matter, but it could for other operations.

foldl f b lst: Returns the result of applying f to successive values of lst, starting with b and going forward through the list (left-associatively).
foldr f b lst: Returns the result of applying f to successive values of lst, starting with b and going backward through the list (right-associatively).

The folding functions take a starting value b. This will be placed on the left or right side depending on whether we're using foldl or foldr. If the list is empty, the result returned is simply b.

Getting back to summing a list, we can write a sum function as follows.

sum list = foldl (+) 0 list -- already defined in Prelude

Here, we are saying to start with 0, add the first value to it, then add the second value to that, then add the third value to that,… and after we add the final value, return the result.

5.1. Built-in folds

In fact, though, this sum function is already defined in Haskell's Prelude. The Prelude includes a variety of pre-defined functions that basically are specific folds on lists using the (+), (*), min, max, (++), (&&), and (||) functions respectively.

sum lst: Returns the sum of the numbers in a list lst.
product lst: Returns the product of the numbers in a list lst.
minimum lst: Returns the smallest value in a list lst.
maximum lst: Returns the largest value in a list lst.
concat lst: Returns the result of concatenating all lists in a list lst of lists.
and lst: Returns the logical AND of the values in a list lst of Booleans.
or lst: Returns the logical OR of the values in a list lst of Booleans.

5.2. Other folding examples

The `factor` function

Another example using folding is in defining a function factor that given a list of integers finds the largest integer dividing into all of them exactly. For instance, we'd expect “factor [36,24,18]” to return 6. We can write factor using the gcd function already defined in the Prelude to compute the greatest common divisor of any two numbers.

factor :: [Integer] -> Integer

factor lst = foldl gcd (head lst) (tail lst)

The `evalPoly` function

A more obscure application of foldl is in evaluating polynomials. Suppose we represent a polynomial using a list of coefficients: For example, the polynomial x³ + 4 x + 2 would be represented using the list “[2, 4, 0, 1]”. We want to write a function that takes such a list and a value x and evaluates the polynomial given that value.

One way to accomplish this is to use zipWith to associate each coefficient with its exponent and to compute the value of that term, then to add the various terms together.

evalPoly coeffs x = sum (zipWith computeTerm coeffs [0..len coeffs - 1])

                  where computeTerm c i = c * (x ** i)

This approach is fine, but it suffers from two minor disadvantages. First, it requires exponentiation, which is a bit slower than multiplication. And second, Haskell's exponentiation operator applies only to floating-point numbers, while we may want to use evalPoly with integer values. We can avoid both issues using repeated multiplication to compute the exponentiated values.

We can do this using a fold. Our approach will actually pass a tuple of two values from each coefficient to the next: The first of the two values is the sum of the previous terms, while the second is the value of xⁱ that should be used for that coefficient.

evalPoly coeffs x = fst (foldl nextCoeff (0, 1) coeffs)

                  where nextCoeff (sum, xi) c = (sum + c * xi, x * xi)

Seeing it work on an example should help understand how this works. Let's try it with [2, 4, 0, 1] for coeffs and 3 for x, for which we expect the return value 3³ + 4 ⋅ 3 + 2 = 41.

	2		4		0		1
(0, 1)	→	(2, 3)	→	(14, 9)	→	(14, 27)	→	(41, 81)

As we go through the coefficients, the first value in each tuple holds the sum of the terms for the previous coefficients, while the second value holds xⁱ for that coefficient. You can see that the final tuple has the correct evaluation result of 41 as its first value.

While that works well, and it's an excellent example to try to understand from the point of view of understanding folds, there's an even better approach relying on a technique called Horner's method. Horner's method suggests a way of rewriting a polynomial without relying on exponentiation.

x³ + 4 x + 2 = ((1 ⋅ x + 0) ⋅ x + 4) ⋅ x + 2

What's going on here is that we start from the highest-order term and slowly multiply x into it. In implementing evalPoly, then, we would want to start with the highest-order coefficients, which are at the end of the list and so we would want to use foldr.

evalPoly coeffs x = foldr (\c -> \s -> s * x + c) 0 coeffs

Now our function takes two parameters: c will be the current coefficient (starting from the final one) while s will be the computed value from the higher-order coefficients. Following traces how it works with our example.

	2		4		0		1
13 ⋅ 3 + 2 = 41	←	3 ⋅ 3 + 4 = 13	←	1 ⋅ 3 + 0 = 3	←	0 ⋅ 3 + 1 = 1	←	0

The `lastPrime` function

Consider writing a function for finding the final prime in a list of numbers. The obvious way to do this is to filter out all the primes using filter and our already-written isPrime function, and then to use last to retrieve the final value in the list.

lastPrime = last . filter isPrime

The downside to this is that ends up computing the primality of all numbers in the list. Admittedly, we can avoid this by reversing the list and relying on the fact that Haskell uses lazy evaluation.

lastPrime = first . filter isPrime . reverse

But a different way of doing this, without relying on lazy evaluation, is to use foldr. We'll pass 0 starting from the end of the list, and it will be passed up the list until we reach the last prime. The number passed after that is the prime, and so there are no further primality tests (since r is no longer 0).

lastPrime lst = foldr (\n -> \r -> if r == 0 && isPrime n then n else r) 0 lst

6. List comprehensions

Because list processing is so common, Haskell provides a special syntax for combining operations called a list comprehension. It is based on the set-builder notation commonly used in mathematics, where one might write { n ∈ N : n mod 3 = 1 } to represent the set { 1, 4, 7, … }. To illustrate the Haskell syntax, we'll use a list comprehension to rewrite our positives function, which was to return a list of the positive values in the parameter list.

positives lst = [ x | x <- lst, x > 0 ]

The list comprehension consists of two parts separated by a vertical bar. After the vertical bar are comma-separated clauses, each of which either fits the form “symbol <- listExpression” or is a Boolean expression. A “symbol <- listExpression” clause says that the remainder of the comprehension should be evaluated with symbol taking on each value from the list. A Boolean expression clause filters out only those values for which the expression is true. For each symbol value that satisfies the Boolean clauses, the expression before the vertical bar is evaluated, with its value included in the constructed list.

As a more complex example, the following expression computes all products of two distinct two-digit primes.

[ x * y | x <- [10..99], isPrime x, y <- [(x + 1)..99], isPrime y ]

In this case, we have two symbols varying over different lists. We iterate x over all two-digit integers, testing whether it is prime; for each x that satisfies that test we iterate y over all two-digit integers beyond x, and for those that are prime, we include x * y in our resulting list.

The message to take from this document is this: Quite often, whenever you want to do something with functional programming, the obvious way to do the problem might be to use recursion, but it often pays to consider doing it instead with the powerful set of built-in list functions. That's true even with functions like isPrime, where list functions initially seem not to apply. Lists are one of the most powerful tools at your disposal in programming using a functional language.