by Carl Burch, Hendrix College, August 2012
When we want to process a collection of data, the imperative programmer is accustomed to turning to an array. Unfortunately, like loops, arrays don't translate easily to the functional paradigm: They depend on the capability to alter values in particular array locations, and such memory manipulations are verboten in functional programming. Instead, functional languages like Haskell commonly support collections of data via tuples and lists.
A tuple is a fixed-length coupling of values, written in parentheses with the values separated by commas. One way to use this is to pass all parameters into a function as one value, rather than the curried functions we've seen so far.
max3 :: (Double, Double, Double) -> Double
max3 (x, y, z) = max (max x y) z
To call this max3
function, we'd need to pass the
full tuple of three values as a parameter:
“max3 (12, 5, 13)
”.
Since max3
isn't curried, we can no longer partially
call the function with parameters.
Of course, this brings up the question: Why would you prefer tuples as parameters rather than currying the function? There is no definitive answer to this; while most Haskell programmers prefer currying, some indeed prefer tuples. There are a few reasons for preferring tuples: it enhances readability (in the opinion of some), it enables a compiler to catch errors with passing the wrong number of arguments more immediately, and it could improve performance marginally — though one should never underestimate the power of a compiler to figure out the best way to execute any code. None of these reasons are so strong, though, as to convince a majority of Haskell programmers that they should avoid curried functions.
But tuples are useful in other contexts, too. One such use is for returning multiple values from a function. The following function, for instance, finds the minimum and maximum values of a function f within a range (a, b) by simply trying different x-values delta apart. It returns both the minimum and the maximum encoded in a tuple.
minMax :: (Double -> Double) -> Double -> Double -> Double -> (Double, Double)
minMax f a b delta | a + delta > b = (fa, fa)
| otherwise = (min fa mn, max fa mx)
where fa = f a
(mn, mx) = minMax f (a + delta) b delta
In the above examples, the tuples have multiple values of the
same type. But tuples can combine unrelated types together as
well: The tuple “(5, True)
” is fine, for
example.
Haskell provides a couple of built-in functions that are useful for pairs (tuples of length 2).
fst
pairsnd
pairA list is a singly linked list like one sees in an imperative language, but with one important difference: We cannot change any values within a list, including the pointers from one list node to another. The only operation we have available is to insert a node at the beginning of the list.
A list in Haskell can be written using square brackets with
commas separating the list's individual values.
Thus, the expression “[2,3,5]
” represents a list with
three values, of which the first is 2, the second is 3, and
the third is 5. Haskell also allows expressing a list of
successive values, as in “[10..20]
” containing the
eleven integers from 10 to 20. Naturally, the empty list would
be written “[]
.”
To write functions working with lists, we can use four fundamental operations:
null
lsthead
lsttail
lst:
lstWe can easily use these to write our own list functions that iterate through the list to arrive at their result. For example, we might want to create a function that counts the number of times a particular integer occurs in a list.
occurs :: Integer -> [Integer] -> Integer
occurs value lst | null lst = 0
| head lst == value = 1 + occurs value (tail lst)
| otherwise = occurs (tail lst)
Note in the type signature that the type of a list of Integer
s
is written
by placing brackets around the value type (“[Integer]
”).
Suppose we want a function that returns all the positive
integers in a list.
In this case we need to use the colon operator (:
) —
sometimes called the cons operator for the name of the
equivalent LISP function for constructing new list nodes.
positives :: [Integer] -> [Integer]
positives lst | null lst = []
| head lst > 0 = head lst : positives (tail lst)
| otherwise = positives (tail lst)
Note that this function creates a new list entirely. The old list is still around, intact — as it must be if the function doesn't change any existing memory.
Note, however, that the above definitions of occurs
and
positives
are not
how Haskell programmers
generally write list-iterating functions. Instead, they rely on
Haskell's pattern-matching capability, which is quite flexible.
occurs value [] = 0
occurs value (x:xs) = (if value == x then 1 else 0) + occurs value xs
positives [] = []
positives (x:xs) = if x > 0 then x : positives xs else positives xs
Given a value and a list, occurs
will try first to match
the list to the empty list in the first case's pattern.
If that doesn't match, then it attempts to match the list to the
pattern “(x:xs)
” of the second case.
The non-empty list will match this, with x
being
matched to the first value and xs
being matched to
the list of succeeding values.
Thus far, we have seen Haskell's four pre-defined functions for
dealing with lists.
These are the only four functions you ever absolutely need.
But because lists are so widely used in Haskell programs, the
Prelude provides many more functions.
For example, there is a length
function and the
analogues to head
and tail
but dealing with the
end of the list.
These functions all take O(n) time,
compared to O(1) for the fundamental operations
listed above.
length
lstlast
lstinit
lstThere are a couple of infix operators for lists, too.
!!
k++
lst1Be careful with these functions. Most especially, avoid
accessing lists like you would an array.
For example, one could write occurs
by defining a
helper function to iterate through the indices of the list.
occurs :: Integer -> [Integer] -> Integer
occurs value lst = sub 0 -- an inefficient implementation
where sub i | i == length lst = 0
| (lst !! i) == value = 1 + sub (i + 1)
| otherwise = sub (i + 1)
You need to remember, though, that all the functions are based
on the basic four — null
, head
, tail
,
and (:)
. A function like length
will
take time proportional to the list's length
(O(n)), and the
‘!!
’ operator takes time proportional to the index
accessed.
Thus, the above function is much more inefficient than our
earlier tries: It takes O(n²) time,
whereas the others take O(n) time.
You should similarly be careful with using ++
.
Suppose you wanted a function squares
to produce the a list of the first n perfect squares. You
might be tempted to write the following.
squares :: Integer -> [Integer] -- an inefficient implementation
squares 0 = [0]
squares n = squares (n - 1) ++ [n ** 2]
But this is an inefficient implementation: The concatenation operator is implemented using the basic four functions, essentially as defined below.
(++) :: [a] -> [a] -> [a] -- as pre-defined in Prelude
[] ++ rest = rest
(x:xs) ++ rest = x : (xs ++ rest)
You can see that the function ends up creating a copy of the
first argument list and placing the other list after it.
As a result, our squares
function would take
O(n) time
with each recursive call, for a total of
O(n²) time.
A much more efficient implementation would be one using the cons
operator ‘:
’; this approach constructs the list starting
from the tail (rather than from the head as before).
Admittedly, the definition is a bit more complex since it
involves a helper function.
squares :: Integer -> [Integer]
squares n = sub 0
where sub i | i > n = []
| otherwise = (i * i) : sub (i + 1)
A Haskell string is simply a list of character values,
and in fact the built-in type String
is simply another
name for the type [Char]
. Consequently, we
process strings the same way up we process lists.
Following, for instance, is a function for counting the words in
a string.
countWords :: String -> Integer
countWords str = sub str True
where sub [] _ = 0
sub (' ':cs) _ = sub cs True
sub (_:cs) True = 1 + sub cs False
sub (_:cs) False = sub cs False
As you can see, this relies on a helper function sub
,
which takes two parameters: The first parameter is a list of the
unprocessed characters in the string, while the second parameter
is a Boolean indicating whether we've just reached the end of a
word (and so seeing a non-space character will
indicate a new word to be counted).
As it happens, Haskell has a built-in function words
that takes a String
and breaks it into a list of
String
s. We can naturally build a much simpler
definition using this.
countWords :: String -> Integer
countWords str = length (words str)
The built-in list functions also include some higher-order functions that are quite often useful. Indeed, each time you want to do something with a list, I recommend first asking yourself whether you can use these functions. If you can (and the answer is typically yes), it's generally easier to use it than to step through the desired list using recursion.
map
f lstfilter
f lstzipWith
f lst0 lst1For example, we could easily rewrite squares
using map
and end up with a much simpler version.
squares :: [Integer] -> [Integer]
squares n = map (**2) [1..n]
We could similarly rewrite positives
and occurs
using filter
.
positives list = filter (>0) list
occurs value lst = length (filter (== value) lst)
The zipWith
function looks more contrived, but it is
simply like map
except that it uses a function taking two arguments.
Given this function and two lists, zipWith
steps through
both of the lists in synch, passes their corresponding values
into the function, and adds the result into its return list.
In the following example, we take a list of first names and a
list of last names and paste corresponding pairs together.
names = zipWith (\x -> \y -> x ++ " " ++ y) firsts lasts
where firsts = ["Bill", "George", "Barack"]
lasts = ["Clinton", "Bush", "Obama"]
Using list processing can be useful even when we're dealing
with problems that don't obviously have anything to do with
lists. Recall that we previously defined isPrime
for
testing whether an integer is prime.
isPrime n = testFactorsFrom 2 n
where testFactorsFrom i n | i * i > n = True
| n `rem` i == 0 = False
| otherwise = testFactorsFrom (i + 1) n
But suppose we instead create a list of all the values of
i
that the above is meant to iterate through;
then we can filter out those that divide into n
exactly
and test whether any were found. In the implementation below, we
define the helper list is
to list the values of
i
to be tested.
isPrime :: Integer -> Bool
isPrime n = null (filter (\i -> n `rem` i == 0) is)
where is = [2..floor (sqrt (fromInteger n))]
The most difficult part of this is the computation of the upper bound
for is
. The parameter n
is an integer,
but sqrt
only deals with floating-point types, so we use
the built-in function fromInteger
that takes n
and returns its floating-point equivalent.
However, we need the range's upper bound
to be an integer so that the list contains integers for rem
,
so we use floor
to round the floating-point value given
by sqrt
down to the first integer.
Looking at this, you may object that it seems needlessly
inefficient: It says to construct a list of all numbers in
is
, to test each of them for a remainder, and
only then to see if the list is empty or not.
Doesn't this seem like overkill if 2 or 3 divides into n
(as happens quite often)?
But it turns out that this inefficiency is not as bad as it
initially looks. Haskell uses lazy evaluation,
which essentially means that it evaluates expressions only when
they're needed.
In this case, it will build the list only as far as it needs
to determine the return value for null
: As soon as
filter
yields a value, null
will stop requesting
that filter
go further through its list.
So if indeed n
is divisible by 2, no other numbers will
be attempted.
(If, however, we had written
“length (filter
…) > 0
”
instead of
“null (filter
…)
”, then it
would need to evaluate the entire list to compute its exact
length.)
Lazy evaluation is a very different approach to executing programs,
and we'll study it more intensively later;
but for now, rest assured that our modified isPrime
actually
only goes as far as we need.
Suppose we have a polygon defined by a list of n points (xi, yi) in the two-dimensional Cartesian plane. Gauss discovered a simple formula for computing the area of this polygon:
In this formula, we have to suppose a “wrap-around” effect: xn is the same as x0, and yn is the same as y0.
area :: [(Double,Double)] -> Double
area points = 0.5 * sum (zipWith (*) 1 xdiffs ysums)
where xdiffs = zipWith (-) xs (tail xs ++ [head xs])
ysums = zipWith (+) ys (tail ys ++ [head ys])
xs = map fst points
ys = map snd points
(This idea for an example comes from Damir Medak and Gerhard Navratil's Haskell Tutorial. Incidentally, the formula used applies only when the polygon not intersect itself.)
A related category of list functions are those that perform a fold, where we take a two-argument function and essentially insert it between all values in a list. For example, perhaps we want to add a list's values. Fundamentally, we want to “insert” an addition between each successive value.
[ | 2 | , | 3 |
, | 4 | , | 5 |
] |
2 | + | 3 | + | 4 | + | 5 |
There are two functions for folding, depending on whether we want to apply our function left-associatively (starting with 2 + 3 above) or right-associatively (starting with 4 + 5). For an associative operation like addition, it doesn't matter, but it could for other operations.
foldl
f b lstfoldr
f b lstThe folding functions take a starting value b.
This will be placed on the left or right side depending on
whether we're using foldl
or foldr
. If the
list is empty, the result returned is simply b.
Getting back to summing a list, we can write a sum
function as follows.
sum list = foldl (+) 0 list -- already defined in Prelude
Here, we are saying to start with 0, add the first value to it, then add the second value to that, then add the third value to that,… and after we add the final value, return the result.
In fact, though, this sum
function is already defined in
Haskell's Prelude.
The Prelude includes a variety of pre-defined functions
that basically are specific folds on lists using the
(+)
, (*)
, min
, max
, (++)
,
(&&)
, and (||)
functions respectively.
sum
lstproduct
lstminimum
lstmaximum
lstconcat
lstand
lstor
lstfactor
functionAnother example using folding is in defining a function factor
that given a list of
integers finds the largest integer dividing into all of them
exactly. For instance, we'd expect
“factor [36,24,18]
”
to return 6.
We can write factor
using the gcd
function already defined
in the Prelude to compute the greatest common divisor of any two
numbers.
factor :: [Integer] -> Integer
factor lst = foldl gcd (head lst) (tail lst)
evalPoly
functionA more obscure application of foldl
is in
evaluating polynomials. Suppose we represent a polynomial
using a list of coefficients: For example, the polynomial
x³ + 4 x + 2
would be represented using the list “[2, 4, 0, 1]
”.
We want to write a function that takes such a list and a value
x and evaluates the polynomial given that value.
One way to accomplish this is to use zipWith
to associate each coefficient with its exponent and to compute
the value of that term, then to add the various terms together.
evalPoly coeffs x = sum (zipWith computeTerm coeffs [0..len coeffs - 1])
where computeTerm c i = c * (x ** i)
This approach is fine, but it suffers from two minor disadvantages.
First, it requires exponentiation, which is a bit slower than
multiplication.
And second, Haskell's exponentiation operator applies only to
floating-point numbers, while we may want to use evalPoly
with integer values.
We can avoid both issues using repeated multiplication to
compute the exponentiated values.
We can do this using a fold. Our approach will actually pass a tuple of two values from each coefficient to the next: The first of the two values is the sum of the previous terms, while the second is the value of xi that should be used for that coefficient.
evalPoly coeffs x = fst (foldl nextCoeff (0, 1) coeffs)
where nextCoeff (sum, xi) c = (sum + c * xi, x * xi)
Seeing it work on an example should help understand how this works.
Let's try it with [2, 4, 0, 1]
for
coeffs
and 3
for x
,
for which we expect the return value
3³ + 4 ⋅ 3 + 2 = 41.
2 | 4 | 0 | 1 | |||||
(0, 1) | → | (2, 3) | → | (14, 9) | → | (14, 27) | → | (41, 81) |
As we go through the coefficients, the first value in each tuple holds the sum of the terms for the previous coefficients, while the second value holds xi for that coefficient. You can see that the final tuple has the correct evaluation result of 41 as its first value.
While that works well, and it's an excellent example to try to understand from the point of view of understanding folds, there's an even better approach relying on a technique called Horner's method. Horner's method suggests a way of rewriting a polynomial without relying on exponentiation.
What's going on here is that we start from the highest-order
term and slowly multiply x into it. In implementing
evalPoly
, then, we would want to start with the
highest-order coefficients, which are at the end of the list
and so we would want to use foldr
.
evalPoly coeffs x = foldr (\c -> \s -> s * x + c) 0 coeffs
Now our function takes two parameters: c
will be the
current coefficient (starting from the final one) while
s
will be the computed value from the higher-order
coefficients. Following traces how it works with our example.
2 | 4 | 0 | 1 | |||||
13 ⋅ 3 + 2 = 41 | ← | 3 ⋅ 3 + 4 = 13 | ← | 1 ⋅ 3 + 0 = 3 | ← | 0 ⋅ 3 + 1 = 1 | ← | 0 |
lastPrime
functionConsider writing a function for finding the final prime in a list of
numbers. The obvious way to do this is to filter out all the primes using
filter
and our already-written isPrime
function, and
then to use last
to retrieve the final value in the list.
lastPrime = last . filter isPrime
The downside to this is that ends up computing the primality of all numbers in the list. Admittedly, we can avoid this by reversing the list and relying on the fact that Haskell uses lazy evaluation.
lastPrime = first . filter isPrime . reverse
But a different way of doing this, without relying on lazy
evaluation, is to use foldr
.
We'll pass 0 starting from the end of the list,
and it will be passed up the list until we reach the last prime.
The number passed after that is the prime,
and so there are no further primality tests
(since r
is no longer 0).
lastPrime lst = foldr (\n -> \r -> if r == 0 && isPrime n then n else r) 0 lst
Because list processing is so common, Haskell provides a
special syntax for combining operations called a list
comprehension. It is based on the set-builder notation
commonly used in mathematics, where one might write
{ n ∈ N : n mod 3 = 1 }
to represent the set
{ 1, 4, 7, … }.
To illustrate the Haskell syntax, we'll use a list comprehension to
rewrite our positives
function,
which was to return a list of the positive values in the
parameter list.
positives lst = [ x | x <- lst, x > 0 ]
The list comprehension consists of two parts separated by a
vertical bar. After the vertical bar are
comma-separated clauses, each of which either fits the form
“symbol <-
listExpression”
or is a Boolean expression.
A “symbol <-
listExpression”
clause says that the remainder of the comprehension should be
evaluated with symbol taking on each value from the list.
A Boolean expression clause filters out only those values for
which the expression is true.
For each symbol value that satisfies the Boolean clauses,
the expression before the vertical bar is evaluated,
with its value included in the constructed list.
As a more complex example, the following expression computes all products of two distinct two-digit primes.
[ x * y | x <- [10..99], isPrime x, y <- [(x + 1)..99], isPrime y ]
In this case, we have two symbols varying over different lists.
We iterate x
over all two-digit integers, testing
whether it is prime; for each x
that satisfies that
test we iterate y
over all two-digit integers beyond
x
, and for those that are prime, we include x * y
in
our resulting list.
The message to take from this document is this:
Quite often, whenever you want to do something with functional programming,
the obvious way to do the problem might be to use recursion, but it often
pays to consider doing it instead with the powerful set of
built-in list functions.
That's true even with functions like isPrime
, where list
functions initially seem not to apply.
Lists are one of the most powerful tools at your disposal in
programming using a functional language.