Session 35: Lazy language compilation

Tail recursion

A tail-recursive call is a recursive call to a subroutine that is the last thing the function performs. In this case, the value returned by the recursive call must be the value returned by the function. The following function's recursive call is tail-recursive.

sum n s = if n == 0 then s else sum (n - 1) (n + s)
In the else case, the value computed by the recursive call sum (n - 1) (n + s) is the value returned by the function. Thus, this is a tail-recursive call. The following is not tail-recursive.
sum n = if n == 0 then 0 else n + sum (n - 1)
Here, there is a recursive call, but the value returned by it is added to another value; thus, this recursive call is not tail-recursive.

Tail-recursive calls are important because they can be optimized away into a jump: Instead of making a recursive call, the code can simply replace the locations holding parameter values with the new parameter values and then jump back to the beginning of the function. This is significant because recursive calls can incur a fair amount of overhead in growing the stack.

Tail recursion is very easy for compilers to remove, and recursive functions are often tail-recursive; thus, any good compiler will handle it. Programmers who know this often intentionally write tail-recursive functions, and so this optimization is even more effective.
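For example, the earlier one-argument sum can be rewritten in the accumulator style of the two-argument version, with a wrapper hiding the extra parameter (a sketch; the names sumTo and go are just for illustration):

sumTo n = go n 0
  where go i acc = if i == 0 then acc else go (i - 1) (acc + i)

A compiler can compile the call go (i - 1) (acc + i) as an update of i and acc followed by a jump back to the top of go, so the whole computation runs in constant stack space.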

There is no reason that tail recursion cannot be optimized away in imperative programs as well. The optimization matters less there (since programmers tend to write loops instead), but optimizing compilers still perform it because it is relatively simple.

Function inlining

Function inlining involves substituting the body of a function in place of making a function call. This saves the cost of a function call, and it can enable further optimizations to take place.

Like the elimination of tail-recursive calls, function inlining can also be done for imperative languages, but it is more important for functional languages, which tend to use a larger number of small functions. It's particularly important when it can eliminate the need for building additional functions.

Consider the following example.

qu n = let sq = \n -> n * n
	   in (sq . sq) n
(This uses the function composition operator . included in the standard prelude.) A naive implementation of this would involve using the function composition operator to build a new function, but in fact function inlining could be used to reduce this to the following, equivalent formulation.
qu n = let m = n * n
	   in m * m
(We could go ahead and reduce this to
qu n = (n * n) * (n * n)
However, this would be less efficient, since the product n * n would be computed twice.)

Strictness analysis

In lazy languages, the overhead of dealing with thunks can be a prohibitive expense. This is a cost that lazy languages pay relative to eager languages, although lazy languages have other benefits (such as the possibility of infinite lists, and a guarantee that a program terminates whenever there is any evaluation order under which it terminates).

The idea behind strictness analysis is to selectively evaluate some arguments before passing them into functions. In an imperative language, rearranging computation is dangerous territory, because computation can produce side effects, such as changing a global variable. But in a purely functional language, where side effects are prohibited by the language's structure, rearranging computation does not present a problem.

That is, except for one effect that does matter: how long a program takes, and whether it terminates at all. Suppose you had the following program.

cond b x y = if b then x else y
Suppose we call cond True 1 (fact (-1)). It would not be valid to evaluate fact (-1) before passing it in as y. The problem is that this value is never needed, yet any attempt to compute it results in infinite recursion.
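(Here fact is assumed to be the usual naive factorial, which never reaches its base case on a negative argument:

fact n = if n == 0 then 1 else n * fact (n - 1)

so evaluating fact (-1) recurses forever.)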

So what lazy language systems do is perform strictness analysis, in which the system determines which parameters to a function are necessarily computed. In the cond function above, b is always computed, so it does no harm to compute it before calling the function. But the values of x and y may or may not be necessary, so the language system can't evaluate them beforehand.

If the system can determine that a parameter is strictly used, then it can compile the program so that the function can assume the value is computed (without checking to see whether it's a thunk). The result is that the thunk overhead can be avoided in many cases.
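The same forcing can be requested by hand in Haskell with seq, which evaluates its first argument before returning its second; a strictness analyzer effectively inserts such forcing automatically wherever it can prove it is safe. A small sketch (the name sumAcc is just for illustration):

sumAcc i acc = if i == 0
               then acc
               else let acc' = acc + i
                    in acc' `seq` sumAcc (i - 1) acc'

Because acc' is forced at each step, the accumulator is always a plain number rather than a growing chain of unevaluated additions.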

Note that strictness analysis cannot be done perfectly, as that would imply a solution to the Halting Problem. But language systems can identify a large number of cases fairly easily, and so it proves to be a useful optimization for lazy-language compilers to perform.

Invariant hoisting

Consider the following function.

mult i j = j * longComp i

mult20 = mult 20
You might think that the interpreter could compute the result of longComp 20 when mult20 is defined, to save time on every later call to mult20. In fact, however, this would be an invalid optimization: the program may never use mult20, and if longComp 20 never terminated, the transformed program would fail to terminate when the original would.

It would, however, be fine for the system to compute longComp 20 the first time mult20 is applied and to remember the result for later calls. This could save a considerable amount of time. (Note that Hugs does not perform this optimization.)
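What this optimization amounts to can be written out directly in the source (a sketch; longComp is left undefined here):

mult20 = let c = longComp 20
         in \j -> j * c

Because c is bound outside the lambda, it is computed at most once, the first time mult20 is applied, and the result is shared by every later call; the laziness of the let keeps the definition safe even if mult20 is never used.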

Deforestation

Deforestation refers to the removal of intermediate data structure computation. Consider the following function definition for determining whether a number is prime.

isPrime n = and (map (\j -> n `mod` j /= 0) [2..(n-1)])
Given the number 9, this would pass the list [2,3,4,5,6,7,8] to the function map, along with the function \j -> n `mod` j /= 0. The map function returns the list of results of applying its given function to each value in the list; in this case, it would return [True, False, True, True, True, True, True]. This is passed to the and function, which returns the Boolean AND of all the Booleans in the list it is given; in this case, the second element is False, and so and returns False.

Although this is admirably brief, it is a wasteful way of writing the function. A naive translation would waste both memory and time constructing the list of numbers and then the list of Boolean values. However, a good compiler could deforest the computation by interleaving it, so that it behaves like the following efficient, non-wasteful equivalent formulation.

isPrime n =
  let
    it i | i >= n         = True
         | n `mod` i == 0 = False
         | True           = it (i + 1)
  in it 2
This is a pretty radical change from the original, but it's the same computationally.
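The same idea applies to other list-producing pipelines. For instance, a compiler might fuse the following (a sketch, not the exact code any particular compiler produces):

sumSquares n = sum (map (\i -> i * i) [1..n])

into a single accumulating loop,

sumSquares n = go 1 0
  where go i acc | i > n = acc
                 | True  = go (i + 1) (acc + i * i)

so that neither the list [1..n] nor the list of squares is ever actually built.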

SKI combinators

The final optimization is performed in most current lazy-execution systems: each function is compiled into SKI form. This is motivated by the problem of passing thunks around: a thunk is meant to be evaluated in the context of a particular set of identifier-value bindings, but the "current" bindings at the time the evaluation actually takes place may be very different. (This is related to the issue of dynamic vs. static scoping.)

In lambda expressions, bindings take place when arguments are passed into functions, so the idea of SKI form is to eliminate all arguments from an expression, and this entails removing the lambdas. Since lambdas are essential to lambda expressions, this seems like a tall order. We accomplish the task using three pre-defined functions, S, K, and I.

S f g x = (f x) (g x)
K c x   = c
I x     = x
Now, suppose we have a lambda expression in which the only lambda is the outermost occurrence.
\x.+ x 1
We'll eliminate the lambda by defining an operator [x] and pushing it through the function's return-value expression using the following rules.
[x] (e1 e2)      => S ([x] e1) ([x] e2)
[x] y            => K y          assuming y /= x
[x] x            => I
For example:
[x] ((+ x) 1) => S ([x] (+ x)) ([x] 1)
              => S (S ([x] +) ([x] x)) ([x] 1)
              => S (S (K +) ([x] x)) ([x] 1)
              => S (S (K +) I) ([x] 1)
              => S (S (K +) I) (K 1)
The idea is that this is equivalent to our original lambda expression. We can illustrate this using an example: Suppose we apply the original lambda expression to 5.
(\x.+ x 1) 5 = + 5 1 = 6
And suppose we use our SKI-form expression instead.
S (S (K +) I) (K 1) 5 = (S (K +) I 5) (K 1 5)   def'n of S
                      = ((K + 5) (I 5)) (K 1 5) def'n of S
                      = (   +    (I 5)) (K 1 5) def'n of K
                      = (   +      5  ) (K 1 5) def'n of I
                      =     +      5       1    def'n of K
                      = 6
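The combinators can be written directly in Haskell to check this (a sketch; lowercase names are used because Haskell function names cannot begin with a capital letter):

s f g x = (f x) (g x)
k c x   = c
i x     = x

plusOne = s (s (k (+)) i) (k 1)

Evaluating plusOne 5 then yields 6, matching the original lambda expression.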

Though SKI-form expressions are difficult for humans to read, they're convenient for computers to manipulate. There is a particular way of reducing SKI-form expressions called graph rewriting that works quite efficiently.

SKI-form reduction is prominent enough that most serious lazy-language systems use it. Using this technique, coupled with the techniques described above, Haskell systems turn out in practice to be quite efficient. In fact, the purely functional environment enables many optimizations that can't take place in imperative systems. Although comparisons with other systems are difficult to make completely, Haskell compilers' performance seems comparable to that of Java compilers.

The Great Computer Language Shootout

Additional notes and SKI reduction from a different college