Test 3 Review A: Questions

R3a.1.

For each of the following languages, identify which category best describes it:

1. procedural
2. object-oriented
3. functional
4. logical

a. Fortran (1957)
b. Lisp (1959)
c. C (1971)
d. Prolog (1973)
e. ML (1978)
f. Smalltalk (1980)
g. Ada (1983)
h. Python (1991)
i. Java (1994)
j. JavaScript (1997)
R3a.2.

In what language is the following written?

MOVE 1 TO FACT
PERFORM VARYING I FROM 1 BY 1 UNTIL I = N
  MULTIPLY I BY FACT
END-PERFORM
R3a.3.

In what language is the following written?

factorial
    ^ (self = 1 ifTrue: [1] ifFalse: [self * (self - 1) factorial])
R3a.4.

Consider the context-free grammar X −> a X | b a X | c. Suppose we have the TokenStream monad defined in class with a takeToken function of type TokenStream String, with the result being the string found as the next token or an empty string if the input has no more tokens. Write a function parseX of type TokenStream Bool, whose result is True whenever the token stream matches the grammar.

R3a.5.

Consider the following context-free grammar, which includes expressions such as “2+-1+1”.

X −> N + X | N
N −> 1 | 2 | - N
Suppose we have the TokenStream monad defined in class with a takeToken function of type TokenStream String, with the result being the string found as the next token or an empty string if the input has no more tokens, and a syntaxError function of type TokenStream Int, which displays an error message and terminates the program. Write a function evalX of type TokenStream Int, whose result is the arithmetic value of the expression or 0 if the expression is invalid.

R3a.6.

Suppose that gcc sees the following while loop.

while (r0 < r1) {
    r2 += r0;
    r0++;
}

In compiling this code for the ARM instruction set, gcc will choose version (b.) of the following two alternatives, even though it is less intuitive and longer (six versus five instructions).

a.

loop    CMP R0, R1
        BGE done
        ADD R2, R2, R0
        ADD R0, R0, #1
        B loop
done

b.

        CMP R0, R1
        BGE done
loop    ADD R2, R2, R0
        ADD R0, R0, #1
        CMP R0, R1
        BLT loop
done

Explain why the compiler prefers (b.).

R3a.7.

Given the C loop “for (r0 = 1; r0 <= 3; r0++) r5 *= r0;”, a straightforward compilation is the first listing below. But some compilers will instead generate the second version.

    MOV R0, #1
top MUL R5, R5, R0
    ADD R0, R0, #1
    CMP R0, #3
    BLE top

    MOV R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1

What is the name of the compiler optimization being applied to arrive at the code listed on the right? And why is it potentially more efficient (though it is longer)?

R3a.8.

Give an example where the optimization of common subexpression elimination applies, and explain how it applies to your example. You can write your example in C/Java, or you can write it in ARM assembly.

R3a.9.

Explain the optimization technique of strength reduction. In what situations does it apply? How does a compiler transform code using the technique? Feel free to give an example before and after the optimization; you might write the example in C or in ARM assembly.

R3a.10.
Consider the C fragment and corresponding ARM assembly code below.

do {
    r0 += 2 * r1;
    r0 -= 2 * r2;
    r2++;
} while (r2 < r1);
up  MOV R3, #2        ; r0 += 2 * r1;
    MUL R3, R3, R1
    ADD R0, R0, R3

    MOV R3, #2        ; r0 -= 2 * r2;
    MUL R3, R3, R2
    SUB R0, R0, R3

    ADD R2, R2, #1    ; r2++;

    CMP R2, R1        ; if (r2 < r1) goto up;
    BLT up

For each of the below alternative assembly translations, identify which of the following optimization techniques it best represents.

A. peephole optimization
B. common subexpression elimination
C. strength reduction
1.
    MOV R4, #2
    MUL R4, R4, R1

up  ADD R0, R0, R4

    MOV R3, #2
    MUL R3, R3, R2
    SUB R0, R0, R3

    ADD R2, R2, #1

    CMP R2, R1
    BLT up
2.
    MOV R4, #2
    MUL R4, R4, R2

up  MOV R3, #2
    MUL R3, R3, R1
    ADD R0, R0, R3

    SUB R0, R0, R4

    ADD R2, R2, #1
    ADD R4, R4, #2

    CMP R2, R1
    BLT up
3.
up  ADD R0, R0, R1, LSL #1

    SUB R0, R0, R2, LSL #1

    ADD R2, R2, #1

    CMP R2, R1
    BLT up
R3a.11.
Consider the following C code with its ARM translation. The assembly translation uses R0 for i, R1 for j, and R2 for n.

for (int i = 0; i < n; i++)
    j += 2 * i + 1;

      MOV R0, #0
again CMP R0, R2
      BGE done

      MOV R3, #2
      MUL R3, R3, R0
      ADD R3, R3, #1
      ADD R1, R1, R3

      ADD R0, R0, #1
      B again
done

For each of the following alternative translations, select which optimization technique is being applied.

a. peephole optimization
b. common subexpression elimination
c. strength reduction
d. loop unrolling
1.
      MOV R0, #0
again CMP R0, R2
      BGE done

      MOV R3, #2
      MUL R3, R3, R0
      ADD R3, R3, #1
      ADD R1, R1, R3

      ADD R0, R0, #1
      CMP R0, R2
      BGE done

      MOV R3, #2
      MUL R3, R3, R0
      ADD R3, R3, #1
      ADD R1, R1, R3

      ADD R0, R0, #1
      B again
done
2.
      MOV R0, #0
again CMP R0, R2
      BGE done

      ADD R1, R1, R0, LSL #1
      ADD R1, R1, #1

      ADD R0, R0, #1
      B again
done
3.
      MOV R0, #0
      MOV R4, #1
again CMP R0, R2
      BGE done

      ADD R1, R1, R4

      ADD R0, R0, #1
      ADD R4, R4, #2
      B again
done
R3a.12.
Below is some C code with an assembly translation. The assembly code uses R0 for holding i, R1 for holding n, and R2 for holding a.

for (i = 0; i < n; i++) {
    a[i] = 2 * n - 2 * i + 1;
}
      MOV R0, #0
      CMP R0, R1
      BGE done
again MOV R3, R1, LSL #1
      SUB R3, R3, R0, LSL #1
      ADD R3, R3, #1
      STR R3, [R2], #4

      ADD R0, R0, #1
      CMP R0, R1
      BLT again
done

Rewrite the assembly code below to illustrate the following two optimization techniques.

a. Common subexpression elimination
b. Strength reduction

R3a.13.
Consider the following two programs, using a function named findGoldbach() defined in a separate file.

a.

int main() {
    int n;

    scanf("%d", &n);
    if (findGoldbach(n) != 0) {
        printf("%d %d\n", findGoldbach(n),
            n - findGoldbach(n));
    } else {
        printf("no pair found\n");
    }
    return 0;
}

b.

int main() {
    int n, p;

    scanf("%d", &n);
    p = findGoldbach(n);
    if (p != 0) {
        printf("%d %d\n", p, n - p);
    } else {
        printf("no pair found\n");
    }
    return 0;
}

Both alternatives work the same. But version (a.) is considerably slower, since findGoldbach() is quite slow, and (b.) uses findGoldbach() only once, whereas (a.) may call it as many as three times.

Given (a.), gcc will not choose to optimize it by transforming it to (b.), despite the fact that (b.) works identically and is much faster. Explain why it does not perform this optimization.

R3a.14.

Explain why a good optimizer will not convert the code on the left to the code on the right (using common subexpression elimination), even though the code on the right is more efficient.

a.

void myst(int *x, int *y) {
    for (int i = 0; i < 20; i++) {
        x[i] = y[10] + 1;
    }
}

b.

void myst(int *x, int *y) {
    int val = y[10] + 1;
    for (int i = 0; i < 20; i++) {
        x[i] = val;
    }
}
R3a.15.

In ARM assembly language, each arithmetic instruction changes the first-listed register to a value computed based on the following one or two arguments.

MOV R0, #5
ADD R1, R0, R0
MUL R2, R1, R0
ORR R3, R2, R1
SUB R4, R3, R0
AND R1, R3, R4
ADD R4, R4, R1

a. Draw an interference graph indicating conflicts between registers.

b. From this graph, describe how the code could be rewritten using only three registers.

R3a.16.

Given the below Java class definition, describe how a compiler will represent each individual object of the class in memory and how it will translate each method into a subroutine.

class Microwave {
    int serialNumber;
    double capacity;

    Microwave(int id, double cap) { serialNumber = id; capacity = cap; }

    boolean canHold(double size) { return size <= this.capacity; }
}
R3a.17.

JavaScript programs create objects without designating them as members of a particular class. Nonetheless, some JavaScript interpreters go to quite a bit of effort to “discover” which class a variable belongs to through the notion of hidden classes. How does spending this effort at runtime to discover a class help with improving runtime performance?

R3a.18.

Explain the notion of just-in-time compilation as it relates to JavaScript interpreters.

R3a.19.

In Prolog, suppose we have a child predicate defined as described in class (e.g., child(harry, diana) since Harry is Diana's son), assuming that each person is the child of two parents. Define a Prolog predicate halfsib(A, B) that applies when A is a half-sibling of B (i.e., they share one parent but have other parents that are distinct).

R3a.20.

In Prolog, write a predicate fib(K, N) that applies when N is the Kth Fibonacci number.

R3a.21.

In Prolog, write a predicate repeat(K, X, L) that applies when L is a list consisting of K repetitions of X.

R3a.22.

In Prolog, write a predicate reverse(L, R) that applies when the lists L and R are reversed images of each other. For example, reverse([a,b,c], [c,b,a]) would apply. Feel free to use the append predicate we saw in class.

R3a.23.

In Prolog, write a predicate insert(X, L, M) that applies when M is the same as L except with X inserted between two adjacent elements A and B so that A ≤ X ≤ B. (You can assume that two such elements exist — that is, that X is between L's minimum and maximum.) For example, insert(2, [1,3,4], M) should apply only when M is [1,2,3,4]. Feel free to use the append predicate we saw in class.

R3a.24.

As we saw, shared-memory concurrency systems tend to be more difficult to use than message-passing systems. What advantage does a shared-memory system have over message-passing systems?

Test 3 Review A: Solutions

R3a.1.
a. Fortran (1957): procedural
b. Lisp (1959): functional
c. C (1971): procedural
d. Prolog (1973): logical
e. ML (1978): functional
f. Smalltalk (1980): object-oriented
g. Ada (1983): procedural
h. Python (1991): procedural
i. Java (1994): object-oriented
j. JavaScript (1997): procedural
R3a.2.

COBOL

R3a.3.

Smalltalk

R3a.4.
parseX :: TokenStream Bool
parseX =
    do  t0 <- takeToken
        case t0 of
         "a" -> parseX
         "b" ->
            do  t1 <- takeToken
                if t1 == "a" then parseX else return False
         "c" ->
            do  t1 <- takeToken
                return (t1 == "")
         _ -> return False
R3a.5.
evalX :: TokenStream Int
evalX =
    do  n0 <- evalN
        next <- takeToken
        case next of
         "+" ->
            do  n1 <- evalX
                return (n0 + n1)
         "" -> return n0
         _ -> syntaxError

evalN :: TokenStream Int
evalN =
    do  t0 <- takeToken
        case t0 of
         "-" ->
            do  val <- evalN
                return (-val)
         "1" -> return 1
         "2" -> return 2
         _ -> syntaxError
R3a.6.

Even though it uses more instructions overall, (b.) executes fewer instructions per iteration of the loop. (The instructions are the same, except that each iteration includes only one branch instruction (BLT) instead of two (BGE and B).) Fewer instructions per iteration means fewer total instructions executed, even though the compiled code itself is longer.

R3a.7.

This is loop unrolling. It is more efficient because it executes fewer instructions overall: The first version does a comparison and branch instruction for each value of R0, whereas the second version does not.

R3a.8.
for (int i = 0; i < 100; i++) {
    System.out.println(10 * n + i);
}

This loop has a common subexpression of 10 * n. We can save time by computing 10 * n once before we enter the loop, and reusing this result each iteration.

int ten_times_n = 10 * n;
for (int i = 0; i < 100; i++) {
    System.out.println(ten_times_n + i);
}
R3a.9.
Strength reduction applies when the code contains a loop with a counter increasing by a fixed amount each time through the loop, where the loop's body includes multiplying the counter by a value. In this case, the compiler can introduce a new variable and add the multiplication factor to that variable each time the counter increments.

Without strength reduction:

i = 0;
total = 0;
while (i < n) {
    total += 5 * i;
    i++;
}

With strength reduction:

i = 0;
i5 = 0;
total = 0;
while (i < n) {
    total += i5;
    i++;
    i5 += 5;
}
R3a.10.
1. B. common subexpression elimination
2. C. strength reduction
3. A. peephole optimization
R3a.11.
1. D. loop unrolling
2. A. peephole optimization
3. C. strength reduction
R3a.12.
a. Common subexpression elimination

      MOV R4, R1, LSL #1
      ADD R4, R4, #1

      MOV R0, #0
      CMP R0, R1
      BGE done
again SUB R3, R4, R0, LSL #1
      STR R3, [R2], #4

      ADD R0, R0, #1
      CMP R0, R1
      BLT again
done
b. Strength reduction

      MOV R4, #1
      MOV R0, #0
      CMP R0, R1
      BGE done
again MOV R3, R1, LSL #1
      ADD R3, R3, R4
      STR R3, [R2], #4

      ADD R0, R0, #1
      SUB R4, R4, #2
      CMP R0, R1
      BLT again
done
R3a.13.

Though they may in fact work identically, gcc will not be able to determine this, because it compiles files (and even separate functions within a file) separately. Thus, it will not look into the contents of findGoldbach() while it compiles main(), and so it must assume that findGoldbach() could have some effect other than simply computing a value. (It may print something to the screen, for example.) If so, then this effect would occur three times in (a.), but it would occur only once in (b.), and so the two would not be equivalent.

(Incidentally, a good programmer, who understands that they are in fact equivalent, would write (b.) rather than (a.) precisely because compilers won't be able to perform the transformation themselves.)

R3a.14.

It could be that the arrays x and y overlap in some way. For example, if x and y reference the same memory address, then the left version would change x[11] and following values in x to be the original value of y[10] + 2, whereas the right version would change it to be y[10] + 1.

R3a.15.
a. Treating the two writes to R1 as separate variables R1a (from the ADD) and R1b (from the AND), the interference graph's edges, read off the registers' live ranges, are: R0-R1a, R0-R2, R0-R3, R1a-R2, R3-R4, and R4-R1b.

b. One possible coloring uses R0 for R0 and R4 in the original program; R1 for the first R1 and R3; and R2 for R2 and the second R1.

MOV R0, #5
ADD R1, R0, R0
MUL R2, R1, R0
ORR R1, R2, R1
SUB R0, R1, R0
AND R2, R1, R0
ADD R0, R0, R2
R3a.16.

The compiler will decide on a fixed offset for each instance variable; for instance, it may decide that each object will require 12 bytes, with capacity stored in the first 8 bytes while serialNumber is always stored in the next 4 bytes.

In compiling the canHold method, it will create a subroutine taking the address of a Microwave object (this) as its first parameter and the size as its second parameter; whenever the method accesses an instance variable, it will retrieve this information from memory at the corresponding fixed offset from the this address.

R3a.17.

A class can describe how each individual object is laid out in memory. Thus, rather than have each object be a hash table mapping names to values, the interpreter can lay the object out as a struct with fields at fixed offsets. Since hash tables take significantly more memory than a struct, we can save a lot of memory by having just the one hash table as part of the hidden class shared by many objects.

Also, it can conceivably improve speed: In a loop iterating through several objects accessing the same field or method, the interpreter can determine the offset of the information just once and then quickly retrieve the information at that offset for every object in the loop.

R3a.18.

As the interpreter executes the program using its in-memory representation, it discovers which fragments of code are used most heavily, and it spends time translating these fragments to optimized machine language so that subsequent attempts to execute a fragment can be executed directly in machine language rather than through tree traversal.

R3a.19.
halfsib(A, B) :- child(A, X), child(B, X), child(A, Y), Y \= X, child(B, Z), Z \= X, Y \= Z.
R3a.20.
fib(0, 0).
fib(1, 1).
fib(K, N) :- K > 1, I is K - 2, fib(I, L), J is K - 1, fib(J, M), N is L + M.
R3a.21.
repeat(0, _, []).
repeat(K, X, [X | T]) :- K > 0, J is K - 1, repeat(J, X, T).
R3a.22.
reverse([], []).
reverse([H | T], R) :- reverse(T, Q), append(Q, [H], R).
R3a.23.
insert(X, L, M) :- append(H, [A, B | T], L), A =< X, X =< B, append(H, [A, X, B | T], M).
R3a.24.

For systems with several integrated cores accessing the same RAM, shared-memory systems can communicate very quickly by simply accessing memory locations that have been changed by other threads. There is no need to copy information from one thread to another.