Test 3 Review A: Questions

R3a.1.

For each of the following languages, identify which category best describes it:

1. procedural
2. object-oriented
3. functional
4. logical

a. Fortran (1957)
b. Lisp (1959)
c. C (1971)
d. Prolog (1973)
e. ML (1978)
f. Smalltalk (1980)
g. Ada (1983)
h. Python (1991)
i. Java (1994)
j. JavaScript (1997)
R3a.2.

In what language is the following written?

MOVE 1 TO FACT
PERFORM VARYING I FROM 1 BY 1 UNTIL I = N
  MULTIPLY I BY FACT
END-PERFORM
R3a.3.

In what language is the following written?

factorial
    ^ (self = 1 ifTrue: [1] ifFalse: [self * (self - 1) factorial])
R3a.4.

Consider the context-free grammar X −> a X | b a X | c. Suppose we have the TokenStream monad defined in class with a takeToken function of type TokenStream String, with the result being the string found as the next token or an empty string if the input has no more tokens. Write a function parseX of type TokenStream Bool, whose result is True whenever the token stream matches the grammar.

R3a.5.

Consider the following context-free grammar, which includes expressions such as “2+-1+1”.

X −> N + X | N
N −> 1 | 2 | - N
Suppose we have the TokenStream monad defined in class with a takeToken function of type TokenStream String, with the result being the string found as the next token or an empty string if the input has no more tokens, and a syntaxError function of type TokenStream Int, which displays an error message and terminates the program. Write a function evalX of type TokenStream Int, whose result is the arithmetic value of the expression or 0 if the expression is invalid.

R3a.6.

Suppose that gcc sees the following while loop.

while (r0 < r1) {
    r2 += r0;
    r0++;
}

In compiling this code for the ARM instruction set, gcc will choose version (b.) of the following two alternatives, even though it is less intuitive and longer (six versus five instructions).

a.

loop    CMP R0, R1
        BGE done
        ADD R2, R2, R0
        ADD R0, R0, #1
        B loop
done

b.

        CMP R0, R1
        BGE done
loop    ADD R2, R2, R0
        ADD R0, R0, #1
        CMP R0, R1
        BLT loop
done

Explain why the compiler prefers (b.).

R3a.7.

Given the C loop “for (r0 = 1; r0 <= 3; r0++) r5 *= r0;”, a straightforward compilation is the first listing below. But some compilers will instead generate the second version.

    MOV R0, #1
top MUL R5, R5, R0
    ADD R0, R0, #1
    CMP R0, #3
    BLE top

    MOV R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1

What is the name of the compiler optimization being applied to arrive at the code listed on the right? And why is it potentially more efficient (though it is longer)?

R3a.8.

Give an example where the optimization of common subexpression elimination applies, and explain how it applies to your example. You can write your example in C/Java, or you can write it in ARM assembly.

R3a.9.

Explain the optimization technique of strength reduction. In what situations does it apply? How does a compiler transform code using the technique? Feel free to give an example before and after the optimization; you might write the example in C or in ARM assembly.

R3a.10.
Consider the C fragment and corresponding ARM assembly code below.

do {
    r0 += 2 * r1;
    r0 -= 2 * r2;
    r2++;
} while (r2 < r1);
up  MOV R3, #2        ; r0 += 2 * r1;
    MUL R3, R3, R1
    ADD R0, R0, R3

    MOV R3, #2        ; r0 -= 2 * r2;
    MUL R3, R3, R2
    SUB R0, R0, R3

    ADD R2, R2, #1    ; r2++;

    CMP R2, R1        ; if (r2 < r1) goto up;
    BLT up

For each of the below alternative assembly translations, identify which of the following optimization techniques it best represents.

A. peephole optimization
B. common subexpression elimination
C. strength reduction
1.
    MOV R4, #2
    MUL R4, R4, R1

up  ADD R0, R0, R4

    MOV R3, #2
    MUL R3, R3, R2
    SUB R0, R0, R3

    ADD R2, R2, #1

    CMP R2, R1
    BLT up
2.
    MOV R4, #2
    MUL R4, R4, R2

up  MOV R3, #2
    MUL R3, R3, R1
    ADD R0, R0, R3

    SUB R0, R0, R4

    ADD R2, R2, #1
    ADD R4, R4, #2

    CMP R2, R1
    BLT up
3.
up  ADD R0, R0, R1, LSL #1

    SUB R0, R0, R2, LSL #1

    ADD R2, R2, #1

    CMP R2, R1
    BLT up
R3a.11.
Consider the following C code with its ARM translation. The assembly translation uses R0 for i, R1 for j, and R2 for n.

for (int i = 0; i < n; i++)
    j += 2 * i + 1;

      MOV R0, #0
again CMP R0, R2
      BGE done

      MOV R3, #2
      MUL R3, R3, R0
      ADD R3, R3, #1
      ADD R1, R1, R3

      ADD R0, R0, #1
      B again
done

For each of the following alternative translations, select which optimization technique is being applied.

a. peephole optimization
b. common subexpression elimination
c. strength reduction
d. loop unrolling
1.
      MOV R0, #0
again CMP R0, R2
      BGE done

      MOV R3, #2
      MUL R3, R3, R0
      ADD R3, R3, #1
      ADD R1, R1, R3

      ADD R0, R0, #1
      CMP R0, R2
      BGE done

      MOV R3, #2
      MUL R3, R3, R0
      ADD R3, R3, #1
      ADD R1, R1, R3

      ADD R0, R0, #1
      B again
done
2.
      MOV R0, #0
again CMP R0, R2
      BGE done

      ADD R1, R1, R0, LSL #1
      ADD R1, R1, #1

      ADD R0, R0, #1
      B again
done
3.
      MOV R0, #0
      MOV R4, #1
again CMP R0, R2
      BGE done

      ADD R1, R1, R4

      ADD R0, R0, #1
      ADD R4, R4, #2
      B again
done
R3a.12.
Below is some C code with an assembly translation. The assembly code uses R0 for holding i, R1 for holding n, and R2 for holding a.

for (i = 0; i < n; i++) {
    a[i] = 2 * n - 2 * i + 1;
}
      MOV R0, #0
      CMP R0, R1
      BGE done
again MOV R3, R1, LSL #1
      SUB R3, R3, R0, LSL #1
      ADD R3, R3, #1
      STR R3, [R2], #4

      ADD R0, R0, #1
      CMP R0, R1
      BLT again
done

Rewrite the assembly code below to illustrate the following two optimization techniques.

a. Common subexpression elimination
b. Strength reduction

R3a.13.
Consider the following two programs, using a function named findGoldbach() defined in a separate file.

a.

int main() {
    int n;

    scanf("%d", &n);
    if (findGoldbach(n) != 0) {
        printf("%d %d\n", findGoldbach(n),
            n - findGoldbach(n));
    } else {
        printf("no pair found\n");
    }
    return 0;
}

b.

int main() {
    int n, p;

    scanf("%d", &n);
    p = findGoldbach(n);
    if (p != 0) {
        printf("%d %d\n", p, n - p);
    } else {
        printf("no pair found\n");
    }
    return 0;
}

Both alternatives work the same. But version (a.) is considerably slower, since findGoldbach() is quite slow, and (b.) uses findGoldbach() only once, whereas (a.) may call it as many as three times.

Given (a.), gcc will not choose to optimize it by transforming it to (b.), despite the fact that (b.) works identically and is much faster. Explain why it does not perform this optimization.

R3a.14.

Explain why a good optimizer will not convert the code on the left to the code on the right (using common subexpression elimination), even though the code on the right is more efficient.

a.

void myst(int *x, int *y) {
    for (int i = 0; i < 20; i++) {
        x[i] = y[10] + 1;
    }
}

b.

void myst(int *x, int *y) {
    int val = y[10] + 1;
    for (int i = 0; i < 20; i++) {
        x[i] = val;
    }
}
R3a.15.

In ARM assembly language, each arithmetic instruction changes the first-listed register to a value computed based on the following one or two arguments.

MOV R0, #5
ADD R1, R0, R0
MUL R2, R1, R0
ORR R3, R2, R1
SUB R4, R3, R0
AND R1, R3, R4
ADD R4, R4, R1

a. Draw an interference graph indicating conflicts between registers.

b. From this graph, describe how the code could be rewritten using only three registers.

R3a.16.

Given the below Java class definition, describe how a compiler will represent each individual object of the class in memory and how it will translate each method into a subroutine.

class Microwave {
    int serialNumber;
    double capacity;

    Microwave(int id, double cap) { serialNumber = id; capacity = cap; }

    boolean canHold(double size) { return size <= this.capacity; }
}
R3a.17.

JavaScript programs create objects without designating them as members of a particular class. Nonetheless, some JavaScript interpreters go to quite a bit of effort to “discover” which class a variable belongs to through the notion of hidden classes. How does spending this effort at runtime to discover a class help with improving runtime performance?

R3a.18.

Explain the notion of just-in-time compilation as it relates to JavaScript interpreters.

R3a.19.

In Prolog, suppose we have a child predicate defined as described in class (e.g., child(harry, diana) since Harry is Diana's son), assuming that each person is the child of two parents. Define a Prolog predicate halfsib(A, B) that applies when A is a half-sibling of B (i.e., they share one parent but have other parents that are distinct).

R3a.20.

In Prolog, write a predicate fib(K, N) that applies when N is the Kth Fibonacci number.

R3a.21.

In Prolog, write a predicate repeat(K, X, L) that applies when L is a list consisting of K repetitions of X.

R3a.22.

In Prolog, write a predicate reverse(L, R) that applies when the lists L and R are reversed images of each other. For example, reverse([a,b,c], [c,b,a]) would apply. Feel free to use the append predicate we saw in class.

R3a.23.

In Prolog, write a predicate insert(X, L, M) that applies when M is the same as L except with X inserted between two adjacent elements A and B so that A ≤ X ≤ B. (You can assume that two such elements exist — that is, that X is between L's minimum and maximum.) For example, insert(2, [1,3,4], M) should apply only when M is [1,2,3,4]. Feel free to use the append predicate we saw in class.

R3a.24.

As we saw, shared-memory concurrency systems tend to be more difficult to use than message-passing systems. What advantage does a shared-memory system have over message-passing systems?

Test 3 Review A: Solutions

R3a.1.
a. Fortran (1957): procedural
b. Lisp (1959): functional
c. C (1971): procedural
d. Prolog (1973): logical
e. ML (1978): functional
f. Smalltalk (1980): object-oriented
g. Ada (1983): procedural
h. Python (1991): procedural
i. Java (1994): object-oriented
j. JavaScript (1997): procedural
R3a.2.

COBOL

R3a.3.

Smalltalk

R3a.4.
parseX :: TokenStream Bool
parseX =
    do  t0 <- takeToken
        case t0 of
         "a" -> parseX
         "b" ->
            do  t1 <- takeToken
                if t1 == "a" then parseX else return False
         "c" ->
            do  t1 <- takeToken
                return (t1 == "")
         _ -> return False
R3a.5.
evalX :: TokenStream Int
evalX =
    do  n0 <- evalN
        next <- takeToken
        case next of
         "+" ->
            do  n1 <- evalX
                return (n0 + n1)
         "" -> return n0
         _ -> syntaxError

evalN :: TokenStream Int
evalN =
    do  t0 <- takeToken
        case t0 of
         "-" ->
            do  val <- evalN
                return (-val)
         "1" -> return 1
         "2" -> return 2
         _ -> syntaxError
R3a.6.

Even though it uses more instructions overall, (b.) executes fewer instructions per iteration of the loop. (The instructions are the same, except that each iteration includes only one branch instruction (BLT) instead of two (BGE and B).) Fewer instructions per iteration means fewer total instructions executed, even though the compiled code itself is longer.

R3a.7.

This is loop unrolling. It is more efficient because it executes fewer instructions overall: The first version does a comparison and branch instruction for each value of R0, whereas the second version does not.

R3a.8.
for (int i = 0; i < 100; i++) {
    System.out.println(10 * n + i);
}

This loop has a common subexpression of 10 * n. We can save time by computing 10 * n once before we enter the loop, and reusing this result each iteration.

int ten_times_n = 10 * n;
for (int i = 0; i < 100; i++) {
    System.out.println(ten_times_n + i);
}
R3a.9.
Strength reduction applies when the code contains a loop with a counter increasing by a fixed amount each time through the loop, where the loop's body includes multiplying the counter by a value. In this case, the compiler can introduce a new variable and add the multiplication factor to that variable each time the counter increments.

Without strength reduction:

i = 0;
total = 0;
while (i < n) {
    total += 5 * i;
    i++;
}

With strength reduction:

i = 0;
i5 = 0;
total = 0;
while (i < n) {
    total += i5;
    i++;
    i5 += 5;
}
R3a.10.
1. B. common subexpression elimination
2. C. strength reduction
3. A. peephole optimization
R3a.11.
1. D. loop unrolling
2. A. peephole optimization
3. C. strength reduction
R3a.12.
a. Common subexpression elimination

      MOV R4, R1, LSL #1
      ADD R4, R4, #1

      MOV R0, #0
      CMP R0, R1
      BGE done
again SUB R3, R4, R0, LSL #1
      STR R3, [R2], #4

      ADD R0, R0, #1
      CMP R0, R1
      BLT again
done
b. Strength reduction

      MOV R4, #1
      MOV R0, #0
      CMP R0, R1
      BGE done
again MOV R3, R1, LSL #1
      ADD R3, R3, R4
      STR R3, [R2], #4

      ADD R0, R0, #1
      SUB R4, R4, #2
      CMP R0, R1
      BLT again
done
R3a.13.

Though they may in fact work identically, gcc will not be able to determine this, because it compiles files (and even separate functions within a file) separately. Thus, it will not look into the contents of findGoldbach() while it compiles main(), and so it must assume that findGoldbach() could have some effect other than simply computing a value. (It may print something to the screen, for example.) If so, then this effect would occur three times in (a.), but it would occur only once in (b.), and so the two would not be equivalent.

(Incidentally, a good programmer, who understands that they are in fact equivalent, would write (b.) rather than (a.) precisely because compilers won't be able to perform the transformation themselves.)

R3a.14.

It could be that the arrays x and y overlap in some way. For example, if x and y reference the same memory address, then the left version would change x[11] and following values in x to be the original value of y[10] + 2, whereas the right version would change it to be y[10] + 1.

R3a.15.
a. Treating the two writes to R1 as separate variables R1a (from the ADD) and R1b (from the AND), the interference graph's edges, read off the registers' live ranges, are: R0-R1a, R0-R2, R0-R3, R1a-R2, R3-R4, and R4-R1b.

b. One possible coloring uses R0 for R0 and R4 in the original program; R1 for the first R1 and R3; and R2 for R2 and the second R1.

MOV R0, #5
ADD R1, R0, R0
MUL R2, R1, R0
ORR R1, R2, R1
SUB R0, R1, R0
AND R2, R1, R0
ADD R0, R0, R2
R3a.16.

The compiler will decide on a fixed offset for each instance variable; for instance, it may decide that each object will require 12 bytes, with capacity stored in the first 8 bytes while serialNumber is always stored in the next 4 bytes.

In compiling the canHold method, it will create a subroutine taking the address of a Microwave object (this) as its first parameter and the size as its second parameter; whenever the method accesses an instance variable, it will retrieve this information from memory at the corresponding fixed offset from the this address.

R3a.17.

A class can describe how each individual object is laid out in memory. Thus, rather than have each object be a hash table mapping names to values, the interpreter can lay the object out as a struct with fields at fixed offsets. Since hash tables take significantly more memory than a struct, we can save a lot of memory by having just the one hash table as part of the hidden class shared by many objects.

Also, it can conceivably improve speed: In a loop iterating through several objects accessing the same field or method, the interpreter can determine the offset of the information just once and then quickly retrieve the information at that offset for every object in the loop.

R3a.18.

As the interpreter executes the program using its in-memory representation, it discovers which fragments of code are used most heavily, and it spends time translating these fragments to optimized machine language so that subsequent attempts to execute a fragment can be executed directly in machine language rather than through tree traversal.

R3a.19.
halfsib(A, B) :- child(A, X), child(B, X), child(A, Y), Y \= X, child(B, Z), Z \= X, Y \= Z.
R3a.20.
fib(0, 0).
fib(1, 1).
fib(K, N) :- K > 1, I is K - 2, fib(I, L), J is K - 1, fib(J, M), N is L + M.
R3a.21.
repeat(0, _, []).
repeat(K, X, [X | T]) :- K > 0, J is K - 1, repeat(J, X, T).
R3a.22.
reverse([], []).
reverse([H | T], R) :- reverse(T, Q), append(Q, [H], R).
R3a.23.
insert(X, L, M) :- append(H, [A, B | T], L), A =< X, X =< B, append(H, [A, X, B | T], M).
R3a.24.

For systems with several integrated cores accessing the same RAM, shared-memory systems can communicate very quickly by simply accessing memory locations that have been changed by other threads. There is no need to copy information from one thread to another.