Test 3 Review A: Questions
For each of the following languages, identify which category best describes it:
1. procedural 2. object-oriented 3. functional 4. logical
a. | Fortran (1957) |
b. | Lisp (1959) |
c. | C (1971) |
d. | Prolog (1973) |
e. | ML (1978) |
f. | Smalltalk (1980) |
g. | Ada (1983) |
h. | Python (1991) |
i. | Java (1994) |
j. | JavaScript (1997) |
In what language is the following written?
MOVE 1 TO FACT
PERFORM VARYING I FROM 1 BY 1 UNTIL I = N
MULTIPLY I BY FACT
END-PERFORM
In what language is the following written?
factorial
^ (self = 1 ifTrue: [1] ifFalse: [self * (self - 1) factorial])
Consider the context-free grammar

X −> a X | b a X | c

Suppose we have the TokenStream monad defined in class, with a takeToken function of type TokenStream String, whose result is the string found as the next token, or an empty string if the input has no more tokens. Write a function parseX of type TokenStream Bool whose result is True whenever the token stream matches the grammar.
Consider the following context-free grammar, which includes expressions such as “2+-1+1”.

X −> N + X | N
N −> 1 | 2 | − N

Suppose we have the TokenStream monad defined in class, with a takeToken function of type TokenStream String, whose result is the string found as the next token, or an empty string if the input has no more tokens, and a syntaxError function of type TokenStream Int, which displays an error message and terminates the program. Write a function evalX of type TokenStream Int whose result is the arithmetic value of the expression, or 0 if the expression is invalid.
Suppose that gcc sees the following while loop.
while (r0 < r1) {
r2 += r0;
r0++;
}
In compiling this code for the ARM instruction set, gcc will choose version (b.) of the following two alternatives, even though it is less intuitive and longer (six versus five instructions).
a. | loop CMP R0, R1 | b. | CMP R0, R1 |
Explain why the compiler prefers (b.).
Given the C loop “for (r0 = 1; r0 <= 3; r0++) r5 *= r0;”, a straightforward compilation is given at left below. But some compilers will instead generate the version at right.

Left (straightforward):
    MOV R0, #1
top MUL R5, R5, R0
    ADD R0, R0, #1
    CMP R0, #3
    BLE top

Right (alternative):
    MOV R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1
    MUL R5, R5, R0
    ADD R0, R0, #1
What is the name of the compiler optimization being applied to arrive at the code listed on the right? And why is it potentially more efficient (though it is longer)?
Give an example where the optimization of common subexpression elimination applies, and explain how it applies to your example. You can write your example in C/Java, or you can write it in ARM assembly.
Explain the optimization technique of strength reduction. In what situations does it apply? How does a compiler transform code using the technique? Feel free to give an example before and after the optimization; you might write the example in C or in ARM assembly.
do {
r0 += 2 * r1;
r0 -= 2 * r2;
r2++;
} while (r2 < r1);

up MOV R3, #2 ; r0 += 2 * r1;
MUL R3, R3, R1
ADD R0, R0, R3
MOV R3, #2 ; r0 -= 2 * r2;
MUL R3, R3, R2
SUB R0, R0, R3
ADD R2, R2, #1 ; r2++;
CMP R2, R1 ; if (r2 < r1) goto up;
BLT up
For each of the below alternative assembly translations, identify which of the following optimization techniques it best represents.
A. peephole optimization
B. common subexpression elimination
C. strength reduction
1. MOV R4, #2 | 2. MOV R4, #2 | 3.up ADD R0, R0, R1, LSL #1 |
Assume the compiler uses R0 for i, R1 for j, and R2 for n.

for (int i = 0; i < n; i++)
    j += 2 * i + 1;

MOV R0, #0
again CMP R0, R2
BGE done
MOV R3, #2
MUL R3, R3, R0
ADD R3, R3, #1
ADD R1, R1, R3
ADD R0, R0, #1
B again
done
For each of the following, select which of the following optimization techniques is being applied.
a. peephole optimization
b. common subexpression elimination
c. strength reduction
d. loop unrolling
1. MOV R0, #0 | 2. MOV R0, #0 | 3. MOV R0, #0 |
Assume R0 is used for holding i, R1 for holding n, and R2 for holding a.
for (i = 0; i < n; i++) { | MOV R0, #0 |
Rewrite the assembly code below to illustrate the following two optimization techniques.
a. Common subexpression elimination
b. Strength reduction
Suppose we have a function findGoldbach() defined in a separate file.
a. | b. |
int main() { | int main() { |
Both alternatives work the same. But version (a.) is considerably
slower, since findGoldbach()
is quite slow, and
(b.) uses findGoldbach()
only
once, whereas (a.) may call it as
many as three times.
Given (a.), gcc will not choose to optimize it by transforming it to (b.), despite the fact that (b.) works identically and is much faster. Explain why it does not perform this optimization.
Explain why a good optimizer will not convert the code on the left to the code on the right (using common subexpression elimination), even though the code on the right is more efficient.
a.
int myst(int *x, int *y) {
    for (int i = 0; i < 20; i++) {
        x[i] = y[10] + 1;
    }
}

b.
int myst(int *x, int *y) {
    int val = y[10] + 1;
    for (int i = 0; i < 20; i++) {
        x[i] = val;
    }
}
In ARM assembly language, each arithmetic instruction changes the first-listed register to a value computed based on the following one or two arguments.
MOV R0, #5
ADD R1, R0, R0
MUL R2, R1, R0
ORR R3, R2, R1
SUB R4, R3, R0
AND R1, R3, R4
ADD R4, R4, R1
a. Draw an interference graph indicating conflicts between registers.
b. From this graph, describe how the code could be rewritten using only three registers.
Given the below Java class definition, describe how a compiler will represent each individual object of the class in memory and how it will translate each method into a subroutine.
class Microwave {
int serialNumber;
double capacity;
Microwave(int id, double cap) { serialNumber = id; capacity = cap; }
boolean canHold(double size) { return size <= this.capacity; }
}
JavaScript programs create objects without designating them as members of a particular class. Nonetheless, some JavaScript interpreters go to quite a bit of effort to “discover” which class a variable belongs to through the notion of hidden classes. How does spending this effort at runtime to discover a class help with improving runtime performance?
Explain the notion of just-in-time compilation as it relates to JavaScript interpreters.
In Prolog, suppose we have a child
predicate defined as
described in class
(e.g., child(harry, diana)
since Harry is Diana's son),
assuming that each person is the child of two parents.
Define a Prolog predicate halfsib(A, B)
that applies when A is a half-sibling of B
(i.e., they share one parent but have other parents that are distinct).
In Prolog, write a predicate fib(K, N)
that applies when N is the Kth Fibonacci number.
In Prolog, write a predicate repeat(K, X, L)
that applies
when L is a list consisting of K repetitions of X.
In Prolog, write a predicate reverse(L, R)
that applies
when the lists L and R are reversed images of each other.
For example, reverse([a,b,c], [c,b,a])
would apply.
Feel free to use the append
predicate we saw in class.
In Prolog, write a predicate insert(X, L, M)
that applies when
M is the same as L except with X
inserted between two adjacent elements A and B
so that A ≤ X ≤ B. (You can assume that two such elements exist — that is, that X is between the L's minimum and maximum.)
For example, insert(2, [1,3,4], M)
should apply only
when M is [1,2,3,4].
Feel free to use the append
predicate we saw in class.
As we saw, shared-memory concurrency systems tend to be more difficult to use than message-passing systems. What advantage does a shared-memory system have over message-passing systems?
Test 3 Review A: Solutions
a. | Fortran (1957) | procedural |
b. | Lisp (1959) | functional |
c. | C (1971) | procedural |
d. | Prolog (1973) | logical |
e. | ML (1978) | functional |
f. | Smalltalk (1980) | object-oriented |
g. | Ada (1983) | procedural |
h. | Python (1991) | procedural |
i. | Java (1994) | object-oriented |
j. | JavaScript (1997) | procedural |
COBOL
Smalltalk
parseX =
    do t0 <- takeToken
       case t0 of
         "a" -> parseX
         "b" ->
           do t1 <- takeToken
              if t1 == "a" then parseX else return False
         "c" ->
           do t1 <- takeToken
              return (t1 == "")
         _ -> return False
evalX :: TokenStream Int
evalX =
    do n0 <- evalN
       next <- takeToken
       case next of
         "+" ->
           do n1 <- evalX
              return (n0 + n1)
         "" -> return n0
         _ -> syntaxError
evalN :: TokenStream Int
evalN =
    do t0 <- takeToken
       case t0 of
         "-" ->
           do val <- evalN
              return (-val)
         "1" -> return 1
         "2" -> return 2
         _ -> syntaxError
Even though it uses more instructions overall, (b.) has fewer instructions per iteration of the loop. (All the instructions are the same, except that there is only one jump instruction (BLT) per iteration instead of two (BGE and B).) Having fewer instructions per iteration means that the total number of instructions executed will be smaller, even though the code itself contains more instructions.
This is loop unrolling. It is more efficient because it executes fewer instructions overall: the first version executes a comparison and a branch instruction for each value of R0, whereas the second version does not.
for (int i = 0; i < 100; i++) {
System.out.println(10 * n + i);
}
This loop has a common subexpression, 10 * n. We can save time by computing 10 * n once before we enter the loop and reusing the result on each iteration.
int ten_times_n = 10 * n;
for (int i = 0; i < 100; i++) {
System.out.println(ten_times_n + i);
}
Without strength reduction | With strength reduction | |
i = 0; | i = 0; |
2. C. strength reduction
3. A. peephole optimization
2. A. peephole optimization
3. C. strength reduction
a. Subexpression elimination MOV R4, R1, LSL #1 |
b. Strength reduction MOV R4, #1 |
Though they may in fact work identically, gcc will not be able to determine this, because it compiles files (and even separate functions within a file) separately. Thus, it will not look into the contents of findGoldbach() while it compiles main(), and so it must assume that findGoldbach() could have some effect other than simply computing a value. (It may print something to the screen, for example.) If so, then this effect would occur three times in (a.) but only once in (b.), and so the two would not be equivalent.

(Incidentally, a good programmer who understands that they are in fact equivalent would write (b.) rather than (a.) precisely because compilers won't be able to perform the transformation themselves.)
It could be that the arrays x and y overlap in some way. For example, if x and y reference the same memory address, then the left version would change x[11] and the following values in x to be the original value of y[10] + 2, whereas the right version would change them to be y[10] + 1.
a. (interference graph not reproduced)
b. One possible coloring is to use MOV R0, #5 |
The compiler will decide on a fixed offset for each instance variable; for instance, it may decide that each object will require 12 bytes, with capacity stored in the first 8 bytes and serialNumber stored in the next 4 bytes.

In compiling the canHold method, it will create a subroutine taking the address of a Microwave object (this) as its first parameter and the size as its second parameter; whenever the method accesses an instance variable, it will retrieve this information from memory at the corresponding fixed offset from the this address.
A class can describe how each individual object is laid out in memory. Thus, rather than have each object be a hash table mapping names to values, the interpreter can lay the object out as a struct with fields at fixed offsets. Since hash tables take significantly more memory than a struct, we can save a lot of memory by having just the one hash table as part of the hidden class shared by many objects.

Also, it can conceivably improve speed: in a loop iterating through several objects accessing the same field or method, the interpreter can determine the offset of the information just once and then quickly retrieve the information at that offset for every object in the loop.
As the interpreter executes the program using its in-memory representation, it discovers which fragments of code are used most heavily, and it spends time translating these fragments to optimized machine language so that subsequent attempts to execute a fragment can be executed directly in machine language rather than through tree traversal.
halfsib(A, B) :- child(A, X), child(B, X), child(A, Y), Y \= X, child(B, Z), Z \= X, Y \= Z.
fib(0, 0).
fib(1, 1).
fib(K, N) :- K > 1, I is K - 2, fib(I, L), J is K - 1, fib(J, M), N is L + M.
repeat(0, _, []).
repeat(K, X, [X | T]) :- K > 0, J is K - 1, repeat(J, X, T).
reverse([], []).
reverse([H | T], R) :- reverse(T, Q), append(Q, [H], R).
insert(X, L, M) :- append(H, [A, B | T], L), A =< X, X =< B, append(H, [A, X, B | T], M).
For systems with several integrated cores accessing the same RAM, shared-memory systems can communicate very quickly by simply accessing memory locations that have been changed by other threads. There is no need to copy information from one thread to another.