Quiz 1: Questions
[8 pts] In studying a program, Eva Lu Ator identifies that it has three phases: It spends 50% of its time in the first phase; then it proceeds to the second phase, where it spends 10% of its time; and finally it enters the third phase, where it spends 40% of its time.
a. Eva first notices radical inefficiencies in the second phase. If she rewrites the second phase to be ten times faster, how many times faster will this new program be overall? (Show your work.)
b. Eva reconsiders: Rather than change the second phase, she looks very carefully at the first phase and discovers how to make it 25% faster. How many times faster will this new program be overall? (Show your work.)
c. If Eva has time to pursue only one of these two options, should she improve the first phase by 25% or the second phase by ten times?
[14 pts]
Write a MIPS subroutine that takes one parameter providing the address
where a subroutine f begins;
this parameter subroutine f (which you can call using jalr
)
takes an integer parameter and returns an integer.
Your subroutine should compute f(1) + f(2) + … + f(100).
For example, if f is the squaring function (f(n) = n²),
then your subroutine would return 338,350, since this is
the result of 1² + 2² + 3² + … + 100².
The provided code below handles saving some registers to the stack and restoring them at the end.
sumTo100: addi $sp, $sp, -16
sw $s0, 0($sp)
sw $s1, 4($sp)
sw $s2, 8($sp)
sw $ra, 12($sp)
# your code here
lw $s0, 0($sp)
lw $s1, 4($sp)
lw $s2, 8($sp)
lw $ra, 12($sp)
add $sp, $sp, 16
jr $ra
[10 pts] Translate the following MIPS “program” into machine language (binary).
loop: lw $t0, 4($s0)
add $s0, $s0, $t0
bne $t0, $zero, loop
nop
[10 pts] Complete the following diagram of how instructions progress through the classic five-stage pipeline, assuming that the processor supports data forwarding. Draw arrows indicating points where forwarding is necessary.
lw $t0, 4($s0) |
IF | ID | EX | MEM | WB |
lw $t1, ($t0) | |||||
xor $t2, $t0, $t1 | |||||
addi $t3, $t2, -1 | |||||
and $t4, $t2, $t3 |
[8 pts] Suppose we have a two-way set-associative cache with four lines, each containing sixteen bytes. The cache uses a least-recently-used policy to replace lines; in the tables below, the “MRU?” column represents which line is most recently used within each set.
If the cache has the initial state diagrammed at left below, and we then access the address sequence 0x00, 0x74, 0x1C, 0x8C, 0x10, what will be the final status of the cache?
set | MRU? | data |
0 | 0 | M[0x00–0x0F] |
1 | M[0x40–0x4F] | |
1 | 0 | M[0x10–0x1F] |
1 | M[0x50–0x5F] |
[10 pts] Suppose we have a 7-stage pipeline.
The branch prediction is determined by the end of the first stage.
The jump/branch destination is determined by the end of the third stage.
The branch decision is determined by the end of the fourth stage.
Assume that 5% of instructions executed are jumps or calls, while 15% are branches (of which 2/3 are taken). And you should assume that the branch prediction algorithm is correct 95% of the time (whether the branch is taken or not taken). How many clock cycles does the average instruction take before the next instruction is issued? (Ignore data and structural hazards, as well as memory access delays.) Show your work.
Quiz 1: Solutions
a. 1.10. Tmodified = (0.5 + 0.1 / 10 + 0.4) Tbase = 0.91 Tbase; the speedup is Tbase / Tmodified = 1 / 0.91 = 1.10.
b. 1.11. Tmodified = (0.5 / 1.25 + 0.1 + 0.4) Tbase = 0.9 Tbase; the speedup is Tbase / Tmodified = 1 / 0.9 = 1.11.
c. Improving the first phase by 25% provides a slightly better speedup.
move $s0, $a0
move $s1, 0
move $s2, 0
loop: addi $s1, $s1, 1
move $a0, $s1
jalr $s0
add $s2, $s2, $v0
slti $t1, $s1, 100
bne $t1, $zero, loop
nop
move $v0, $s2
lw $t0, 4($s0) |
100011 10000 01000 00000 00000 000100 |
add $s0, $s0, $t0 |
000000 10000 01000 10000 00000 100000 |
bne $t0, $zero, loop |
000101 00000 01000 11111 11111 111101 |
nop |
000000 00000 00000 00000 00000 000000 |
lw $t0, 4($s0) |
IF | ID | EX | MEM | WB | ||||||
lw $t1, ($t0) | IF | ID | — | EX | MEM | WB | |||||
xor $t2, $t0, $t1 | IF | — | ID | — | EX | MEM | WB | ||||
addi $t3, $t2, -1 | IF | — | ID | EX | MEM | WB | |||||
and $t4, $t2, $t3 | IF | ID | EX | MEM | WB |
We would have the following arrows indicating forwarding:
- from after the first
lw
's MEM stage to the secondlw
's EX stage. - from after the second
lw
's MEM stage toxor
's EX stage. - from after
xor
's EX stage toaddi
's EX stage. - from after
xor
's MEM stage toand
's EX stage. - from after
addi
's EX stage toand
's EX stage.
set | MRU? | data |
0 | 0 | M[0x00–0x0F] |
1 | M[0x80–0x8F] | |
1 | 0 | M[0x70–0x7F] |
1 | M[0x10–0x1F] |
0.8 ⋅ 1 + 0.05 ⋅ 3 + 0.10 ⋅ (0.95 ⋅ 3 + 0.05 ⋅ 4) + 0.05 ⋅ (0.95 ⋅ 1 + 0.05 ⋅ 4) = 0.8 + 0.15 + 0.305 + 0.0575 = 1.3125 cycles.