Quiz 1: Questions

Q1.1.

[8 pts] In studying a program, Eva Lu Ator identifies that it has three phases: It spends 50% of its time in the first phase; then it proceeds to the second phase, where it spends 10% of its time; and finally it enters the third phase, where it spends 40% of its time.

a. Eva first notices radical inefficiencies in the second phase. If she rewrites the second phase to be ten times faster, how many times faster will this new program be overall? (Show your work.)

b. Eva reconsiders: Rather than change the second phase, she looks very carefully at the first phase and discovers how to make it 25% faster. How many times faster will this new program be overall? (Show your work.)

c. If Eva has time to pursue only one of these two options, should she improve the first phase by 25% or the second phase by ten times?

Q1.2.

[14 pts] Write a MIPS subroutine that takes one parameter providing the address where a subroutine f begins; this parameter subroutine f (which you can call using jalr) takes an integer parameter and returns an integer. Your subroutine should compute f(1) + f(2) + … + f(100). For example, if f is the squaring function (f(n) = n²), then your subroutine would return 338,350, since this is the result of 1² + 2² + 3² + … + 100².

The provided code below handles saving some registers to the stack and restoring them at the end.

sumTo100: addi $sp, $sp, -16
          sw $s0, 0($sp)
          sw $s1, 4($sp)
          sw $s2, 8($sp)
          sw $ra, 12($sp)

          # your code here

          lw $s0, 0($sp)
          lw $s1, 4($sp)
          lw $s2, 8($sp)
          lw $ra, 12($sp)
          add $sp, $sp, 16
          jr $ra

Q1.3.

[10 pts] Translate the following MIPS “program” into machine language (binary).

loop: lw $t0, 4($s0)
      add $s0, $s0, $t0
      bne $t0, $zero, loop
      nop

Q1.4.

[10 pts] Complete the following diagram of how instructions progress through the classic five-stage pipeline, assuming that the processor supports data forwarding. Draw arrows indicating points where forwarding is necessary.

`lw $t0, 4($s0)`	IF	ID	EX	MEM	WB
`lw $t1, ($t0)`
`xor $t2, $t0, $t1`
`addi $t3, $t2, -1`
`and $t4, $t2, $t3`

Q1.5.

[8 pts] Suppose we have a two-way set-associative cache with four lines, each containing sixteen bytes. The cache uses a least-recently-used policy to replace lines; in the tables below, the “MRU?” column represents which line is most recently used within each set.

If the cache has the initial state diagrammed at left below, and we then access the address sequence 0x00, 0x74, 0x1C, 0x8C, 0x10, what will be the final status of the cache?

set	MRU?	data
0	0	M[0x00–0x0F]
0	1	M[0x40–0x4F]
1	0	M[0x10–0x1F]
1	1	M[0x50–0x5F]

Q1.6.

[10 pts] Suppose we have a 7-stage pipeline.

The branch prediction is determined by the end of the first stage.
The jump/branch destination is determined by the end of the third stage.
The branch decision is determined by the end of the fourth stage.

Assume that 5% of instructions executed are jumps or calls, while 15% are branches (of which 2/3 are taken). And you should assume that the branch prediction algorithm is correct 95% of the time (whether the branch is taken or not taken). How many clock cycles does the average instruction take before the next instruction is issued? (Ignore data and structural hazards, as well as memory access delays.) Show your work.

Quiz 1: Solutions

Q1.1.

a. 1.10. T_modified = (0.5 + 0.1 / 10 + 0.4) T_base = 0.91 T_base; the speedup is T_base / T_modified = 1 / 0.91 = 1.10.

b. 1.11. T_modified = (0.5 / 1.25 + 0.1 + 0.4) T_base = 0.9 T_base; the speedup is T_base / T_modified = 1 / 0.9 = 1.11.

c. Improving the first phase by 25% provides a slightly better speedup.

Q1.2.

        move $s0, $a0
        move $s1, 0
        move $s2, 0
loop:   addi $s1, $s1, 1
        move $a0, $s1
        jalr $s0
        add $s2, $s2, $v0
        slti $t1, $s1, 100
        bne $t1, $zero, loop
        nop
        move $v0, $s2

Q1.3.

`lw $t0, 4($s0)`	100011 10000 01000 00000 00000 000100
`add $s0, $s0, $t0`	000000 10000 01000 10000 00000 100000
`bne $t0, $zero, loop`	000101 00000 01000 11111 11111 111101
`nop`	000000 00000 00000 00000 00000 000000

Q1.4.

`lw $t0, 4($s0)`	IF	ID	EX	MEM	WB
`lw $t1, ($t0)`		IF	ID	—	EX	MEM	WB
`xor $t2, $t0, $t1`			IF	—	ID	—	EX	MEM	WB
`addi $t3, $t2, -1`					IF	—	ID	EX	MEM	WB
`and $t4, $t2, $t3`							IF	ID	EX	MEM	WB

We would have the following arrows indicating forwarding:

from after the first lw's MEM stage to the second lw's EX stage.
from after the second lw's MEM stage to xor's EX stage.
from after xor's EX stage to addi's EX stage.
from after xor's MEM stage to and's EX stage.
from after addi's EX stage to and's EX stage.

Q1.5.

set	MRU?	data
0	0	M[0x00–0x0F]
0	1	M[0x80–0x8F]
1	0	M[0x70–0x7F]
1	1	M[0x10–0x1F]

Q1.6.

0.8 ⋅ 1 + 0.05 ⋅ 3 + 0.10 ⋅ (0.95 ⋅ 3 + 0.05 ⋅ 4) + 0.05 ⋅ (0.95 ⋅ 1 + 0.05 ⋅ 4) = 0.8 + 0.15 + 0.305 + 0.0575 = 1.3125 cycles.