*printable version*
# Quiz 1

[1]
[2]
[3]
[4]
[5]
[6]

### Problem Q1.1.

[8 pts]
In studying a program, Eva Lu Ator identifies that it has three phases:
It spends 50% of its time in the first phase;
then it proceeds to the second phase, where it spends 10% of its time;
and finally it enters the third phase, where it spends 40% of
its time.

**a.** Eva first notices radical inefficiencies in the second phase.
If she rewrites the second phase to be ten times faster,
how many times faster will this new program be overall? (Show
your work.)

**b.** Eva reconsiders: Rather than change the second phase,
she looks very carefully at the first phase and discovers how to make it
25% faster. How many times faster will this new program be overall?
(Show your work.)

**c.** If Eva has time to pursue only one of these two
options, should she improve the first phase by 25% or the second
phase by ten times?

**a.** 1.10.
`T`_{modified} = (0.5 + 0.1 / 10 + 0.4) `T`_{base} = 0.91 `T`_{base};
the speedup is `T`_{base} / `T`_{modified} = 1 / 0.91 = 1.10.

**b.** 1.11.
`T`_{modified} = (0.5 / 1.25 + 0.1 + 0.4) `T`_{base} = 0.9 `T`_{base};
the speedup is `T`_{base} / `T`_{modified} = 1 / 0.9 = 1.11.

**c.** Improving the first phase by 25% provides a
slightly better speedup.

### Problem Q1.2.

[14 pts]
Write a MIPS subroutine that takes one parameter providing the address
where a subroutine `f` begins;
this parameter subroutine `f` (which you can call using `jalr`

)
takes an integer parameter and returns an integer.
Your subroutine should compute `f`(1) + `f`(2) + … + `f`(100).
For example, if `f` is the squaring function (`f`(`n`) = `n`²),
then your subroutine would return 338,350, since this is
the result of 1² + 2² + 3² + … + 100².

The provided code below handles saving some
registers to the stack and restoring them at the end.

`sumTo100: addi $sp, $sp, -16`

sw $s0, 0($sp)

sw $s1, 4($sp)

sw $s2, 8($sp)

sw $ra, 12($sp)

# your code here

lw $s0, 0($sp)

lw $s1, 4($sp)

lw $s2, 8($sp)

lw $ra, 12($sp)

add $sp, $sp, 16

jr $ra

` move $s0, $a0`

move $s1, 0

move $s2, 0

loop: addi $s1, $s1, 1

move $a0, $s1

jalr $s0

add $s2, $s2, $v0

slti $t1, $s1, 100

bne $t1, $zero, loop

nop

move $v0, $s2

### Problem Q1.3.

[10 pts]
Translate the following MIPS “program” into machine language
(binary).

`loop: lw $t0, 4($s0)`

add $s0, $s0, $t0

bne $t0, $zero, loop

nop

`lw $t0, 4($s0)` |
100011 10000 01000 00000 00000 000100 |

`add $s0, $s0, $t0` |
000000 10000 01000 10000 00000 100000 |

`bne $t0, $zero, loop` |
000101 00000 01000 11111 11111 111101 |

`nop` |
000000 00000 00000 00000 00000 000000 |

### Problem Q1.4.

[10 pts]
Complete the following diagram of how instructions progress through the
classic five-stage pipeline, assuming that the processor supports data
forwarding. Draw arrows indicating points where forwarding is
necessary.

`lw $t0, 4($s0)` |
IF | ID | EX | MEM | WB |

`lw $t1, ($t0)` |

`xor $t2, $t0, $t1` |

`addi $t3, $t2, -1` |

`and $t4, $t2, $t3` |

`lw $t0, 4($s0)` |
IF | ID | EX | MEM | WB |

`lw $t1, ($t0)` | |
IF | ID | — | EX | MEM | WB |

`xor $t2, $t0, $t1` | |
IF | — | ID | — | EX | MEM | WB |

`addi $t3, $t2, -1` | |
IF | — | ID | EX | MEM | WB |

`and $t4, $t2, $t3` | |
IF | ID | EX | MEM | WB |

We would have the following arrows indicating forwarding:

- from after the first
`lw`

's MEM stage to the
second `lw`

's EX stage.
- from after the second
`lw`

's MEM stage to
`xor`

's EX stage.
- from after
`xor`

's EX stage to
`addi`

's EX stage.
- from after
`xor`

's MEM stage to
`and`

's EX stage.
- from after
`addi`

's EX stage to
`and`

's EX stage.

### Problem Q1.5.

[8 pts]
Suppose we have a two-way set-associative cache with four lines,
each containing sixteen bytes.
The cache uses a least-recently-used policy to replace lines;
in the tables below, the “MRU?” column represents which line is
most recently used within each set.

If the cache has the initial state diagrammed at left below,
and we then access the address sequence
0x00,
0x74,
0x1C,
0x8C,
0x10,
what will be the final status of the cache?

**set** | **MRU?** | **data** |

0 |
0 | M[0x00–0x0F] |

1 | M[0x40–0x4F] |

1 |
0 | M[0x10–0x1F] |

1 | M[0x50–0x5F] |

**set** | **MRU?** | **data** |

0 |
0 | M[0x00–0x0F] |

1 | M[0x80–0x8F] |

1 |
0 | M[0x70–0x7F] |

1 | M[0x10–0x1F] |

### Problem Q1.6.

[10 pts]
Suppose we have a 7-stage pipeline.

The branch prediction is determined by the end of the first stage.

The jump/branch destination is determined by the end of the third stage.

The branch decision is determined by the end of the fourth stage.

Assume that 5% of instructions executed are jumps or calls,
while 15% are branches (of which 2/3 are taken).
And you should assume that the branch prediction algorithm is
correct 95% of the time (whether the branch is taken or not taken).
How many clock cycles does the average instruction take before the
next instruction is issued?
(Ignore data and structural hazards, as well as memory access delays.)
Show your work.

0.8 ⋅ 1 + 0.05 ⋅ 3 + 0.10 ⋅ (0.95 ⋅ 3 + 0.05 ⋅ 4) + 0.05 ⋅ (0.95 ⋅ 1 + 0.05 ⋅ 4)
= 0.8 + 0.15 + 0.305 + 0.0575 = 1.3125 cycles.