Final


Problem Xf.1.

[8 pts] Define the term instruction-level parallelism (ILP), and include at least one example of a processor design technique illustrating it.

Instruction-level parallelism refers to opportunities to execute different instructions from the same instruction stream simultaneously, even though the stream is written as if it were sequential. Pipelined and superscalar architectures are two techniques that use ILP to enhance processor performance.

Problem Xf.3.

[10 pts] Translate the following MIPS “program” into machine language (binary).

loop: lw $t0, 0($s0)
      addi $s0, $s0, -4
      bne $s0, $s1, loop
      nop

`lw $t0, 0($s0)`	100011 10000 01000 00000 00000 000000
`addi $s0, $s0, -4`	001000 10000 10000 11111 11111 111100
`bne $s0, $s1, loop`	000101 10000 10001 11111 11111 111101
`nop`	000000 00000 00000 00000 00000 000000
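The encodings above can be checked mechanically. The following is a minimal sketch in Python, not part of the original solution: the helper names (`encode_i_type`, `branch_offset`, `show`) are hypothetical, and the register and opcode tables cover only what this program uses. It assumes the standard MIPS I-type layout (opcode, rs, rt, 16-bit immediate) and a branch offset counted in words from the instruction after the branch.

```python
# Register numbers and opcodes for just the instructions in this program.
REGS = {"$zero": 0, "$t0": 8, "$s0": 16, "$s1": 17}
OPCODES = {"lw": 0b100011, "addi": 0b001000, "bne": 0b000101}


def encode_i_type(op, rs, rt, imm):
    """Pack an I-type instruction: opcode(6) rs(5) rt(5) immediate(16)."""
    return (OPCODES[op] << 26) | (REGS[rs] << 21) | (REGS[rt] << 16) | (imm & 0xFFFF)


def branch_offset(branch_addr, target_addr):
    """Word offset relative to the instruction following the branch."""
    return (target_addr - (branch_addr + 4)) // 4


def show(word):
    """Format a 32-bit word in the 6-5-5-5-5-6 grouping used in the table."""
    bits = format(word, "032b")
    cuts = [0, 6, 11, 16, 21, 26, 32]
    return " ".join(bits[a:b] for a, b in zip(cuts, cuts[1:]))


# Assume `loop` is at byte address 0, so bne sits at address 8.
print(show(encode_i_type("lw", "$s0", "$t0", 0)))      # lw $t0, 0($s0)
print(show(encode_i_type("addi", "$s0", "$s0", -4)))   # addi $s0, $s0, -4
print(show(encode_i_type("bne", "$s0", "$s1", branch_offset(8, 0))))
print(show(0))                                         # nop
```

Note that for `lw` and `addi` the destination register goes in the rt field, while for `bne` the first operand goes in rs and the second in rt. The branch offset works out to −3 because the target is three words before the instruction following the branch.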

Problem Xf.4.

[12 pts] The MIPS subroutine below accepts a subroutine parameter f and counts how many times f can be applied to its own result before we reach zero, starting with 1000. For example, if f is integer division by 10, it would return 4 (1000 / 10 / 10 / 10 / 10 = 0).

Though this is an admirable attempt, testing demonstrates that it does not work correctly. Explain what is wrong, and show how to repair the code.

repeat_to_zero:
        move $t0, $a0          # place f into $t0
        li $t1, 1000           # $t1 is number we're currently at
        li $t2, 0              # $t2 counts number of applications of f
cnext:  move $a0, $t1          # call f(current), result into $t1
        jalr $t0
        nop
        move $t1, $v0
        addi $t2, $t2, 1       # increment counter
        bne $t1, $zero, cnext  # repeat if current is still not zero
        move $v0, $t2          # return counter
        jr $ra
        nop

Calling f with the jalr instruction can potentially change all caller-save registers (and the jalr itself will certainly change $ra). But the subroutine as written depends upon $t0, $t1, $t2, and $ra all retaining their values across the call, even though they are caller-save registers.

A partial solution is to use callee-save registers $s0, $s1, and $s2 wherever the original code uses $t0, $t1, and $t2. But using these requires that the subroutine restore these registers' original values (as well as $ra) before returning. So another required step would be to add code at the beginning to push the old values onto the stack, and to add code before “jr $ra” to restore the pushed values. The added code at the beginning would be:

repeat_to_zero:
        addi $sp, $sp, -16
        sw $ra, 12($sp)
        sw $s0, 8($sp)
        sw $s1, 4($sp)
        sw $s2, 0($sp)
        move $s0, $a0       # original first instruction, using $s0 not $t0

And at the end would be:

        move $v0, $s2       # original next-to-last instruction, using $s2 not $t2
        lw $ra, 12($sp)
        lw $s0, 8($sp)
        lw $s1, 4($sp)
        lw $s2, 0($sp)
        addi $sp, $sp, 16
        jr $ra
        nop
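When testing the repaired subroutine, it can help to have a high-level reference for the value it should return. The sketch below is a Python model of the intended behavior, not a translation of the assembly. The function name mirrors the MIPS label, while the `start` parameter is an assumption added for illustration. Like the assembly loop, it applies f at least once before testing for zero.

```python
def repeat_to_zero(f, start=1000):
    """Count how many applications of f it takes to reach zero from start."""
    current = start      # plays the role of $s1
    count = 0            # plays the role of $s2
    while True:
        current = f(current)   # corresponds to the jalr call
        count += 1
        if current == 0:
            return count


# Integer division by 10: 1000 -> 100 -> 10 -> 1 -> 0
print(repeat_to_zero(lambda x: x // 10))  # prints 4
```

If the assembly version returns a different count for the same f, or never returns, the saved registers or the return address are the first things to inspect.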

NNN	WN/0
NNY	WN/0
NYN	WN/0
NYY	WN/0 SN/4
YNN	WN/0
YNY	WN/0 SY/3 WY/6
YYN	WN/0 SY/2 SY/5
YYY	WN/0 SN/1

instruction	first elt	last elt
`lv V1, ($s0)`	0…
`lv V2, ($s0)`
`addvv.d V3, V1, V2`
`sv V3, ($s0)`

instruction	first elt	last elt
`lv V1, ($s0)`	0…`n` − 1	6…6 + `n` − 1
`lv V2, ($s0)`	`n`…2 `n` − 1	6 + `n`…6 + 2 `n` − 1
`addvv.d V3, V1, V2`	6 + `n`…6 + 2 `n` − 1	8 + `n`…8 + 2 `n` − 1
`sv V3, ($s0)`	2 `n`…3 `n` − 1	6 + 2 `n`…6 + 3 `n` − 1

initially, `a` is 1 and `b` is 3
Thread A	Thread B
Store 6 into `b`.	Load `a` into register `$t0`.
Store 2 into `a`.	Load `b` into register `$t1`.
	Display `$t0` + `$t1`.
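One way to reason about a table like this is to enumerate every interleaving of the two threads. The sketch below does so in Python under two assumptions that are not stated in the table itself: that the question is which sums Thread B can display, and that memory is sequentially consistent (each thread's operations take effect in program order). The function name `possible_outputs` is hypothetical.

```python
from itertools import combinations


def possible_outputs():
    """Return the set of sums Thread B can display over all interleavings."""
    thread_a = [("b", 6), ("a", 2)]        # stores, in program order
    thread_b = [("a", "t0"), ("b", "t1")]  # loads, in program order
    total = len(thread_a) + len(thread_b)
    results = set()
    # Choose which time slots belong to Thread A; the rest go to Thread B.
    for a_slots in combinations(range(total), len(thread_a)):
        mem = {"a": 1, "b": 3}             # initial values from the table
        regs = {}
        a_ops, b_ops = iter(thread_a), iter(thread_b)
        for step in range(total):
            if step in a_slots:
                var, value = next(a_ops)
                mem[var] = value
            else:
                var, reg = next(b_ops)
                regs[reg] = mem[var]
        results.add(regs["t0"] + regs["t1"])
    return results


print(sorted(possible_outputs()))  # prints [4, 7, 8]
```

Under these assumptions the sum 5 (new `a`, old `b`) never appears, because Thread A stores `b` before `a`. A memory system weaker than sequential consistency could allow it.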
