# Exam 1 Review A

### Problem R1a.1.

What is *Moore's Law*?

The number of transistors that fit on an integrated circuit doubles every 18–24 months.

### Problem R1a.2.

Microprocessor clock rates grew at a quick clip from 1988 to 2003, going from 16 MHz to 3.2 GHz. But then the race for faster clock rates essentially stopped. Why?

A faster clock rate leads to transistors switching state faster, and each state transition generates heat. Up to about 3.2 GHz, this increase in heat could be overcome through more sophisticated heat sinks and fans. But beyond that, architects have found little room for improving heat dissipation affordably.

### Problem R1a.3.

Contrast the terms *latency* and
*bandwidth*, explaining them in terms of delivering
water buckets to a fire.

Latency is the amount of time to handle a single operation, whereas bandwidth is the overall rate at which operations can be completed. A basic example is putting out a fire using a bucket brigade: Latency would be the amount of time for one bucket to get from the water source to the fire, whereas bandwidth would be the rate at which buckets are poured onto the fire.

### Problem R1a.4.

**a.** What is the geometric mean of 1, 4, 4, 16? What is
the arithmetic mean?

**b.** In combining several benchmark measurements, why use a
geometric mean rather than an arithmetic mean?

**a.** The geometric mean is 4. The arithmetic mean is
25/4 = 6.25.

**b.** Two advantages, either of which suffices as an
answer:

i. If the measurements have different ranges, the geometric mean gives each measurement equal emphasis, whereas the arithmetic mean gives emphasis to the measurement whose range is widest.

ii. Unlike the arithmetic mean, the geometric mean is independent of the base used to compute the measurement's ratios. That is, we could divide each measurement by one set of bases or another, and the resulting mean could still be compared effectively to the mean arrived at for another computer.
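As a quick check on the arithmetic, here is a short Python sketch (the helper name `geo_mean` is ours, not part of the problem); the last lines also illustrate point ii, that the ratio of geometric means equals the geometric mean of the ratios:

```python
import math

def geo_mean(xs):
    # n-th root of the product: here (1 * 4 * 4 * 16)^(1/4)
    return math.prod(xs) ** (1.0 / len(xs))

vals = [1, 4, 4, 16]
print(geo_mean(vals))          # 4.0
print(sum(vals) / len(vals))   # 6.25

# Base independence: comparing machine a against machine b gives the
# same result whether we average the ratios or take a ratio of means.
a, b = [2.0, 8.0], [4.0, 4.0]
print(math.isclose(geo_mean([x / y for x, y in zip(a, b)]),
                   geo_mean(a) / geo_mean(b)))  # True
```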

### Problem R1a.5.

Distinguish between a memory-register instruction set (such as x86) and a load-store instruction set (such as MIPS). What are their relative performance advantages in today's technology?

A memory-register instruction set has many instructions that combine a memory access with another operation, such as an add or branch. By contrast, the only instructions in a load-store instruction set that access memory are the load and store instructions; other operations can operate only on registers and immediates.

Memory-register instruction sets are advantageous in situations where memory speed is fast relative to the processor. But with today's technology, memory is much slower, so restricting memory access to specialized instructions allows the other operations to complete more quickly.

### Problem R1a.6.

Describe what is meant by a “stack architecture” (in contrast to, say, a load-store architecture).

A stack architecture is an architecture where instructions operate on a stack rather than on random-access registers. For example, a stack architecture would typically have an ADD instruction with no arguments, which would pop two numbers off the stack and then push their sum onto the stack.

### Problem R1a.7.

Suppose we have a stack architecture supporting `LOAD` (direct addressing), `STORE` (direct addressing), `ADD`, and `MULT` instructions.

**a.**
Write a sequence of instructions to compute
`A` ⋅ `B` + `C` ⋅ `D` and store the result in address 1000,
assuming that `A` can be found at address 1004,
`B` at 1008,
`C` at 1012, and
`D` at 1016.

**b.** Write a sequence of instructions that swaps the
data found at memory addresses 2000 and 2004, so that what was
stored at 2000 is moved to 2004, while what was stored at 2004
is moved to 2000. Your solution should not use any other memory
addresses.

**a.**

    LOAD 1004
    LOAD 1008
    MULT
    LOAD 1012
    LOAD 1016
    MULT
    ADD
    STORE 1000

**b.**

    LOAD 2000
    LOAD 2004
    STORE 2000
    STORE 2004
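A toy Python interpreter for this four-instruction stack machine makes the answers easy to check (the memory contents below are made-up sample values):

```python
def run(program, mem):
    # Minimal sketch: LOAD/STORE use direct addressing;
    # ADD and MULT pop two operands and push the result.
    stack = []
    for instr in program:
        op = instr[0]
        if op == "LOAD":
            stack.append(mem[instr[1]])
        elif op == "STORE":
            mem[instr[1]] = stack.pop()
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "MULT":
            stack.append(stack.pop() * stack.pop())

# Part a: A*B + C*D with sample values A=2, B=3, C=5, D=7
mem = {1000: 0, 1004: 2, 1008: 3, 1012: 5, 1016: 7}
run([("LOAD", 1004), ("LOAD", 1008), ("MULT",),
     ("LOAD", 1012), ("LOAD", 1016), ("MULT",),
     ("ADD",), ("STORE", 1000)], mem)
print(mem[1000])  # 41

# Part b: swap the words at 2000 and 2004
mem2 = {2000: 111, 2004: 222}
run([("LOAD", 2000), ("LOAD", 2004),
     ("STORE", 2000), ("STORE", 2004)], mem2)
print(mem2[2000], mem2[2004])  # 222 111
```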

### Problem R1a.8.

Define the term *pseudo-instruction* as used in the context
of assembly language programs.

A pseudo-instruction is an operation that the assembler accepts, but which it translates into another operation or operations that are actually supported by the processor. An example is the `move` pseudo-instruction under some MIPS assemblers, which translates into an `or` instruction: `move $t0, $t1` might translate into `or $t0, $zero, $t1`.

### Problem R1a.9.

Translate each of the following MIPS assembly instructions into its
32-bit machine language equivalent.
Note: `$t0` is encoded as register number 8.

a. `xor $t0, $t1, $t2`

b. `lw $t0, 4($t1)`

a. `xor $t0, $t1, $t2` → `000000 01001 01010 01000 00000 100110`

b. `lw $t0, 4($t1)` → `100011 01001 01000 0000000000000100`
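The field packing can be verified with a short Python sketch of the two MIPS instruction formats (the helper names `r_type` and `i_type` are ours):

```python
def r_type(rs, rt, rd, shamt, funct):
    # R-format: opcode 0, then rs, rt, rd, shamt, funct
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def i_type(opcode, rs, rt, imm):
    # I-format: opcode, rs, rt, 16-bit immediate
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# xor $t0, $t1, $t2: $t0 = 8, $t1 = 9, $t2 = 10; xor funct = 0b100110
print(f"{r_type(9, 10, 8, 0, 0b100110):032b}")
# 00000001001010100100000000100110

# lw $t0, 4($t1): lw opcode = 0b100011
print(f"{i_type(0b100011, 9, 8, 4):032b}")
# 10001101001010000000000000000100
```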

### Problem R1a.10.

Suppose we have a subroutine `fact` available to compute `n`!. Write a MIPS subroutine `sqfact` that, given `n` as a parameter, computes the lower 32 bits of (`n`!)².

    sqfact: addi $sp, $sp, -4
            sw   $ra, 0($sp)
            jal  fact
            mult $v0, $v0
            mflo $v0
            lw   $ra, 0($sp)
            addi $sp, $sp, 4
            jr   $ra

### Problem R1a.11.

Write a MIPS subroutine named `fact` that, given `n` as a parameter, computes the lower 32 bits of `n`! using recursion.

    fact:   addi $sp, $sp, -8
            sw   $a0, 0($sp)
            sw   $ra, 4($sp)
            addi $v0, $zero, 1
            beq  $a0, $zero, done
            addi $a0, $a0, -1
            jal  fact
            lw   $a0, 0($sp)
            mult $v0, $a0
            mflo $v0
    done:   lw   $ra, 4($sp)
            addi $sp, $sp, 8
            jr   $ra
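A Python mirror of the two routines, truncating to the low 32 bits the way a 32-bit register does, behaves as follows:

```python
MASK = 0xFFFFFFFF  # keep only the lower 32 bits, like a 32-bit register

def fact(n):
    # mirrors the recursive MIPS fact
    if n == 0:
        return 1
    return (n * fact(n - 1)) & MASK

def sqfact(n):
    f = fact(n)
    return (f * f) & MASK

print(fact(5))    # 120
print(sqfact(5))  # 14400
print(fact(13))   # 1932053504: 13! = 6227020800 overflows 32 bits
```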

### Problem R1a.12.

Suppose we have a subroutine `func` that, given an integer parameter `n`, returns some mysterious function of `n` (such as, say, `n`! or ⌊100 sin `n`⌋). The function works entirely with integers.

Write a MIPS subroutine `sumFunc` that, given a parameter which is the address of the first element of an array, and a second parameter which is the number of elements in that array, determines the sum of the function values for the array entries. For example, if `func` happens to compute `n`², and the array specified in the parameters is ⟨1, 3, 6⟩, then `sumFunc` would return 1² + 3² + 6² = 46.

    sumFunc: addi $sp, $sp, -16
             sw   $s0, 0($sp)
             sw   $s1, 4($sp)
             sw   $s2, 8($sp)
             sw   $ra, 12($sp)
             add  $s0, $zero, $a0
             addi $s1, $zero, 0
             sll  $t0, $a1, 2
             add  $s2, $s0, $t0
             beq  $s0, $s2, done
    loop:    lw   $a0, 0($s0)
             jal  func
             addi $s0, $s0, 4
             add  $s1, $s1, $v0
             bne  $s0, $s2, loop
    done:    add  $v0, $zero, $s1
             lw   $s0, 0($sp)
             lw   $s1, 4($sp)
             lw   $s2, 8($sp)
             lw   $ra, 12($sp)
             addi $sp, $sp, 16
             jr   $ra
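The same logic in Python (the name `sum_func` and the lambda argument are ours; in the MIPS version `func` is a fixed subroutine):

```python
def sum_func(arr, func):
    # mirrors sumFunc: walk the array, apply func, accumulate a sum
    total = 0
    for x in arr:
        total += func(x)
    return total

print(sum_func([1, 3, 6], lambda n: n * n))  # 46
print(sum_func([], lambda n: n * n))         # 0 (the beq skips the loop)
```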

### Problem R1a.13.

Explain the purpose underlying the register `$at`, which assembly programs should not utilize.

The assembler supports *pseudo-instructions*, which are instructions that appear to be native instructions but which really translate to a different instruction, or to several instructions. The register `$at` is reserved for temporary data needed by such instructions. For example, an assembler might include a `bgt` pseudo-instruction for branching if one number is greater than another: “`bgt $t1, $t0, there`”. This cannot be translated into any single instruction, and the most reasonable translation involves computing a temporary value using the `slt` instruction:

    slt $at, $t0, $t1
    bne $at, $zero, there
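The effect of the two-instruction expansion can be modeled in Python (the helper name and register values are made up for illustration):

```python
def bgt_taken(t1, t0):
    # slt $at, $t0, $t1   ->  $at = 1 if $t0 < $t1, else 0
    at = 1 if t0 < t1 else 0
    # bne $at, $zero, there  ->  branch taken when $at != 0
    return at != 0

print(bgt_taken(5, 3))  # True: 5 > 3, so the branch is taken
print(bgt_taken(3, 5))  # False
print(bgt_taken(4, 4))  # False: bgt is a strict comparison
```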

### Problem R1a.14.

The MIPS calling convention includes some callee-save registers, such as `$s0` and `$s1`, and some caller-save registers, such as `$t0` and `$t1`. Explain why the inclusion of both types of registers in the calling convention improves program performance.

Callee-save registers allow a subroutine to use a register to remember a value across other subroutine calls, without having to push and pop the stack around each of those calls. (The callee-save register is pushed and popped only once, at the beginning and end of the subroutine.) Caller-save registers allow a subroutine to employ registers for short-term computations with no intervening subroutine call, without having to worry about restoring each register's previous value. Since both situations show up in typical subroutines, having both categories of registers allows a compiler to minimize the overall usage of the stack.

### Problem R1a.15.

Describe two of the VAX's addressing modes, other than the register and immediate modes.

Four of the more important are: *indirect*, where the
computer looks into memory at an address contained in a
register; *auto increment*, where the computer looks into memory
at an address contained in a register, and it then increments
the register; *index*, where the computer looks into
memory at an address computed as the sum of a register and an
immediate; and *absolute*, where the computer looks into
memory at an address specified as an immediate.

### Problem R1a.16.

Suppose we are designing an ISA for 16 registers, using a fixed-length 16-bit encoding.

**a.** What is the maximum number of ALU operations that we can encode if each instruction encodes a destination and two source registers (as in “`add R0, R1, R2`” to indicate that the sum of `R1` and `R2` should be placed into `R0`)?

**b.** What is the maximum number of ALU operations that we can encode if each instruction encodes a destination register (also used as a source operand) and an additional source register (as in “`add R0, R1`” to indicate that the sum of `R0` and `R1` should be placed into `R0`)?

**a.** 16. Since it requires 4 bits to encode each register operand, encoding three such registers requires 3 ⋅ 4 = 12 bits. That leaves 4 bits for the opcode, which can support up to 2⁴ = 16 distinct values.

**b.** 256. Encoding two such registers requires 2 ⋅ 4 = 8 bits. That leaves 8 bits for the opcode, which can support up to 2⁸ = 256 distinct values.
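The bit-budget arithmetic is easy to check in Python (the helper name `max_opcodes` is ours):

```python
WORD_BITS = 16
REG_BITS = 4  # 16 registers require 4 bits per register field

def max_opcodes(num_reg_fields):
    # whatever bits remain after the register fields encode the opcode
    return 2 ** (WORD_BITS - num_reg_fields * REG_BITS)

print(max_opcodes(3))  # 16  (part a: destination plus two sources)
print(max_opcodes(2))  # 256 (part b: destination/source plus one source)
```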

### Problem R1a.17.

What is an advantage of using variable-length instruction encoding, even though fixed-length requires less computation to implement?

Variable-length encoding tends to encode programs using fewer bytes, since the ISA is naturally designed with the more common operations requiring less space. (For example, x86 code typically takes 40% less space than a fixed-length 32-bit representation.) This is particularly significant for embedded devices: Less memory must be installed for the same functionality — besides saving the cost of including the memory, the device is also less power-hungry (reducing the need for batteries).

### Problem R1a.18.

Explain why modern processors do not generally use the simple single-cycle implementation, even though this implementation provides the fastest possible completion of each instruction and requires less hardware than other techniques.

While it completes each instruction quickly, it works on only one instruction at a time, so the overall throughput is lower; a pipelined architecture increases latency but also increases throughput, and throughput is what governs program performance.

(The advantage of requiring fewer transistors is a real advantage, but a very minor one: With today's technology, the number of transistors that can cheaply be placed on a chip far exceeds the number of transistors required for a single-cycle implementation.)

### Problem R1a.19.

Suppose we have a processor using an 8-stage pipeline. If the computer has a 2 GHz clock, how many nanoseconds does it take to complete each instruction once it enters the pipeline?

(8 cycles / instruction) × (1 s / (2 × 10⁹ cycles)) × (10⁹ ns / s) = 4 ns / instruction
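The same unit conversion as a quick Python check:

```python
stages = 8          # one cycle per pipeline stage
clock_hz = 2e9      # 2 GHz clock

# cycles * (seconds per cycle) * (nanoseconds per second)
latency_ns = stages * (1 / clock_hz) * 1e9
print(latency_ns)   # 4.0
```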