Exam 1 Review A: Questions
What is Moore's Law?
Microprocessor clock rates grew at a quick clip from 1988 to 2003, going from 16 MHz to 3.2 GHz. But then the race for faster clock rates essentially stopped. Why?
Contrast the terms latency and bandwidth, explaining them in terms of delivering water buckets to a fire.
a. What is the geometric mean of 1, 4, 4, 16? What is the arithmetic mean?
b. In combining several benchmark measurements, why use a geometric mean rather than an arithmetic mean?
Distinguish between a memory-register instruction set (such as x86) and a load-store instruction set (such as MIPS). What are their relative performance advantages in today's technology?
Describe what is meant by a “stack architecture” (in contrast to, say, a load-store architecture).
Suppose we have a stack architecture supporting LOAD (direct addressing), STORE (direct addressing), ADD, and MULT instructions.
a. Write a sequence of instructions to compute A ⋅ B + C ⋅ D and store the result in address 1000, assuming that A can be found at address 1004, B at 1008, C at 1012, and D at 1016.
b. Write a sequence of instructions that swaps the data found at memory addresses 2000 and 2004, so that what was stored at 2000 is moved to 2004, while what was stored at 2004 is moved to 2000. Your solution should not use any other memory addresses.
Define the term pseudo-instruction as used in the context of assembly language programs.
Translate each of the following MIPS assembly instructions into its 32-bit machine language equivalent. Note: $t0 is encoded as register number 8.
a. xor $t0, $t1, $t2
b. lw $t0, 4($t1)
Suppose we have a subroutine fact available to compute n!. Write a MIPS subroutine sqfact that, given n as a parameter, computes the lower 32 bits of (n!)².
Write a MIPS subroutine named fact that, given n as a parameter, computes the lower 32 bits of n! using recursion.
Suppose we have a subroutine func that, given an integer parameter n, returns some mysterious function of n (such as, say, n! or ⌊100 sin n⌋). The function works entirely with integers. Write a MIPS subroutine sumFunc that, given a parameter which is the address of the first element of an array, and a second parameter which is the number of elements in that array, determines the sum of the function values for the array entries. For example, if func happens to compute n², and the array specified in the parameters is <1, 3, 6>, then sumFunc would return 1² + 3² + 6² = 46.
Explain the purpose underlying the register $at, which assembly programs should not utilize.
The MIPS calling convention includes some callee-save registers, such as $s0 and $s1, and some caller-save registers, such as $t0 and $t1. Explain why the inclusion of both types of registers in the calling convention improves program performance.
Describe two of the VAX's addressing modes, other than the register and immediate modes.
Suppose we are designing an ISA for 16 registers, using a fixed-length 16-bit encoding.
a. What is the maximum number of ALU operations that we can encode if each instruction encodes a destination and two source registers (as in “add R0, R1, R2” to indicate that the sum of R1 and R2 should be placed into R0)?
b. What is the maximum number of ALU operations that we can encode if each instruction encodes a destination register (also used as a source operand) and an additional source register (as in “add R0, R1” to indicate that the sum of R0 and R1 should be placed into R0)?
What is an advantage of using variable-length instruction encoding, even though fixed-length requires less computation to implement?
Explain why modern processors do not generally use the simple single-cycle implementation, even though this implementation provides the fastest possible completion of each instruction and requires less hardware than other techniques.
Suppose we have a processor using an 8-stage pipeline. If the computer has a 2 GHz clock, how many nanoseconds does it take to complete each instruction once it enters the pipeline?
Exam 1 Review A: Solutions
The number of transistors that fit on an integrated circuit doubles every 18–24 months.
A faster clock rate means transistors switch state more often, and each state transition generates heat. Up to 3.2 GHz, this increase in heat could be overcome through more sophisticated heat sinks and fans. But beyond that, architects have found little room for improving heat dissipation affordably.
Latency is the amount of time to handle a single operation, whereas bandwidth is the overall rate at which operations can be completed. A basic example is putting out a fire using a bucket brigade: Latency would be the amount of time for one bucket to get from the water source to the fire, whereas bandwidth would be the rate at which buckets are poured onto the fire.
a. The geometric mean is 4. The arithmetic mean is 25/4 = 6.25.
b. Two advantages, either of which suffices as an answer:
i. If the measurements have different ranges, the geometric mean gives each measurement equal emphasis whereas the arithmetic mean gives emphasis to the measurement whose range is widest.
ii. Unlike the arithmetic mean, the geometric mean is independent of the base used to compute the measurement's ratios. That is, we could divide each measurement by one set of bases or another, and the resulting mean could still be compared effectively to the mean arrived at for another computer.
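The two means from part a can be checked with a short Python sketch (the measurements are the ones from the question):

```python
import math

measurements = [1, 4, 4, 16]

# Arithmetic mean: the sum divided by the count.
arithmetic = sum(measurements) / len(measurements)

# Geometric mean: the nth root of the product.
geometric = math.prod(measurements) ** (1 / len(measurements))

print(arithmetic)  # 6.25
print(geometric)   # 4.0 (up to floating-point rounding)
```

Note that dividing every measurement by a common base simply divides the geometric mean by that base, which is the base-independence property described in part b.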
A memory-register instruction set has many instructions that combine a memory access with another operation, such as an add or branch. By contrast, the only instructions in a load-store instruction set that access memory are the load and store instructions; other operations can operate only on registers and immediates.
Memory-register instruction sets are advantageous in situations where memory speed is fast relative to the processor. But with today's technology, memory is much slower, so restricting memory access to specialized instructions allows the other operations to complete more quickly.
A stack architecture is an architecture where instructions operate on a stack rather than on random-access registers. For example, a stack architecture would typically have an ADD instruction with no arguments, which would pop two numbers off the stack and then push their sum onto the stack.
a.  LOAD 1004
    LOAD 1008
    MULT
    LOAD 1012
    LOAD 1016
    MULT
    ADD
    STORE 1000

b.  LOAD 2000
    LOAD 2004
    STORE 2000
    STORE 2004

(For b, after the two loads the old contents of 2004 sit atop the stack, so the first STORE places them into 2000, and the second STORE places the old contents of 2000 into 2004.)
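As a sanity check, the four stack instructions can be modeled in a few lines of Python; the memory values below are illustrative, not part of the problem:

```python
def run(program, memory):
    """Execute a tiny stack-machine program; memory maps address -> value."""
    stack = []
    for op, *arg in program:
        if op == "LOAD":       # push the word at a direct address
            stack.append(memory[arg[0]])
        elif op == "STORE":    # pop the top of stack into a direct address
            memory[arg[0]] = stack.pop()
        elif op == "ADD":      # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MULT":     # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return memory

# Part a: A*B + C*D into address 1000 (A=2, B=3, C=4, D=5 here).
mem = {1004: 2, 1008: 3, 1012: 4, 1016: 5}
run([("LOAD", 1004), ("LOAD", 1008), ("MULT",),
     ("LOAD", 1012), ("LOAD", 1016), ("MULT",),
     ("ADD",), ("STORE", 1000)], mem)
print(mem[1000])  # 2*3 + 4*5 = 26

# Part b: swap addresses 2000 and 2004 using only the stack.
mem2 = {2000: 7, 2004: 9}
run([("LOAD", 2000), ("LOAD", 2004),
     ("STORE", 2000), ("STORE", 2004)], mem2)
print(mem2[2000], mem2[2004])  # 9 7
```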
A pseudo-instruction is an operation that the assembler accepts, but which it translates into another operation or operations that are actually supported by the processor. An example is the move pseudo-instruction under some MIPS assemblers, which the assembler translates into an or instruction: move $t0, $t1 might translate into or $t0, $zero, $t1.
a. xor $t0, $t1, $t2 → 000000 01001 01010 01000 00000 100110
b. lw $t0, 4($t1) → 100011 01001 01000 0000000000000100
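These encodings can be double-checked with a small Python sketch of the standard MIPS R- and I-format field layouts (register numbers $t0 = 8, $t1 = 9, $t2 = 10):

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-format: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct
    return format(word, "032b")

def encode_i(opcode, rs, rt, imm):
    """I-format: opcode(6) rs(5) rt(5) immediate(16)."""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
    return format(word, "032b")

T0, T1, T2 = 8, 9, 10   # register numbers for $t0, $t1, $t2

# xor $t0, $t1, $t2 — opcode 0, funct 0b100110
xor_word = encode_r(0b000000, T1, T2, T0, 0, 0b100110)
print(xor_word)  # 00000001001010100100000000100110

# lw $t0, 4($t1) — opcode 0b100011
lw_word = encode_i(0b100011, T1, T0, 4)
print(lw_word)   # 10001101001010000000000000000100
```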
sqfact: addi $sp, $sp, -4    # allocate stack space
        sw   $ra, 0($sp)     # save return address
        jal  fact            # $v0 = lower 32 bits of n!
        mult $v0, $v0        # square it; 64-bit product lands in HI/LO
        mflo $v0             # keep the lower 32 bits
        lw   $ra, 0($sp)     # restore return address
        addi $sp, $sp, 4     # release stack space
        jr   $ra
fact:   addi $sp, $sp, -8    # allocate stack space
        sw   $a0, 0($sp)     # save n
        sw   $ra, 4($sp)     # save return address
        addi $v0, $zero, 1   # base case: 0! = 1
        beq  $a0, $zero, done
        addi $a0, $a0, -1
        jal  fact            # $v0 = lower 32 bits of (n - 1)!
        lw   $a0, 0($sp)     # restore n
        mult $v0, $a0        # n * (n - 1)!; product lands in HI/LO
        mflo $v0             # keep the lower 32 bits
done:   lw   $ra, 4($sp)     # restore return address
        addi $sp, $sp, 8     # release stack space
        jr   $ra
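A hypothetical Python model mirrors the MIPS control flow, truncating to 32 bits after each multiply as mflo does:

```python
def fact(n):
    """Lower 32 bits of n!, computed recursively like the MIPS version."""
    if n == 0:          # base case: 0! = 1
        return 1
    # Recurse on n - 1, multiply by n, keep the low 32 bits of the product.
    return (n * fact(n - 1)) & 0xFFFFFFFF

print(fact(5))   # 120
print(fact(13))  # 1932053504, since 13! = 6227020800 overflows 32 bits
```

Truncating at every step still yields n! mod 2³², since multiplication distributes over the modulus.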
sumFunc: addi $sp, $sp, -16
         sw   $s0, 0($sp)     # save callee-save registers and $ra
         sw   $s1, 4($sp)
         sw   $s2, 8($sp)
         sw   $ra, 12($sp)
         add  $s0, $zero, $a0 # $s0: address of the next element
         addi $s1, $zero, 0   # $s1: running sum
         sll  $t0, $a1, 2     # 4 bytes per element
         add  $s2, $s0, $t0   # $s2: address just past the last element
         beq  $s0, $s2, done  # empty array: the sum is 0
loop:    lw   $a0, 0($s0)     # fetch the next element
         jal  func            # $v0 = func(element)
         addi $s0, $s0, 4
         add  $s1, $s1, $v0   # add it into the running sum
         bne  $s0, $s2, loop
done:    add  $v0, $zero, $s1 # return the sum
         lw   $s0, 0($sp)     # restore saved registers
         lw   $s1, 4($sp)
         lw   $s2, 8($sp)
         lw   $ra, 12($sp)
         addi $sp, $sp, 16
         jr   $ra
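sumFunc's behavior can be sketched in Python; the 32-bit masking mirrors MIPS register arithmetic, and square stands in for the mysterious func:

```python
def sum_func(array, func):
    """Model of sumFunc: apply func to each element and total the results,
    keeping the running sum in 32 bits as the MIPS version does."""
    total = 0
    for value in array:
        total = (total + func(value)) & 0xFFFFFFFF
    return total

def square(n):
    return n * n

print(sum_func([1, 3, 6], square))  # 1 + 9 + 36 = 46
```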
$at (the “assembler temporary”) is reserved for the assembler's own use: some pseudo-instruction translations require a register to hold temporary data, and $at is the register set aside for that purpose. For example, an assembler might include a bgt pseudo-instruction for branching if one number is greater than another: “bgt $t1, $t0, there”. This cannot be translated into any single instruction, and the most reasonable translation computes a temporary value using the slt instruction:

slt $at, $t0, $t1
bne $at, $zero, there

If a program kept its own data in $at, translations like this could silently overwrite it.
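A quick Python model of this two-instruction expansion confirms it branches exactly when $t1 > $t0:

```python
def bgt_taken(t1, t0):
    """Model the expansion of 'bgt $t1, $t0, there': is the branch taken?"""
    at = 1 if t0 < t1 else 0   # slt $at, $t0, $t1
    return at != 0             # bne $at, $zero, there

print(bgt_taken(5, 3))  # True: 5 > 3
print(bgt_taken(3, 5))  # False
print(bgt_taken(4, 4))  # False: the comparison is strictly greater-than
```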
Callee-save registers allow a subroutine to use a register to remember a value across other subroutine calls, without having to push and pop the stack around each such call. (The callee-save register is pushed and popped only once, at the beginning and end of the subroutine.) Caller-save registers allow a subroutine to employ registers for short-term computations with no intervening subroutine call, without having to worry about restoring the register's previous value. Since both situations show up in typical subroutines, having both categories of registers allows a compiler to minimize the overall usage of the stack.
Four of the more important are: indirect, where the computer looks into memory at an address contained in a register; autoincrement, where the computer looks into memory at an address contained in a register, and it then increments the register; index, where the computer looks into memory at an address computed as the sum of a register and an immediate; and absolute, where the computer looks into memory at an address specified as an immediate.
a. 16. Since it requires 4 bits to encode each register operand, encoding three such registers requires 3 ⋅ 4 = 12 bits. That leaves 4 bits for the opcode, which can support up to 2⁴ = 16 distinct values.
b. 256. Encoding two such registers requires 2 ⋅ 4 = 8 bits. That leaves 8 bits for the opcode, which can support up to 2⁸ = 256 distinct values.
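The bit-budget arithmetic can be expressed directly, with the field widths taken from the question:

```python
WORD = 16       # fixed instruction length in bits
REG_FIELD = 4   # 16 registers -> 4 bits per register operand

def max_opcodes(num_register_operands):
    """Opcode values that fit in the bits left after the register fields."""
    opcode_bits = WORD - num_register_operands * REG_FIELD
    return 2 ** opcode_bits

print(max_opcodes(3))  # 16: three-operand form, as in add R0, R1, R2
print(max_opcodes(2))  # 256: two-operand form, as in add R0, R1
```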
Variable-length encoding tends to encode programs using fewer bytes, since the ISA is naturally designed with the more common operations requiring less space. (For example, x86 code typically takes 40% less space than a fixed-length 32-bit representation.) This is particularly significant for embedded devices: Less memory must be installed for the same functionality — besides saving the cost of including the memory, the device is also less power-hungry (reducing the need for batteries).
While the single-cycle implementation completes each individual instruction quickly, it works on only one instruction at a time, so its overall throughput is lower; a pipelined implementation increases each instruction's latency but also increases throughput, and throughput is what governs program performance.
(Requiring fewer transistors is a real advantage, but a very minor one: With today's technology, the number of transistors that can cheaply be placed on a chip far exceeds the number required for a single-cycle implementation.)
(8 cycles / instruction) × (1 s / (2 × 10⁹ cycles)) × (10⁹ ns / s) = 4 ns per instruction
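The same latency computation in Python, with the pipeline depth and clock rate from the question:

```python
STAGES = 8        # pipeline depth in stages
CLOCK_HZ = 2e9    # 2 GHz clock

# Each stage takes one clock cycle, so an instruction spends
# STAGES cycles in the pipeline from entry to completion.
latency_ns = STAGES / CLOCK_HZ * 1e9
print(latency_ns)  # 4.0
```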