Exam 1 Review A
What is Moore's Law?
The number of transistors that fit on an integrated circuit doubles every 18–24 months.
Microprocessor clock rates grew at a quick clip from 1988 to 2003, going from 16 MHz to 3.2 GHz. But then the race for faster clock rates essentially stopped. Why?
A faster clock rate means that state transitions occur more frequently, and each transition generates heat. Up to about 3.2 GHz, this increase in heat could be overcome through more sophisticated heat sinks and fans, but architects have since found little room for improving heat dissipation affordably.
Contrast the terms latency and bandwidth, explaining them in terms of delivering water buckets to a fire.
Latency is the amount of time to handle a single operation, whereas bandwidth is the overall rate at which operations can be completed. A basic example is putting out a fire using a bucket brigade: Latency would be the amount of time for one bucket to get from the water source to the fire, whereas bandwidth would be the rate at which buckets would be poured onto the fire.
a. What is the geometric mean of 1, 4, 4, 16? What is the arithmetic mean?
b. In combining several benchmark measurements, why use a geometric mean rather than an arithmetic mean?
a. The geometric mean is 4. The arithmetic mean is 25/4 = 6.25.
b. Two advantages, either of which suffices as an answer:
i. If the measurements have different ranges, the geometric mean gives each measurement equal emphasis whereas the arithmetic mean gives emphasis to the measurement whose range is widest.
ii. Unlike the arithmetic mean, the geometric mean is independent of the base used to compute the measurement's ratios. That is, we could divide each measurement by one set of bases or another, and the resulting mean could still be compared effectively to the mean arrived at for another computer.
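Both the computation in part (a) and the base-independence property in (b)(ii) can be checked with a short Python sketch (the numbers are just the ones from part (a)):

```python
import math

def geometric_mean(xs):
    # nth root of the product of the n values
    return math.prod(xs) ** (1 / len(xs))

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

measurements = [1, 4, 4, 16]
print(geometric_mean(measurements))   # 4
print(arithmetic_mean(measurements))  # 6.25

# Base-independence: dividing every measurement by a baseline scales the
# geometric mean by exactly that baseline, so ratios of geometric means
# come out the same no matter which machine is chosen as the base.
baseline = 4
ratios = [x / baseline for x in measurements]
print(geometric_mean(ratios))         # 1 == 4 / baseline
```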
Distinguish between a memory-register instruction set (such as x86) and a load-store instruction set (such as MIPS). What are their relative performance advantages in today's technology?
A memory-register instruction set has many instructions that combine a memory access with another operation, such as an add or branch. By contrast, the only instructions in a load-store instruction set that access memory are the load and store instructions; other operations can operate only on registers and immediates.
Memory-register instruction sets are advantageous in situations where memory speed is fast relative to the processor. But with today's technology, memory is much slower, so restricting memory access to specialized instructions allows the other operations to complete more quickly.
Describe what is meant by a “stack architecture” (in contrast to, say, a load-store architecture).
A stack architecture is an architecture where instructions operate on a stack rather than on random-access registers. For example, a stack architecture would typically have an ADD instruction with no arguments, which would pop two numbers off the stack and then push their sum onto the stack.
Suppose we have a stack architecture supporting
LOAD (direct addressing),
STORE (direct addressing),
a. Write a sequence of instructions to compute A ⋅ B + C ⋅ D and store the result in address 1000, assuming that A can be found at address 1004, B at 1008, C at 1012, and D at 1016.
b. Write a sequence of instructions that swaps the data found at memory addresses 2000 and 2004, so that what was stored at 2000 is moved to 2004, while what was stored at 2004 is moved to 2000. Your solution should not use any other memory addresses.
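The instruction list above appears truncated; assuming the ISA also includes ADD and MUL instructions (which pop two operands and push the result), candidate answers can be checked with a tiny Python simulator. LOAD pushes the value at a memory address; STORE pops the top of the stack into a memory address.

```python
def run(program, memory):
    """Interpret a tiny stack machine.  ADD and MUL are assumptions:
    the instruction list in the question appears to have been cut off."""
    stack = []
    for op, *args in program:
        if op == "LOAD":
            stack.append(memory[args[0]])      # push mem[addr]
        elif op == "STORE":
            memory[args[0]] = stack.pop()      # pop into mem[addr]
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "MUL":
            stack.append(stack.pop() * stack.pop())
    return memory

# a. mem[1000] = A*B + C*D, with A at 1004, B at 1008, C at 1012, D at 1016.
part_a = [("LOAD", 1004), ("LOAD", 1008), ("MUL",),
          ("LOAD", 1012), ("LOAD", 1016), ("MUL",),
          ("ADD",), ("STORE", 1000)]

# b. Swap mem[2000] and mem[2004] using only the stack: after the two
#    LOADs, the value from 2004 is on top, so the first STORE writes it
#    to 2000 and the second STORE writes the old 2000 value to 2004.
part_b = [("LOAD", 2000), ("LOAD", 2004), ("STORE", 2000), ("STORE", 2004)]

mem = run(part_a, {1004: 2, 1008: 3, 1012: 5, 1016: 7})
print(mem[1000])              # 2*3 + 5*7 = 41

mem = run(part_b, {2000: 11, 2004: 22})
print(mem[2000], mem[2004])   # 22 11
```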
Define the term pseudo-instruction as used in the context of assembly language programs.
A pseudo-instruction is an operation that the assembler accepts, but which it translates into another operation or operations actually supported by the processor. An example is the move pseudo-instruction under some MIPS assemblers, which the assembler would actually translate into an or instruction. For example, move $t0, $t1 might translate to or $t0, $zero, $t1.
Translate each of the following MIPS assembly instructions into its 32-bit machine language equivalent. ($t0 is encoded as register number 8, $t1 as 9, and $t2 as 10.)
a. xor $t0, $t1, $t2 → 000000 01001 01010 01000 00000 100110
b. lw $t0, 4($t1) → 100011 01001 01000 0000000000000100
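The bit patterns above can be reproduced mechanically: the first is the R-format encoding of xor $t0, $t1, $t2 and the second is the I-format encoding of lw $t0, 4($t1). The helper functions below are hypothetical, written just to illustrate how the fields pack together:

```python
def r_type(opcode, rs, rt, rd, shamt, funct):
    """R-format: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    word = ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)
    return f"{word:032b}"

def i_type(opcode, rs, rt, imm):
    """I-format: opcode(6) rs(5) rt(5) immediate(16)."""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
    return f"{word:032b}"

# xor $t0, $t1, $t2: opcode 0, rs=$t1(9), rt=$t2(10), rd=$t0(8), funct 100110
print(r_type(0, 9, 10, 8, 0, 0b100110))  # 00000001001010100100000000100110

# lw $t0, 4($t1): opcode 100011, rs=$t1(9), rt=$t0(8), offset 4
print(i_type(0b100011, 9, 8, 4))         # 10001101001010000000000000000100
```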
Suppose we have a subroutine fact available to call that, given n as a parameter, computes the lower 32 bits of n!. Write a MIPS subroutine sqfact that, given n as a parameter, computes the lower 32 bits of (n!)².
sqfact: addi $sp, $sp, -4
sw $ra, 0($sp)
jal fact
mul $v0, $v0, $v0
lw $ra, 0($sp)
addi $sp, $sp, 4
jr $ra
Write a MIPS subroutine named fact that, given n as a parameter, computes the lower 32 bits of n! using recursion.
fact: addi $sp, $sp, -8
sw $a0, 0($sp)
sw $ra, 4($sp)
addi $v0, $zero, 1
beq $a0, $zero, done
addi $a0, $a0, -1
jal fact
lw $a0, 0($sp)
mul $v0, $v0, $a0
done: lw $ra, 4($sp)
addi $sp, $sp, 8
jr $ra
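The behavior of fact (and of sqfact above) can be sketched in Python; the & 0xFFFFFFFF mask models keeping only the lower 32 bits of each product, as the question asks. This sketches the intent, not the register-level details:

```python
def fact(n):
    """Lower 32 bits of n!, computed recursively like the MIPS version."""
    if n == 0:
        return 1
    return (n * fact(n - 1)) & 0xFFFFFFFF

def sqfact(n):
    """Lower 32 bits of (n!)**2: call fact, then square the result."""
    v = fact(n)
    return (v * v) & 0xFFFFFFFF

print(fact(5))    # 120
print(fact(13))   # 1932053504 -- 13! = 6227020800 overflows 32 bits
print(sqfact(3))  # 36
```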
Suppose we have a subroutine func that, given an integer parameter n, returns some mysterious function of n (such as, say, n! or ⌊100 sin n⌋). The function works entirely with integers.
Write a MIPS subroutine named sumFunc that, given a parameter which is the address of the first element of an array, and a second parameter which is the number of elements in that array, determines the sum of the function values for the array entries. For example, if func happens to compute n², and the array specified in the parameters is <1, 3, 6>, then sumFunc would return 1² + 3² + 6² = 46.
sumFunc: addi $sp, $sp, -16
sw $s0, 0($sp)
sw $s1, 4($sp)
sw $s2, 8($sp)
sw $ra, 12($sp)
add $s0, $zero, $a0
addi $s1, $zero, 0
sll $t0, $a1, 2
add $s2, $s0, $t0
beq $s0, $s2, done
loop: lw $a0, 0($s0)
addi $s0, $s0, 4
jal func
add $s1, $s1, $v0
bne $s0, $s2, loop
done: add $v0, $zero, $s1
lw $s0, 0($sp)
lw $s1, 4($sp)
lw $s2, 8($sp)
lw $ra, 12($sp)
addi $sp, $sp, 16
jr $ra
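The logic of sumFunc can be mirrored in Python; here func is passed in as a stand-in (hypothetical) for whatever mystery function the real subroutine computes:

```python
def sum_func(array, func):
    """Sum func(x) over the array, walking it front to back as sumFunc does."""
    total = 0                 # corresponds to $s1, the running sum
    for x in array:           # corresponds to advancing $s0 toward $s2
        total += func(x)      # corresponds to jal func; add $s1, $s1, $v0
    return total

# With func computing n**2, the example from the question:
print(sum_func([1, 3, 6], lambda n: n * n))  # 1 + 9 + 36 = 46
```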
Explain the purpose underlying the register $at, which assembly programs should not utilize.
The assembler reserves $at for temporary values needed when translating pseudo-instructions. For example, an assembler might include a bgt pseudo-instruction for branching if one number is greater than another: bgt $t1, $t0, there. This cannot be translated into any single instruction, and the most reasonable translation involves computing a temporary value in $at:
slt $at, $t0, $t1
bne $at, $zero, there
The MIPS calling convention includes some callee-save registers, such as $s1, and some caller-save registers, such as $t1. Explain why the inclusion of both types of registers in the calling convention improves program performance.
Callee-save registers allow a subroutine to use a register to remember a value across other subroutine calls, without having to push and pop the stack surrounding each call. (The callee-save register would only be pushed and popped once, at the beginning and end of the subroutine.) Caller-save registers allow a subroutine to employ registers for short-term computations without an intervening subroutine call, without having to worry about restoring the register's previous value. Since both situations show up in typical subroutines, having both categories of registers allows a compiler to minimize the overall usage of the stack.
Describe two of the VAX's addressing modes, other than the register and immediate modes.
Four of the more important are: indirect, where the computer looks into memory at an address contained in a register; autoincrement, where the computer looks into memory at an address contained in a register, and it then increments the register; index, where the computer looks into memory at an address computed as the sum of a register and an immediate; and absolute, where the computer looks into memory at an address specified as an immediate.
Suppose we are designing an ISA for 16 registers, using a fixed-length 16-bit encoding.
a. What is the maximum number of ALU operations that we can encode if each instruction encodes a destination and two source registers (as in “add R0, R1, R2” to indicate that the sum of R1 and R2 should be placed into R0)?
b. What is the maximum number of ALU operations that we can encode if each instruction encodes a destination register (also used as a source operand) and an additional source register (as in “add R0, R1” to indicate that the sum of R0 and R1 should be placed into R0)?
a. 16. Since it requires 4 bits to encode each register operand, encoding three such registers requires 3 ⋅ 4 = 12 bits. That leaves 4 bits left over, which can support up to 2⁴ = 16 distinct values.
b. 256. Encoding two such registers requires 2 ⋅ 4 = 8 bits. That leaves 8 bits left over, which can support up to 2⁸ = 256 distinct values.
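The same counts follow directly from the leftover opcode bits, as a quick arithmetic check:

```python
WORD = 16      # fixed-length 16-bit encoding
REG_BITS = 4   # 16 registers -> 4 bits per register field

# a. three register fields leave 16 - 12 = 4 opcode bits
print(2 ** (WORD - 3 * REG_BITS))  # 16

# b. two register fields leave 16 - 8 = 8 opcode bits
print(2 ** (WORD - 2 * REG_BITS))  # 256
```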
What is an advantage of using variable-length instruction encoding, even though fixed-length requires less computation to implement?
Variable-length encoding tends to encode programs using fewer bytes, since the ISA is naturally designed with the more common operations requiring less space. (For example, x86 code typically takes 40% less space than a fixed-length 32-bit representation.) This is particularly significant for embedded devices: Less memory must be installed for the same functionality — besides saving the cost of including the memory, the device is also less power-hungry (reducing the need for batteries).
Explain why modern processors do not generally use the simple single-cycle implementation, even though this implementation provides the fastest possible completion of each instruction and requires less hardware than other techniques.
While the single-cycle implementation completes each individual instruction quickly, it works on only one instruction at a time, so its overall throughput is lower; a pipelined architecture increases latency but also increases throughput, and throughput is what governs program performance.
(The advantage of requiring fewer transistors is a real advantage, but a very minor one: With today's technology, the number of transistors that can cheaply be placed on a chip far exceeds the number required for a single-cycle implementation.)
Suppose we have a processor using an 8-stage pipeline. If the computer has a 2 GHz clock, how many nanoseconds does it take to complete each instruction once it enters the pipeline?
8 cycles/instruction × 1 s / (2 × 10⁹ cycles) × 10⁹ ns/s = 4 ns per instruction
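The same unit conversion, as a quick check:

```python
def pipeline_latency_ns(stages, clock_hz):
    """Time for one instruction to traverse all pipeline stages, in ns."""
    return stages * 1_000_000_000 / clock_hz

print(pipeline_latency_ns(8, 2_000_000_000))  # 4.0
```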