Exam 1 Review A: Questions
What is Moore's Law?
Microprocessor clock rates grew at a quick clip from 1988 to 2003, going from 16 MHz to 3.2 GHz. But then the race for faster clock rates essentially stopped. Why?
Contrast the terms latency and bandwidth, explaining them in terms of delivering water buckets to a fire.
a. What is the geometric mean of 1, 4, 4, 16? What is the arithmetic mean?
b. In combining several benchmark measurements, why use a geometric mean rather than an arithmetic mean?
Distinguish between a memory-register instruction set (such as x86) and a load-store instruction set (such as MIPS). What are their relative performance advantages in today's technology?
Describe what is meant by a “stack architecture” (in contrast to, say, a load-store architecture).
Suppose we have a stack architecture supporting LOAD (direct addressing), STORE (direct addressing), ADD, and MULT instructions.
a. Write a sequence of instructions to compute A ⋅ B + C ⋅ D and store the result in address 1000, assuming that A can be found at address 1004, B at 1008, C at 1012, and D at 1016.
b. Write a sequence of instructions that swaps the data found at memory addresses 2000 and 2004, so that what was stored at 2000 is moved to 2004, while what was stored at 2004 is moved to 2000. Your solution should not use any other memory addresses.
Define the term pseudo-instruction as used in the context of assembly language programs.
Translate each of the following MIPS assembly instructions into its 32-bit machine language equivalent. Note: $t0 is encoded as register number 8.
a. xor $t0, $t1, $t2
b. lw $t0, 4($t1)
Suppose we have a subroutine fact available to compute n!. Write a MIPS subroutine sqfact that, given n as a parameter, computes the lower 32 bits of (n!)².
Write a MIPS subroutine named fact that, given n as a parameter, computes the lower 32 bits of n! using recursion.
Suppose we have a subroutine func that, given an integer parameter n, returns some mysterious function of n (such as, say, n! or ⌊100 sin n⌋). The function works entirely with integers. Write a MIPS subroutine sumFunc that, given a parameter which is the address of the first element of an array, and a second parameter which is the number of elements in that array, determines the sum of the function values for the array entries. For example, if func happens to compute n², and the array specified in the parameters is <1, 3, 6>, then sumFunc would return 1² + 3² + 6² = 46.
Explain the purpose underlying the register $at, which assembly programs should not utilize.
The MIPS calling convention includes some callee-save registers, such as $s0 and $s1, and some caller-save registers, such as $t0 and $t1. Explain why the inclusion of both types of registers in the calling convention improves program performance.
Describe two of the VAX's addressing modes, other than the register and immediate modes.
Suppose we are designing an ISA for 16 registers, using a fixed-length 16-bit encoding.
a. What is the maximum number of ALU operations that we can encode if each instruction encodes a destination and two source registers (as in “add R0, R1, R2” to indicate that the sum of R1 and R2 should be placed into R0)?
b. What is the maximum number of ALU operations that we can encode if each instruction encodes a destination register (also used as a source operand) and an additional source register (as in “add R0, R1” to indicate that the sum of R0 and R1 should be placed into R0)?
What is an advantage of using variable-length instruction encoding, even though fixed-length requires less computation to implement?
Explain why modern processors do not generally use the simple single-cycle implementation, even though this implementation provides the fastest possible completion of each instruction and requires less hardware than other techniques.
Suppose we have a processor using an 8-stage pipeline. If the computer has a 2 GHz clock, how many nanoseconds does it take to complete each instruction once it enters the pipeline?
Exam 1 Review A: Solutions
The number of transistors that fit on an integrated circuit doubles every 18–24 months.
A faster clock rate means transistors switch state more often, and each state transition generates heat. Up to 3.2 GHz, this increase in heat could be overcome through more sophisticated heat sinks and fans. But beyond that, architects have found little room for improving heat dissipation affordably.
Latency is the amount of time to handle a single operation, whereas bandwidth is the overall rate at which operations can be completed. A basic example is putting out a fire using a bucket brigade: Latency would be the amount of time for one bucket to get from the water source to the fire, whereas bandwidth would be the rate at which buckets are poured onto the fire.
a. The geometric mean is 4. The arithmetic mean is 25/4 = 6.25.
b. Two advantages, either of which suffices as an answer:
i. If the measurements have different ranges, the geometric mean gives each measurement equal emphasis whereas the arithmetic mean gives emphasis to the measurement whose range is widest.
ii. Unlike the arithmetic mean, the geometric mean is independent of the base used to compute the measurement's ratios. That is, we could divide each measurement by one set of bases or another, and the resulting mean could still be compared effectively to the mean arrived at for another computer.
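The two means from part a can be checked with a short Python sketch (the measurements are the ones from the question):

```python
import math

measurements = [1, 4, 4, 16]

# Arithmetic mean: the sum divided by the count.
arithmetic = sum(measurements) / len(measurements)

# Geometric mean: the nth root of the product.
geometric = math.prod(measurements) ** (1 / len(measurements))

print(arithmetic)  # 6.25
print(geometric)   # 4.0 (up to floating-point rounding)
```

Note that dividing every measurement by a common base simply divides the geometric mean by that base, which is the base-independence property described in part b.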
A memory-register instruction set has many instructions that combine a memory access with another operation, such as an add or branch. By contrast, the only instructions in a load-store instruction set that access memory are the load and store instructions; other operations can operate only on registers and immediates.
Memory-register instruction sets are advantageous in situations where memory speed is fast relative to the processor. But with today's technology, memory is much slower, so restricting memory access to specialized instructions allows the other operations to complete more quickly.
A stack architecture is an architecture where instructions operate on a stack rather than on random-access registers. For example, a stack architecture would typically have an ADD instruction with no arguments, which would pop two numbers off the stack and then push their sum onto the stack.
a.  LOAD 1004
    LOAD 1008
    MULT
    LOAD 1012
    LOAD 1016
    MULT
    ADD
    STORE 1000

b.  LOAD 2000
    LOAD 2004
    STORE 2000
    STORE 2004

(For b, after the two loads the old contents of 2004 sit atop the stack, so the first STORE places them into 2000, and the second STORE places the old contents of 2000 into 2004.)
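As a sanity check, the four stack instructions can be modeled in a few lines of Python; the memory values below are illustrative, not part of the problem:

```python
def run(program, memory):
    """Execute a tiny stack-machine program; memory maps address -> value."""
    stack = []
    for op, *arg in program:
        if op == "LOAD":       # push the word at a direct address
            stack.append(memory[arg[0]])
        elif op == "STORE":    # pop the top of stack into a direct address
            memory[arg[0]] = stack.pop()
        elif op == "ADD":      # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MULT":     # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return memory

# Part a: A*B + C*D into address 1000 (A=2, B=3, C=4, D=5 here).
mem = {1004: 2, 1008: 3, 1012: 4, 1016: 5}
run([("LOAD", 1004), ("LOAD", 1008), ("MULT",),
     ("LOAD", 1012), ("LOAD", 1016), ("MULT",),
     ("ADD",), ("STORE", 1000)], mem)
print(mem[1000])  # 2*3 + 4*5 = 26

# Part b: swap addresses 2000 and 2004 using only the stack.
mem2 = {2000: 7, 2004: 9}
run([("LOAD", 2000), ("LOAD", 2004),
     ("STORE", 2000), ("STORE", 2004)], mem2)
print(mem2[2000], mem2[2004])  # 9 7
```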
A pseudo-instruction is an operation that the assembler accepts, but which it translates into another operation or operations that are actually supported by the processor. An example is the move pseudo-instruction under some MIPS assemblers, which the assembler translates into an or instruction: move $t0, $t1 might translate into or $t0, $zero, $t1.
a. xor $t0, $t1, $t2 → 000000 01001 01010 01000 00000 100110
b. lw $t0, 4($t1) → 100011 01001 01000 0000000000000100
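These encodings can be double-checked with a small Python sketch of the standard MIPS R- and I-format field layouts (register numbers $t0 = 8, $t1 = 9, $t2 = 10):

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-format: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct
    return format(word, "032b")

def encode_i(opcode, rs, rt, imm):
    """I-format: opcode(6) rs(5) rt(5) immediate(16)."""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
    return format(word, "032b")

T0, T1, T2 = 8, 9, 10   # register numbers for $t0, $t1, $t2

# xor $t0, $t1, $t2 — opcode 0, funct 0b100110
xor_word = encode_r(0b000000, T1, T2, T0, 0, 0b100110)
print(xor_word)  # 00000001001010100100000000100110

# lw $t0, 4($t1) — opcode 0b100011
lw_word = encode_i(0b100011, T1, T0, 4)
print(lw_word)   # 10001101001010000000000000000100
```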
sqfact: addi $sp, $sp, -4    # allocate stack space
        sw   $ra, 0($sp)     # save return address
        jal  fact            # $v0 = lower 32 bits of n!
        mult $v0, $v0        # square it; 64-bit product lands in HI/LO
        mflo $v0             # keep the lower 32 bits
        lw   $ra, 0($sp)     # restore return address
        addi $sp, $sp, 4     # release stack space
        jr   $ra
fact:   addi $sp, $sp, -8    # allocate stack space
        sw   $a0, 0($sp)     # save n
        sw   $ra, 4($sp)     # save return address
        addi $v0, $zero, 1   # base case: 0! = 1
        beq  $a0, $zero, done
        addi $a0, $a0, -1
        jal  fact            # $v0 = lower 32 bits of (n - 1)!
        lw   $a0, 0($sp)     # restore n
        mult $v0, $a0        # n * (n - 1)!; product lands in HI/LO
        mflo $v0             # keep the lower 32 bits
done:   lw   $ra, 4($sp)     # restore return address
        addi $sp, $sp, 8     # release stack space
        jr   $ra
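A hypothetical Python model mirrors the MIPS control flow, truncating to 32 bits after each multiply as mflo does:

```python
def fact(n):
    """Lower 32 bits of n!, computed recursively like the MIPS version."""
    if n == 0:          # base case: 0! = 1
        return 1
    # Recurse on n - 1, multiply by n, keep the low 32 bits of the product.
    return (n * fact(n - 1)) & 0xFFFFFFFF

print(fact(5))   # 120
print(fact(13))  # 1932053504, since 13! = 6227020800 overflows 32 bits
```

Truncating at every step still yields n! mod 2³², since multiplication distributes over the modulus.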
sumFunc: addi $sp, $sp, -16
         sw   $s0, 0($sp)     # save callee-save registers and $ra
         sw   $s1, 4($sp)
         sw   $s2, 8($sp)
         sw   $ra, 12($sp)
         add  $s0, $zero, $a0 # $s0: address of the next element
         addi $s1, $zero, 0   # $s1: running sum
         sll  $t0, $a1, 2     # 4 bytes per element
         add  $s2, $s0, $t0   # $s2: address just past the last element
         beq  $s0, $s2, done  # empty array: the sum is 0
loop:    lw   $a0, 0($s0)     # fetch the next element
         jal  func            # $v0 = func(element)
         addi $s0, $s0, 4
         add  $s1, $s1, $v0   # add it into the running sum
         bne  $s0, $s2, loop
done:    add  $v0, $zero, $s1 # return the sum
         lw   $s0, 0($sp)     # restore saved registers
         lw   $s1, 4($sp)
         lw   $s2, 8($sp)
         lw   $ra, 12($sp)
         addi $sp, $sp, 16
         jr   $ra
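sumFunc's behavior can be sketched in Python; the 32-bit masking mirrors MIPS register arithmetic, and square stands in for the mysterious func:

```python
def sum_func(array, func):
    """Model of sumFunc: apply func to each element and total the results,
    keeping the running sum in 32 bits as the MIPS version does."""
    total = 0
    for value in array:
        total = (total + func(value)) & 0xFFFFFFFF
    return total

def square(n):
    return n * n

print(sum_func([1, 3, 6], square))  # 1 + 9 + 36 = 46
```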
$at (the “assembler temporary”) is reserved for the assembler's own use: some pseudo-instruction translations require a register to hold temporary data, and $at is the register set aside for that purpose. For example, an assembler might include a bgt pseudo-instruction for branching if one number is greater than another: “bgt $t1, $t0, there”. This cannot be translated into any single instruction, and the most reasonable translation computes a temporary value using the slt instruction:

slt $at, $t0, $t1
bne $at, $zero, there

If a program kept its own data in $at, translations like this could silently overwrite it.
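A quick Python model of this two-instruction expansion confirms it branches exactly when $t1 > $t0:

```python
def bgt_taken(t1, t0):
    """Model the expansion of 'bgt $t1, $t0, there': is the branch taken?"""
    at = 1 if t0 < t1 else 0   # slt $at, $t0, $t1
    return at != 0             # bne $at, $zero, there

print(bgt_taken(5, 3))  # True: 5 > 3
print(bgt_taken(3, 5))  # False
print(bgt_taken(4, 4))  # False: the comparison is strictly greater-than
```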
Callee-save registers allow a subroutine to use a register to remember a value across other subroutine calls, without having to push and pop the stack around each such call. (The callee-save register is pushed and popped only once, at the beginning and end of the subroutine.) Caller-save registers allow a subroutine to employ registers for short-term computations with no intervening subroutine call, without having to worry about restoring the register's previous value. Since both situations show up in typical subroutines, having both categories of registers allows a compiler to minimize the overall usage of the stack.
Four of the more important are: indirect, where the computer looks into memory at an address contained in a register; autoincrement, where the computer looks into memory at an address contained in a register, and it then increments the register; index, where the computer looks into memory at an address computed as the sum of a register and an immediate; and absolute, where the computer looks into memory at an address specified as an immediate.
a. 16. Since it requires 4 bits to encode each register operand, encoding three such registers requires 3 ⋅ 4 = 12 bits. That leaves 4 bits for the opcode, which can support up to 2⁴ = 16 distinct values.
b. 256. Encoding two such registers requires 2 ⋅ 4 = 8 bits. That leaves 8 bits for the opcode, which can support up to 2⁸ = 256 distinct values.
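The bit-budget arithmetic can be expressed directly, with the field widths taken from the question:

```python
WORD = 16       # fixed instruction length in bits
REG_FIELD = 4   # 16 registers -> 4 bits per register operand

def max_opcodes(num_register_operands):
    """Opcode values that fit in the bits left after the register fields."""
    opcode_bits = WORD - num_register_operands * REG_FIELD
    return 2 ** opcode_bits

print(max_opcodes(3))  # 16: three-operand form, as in add R0, R1, R2
print(max_opcodes(2))  # 256: two-operand form, as in add R0, R1
```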
Variable-length encoding tends to encode programs using fewer bytes, since the ISA is naturally designed with the more common operations requiring less space. (For example, x86 code typically takes 40% less space than a fixed-length 32-bit representation.) This is particularly significant for embedded devices: Less memory must be installed for the same functionality — besides saving the cost of including the memory, the device is also less power-hungry (reducing the need for batteries).
While the single-cycle implementation completes each individual instruction quickly, it works on only one instruction at a time, so its overall throughput is lower; a pipelined implementation increases each instruction's latency but also increases throughput, and throughput is what governs program performance.
(Requiring fewer transistors is a real advantage, but a very minor one: With today's technology, the number of transistors that can cheaply be placed on a chip far exceeds the number required for a single-cycle implementation.)
(8 cycles / instruction) × (1 s / (2 × 10⁹ cycles)) × (10⁹ ns / s) = 4 ns per instruction
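The same latency computation in Python, with the pipeline depth and clock rate from the question:

```python
STAGES = 8        # pipeline depth in stages
CLOCK_HZ = 2e9    # 2 GHz clock

# Each stage takes one clock cycle, so an instruction spends
# STAGES cycles in the pipeline from entry to completion.
latency_ns = STAGES / CLOCK_HZ * 1e9
print(latency_ns)  # 4.0
```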