
Exam 1 Review A
Problem R1a.1.
What is Moore's Law?
The number of transistors that fit on an integrated circuit doubles every 18–24 months.
Problem R1a.2.
Microprocessor clock rates grew at a quick clip from 1988 to 2003, going from 16 MHz to 3.2 GHz. But then the race for faster clock rates essentially stopped. Why?
A faster clock rate leads to transistors switching state more frequently, and each state transition generates heat. Up to 3.2 GHz, this increase in heat could be overcome through more sophisticated heat sinks and fans, but beyond that point architects have found little room for improving heat dissipation affordably.
Problem R1a.3.
Contrast the terms latency and bandwidth, explaining them in terms of delivering water buckets to a fire.
Latency is the amount of time to handle a single operation, whereas bandwidth is the overall rate at which operations can be completed. A basic example is putting out a fire using a bucket brigade: Latency would be the amount of time for one bucket to get from the water source to the fire, whereas bandwidth would be the rate at which buckets are poured onto the fire.
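To attach illustrative numbers (invented for this example): if a brigade of ten people takes 60 seconds to relay a bucket from the source to the fire, the latency is 60 seconds per bucket; but once buckets are flowing, each person hands one along every 6 seconds, so the bandwidth is one bucket every 6 seconds, or ten buckets per minute.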
Problem R1a.4.
a. What is the geometric mean of 1, 4, 4, 16? What is the arithmetic mean?
b. In combining several benchmark measurements, why use a geometric mean rather than an arithmetic mean?
a. The geometric mean is (1 ⋅ 4 ⋅ 4 ⋅ 16)^(1/4) = 256^(1/4) = 4. The arithmetic mean is (1 + 4 + 4 + 16)/4 = 25/4 = 6.25.
b. Two advantages, either of which suffices as an answer:
i. If the measurements have different ranges, the geometric mean gives each measurement equal emphasis whereas the arithmetic mean gives emphasis to the measurement whose range is widest.
ii. Unlike the arithmetic mean, the geometric mean of ratios is independent of the base used to compute those ratios. That is, we could normalize each measurement against one reference machine or another, and the resulting mean can still be compared meaningfully with the mean arrived at for another computer.
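To see point ii with invented numbers: suppose machine M takes 2 s and 8 s on two benchmarks, while machine N takes 4 s and 4 s. Normalized to a reference taking 1 s and 2 s, M's ratios are 2 and 4 and N's are 4 and 2; both geometric means are √8 ≈ 2.83, and both arithmetic means are 3. But normalized to a reference taking 2 s and 1 s, M's ratios become 1 and 8 and N's become 2 and 4: the geometric means are still both √8, while the arithmetic means become 4.5 and 3, making N look better purely because of the choice of base.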
Problem R1a.5.
Distinguish between a memory-register instruction set (such as x86) and a load-store instruction set (such as MIPS). What are their relative performance advantages in today's technology?
A memory-register instruction set has many instructions that combine a memory access with another operation, such as an add or branch. By contrast, the only instructions in a load-store instruction set that access memory are the load and store instructions; other operations can operate only on registers and immediates.
Memory-register instruction sets are advantageous in situations where memory speed is fast relative to the processor. But with today's technology, memory is much slower, so restricting memory access to specialized instructions allows the other operations to complete more quickly.
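As an illustrative sketch: a memory-register machine might offer a single instruction meaning "add the word at the address held in $t0 into $t2," while a load-store machine such as MIPS must split this into an explicit load followed by a register-only add:
lw $t1, 0($t0)        # load the word addressed by $t0 into $t1
add $t2, $t2, $t1     # add it in; only registers are touched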
Problem R1a.6.
Describe what is meant by a “stack architecture” (in contrast to, say, a load-store architecture).
A stack architecture is an architecture where instructions operate on a stack rather than on random-access registers. For example, a stack architecture would typically have an ADD instruction with no arguments, which would pop two numbers off the stack and then push their sum onto the stack.
Problem R1a.7.
Suppose we have a stack architecture supporting LOAD (direct addressing), STORE (direct addressing), ADD, and MULT instructions.
a. Write a sequence of instructions to compute A ⋅ B + C ⋅ D and store the result in address 1000, assuming that A can be found at address 1004, B at 1008, C at 1012, and D at 1016.
b. Write a sequence of instructions that swaps the data found at memory addresses 2000 and 2004, so that what was stored at 2000 is moved to 2004, while what was stored at 2004 is moved to 2000. Your solution should not use any other memory addresses.
a.  LOAD 1004     # push A
    LOAD 1008     # push B
    MULT          # pop A and B; push A ⋅ B
    LOAD 1012     # push C
    LOAD 1016     # push D
    MULT          # pop C and D; push C ⋅ D
    ADD           # pop the two products; push their sum
    STORE 1000    # pop the sum into address 1000
b.  LOAD 2000     # push the value at 2000
    LOAD 2004     # push the value at 2004
    STORE 2000    # pop the old 2004 value into 2000
    STORE 2004    # pop the old 2000 value into 2004
Problem R1a.8.
Define the term pseudo-instruction as used in the context of assembly language programs.
A pseudo-instruction is an operation that the assembler accepts but translates into one or more operations that the processor actually supports. An example is the move pseudo-instruction under some MIPS assemblers, which translates into an or instruction. For example, move $t0, $t1 might translate into or $t0, $zero, $t1.
Problem R1a.9.
Translate each of the following MIPS assembly instructions into its 32-bit machine language equivalent. Note: $t0 is encoded as register number 8, $t1 as 9, and $t2 as 10.
a. xor $t0, $t1, $t2
b. lw $t0, 4($t1)
a. xor $t0, $t1, $t2 → 000000 01001 01010 01000 00000 100110
b. lw $t0, 4($t1) → 100011 01001 01000 0000000000000100
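For reference, these answers are filling in the standard MIPS field layouts:
R-type:  opcode(6)  rs(5)  rt(5)  rd(5)  shamt(5)  funct(6)
I-type:  opcode(6)  rs(5)  rt(5)  immediate(16)
In part a, for instance, rs = $t1 = 01001, rt = $t2 = 01010, rd = $t0 = 01000, and 100110 is the funct code for xor.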
Problem R1a.10.
Suppose we have a subroutine fact available to compute n!. Write a MIPS subroutine sqfact that, given n as a parameter, computes the lower 32 bits of (n!)².
sqfact: addi $sp, $sp, -4     # allocate stack space
        sw $ra, 0($sp)        # save return address
        jal fact              # $v0 = n!
        mult $v0, $v0         # HI/LO = (n!)²
        mflo $v0              # $v0 = lower 32 bits of (n!)²
        lw $ra, 0($sp)        # restore return address
        addi $sp, $sp, 4      # release stack space
        jr $ra
Problem R1a.11.
Write a MIPS subroutine named fact that, given n as a parameter, computes the lower 32 bits of n! using recursion.
fact:   addi $sp, $sp, -8     # allocate stack space
        sw $a0, 0($sp)        # save n
        sw $ra, 4($sp)        # save return address
        addi $v0, $zero, 1    # base case: 0! = 1
        beq $a0, $zero, done  # if n == 0, return 1
        addi $a0, $a0, -1     # argument is n - 1
        jal fact              # $v0 = (n - 1)!
        lw $a0, 0($sp)        # restore n
        mult $v0, $a0         # HI/LO = n ⋅ (n - 1)!
        mflo $v0              # $v0 = lower 32 bits of n!
done:   lw $ra, 4($sp)        # restore return address
        addi $sp, $sp, 8      # release stack space
        jr $ra
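As a quick sanity check, hand-simulating the recursion with $a0 = 3: the calls descend to fact(0), which returns 1, and the multiplications unwind as 1 ⋅ 1 = 1, then 2 ⋅ 1 = 2, then 3 ⋅ 2 = 6, leaving 6 in $v0.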
Problem R1a.12.
Suppose we have a subroutine func that, given an integer parameter n, returns some mysterious function of n (such as, say, n! or ⌊100 sin n⌋). The function works entirely with integers.
Write a MIPS subroutine sumFunc that, given a parameter which is the address of the first element of an array, and a second parameter which is the number of elements in that array, determines the sum of the function values for the array entries. For example, if func happens to compute n², and the array specified in the parameters is <1, 3, 6>, then sumFunc would return 1² + 3² + 6² = 46.
sumFunc: addi $sp, $sp, -16   # allocate stack space
        sw $s0, 0($sp)        # save callee-save registers
        sw $s1, 4($sp)
        sw $s2, 8($sp)
        sw $ra, 12($sp)       # save return address
        add $s0, $zero, $a0   # $s0 = pointer to current element
        addi $s1, $zero, 0    # $s1 = running sum, initially 0
        sll $t0, $a1, 2       # $t0 = array length in bytes
        add $s2, $s0, $t0     # $s2 = pointer just past last element
        beq $s0, $s2, done    # skip loop if array is empty
loop:   lw $a0, 0($s0)        # argument: current array element
        jal func              # $v0 = func(element)
        addi $s0, $s0, 4      # advance to next element
        add $s1, $s1, $v0     # add function value into sum
        bne $s0, $s2, loop    # continue until past the end
done:   add $v0, $zero, $s1   # return the sum
        lw $s0, 0($sp)        # restore callee-save registers
        lw $s1, 4($sp)
        lw $s2, 8($sp)
        lw $ra, 12($sp)       # restore return address
        addi $sp, $sp, 16     # release stack space
        jr $ra
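Hand-simulating with the problem's example (func computing n² and the array <1, 3, 6>): $s1 accumulates 1 after the first iteration, 1 + 9 = 10 after the second, and 10 + 36 = 46 after the third, which done then copies into $v0.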
Problem R1a.13.
Explain the purpose underlying the register $at, which assembly programs should not utilize.
$at is reserved for the assembler, which may need a register for temporary data when translating pseudo-instructions. For example, an assembler might include a bgt pseudo-instruction for branching if one number is greater than another: “bgt $t1, $t0, there”. This cannot be translated into any single MIPS instruction; the most reasonable translation computes a temporary value using the slt instruction:
slt $at, $t0, $t1
bne $at, $zero, there
Problem R1a.14.
The MIPS calling convention includes some callee-save registers, such as $s0 and $s1, and some caller-save registers, such as $t0 and $t1. Explain why the inclusion of both types of registers in the calling convention improves program performance.
Callee-save registers allow a subroutine to use a register to remember a value across other subroutine calls, without having to push and pop the stack surrounding each such call. (The callee-save register need only be pushed and popped once, at the beginning and end of the subroutine.) Caller-save registers allow a subroutine to employ registers for short-term computations with no intervening subroutine call, without having to worry about restoring each register's previous value. Since both situations show up in typical subroutines, having both categories of registers allows a compiler to minimize the overall usage of the stack.
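A minimal sketch of both patterns (the subroutine names here are invented for illustration):
example: addi $sp, $sp, -8
        sw $ra, 0($sp)        # save return address
        sw $s0, 4($sp)        # callee-save: push $s0 once, at entry
        add $s0, $zero, $a0   # $s0 safely survives the call below
        jal helper            # helper may clobber $t0-$t9, $a0-$a3, $v0-$v1
        add $t0, $v0, $s0     # caller-save $t0: scratch work with no
        sll $v0, $t0, 1       #   intervening call, so nothing to save
        lw $s0, 4($sp)        # callee-save: pop $s0 once, at exit
        lw $ra, 0($sp)
        addi $sp, $sp, 8
        jr $ra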
Problem R1a.15.
Describe two of the VAX's addressing modes, other than the register and immediate modes.
Four of the more important ones:
i. Indirect, where the computer looks into memory at an address contained in a register.
ii. Autoincrement, where the computer looks into memory at an address contained in a register, and then increments the register.
iii. Index, where the computer looks into memory at an address computed as the sum of a register and an immediate.
iv. Absolute, where the computer looks into memory at an address specified as an immediate.
Problem R1a.16.
Suppose we are designing an ISA for 16 registers, using a fixed-length 16-bit encoding.
a. What is the maximum number of ALU operations that we can encode if each instruction encodes a destination and two source registers (as in “add R0, R1, R2” to indicate that the sum of R1 and R2 should be placed into R0)?
b. What is the maximum number of ALU operations that we can encode if each instruction encodes a destination register (also used as a source operand) and an additional source register (as in “add R0, R1” to indicate that the sum of R0 and R1 should be placed into R0)?
a. 16. Since it requires 4 bits to encode each register operand, encoding three such registers requires 3 ⋅ 4 = 12 bits. That leaves 4 bits left over, which can support up to 2⁴ = 16 distinct values.
b. 256. Encoding two such registers requires 2 ⋅ 4 = 8 bits. That leaves 8 bits left over, which can support up to 2⁸ = 256 distinct values.
Problem R1a.17.
What is an advantage of using variable-length instruction encoding, even though fixed-length requires less computation to implement?
Variable-length encoding tends to encode programs using fewer bytes, since the ISA is naturally designed with the more common operations requiring less space. (For example, x86 code typically takes 40% less space than a fixed-length 32-bit representation.) This is particularly significant for embedded devices: Less memory must be installed for the same functionality — besides saving the cost of including the memory, the device is also less power-hungry (reducing the need for batteries).
Problem R1a.18.
Explain why modern processors do not generally use the simple single-cycle implementation, even though this implementation provides the fastest possible completion of each instruction and requires less hardware than other techniques.
While a single-cycle implementation completes each instruction quickly, it works on only one instruction at a time, so its overall throughput is lower; a pipelined architecture increases latency but improves throughput, and throughput is what governs program performance.
(The advantage of requiring fewer transistors is a real one, but a very minor one: with today's technology, the number of transistors that can cheaply be placed on a chip far exceeds the number required for a single-cycle implementation.)
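To attach illustrative numbers (invented for the example): suppose a single-cycle design completes one instruction every 4 ns (a 250 MHz clock). An 8-stage pipelined version clocked at 2 GHz still takes 8 × 0.5 ns = 4 ns for any one instruction to finish, but once the pipeline is full it completes a new instruction every 0.5 ns, roughly an eightfold gain in throughput.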
Problem R1a.19.
Suppose we have a processor using an 8-stage pipeline. If the computer has a 2 GHz clock, how many nanoseconds does it take to complete each instruction once it enters the pipeline?
(8 cycles / instruction) × (1 second / (2 × 10⁹ cycles)) × (10⁹ ns / second) = 4 ns / instruction