Exam 1 Review B
Problem R1b.1.
Suppose we can divide the execution of an
instruction into eight possible pipeline stages:
S0: 0.20 ns
S1: 0.05 ns
S2: 0.10 ns
S3: 0.15 ns
S4: 0.05 ns
S5: 0.20 ns
S6: 0.10 ns
S7: 0.15 ns
However, inserting pipeline registers between stages requires
adding 0.05 ns to the time for the stage.
a. If we use this proposed 8-stage pipeline, how long
must each clock cycle be?
b. Now suppose that you want to combine stages while
maintaining the same clock rate. Which pipeline stages would you
combine? How many stages result?
c. Now suppose that you want to combine stages with
each clock cycle taking 0.05 ns longer.
Which pipeline stages would you
combine? How many stages result?
d. Usually more stages enable a faster clock, which
means that instructions are completed more quickly.
Explain why fewer stages will sometimes enable instructions to
complete more quickly.
a. 0.25 ns (the length of the longest stage, plus
0.05 ns)
b. Combine S1/S2 and S3/S4, giving a six-stage
pipeline.
c. Combine S0/S1, S2/S3, S4/S5, S6/S7, giving a
four-stage pipeline.
d. A pipeline with fewer stages can lead to shorter stalls:
For example, if an add instruction needs its data starting in
stage S4, and it depends on the result of the previous
instruction, which doesn't become available until that
instruction reaches S7, then the eight-stage pipeline will
require stalling the add instruction for three clock cycles;
the four-stage pipeline of part c would require stalling for
only one clock cycle.
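As a quick check on parts b and c, here is a minimal C sketch (my own
addition, not part of the original problem) that merges adjacent stages
greedily, starting a new combined stage whenever the accumulated raw
latency would exceed the budget implied by the target clock (the clock
period minus the 0.05 ns register overhead). Latencies are kept in
hundredths of a nanosecond so the comparisons stay exact.

#include <stdio.h>

#define NSTAGES 8

int main(void) {
    /* stage latencies in hundredths of a nanosecond */
    int stage[NSTAGES] = {20, 5, 10, 15, 5, 20, 10, 15};
    int budgets[2] = {20, 25};   /* raw-latency budgets for parts b and c */

    for (int b = 0; b < 2; b++) {
        int current = 0, nstages = 1;
        printf("clock 0.%02d ns: ", budgets[b] + 5);  /* + register overhead */
        for (int i = 0; i < NSTAGES; i++) {
            if (current + stage[i] > budgets[b]) {  /* start a new combined stage */
                nstages++;
                current = 0;
                printf("| ");
            }
            current += stage[i];
            printf("S%d ", i);
        }
        printf("-> %d stages\n", nstages);
    }
    return 0;
}

For the 0.20 ns budget it prints the six-stage split
S0 | S1 S2 | S3 S4 | S5 | S6 | S7 of part b, and for the 0.25 ns budget
the four pairs of part c.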
Problem R1b.2.
Explain data hazards in the context of pipelining,
and describe a technique for reducing their impact.
A data hazard occurs when one instruction depends on data
being computed by a previous instruction that is still in the
pipeline (and thus not yet complete). For example, in the
following sequence, the add instruction requires $t0 from the
preceding lw instruction; but a naïve pipeline implementation
would retrieve the value of $t0 for add before the lw
instruction manages to write the new value back into a register.
lw $t0, ($sp)
add $t1, $t1, $t0
The primary way to reduce the impact of data hazards
is to use forwarding: We add data pathways into the
pipeline so that the computed value can be sent back to an
earlier stage where another instruction might want it.
In our example, we'd arrange for the add instruction to be in
the EX stage when lw reaches the WB stage, and we'd have a
pathway for the $t0 value to be sent directly into the EX stage.
(An alternative technique for reducing the impact of data
hazards is to use out-of-order execution.)
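To make the timing concrete, here is a minimal C sketch (my own framing,
not from the text) of the arithmetic behind forwarding stalls: if a value
becomes available at the end of stage number avail, and a consumer issued
dist cycles after the producer needs it entering stage need, the consumer
must stall max(0, (avail + 1) - (need + dist)) cycles.

#include <stdio.h>

/* bubbles needed between a producer and a consumer, with forwarding */
static int stalls(int avail, int need, int dist) {
    int s = (avail + 1) - (need + dist);
    return s > 0 ? s : 0;
}

int main(void) {
    /* classic five-stage pipeline: IF=0, ID=1, EX=2, MEM=3, WB=4.
       lw's value is ready at the end of MEM (3); the adjacent add
       needs it entering EX (2), so one bubble is required -- which is
       why add sits in EX during the same cycle that lw sits in WB. */
    printf("load-use stall: %d cycle(s)\n", stalls(3, 2, 1));
    /* the eight-stage example from problem R1b.1d: ready at the end of
       S7, needed entering S4, adjacent instructions -> three bubbles */
    printf("R1b.1d stall:   %d cycle(s)\n", stalls(7, 4, 1));
    return 0;
}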
Problem R1b.3.
Complete the following diagram of how instructions progress
through the classic five-stage pipeline,
assuming that the processor supports data forwarding.
Draw arrows indicating points where forwarding is necessary.
lw   $t0, ($sp)       | IF | ID | EX | MEM | WB |
add  $sp, $sp, 4      |
lw   $t1, ($sp)       |
xor  $t2, $t0, $t1    |
sub  $t3, $t0, $t1    |
andi $t3, $t3, 0xFFFF |
lw   $t0, ($sp)       | IF  | ID  | EX  | MEM | WB  |
add  $sp, $sp, 4      |     | IF  | ID  | EX  | MEM | WB  |
lw   $t1, ($sp)       |     |     | IF  | ID  | EX  | MEM | WB  |
xor  $t2, $t0, $t1    |     |     |     | IF  | ID  | stl | EX  | MEM | WB  |
sub  $t3, $t0, $t1    |     |     |     |     | IF  | stl | ID  | EX  | MEM | WB  |
andi $t3, $t3, 0xFFFF |     |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |
The following arrows would need to be indicated:
- from after the add's EX stage to before the second lw's EX stage (for $sp)
- from after the second lw's MEM stage to before the xor's EX stage (for $t1)
- from after the sub's EX stage to before the andi's EX stage (for $t3)
Problem R1b.4.
Explain control hazards in the context of
pipelining, and describe two techniques to reduce their
impact.
Control hazards are situations where new instructions cannot
be fetched into the pipeline because the future sequence of
instructions is unknown. This is typically due to a branch or jump
instruction. Their impact can be reduced by any of the following.
- Move the computation of the next instruction's address to an
earlier stage in the pipeline.
- In speculative execution, the processor guesses what the next
instruction will be, so that it can immediately begin executing
succeeding instructions accordingly. If the guess is wrong, the
succeeding instructions can promptly be canceled; but if the guess
is correct, the pipeline stays fully utilized through the branch.
- Sophisticated branch prediction improves the performance of
speculative execution further by keeping track of past behavior
to improve the accuracy of the guesses.
- A delay slot, where the instruction after each branch is
executed regardless of the branch decision, allows at least one
instruction after each branch not to be wasted.
Problem R1b.5.
Suppose we have an 8-stage pipeline that can
complete one instruction each clock cycle
in the absence of any jumps or branches.
(We assume there are no data or structural hazards.)
For jumps or branches, the destination of the jump is known at
the end of the pipeline's second stage, whereas
the decision of whether a branch is taken is determined
at the end of the pipeline's fourth stage.
Assume that 5% of instructions in a typical program are jump
instructions, while 15% of instructions are
branch instructions, of which 2/3 are taken.
a. What is the average number of clock cycles per
instruction if we always stall for every jump or branch until
the next instruction's location is known?
b. What is the average number of clock cycles per
instruction if we speculatively execute assuming that branches
are not taken?
c. What is the average number of clock cycles per
instruction if we speculatively execute assuming that branches
are taken?
Note that 10% of instructions are branches that are taken,
while 5% of instructions are branches that are not taken.
a. 0.8 ⋅ 1 + 0.05 ⋅ 2 + 0.15 ⋅ 4
= 0.8 + 0.1 + 0.6 = 1.5
b. 0.8 ⋅ 1 + 0.05 ⋅ 2 + 0.1 ⋅ 4 + 0.05 ⋅ 1
= 0.8 + 0.1 + 0.4 + 0.05 = 1.35
c. 0.8 ⋅ 1 + 0.05 ⋅ 2 + 0.1 ⋅ 2 + 0.05 ⋅ 4
= 0.8 + 0.1 + 0.2 + 0.2 = 1.3
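Since the three parts differ only in which instructions pay which
penalty, the averages are easy to check mechanically; here is a
throwaway C sketch (my own, not from the original solution):

#include <stdio.h>

int main(void) {
    double jump = 0.05, taken = 0.10, not_taken = 0.05;
    double plain = 1.0 - jump - taken - not_taken;   /* 0.80 */

    /* a: stall until the next address is known
       (jumps: stage 2; branches: stage 4) */
    double a = plain * 1 + jump * 2 + (taken + not_taken) * 4;
    /* b: predict not taken; only taken branches pay the 4-cycle cost */
    double b = plain * 1 + jump * 2 + taken * 4 + not_taken * 1;
    /* c: predict taken; the target is known at stage 2, so taken
       branches cost 2 cycles, mispredicted not-taken branches cost 4 */
    double c = plain * 1 + jump * 2 + taken * 2 + not_taken * 4;

    printf("a = %.2f, b = %.2f, c = %.2f\n", a, b, c);  /* 1.50, 1.35, 1.30 */
    return 0;
}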
Problem R1b.6.
Suppose a processor uses a basic 4096-entry branch prediction buffer,
where each entry is exactly one bit long.
Explain how the processor accesses and uses the buffer when it reaches a
branch instruction.
The processor uses the lower 12 bits of the PC as an index
into the buffer. If it finds a 0 there, it predicts that the
branch will not be taken (so the next instruction fetched will
be the one immediately after the branch). If it finds a 1, it
predicts the branch will be taken (so the next instruction
fetched will be the one indicated by the branch instruction).
Later in the pipeline, when the branch decision is made,
the bit is updated to reflect the more recent decision.
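To make the mechanism concrete, here is a minimal C sketch of such a
one-bit buffer; the function names predict and update are my own
invention, not from the text.

#include <stdint.h>
#include <stdbool.h>

static uint8_t bpb[4096];   /* one prediction bit per entry */

/* look up the prediction for the branch at address pc */
bool predict(uint32_t pc) {
    return bpb[pc & 0xFFF] != 0;   /* index with the lower 12 bits of the PC */
}

/* once the branch resolves later in the pipeline, record the outcome */
void update(uint32_t pc, bool taken) {
    bpb[pc & 0xFFF] = taken ? 1 : 0;
}

Note that two branches whose addresses agree in their lower 12 bits
share an entry; the predictor simply tolerates such aliasing.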
Problem R1b.7.
Suppose we have the following global declarations in C for
representing a cache with 16 lines of 8 bytes each, together
with a 1,024-byte memory.
struct cache_line {
    int tag;
    char data[8];
};

struct cache_line cache[16];
char mem[1024];
The following function uses this structure
to simulate the retrieval of a value from a direct-mapped cache.
char fetch(int addr) {
    int tag  = addr >> 7;          /* bits 7 and up */
    int indx = (addr >> 3) & 0xF;  /* bits 6-3 select one of 16 lines */
    int offs = addr & 0x7;         /* bits 2-0 select a byte in the line */
    int i;

    if (cache[indx].tag == tag) {          /* hit */
        return cache[indx].data[offs];
    } else {                               /* miss: refill the line */
        cache[indx].tag = tag;
        for (i = 0; i < 8; i++) {
            cache[indx].data[i] = mem[(tag << 7) | (indx << 3) | i];
        }
        return cache[indx].data[offs];
    }
}
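To trace through a hypothetical example: for addr = 0x1AB, the code
computes tag = 0x1AB >> 7 = 3, indx = (0x1AB >> 3) & 0xF = 5, and
offs = 0x1AB & 0x7 = 3; so the function consults cache line 5 and, on a
miss, refills it from mem[0x1A8] through mem[0x1AF] before returning the
byte at offset 3.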
How can we modify this to simulate a two-way set-associative
cache instead? Presume that we are using a random-replacement
policy, using a function randbit()
taking no parameters
to generate a random bit.
char fetch(int addr) {
    int tag  = addr >> 6;          /* one former index bit joins the tag */
    int indx = (addr >> 3) & 0x7;  /* 8 sets of two lines each */
    int offs = addr & 0x7;
    int opt, i;

    /* check both lines of the set for a hit */
    for (opt = 2 * indx; opt <= 2 * indx + 1; opt++) {
        if (cache[opt].tag == tag) {
            return cache[opt].data[offs];
        }
    }

    /* miss: choose one of the set's two lines at random and refill it */
    opt = 2 * indx + randbit();
    cache[opt].tag = tag;
    for (i = 0; i < 8; i++) {
        cache[opt].data[i] = mem[(tag << 6) | (indx << 3) | i];
    }
    return cache[opt].data[offs];
}
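The only substantive changes from the direct-mapped version: one index
bit moves into the tag (shifts of 6 rather than 7, and an index mask of
0x7 rather than 0xF, giving eight sets of two lines each), the lookup
loop checks both lines of the set, and on a miss randbit() picks which
of the two lines to evict, implementing the random-replacement policy.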
Problem R1b.8.
Increasing the associativity of a cache (such as going from
2-way to 4-way set-associativity) always reduces the
miss rate. Why, then, do caches usually have a low degree of
associativity?
First, the reduction in miss rate from increasing associativity
becomes quite small, particularly beyond 8-way associativity,
while greater associativity increases circuit complexity
(i.e., requires more transistors); at some point it is no longer
a cost-efficient way to spend transistors.
Second, greater associativity requires a slightly deeper circuit,
so each cache access takes slightly more time. This increased
time per access eventually outweighs the improvement in miss rate.
Problem R1b.9.
Suppose that benchmarks indicate the following cache miss
rates per 1,000 instructions on a pipelined processor.
1.4  | in fetching instructions from a dedicated 32KB instruction cache |
38.4 | in accessing data from a dedicated 32KB data cache |
39.4 | in accessing instructions and data from a unified 64KB cache |
Suppose that the miss penalty is 150 clock cycles, and that
40% of instructions access data from memory (or cache).
We assume that the
processor does not stall for any reason unrelated to memory access,
and we ignore the rare occurrence of missing the cache
simultaneously in fetching an instruction and in accessing data.
a. If we have separate 32KB caches, what is the average number
of clock cycles per instruction?
b. What if we have a unified 64KB cache instead?
a. 1 + 0.0014 ⋅ 150 + 0.0384 ⋅ 150
= 6.97 cycles.
b. 1 + 0.0394 ⋅ 150 + 0.4
= 7.31 cycles.
(The extra 0.4 arises because the unified cache creates a
structural hazard: the 40% of instructions that access data must
each stall one cycle while contending with instruction fetch for
the single cache.)
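As with the earlier CPI problem, the arithmetic is easy to check
mechanically; a throwaway C sketch (my own) under the same assumptions:

#include <stdio.h>

int main(void) {
    double penalty = 150.0;   /* clock cycles per miss */

    /* split caches: instruction and data miss rates per instruction */
    double split = 1.0 + (1.4 / 1000) * penalty + (38.4 / 1000) * penalty;

    /* unified cache: one miss rate, plus a one-cycle structural-hazard
       stall for the 40% of instructions that access data */
    double unified = 1.0 + (39.4 / 1000) * penalty + 0.4;

    printf("split: %.2f  unified: %.2f\n", split, unified);  /* 6.97, 7.31 */
    return 0;
}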
Problem R1b.10.
As we saw, for a fixed-size cache,
the miss rate tends to decrease as we grow
the number of bytes stored per cache block,
until some point where it begins to increase.
Why do small block sizes tend to lead to larger miss rates?
And why do large block sizes tend to lead to larger miss rates?
With small block sizes, the cache does not work well for
programs that access adjacent instructions/data in memory.
A very basic example is stepping through an array:
A smaller block size will lead the cache to fetch fewer array
entries on a miss, so it will miss more frequently.
With large block sizes, there will be fewer blocks;
particularly with small caches, larger blocks increase the
chance of evicting a block that is still in use.
Problem R1b.11.
As we saw, many processors use a multi-level approach to
caching: They might have a 64KB L1 cache backed by a 256KB L2 cache.
If we can afford a single-level 512KB cache
instead, why would we not simply prefer the single-level approach?
It takes more time to retrieve data from a larger cache.
So it pays to look quickly into the 64KB L1 cache first; only rarely
will we end up needing to spend the extra time to look into the backup
256KB cache.