Final Review B: Questions

Rfb.1.

Suppose two cores execute the following threads having access to the shared-memory variables x, y, and z.

initially, x is 0, y is 1, z is 2
Thread AThread B
x = 10;
y = 11;
z = 12;
z = 22;
y = 21;
x = 20;

Assuming our cache is coherent, which of the following are possible final values for the shared variables?

xyz
a.101112
b.101122
c.102112
d.102122
e.201112
f.201122
g.202112
h.202122
Rfb.2.

Suppose two cores execute the following threads having access to the shared-memory variables x and y.

initially, x is 0, y is 10
Thread AThread B
if (x != 0) {
    y = 11;
else if (x == 0) {
    y = 12;
}
x = 1;

Assuming our cache is coherent, which of the following are possible final values for the shared variables?

xy
a.110
b.111
c.112
Rfb.3.

Suppose two cores execute the following threads having access to the shared-memory variables a and b.

initially, a is 0, b is 1
Thread AThread B
x = y;
y = 2;
y = 3;
x = 4;

Suppose we have a sequentially consistent policy for cache coherency. Nonetheless, in executing these threads we find that the final value for x is 1 and y is 2 — as if, in fact, B did not even exist. This seems to contradict a sequentially consistent policy: Though only one thread can start before the other ends, it seems that A started before B finished (since x is 1) and also that B started before A finished (since y is 2). Why is it possible that x is 1 and y is 2 at the program's completion?

Rfb.4.

In warehouse-scale computing, the biggest cost category is purchasing the computing equipment. What is the second biggest cost category?

Rfb.5.

What is a “cold aisle” in warehouse-scale computing?

Rfb.6.

Distinguish between the terms “array” and “rack” as sets of servers in warehouse-scale computing.

Rfb.7.

Distinguish the terms “Platform as a Service” (PaaS) and “Infrastructure as a Service” (IaaS) in cloud computing.

Rfb.8.

What motivates the introduction of MapReduce/Hadoop to help with programming for warehouse-scale computing?

Rfb.9.

Suppose we have a set of files corresponding to days of the year; each file contains a list of location names followed by a tempature reading. How can we use MapReduce/Hadoop to determine the maximum temperature reading for each location?

Rfb.10.

Suppose we have a set of files corresponding to courses, each listing the students enrolled in that course. How can we use MapReduce/Hadoop to determine, for each student, the number of distinct students that this person has taken a class with?

Final Review B: Solutions

Rfb.1.

The possibities are (a.), (e.), (g.), and (h.). If x ends up at 10, then that means that we execute the first instruction of Thread A after completing the last instruction of Thread B, so none of B's writes have any effect; that disqualifies (b.), (c.), and (d.). If y ends up at 11, that means that we execute the A's second instruction after B's second instruction, so B's write to z will be overwritten by A's write to z; that disqualifies (f.).

Rfb.2.

Any of the three are possibilities. Obviously, if we do all of A before B, then y will be 12; and if we do all of B before A, then y will be 11. It's also possible that first A's first if condition fails (since x is 0), then B executes, and then A's second if condition fails (since x is now 1), so y will remain unchanged at 10.

Rfb.3.

With the MIPS architecture and other load-store architectures, the instruction x = y will translate into a load instruction (lw on MIPS) followed by a store instruction (sw on MIPS). It is possible that A will execute its load instruction first, placing 1 into a register; then both instructions of B will execute, then A will store the 1 still in its register into x and the 2 into y.

Rfb.4.

Equipment for power and cooling.

Rfb.5.

For cooling, cool air is vented into one aisle between servers, which is blown across individual servers to cool them before entering the “hot aisle” where it is vented out of the room.

Rfb.6.

A rack is a vertical stack of servers, typically 24 to 48. An array is a collection of typically around 20 to 40 racks.

Rfb.7.

With PaaS, the cloud provider configures the operating system and installed software, and the user will manage an account within that configuration. With IaaS, the user is given control over a specific machine (typically a virtual machine), which allows control over all (or nearly all) aspects of the OS installation.

Rfb.8.

In programming solutions for warehouse-scale computing, any solution must deal with distributing tasks among servers, communicating results between servers, and recovering from server failure. MapReduce/Hadoop is a framework that applies to a wide variety of problems using a large amount of data processing, as is typical in warehouse-scale computing problems, and it manages these details (task assignment, communication, server failures).

Rfb.9.

In the Map phase, we'd have a Map task for each file that would break the file into (locationName, temperatureReading) pairs. In the Reduce phase, we'd have a Reduce task that takes a locationName and the list of temperatureReadings, computes the maximum temperatureReading from the list, and produces (locationName, maximumReading) as its result.

Rfb.10.

In the Map phase, we'd have a Map task for each file that would loop through each combination of (studentA, studentB) pairs and emit it as a key-value pair. In the Reduce phase, we'd have a Reduce task that takes a student name and the list of students who that person has taken a class with; it would sort the list, removing duplicates, and then produce (locationName, sortedListLength) as its result.