Test 1 Review B: Questions
In an earlier question, we implemented a ThreadHolder class, where threads could call the awaitRelease method to stall until another thread calls release with a number saying how many threads stuck in awaitRelease should be released. (You should assume that release is only called when there are enough threads stalling that all can be released immediately.)
public class ThreadHolder {
    private int released = 0;
    private Object lock = new Object();

    public void awaitRelease() {
        synchronized (lock) {
            while (released <= 0) {
                try { lock.wait(); } catch (InterruptedException e) {}
            }
            --released;
        }
    }

    public void release(int n) {
        synchronized (lock) {
            released += n;
            lock.notifyAll();
        }
    }
}
This solution, though, is prone to starvation even if a thread regularly calls release with a positive parameter n. Explain how starvation may occur, and repair the code so starvation will not occur.
Summarize what happens with a semaphore's DOWN operation.
Java's built-in Semaphore class implements semaphores, including a constructor taking an int parameter giving the initial value, a no-argument acquire method equivalent to the DOWN operation, and a no-argument release method equivalent to the UP operation.
Below is the skeleton of a Web server. Using the Semaphore class, how could you modify it so that no more than 5 Responders will be active at any time?
public class WebServer {
    private class Responder implements Runnable {
        public void run() {
            // (omitted) receive request
            // (omitted) send response to client
            // TODO: tell "main" we are done
        }
    }

    private void doMain() {
        // (omitted) start the server listening
        while (true) {
            // (omitted) accept connection from client
            // TODO: wait until less than 5 Responders are active
            // start thread to service client
            Responder responder = new Responder();
            // (omitted) initialize responder
            new Thread(responder).start();
        }
    }

    public static void main(String[] args) {
        new WebServer().doMain();
    }
}
Java's built-in Semaphore class implements semaphores, including a constructor taking an int parameter giving the initial value, a no-argument acquire method equivalent to the DOWN operation, and a no-argument release method equivalent to the UP operation.
Using no other primitives but the Semaphore class, complete the following Barrier class so that each thread calling awaitOthers stalls within the method until the tenth thread arrives, whereupon all ten threads return from awaitOthers.
public class Barrier {
    public void awaitOthers() {
    }
}
Many processors provide a test-and-set machine language instruction to help an OS provide synchronization. Given a register and a memory address as arguments, it atomically loads the current value at that memory address into the register and then stores 1 to the address. In C pseudocode, we can implement a critical section as follows.
while (test_and_set(lock_addr) != 0); // wait until lock changed from 0
// CRITICAL SECTION HERE
*lock_addr = 0; // reset lock to 0 to release it
Now suppose we need two locks lock0 and lock1 before entering a critical section. The straightforward way to accomplish this is the following.
while (test_and_set(lock0) != 0); // first acquire lock0
while (test_and_set(lock1) != 0); // then acquire lock1
// CRITICAL SECTION HERE
*lock1 = 0; // release lock1
*lock0 = 0; // release lock0
However, this holds on to the first lock for an indefinite period (until the second lock becomes available) before entering the critical section. It would be better to avoid this. How could we rewrite this so that no lock is held for a long period of time?
We discussed the relative merits of implementing threads as a kernel-independent user library (user-level threads) and of implementing threads via system calls to the kernel (kernel-level threads). Explain an advantage of one of the approaches over the other. Be sure to mention which approach your point supports.
Explain what a hybrid thread implementation is and how it can potentially combine the advantages of user-level threads and kernel-level threads (at the expense of more complexity).
Suppose we have four jobs arriving according to the following schedule (all units are in hours).
job | arrive | length |
A   | 0      | 6      |
B   | 2      | 5      |
C   | 4      | 4      |
D   | 6      | 3      |
For each of FIFO, SJF, and Round Robin (with a one-hour time slice), list the intervals in which each job is worked on, and compute the average delay.
Round-robin scheduling does poorly in minimizing the average delay. So what advantage does it have over FIFO or SJF?
In a system where some processes are I/O-bound while others are compute-bound, how can we improve our scheduling over a simple Round Robin approach to improve the overall system performance?
In a multiprocessor system, why can it be advantageous to try to schedule a multithreaded program's threads simultaneously across different processors (rather than simply letting each thread come onto a processor whenever one happens to be free)?
Test 1 Review B: Solutions
There is nothing dictating in which order threads will wake up from wait, and in fact it could be that the threads that most recently entered awaitRelease will be the ones that wake up and return. Thus, as long as more threads are in awaitRelease than are released by release, there could be a thread that never manages to exit awaitRelease simply because it never wins.

A solution would be for each thread entering awaitRelease to essentially “pick a number” and to ensure that those with the smallest numbers are those that are released.
public class ThreadHolder {
    private int lastReleased = 0;
    private int lastAllocated = 0;
    private Object lock = new Object();

    public void awaitRelease() {
        int myNumber;
        synchronized (lock) {
            lastAllocated++;
            myNumber = lastAllocated;
            while (lastReleased < myNumber) {
                try { lock.wait(); } catch (InterruptedException e) {}
            }
        }
    }

    public void release(int n) {
        synchronized (lock) {
            lastReleased += n;
            lock.notifyAll();
        }
    }
}
[There is a slight bug here with overflow. It's not particularly likely to be an issue, but it would be safer to use BigInteger rather than an int.]
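As a sanity check, the repaired class can be exercised with a small driver. The sketch below (ThreadHolderDemo is a made-up name) reproduces the solution's ThreadHolder, starts three threads that stall in awaitRelease, and releases all three at once:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadHolderDemo {
    // The starvation-free ThreadHolder from the solution above.
    static class ThreadHolder {
        private int lastReleased = 0;
        private int lastAllocated = 0;
        private final Object lock = new Object();

        public void awaitRelease() {
            int myNumber;
            synchronized (lock) {
                lastAllocated++;
                myNumber = lastAllocated;
                while (lastReleased < myNumber) {
                    try { lock.wait(); } catch (InterruptedException e) {}
                }
            }
        }

        public void release(int n) {
            synchronized (lock) {
                lastReleased += n;
                lock.notifyAll();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadHolder holder = new ThreadHolder();
        AtomicInteger exited = new AtomicInteger();
        Thread[] threads = new Thread[3];
        for (int i = 0; i < 3; i++) {
            threads[i] = new Thread(() -> {
                holder.awaitRelease();   // stalls until released
                exited.incrementAndGet();
            });
            threads[i].start();
        }
        Thread.sleep(100); // give the threads time to start stalling
        holder.release(3); // release all three at once
        for (Thread t : threads) t.join();
        System.out.println("threads released: " + exited.get());
    }
}
```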
We decrement the integer held as part of the semaphore — but if the semaphore had already reached zero, we first wait until the semaphore becomes positive.
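The same wait/notifyAll pattern used in ThreadHolder expresses this directly. Below is a minimal counting-semaphore sketch (SimpleSemaphore, down, and up are made-up names; the following questions use Java's real Semaphore class instead):

```java
public class SimpleSemaphore {
    private int value;

    public SimpleSemaphore(int initial) { value = initial; }

    // DOWN: if the count is zero, wait; otherwise decrement and proceed.
    public synchronized void down() throws InterruptedException {
        while (value <= 0) {
            wait(); // stall until an up() makes the count positive
        }
        value--;
    }

    // UP: increment the count and wake any stalled threads.
    public synchronized void up() {
        value++;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleSemaphore s = new SimpleSemaphore(1);
        s.down();                  // count 1 -> 0; does not block
        new Thread(s::up).start(); // will let the next down() proceed
        s.down();                  // blocks until the up() above runs
        System.out.println("done");
    }
}
```

The while loop (rather than an if) matters: a woken thread must re-check the count, since another thread may have taken the permit first.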
The lines marked “// ADDED” below are the only changes.
import java.util.concurrent.Semaphore; // ADDED

public class WebServer {
    private class Responder implements Runnable {
        public void run() {
            // (omitted) receive request
            // (omitted) send response to client
            // TODO: tell "main" we are done
            availableSlots.release(); // ADDED
        }
    }

    private Semaphore availableSlots = new Semaphore(5); // ADDED

    private void doMain() {
        // (omitted) start the server listening
        while (true) {
            // (omitted) accept connection from client
            // TODO: wait until less than 5 Responders are active
            // ADDED (the uninterruptible form avoids acquire()'s checked InterruptedException)
            availableSlots.acquireUninterruptibly();
            // start thread to service client
            Responder responder = new Responder();
            // (omitted) initialize responder
            new Thread(responder).start();
        }
    }

    public static void main(String[] args) {
        new WebServer().doMain();
    }
}
import java.util.concurrent.Semaphore;

public class Barrier {
    private int numWaiting = 0;
    private Semaphore numWaitingLock = new Semaphore(1);
    private Semaphore waitToReturn = new Semaphore(0);

    public void awaitOthers() {
        // (acquireUninterruptibly avoids acquire()'s checked InterruptedException)
        numWaitingLock.acquireUninterruptibly(); // lock for access to numWaiting
        ++numWaiting;
        boolean inFirstNine = numWaiting < 10;
        numWaitingLock.release(); // done accessing it, so release lock
        if (inFirstNine) { // I am in first 9; wait until tenth releases me
            waitToReturn.acquireUninterruptibly();
        } else { // I am tenth; release the other 9
            for (int i = 0; i < 9; i++) {
                waitToReturn.release();
            }
        }
    }
}
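A quick way to check the behavior is a driver that sends ten threads through the barrier. This sketch (BarrierDemo is a made-up name) reproduces the Barrier above and counts how many threads get past it:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierDemo {
    // The Barrier from the solution above; acquireUninterruptibly is used
    // so that awaitOthers need not declare InterruptedException.
    static class Barrier {
        private int numWaiting = 0;
        private final Semaphore numWaitingLock = new Semaphore(1);
        private final Semaphore waitToReturn = new Semaphore(0);

        public void awaitOthers() {
            numWaitingLock.acquireUninterruptibly(); // guard numWaiting
            ++numWaiting;
            boolean inFirstNine = numWaiting < 10;
            numWaitingLock.release();
            if (inFirstNine) {
                waitToReturn.acquireUninterruptibly(); // stall until the tenth
            } else {
                for (int i = 0; i < 9; i++) {
                    waitToReturn.release(); // tenth thread frees the other nine
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Barrier barrier = new Barrier();
        AtomicInteger passed = new AtomicInteger();
        Thread[] threads = new Thread[10];
        for (int i = 0; i < 10; i++) {
            threads[i] = new Thread(() -> {
                barrier.awaitOthers();    // blocks until all ten have arrived
                passed.incrementAndGet(); // runs only after the barrier opens
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("threads past barrier: " + passed.get());
    }
}
```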
while (1) {
    if (test_and_set(lock0) == 0) { // we acquired lock0
        if (test_and_set(lock1) == 0) {
            break; // both locks acquired, exit loop
        } else {
            *lock0 = 0; // lock1 was taken, so release lock0 and retry
        }
    }
}
// CRITICAL SECTION HERE
*lock0 = 0;
*lock1 = 0;
[By the way, a better approach would be for the else case to release lock0, then work on acquiring lock1 before it attempts to acquire lock0 again. But this is more complex.]
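That alternating idea can be sketched in Java, with AtomicBoolean.getAndSet(true) standing in for the test-and-set instruction, since it likewise returns the old value while setting the flag (TwoLockDemo, acquireBoth, and releaseBoth are made-up names):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TwoLockDemo {
    // false = free, true = held; getAndSet(true) acts like test-and-set.
    static final AtomicBoolean lock0 = new AtomicBoolean(false);
    static final AtomicBoolean lock1 = new AtomicBoolean(false);

    // Acquire both locks without holding one while spinning on the other:
    // if the second lock is busy, release the first and swap the order,
    // so next time we work on acquiring the other lock first.
    static void acquireBoth() {
        AtomicBoolean first = lock0, second = lock1;
        while (true) {
            if (first.getAndSet(true)) continue;  // first is busy; retry
            if (!second.getAndSet(true)) return;  // got both locks
            first.set(false);                     // second busy: back off
            AtomicBoolean tmp = first;            // swap acquisition order
            first = second;
            second = tmp;
        }
    }

    static void releaseBoth() {
        lock1.set(false);
        lock0.set(false);
    }

    public static void main(String[] args) {
        acquireBoth();
        // CRITICAL SECTION would go here
        releaseBoth();
        System.out.println("acquired and released both locks");
    }
}
```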
Advantages of user-level threads:
- In switching contexts between threads, avoids switching to kernel mode, which adds quite a bit of time to the context switch.
- OS-level support is minimal, so threaded applications can be more portable across operating systems.
- Applications have more control over the scheduling policy used for switching between threads.
Advantages of kernel-level threads:
- On multicore/multiprocessor systems, threads can be scheduled simultaneously on different cores/processors.
- When one thread blocks for a page fault or device access, the OS knows about other live threads in the same process that can still use the CPU.
In a hybrid threading system, a process has a handful of kernel-level threads, each of which manages a large group of user-level threads. For a context switch between threads, we can usually switch between user-level threads, avoiding entry into the CPU's kernel mode. But different kernel-level threads can still be scheduled on different processors; and if a thread blocks for a page fault or device access, the OS can switch to another kernel-level thread.
- FIFO
  time  | job |
  0–6   | A   |
  6–11  | B   |
  11–15 | C   |
  15–18 | D   |

  job | arrive | complete | delay |
  A   | 0      | 6        | 6     |
  B   | 2      | 11       | 9     |
  C   | 4      | 15       | 11    |
  D   | 6      | 18       | 12    |

  The total delay is 6 + 9 + 11 + 12 = 38, so the average is 38 / 4 = 9.5.
- SJF
  time  | job |
  0–6   | A   |
  6–9   | D   |
  9–13  | C   |
  13–18 | B   |

  job | arrive | complete | delay |
  A   | 0      | 6        | 6     |
  B   | 2      | 18       | 16    |
  C   | 4      | 13       | 9     |
  D   | 6      | 9        | 3     |

  The total delay is 6 + 16 + 9 + 3 = 34, so the average is 34 / 4 = 8.5.
- Round Robin (time slice 1)
  time  | job |
  0–2   | A   |
  2–3   | B   |
  3–4   | A   |
  4–5   | C   |
  5–6   | B   |
  6–7   | D   |
  7–8   | A   |
  8–9   | C   |
  9–10  | B   |
  10–11 | D   |
  11–12 | A   |
  12–13 | C   |
  13–14 | B   |
  14–15 | D   |
  15–16 | A   |
  16–17 | C   |
  17–18 | B   |

  job | arrive | complete | delay |
  A   | 0      | 16       | 16    |
  B   | 2      | 18       | 16    |
  C   | 4      | 17       | 13    |
  D   | 6      | 15       | 9     |

  The total delay is 16 + 16 + 13 + 9 = 54, so the average is 54 / 4 = 13.5.
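The FIFO and SJF averages above can be checked mechanically. This Java sketch (SchedulingDemo is a made-up name) simulates both non-preemptive policies on the table's jobs (Round Robin, being preemptive, is omitted for brevity):

```java
import java.util.*;

public class SchedulingDemo {
    record Job(String name, int arrive, int length) {}

    // FIFO: run jobs to completion in arrival order.
    static double fifoAverageDelay(List<Job> jobs) {
        List<Job> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingInt(Job::arrive));
        int time = 0, totalDelay = 0;
        for (Job j : sorted) {
            time = Math.max(time, j.arrive()) + j.length(); // completion time
            totalDelay += time - j.arrive();                // delay = complete - arrive
        }
        return (double) totalDelay / jobs.size();
    }

    // Non-preemptive SJF: among the jobs that have arrived, run the shortest.
    static double sjfAverageDelay(List<Job> jobs) {
        List<Job> pending = new ArrayList<>(jobs);
        int time = 0, totalDelay = 0;
        while (!pending.isEmpty()) {
            final int now = time;
            Job next = pending.stream()
                .filter(j -> j.arrive() <= now)               // already arrived
                .min(Comparator.comparingInt(Job::length))    // pick shortest
                .orElseGet(() -> pending.stream()             // CPU idle: take
                    .min(Comparator.comparingInt(Job::arrive)).get()); // next arrival
            time = Math.max(time, next.arrive()) + next.length();
            totalDelay += time - next.arrive();
            pending.remove(next);
        }
        return (double) totalDelay / jobs.size();
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(new Job("A", 0, 6), new Job("B", 2, 5),
                                 new Job("C", 4, 4), new Job("D", 6, 3));
        System.out.println("FIFO average delay: " + fifoAverageDelay(jobs));
        System.out.println("SJF average delay: " + sjfAverageDelay(jobs));
    }
}
```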
Over SJF, it has the advantage that it works without any foreknowledge of job lengths, and it is not prone to starvation.
Over FIFO, Round Robin ensures that the processor regularly spends time on each process, so that short-running jobs will tend to complete faster. By contrast, FIFO can impose very long delays on short jobs while the CPU works through longer-running jobs ahead of them.
We should prioritize the I/O-bound processes, so that whenever one becomes ready for the CPU, we quickly give it the small amount of compute time it needs before it returns to waiting on I/O. This keeps the I/O devices more heavily used while having a negligible effect on CPU availability for the compute-bound processes.
When the threads are trying to communicate, having both scheduled simultaneously can allow them to communicate with smaller delays.
Also, the cache can potentially be used more effectively if it is only being used for one process's memory.