Session 25: Threads, cont'd

Textbook: Section 2.1.3

Implementation
  kernel-level threads
  user-level threads
  comparison
  hybrid solutions
Practice

Implementation

An interesting issue arises when you talk about threads: How should they be implemented? The basic question is, should they be a part of the OS or not?

Kernel-level threads

The simplest solution is to say: Yes, they will be. We'll just implement the required routines as system calls. Whenever the program wants to create a thread, it will generate a software interrupt requesting that the OS create a new thread.

The OS will maintain a thread table (analogous to the process table), and it will switch between threads just as it switches between processes.

I'm calling this "simple" because it doesn't take a lot of creativity or new concepts - you just throw it all into the kernel.

User-level threads

An alternative is to place all the thread management into the lap of the process. In this situation, the OS knows nothing about threads, and the thread routines are just library functions.

The thread table will be part of the process's memory. The kernel is ignorant of threads, and so it treats every process equally regardless of threads.

The thread library will implement the thread routines. If scheduling is to be preemptive, the library will have to ask the OS to deliver it a signal after an interval, so that it can regain control while a compute-bound thread is running. (Unix has such a system call, alarm(), though it works only with a granularity of one second.)

Even with user-level threads, some kernel support is needed. The alarm() system call is one example. But the library would also need some way of keeping system calls from blocking the whole process. For example, the read() system call blocks until data becomes available. You don't want all the threads blocking while this is going on - after all, being able to do something useful during this time is frequently a major motivation for using threads. Luckily, industrial-strength Unix systems already include a system call (usually select()) to test whether a file descriptor is ready.

To give the user the illusion of being able to use regular system calls, the threads library would have to define wrapper functions for any system calls that might block. For example, read() would need a wrapper function. This wrapper function might test whether the file descriptor is ready for reading; if so, it would read and immediately return. Otherwise, it would block the calling thread and switch to another thread.

Comparison

So why go with one instead of the other?

kernel-level threads                        user-level threads
------------------------------------------  ------------------------------------------
requires more frequent switches to and      requires much less such switching
  from kernel mode (which take a lot
  of time)
makes the kernel larger and hence more      makes processes larger and hence could
  error-prone                                 involve additional memory expense
existing system calls require little        handling blocking system calls requires
  overhead                                    wrapper functions
can handle page faults                      a page fault by any thread blocks all
                                              the threads of the process
programs are less portable across OSes      programs port easily to other OSes
programs can't control scheduling           programs can control the scheduling
                                              policy
threads of the same process can be          threads of the same process cannot be
  scheduled on different processors           scheduled on different processors
Overall, user-level threads are faster, but they're less flexible. One study timed the various approaches on a VAX computer:
type                    time to create
----------------------  --------------------
user-level thread       34 microseconds
kernel-level thread     948 microseconds
process                 11,300 microseconds
You can see that the overhead is real. (But the overhead of allocating a process is much more real!)

Hybrid solutions

In practice, many operating systems opt for a hybrid approach. Windows 2000, for example, defines both threads, which are kernel-level, and fibers, which are user-level. A process can have multiple threads, and a thread can have multiple fibers.

Using fibers, a program can get the efficiency of user-level threads, while having several threads lets it take advantage of multiple processors. Additionally, when one thread blocks in the OS, the program can keep running by switching to the fibers of another thread.

This scheme is certainly the most complicated you could have, but it wins out on efficiency.

Practice

Using the mutex and condition variables defined in Pthreads, give a solution to Problem 1 from Quiz 3 (the one with save_obs() and read_obs()).

My solution for Quiz 3 used three semaphores: block_test (initially 1), is_good_wait (initially 0), and latest_lock (initially 1).

void save_obs(int value) {
    down(latest_lock);
    latest = value;
    is_good = true;
    up(latest_lock);

    down(block_test);
    if(blocked) {
        blocked = false;    /* reset so a later save_obs() doesn't signal again */
        up(is_good_wait);
    }
    up(block_test);
}

int read_obs() {
    down(block_test);
    if(!is_good) {
        blocked = true;
        up(block_test);
        down(is_good_wait);
    } else {
        up(block_test);
    }

    down(latest_lock);
    int ret = latest;
    is_good = false;
    up(latest_lock);

    return ret;
}
My solution using the Pthreads concepts uses a single mutex and a condition variable.