Session 18: Memory segments

Textbook: Sections 4.1 and 4.2

Partitioning memory
  Single partition
  Fixed partitions
  Variable partitions
Relocation problem
  Relocating on load
  Relocating through segment registers
  Comparison
Conserving memory
  Swapping
  Memory compaction

Partitioning memory

The OS exists primarily to govern computer resources. We've seen how an OS can govern CPU time through the use of processes. We've seen how an OS works with I/O devices. The final important resource for the OS to manage is memory.

Single partition

The simplest operating systems can support only a single process at a time. Such an OS can allocate a region of memory for holding the process (simply all the memory that the OS itself doesn't occupy), and the process can run within that memory space. There's nothing to this scheme.

Fixed partitions

A more complex operating system can allocate multiple regions of memory. Each time a program is loaded, the OS chooses an open region of memory for the process.

An actual implementation of this would have to have regions of many sizes, since some processes (e.g., Netscape) are very large, while other processes (e.g., lpd, the printer daemon) are tiny.

Interesting issues arise with this scheme in deciding how to choose a process when a partition becomes available. The assumption is that there are many more processes than there are regions, so that there are many processes waiting to get into memory. In a modern interactive OS, this isn't what happens, but the old batch systems worried about this.

Variable partitions

Modern operating systems typically partition memory dynamically. Regions are created as processes begin, and they continue until the process ends. The OS maintains a linked list listing available regions of memory.

When a new process comes in, the OS goes through the linked list looking for a chunk of memory that is large enough. There are several algorithms for choosing among the candidates: first fit takes the first region that is large enough, best fit takes the smallest region that is large enough, and worst fit takes the largest region available.

People have tried these algorithms and many more. Somewhat surprisingly, the best seems to be first fit. Best fit tends to break memory up into tiny pieces, as it will always choose the region that leaves the smallest available region after the process is put into memory. Worst fit quickly eats up large regions of memory. So, not only is first fit simpler than the others, it also does quite well.
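The free-list search can be sketched in C. The `struct region` layout here is hypothetical, and a real OS would also merge adjacent free regions when a process ends; this shows only the first-fit walk:

```c
#include <stddef.h>
#include <stdlib.h>

/* A free region of memory, kept in a linked list (a sketch; real
 * OS structures differ). */
struct region {
    size_t base;            /* start address of the free region */
    size_t size;            /* length of the region in bytes */
    struct region *next;
};

/* First fit: walk the list and take the first region big enough.
 * Returns the base address of the allocation, or (size_t)-1 on
 * failure. */
size_t first_fit(struct region **list, size_t want)
{
    for (struct region **p = list; *p != NULL; p = &(*p)->next) {
        struct region *r = *p;
        if (r->size >= want) {
            size_t base = r->base;
            r->base += want;      /* carve from the front */
            r->size -= want;
            if (r->size == 0) {   /* region used up: unlink it */
                *p = r->next;
                free(r);
            }
            return base;
        }
    }
    return (size_t)-1;            /* no region large enough */
}
```

Note how the walk stops at the first adequate region; best fit and worst fit would instead have to scan the entire list on every allocation.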

Relocation problem

Having multiple partitions presents a problem: references to particular memory addresses within the program won't be the same every time the program runs. For example, the program might call a particular function that begins 1024 bytes into the program. The exact memory address of this function varies depending on where the program is loaded. If the program is loaded beginning at 100K, then the call is to the function that begins at address 101K. If the program is loaded beginning at 150K, then the call is to the function that begins at address 151K.

These addresses cannot be determined absolutely at compile time, yet they need to be in memory when the program runs. This is called the relocation problem. We're going to look at two possible solutions.

Relocating on load

When the program is compiled, the compiler generates an executable file in the executable format. One possible solution involves restructuring this format. We'll have the format begin with the machine code, compiled as if the program was located at memory address 0. But after this portion of the file, the executable file will list every instruction referring to a memory address.

When the OS is to load a program into memory, it reads this file. It can determine where to place the file (the base address), and then read the machine code into that memory. At this time, the call to the function of our example is a call to the instruction at 1K, even though the program is loaded beginning at 100K. But then the OS reaches the portion of the executable file that lists instructions referring to memory addresses. For each of these instructions, the OS can add the base address into the address referred to in the instruction. It is at this time that in our example the reference to the function changes to 101K.

Now every reference to a memory address in the loaded program refers to the proper memory address. The program can run successfully.
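The loader's fix-up pass can be sketched as follows. The layout is invented for illustration: assume the executable's relocation table is simply a list of word offsets into the image, marking the address words that must be adjusted:

```c
#include <stdint.h>
#include <stddef.h>

/* Load-time relocation, sketched: the image is machine code compiled
 * as if the program were loaded at address 0.  The fixup table lists
 * which words of the image hold memory addresses; the loader adds
 * the base address into each one. */
void relocate(uint32_t *image, const size_t *fixups, size_t nfixups,
              uint32_t base)
{
    for (size_t i = 0; i < nfixups; i++)
        image[fixups[i]] += base;   /* e.g. a call to 1K becomes 101K */
}
```

After this pass the image is tied to its base address: unlike the segment-register scheme below, the program cannot later be moved without redoing the work.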

Relocating through segment registers

An alternative scheme is to provide CPU support for relocation. Here, the CPU has one (or more) segment registers. Any time the CPU accesses memory, it adds the segment register to the address mentioned in the instruction to determine which memory address to access.

In this scheme, the executable format is much simpler: It can simply list the machine code. The OS loads this into memory (beginning at, say 100K). But before beginning the process, the OS stores the base address into the segment register.

When the program calls the function, it is still calling the function at 1K, and 1K is what goes into the program counter. But whenever the program reads an instruction from memory, it first adds the segment register. Thus, even though the program counter holds 1K, the instruction it loads comes from PC + segment = 101K, which is the proper address.
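The calculation the CPU performs on every memory access is just an addition (the function name here is illustrative):

```c
#include <stdint.h>

/* The segment-register translation, sketched: the address in the
 * instruction (or the program counter) is an offset, and the
 * segment register holds the process's base address. */
uint32_t physical_address(uint32_t segment, uint32_t offset)
{
    return segment + offset;
}
```

So with the segment register holding 100K, a program counter of 1K fetches from 101K; move the process to 150K, reload the segment register, and the same offset fetches from 151K with no change to the program itself.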

Comparison

Relocating at load time is simpler from a hardware point of view. The CPU provides no hardware support, which leads to a slightly simpler (and hence more efficient) CPU. However, relocating at load time involves a significant overhead at the time the program is loaded into memory.

Segment registers complicate the CPU. The actual circuitry to support the addition is actually relatively small. Perhaps more significant is the addition of a more complex register set and instruction set.

One significant advantage of the segment registers is the added possibility of adding some form of memory protection. The CPU might also have a limit register. Before adding an offset to the segment (or base register), the CPU first checks to make sure the offset is less than the limit register. If it exceeds the limit register, the CPU can cause an interrupt to the OS indicating that the program has produced a segmentation fault. (In Unix systems, a segmentation fault produces a short message to that effect, followed by aborting the offending process and producing a core dump.)
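The protection check can be sketched as below. The interface is illustrative, not any real CPU's: a failing check would actually raise an interrupt to the OS, which is modeled here by a return flag:

```c
#include <stdint.h>
#include <stdbool.h>

/* Base-and-limit translation, sketched: before forming the physical
 * address, the CPU compares the offset against the limit register.
 * Returns false where real hardware would trap with a segmentation
 * fault. */
bool translate(uint32_t base, uint32_t limit, uint32_t offset,
               uint32_t *physical)
{
    if (offset >= limit)
        return false;          /* segmentation fault: trap to the OS */
    *physical = base + offset;
    return true;
}
```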

The Intel chip has had segment registers since the Intel 8088. In the earliest version, the segment register held a number, which was shifted left by 4 and added to each memory reference. (There have always been multiple segment registers - the 8088 had four, and today's chip has six!)

Though this does have the advantage of solving the relocation problem, the reason for segment registers in the Intel 8088 was entirely different: Its registers could only hold 16 bits, allowing a memory address space of only 2^16 = 64K. At the time, this was generous for a personal computer, but the Intel designers were farsighted enough to see that it would cause problems. By shifting the segment register left 4 bits before adding it to the memory address, they could allow memory addresses of 20 bits, allowing a memory address space of 2^20 = 1M.
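The 8088's calculation can be written out directly: a 16-bit segment shifted left 4 bits plus a 16-bit offset yields a 20-bit physical address.

```c
#include <stdint.h>

/* The 8088 address calculation: segment * 16 + offset, giving a
 * 20-bit physical address (up to 1M). */
uint32_t addr_8088(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```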

When they designed the Intel CPU, 1M was considered far beyond the potential needs of customers. And it was, for many years. Eventually, however, they began to run into problems. With the 286 chip, they introduced a new way of using the segment registers, an approach which was refined with the 386.

Incidentally, this whole issue is an issue that repeatedly recurs in the study of Programming Languages: When you push computation off later, you gain in flexibility but potentially lose in efficiency. In Programming Languages, the issue is whether you do something at compile-time or run-time. (For example, type checking might occur at run-time, leading to increased flexibility but lower efficiency.)

Conserving memory

As the OS continues running, memory becomes limited. One prominent way of addressing this problem is virtual memory, which we'll look at next time. But today I want to look at two alternative schemes.

Swapping

The first scheme is swapping. In this alternative, when process A has been running quite a while and another process B wants to run, the OS will suspend A, store A's current memory out to disk, and then load B into memory. Now B can run. After some time, the OS can store B out to disk, and load A back into memory.

Essentially, swapping works like multitasking, except the resource in question is memory rather than the CPU. The interaction of swapping and multitasking is interesting: A multitasker that simply ignores the issue of swapping can be catastrophic, since it will swap processes to and from disk with abandon. For example, if the multitasker uses the round-robin scheme, and A and B are both computational processes, then the multitasker would alternate between giving quanta to A and B. Thus, for each quantum, the OS would have to load in the entire process from disk again. Since loading from disk is so expensive, this is a big mistake.

Typically, you would want a tiered system: The multitasker works only within the processes currently in memory, with very short quanta; and the swapper works on a larger time scale, bringing things to and from memory.

Memory compaction

With variable-sized segments, memory tends to become fragmented as processes run. For example, if A is a 4K process placed at 1K, then B begins, B would be placed at 5K. When A terminates, C, a 3K process might come along, and the OS might choose to place C at the open spot where A terminated. As a result, there is a tiny 1K open space between B and C in memory. Such tiny spaces add up.

If the OS is using segment registers, it can conceivably suspend C at this point and copy all its memory downward to fill the smaller empty space, increasing the size of the free memory above C. After it has finished, it can change C's segment register to point to the new base address, and then re-enable C to continue. This is memory compaction.
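The compaction step itself is a copy plus a register update. In this sketch `mem` stands in for physical memory and the structure is hypothetical; the process is assumed to be already suspended:

```c
#include <string.h>
#include <stdint.h>

/* A process's region, as the OS might track it (hypothetical). */
struct proc {
    uint32_t base;   /* current base address (segment register value) */
    uint32_t size;   /* size of the process's region */
};

/* Memory compaction, sketched: copy the suspended process's region
 * down to new_base, then re-point its segment register.  The process
 * can then be resumed; its internal offsets are unchanged. */
void compact(uint8_t *mem, struct proc *p, uint32_t new_base)
{
    memmove(mem + new_base, mem + p->base, p->size);
    p->base = new_base;
}
```

`memmove` is used rather than `memcpy` because when compacting downward the source and destination regions may overlap.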

Notice that this is impossible if relocation occurs at load time. In that scheme, the program's data structures may hold pointers to actual addresses within the process's region. You can't relocate these data structures, since there's no way to reliably find all these addresses and change them. (You can go through the memory looking for what appear to be addresses - but something which looks like an address might actually be constant data.)

So the possibility of memory compaction is another argument for segment registers. In reality, I don't know that any operating systems actually do it, as memory compaction eats up the CPU simply copying memory.