Garbage collection

by Carl Burch, Hendrix College, November 2011

Creative Commons License
Garbage collection by Carl Burch is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Based on a work at www.cburch.com/books/gc/.

Contents

1. Motivation
2. Basic algorithms
2.1. Reference counting
2.2. Mark and sweep
2.3. Copy collection
2.4. Generational collection
3. Real-world garbage
3.1. Collectors
3.2. Programming implications

1. Motivation

Much software is written without the availability of garbage collection. C is an example of this: C programs that allocate memory using malloc must deallocate it using free. Whenever programmers allocate memory on the heap, they must carefully track when the memory is being used, and the program must include an explicit deallocation after the memory usage is complete.

If a program loses track of allocated memory without deallocating it, the program is said to have a memory leak. For long-running software, memory leaks are bothersome bugs: The program seems to work well, but it grows slower and slower over time — and, eventually, the system will allow no further memory allocations. Fortunately, there are tools that help to identify memory leaks, such as valgrind and Purify, but these programs only identify allocated memory blocks that seem to be inaccessible, and the programmer must then try to figure out how to modify the program so that the block is deallocated after its final use.

This technique of explicit deallocation leads to other problems in addition to memory leaks: A program might deallocate a memory block before it is actually finished using it, and the program would go on to use the memory even though it is deallocated — and by that point the memory might be used for other purposes. Alternatively, a program might end up deallocating the same block of memory twice, which can lead to problems of its own. Tools that identify memory leaks usually identify these problems as well.

Worrying about the lifetime of memory blocks can take a substantial portion of developer time. Naturally, many programmers regard this as a waste: Shouldn't the computer be able to figure out for itself when memory is no longer being used? This is what leads to garbage collection, which encompasses the automatic techniques a system uses to identify which memory is still in use and to reclaim the unused memory (the garbage) for future allocations.

Garbage collection has been around since nearly the beginning of programming: LISP was one of the first programming languages (from 1959), and though it never achieved wide usage, it did include garbage collection. But early garbage collection implementations tended to be quite slow, and they often involved a program freezing in unpredictable situations while the system searched for memory to reclaim; this frozen time could even last for minutes. Most programmers disregarded garbage collection as impractical. But in 1995 the Java programming language was introduced with no memory deallocation mechanism aside from garbage collection, and programmers using it quickly found that it was practical despite this. Today, most new systems use garbage collection as a matter of course, though of course there are legacy systems (notably, those built in C or C++) where memory must still be explicitly deallocated.

2. Basic algorithms

Computer scientists have developed several basic techniques for garbage collection. As we'll see in Section 3.1, production-grade systems end up combining the techniques. But before we can examine production-grade systems we need to understand the basic underlying techniques. We'll examine four: reference counting, mark-and-sweep, copy collection, and generational collection.

2.1. Reference counting

In reference counting, the system includes a count for each allocated block of memory, which tracks the number of pointers referencing that memory block. In what follows, we'll imagine that each memory block is preceded by a four-byte header containing this count. Thus, if a program asks to allocate 12 bytes, the system would actually allocate an additional 4 bytes for this count (and probably an additional 4 bytes as a header containing information about the block, such as the block's length).

While compiling a program, a compiler will encounter assignment statements such as r4 = r5. If r4 and r5 are integer variables stored in registers, then the compiler can translate this into a single assembly instruction (on the ARM, MOV R4, R5); but if r4 and r5 are pointer variables, then a compiler using reference counting would instead compile it into assembly code that accomplishes the following steps:

  1. Increment the reference count for the memory block referenced by r5, since following the assignment there will be one more reference to that memory block.
  2. Decrement the reference count for the memory block currently referenced by r4, since following the assignment there will be one less reference to that memory block.
  3. If decrementing the reference count results in 0, deallocate the memory block currently referenced by r4.
  4. Reassign r4 to point to the same place as r5.

For ARM's assembly language, such a compiler would translate r4 = r5 into the following code.

LDR R2, [R5, #-4]    ; Increment r5's reference count.
ADD R2, R2, #1
STR R2, [R5, #-4]
LDR R2, [R4, #-4]    ; Decrement r4's reference count.
SUBS R2, R2, #1
STRNE R2, [R4, #-4]
MOVEQ R0, R4         ; If count reaches 0, deallocate r4's block.
BLEQ free
MOV R4, R5           ; Assign r4 to point to same place as r5.

Reference counting is a popular approach to garbage collection because it is simple, and because it aggressively identifies and deallocates memory that is no longer being used. That said, it suffers from three problems that lead system designers to seek alternative techniques: Every pointer assignment expands into several instructions (nine in the code above, versus one without reference counting); freed blocks leave free space fragmented across the heap, which the allocator must still manage; and a group of blocks whose pointers form a cycle never sees its counts fall to 0, so the cycle is never reclaimed even once it becomes unreachable.
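
The cycle problem is worth seeing concretely. Here is a Python-flavored sketch; the Node class is a hypothetical illustration of what any linked structure does, not part of a real collector:

class Node:
    def __init__(self):
        self.next = None

a = Node()     # the new block's count is 1 (variable a)
b = Node()     # likewise for b's block
a.next = b     # b's block: count 2 (variable b, field a.next)
b.next = a     # a's block: count 2 (variable a, field b.next)
a = None       # a's old block drops to count 1 (b.next still points there)
b = None       # b's old block drops to count 1 (the other block's .next still points there)
# Both counts are stuck at 1: a pure reference-counting collector never
# reclaims either block, even though neither is reachable any longer.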

2.2. Mark and sweep

The mark-and-sweep algorithm for garbage collection involves periodically going through memory to identify all unused pieces of memory. Unlike with reference counting, pointer assignments happen by simply reassigning the pointer. But whenever not enough memory is available to satisfy an allocation request, we go through the following process to identify and reclaim all unused memory.

  1. For every existing block of allocated memory, we clear the mark associated with the block. The mark is simply a single bit, probably stored as part of the header describing the block. (Such a header is required anyway to indicate how long the block is — and thus where the following block can be found.)

  2. Now we go through and mark all memory in use. We do this by executing the below mark procedure for each variable p that points to memory in the heap.

    function mark(p):
        if p's mark is 0:
            Set p's mark to 1.
            for each pointer q in p's block:
                mark(q)

    This recursive algorithm manages to mark all blocks of memory reachable from any accessible variable. Thus, by the end of this step, all reachable memory has a mark of 1 in its header, while all unreachable memory has a mark of 0.

  3. For each block of memory whose mark is still 0, reclaim that block for use by future memory allocations.

The mark-and-sweep algorithm is intuitively simple, and it identifies all unused blocks of memory. It does have a notable disadvantage, though: The program must be frozen while the mark-and-sweep algorithm runs. The algorithm involves stepping through all memory blocks, and so it can take quite a long time. With a simple implementation, then, you'd find that the program suddenly freezes while the system searches for memory to reclaim, at moments that appear entirely unpredictable.

(You might object that mark-and-sweep could be executed in the background, as essentially a separate process that runs whenever the current process is idle. This would be nice, but unfortunately it doesn't work: Suppose our program involves two pointers, p and q, and q is the sole pointer to its block of memory, which we'll call B. Now, let's suppose that the mark-and-sweep process is in the middle of its marking step, having already handled p but not yet q (and thus B remains unmarked); and at this point the program ends up assigning p = q and q = NULL. Now the marking step might continue onto q, but it won't mark B, since q no longer points there. Thus, B remains unmarked still, and it would be reclaimed for other purposes even though B is now reachable through p.)

Another problem with mark-and-sweep is that the recursion stack in calling mark can get very deep. For example, if we have a linked list with 10,000 nodes in it, then the stack for mark would grow to include 10,000 nodes. This possibility of heavy memory usage comes at the time that it can least be afforded!
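
One common mitigation is to replace the recursion with an explicit worklist; the classic Deutsch-Schorr-Waite technique goes further and avoids extra space entirely by temporarily reversing the pointers it follows. Below is a minimal Python sketch of marking and sweeping with a worklist; the Block class, heap list, and roots list are hypothetical stand-ins for the allocator's real data structures:

class Block:
    """Hypothetical stand-in for an allocated block and its header."""
    def __init__(self, pointers=()):
        self.mark = 0                   # the mark bit from step 1
        self.pointers = list(pointers)  # the pointers stored in this block

def mark_all(roots):
    """Mark every block reachable from the root pointers, without recursion."""
    worklist = [p for p in roots if p is not None]
    while worklist:
        block = worklist.pop()
        if block.mark == 0:
            block.mark = 1
            worklist.extend(q for q in block.pointers if q is not None)

def sweep(heap):
    """Keep the marked blocks (clearing their marks); reclaim the rest."""
    live = [block for block in heap if block.mark == 1]
    for block in live:
        block.mark = 0
    return live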

2.3. Copy collection

In contrast to the two techniques we have already explored, copy collection moves reachable memory blocks to different locations even while they remain in use. We imagine that memory consists of two halves, which we'll term from-space and to-space. All allocated memory lies in from-space, and new memory is allocated there as well. Eventually from-space will become full, and then we start our copy collection.

The goal of copy collection is to move all reachable memory from from-space to to-space. We start just with the blocks referenced directly by a variable. Then we start iterating through the blocks already copied into to-space; for each pointer within each of these blocks, we copy the block referenced by it (and modify the pointer to reference its new location). As we go through this process, though, we will encounter some pointers referencing blocks that have already moved; to deal with this, our process of copying a block to to-space leaves a tombstone behind at the block's prior location in from-space, saying where to find the block's new location.

We'll look at some pseudocode for accomplishing this. First, we'll define a function that finds a new location in to-space for its parameter block a and returns this new location.

function copy_block(a):
    if a references a normal block in from-space:
        Let b reference a newly allocated region of to-space large enough to hold the block.
        Copy the block at a over to b.
        Create a tombstone at a holding the new location b.
        return b
    else:
        return the address held in the tombstone referenced by a

We then go through the following procedure to complete the copy collection.

for each pointer variable p:
    p = copy_block(p)
for each block in to-space:
    for each pointer q within block:
        q = copy_block(q)

The following example illustrates the process.

Suppose we have three variables (x, y, and z) from which we can access blocks in the heap, and the heap contains five blocks, A through E. All blocks begin in from-space.

  1. We begin by copying each block referenced by a variable. Suppose we have handled the first two variables, x and y: the memory blocks they referenced, A and B, have been copied over into to-space, with tombstones left behind at their old locations.

    The memory-copying process is a simple bit-by-bit copy; any addresses within A and B remain identical, so the pointer within A (and within B) still points into from-space. These pointers will be repaired later. Since we have not yet processed z, it still points to the tombstone left behind in the process of copying B.

  2. In processing z, we notice that z points to a tombstone, so rather than copying the memory referenced by z, we simply change z to point to the memory referenced by the tombstone.

  3. After processing the last of the variables, we proceed to processing the pointers for each block in to-space, starting with the pointer in A, which references E. Since E was in from-space, it is copied into to-space with a tombstone left behind.

  4. We continue by processing the pointer in B, which references C. Since C was in from-space, it is copied into to-space with a tombstone left behind.

  5. The next block, E, contains no pointers, so there is nothing to do for it.

  6. We proceed to the next block, C, whose one pointer references what is now E's tombstone; we update the pointer to the reference contained within the tombstone.

  7. Having completed all blocks in to-space, we swap the meanings of from-space and to-space, and we regard the new to-space as empty. Notice that this has cleaned out block D; this is expected, since D was not reachable from any of the variables. Future memory allocations will come from from-space as it is now defined, until it becomes full, at which point we'll repeat the process.
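
Here is a small Python sketch of the whole algorithm; it is essentially Cheney's technique, in which a scan pointer walks through to-space fixing pointers as more blocks arrive behind it. The Block class, its forward field (which plays the role of the tombstone), and the roots list are hypothetical stand-ins:

class Block:
    """Hypothetical block: fields hold either plain data or other Blocks."""
    def __init__(self, *fields):
        self.fields = list(fields)
        self.forward = None            # the tombstone: new location, once copied

def copy_block(a, to_space):
    """Copy block a into to-space if it hasn't been copied; return its new home."""
    if a.forward is None:              # an ordinary, uncopied from-space block
        b = Block(*a.fields)           # bit-for-bit copy; pointers not yet fixed
        to_space.append(b)
        a.forward = b                  # leave the tombstone behind
    return a.forward

def collect(roots, to_space):
    """Move everything reachable from roots into to_space, fixing pointers."""
    for i, p in enumerate(roots):
        if isinstance(p, Block):
            roots[i] = copy_block(p, to_space)
    scan = 0                           # blocks before scan are fully processed
    while scan < len(to_space):
        block = to_space[scan]
        for i, q in enumerate(block.fields):
            if isinstance(q, Block):
                block.fields[i] = copy_block(q, to_space)
        scan += 1

When collect returns, to_space holds exactly the reachable blocks with their pointers repaired; everything left in from-space, tombstones and unreachable blocks alike, can be discarded wholesale.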

Copy collection has some significant disadvantages. Most notably, by dividing memory into two halves and requiring all usable memory to lie in one of the halves at any point in time, we have halved the amount of memory available. With an adequately sized address space this is not necessarily a huge problem, but it is certainly a real cost.

Another major issue with copy collection is the same as for mark-and-sweep: The current program must be frozen while garbage collection is in progress. One reason is that in the middle of collection, some pointers reference tombstones, which would lead to complications if a program were allowed to access such a pointer. This might be resolved by including code for each memory reference to test whether it references a tombstone, though that would come at a performance penalty. Another problem is the same scenario that led to problems with programs running during the mark phase of mark-and-sweep; it would need to be addressed here as well if programs could run during the copy collection.

A final issue with copy collection — though this one can easily be addressed — is that the algorithm we described spreads related blocks out. In our example, blocks B and C were related (by a pointer from B to C), and they were adjacent in memory before the copy collection; but after the copy collection, we saw that they ended up separated by another block. This behavior can lead to reduced performance with virtual memory, since B and C could end up on different pages. This possibility can be reduced, however, by modifying the collection procedure so that whenever a block is copied into to-space, we also copy all blocks referenced by pointers within that block (and perhaps also the blocks referenced by pointers within those blocks).

By contrast to these disadvantages, copy collection has a major advantage: Unlike any other memory management technique we have seen (explicit deallocation, reference counting, mark and sweep), copy collection compacts memory. This simplifies normal memory allocation tremendously: With these other techniques, a request to allocate memory would require searching through all the fragments of previously deallocated memory. But since copy collection compacts the used memory so that there is one huge chunk of free memory, memory allocation with copy collection has no fragments to worry about. The end result is that memory allocation can actually be faster with copy collection than it would be if programs explicitly deallocated memory.
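
Allocation under a compacting collector can be as simple as advancing a single pointer. A minimal sketch, with hypothetical names and from-space modeled as a flat range of bytes:

FROM_SPACE_LIMIT = 1 << 20     # hypothetical size of from-space, in bytes
free_ptr = 0                   # everything below free_ptr is already allocated

def allocate(nbytes):
    """Bump-pointer allocation: free memory is one contiguous chunk."""
    global free_ptr
    if free_ptr + nbytes > FROM_SPACE_LIMIT:
        raise MemoryError("from-space is full; time for a copy collection")
    block = free_ptr           # address of the newly allocated block
    free_ptr += nbytes         # bump the pointer past it
    return block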

Of course, you still have the time required to actually perform the copy collection. But there's a significant performance improvement here, too: Notice that copy collection only involves looking at the memory that is actually being used. In our example above, we never even looked at block D. If, then, our program happens to have many short-lived memory allocations, then each copy collection would only go through the few still-live allocations, and so it would complete quickly. And since memory allocation can go very quickly as well, overall the program could be faster than any other technique available.

While this is very promising, the need to stop the program while the collector iterates through all of the memory in use is unfortunately a major problem. And it is the need to address this that leads to our next garbage collection algorithm.

2.4. Generational collection

Generational collection builds on copy collection by dividing memory into many regions. Memory is always allocated in the first region, and when that region becomes full, we perform copy collection, searching just the first region and copying the blocks still in use into the second region. When the second region becomes full, we perform copy collection on it alone, copying its live blocks into the third region, and so forth. In this way, an object is born into the first region, and as it continues to live, it moves up through the regions. Thus we can think of each region as containing a generation of memory allocations, with each region holding older allocations than the previous one.

The advantage of this is that in practice, most memory allocations have very short lives, and so they rarely live long enough to be copied beyond the first region. When we perform copy collection, then, only a small number of memory allocations end up being copied into the second region, and so we don't need to perform copy collection even on the second region very often; and naturally we collect on the subsequent regions even more rarely. This means that when we collect garbage, we rarely need to go past the first region, which is quite small and can be collected quite quickly.

A difficulty arises in performing this garbage collection, though: To detect which memory in the first region should be copied into the second region, we must identify all pointers referencing any memory in the first region. Unfortunately, it's possible that there could be a pointer in one of the other regions that points into the first region. On the surface, then, it appears that the collection process must always iterate through all regions, which will take a lot of time.

We address this by observing that in practice, a memory block's pointers almost always refer to blocks that are older than the block itself (or roughly the same age). We only need to iterate through the very few pointers that reference blocks in younger regions. But how can we identify such pointers easily? One way is to divide each region into cards (512 bytes per card would be typical) and to maintain a bit for each card signifying whether that card contains any pointers into a younger region. The copy collection process then only needs to walk through the cards that are marked; in practice very few cards are marked, so this goes very quickly.

But how would a card become marked? As with reference counting, the system must insert extra code whenever it compiles a write to memory: This code sets the relevant card's bit to 1 if the newly written value points into a younger region. This slows the process of updating memory, but overall it is much faster than walking through all allocated blocks every time garbage collection occurs.
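
Here is a minimal Python sketch of such a write barrier. Everything in it (the flat heap model, region_of, the region sizes) is a hypothetical stand-in for what a real system compiles into machine code:

CARD_SIZE = 512                    # bytes per card, as in the text
REGION_BYTES = 1 << 16             # hypothetical size of each region
NUM_REGIONS = 8                    # hypothetical number of regions

heap = {}                          # a flat model of memory: address -> value

def region_of(address):
    return address // REGION_BYTES # lower region index = younger generation

# card_table[i] is 1 if card i may hold a pointer into a younger region.
card_table = bytearray((NUM_REGIONS * REGION_BYTES) // CARD_SIZE)

def write_pointer(address, value):
    heap[address] = value          # the ordinary store
    # The barrier: if the stored pointer leads into a younger region than
    # the location being written, mark that location's card as dirty.
    if region_of(value) < region_of(address):
        card_table[address // CARD_SIZE] = 1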

(By the way, for systems using virtual memory, we might choose cards to be the same size as pages. This would allow us to take advantage of the fact that the CPU automatically sets a page's dirty bit whenever the page is modified, and we could then avoid the overhead of extra code at each pointer write. That said, garbage collection is almost always part of a user process, and operating systems typically prevent user processes from accessing the page table, so this possibility is more theoretical than real.)

Generational collection removes many of the problems with copy collection: First, we no longer have the problem of being able to use only half of memory. And usually collection can be quite fast, so that stopping the program during garbage collection isn't nearly the problem that it would be for copy collection or mark-and-sweep.

Nonetheless, generational collection still requires the program to be stopped during copy collection, even if it is for a short amount of time. More advanced approaches to garbage collection allow for garbage collection to proceed while the program runs, or at least attempt to reduce the time when the program must be frozen. They typically follow the basic outline of one of the basic algorithms we have examined, but with some difficult-to-understand subtleties. We won't examine such techniques here.

3. Real-world garbage

We've seen four basic approaches to garbage collection. But what happens in the real world? We'll see a summary of what some industrial-strength language implementations do, and we'll examine some implications for programmers.

3.1. Collectors

When programming languages are defined, designers rarely specify exactly how to collect garbage; of course, they do indicate whether garbage collection should occur, but they don't constrain how it should take place. Nonetheless, many programming languages have standard implementations, and we can discuss how these implementations collect garbage.

For Python, the standard implementation is CPython, which uses the reference counting technique. As we've seen, though, reference counting doesn't work for cycles of memory allocations that each contain a pointer referring to the next. CPython suggests that programmers should strive to avoid such cycles, but it also occasionally does something similar to mark-and-sweep so that such cycles can be detected and removed.

For Java, the standard implementation is the HotSpot JVM, maintained by Oracle Corporation (and by Sun Microsystems before Oracle acquired Sun). It has gone through a variety of garbage collection techniques.

Like Java, JavaScript implementations have evolved garbage collectors, too. The earliest implementation in Netscape Navigator 2.0 (March 1996) actually did no garbage collection: Memory allocated by JavaScript was simply kept around until the user navigated to a different page. This quickly proved inadequate, and Netscape Navigator 3.0 (August 1996) switched to reference counting. Early versions of Internet Explorer used mark and sweep. As JavaScript performance has become more important recently, browsers have become more sophisticated about their approach to garbage collection. Chrome, for example, now uses a generational garbage collection algorithm.

3.2. Programming implications

As programmers, fortunately, we usually don't need to worry much about garbage collection: It simply works. But there are some considerations we should give our programs to allow the garbage collector to do its job well. We'll examine three: avoiding unnecessary memory allocations, clearing references that are no longer useful, and considering weak references.

Avoid unnecessary memory allocations. If you allocate more memory than necessary, then the garbage collector will have to do more work identifying when memory can be cleaned up. As an example, consider the following two fragments, both of which compute the list of squares [0, 1, 4, 9, 16, ...].

Fragment A:

x = []
for i in xrange(1000):
    x = x + [i * i]

Fragment B:

x = []
for i in xrange(1000):
    x.append(i * i)

The first fragment creates a new list with each iteration, and makes x reference this new list instead; the garbage collector is then relied upon to free the previous list. By contrast, the second fragment creates a single list before the for loop, and then that same list is slowly extended to incorporate additional elements.

Admittedly, the biggest performance penalty for Fragment A is that each iteration must copy the contents of the list from one location to another. But an additional penalty is that the garbage collection algorithm must be more active in cleaning up memory from previous iterations. Both of these issues put together mean that Fragment A takes 314 milliseconds on a computer that can perform Fragment B in 20.4 milliseconds — the time is cut by a factor of 15 by this simple modification.

Clear references that are no longer useful. As long as a memory allocation is reachable through any possible path, the garbage collector will assume that the object should be kept. However, a program might easily include some references to blocks of memory that the program will actually never access again. A careful programmer will null out any pointer that will stay in memory for a while but no longer references memory that is useful.

The classical example of this comes from implementing a stack. Following is a simple class definition.

class Stack:
    def __init__(self, values):
        self.contents = values + (100 - len(values)) * [None]
        self.length = len(values)

    def push(self, value):
        self.contents[self.length] = value
        self.length += 1

    def pop(self):
        self.length -= 1
        return self.contents[self.length]  # This leads to a memory leak!

Suppose we have an instance of Stack, and we use push to add an object O to it and then we use pop to remove that same object O. Nothing in pop leads the Stack instance's contents list to change, so this list will retain a reference to O until something else is pushed onto the stack. During that time, O will not be eligible for deallocation, even if there are no other references to O.

A good developer would ensure that this entry of contents would become None, so that the Stack instance does not prevent garbage collection of O from taking place. This can be done by writing pop as follows.

def pop(self):
    self.length -= 1
    ret = self.contents[self.length]
    self.contents[self.length] = None
    return ret

Another example where this principle comes into play is in a program supporting multiple undo. This would typically be done by maintaining a list of actions completed by the user. The easy thing is to keep growing this list as long as the user continues doing new things; but this list can easily grow out of hand. It usually makes sense to start retiring things from the list once it grows to a certain size.
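
In Python, one simple way to bound such a history is a deque with a maximum length; here is a minimal sketch, assuming hypothetical action objects that know how to undo themselves:

from collections import deque

# Keep at most the 100 most recent actions. Appending to a full deque
# silently retires the oldest action, so the garbage collector can
# reclaim it (and whatever memory it was keeping alive).
undo_history = deque(maxlen=100)

def record(action):
    undo_history.append(action)

def undo():
    if undo_history:
        undo_history.pop().undo()  # assumes each action has an undo() method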

Consider weak references. Occasionally programs wish to retain a reference to a memory block without disqualifying it from garbage collection. We can do this using weak references. The following Python example illustrates how this works.

import weakref

data = Stack(['hello', 'world'])
dref = weakref.ref(data)
print(dref() is data)  # displays "True"
print(dref().pop())    # displays "world"

data = Stack(['goodnight', 'moon'])
print(dref() is None)  # displays "True"
print(dref().pop())    # exception: "'NoneType' object has no attribute 'pop'"

In the first section, we create a Stack instance referenced by the data variable, and the dref variable contains a weak reference to this stack. The weak reference is a bit odd in that you can't treat it simply as a regular reference: To pop from the stack, we must first call dref as a function to retrieve a normal reference to the stack, and then we call pop on that. Thus, at the end of the section, you see dref().pop() rather than dref.pop() as you would expect if dref were a normal reference to a stack.

In the second section, we change data to refer to a different Stack instance instead. At this point, there are no normal references to the first Stack instance, and so it is deallocated as garbage. Consequently, calling dref() now returns None rather than the normal reference it returned before. Thus, calling dref().pop() attempts to call a member function on None, which leads to an exception.

(Java supports weak references, too, through its java.lang.ref package.)

It must be admitted that weak references aren't often used in regular programming. However, they can be useful; two common places are caches, which should speed access to objects without by themselves keeping those objects alive, and observer lists, where registering a listener shouldn't by itself prevent the listener from ever being reclaimed.
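
For instance, Python's weakref module provides WeakValueDictionary, a mapping whose entries vanish once nothing else references their values. Here is a minimal sketch of a cache built on it; the Record class and the expensive loading step are hypothetical:

import weakref

class Record:
    """Hypothetical object that is expensive to construct."""
    def __init__(self, name):
        self.name = name

def expensive_load(name):
    return Record(name)            # stands in for real, slow work

# The cache never keeps a Record alive by itself: once the program drops
# its last ordinary reference to a Record, the entry disappears.
cache = weakref.WeakValueDictionary()

def load(name):
    record = cache.get(name)
    if record is None:             # not cached (or already collected)
        record = expensive_load(name)
        cache[name] = record
    return record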

Garbage collection is an important and complex feature of many modern programming languages. Fortunately, programmers rarely need to worry much about exactly how it works, but nonetheless there are occasions when understanding garbage collection can allow much better performance.