Review G: Garbage collection: Questions

G1.1.

Define the term heap as it relates to memory.

G2.1.

Describe the reference counting technique for garbage collection.

G2.2.

Describe two disadvantages of reference counting as a garbage collection technique.

G2.3.

Describe the Mark and Sweep algorithm for garbage collection.

G2.4.

Explain why a process must be frozen while the mark and sweep algorithm (in its simplest form) locates unused memory.

G2.5.

Copy collection makes use of what's called a tombstone. What is the purpose of this tombstone?

G2.6.

For a program that has many short-lived memory allocations, why is the process of collecting garbage much faster with copy collection than with mark and sweep?

G2.7.

Traditional thought regards automatic garbage collection as automatically slower than explicit memory deallocation, since garbage collection systems must spend time identifying unused memory, which is unnecessary when programs use explicit memory allocation. However, we saw that copying collection actually allows some programs to run faster than equivalent programs running on systems using malloc and free, even if the equivalent programs deallocate memory efficiently. Why?

G2.8.

Describe the generational technique for garbage collection.

G3.1.

Describe how a program might keep a large memory allocation for far longer than necessary, even though garbage collection is running regularly.

G3.2.

What is a weak reference (as used in Python or Java)?

Review G: Garbage collection: Solutions

G1.1.

The heap is the large area of unstructured memory for storing data used by a program. In C, memory is allocated in the heap via the malloc library function; in Java, heap allocation is accomplished via the new operator.

G2.1.

Each memory allocation has a count associated with it tracking how many pointers reference that memory allocation. When a pointer is modified, the allocation which the pointer was referencing has its counter decremented, and the allocation to which the pointer is now referencing has its counter incremented. When the counter reaches 0, the allocation can safely be freed automatically.

G2.2.

First, when there is a cycle of memory blocks each referencing the next block in the cycle, with no other locations referencing any of the cycle's blocks, reference counting will not identify these blocks as being available.

Second, reference counting has a high overhead for changing pointers: With every change to a pointer, we must decrement the reference counter for the previously referenced block, increment the reference couner for the newly referenced block, and check whether the previous reference counter has reached zero (to see whether to free it). By contrast, the lazier techniques allow the pointers to be updated directly without such an overhead.

G2.3.

With each memory allocation, we include a one-bit header. Periodically, we stop the process, and go through the following steps.

  1. Set all of the currently allocated blocks' bit-headers to 0.

  2. From each reference root (that is, all variable names that are accessible anywhere on the stack), we traverse to find the reachable memory: With each block we can find, we check whether its bit-header is 1, and if it is not, we set it to one and then recurse for every reference found within that block.

  3. Finally, we sweep through all currently allocated blocks and free any blocks whose bit-headers are still 0.

G2.4.

Suppose we have an array of pointers whose first location references an object A and whose last location references an object B, and suppose that A and B have no other references to them. If the marking phase runs while the program continues, then the program might decide to swap the first and last locations of the array while the marking phase is in the middle of the array. As a result, it would mark A when it reaches the array's beginning, and it would mark A again when it reaches the array's end. It would never mark B, and thus it would identify the memory as unused, and it would be deallocated even though the memory is in fact referenced.

G2.5.

The tombstone is a marker in from-space placed at the location of a memory block that has already been copied into to-space. As copy collection continues, other pointers to the old from-space location may be found. Finding a tombstone there will allow the pointer to be corrected to the to-space copy without performing any additional copies.

G2.6.

Mark and sweep involves checking every memory allocation, whether it is active or not. With lots of short-lived memory allocations, most memory allocations are not active. With copy collection, the only memory traversed is that which is currently active.

G2.7.

With explicit memory deallocation, memory can become highly fragmented, and requests for allocating memory blocks require some computation to find a block that is big enough. Copying collection, however, defragments the memory in the course of collecting garbage, so that there is always just one large block of available memory from which all memory allocations are taken.

(This would be particularly noticeable if we had a program that allocated many short-lived blocks of memory. In such a case, copying collection would require little time for copying the used blocks over into “to-space,” and memory allocation would go quite quickly. With explicit deallocation, though, memory would be freed frequently.)

G2.8.

We divide the heap into multiple regions called generations; one is the first generation, one the second, and so on. Memory is always initially allocated into the first generation. When it becomes full, copy collection is performed to empty out the first generation to copy any still-used blocks into the second generation, and then we continue allocating into the first generation. When the second generation becomes full (as we perform copy collection on the first generation), we perform copy collection again to copy any still-used second-generation blocks into the third generation. For the final generation, we either perform copy collection to move the still-used blocks into the next-to-last generation, or else we use mark and sweep.

G3.1.

The garbage collector will not collect a large memory allocation until there are no references to the memory whatsoever. So if there is a lingering variable holding a reference to the allocation somewhere, the allocation will remain as long as the variable is accessible (even if the variable's value is no longer important).

G3.2.

A weak reference to a memory block is ignored when the garbage collector determines whether the memory block should be kept. Thus, when we have a weak reference, it points to a memory block that could potentially disappear once there are no “normal” references to the block.