Review G: Garbage collection
printable versionG1: [1] // G2: [1] [2] [3] [4] [5] [6] [7] [8] // G3: [1] [2]
Problem G1.1
Define the term heap as it relates to memory.
The heap is the large area of unstructured memory for storing
data used by a program. In C, memory is allocated in the heap
via the malloc
library function; in Java, heap
allocation is accomplished via the new
operator.
Problem G2.1
Describe the reference counting technique for garbage collection.
Each memory allocation has a count associated with it tracking how many pointers reference that memory allocation. When a pointer is modified, the allocation which the pointer was referencing has its counter decremented, and the allocation to which the pointer is now referencing has its counter incremented. When the counter reaches 0, the allocation can safely be freed automatically.
Problem G2.2
Describe two disadvantages of reference counting as a garbage collection technique.
First, when there is a cycle of memory blocks each referencing the next block in the cycle, with no other locations referencing any of the cycle's blocks, reference counting will not identify these blocks as being available.
Second, reference counting has a high overhead for changing pointers: With every change to a pointer, we must decrement the reference counter for the previously referenced block, increment the reference couner for the newly referenced block, and check whether the previous reference counter has reached zero (to see whether to free it). By contrast, the lazier techniques allow the pointers to be updated directly without such an overhead.
Problem G2.3
Describe the Mark and Sweep algorithm for garbage collection.
With each memory allocation, we include a one-bit header. Periodically, we stop the process, and go through the following steps.
Set all of the currently allocated blocks' bit-headers to 0.
From each reference root (that is, all variable names that are accessible anywhere on the stack), we traverse to find the reachable memory: With each block we can find, we check whether its bit-header is 1, and if it is not, we set it to one and then recurse for every reference found within that block.
Finally, we sweep through all currently allocated blocks and free any blocks whose bit-headers are still 0.
Problem G2.4
Explain why a process must be frozen while the mark and sweep algorithm (in its simplest form) locates unused memory.
Suppose we have an array of pointers whose first location references an object A and whose last location references an object B, and suppose that A and B have no other references to them. If the marking phase runs while the program continues, then the program might decide to swap the first and last locations of the array while the marking phase is in the middle of the array. As a result, it would mark A when it reaches the array's beginning, and it would mark A again when it reaches the array's end. It would never mark B, and thus it would identify the memory as unused, and it would be deallocated even though the memory is in fact referenced.
Problem G2.5
Copy collection makes use of what's called a tombstone. What is the purpose of this tombstone?
The tombstone is a marker in from-space placed at the location of a memory block that has already been copied into to-space. As copy collection continues, other pointers to the old from-space location may be found. Finding a tombstone there will allow the pointer to be corrected to the to-space copy without performing any additional copies.
Problem G2.6
For a program that has many short-lived memory allocations, why is the process of collecting garbage much faster with copy collection than with mark and sweep?
Mark and sweep involves checking every memory allocation, whether it is active or not. With lots of short-lived memory allocations, most memory allocations are not active. With copy collection, the only memory traversed is that which is currently active.
Problem G2.7
Traditional thought regards automatic garbage collection
as automatically slower than explicit memory deallocation, since
garbage collection systems must spend time identifying unused
memory, which is unnecessary when programs use explicit memory
allocation.
However, we saw that copying collection actually allows
some programs to run faster than equivalent programs
running on systems using malloc
and free
, even
if the equivalent programs deallocate memory efficiently.
Why?
With explicit memory deallocation, memory can become highly fragmented, and requests for allocating memory blocks require some computation to find a block that is big enough. Copying collection, however, defragments the memory in the course of collecting garbage, so that there is always just one large block of available memory from which all memory allocations are taken.
(This would be particularly noticeable if we had a program that allocated many short-lived blocks of memory. In such a case, copying collection would require little time for copying the used blocks over into “to-space,” and memory allocation would go quite quickly. With explicit deallocation, though, memory would be freed frequently.)
Problem G2.8
Describe the generational technique for garbage collection.
We divide the heap into multiple regions called generations; one is the first generation, one the second, and so on. Memory is always initially allocated into the first generation. When it becomes full, copy collection is performed to empty out the first generation to copy any still-used blocks into the second generation, and then we continue allocating into the first generation. When the second generation becomes full (as we perform copy collection on the first generation), we perform copy collection again to copy any still-used second-generation blocks into the third generation. For the final generation, we either perform copy collection to move the still-used blocks into the next-to-last generation, or else we use mark and sweep.
Problem G3.1
Describe how a program might keep a large memory allocation for far longer than necessary, even though garbage collection is running regularly.
The garbage collector will not collect a large memory allocation until there are no references to the memory whatsoever. So if there is a lingering variable holding a reference to the allocation somewhere, the allocation will remain as long as the variable is accessible (even if the variable's value is no longer important).
Problem G3.2
What is a weak reference (as used in Python or Java)?
A weak reference to a memory block is ignored when the garbage collector determines whether the memory block should be kept. Thus, when we have a weak reference, it points to a memory block that could potentially disappear once there are no “normal” references to the block.