19. Garbage Collection

Part of the 22C:116 Lecture Notes for Fall 2002
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Brief Return to Memory Management

While it is fairly easy to implement the storage management routines new and dispose (for Pascal) or malloc and free (for C, underlying object creation and destruction in C++), it is hard to get programs to call dispose or free at the right time, or in C++, it is hard to get programs to call an object's destructor at the right time.

Failure to deallocate storage is not usually a problem in transient user-level code -- you can simply deallocate the user's entire heap when the program terminates. In an operating system that is expected to run forever, though, or in applications such as web servers, window managers or control systems that are not expected to terminate, this becomes a big problem.

The problem is also big in languages where the heap is persistent from one session to the next. Many LISP and all Smalltalk implementations work this way. In list processing languages such as LISP, and in applications that are based on intensive list processing no matter what the language, heap usage is so intense that failure to deallocate storage will terminate most programs in a short time, even when they run in huge virtual memories.

The failure to deallocate no-longer-accessible objects in a system is frequently described as a memory leak, since it appears, from the programmer's perspective, as if the spare storage capacity in the heap has been leaking out.

Garbage Collection

Systems that automatically reclaim unused memory are said to rely on garbage collection, a term coined in the context of the LISP programming language, developed under the direction of John McCarthy in the early 1960's. Garbage collection continues to be an important feature of the LISP, Prolog, Scheme, Simula 67 and Java programming languages.

Garbage collection is easiest to explain when all memory blocks in the heap have identical size and uniform format; this was true in LISP, where garbage collection originated. The LISP language is designed for list processing, where most operands are lists, and the elements of each may be either other lists or such atomic objects as numbers and identifiers. The format and structure of atomic items isn't relevant to this discussion, and we will ignore later additions to LISP such as arrays. All items in the classic LISP heap are records with one of the following two formats:

          _ _________ _ _________
         |1|_______o_|_|_______o_|
        Tag    Car | Mark  Cdr |______ pointers to
                   |__________________ other items
          _ _____________________      on the heap
         |0|_________|_|_________|
        Tag  Data    Mark  Data  (additional tag bits in
                                 the data distinguish
                                 integer from other types)

On the IBM 709x, where LISP was originally developed, and later, on the DEC PDP-10, the dominant machine used for artificial intelligence research in the late 1960's and early 1970's, the records in LISP's heap were packed one per 36 bit word, with the tag bit being the sign bit (to make it very easy to test to see whether the word contained an atom or a pair of pointers). Given the size of main memory common in that era, 17 bit pointers into the heap were quite adequate.
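The two cell formats pictured above might be rendered loosely as a C struct with bit fields. This is purely illustrative: the field names and widths echo the 36-bit packing described in the text (one tag bit, one mark bit, two 17 bit pointer fields), not the actual 709x word layout.

```c
#include <assert.h>

/* A loose illustration of the classic LISP one-word heap cell.
   Names and widths are ours, not the real 709x packing. */
typedef struct {
    unsigned int tag  : 1;   /* 1 = pair of pointers, 0 = atom */
    unsigned int mark : 1;   /* set and cleared by the garbage collector */
    unsigned int car  : 17;  /* heap address of first child, if tag = 1 */
    unsigned int cdr  : 17;  /* heap address of second child, if tag = 1 */
} cell;
```

The tag bit in the sign position made the atom-versus-pair test a single instruction on the original hardware; in C the compiler chooses the layout, so only the logical structure survives here.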

The names CAR and CDR for the two pointer fields come from the IBM 709, where each word held two fields for purposes of indexed addressing, the Address and Decrement fields; CAR and CDR were natural names for the macros to follow the pointers from these fields (Copy Address to Register and Copy Decrement to Register, or something like that). See the proceedings of the ACM History of Programming Languages conference (Jean Sammet edited this in book form) if you're really interested in this history.

LISP programs are themselves represented as structures in the heap, and the stack and run-time environment are also structures in the heap. Programs operate primarily by composing lists out of pieces of other lists, and it is possible to create circularly linked lists or other non-tree-like structures. There are no primitives for deallocation. Instead, the language defines all items as remaining allocated until they can no longer be reached by following pointers from other reachable items in the heap, where the stack and run-time environment serve as the roots used by the outside world to enter this tangled graph structure. Here is a picture of a very small LISP-like heap:

        Root _____     _____     _____
            |_____|-->|_____|   |__X__|
               ^         |
             __|__     __v__     _____
            |_____|<--|_____|-->|_____|
               |
             __v__     _____     _____
            |_____|<--|__X__|<--|__X__|

At any point in the computation, some items in the heap will be accessible from the root while others will not be accessible (those marked with X in the above illustration).

In normal operation, most LISP interpreters allocate new items from a free list. When the free list is exhausted, the garbage collector must find all items which are inaccessible from the root(s) and incorporate them into a new free-list so that execution can continue.

The allocate operation in LISP is cons(a,b); this constructs a new list element containing pointers to a and b in its two pointer fields. By definition, car(cons(a,b))=a, and cdr(cons(a,b))=b.

The Mark-Sweep Garbage Collection Algorithm

The oldest garbage collection algorithm, the one originally developed for LISP, is known as the mark-sweep algorithm. This algorithm comes into play when the free list is found to be empty; it begins by marking every reachable item in the data structure, and then it sweeps the unreachable items into a new free list, as outlined below:

   cons(a,b):
      -- first make sure there is a free list
      if free_list = NIL then
         Mark
         Sweep
         if free_list = NIL then
            -- the heap is full!
            terminate program with a nasty error message
         endif
      endif

      -- second allocate from the free list
      temp = free_list
      free_list = free_list.cdr

      -- finally, use the allocated item
      temp.car = a
      temp.cdr = b
      temp.tag = 1
      return temp

There must be one bit in each heap item available for marking that item. To mark an item, this bit is set. The marking algorithm recursively traverses the part of the heap reachable from the root, setting the mark bit on every item it finds, and using the mark bit to avoid following cycles in the data structure. The straightforward recursive version of this code operates as follows:

   Mark:
     Touch( Root )
     Return

   Touch( Item ):
     if Item.mark = 1, return.
 
     -- Item has not previously been touched
     Item.mark = 1
     if Item.tag = 0, return.

     -- Item contains pointers, follow them recursively
     Touch( Item.car )
     Touch( Item.cdr )
     Return
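The cons, Mark and Touch routines above can be fleshed out as a small runnable C sketch. The fixed-size heap, the index-based "pointers" and all of the names here are our inventions, and the sketch assumes that the arguments to cons are themselves reachable from the root whenever a collection is triggered (otherwise they would be swept):

```c
#include <assert.h>

/* A minimal mark-sweep sketch over a tiny heap of LISP-like cells.
   Indices serve as pointers; NIL is index -1.  All names are ours. */

#define HEAP_SIZE 8
#define NIL (-1)

typedef struct {
    int tag;       /* 1 = pair of pointers, 0 = atom */
    int mark;
    int car, cdr;
} Cell;

static Cell heap[HEAP_SIZE];
static int free_list = NIL;

static void init_heap(void) {
    for (int i = 0; i < HEAP_SIZE; i++) {
        heap[i].tag = 0;
        heap[i].mark = 0;
        heap[i].car = NIL;
        heap[i].cdr = (i + 1 < HEAP_SIZE) ? i + 1 : NIL;
    }
    free_list = 0;
}

static void touch(int item) {      /* the recursive marking phase */
    if (item == NIL || heap[item].mark) return;
    heap[item].mark = 1;
    if (heap[item].tag == 1) {
        touch(heap[item].car);
        touch(heap[item].cdr);
    }
}

static void sweep(void) {          /* build a new free list */
    free_list = NIL;
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (heap[i].mark) {
            heap[i].mark = 0;      /* unmark for the next cycle */
        } else {
            heap[i].tag = 0;
            heap[i].cdr = free_list;
            free_list = i;
        }
    }
}

/* cons with collection on demand; the root is passed explicitly,
   and a and b are assumed to be reachable from it (or NIL). */
static int cons(int a, int b, int root) {
    if (free_list == NIL) {
        touch(root);               /* Mark */
        sweep();                   /* Sweep */
        if (free_list == NIL) return NIL;  /* the heap is full! */
    }
    int t = free_list;
    free_list = heap[t].cdr;
    heap[t].tag = 1;
    heap[t].car = a;
    heap[t].cdr = b;
    return t;
}
```

A real LISP system would find the root in the stack and run-time environment rather than taking it as a parameter, and would signal the heap-full case with an error rather than a NIL result.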

The Mark procedure may be viewed formally in two interesting ways. First, it may be viewed as traversing a spanning tree of the connected part of the directed graph representing the data structure reachable from the root.

Second, if the pointers in the heap are thought of as defining a relation called points-to, and if the transitive closure of this relation is defined as points-to*, then the algorithm computes the set of items related to the root by the points-to* relation.

The mark procedure, as shown above, requires a stack that could be as big as the heap. The worst case for this code occurs when traversing a data structure where all the items in the heap are arranged in a single linear list. The need for recursive code and a stack can be completely eliminated by a trick known as pointer rotation.

To see how pointer rotation works, note that when passing down to an item, the process follows a pointer in the forward direction; when returning to an item along either the car or cdr pointer, the same pointer is followed in the reverse direction. Each pointer is therefore followed exactly twice (once in the forward direction, and once in the reverse direction), and each item in the structure is visited exactly three times: once on entry from its parent, and once on return from each of its two children.

The central idea of pointer rotation is to reverse the direction of each link in the data structure each time that link is followed! Since the links are followed exactly twice, the net effect of this is to make no changes. When doing this, no item ever needs to have more than two outgoing pointers. In the initial and final states, each item has pointers to two children; when the car subtree is being visited, the item has a pointer to its parent and its cdr child, and when the cdr child is being visited, the item has pointers to its parent and car child. Since each item contains exactly two pointers, this is exactly sufficient.

Each time the collection algorithm passes through an item on the three successive visits it makes to that item, we say that it rotates the pointers, and the net result, after the third visit, is to restore the structure to its initial state. Pointer rotation can be generalized to items that contain n pointers; such an item will be visited n+1 times during the marking phase, and each pointer will still be followed exactly twice. Nonetheless, rotation becomes a much less important tool in these systems because larger items with more pointers generally pose less demand on the stack, so recursion or more optimal stack-based traversal schemes are far more likely to be sufficient.
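Here is a sketch in C of stackless marking by link reversal, a close cousin of the rotation scheme described above: each link is turned around to point back at the parent on the way down and restored on the way back up, so every pointer is followed exactly twice and no stack is needed. The explicit per-node state counter and all names are ours; the classic Deutsch-Schorr-Waite encoding packs the state into fewer bits.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int mark;                /* set once the node has been reached */
    int state;               /* 0 = unentered, 1 = in car, 2 = in cdr */
    struct Node *car, *cdr;  /* NULL stands in for an atom here */
} Node;

void reverse_mark(Node *root) {
    Node *parent = NULL, *cur = root;
    while (cur != NULL) {
        if (cur->state == 0) {            /* first visit: try the car */
            cur->mark = 1;
            cur->state = 1;
            Node *child = cur->car;
            if (child != NULL && !child->mark) {
                cur->car = parent;        /* reverse the link */
                parent = cur;
                cur = child;
            }
        } else if (cur->state == 1) {     /* second visit: try the cdr */
            cur->state = 2;
            Node *child = cur->cdr;
            if (child != NULL && !child->mark) {
                cur->cdr = parent;        /* reverse the link */
                parent = cur;
                cur = child;
            }
        } else {                          /* third visit: retreat */
            if (parent == NULL)
                return;                   /* back at the root: done */
            if (parent->state == 1) {     /* we came down via the car */
                Node *grand = parent->car;
                parent->car = cur;        /* restore the link */
                cur = parent;
                parent = grand;
            } else {                      /* we came down via the cdr */
                Node *grand = parent->cdr;
                parent->cdr = cur;
                cur = parent;
                parent = grand;
            }
        }
    }
}
```

Note that the state fields must be reset to zero before the next marking cycle; a real collector folds this into the sweep.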

Generalizing Garbage Collection

Simula 67 (a language developed at about the same time as Pascal, PL/I and C) was the first object oriented programming language. Simula 67 inspired the object oriented features in C++, and it is not improper to say that Java represents a purification of C++, removing undesirable features of C and returning to the original philosophy of Simula 67 while retaining a surface syntax in the style of C. Java has re-introduced many of the features of Simula 67 that were eliminated from C++, including garbage collection.

Simula 67 and Java store objects (or other record types) in a heap that is quite similar to the heap used for dynamic allocation in Pascal or C. The following example shows one such record:

             ________ 
            |  tag   |      The tag identifies
            |________|      the object's class.
            |  data  |       
            |________|      Pointers and
            |   o-------->  Data are mixed in
            |________|      the body of the
        <-------o    |      record, in a way
            |________|      determined by the
            |  data  |      object's class.
            |________|
         The size of the record
         is also given by the class

Simula 67 (first fully specified and implemented in 1968) is a strongly typed programming language. It was developed for discrete- event simulation (hence the name), and it was the language where the term class, the concept of a class hierarchy, and the idea of inheritance were first introduced. What matters to this discussion is that Simula 67 was developed with a garbage collecting heap manager, so that users never have to explicitly deallocate anything.

The first Simula 67 garbage collectors used a mark-sweep algorithm, similar to that used for LISP. The tag word on each item in the heap included a bit that could be used for marking, and it included a pointer to a class description record from which the marking phase could determine the record size and the locations of all the pointers in the record.

Thus, once the marking phase is done, a boundary-tag heap manager already has the data structures required to allow a sweep of the heap looking for unmarked records to merge or sweep into the free list.

Measurements of the computational cost of garbage collection indicate that, for typical large heap-based applications, the CPU time consumed by garbage collection is typically less than the CPU time consumed by the various extra bits of code that must be added to applications to determine when it is safe to `manually' free items on the heap. This has been demonstrated repeatedly since the early 1970's, but it has taken years to get this message through into enough of the programming community to allow systems based on garbage collection to be widely used.

Unfortunately, with the mark-sweep algorithm, this CPU time is bunched into spasms of garbage collection that bring the application to a halt, so mark-sweep garbage collection can be quite uncomfortable for interactive applications and it is totally inapplicable to hard real-time applications. These problems have been extensively addressed by recent research!

Garbage Collection Without Tags

A garbage collector supporting a language like C or C++ must operate without tags, since C allows the user to create pointers to anything, including pointers to items in the interior of other objects such as records or arrays, and these languages are weakly typed, allowing pointers to be freely converted to integers and back again.

Garbage collectors that solve this problem have been built! These treat every item in every record on the heap as if it was a potential pointer, even if it is a floating point number, an integer, or a character string. The cost of this is that some of these "pointers" were never intended to be interpreted as pointers, and they may accidentally point to items on the heap that are in fact garbage; as a result, these accidentally referenced items will not be collected.

Despite this, garbage collectors for C and C++ are available, and they work well for all but the most perverse C code (for example, code that encrypts pointers so they no longer appear to point to their actual destinations).
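The "treat every word as a potential pointer" test reduces to a simple range check, sketched below. A real conservative collector, such as the Boehm collector for C and C++, also checks alignment and whether the address falls on a valid object; the names here are our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Is this word's value inside the heap's address range?  If so, a
   conservative collector must assume it might be a pointer. */
static int looks_like_pointer(uintptr_t word,
                              uintptr_t heap_lo, uintptr_t heap_hi) {
    return word >= heap_lo && word < heap_hi;
}

/* Count how many words in region[0..n) might point into the heap;
   a collector would mark the referenced items instead of counting. */
static size_t scan_region(const uintptr_t *region, size_t n,
                          uintptr_t heap_lo, uintptr_t heap_hi) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (looks_like_pointer(region[i], heap_lo, heap_hi))
            hits++;
    return hits;
}
```

An integer or floating point value that happens to land in the heap's address range will pass this test, which is exactly how the accidentally-retained garbage described above arises.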

Compacting the Heap

Heap compaction helps improve the locality of reference on virtual-memory based systems, and when the item size in the heap is variable, it eliminates problems with external fragmentation. Compaction requires finding all of the pointers to an item and updating them to point to the item after it has been moved. Since garbage collection mechanisms rely on being able to locate all pointers, compaction is a natural feature to add to a garbage collector.

The first compacting garbage collectors were based on a split heap. At any instant, half of the virtual address space assigned to the heap was available for allocation, while the other half was idle, as illustrated below:

      ____________
     |            | \
     |  Data and  |  |
     |   Garbage  |  |
     |____________|  |  Working
     |            |  |- Half
     |    Free    |  |
     |            |  |
     |            |  |
     |____________| /
     |            | \
     |            |  |
     |            |  |
     |            |  |  Idle
     |            |  |- Half
     |            |  |
     |            |  |
     |            |  |
     |____________| /

When the garbage collector traverses the heap in the marking phase, it copies each item it finds into the formerly idle half of the heap, replacing each item in the old half with a pointer to the new copy; this pointer is sometimes referred to as a forwarding address. The sweep phase is replaced with a sweep through the copies of all items in the heap, fixing all of the pointers that still point into the old half by replacing them with the forwarding addresses they reference. Once the copy has been updated, execution continues using the former idle half of the heap as the working half, and ignoring the formerly idle half until the next garbage collection cycle.
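One standard way to organize the copy-and-forward scheme just described is a Cheney-style breadth-first scan, sketched below in C. Indices stand in for pointers, both halves are fixed arrays, and all sizes and names are illustrative; a real collector would also swap the roles of the two halves after each cycle.

```c
#include <assert.h>

#define HALF 16
#define NIL  (-1)

typedef struct {
    int forwarded;   /* 1 once the cell has been copied */
    int car, cdr;    /* child indices, or NIL; car doubles as the
                        forwarding address after copying */
} Cell;

static Cell from_space[HALF], to_space[HALF];
static int next_free = 0;    /* bump allocator in the new half */

/* Move one cell to the new half, leaving a forwarding address. */
static int copy_cell(int i) {
    if (i == NIL) return NIL;
    if (from_space[i].forwarded)
        return from_space[i].car;        /* already moved: follow it */
    int n = next_free++;
    to_space[n] = from_space[i];
    from_space[i].forwarded = 1;
    from_space[i].car = n;               /* the forwarding address */
    return n;
}

/* Copy everything reachable from root; return the new root index. */
static int collect(int root) {
    next_free = 0;
    int new_root = copy_cell(root);
    for (int scan = 0; scan < next_free; scan++) {
        /* fix pointers that still point into the old half */
        to_space[scan].car = copy_cell(to_space[scan].car);
        to_space[scan].cdr = copy_cell(to_space[scan].cdr);
    }
    return new_root;
}
```

Note that the scan pointer chasing the allocation pointer replaces both the recursion of the marking phase and the separate pointer-fixing sweep described above, and the forwarding addresses make cyclic structures safe to copy.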

In this scheme, the free list is always contiguous, not scattered through the heap, so free space can be allocated very quickly and without the need for complex boundary tag structures. This, in turn, minimizes internal fragmentation. With essentially no fragmentation, this looks very good if you can forget that the memory requirements of the system had to be doubled in order to allow this benefit!

Compacting garbage collectors work only in environments where pointers and data can be definitively distinguished -- that is, they don't work in C or other weakly typed languages where we can only guess which items are pointers.

One consequence of copying the heap from one range of addresses to another, in LISP systems, is that lists can be compacted. It is easy to arrange for the collector to follow successor fields first when it traverses the data structure, with the result that the successive elements of a list are copied into successive memory locations. In a virtual memory environment, this can make a huge performance improvement by greatly increasing the locality of the data structure of each list, and therefore, greatly improving the locality of list traversal operations. Because lists in LISP are chained through their CDR fields, with the CAR fields pointing to the individual list elements, this type of compaction (particularly when taken to an extreme) is referred to as CDR-coding.

Hiding Delays Caused By Garbage Collection

The break in normal computation consumed by a mark-sweep garbage collector is disruptive. Users don't like it when a system grinds to a halt for garbage collection. The larger the address space, the less frequent the need for garbage collection, but also, the more disruptive this event becomes. The result has been a search for ways to disperse the computation involved in garbage collection so that it is mixed with normal computation in some way.

One idea that has been tried for solving this problem is the tenuring garbage collector. The idea is to divide the heap into two regions, a small heap area, where a copying garbage collector copies things back and forth fairly frequently, and a much larger area where items are put once they have survived enough copying cycles. To do this, each item needs an associated count of the number of cycles it has survived; the threshold is typically a small number like 5 or 7.

New items are allocated in the small heap -- called the non-tenured heap. Most data structures are assumed to be temporary, and in LISP and Java programs, this assumption is generally safe. Temporary items will be collected soon after they are created by this scheme, but some items are long lived. It is the accumulation of long-lived items that destroys the efficiency of copy based garbage collection, so in a tenuring garbage collector, these are moved to the larger tenured heap where, if they become garbage, they can rot until it is convenient to bring the system to a much longer halt with a global garbage collection.
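The survival count and promotion rule might be sketched as follows; TENURE_THRESHOLD and the field names are our inventions, with the threshold set in the 5-to-7 range the text suggests.

```c
#include <assert.h>

#define TENURE_THRESHOLD 5   /* illustrative; typically around 5 to 7 */

typedef struct {
    int survivals;   /* copying cycles this item has survived */
    int tenured;     /* 1 once moved to the tenured heap */
} ObjHeader;

/* Called for each surviving item during a collection of the
   non-tenured heap; a real collector would also physically move
   the item into the tenured region when it promotes it. */
static void on_survive(ObjHeader *h) {
    if (h->tenured) return;          /* already promoted */
    if (++h->survivals >= TENURE_THRESHOLD)
        h->tenured = 1;              /* promote to the tenured heap */
}
```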

If the sizes of the heaps are adjusted appropriately, it should be possible to arrange things so that the expensive garbage collection occurs at night or some other time when people won't complain.

A generalization of tenuring is to use a multi-generational garbage collector, where items are promoted in a step-wise fashion through the heap. If there are on the order of 10 generations (anywhere from 8 to 16 generations seems to work well), it is quite effective to run a generational garbage collector with promotion from one generation to the next occurring automatically every time a copy cycle is performed, so there is no need to count how many cycles an object has survived before promoting it.

In a generational garbage collector, if the top generation is garbage collected every 5 seconds, the next generation might be collected every 25 seconds, and the next every 125 seconds (the delay between collection cycles growing exponentially). The cost of a top level collection might be only 1 second, while the cost of a second level collection is 2 seconds, and the cost of a third level collection 3 seconds.

With only a bit of extra work, such systems can easily be made to meet soft real-time demands. The idea is to do garbage collection on what is called an opportunistic basis. Instead of doing garbage collection only when the free-list in a generation of the heap is exhausted, an opportunistic garbage collector schedules garbage collection cycles when it finds free time, for example, when the application blocks awaiting input from the user, or when the application does a timed pause.

Reference counts

It is possible to detect some, but not all, garbage if a count is maintained, in each object, of the number of pointers to that object. This is called the reference count for that object. Where the user might have written the simple code:

        a = b; /* pointer assignment */
A system that uses reference counts must rewrite this as:
        if (a != NULL) {
                a->refcount = a->refcount - 1;
                if (a->refcount == 0) free( a );
        }
        a = b; /* pointer assignment */
        if (a != NULL) {
                a->refcount = a->refcount + 1;
        }
If we bracket every single assignment to a pointer variable with this code, the price is certainly high, but an optimizing compiler can eliminate many of the checks for NULL pointers, and even better optimizing compilers can discover that an object is temporary by noticing that the only pointer to the object is a local variable where that variable is never assigned to any longer-lived variable; in this case, reference count maintenance code for that object can be omitted and it can be deallocated on exit from the block where it is allocated.
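In practice the bracketing above is usually packaged in a helper function. Here is a hedged C sketch (Obj, rc_assign and the live counter are our illustrative names) that retains the new referent before releasing the old one, so that assigning a pointer to itself is safe:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Obj {
    int refcount;
    /* ... payload ... */
} Obj;

static int live = 0;                 /* for the demo: count live objects */

static Obj *rc_new(void) {
    Obj *p = malloc(sizeof *p);
    p->refcount = 1;                 /* the creating reference */
    live++;
    return p;
}

/* Replace the pointer in *slot with b, maintaining both counts. */
static void rc_assign(Obj **slot, Obj *b) {
    if (b != NULL)
        b->refcount++;               /* retain first: safe if *slot == b */
    Obj *old = *slot;
    *slot = b;
    if (old != NULL && --old->refcount == 0) {
        free(old);                   /* the last reference is gone */
        live--;
    }
}
```

The order of operations matters: decrementing first, as in the inline code above, would free an object whose only reference is being reassigned to itself.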

Unfortunately, reference counting cannot reclaim all garbage. So long as all data structures are trees, it does work, but when cyclic data structures are created, the reference counts of the items in the cycle will not go to zero when the last outside pointer to that cycle is deleted. Mark-sweep collection has no problems with such cycles.

The Unix file system manages a reference count in each I-node that tracks the number of directory entries that reference that I-node. This is why Unix does not allow arbitrary cycles in the directory structure. Unix does allow two specific classes of cycles in the directory structure, the "." self reference in each directory, and the ".." reference to the parent of each directory. Because of these two cycles, Unix requires special code, in the form of the rmdir() kernel call, and it forbids using the link() and unlink() kernel calls on directories.

Parallel garbage collection

In multiprocessor systems, it is appealing to try to dedicate one processor to collecting garbage while the other processors perform computation, with a basic algorithm such as the following:

                                 |
      GARBAGE COLLECTOR          |       USER
                                 |
    Repeat                       |  Repeat
      Mark root                  |    Allocate and mark A
      Repeat                     |    Link into data
        For each item            |    Make garbage
          if marked,             |  Forever
            mark unmarked items  |
            it points to with M  |
        end for                  |
      until for loop             |
        marks nothing new        |
      sweep unmarked             |
        items into free list,    |
        unmark all M marks and   | 
        change all A marks to M. |

The user in such a setting is frequently described as a heap mutator, since from the point of view of the garbage collection algorithm, the user does nothing other than allocating memory, changing pointers around and making garbage.

The parallel algorithm outlined roughly above will try to collect garbage as the mutator creates it. During any cycle of the collector process, new items will not be collected because they are already marked, and garbage created during that phase should not be collected because some new item might have been arranged to point to it (this requires two kinds of marks, A and M above, plus unmarked). The result is a delay between garbage creation and collection of, on the average, one collector cycle.

The Cambridge CAP file system (described in the book The Cambridge CAP Computer and its Operating System, by Wilkes and Needham) uses a parallel garbage collection process to reclaim inaccessible files on disk. This eliminated the requirement, which most UNIX users take for granted, that the directory structure be tree shaped. The CAP file garbage collector was a parallel process that only did disk I/O when no user process needed the disk, and it worked very well. In order to achieve this level of performance, the CAP system also used reference counts, so that many garbage files could be reclaimed without waiting for a garbage collection cycle to complete.

Garbage collection also plays a role in some operating system kernels for central memory management, particularly in some capability systems, such as the IBM AS/400.