22C:116, Lecture 19, Fall 1999

Douglas W. Jones
University of Iowa Department of Computer Science

  1. A Brief Return to Memory Management

    While it is fairly easy to implement New and Dispose (in Pascal) or Malloc and Free (in C), it is hard to get programs to call Dispose or Free at the right time.

    Failure to deallocate storage is not usually a problem in transient code -- you can simply deallocat the user's entire heap when the program terminates. In an operating system that is expected to run forever, though, or in other applications that are to run for a long time, this becomes a big problem.

    The problem is also big in languages where the heap is persistant from one session to the next. Many LISP and all Smalltalk implementations work this way. In list processing languages such as LISP, heap operations are so frequent that failure to deallocate storage will terminate most programs in a short time, even when they run in huge virtual memories.

    The failure to deallocate no-longer-accessible objects in a system is frequently described as a memory leak, since it appears, to the programmer, as if the spare storage capacity in the heap has been leaking out.

  2. Garbage Collection

    Systems that automatically reclaim unused memory are said to rely on garbage collection, a term coined in the context of the LISP programming language, developed under the direction of John McCarthy in the early 1960's.

    The LISP language is oriented towards list processing, where most operands are lists, and the terminal elements of lists are, occasionally, such things as integers or other atomic objects (the format and structure of non-list items isn't relevant to this discussion, and we will ignore later additions to LISP such as arrays). All items in the classic LISP heap are records with one of the following two formats:

         _ _________ _ _________
        |1|_______o_|_|_______o_|
       Tag    Car | Mark  Cdr |______
                  |__________________ pointers
         _ _____________________
        |0|_____________________|
       Tag          Data
    

    On the IBM 709 (and later, on the DEC PDP-10) the records in LISP's heap were packed one per 36 bit word, with the tag bit being the sign bit (to make it very easy to test to see whether the word contained an atom or a pair of pointers). The names CAR and CDR come from the IBM 709, where each word held two fields for purposes of indexed addressing, the Address and Decrement fields; CAR and CDR were natural names for the macros to follow the pointers from these fields (Copy Address to Register and Copy Decrement to Register, or something like that). See the proceedings of the ACM History of Programming Languages conference (Sammet edited this) book form) if you're really interested in this history.

    LISP programs operate primarily by composing lists out of pieces of other lists, and it is possible to create circularly linked lists or other non-tree-like structures. There are no primitives for deallocation. Instead, the language defines all items as remaining allocated until they are no longer referencable. Here is a picture of a very small LISP heap:

            Root _____     _____     _____
                |_____|-->|_____|   |__X__|
                   ^         |
                 __|__     __v__     _____
                |_____|<--|_____|-->|_____|
                   |
                 __v__     _____     _____
                |_____|<--|__X__|<--|__X__|
    

    In any LISP environment, some of the nodes in the heap are implicitly accessible. These are the roots of the data structure. In a minimal implementation, there is one root, the stack pointer, and the stack itself is embedded in the heap.

    At any point in the computation, some items in the heap will be accessible from the root while others will not be accessible (those marked with X above).

    In normal operation, most LISP interpreters allocate new items from a free list. When the free list is exhausted, the garbage collector must find all items which are inaccessible from the root and incorporate them into a new free-list so that execution can continue.

    The allocate operation in LISP is cons(a,b); this constructs a new list node containina, pointers to a and b in its two pointer fields.

  3. The Mark-Sweep Garbage Collection Algorithm

    The oldest garbage collection algorithm, the one originally developed for LISP, is known as the mark-sweep algorithm. This algorithm comes into play when the free list is found to be empty; it begins by marking every reachable node in the data structure, and then it sweeps the unreachable nodes into a new free list, as outlined below:

       cons(a,b):
          if free_list = NIL then
             Mark
             Sweep
             if free_list = NIL then
                -- the heap is full!
             endif
          endif
    
          -- code to allocate from free list
          temp = free_list
          free_list = free_list.cdr
          temp.car = a
          temp.cdr = b
          temp.tag = 1
          return temp
    
    There must be one bit in each heap record available for marking that record. To mark a node, this bit is set. The marking algorithm recursively traverses the part of the heap reachable from the root, setting the mark bit on every node it finds, and using the mark bit to avoid following cycles in the data structure. The straightforware recursive version of this code operates as follows:
      Mark:
        Touch( Root )
        Return
    
      Touch( Node ):
        if Node.mark = 1, return.
    
        -- Node has not previously been touched
        Node.mark = 1
        if Node.tag = 0, return.
    
        -- Node contains pointers
        Touch( Node.car )
        Touch( Node.cdr )
        Return
    
    The Mark procedure may be viewed formally in two interesting ways. First, it may be viewed as traversing a spanning tree of the connected part of the directed graph representing the data structure reachable from the root.

    Second, if the pointers in the heap are thought of as defining a relation called "points-to", and if the transitive closure of this relation is defined as "points-to*", then the algorithm computes the set of nodes related to the root by the "points-to*" relation.

    The mark procedure, as shown above, requires a stack that could be as big as the heap (in the case that all items in the heap are linked into a singly linked list, with the other pointers set to NIL or pointing at random). The need for a heap can be eliminated by a trick known as pointer rotation.

    Note that when passing down to a node, the process follows a pointer in the forward direction, when returning to a node along either the car or cdr field, the same pointer is followed in the reverse direction, and each pointer is followed exactly two times (once in the forward direction, and once in the reverse direction. The trick, then, is to reverse the direction of each link each time that link is followed! Since the links are followed exactly twice, the net effect of this is to make no changes. When doing this, no node ever needs to have more than two outgoing pointers -- either pointers to the Car and Cdr, or to the Cdr and Parent (the Car and Parent having been reversed), or to the Parent and Car (the Cdr having been reversed, and the Car reversed twice). Thus, each time the collector passes through a node on its three visits, all it does is rotate the pointers in that node.

    Rotation is of no value except in the special and useful case of LISP, so we'll ignore it form here on.

  4. Generalizing Garbage Collection

    Simula 67 (a language developed at about the same time as Pascal, PL/I and C) was the first object oriented programming language. Simula 67 inspired the object oriented features in C++, and it is not improper to say that Java represents a purification of C++, removing undesirable features of C and returning to the original philosophy of Simula 67 while retaining a surface syntax in the style of C. Java has re-introduced many of the features of Simula 67 that were eliminated from C++, including garbage collection. Simula 67 and Java store objects (or other record types) in a heap that is quite similar to the heap used for dynamic allocation in Pascal or C. The following example shows one such record:

                        ________       The tag tells
                       |  tag   |      about the type
                       |________|      or class
                       |  data  |       
      The record       |________|      Pointers and
      size is set      |   o-------->  Data are mixed in
      by the type      |________|      the body of the
                   <-------o    |      record, in a way
                       |________|      determined by the
                       |  data  |      type or class
                       |________|
    
    Simula 67 (first fully specified and implemented in 1968) is a strongly typed programming language. It was developed for discrete- event simulation, and it was the language where the term class, the concept of a class hierarchy, and the idea of inheritance were first introduced. What matters to this discussion is that Simula 67 was developed with a garbage collecting heap manager, so that users never have to explicitly deallocate anything.

    The first Simula 67 garbage collectors used a mark-sweep algorithm, similar to that used for LISP. The tag word on each item in the heap included a bit that could be used for marking, and it included information from which the marking phase could determine the record size and the locations of all the pointers in the record.

    One way to record the locations of pointers in a record is to set aside a bit-vector in the header of each record that has one bit per word in the record. If the bit is zero, the word is a data word. If the bit is one, the word is a pointer.

    Another way to record the locations of pointers in a record is to have a pointer in the head of each record that points to a type description. This description can indicate which fields are pointers and also the types of the other fields, as might be needed to print, dump, or debug the code.

    Heap managers for variable sized objects on the heap, such as are used for Pascal or C without any garbage collector, usually have a tag on each record indicating whether it is allocated or free, and they include a pointer to the adjacent items on the heap. Thus, by doing a bit of pointer arithmetic, the heap manager can compute the size of any record on the heap, it can merge adjacent free blocks, and it can split a vacant block into two blocks so that a block of the right size can be allocated.

    Thus, once the marking phase is done, it is easy to search the heap for unmarked blocks by traversing the list of blocks that make up the heap and marking all unmarked blocks as free (merging adjacent free blocks in the process and reseting the mark bits).

    In record-based languages such as Pascal, C and Simula 67, records are usually moderately large, so the problem of needing a huge stack to recursively mark the records is less severe than it was in the LISP environment where each heap entry was one word.

    Measurements of the computational cost of garbage collection indicate that, for typical large heap-based applications, the CPU time consumed by garbage collection is typically less than the CPU time consumed by the various extra bits of code that must be added to applications to determine when it is safe to `manually' free items on the heap.

    Unfortunately, with the mark-sweep algorithm, this CPU time is bunched into spasms of garbage collection that bring the application to a halt, so mark-sweep garbage collection can be quite uncomfortable for interactive applications and it is totally inapplicable to hard real-time applications. These problems have been extensively addressed by recent research!

  5. Garbage Collection Without Tags

    A garbage collector supporting a language like C must operate without tags, since C allows the user to create pointers to anything, including pointers to objects in the interior of records or arrays, and C allows pointers to be freely converted to integers and back again.

    Garbage collectors that solve this problem have been built! These treat every item in every heap record as a potential pointer, even if it is a floating point number, an integer, or a character string. The cost of this is that some of these "pointers" may accidentally point to items on the heap that are in fact garbage, and as a result, these will not be collected.

    Despite this, garbage collectors for C are available, and they work well for all but the most perverse C code (for example, code that encrypts pointers so they no-longer appear to point to their actual destinations.

  6. Compacting the Heap

    Heap compaction helps on LISP systems that use virtual memory (it improves locality), and it eliminates external fragmentation on variable-record-size systems.

    The first compacting garbage collectors were based on a split heap. At any instant, half of the virtual address space assigned to the heap was available for allocation, while the other half was idle, as illustrated below:

          ____________
         |            | \
         |  Data and  |  |
         |   Garbage  |  |
         |____________|  |  Working
         |            |  |- Half
         |    Free    |  |
         |            |  |
         |            |  |
         |____________| /
         |            | \
         |            |  |
         |            |  |
         |            |  |  Idle
         |            |  |- Half
         |            |  |
         |            |  |
         |            |  |
         |____________| /
    
    When the garbage collector operates, it copies the reachable data from the working half of memory to the idle half. Since the free list is always a contiguous block of memory, free space management is simple and there is no need for lists of free nodes and no possibility of internal fragmentation. Once a compact copy of the reachable data is made into the idle half of memory, execution resumes with the roles of the working and idle regions reversed.

    When data is copied from one half of the heap to the other, all the pointers are invalidated. To avoid this, each time an item is copied, a forwarding address is left in the carcass of the item in the other half of the memory; before execution is resumed, an extra sweep is made through the compacted data on the heap to update every pointer with the forwarding address found in the object that the pointer references.

    This collector still has a mark-sweep character! The marking phase now makes a copy of each node as it marks it (and the mark bit being set in a node indicates that the node has been copied and that the body of the node contains the forwarding address). The sweep phase sweeps through the copies, following each pointer and updating the pointers using the forwarding addresses.

    This works only in environments where pointers and data can be definitively distinguished -- that is, it doesn't work in C or other weakly typed languages, where its use could cause integers, character strings, or floating point numbers to be changed when the garbage collector thinks they might be pointers.

    One consequence of copying the heap from one range of addresses to another, in LISP systems, is that lists can be compacted. It is easy to arrange it so that the collector follows successor fields first when it traverses the data structure, and the result of this is that the successive elements of the list will be copied into successive memory locations. In a virtual memory environment, this can make a huge performance improvement.

  7. Hiding Delays Caused By Garbage Collection

    The mark-sweep pattern is disruptive. Users don't like it when a system grinds to a halt for garbage collection. The larger the address space, the more disruptive this event becomes. The result has been a search for ways to disperse the computation involved in garbage collection so that it is mixed with normal computation in some way.

    One idea that has been tried for solving this problem is the Tenuring garbage collector. The idea is to divide the heap into two regions, a small heap area, where a copying garbage collector copies things back and forth fairly frequently, and a much larger area where items are put if they have survived many copying cycles. To do this, each item needs to have an associated count of the number of cycles it has survived (this is typically a small number like 5 or 7).

    New items are allocated in the small heap -- called the non-tenured heap. Most data structures are temporary, so they will be collected soon after they are created, but some items are long lived. It is the accumulation of long-lived items that destroys the efficiency of copy based garbage collection, so these are moved to the larger tenured heap where, if they become garbage, they can rot until it is convenient to shut down the system with a global garbage collection.

    If the sizes of the heaps are adjusted appropriately, it should be possible to arrange things so that the expensive garbage collection occurs at night or some other time when people won't complain.

    A generalization of tenuring is to use a multi-generational scheme, where items are promoted in a step-wise fashion through the heap. If there are on the order of 10 generations (8 to 16), it is quite effective to run a generational garbage collector with promotion from one generation to the next occuring automatically every time a copy cycle is performed, so there is no need to count how many cycles an object has survived before promoting it.

    In a generational garbage collector, if the top generation is garbage collected every 5 seconds, the next generation might be collected every 25 seconds, and the next every 125 seconds (the delay between collection cycles growing exponentially). The cost of a top level collection might only be 1 second, while the cost of a second level collection is 2 seconds, and the cost of a third level collection is only 3 seconds. With only a bit of extra work, such systems can easily be made to meet soft real-time demands.

  8. Parallel garbage collection

    In multiprocessor systems, it is appealing to try to dedicate one processor to collecting garbage while the other processes perform computation, with a basic algorithm such as the following:

                              |
          GARBAGE COLLECTOR   |       USER
                              |
        Repeat                |  Repeat
          Mark root           |    Allocate
          Repeat              |    Link into data
            For each node     |    Make garbage
              if marked,      |  Forever
                mark what it  |
                points to.    |
            end for           |
          until for loop      |
            marks nothing new |
          sweep unmarked      |
            items into free   |
            list, mark them,  |
            and unmark all    |
            previously marked |
            items.            |
    
    The user in such a setting is frequently viewed as a storage mutator from the point of view of the garbage collection algorithm.

    The parallel algorithm outlined roughly above will try to collect garbage as the mutator creates it. During any cycle of the collector process, new nodes will not be collected because they are already marked, and garbage created during that phase should not be collected because some new node might have been arranged to point to it (this requires three colors of marks). The result is a delay between garbage creation and collection of, on the average, one collector cycle.

    The Cambridge CAP file system (described in the book The CAP Computer and it's Operating System, by Wilkes) uses a parallel garbage collection process to reclaim inaccessible files on disk. This eliminated the requirement that most UNIX users take for granted that the directory structure be tree shaped. The CAP file garbage collector was a parallel process that only did disk I/O when no user process needed the disk, and it worked very well.

    Garbage collection also plays a role in some operating system kernels for central memory management, particularly in some capability systems, such as the IBM AS 400.