9. More Page Replacement

Part of the 22C:116 Lecture Notes for Fall 2002
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Page Tables and Frame Tables

Note that many implementations of paged virtual memory rely on two distinct data structures, the page table and the frame table.

The page table describes the state of each page in the virtual address space. There is one entry in the page table for every addressable page in the virtual address space. This table frequently grows very large for large address spaces.

The page table is indexed by the virtual address of the page, and each page-table entry has a field that describes where that page is currently stored -- for pages currently not in main memory, this would typically be the disk address where the page is stored; for pages in main memory, this would be the page frame holding the page.

The frame table is indexed by the physical address of the frame, and the entries describe the state of each page frame in main memory. For those frames currently in use, it describes what pages are in them. There is one entry for each page frame in the physical address space. At any instant, the frame table can typically be derived from the page table, but this derivation is computationally expensive.

The minimum frame table is just an inverted index, holding, for each page frame, the index of the page table entry for the page currently stored in that frame. On some systems, more information is stored in the frame table; such information as whether the frame has been recently used and whether its contents have been changed by the CPU is as useful stored in the frame table as it would be in the page table.
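The two tables described above can be sketched in C. This is a minimal sketch, not any particular system's layout: the field widths, the table sizes, and the rebuild_frame_table helper (which performs the expensive page-table-to-frame-table derivation mentioned above) are all assumptions chosen for illustration.

```c
#include <stdint.h>

#define NUM_PAGES  1024   /* pages in the virtual address space (assumed) */
#define NUM_FRAMES 64     /* page frames in physical memory (assumed)     */

/* One entry per virtual page, indexed by page number. */
struct page_entry {
    unsigned valid  : 1;  /* page is currently in a main-memory frame        */
    unsigned rights : 3;  /* access rights (read/write/execute)              */
    uint32_t where;       /* frame number if valid, else a disk address      */
};

/* One entry per page frame, indexed by frame number. */
struct frame_entry {
    unsigned used : 1;    /* frame currently holds some page                 */
    unsigned mark : 1;    /* recently-used / modified mark, if kept here     */
    uint32_t page;        /* index of the page table entry stored here       */
};

struct page_entry  page_table[NUM_PAGES];
struct frame_entry frame_table[NUM_FRAMES];

/* Rebuild the frame table from the page table -- the computationally
   expensive derivation noted in the text: a full scan of the page table. */
void rebuild_frame_table(void) {
    for (int f = 0; f < NUM_FRAMES; f++)
        frame_table[f].used = 0;
    for (uint32_t p = 0; p < NUM_PAGES; p++) {
        if (page_table[p].valid) {
            frame_table[page_table[p].where].used = 1;
            frame_table[page_table[p].where].page = p;
        }
    }
}
```

Note that the scan visits every page table entry, which is why systems keep the frame table as a separate, incrementally maintained structure rather than deriving it on demand.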

                Page table                Frame table
Page Number     __________    Frame       ___________
     |         |_|________|   number     |_|_________|
     |         |_|________|      |       |_|_________|
     |         |_|________|       ------>|_|_________|
      -------->|_|________|              |_|_________|
               |_|________|              |_|_________|
               |_|________|               |     |
               |_|________|      <--------       ------->
                |     |            mark        page number
     <----------       -------->
    rights               frame number

Address translation problems

Recall that the easiest-to-understand virtual-to-physical address translation hardware holds the entire page table in the memory management unit:

                  Word in page
   Virt addr  |----------------------->| Phys Addr
   ---------->|  Page   ______  Frame  |--------->
              |---     |      |   ---->|
                  |    | Page |  |
                  |    |Table |  |
                   --->|      |  |
        address        |______|  |
        invalid         |  |     |
      <-----------------    -----
For large page tables, this is not implementable. For example, on the Intel 80x86 (for x > 2), the virtual address is 32 bits, of which 12 bits specify byte-in-page. Thus, the page table must contain a million entries. The entries themselves turn out to be 32 bits each, so this approach to implementing the memory management unit would require that the memory management unit contain 4 megabytes of special purpose memory for the page table, and a context switch from one virtual address space to another would require loading and storing this entire special memory area!
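The arithmetic behind the 4-megabyte figure can be checked directly. The constants below are the ones given in the text (32-bit virtual address, 12-bit byte-in-page field, 4-byte entries); the helper names are hypothetical.

```c
#include <stdint.h>

/* Flat one-level page table sizing, using the figures from the text:
   32-bit virtual address, 12-bit byte-in-page field, 4-byte entries. */
static const unsigned VA_BITS     = 32;
static const unsigned OFFSET_BITS = 12;
static const unsigned ENTRY_BYTES = 4;

static uint64_t page_table_entries(void) {
    /* 32 - 12 = 20 bits of page number, so 2^20 entries */
    return (uint64_t)1 << (VA_BITS - OFFSET_BITS);
}

static uint64_t page_table_bytes(void) {
    return page_table_entries() * ENTRY_BYTES;
}
```

2^20 entries at 4 bytes each is 4 megabytes, all of which would have to be saved and restored on every context switch if it lived inside the memory management unit.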

The original Atlas virtual memory system used an alternative approach to translation. The address translation hardware held a copy of the frame table. When a virtual address was presented to the hardware, it compared the desired page number with the page number stored in each frame table entry. If there was no match, a page fault was reported; if there was a match, the index of the frame table entry holding the match was the frame number. This comparison was done using fast parallel hardware, in what we would now call an associative memory, so the time taken for the search did not depend on the number of page frames in the system.
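The Atlas lookup can be simulated in software. The sequential loop below stands in for the parallel associative comparison done by the hardware; the table size and variable names are assumptions for illustration, not the actual Atlas configuration.

```c
#include <stdint.h>

#define NFRAMES 8   /* small frame count assumed; Atlas had few frames */

/* One page-number register per frame. The hardware compared the desired
   page number against all of these registers simultaneously. */
static uint32_t frame_page[NFRAMES];
static int      frame_valid[NFRAMES];

/* Return the frame holding the given page, or -1 to signal a page fault.
   The loop models what the associative hardware did in constant time. */
static int atlas_lookup(uint32_t page) {
    for (int f = 0; f < NFRAMES; f++)
        if (frame_valid[f] && frame_page[f] == page)
            return f;
    return -1;   /* no match: report a page fault */
}
```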

The Atlas scheme only works if the number of frames is itself relatively small. On a modern machine with many megabytes of memory, the number of page frames may itself grow into the tens of thousands, so neither the entire frame table nor the entire page table can be held in the memory management unit registers.

The usual solution to this problem is to hold both the page table and the frame table in main memory, so that only those entries that reference recently used pages reside in registers within the memory management unit. In this case, we say that the memory management unit contains a virtual address translation cache or an address translation lookaside buffer. From the viewpoint of system software, we then view the address translation process as if it were done through an address translation table in main memory.

The problem of large page tables

A typical modern machine, whether an Intel Pentium or an IBM RS6000, has a virtual address space of 4 gigabytes, but the main memory is only measured in megabytes. The typical virtual address map needed to describe such an address space contains over 1 million map entries, and requires on the order of 4 megabytes to store. Users with 4 megabytes of main memory (today, an admittedly small size) would be quite unhappy if the system required the entire memory just to store the address map!

The solution to this problem is to break up the virtual address map and store most of it in secondary memory. In effect, the map itself is stored in virtual memory, and only those pages of the address map that contain map entries for recently used pages occupy page frames in main memory at any given time. Each page of the virtual address map is called a segment of the map, so the data structure describing the virtual address space of the program can be pictured as follows:

                            _ _ _ _
             Segment Table |_|_|_|_|
             _______________| | | |________
  Page      |              ___| X          |
 Tables  _ _|_ _       _ _|_ _          _ _|_ _
  for   |_|_|_|_|     |_|_|_|_|        |_|_|_|_|
Segments_| | | |_     _| | | |_        _| | | |_ 
       |  |   |  |   |  |   |  |      |  |   |  |
Pages |_||_| |_||_| |_||_| |_||_|    |_||_| |_||_|

The page table for a segment may itself be moved to disk when all pages in that segment have been moved to disk. Once this is done, a reference to any page in that segment requires that the page table for that segment be brought back into main memory before the referenced page itself can be brought in.

When such a system is used, virtual addresses are typically broken up as follows:

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|                   |                   |                       |
       segment         page in segment        byte in page
The segment field specifies which segment of the address map contains the desired map entry, while the page field specifies which entry in that segment of the map should be used. The sizes of the fields shown above are typical of many 32-bit machines. With 12 bits for the byte-in-page field, pages are 4K bytes or 1K words. With 10 bits in each of the segment and page-in-segment fields, both the root segment table and the page table for each segment have 1K entries, and if each page-table entry occupies 1 word, each segment of the page table occupies exactly one page frame of main memory.
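Extracting these three fields is a matter of shifts and masks. The widths below (10/10/12) are the ones from the figure; the function names are hypothetical.

```c
#include <stdint.h>

/* Field extraction for a 32-bit virtual address split as in the figure:
   10-bit segment | 10-bit page-in-segment | 12-bit byte-in-page */
static uint32_t seg_of(uint32_t va)  { return va >> 22; }
static uint32_t page_of(uint32_t va) { return (va >> 12) & 0x3FFu; }
static uint32_t byte_of(uint32_t va) { return va & 0xFFFu; }
```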

The term segment is used here deliberately because, on many machines that use paged-segmented virtual memory, the designers intended that each segment of the page table correspond to a single logical segment of the address space. For example, a program might have a stack segment, a code segment and a heap segment, each composed of from 1 to 1024 (2^10) pages, while other segments might be set aside for any data structures too large to fit in the heap, secondary parts of the program or other purposes. Today, many systems divide the virtual address space into segments that are much larger than the segments of the page table established by the hardware.

Assuming that the virtual address translation hardware uses a cache (translation lookaside buffer), there are now four different outcomes possible for each address translation operation:

Hit: The desired page table entry was in the hardware address translation cache. Address translation takes place as expected.

Miss: The desired page table entry is not in the hardware address translation cache. The address translation unit performs one memory cycle to fetch the page table entry for the desired page. Address translation takes place as expected by the software, but it is slowed by the additional hardware activity.

Segment Fault: A miss occurs, but the desired page table entry cannot be fetched because the desired segment of the page table is not in main memory. The address translation unit therefore requests a fault, aborting the current instruction and allowing the operating system segment fault trap service routine to bring in the desired segment of page table.

Page Fault: A miss occurs, the desired page table entry is available, but the page is marked as invalid. The address translation unit requests a fault, aborting the current instruction and allowing the page fault trap service routine to bring in the desired page.

The Software View

With a paged segmented virtual address space, the software view of address translation is summarized by the following picture:

                 ___  ____
             Seg  |  |    | Segment
Virt Addr  |----->|  |    | Table
---------->|---   |  |____|       ____
           |-  | --->|____|----->|    | Page
             | |     |____|   |  |    | Table
             | | Page         |  |    |      
             |  ------------->|  |____|       ____
             |               --->|____|----->|    | Page
             |                   |____|   |  |    |
             | Byte in Page               |  |    |
              --------------------------->|  |____|
                            physical addr--->|____|
                                             |____|

To translate one virtual address to a physical address, first a word must be fetched from the segment table; this holds the address of the page table for that segment. Next, a word must be fetched from the page table for that segment; this holds the frame number of the desired page, from which the physical address may be computed.
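The two fetches described above, together with the segment-fault and page-fault outcomes listed earlier, can be sketched as a translation routine. This is a software-view sketch under assumed conventions: the table sizes, field widths (10/10/12 as in the earlier figure), structure layouts, and names are all illustrative, not any real machine's format.

```c
#include <stdint.h>

#define NSEGS     1024
#define NPAGES    1024
#define PAGE_SIZE 4096

enum result { TRANSLATED, SEG_FAULT, PAGE_FAULT };

struct pte { int valid; uint32_t frame; };           /* one page table entry  */
struct seg { int present; struct pte *page_table; }; /* page table may be on disk */

static struct seg segment_table[NSEGS];

/* The software view of translation: one fetch from the segment table to
   find the page table, one fetch from that page table to find the frame. */
static enum result translate(uint32_t va, uint32_t *pa) {
    uint32_t s = va >> 22;
    uint32_t p = (va >> 12) & 0x3FFu;
    uint32_t b = va & 0xFFFu;

    struct seg *sg = &segment_table[s];
    if (!sg->present)
        return SEG_FAULT;                /* this segment of the map is on disk */

    struct pte *e = &sg->page_table[p];
    if (!e->valid)
        return PAGE_FAULT;               /* the page itself is not in a frame  */

    *pa = e->frame * PAGE_SIZE + b;      /* frame number plus byte-in-page     */
    return TRANSLATED;
}
```

A translation lookaside buffer would sit in front of this routine; on a hit, neither fetch is needed.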

A cache is always included in the address translation hardware so that these extra memory accesses are only required when there is a cache miss. In fact, if the cache entries may be directly manipulated by software, the cache itself can be entirely managed by software, in response to page faults, and the memory management unit need never actually initiate a memory cycle to fetch page table entries into the unit. In this case, all cache misses initiate page faults, but some faults, called soft page faults, can be serviced by merely bringing the indicated page-table-entry into the memory management unit with no need for I/O activity.

Address Translation and Protection

Virtual address translation hardware does more than just translate addresses. It can also serve a central role in protection. A protection domain describes, for one logical resource user, for example, a process or a person, what resources that user should be able to access and how. Each logical resource user typically operates in a different domain.

When there is one big virtual address space, shared by all users, a change of protection domains can be accomplished by changing the markings on the page table entries. Those table entries describing pages that the new user should not access can be marked inaccessible; those entries describing pages the user should not be able to modify can be marked read-only, and so on.

Editing address maps can be expensive, particularly if they are large, so it is more common to assign a separate virtual address space to each domain. In this case, all that needs to be done when the domain changes is to change the pointer to the currently active page table. The page table then becomes the primary record of what pages are in each domain and what access each domain allows to each page.

With one virtual address space per domain, it is common to find that most domains are fairly small; if a full page table occupies a few megabytes, this can lead to a significant waste of resources, so many virtual addressing systems allow some mechanism to eliminate storage of irrelevant parts of the page table. For example, entire segments of the page table may be marked as invalid in the segment table, so that no space must be allocated to the corresponding page tables, or there may be a limit register that describes the size of the page table, or segments of the page table may be variably sized.

With one virtual address space per domain, sharing of pages between domains is accomplished by replicating the page table entries for the shared pages between the page tables for each domain that shares those pages. When the page table is segmented, shared segments are sometimes used instead of shared pages.

Locality and the Working Set

In all cases, virtual memory rests on the fundamental empirical observation that most programs exhibit a property called locality of reference. A program exhibits strong temporal locality if the set of addresses most recently referenced is a strong predictor of the set of addresses that will be referenced soon. The fact that most programs are iterative leads naturally to temporal locality.

A program exhibits strong spatial locality if most memory references are to locations near other referenced locations. The fact that instructions are usually executed sequentially and the fact that sequential search is common both naturally lead to spatial locality. It is important to remember that locality is an empirical property, and that there are a few programs that exhibit very bad locality.

As a consequence of the locality of programs, we can talk about the working set of a process or group of processes. This is the set of pages that process or group of processes needs in order to make progress. When a process comes into execution in a demand paged environment, there will typically be a sequence of page faults as its working set is brought into memory. Once the working set is in memory, that process will typically run steadily, with very few page faults; these page faults typically reflect changes in the working set as the program enters different phases of its computation.
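One common way to estimate the working set is to count the distinct pages referenced in a recent window of the reference trace. The sketch below assumes that formulation; the window size, the O(n * window) scan, and the function name are all illustrative choices.

```c
#include <stddef.h>

/* Working set size estimate: the number of distinct pages referenced in
   the last 'window' references of a trace of page numbers. A simple
   quadratic scan; real systems approximate this with reference bits. */
static size_t working_set_size(const unsigned *trace, size_t n, size_t window) {
    size_t start = (n > window) ? n - window : 0;
    size_t distinct = 0;
    for (size_t i = start; i < n; i++) {
        int seen = 0;
        for (size_t j = start; j < i; j++) {
            if (trace[j] == trace[i]) { seen = 1; break; }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

A program with strong locality keeps this count small and stable; a jump in the count corresponds to the program entering a new phase of its computation.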

When the number of page frames on a computer is smaller than the number of pages in the working set of the set of ready and running processes on a system, performance is typically very bad. A large fraction of all memory accesses on such a system cause page faults, and with each page fault, some page that will be needed in the near future is forced out to disk, guaranteeing that there will be yet another page fault very soon. A system with this behavior is said to be thrashing.

The application or mix of applications determines the working set size. For any particular application, a graph of page-faults per second versus number of page frames made available to that application will tend to look like the following:

       |
       |______    Working set size
       |      --_ |
 page  |         -|
faults |          _
 per   |Thrashing   CPU bound
second |    or    |-
       |I/O bound | _ 
       |          |  -__
       |          W     -------
     --+------------------------
       |  page frames available
In this graph, the number of page frames in the working set is empirically determined by the location of the transition from normal CPU bound operation to the thrashing state.

Some applications exhibit a sharp thrashing transition -- these are programs with high locality and a well-defined working set. Others exhibit a gentle transition to thrashing -- these are programs with poor locality and an ill-defined working set.

There are two cures for thrashing: First, we can buy more memory in order to enlarge the set of available page frames. Second, if there are multiple processes, we can shrink the working set by stopping some of these processes. If running both processes A and B in parallel causes the system to thrash, then so long as A and B are independent, we are better off finishing A before we start B (or vice versa). Many Unix systems include a process called the swapper that monitors the system's performance and stops low priority processes when it detects thrashing, allowing the page replacement algorithm to reclaim the page frames belonging to those processes and use them to complete the higher priority processes.