22C:116, Lecture Notes, Sept. 11, 1995

Douglas W. Jones
University of Iowa Department of Computer Science

Page Tables and Frame Tables
Note that many implementations of paged virtual memory rely on two distinct data structures, the page table and the frame table.
The page table describes the state of each page in the virtual address space. There is one entry in the page table for every addressable page in the virtual address space. This table frequently grows very large for large address spaces.
The page table is indexed by the virtual address of the page, and each page table entry typically has a field that describes where that page is currently stored -- for pages currently not in main memory, this would typically be the disk address where the page is stored; for pages in main memory, this would be the page frame holding the page.
The frame table is indexed by the physical address of the frame, and the entries describe the state of each page frame in main memory. For those frames currently in use, it describes what pages are is them. There is one entry for each page frame in the physical address space. At any instant, the frame table can typically be derived from the page table, but this derivation is computationally expensive.
The minimum frame table is just an inverted index, holding the index of the page table entry holding the page currently stored in each page frame. On some systems, more information is stored in the frame table; such information as whether the frame has been recently used and whether the contents of the frame have been changed by the CPU are all as useful if stored in the frame table as they are if stored in the page table.
Address translation problems
Recall that the easiest to understand virtual to physical address translation hardware holds the entire page table in the memory management unit:
```
                  Word in page
   Virt addr  |----------------------->| Phys Addr
   ---------->|  Page   ______  Frame  |--------->
              |---     |      |   ---->|
                  |    | Page |  |
                  |    |Table |  |
                   --->|      |  |
        address        |______|  |
        invalid         |  |     |
      <-----------------    -----
```
For large page tables, this is not implementable. For example, on the Intel 80386 (and higher numbered processors), the virtual address is 32 bits, of which 12 bits specify byte-in-page. Thus, the page table must contain a million entries. The entries themselves turn out to be 32 bits each, so this approach to implementing the memory management unit would require dedicating 4 megabytes of special purpose memory to page table management.
The original ATLAS virtual memory system used an alternative approach to translation. The address translation hardware held a copy of the frame table. When a virtual address was presented to the hardware, it compared the desired page number with the page number stored in each frame table entry. If there was no match, a page fault was reported. If there was a match, the index of the frame table entry holding the match was used as the frame number. This comparison was done using fast parallel hardware, in what we would now call an associative memory.
The atlas scheme only works if the number of frames is itself relatively small. On a modern machine with many megabytes of memory, the number of page frames may itself grow into the tens of thousands, so neither the entire frame table nor the entire page table can be held in the memory management unit registers.
The usual solution to this problem is to hold the page table and frame table in main memory, and only bring those entries that reference recently used pages into the registers in the memory management unit. In this case, we say that the memory management unit contains a virtual address translation cache or address translation lookaside buffer. From the viewpoint of system software, we then view the address translation process as if it was done through an address translation table in main memory.
The problem of large page tables
A typical modern machine, whether a DEC VAX, an Intel 80586, or an IBM RS6000 has a virtual address space of 4 gigabytes, but it only has 4 to 16 megabytes of main memory. The typical virtual address map needed to describe such an address space contains over 1 million map entries, and requires on the order of 4 megabytes to store. Users with 4 megabytes of main memory would be quite unhappy if the system required the entire memory just to store the address map!
The solution is to break up the virtual address map. In effect, the map itself is stored in virtual memory, and only those pages of the address map that contain map entries for recently used pages occupy page frames in main memory at any given time. Each page of the virtual address map is called a segment of the map, as follows:
```
                            _ _ _ _
             Segment Table |_|_|_|_|
             _______________| | | |________
  Page      |              ___| X          |
 Tables  _ _|_ _       _ _|_ _          _ _|_ _
  for   |_|_|_|_|     |_|_|_|_|        |_|_|_|_|
Segments_| | | |_     _| | | |_        _| | | |_ 
       |  |   |  |   |  |   |  |      |  |   |  |
Pages |_||_| |_||_| |_||_| |_||_|    |_||_| |_||_|
```
The page table for a segment may itself be moved to disk when all pages in that segment have been moved to disk, and once this is done, a reference to any page in that segment requires that the page table for that segment be brought back into main memory first, prior to bringing in the referenced page.
When such a system is used, virtual addresses are typically broken up as follows:
```
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|                   |                   |                       |
       segment         page in segment        word in page
```
The segment field specifies what segment of the address map contains the desired map entry, while the page field specifies what entry in that segment of the map should be used.
Assuming that the virtual address translation hardware uses a cache, there are now four different outcomes possible for each address translation operation:
Hit: The desired page table entry was in the hardware address translation cache. Address translation takes place as expected.
Miss: The desired page table entry is not in the hardware address translation cache. The address translation unit performs one memory cycle to fetch the page table entry for the desired page. Address translation takes place as expected by the software, but it is slowed by the additional hardware activity.
Segment Fault: A miss occurs, but the desired page table entry cannot be fetched because the desired segment of the page table is not in main memory. The address translation unit therefore requests a fault, aborting the current instruction and allowing the operating system segment fault trap service routine to bring in the desired segment of page table.
Page Fault: A miss occurs, the desired page table entry is available, but the page is marked as invalid. The address translation unit requests a fault, aborting the current instruction and allowing the page fault trap service routine to bring in the desired page.
The Software View
With a paged segmented virtual address space, the software view of address translation is summarized by the following picture:
```
                 ___  ____
             Seg  |  |    | Segment
Virt Addr  |----->|  |    | Table
---------->|---   |  |____|       ____
           |-  | --->|____|----->|    | Page
             | |     |____|   |  |    | Table
             | | Page         |  |    |      
             |  ------------->|  |____|       ____
             |               --->|____|----->|    | Page
             |                   |____|   |  |    |
             | Byte in Page               |  |    |
              --------------------------->|  |____|
                            physical addr--->|____|
                                             |____|
```
To translate one virtual address to a physical address, first a word must be fetched from the segment table; this holds the address of the page table for that segment. Next, a word must be fetched from the page table for that segment; this holds the frame number of the desired page, from which the physical address may be computed.
A cache is always included in the address translation hardware so that these extra memory accesses are only required when there is a cache miss. In fact, if the cache entries may be directly manipulated by software, the cache itself can be entirely managed by software, in response to page faults, and the memory management unit need never actually initiate a memory cycle to fetch page table entries into the unit. In this case, all cache misses initiate page faults, but some faults, called soft page faults, can be serviced by merely bringing the indicated page-table-entry into the memory management unit with no need for I/O activity.
Address Translation and Protection
Virtual address translation hardware does more than just translate addresses. It can also serve a central role in protection. In this discussion, the term protection domain is important. A protection domain describes, for one logical resource user, for example, a process or a person, what resources that user should be able to access and how. Each logical resource user therefore operates in a different domain.
When there is One big virtual address space, a change of protection domains can be accomplished by changing the markings on the page table entries. Those table entries describing pages that the new user should not be able to access should be marked inaccessible; those entries describing pages the user should not be able to modify should be marked read-only, and so on.
Editing address maps can be expensive, particularly if they are large, so it is more common to assign a separate virtual address space per domain. In this case, all that needs to be done when the domain changes is to change the pointer to the currently active page table. The page table then becomes the primary record of what pages are in each domain and what access each domain allows to each page.
With one virtual address space per domain, it is common to find that most domains are fairly small; if a full page table occupies a few megabytes, this can lead to a significant waste of resources, so many virtual addressing systems allow some mechanism to eliminate storage of irrelevant parts of the page table. For example, entire segments of the page table may be marked as invalid in the segment table, so that no space must be allocated to the corresponding page tables, or there may be a limit register that describes size of the page table, or segments of the page table may be variably sized.
With one virtual address space per domain, sharing of pages between domains is accomplished by replicating the page table entries for the shared pages between the page tables for each domain that shares those pages. When the page table is segmented, shared segments are sometimes used instead of shared pages.
Emphasis added!
In all cases, virtual memory rests on the fundamental empirical observation that most programs exhibit locality.
The locality principle is that those pages that have been recently referenced are likely to be referenced again (temporal locality), and that, when new pages are referenced, they are likely to be near recently referenced pages (spatial locality).
As a consequence, we can talk about the working set of a process, the set of pages it is currently using. When a process comes into execution, there will typically be a sequence of page faults as its working set is brought into memory, and then the system will settle down, with only an occasional page fault as the process changes its working set to work on new areas of memory.
When the number of available page frames is insufficient to hold the number of pages in the working sets the processes currently running on a system, the system typically runs very slowly, with a large fraction of all memory accesses causing page faults; with each page fault, some page that will be needed in the near future is swapped to disk, guaranteeing that there will be yet another page fault in the near future. A system with this behavior is said to be thrashing.
The application or mix of applications determines the working set size. For any particular application, a graph of page-faults per second versus number of page frames made available to that application will tend to look like the following:
```
       |
       |______    Working set size
       |      --_ |
 page  |         -|
faults |          _
 per   |Thrashing   CPU bound
second |    or    |-
       |I/O bound | _ 
       |          |  -__
       |          W     -------
     --+------------------------
       |  page frames available
```
In this graph, the number of page frames in the working set is empirically determined by the location of the transition from normal CPU bound operation to the thrashing state.
Some application exhibit a sharp thrashing transition -- these are programs with high locality and a well defined working set. Others exhibit a gentle transition to thrashing -- these are programs with poor locality and an ill defined working set.