22C:116, Lecture Notes, Lecture 4, Fall 2001

Douglas W. Jones
University of Iowa Department of Computer Science

ALTERNATIVE SYSTEM STRUCTURES:
Over the past 40 years, the fundamental structure of operating systems has undergone a significant evolution. In this lecture, we will look at the range of variation.
Classical Monitors:
The first generation of operating system, frequently called a monitor, typically running on a machine with no memory protection mechanism, could generally be described by a map of the machine's memory showing what system components resided in each range of memory addresses:
```
 _________
|         | Interrupt handlers           \
|_________|                               \
|         | Resident library               > The operating system
|_________|                               /
|         | Command language interpreter /
|_________|
|         | \
|         |  \
|         |   > Available to user programs
|         |  /
|_________| /
```
(Omit the interrupt handlers for first generation computers -- interrupts were not common with vacuum tube machines. Omit the interrupt handlers for the first generation of minicomputer (1965-1970) and microcomputer (1975-1980) operating systems because the programmers using those machines frequently didn't understand interrupts very well. The BIOS underlying MS-DOS for the IBM PC is very close to such an archaic I/O subsystem.)
The order of the system components in the memory of the machine could vary, depending on what memory addresses were dedicated to hardware functions and depending on the availability of any protection mechanism.
The function of the code within the system could be divided up more finely as follows:

Device drivers
each I/O device in the system is typically supported by one or more interrupt handlers -- code to handle interrupts from that device, plus one or more routines in the resident library, used by programs wishing to perform I/O operations to that device. The combination of interrupt handlers plus library routines that support one device is called the device driver for that device.

Device independent I/O routines
While user software may occasionally make direct calls to routines in a device driver, most users prefer to write I/O operations in a device independent form. So, the resident library usually included device-independent I/O routines that users would call. These, in turn, would call the appropriate device dependent I/O routines. The concept of file is introduced at this level as a name for a logical I/O channel. For example, a routine such as open() could bind a logical I/O channel, at runtime, to a device, and from then on, calls to the device independent putc() would write to that specific device.

The loader
One key routine in the resident library is the loader. When called with an open object file as a parameter, this reads object code from that file, translating it into machine code and loading that code into the user area in memory. Without this component, the system would be limited to running only one application, whatever was pre-loaded in the user area at startup time.

Other Library Routines
Any library routines that are commonly used by large numbers of applications may be included in the resident library. If the resident library is large, the amount of memory available to applications will be smaller, but applications will link and load very quickly because little code from the standard library must be included with each application. If the resident library is big, linking time will be longer because applications must include standard library routines in their own object code. Typically, on systems where large numbers of small applications were being run (for example, programs from introductory CS classes), large resident libraries were the norm, while on systems that ran a few large applications, smaller resident libraries were the norm.

The Command Language Interpreter
This is, in a sense, the initial user program. The bootstrap loader loads the system image into memory and then jumps to the command language interpreter. The command language interpreter, in turn, reads input from some device and interprets this input (mouse clicks, textual command lines) as instructions to (among other things) load and run application programs. Command language interpreters generally include some built-in functions, but large built-in functionality takes memory that user programs may need, so on small systems, most command language functionality was embedded in user programs.

This kind of system typified mainframe operating systems designed for first and second-generation computers, in the 1950's and the early 1960's. These systems ran only one application at a time. Minicomputers, the small computers of the 1960's and 1970's, were sold with similar operating systems, and the first generation of operating systems for personal computers were again similar. Even early versions of MacOS and MS/DOS were basically designed on this model -- a sophisticated graphical user interface can be built using an archaic operating system architecture!
Swap-based Timesharing and Multitasking Monitors:
The first generation of memory protection mechanisms were quite simple. These were introduced with the second generation of computers, transistorized machines. Typically, there was a single CPU register, the protection register, that was ignored in unprotected or system mode and used in protected or user mode. In user mode, it was illegal to perform I/O operations or to use memory addresses less than the protection register. Performing such an illegal operation would cause a trap, a transfer of control akin to an interrupt. In system mode, all addresses were legal. With this added hardware and a carefully written operating system, user programs could not crash the system, and while this is valuable, but this hardware offered a new possibility.
A timesharing monitor was intended to provide a virtual personal computer to each interactive user. Typically each interactive users had a keyboard-printer as a terminal; the two dominant brands were Teletype, and the IBM 2740/2741 (based on the Selectric typewriter mechanism). The Teletype had a 110 baud asynchronous serial interface and used ASCII, while the IBM 2740/2741 ran at 150 baud and used proprietary IBM character sets.
A multitasking monitor was intended to provide multiple virtual computers, one to each task running on the machine, where most of the tasks were not necessarily interactive. One task might communicate with the machine operator, while others would run batch jobs and yet others might intereact with mechanisms attached to the computer.
In fact, these two classes of systems differ more in the way users perceive the system than they do in necessary internal details. In the 1960's, when these ideas were new and system programmers were exploring what was possible, the similarities were frequently thoroughly obscured by poor understanding of what was being done.
The first generation of timesharing and multitasking systems was based on a very simple idea. At any moment, only one user program would be loaded in main memory, while the others were stored on secondary memory -- in early times, typically drum memory, but disk would be typical today. When it came time to change form one user program to another, the entire user area of memory would be written out to secondary memory, and then the code and data of the next user program would be read in from secondary memory. This exchanging of programs was called swapping.
Swapping may seem slow, but even in the 1960's, drums and disks frequently rotated at 20 or more revolutions per second, and the storage capacity of one track was frequently about the size of the user area of memory. As a result, the basic swap operation took under 1/10th of a second, and if a typical user program was allowed to use the CPU for on the order of one second at a time, swapping only consumed 10% of the time, an acceptable overhead.
Multiple Memory Resident Tasks:
Swapping allows multiple processes to share the CPU, but the cost of context switching from one process to the next is quite high. If all of the processes are trustworthy and small, they can easily all be resident, and the context switching expense can be quite low; all that needs to be done when switching from one process to the next is to save the registers of one and restore the registers of the next.
Early (and many modern) real-time systems used this model, and if there is a rudimentary protection mechanism, this may be combined with swapping, so that trusted resident tasks can be run while untrusted user tasks are swapped. Several successful early timesharing systems were built on this model.
More complex memory proteciton mechanisms allow more! If the memory protection mechanism creates one virtual address space per user process, we can run one user process while the previous is being swapped out and the next is being swapped in. This assumes, of course, that we have sufficient memory for at least two user processes.
The simplest sufficient memory protection mechanism allowing this has two registers in the CPU, a base and a bound register. In user mode (but not in protected mode), the base register is added to all virtual addresses in order to compute the physical address, and the virtual address is illegal if it exceeds the bound register. Variations on this simple mechanism were used by such important computers as the DEC PDP-10 and the CDC 6600 (both were used for commercially successful timesharing systems, and the 6600 is generally seen as the first supercomputer).
Segmented address translation systems offer more. For example, if the memory address space is divided into two segments, one may be used for access to the user program, and the other for access to the user's data. If two users are running the same program, context switching from one user to the next involves changing only the data segment and the other registers, so we can avoid paying the price for swapping code.
Paged address translation offers finer granularity of sharing, so two processes could share some code pages, have other code pages that are private, and perhaps share some data pages as well. The Berkeley Timesharing System running on the SDS 940 computer was the first system I know of to fully exploit this possibility in the second half of the 1960's, but it was built in the shadow of the MIT/GE/Bell MULTICS project, and when MULTICS finally worked a year or two later, it was far more complete in its exploitation of these ideas.
Demand Paged Virtual Memory:
Demand paging allows swapping to be done one page at a time, with only a few pages of each user program resident in memory at any time. We will discuss this more, later, but the important thing to realize is that demand paged memory management is, in many ways, an independend idea that can be applied to any of the system structures outlined previously.
The User/System Boundary:
Up to this point, the resident code of the operating system has been viewed as being fairly unstructured; From the system's point of view, at any instant, the user has unrestricted access to some region of memory (the boundaries of which are set by the memory protection mechanism), and any attempt by the user to reach outside that region causes a trap.
All interrupts and traps transfer control to the system, with the CPU state prior to the interrupt or trap saved through some combination of hardware and software. The specific combination depends on the particular computer being used. The interrupt or trap service routine runs in system state, and therefore has unrestricted access to all system resources.
Interrupt or trap service routines may return directly to the process that caused the interrupt or trap, restoring the saved state, or they may return to a different process. In that case, the system has done a context switch from one user process to another. It is worth noting that the interrupt itself represents a partial context switch from user state to system state, and that the time taken to switch from one user's context to another is, at minimum, the same as the time taken to respond to an interrupt.
A call from a user program to a routine within the operating system involves a deliberate transfer of control across the protection boundary between the user program and the system. This boundary, enforced by the protection mechanism, can be likened to a fence, and the deliberate crossing of this fence has been described as being made through a gate provided for this purpose. Therefore, the term gate-crossing has been used to describe what must be done to call a system routine from a user program.
Any trap service routine may be used for gate-crossing. Consider a system where users may not address memory addresses 0 through FFFF₁₆. Since many user programs use address 0 as the value for NULL pointers, we should probably consider any use of this address as an error, but we could write the illegal memory address trap service routine to catch references to address FF00₁₆ through FFFF₁₆ and treat references to each specific address in this range as a call to one of 256 specific system functions. For example, referencing FF00₁₆ might cause normal program termination, while referencing FF01₁₆ might read the next character from standard input into register 0.
Many hardware designs reserve a specific opcode for system calls. To the hardware designer, this is just an illegal instruction, but in the manual, this instruction is specially documented as being used for system calls. This idea was fully explored on the SDS 930 machine, where the POP (programmed operator) opcode was reserved. POP instructins caused traps, and the operating system was expected to interpret the address field of the POP instruction as a parameter to the service. (The name POP was not confused with a stack operation back then, because outside of Burroughs corporation, nobody in the mid 1960's cared much about stacks).
The IBM 360 included a SVC (supervisor call) instruction that was similar. The DEC PDP-11 included a TRAP instruction, and the microprocessors of the 1970's all included similar routines. In fact, there was rarely a need to set aside special opcodes for this, because the protection mechanism always makes use of any instruction that would change the protection state illegal in user programs, so these instructions are always available, in user mode, for use as gate crossing instrucitons.
The Problem posed by Parameters:
Merely transferring control from the system to the user is not sufficient! We must also be able to pass parameters. There are several ways of passing values, all of which have been used on one system or another. Parameters can be passed in registers, for example, or on the stack, or inline in the code after the instruction that actually causes the call.
Passing the parameter is easy. The hard thing, for the system, is interpreting the parameter. Integers are easy to pass, their meaning doesn't depend on their context. Pointers, on the other hand, are meaningful only in the address space from which they were passed.
Even the simplest of memory management units translate addresses; for example, those that simply establish a base address and bound for each user program translate the address by adding the base address.
User programs pass pointers that are meaningful in the user context, and when the system receives such a pointer, it must do two things:
First, the system must determine if the pointer is valid. Because the protection mechanism is turned off in system state, the system cannot just use the pointer without checking. If it did so, the user could pass an invalid pointer and the system might corrupt some part of the memory that the user had no right to manipulate. With the simple base-bound protection mechanism, the system must check that the pointer does not exceed the bound.
Second, the system must adjust the pointer so it refers to the correct memory location from the system's point of view. With a simple base-bound protection mechanism, for example, the system must add the bound.
So, consider a setting where parameters to system calls are passed by pushing them on the stack before the call. First, the system must check and translate the user's stack pointer; then, it may pop items off the stack. If any of those items are pointers, for example, the addresses of I/O buffers, the system must check and translate those pointers before using them.
The developers of Multics estimated that the cost of checking and translating pointers that were passes as parameters was the dominant component of the cost of gate crossing in their system. They had good hardware support for the actual control transfer, and their virtual memory model even eliminated the problem of address translation, but the check to see if the caller had authentic rights to the memory pointed to by a pointer was entirely left to software, and small errors in this checking could cause major security loopholes in the system.
The Onion Model:
A software engineer looking at the bulk of the code within an operating system can typically identify a number of subsystems, and in many cases, these subsystems have a distinct layering, where the subsystems in each layer build only on lower layers. The designers of the Multics system were the first to attempt to fully exploit this in the mid 1960's. Many hierarchies have been proposed, here is one:
1. Main memory allocation: Routines for allocating buffers and other regions of main memory, used by I/O drivers and system routines.
2. Low level process manager: Basic scheduling and context switching services.
3. Drivers: Interrupt handlers and basic access routines for each device. These routines are entirely device dependent. Drivers allocate buffers using the main memory allocation layer, and drivers use the low level process manager to block and unblock the callers of device services.
4. Virtual memory: The page fault handler relies on the disk I/O driver.
5. File system: The file system provides device independent user access to devices. This uses the drivers.
6. High level process manager: This associates proccesses with address spaces and initial I/O environments.
7. User code:
The MULTICS project responded to this idea of hierarchical decomposition of system function by proposing a general protection mechanism that allowed each layer in the hierarchy to be protected from misuse by any layer above it. They developed a hardware model that supported at least 8 layers, and they built a software model on top of this that supported 64 layers, so that user programs could be divided into layers in the same way.
This was a mixed success. It turns out that our software is not strictly hierarchical, with more privileged and less privileged components. Nonetheless, this model has given us a key bit of terminology that lasts to this day:
The kernel is the innermost layer of the system. (Computer people or notorious for misspelling the word kernel -- it is a common word! We speak figuratively of the kernel of truth in an idea, the innermost part of the idea that is true, even though it is covered in layers of falsehood, and we speak literally of a kernel of wheat, the actual seed that lies inside layers of chaff.)
The kernel-Domain Model:
The reaction to the complexity of Multics and to the shortcomings of the Multics protection model began around 1970. The cleanest statement of this reaction was that an operating system should consist of a very small kernel that served only to establish multiple protected domains with controlled communication channels between domains. One can then speak of operating system structure in terms of a map of the domains and their interactions. Some of these domains will form natural hierarchies, but this is not required.
Modern operating systems such as Mach and Amoeba clearly illustrate the kernel-domain model. The first system to reach widespread use that adopted the term kernel was UNIX, and the use of the term there is a bit problematic.
As used on Multics and in the research literature, the term Kernel was intended to refer to something very small, and the UNIX kernel is not that small. On the other hand, the designers of UNIX did move many system functions out of their kernel that had been considered system functions in the common commercial systems of 1970. Prior to UNIX, the command language interpreter was frequently part of the system, as were most of the functions performed by daemon processes under UNIX. As such, in a relative sense, the use of the term kernel by the UNIX community was justified in 1970.
On the other hand, neither the Multics kernel nor the kernels of modern operating systems include the file system, and many kernels even push much of the I/O and process management code outside of the kernel. Like the user programs, each such system function operates in its own protection domain, isolated from all outside communication except to the extent that it is allowed by communication channels maintained by the kernel.