22C:116, Lecture Notes, Lecture 7, Fall 2001

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Schedulers

    The scheduler is the software within an operating system that determines when each operation in the computer system is done. There are many kinds of scheduling; for example, there are disk schedulers that schedule the order of disk operations on a high-traffic disk system, and there are network schedulers that determine the order of transmission of messages onto a network link. If the term scheduler is used without qualification, however, it is usually understood to refer to the scheduling of the CPU.

    CPU scheduling revolves around one simple question: at each instant in time, which process, among those that are ready to run, should be running on each CPU? Scheduling can be trivial, for example, on a machine where there is only one process or where each process runs on a dedicated CPU. On the other hand, scheduling can be quite complex. The extreme case for scheduling complexity occurs on the largest of servers, where huge numbers of relatively short-lived but computationally demanding jobs are created in response to service requests.

    The CPU scheduling task is usually divided into two layers, conventionally called low-level scheduling and high-level scheduling. Low-level scheduling is concerned with the immediate problem of selecting, from among the ready processes, the next process to run, while high-level scheduling is concerned with longer-term questions, for example, deciding which jobs should be admitted to the system, and when.

    On a huge server, for example, we might perform experiments to determine the system load that maximizes the number of service requests that can be satisfied per minute. Suppose this maximum rate of service is obtained when there are n ready processes in the system. Once we have learned this, we can design a high-level scheduler that monitors the number of ready processes and permits the creation of a new process in response to a service request only when there are fewer than n ready processes. If there are n or more ready processes, our high-level scheduler will queue service requests.
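    The admission policy described above can be sketched as follows. This is an illustrative sketch, not code from any particular system; the class and method names are invented, and "starting a process" is reduced to incrementing a counter:

    ```python
    from collections import deque

    class HighLevelScheduler:
        """Admit a new process for a service request only while fewer
        than n processes are ready; otherwise queue the request."""

        def __init__(self, n):
            self.n = n                # experimentally determined optimal load
            self.ready_count = 0      # current number of ready processes
            self.pending = deque()    # deferred service requests (FIFO)

        def request_arrives(self, request):
            if self.ready_count < self.n:
                self.start_process(request)
            else:
                self.pending.append(request)   # defer until load drops

        def process_finishes(self):
            self.ready_count -= 1
            if self.pending and self.ready_count < self.n:
                self.start_process(self.pending.popleft())

        def start_process(self, request):
            self.ready_count += 1     # stand-in for actual process creation
            return request
    ```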

    Both high-level and low-level schedulers involve queues, but frequently, the contents of these queues differ. In a low level scheduler, the key queue is a queue of processes that are ready to be run. This is called the ready queue. In a high-level scheduler, it is common to find a queue of jobs that need to be done, the job queue. The high-level scheduler in such a system creates a process every time it removes a job from the job queue.

    Whenever we speak of queues, we are entitled to speak of the queueing discipline. The following queueing disciplines are common:

    FIFO, first-in first-out
    This is the most elementary queueing discipline. If someone says that there is a queue in the system, and they say nothing else, it is fair to assume that it is a first-in first-out queue.

    LIFO, last-in first-out
    This elementary discipline is so unlike the default that we give objects with this discipline their own name, calling them stacks. Nonetheless, this is a queueing discipline!

    Random
    Random queues are occasionally useful, although their most common use is as a theoretical model. If all enqueued items have an equal probability of being selected each time an item is dequeued, the queue is as fair as a FIFO queue.

    Priority
    Priority queues are the most widely used queues in schedulers, at all levels! In a priority queue, each item in the queue has an associated priority, and the dequeue operation always selects an item whose priority is greater than or equal to the priorities of all other items in the queue. Some priority queues are stable -- they act like FIFO queues for items of equal priority. To be useful, a priority queue implementation should at least offer the fairness of a random queue for items of equal priority.
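    A stable priority queue, one that is FIFO among items of equal priority, can be sketched with a heap plus an insertion counter. This is a common implementation idiom, not a description of any particular scheduler:

    ```python
    import heapq
    import itertools

    class StablePriorityQueue:
        """Dequeue always returns a highest-priority item; ties break FIFO."""

        def __init__(self):
            self._heap = []
            self._seq = itertools.count()  # insertion order breaks priority ties

        def enqueue(self, item, priority):
            # heapq is a min-heap, so negate priority to dequeue highest first
            heapq.heappush(self._heap, (-priority, next(self._seq), item))

        def dequeue(self):
            return heapq.heappop(self._heap)[2]
    ```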

  2. Low Level Schedulers

    In discussing low-level scheduling, we will assume that all of the CPUs on the computer are identical. If they differ, each equivalence class of processors will be treated as a separate low-level scheduling problem.

    The low level scheduler centers on a queue of ready processes. Process creation enqueues a process description in this queue. When a process relinquishes or is preempted, its process description is enqueued. When it is time to schedule a new process on a CPU, a process description is dequeued. The fundamental classes of low level schedulers differ primarily in the queueing discipline that is used for the ready queue.

    In all low-level schedulers, the ready queue is typically implemented as a linked data structure, using links that are part of each process description record. At the very least, the queue implementation is typically based on a head and tail pointer, with a singly linked list of ready processes.
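    Such a ready queue can be sketched as follows, with the link field stored in each process description record (the field and class names here are hypothetical):

    ```python
    class ProcessDescription:
        """A process description record; 'link' is the queue link noted above."""
        def __init__(self, pid):
            self.pid = pid
            self.link = None     # next process in the ready queue

    class ReadyQueue:
        """Singly linked FIFO queue built from head and tail pointers."""
        def __init__(self):
            self.head = None
            self.tail = None

        def enqueue(self, proc):
            proc.link = None
            if self.tail is None:
                self.head = proc      # queue was empty
            else:
                self.tail.link = proc
            self.tail = proc

        def dequeue(self):
            proc = self.head
            self.head = proc.link
            if self.head is None:
                self.tail = None      # queue is now empty
            return proc
    ```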

    Exercise: What basic operations on processes require more complex data structures? Consider the following list process control operations: create, exit, kill, relinquish, wait, signal. Assume create returns a pointer to the process description of the created process and that kill takes such a pointer as a parameter.

    If the ready queue is a FIFO queue, the scheduler is said to implement the round-robin algorithm, where all processes have equal priority for access to the CPU. Round-robin scheduling was common on early timesharing systems, and it is commonly used for scheduling user-level threads.
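    Round-robin behavior follows directly from the FIFO discipline: the next process to run comes from the head of the queue, and a preempted process goes back to the tail. The following simulation sketch (with invented names, and CPU demand reduced to a simple counter) illustrates the rotation:

    ```python
    from collections import deque

    def round_robin(processes, quantum, burst):
        """Simulate round-robin; burst maps each pid to its remaining
        CPU demand. Returns the order in which processes terminate."""
        ready = deque(processes)          # FIFO ready queue
        finished = []
        while ready:
            pid = ready.popleft()         # schedule the process at the head
            burst[pid] -= quantum         # run it for one time slice
            if burst[pid] > 0:
                ready.append(pid)         # preempted: back to the tail
            else:
                finished.append(pid)      # process terminates
        return finished
    ```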

    If the ready queue is a priority queue, where priority is a fixed attribute of each process, the scheduler is a simple priority scheduler. Such schedulers are commonly used in real-time operating systems. Typically, high priorities are given to processes that must meet frequent deadlines, while low priorities are given to processes with infrequent deadlines. This is called rate monotonic scheduling.

    If the ready queue is a priority queue, where priority is adjusted in response to process behavior, we can develop an adaptive scheduler. For example, the classic UNIX scheduler would raise the priority of processes each time they voluntarily relinquished the CPU, perhaps to wait for I/O completion, while lowering the priorities of processes when they are preempted. The result of this scheme is that interactive processes naturally rise to a higher priority than compute-bound processes, and it automatically adapts to changes in process behavior. As a result, interactive users usually see very fast response times so long as the programs they are running make minimal demands for compute time.
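    The adjustment rule can be sketched like this; the increments and the range are purely illustrative, not the actual UNIX priority formula:

    ```python
    class AdaptiveProcess:
        """Priority drifts up on voluntary relinquish, down on preemption."""
        MAX_PRIORITY = 10
        MIN_PRIORITY = 0

        def __init__(self):
            self.priority = 5     # start in the middle of the range

        def relinquished(self):
            # voluntarily gave up the CPU (e.g. to await I/O): reward it
            self.priority = min(self.priority + 1, self.MAX_PRIORITY)

        def preempted(self):
            # used its whole time slice: treat it as compute-bound
            self.priority = max(self.priority - 1, self.MIN_PRIORITY)
    ```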

    Another category of scheduler attaches, as the priority of each process, the deadline by which it must finish its operation. This is called deadline-based scheduling. The process with the nearer deadline gets priority, and when a process finishes its operation, it either terminates or resets its deadline to the time at which it must finish its next operation. Deadline based schedulers have been proven to be optimal for real-time systems, but they are rarely used in practice.
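    For periodic tasks, the dispatch decision under deadline-based (earliest-deadline-first) scheduling is just a minimum over deadlines, and "resetting the deadline" means advancing it by one period. A simulation sketch, with invented names and with each task's running time ignored:

    ```python
    import heapq

    def edf_schedule(tasks, until):
        """tasks: list of (name, period); each task's deadline is the end
        of its current period. Returns the dispatch order up to 'until'."""
        # heap of (deadline, name, period); initial deadline = first period end
        heap = [(period, name, period) for name, period in tasks]
        heapq.heapify(heap)
        order = []
        while heap and heap[0][0] <= until:
            deadline, name, period = heapq.heappop(heap)
            order.append(name)          # run the nearest-deadline task
            # finished this instance: reset deadline to the next period end
            heapq.heappush(heap, (deadline + period, name, period))
        return order
    ```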

  3. Where is the Scheduler?

    On a symmetrical multiprocessor, whenever the process running on any particular CPU relinquishes, terminates, waits or is preempted, that CPU runs code to dequeue a process from the ready queue and run that process. Thus, the scheduler is not a single program running on a single CPU, but rather, the scheduling function is distributed over all of the CPUs in the system!

    For that matter, the low-level scheduler, as a software component, is very difficult to extract from the remainder of the process manager. The enqueue and dequeue routines for the ready-queue determine the scheduling policy, but they are not the scheduler. The code for relinquish both enqueues one process and dequeues the process that will be scheduled next, but it is not the scheduler. The same can be said for the code for each of the fundamental process management services!

    In fact, if you want to poison an operating system development process, perhaps one of the best things you could do is to try to make the scheduler into a single software component, distinct from and equal in stature to the process manager. A very similar observation can be made about input/output scheduling!

  4. Thread Schedulers

    Kernel threads pose no special problems in scheduler design, but user-level threads on a multiprocessor do. In that context, the operating system will run, at any time, one user process per processor. If a user wants to write a multithreaded application where threads can potentially run in parallel on multiple processors, the thread manager itself must execute as a multiple process program!

    Typically, the initialization code for the thread manager creates a number of processes, say n. Once these processes are created, as many as n threads may run in parallel, each thread being run under a different process, and if there are at least n CPUs, each thread being run (potentially) on a different CPU.

    The n processes that are running the application threads are typically arranged around a shared memory region. All global variables to be shared by the different threads are placed in this region, as are the thread stacks, the thread description records, and the thread ready list structures. Given this, the low level structure of the thread manager is very similar to the low level structure of the process manager on a multiprocessor. Whenever a thread relinquishes, terminates, or blocks, the process running that thread dequeues a new thread from the shared thread ready list and runs that thread.
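    The loop at the heart of such a thread manager can be sketched as follows. This is a simplified illustration: the names are invented, Python threads stand in for the n processes, and the locking of the shared region is reduced to a single mutex:

    ```python
    import threading
    from collections import deque

    class ThreadManager:
        """n workers share one ready list of runnable threads."""

        def __init__(self):
            self.ready = deque()            # shared thread ready list
            self.lock = threading.Lock()    # guards the shared region

        def enqueue(self, thread_fn):
            with self.lock:
                self.ready.append(thread_fn)

        def worker(self):
            # each of the n workers runs this loop
            while True:
                with self.lock:
                    if not self.ready:
                        return              # nothing left to run
                    thread_fn = self.ready.popleft()
                thread_fn()                 # run the thread until it blocks or exits
    ```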

  5. High Level Schedulers

    There are a huge variety of high-level schedulers, enough so that it is hard to generalize, but we can give examples.

    The classic UNIX high level scheduler is called the swapper. The swapper is a privileged process that monitors the performance of the system. When it sees evidence of performance problems that result from too many processes competing for system resources, it selects processes and removes them from the system temporarily.

    The primary concern of the UNIX swapper is the possibility that the total demand for memory by ready or running processes will exceed the resources of the system. With virtual memory technology, such a demand does not cause the system to stop, but it does lead to serious performance problems. When the swapper sees the system nearing the onset of such trouble, blocking a ready process allows the memory manager to reclaim all of the main memory used by that process (copying the process to disk as it does so). Once the load has decreased enough to allow more processes to run, the process that was "swapped" (by merely stopping it) is unblocked, and as it runs, the memory manager will bring it back into main memory.

    The classic high level scheduler on an IBM mainframe (sorry, they call them enterprise servers nowadays) deals with a queue of jobs. Each job is typically described as an application to be run along with files that application is to be run with. Typical jobs may range anywhere from "print the entire batch of paychecks for this week" to "update the employee database by adding John Smith as a new employee." Some jobs are run periodically (payroll is a good example), while others are run on demand. On a modern system, many of the on-demand jobs are submitted by server-side web applications, and some of the output of the job will be formatted in HTML.

    The mainframe high-level scheduler determines which jobs from the job queue will be run next. Once a job is running, it runs under the authority of the low-level scheduler. The high-level scheduler uses a complex priority system based on absolute priority, the expected resource requirements of the job, and other considerations to determine which jobs to run next.