Concurrent Programming

Using multiple processes to do useful work

Part of the notes for 22C:112, Operating Systems
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

The Problem

Users must face a number of new problems on systems which allow programs to be constructed from multiple concurrent processes. Some of these problems have been widely studied and a number of solutions are known. Among these, the mutual exclusion problem and the producer-consumer problem are particularly well known and should be understood by all programmers attempting to write programs consisting of multiple processes. These problems have actually been hinted at in previous sections. The queues used to communicate between an interrupt service routine and a user program are an example of a special case of the producer-consumer problem; the example shown in Figure 15.13 also hints at the producer-consumer problem. The mutual exclusion problem was also mentioned in the context of interrupt driven input/output; the special solution used in that context was to disable interrupts during a critical section. This chapter includes discussions of a number of solutions to these problems.

Although most operating systems support only a few basic mechanisms to aid in solving the problems of concurrent programming, many experimental programming languages have incorporated such mechanisms. Some of these languages have advanced beyond the experimental stage and may become widely used; among these, the Unix shell and the Ada programming language are particularly important. This chapter ends with a brief survey of the concurrent programming features of these languages.

The Mutual Exclusion Problem

When two or more processes must share some object, an arbitration mechanism is needed so that they do not try to use it at the same time. The particular object being shared does not have a great impact on the choice of such mechanisms. Consider the following examples: Two processes sharing a printer must take turns using it; if they attempt to use it simultaneously, the output from the two processes may be mixed into an arbitrary jumble which is unlikely to be of any use. Two processes attempting to update the same bank account must take turns; if each process reads the current balance from some database, updates it, and then writes it back, one of the updates will be lost.

Both of the above examples can be solved if there is some way for each process to exclude the other from use of the shared object during critical sections of code. Thus the general problem is described as the mutual exclusion problem. The mutual exclusion problem was recognized (and successfully solved) as early as 1963 in the Burroughs AOSP operating system, but the problem is sufficiently difficult that it was not widely understood for some time after that. A significant number of attempts to solve the mutual exclusion problem have suffered from two specific problems, the lockout problem, in which a subset of the processes can conspire to indefinitely lock some other process out of a critical section, and the deadlock problem, where two or more processes simultaneously trying to enter a critical section lock each other out.

On a uniprocessor system with non-preemptive scheduling, mutual exclusion is easily obtained: The process which needs exclusive use of a resource simply refuses to relinquish the processor until it is done with the resource. A similar solution works on a preemptively scheduled uniprocessor: The process which needs exclusive use of a resource disables interrupts to prevent preemption until the resource is no longer needed. These solutions are appropriate and have been widely used for short critical sections, such as those involving updating a shared variable in main memory. On the other hand, these solutions are not appropriate for long critical sections, for example, those which involve input/output. As a result, users are normally forbidden to use these solutions; when they are used, their use is restricted to system code.

Semaphores can be used to solve the mutual exclusion problem at the user level; for example, consider the code fragments shown in Figure 33.1.

var shared: ...      { the shared variable needing mutual exclusion };
    mutex: semaphore { a shared semaphore; initial count is 1 };
      .
      .
      .

repeat { a typical process using shared }
      .
      .     { code outside the critical section;
      .       \dreferences to shared are not allowed }\u

      wait( mutex ) { start of critical section };
          .
          .     { code inside the critical section;
          .       \dreferences to shared are allowed }\u

      signal( mutex ) { end of critical section };
      .
      .     { more code outside the critical section }
      .

until ...

Figure 33.1. Using semaphores to solve the mutual exclusion problem.

This code uses a semaphore to control access to a shared variable, but the semaphore could just as well be used to control access to a shared file or device. Because the initial count in the semaphore is 1, the first process to execute a wait operation on that semaphore will be allowed into the critical section. While a process is in the critical section, on the other hand, the value in the semaphore will be zero, forcing other processes to wait.

If there are multiple shared resources, each may be protected by a separate semaphore. If no process ever attempts to use more than one shared resource at a time, this solution is safe and free of risk. On the other hand, if two or more processes must obtain exclusive use of more than one resource at some point during their execution, the use of separate semaphores to guard each resource may lead to deadlock. Such deadlocks may be avoided by complex solutions such as the banker's algorithm, or by requiring that multiple resources be obtained in a fixed order.

The statement that semaphores can be used to solve the mutual exclusion problem is somewhat misleading because it assumes that the semaphores themselves have been implemented. A careful examination of the code for a naive implementation of semaphores shows that this code itself contains critical sections where mutual exclusion must be guaranteed! These are shown in Figure 33.2, which repeats the code that as was originally given in Figure 15.12.

type semaphore = integer;

procedure wait( var s: semaphore );
begin
     { begin critical section }
     while s < 1 do begin
          { end critical section }
          { an appropriate place to relinquish }
          { begin critical section }
     end;
     s := s - 1;
     { end critical section }
end { wait };

procedure signal( var s: semaphore );
begin
     { begin critical section }
     s := s + 1;
     { end critical section }
end { signal };

Figure 33.2. The critical sections in a crude semaphore implementation.

Some of these critical sections are fairly obvious; for example, while one process has loaded the count in the semaphore into a register to increment or decrement, no other process should try to increment or decrement the count. The reason for the inclusion of while loop terminating condition in the critical section in the wait operation is more subtle. Here, it is essential that, when two processes are competing to perform a wait operation on a semaphore with an initial count of 1, one must both test and decrement the semaphore before the other tests it; if both could test it before either one finished decrementing it, the count would eventually be decremented to -1 and neither process would be blocked.

On a uniprocessor system, it would be quite appropriate to use the code from Figure 33.2 with the comments marking the beginning and end of each critical section replaced with code to disable and re-enable interrupts. On a preemptively scheduled system, this would work, but the wait operation would frequently waste the remainder of a time-slice waiting to be preempted, so it would be better to explicitly relinquish the processor once for each cycle of the while loop. This solution has proven to be satisfactory for many systems, but when there are many processes, a large number of which are waiting at any time, a significant fraction of the available processor time can be devoted to context switching between waiting processes, giving each a chance to poll the semaphore on which it is waiting.

As was pointed out in Section 16.3, polling (also known as busy waiting) can be eliminated from the implementation of the wait operation by incorporating a queue of waiting processes into each semaphore. In a preemptively scheduled uniprocessor environment, this might lead to code based on the outline shown in Figure 33.3.

type semaphore = record
                      waiting: processqueue;
                      count: integer;
                 end;

procedure wait(var sem: semaphore);
begin
     { begin critical section }
     sem.count := sem.count - 1;
     if sem.count < 0 then begin { enter wait state }
          enqueue( running, sem.waiting );
          running := dequeue( ready );

	  { NOTE: At this point, the current process is not the
            one which called wait!  Special code may be required
            to finish this context switch, for example, to save
            and restore register values such as the stack pointer }

     end;
     { end critical section }
end { wait };

procedure signal(var sem: semaphore );
var t: integer;
begin
     { begin critical section }
     if sem.count < 0 then begin { wake up a waiting process }
          enqueue( ready, dequeue( sem.waiting ) );
     end;
     { end critical section }
end { signal };

Figure 33.3. The critical sections in a queue-based semaphore implementation.

This outline is based on the coroutine based code given in Figure 16.6; as with that code, this requires that semaphores be integrated with the scheduler; specifically, the context switching code in the wait operation must be carefully matched to the context switching code used when a process is preempted or when a process relinquishes. Note that, as in Figure 33.2, the specific code used to ensure mutual exclusion during the critical sections has not been shown, but comments show where this code must be placed. As long as a uniprocessor environment is used, these comments may be replaced with code to enable or disable interrupts.

In a multiprocessor environment, or in a uniprocessor environment where disabling interrupts is not allowed, other solutions to the mutual exclusion problem must be found. A very common solution rests on the use of special machine instructions, variously known as test-and-set (TS on the IBM 360/370, BSET on the Motorola 68000, SBITL on the National 32000), branch-and-set (BBSSI on the DEC VAX) or exchange (EXCH on the DEC PDP-10, LOCK XCHG on the Intel 8086). Whatever their name, these instructions allow a program to both inspect the old value of a variable and set a new value in that variable in a single indivisible operation. In the case of test-and-set instructions, the condition codes are set to indicate the value of a Boolean variable prior to setting it. Branch-and-set instructions do a conditional branch on a Boolean variable prior to setting it. Exchange instructions allow the old value of a variable to be fetched from memory and replace it with a new value; having fetched the old value, it may be inspected at any later time.

A simple or binary semaphore is a semaphore which has only two values, zero and one; when used to solve the mutual exclusion problem, the value zero indicates that some process has exclusive use of the associated resource and the value one indicates that the resource is free. Alternately, the states of the semaphore are sometimes named "claimed" and "free"; the wait operation claims the semaphore, the signal operation frees it. Binary semaphores which are implemented using busy waiting loops are sometimes called spin-locks because a process which is waiting for entry to a critical section spends its time spinning around a very tight polling loop. An example implementation of spin-locks using an instruction which exchanges the values of its two operands is shown in Figure 33.4.

type spinlock = 0..1;

procedure spinwait(var s: spinlock);
var t: spinlock;
begin
     t := 1;
     repeat exchange(t,s) until t = 0;
end { spinwait };

procedure spinsignal(var s: spinlock);
begin
     s := 0;
end { spinsignal };

Figure 33.4. Spin-locks (binary semaphores) using the exchange instruction.

This implementation would be quite appropriate for use on a multiprocessor system; in fact, it appears to have been understood by the designers of the Burroughs AOSP operating system in 1963. This is also an appropriate implementation on a uniprocessor if a call to a relinquish service is included in the busy waiting loop.

The machine code needed to implement spin-locks frequently involves only two or three instructions for the code of "spinwait" and one instruction for the code of "spinsignal" shown in Figure 33.4; as a result, they take very little time when no waiting is required because the semaphore was already available. On the other hand, this implementation wastes processor time if there is much contention for entry to critical sections. As a result, spin-locks are preferred for entry to critical sections which have low contention, but they are usually discouraged for high contention critical sections.

When general semaphores are implemented using a first-in first-out queue of waiting processes, they guarantee that processes contending for entry to a critical section will be granted access in the order they arrived at the entrance to the critical section; this, in turn, guarantees that no waiting process will be locked out by other processes. Spin-locks provide no such guarantee of the order in which waiting processes will be granted access to the critical section. However, spin-locks are not inherently unfair; on most multiprocessor systems, the order in which contending processes are granted access to a critical section will usually appear to be random. This random order, in turn, is sufficient to prevent a waiting process from being intentionally locked out, and it makes the probability of an accidental lockout infinitesimal.

It is worth noting that instructions such as test-and-set or exchange solve the mutual exclusion problem by shifting the burden to the hardware. The hardware designer must, in turn, solve the same problem! The reason is that the test-and-set operation is implemented as a sequence of two primitive memory operations, first reading the value from memory, and then writing a new value back. If multiple processors are sharing one memory, a processor executing this sequence must first claim exclusive use of the memory!

The original motivation for the inclusion of instructions such as exchange in the instruction sets of early computers was that, with core memory, they were particularly easy to implement. The reason for this is that, with the commonly used core memory technologies, read operations had the side effect of clearing the location from which the value was read. Thus, hardware designers were forced to build memory units which automatically followed each read operation by a write operation to restore what had been read, and these frequently allowed the write operation to optionally use a value provided by the central processor instead of the value just read from memory. The advent of semiconductor memory has removed the original justification for instructions such as exchange, but their utility in solving the mutual exclusion problem ensures that future machines will continue to support them.

Mutual exclusion can be assured even when there is no underlying mechanism such as the test-and-set instruction. This was first realized by T. J. Dekker and published (by Dijkstra) in 1965. Dekker's algorithm uses busy waiting and works for only two processes. The basic idea is that processes record their interest in entering a critical section (in Boolean variables called "need") and they take turns (using a variable called "turn") when both need entry at the same time. Dekker's solution is shown in Figure 33.5.

type processid = 0..1;

var need: array [ processid ] of boolean { initially false };
    turn: processid { initially either 0 or 1 };

procedure dekkerwait( me: processid );
var other: processid;
begin
     other := 1 - me;
     need[ me ] := true;
     while need[ other ] do begin { there is contention }
          if turn = other then begin
               need[ me ] := false { let other take a turn };
               while turn = other do { nothing };
               need[ me ] := true { re-assert my interest };
          end;
     end;
end { dekkerwait };

procedure dekkersignal( me: processid );
begin
     need[ me ] := false;
     turn := 1 - me { now, it is the other's turn };
end { dekkersignal };

Figure 33.5. Dekker's solution to the mutual exclusion problem.

Dekkers solution to the mutual exclusion problem requires that each of the contending processes have a unique process identifier which is called "me" which is passed to the wait and signal operations. Although none of the previously mentioned solutions require this, most systems provide some form of process identifier which can be used for this purpose.

It should be noted that Dekker's solution does rely on one very simple assumption about the underlying hardware; it assumes that if two processes attempt to write two different values in the same memory location at the same time, one or the other value will be stored, and not some mixture of the two. This is called the atomic update assumption. The atomically updatable unit of memory varies considerably from one system to another; on some machines, any update of a word in memory is atomic, but an attempt to update a byte is not atomic, while on others, updating a byte is atomic while words are updated by a sequence of byte updates.

The Producer-Consumer Problem

Whenever data flows between concurrent processes, some variant of the producer-consumer problem must be solved. Although this problem is usually discussed in the context of the flow of data between concurrent processes on a single computer system, perhaps with multiple processors, the problem generalizes to the flow of data between programs and peripheral devices, or to the flow of data between programs on different computers in a network.

It is frequently possible to decompose large programs into multiple processes in such a way that all interprocess communication is formulated in terms of producer-consumer relationships. For example, a compiler might be broken into a lexical analysis process which feeds a stream of lexemes to a syntax analysis process; the syntax analyzer might, in turn, feed a stream of syntactic information to a code generator, which would feed a stream of assembly code to a one-pass assembler. Linear arrangements of processes such as this are sometimes described as pipelines, with each process acting as a filter in the pipe.

Some multiprocessor systems, such as the Intel IPSC computer, are based on networks of microprocessors which are interconnected by high speed serial data links. Unlike multiprocessors with shared memory, such systems limit all interprocess communication to producer-consumer relationships; as a result, programs decomposed into multiple processes which communicate in this way should be able to be run on a wider class of machines than those which use a shared memory model of interprocess communication.

Although it is fairly obvious that first-in first-out queues can be used to solve the producer-consumer problem, this is actually an oversimplification! A first-in first-out queue shared between two processes is a shared data structure, and care must be taken to avoid mutual exclusion problems. Because of this, it is worth examining some very crude solutions to the producer-consumer problem before examining how it is solved with full-fledged queues. For example, consider the very simple solution shown in Figure 33.6.

type item = ... { type of data items passed from producer to consumer };

var buffer: item;
    status: (empty, full) { initially, empty };

procedure produce( x: item );
{ called by the producer when an item is ready to pass to the consumer }
begin
     while status \(!= empty do { nothing };
     buffer := x;
     status := full;
end { produce };

procedure consume( var x: item );
{ called by the consumer when an item from the producer is needed }
begin
     while status \(!= full do { nothing };
     x := buffer;
     status := empty;
end { consume };

Figure 33.6. A very simple solution to the producer-consumer problem.

There are many similarities between this solution and the use of simple polling loops in processing input/output, as was discussed in Chapter 9.

The solution shown in Figure 33.6 is correct as long as there is only a single producer and a single consumer, an acceptable limitation for many applications. The problem is, it provides very little buffering between the producer and consumer because calls to produce and consume must alternate in strict lock step, even if some items take much longer to produce than to consume or visa versa. In order to overcome this, we must use more buffers. A simple solution to this would involve an array of buffers, each with its own status indicator; the producer would cycle through this array, filling one buffer after another, and the consumer would cycle through, emptying buffers. With two buffers, this is called double buffered communication.

When generalized to a relatively large array of buffers, the solution outlined above is very similar to the bounded buffers discussed in Section 10.4; the difference is that an attempt to enqueue a newly produced item will simply cause the producer to wait until there is space instead of causing an error. As a result, there is little use for the special "full" and "empty" functions used in Chapter 4. Of course, it is not necessary to have a separate flag for each buffer in the queue; as suggested in Figure 10.10, the state of the queue can be inferred from the values of the head and tail pointers. This leads to the bounded buffer solution to the producer-consumer problem shown in Figure 33.7.

const size = ... { number of items to allow in the buffer };

var buffer: array [1..size] of item;
    head, tail: 1..size { initially 1 };

procedure produce( x: item );
{ called by the producer when an item is ready to pass to the consumer }
begin
     while ((tail mod size) + 1) = head do { nothing };
     buffer[ tail ] := x;
     tail := (tail mod size) + 1;
end { produce };

procedure consume( var x: item );
{ called by the consumer when an item from the producer is needed }
begin
     while head = tail do { nothing };
     x := buffer[ head ];
     head := (head mod size) + 1;
end { consume };

Figure 33.7. The bounded buffer solution to the producer-consumer problem.

On a multiprocessor system, it may be appropriate to use the code shown in Figures 17.6 and 17.7 as shown; on the other hand, on a uniprocessor, espeically one with a non-preemptive scheduler, these should relinquish the processor. Although it is simple enough to put a call to a relinquish service into each polling loop, some processor time will still be consumed, at each time slice given to a waiting process, checking the condition and relinquishing again. This can all be avoided by using semaphores, as long as the semaphores put the waiting process on a waiting queue. The semaphore solution to this problem was hinted at in Figure 15.13, and is shown here in Figure 33.8.

const size = ... { number of items to allow in the buffer };

var buffer: array [1..size] of item;
    head, tail: 1..size { initially 1 };
    free: semaphore { the count should be initialized to size };
    data: semaphore { the count should be initially zero };

procedure produce( x: item );
{ called by the producer when an item is ready to pass to the consumer }
begin
     wait( free ) { wait for free space to be available };
     buffer[ tail ] := x;
     tail := (tail mod size) + 1;
     signal( data ) { signal that there is data };
end { produce };

procedure consume( var x: item );
{ called by the consumer when an item from the producer is needed }
begin
     wait( data ) { wait for data to be available in the queue };
     x := buffer[ head ];
     head := (head mod size) + 1;
     signal( free ) { signal that there is free space };
end { consume };

Figure 33.8. The bounded buffer solution with semaphores.

As with all of the previously mentioned solutions to the producer-consumer problem, this solution works for only a single producer process and a single consumer.

There are many cases where multiple producers must communicate with multiple consumers in solving a problem. For example, consider a time-sharing system with multiple line printers. Each user terminal has an associated process, and it is appropriate to associate one process with each printer. When a user program submits a request to print some file, that request is put in the print request queue; when a printer process finishes dealing with some request, it reads the next request from the print request queue. Each request might consist of a record containing the name of a file to be printed, information about who to bill for the cost of paper, and the "banner line" to print on the break page.

The problem with multiple producers and multiple consumers is that the simple code given in Figures 17.6 through 17.8 contains critical sections where mutual exclusion must be assured. It should be obvious, for example, that the incrementing of head and tail pointers in Figures 17.7 and 17.8 must be done within critical sections. Furthermore, it is important that, when two processes try to produce or consume data at the same time, each uses a different value of the head or tail pointer; if this was not the case, both producers could fill the same slot in the queue, or both consumers could obtain copies of the same item. Considerations such as these lead to the code shown in Figure 33.9:

const size = ... { number of items to allow in the buffer };

var buffer: array [1..size] of item;
    head, tail: 1..size { initially 1 };
    free: semaphore { initially size };
    data: semaphore { initially zero };
    mutexhead, mutextail: semaphore { initially one };

procedure produce( x: item );
{ called by the producer when an item is ready to pass to the consumer }
begin
     wait( free ) { wait for free space to be available };
     wait( mutextail ) { begin critical section };
     buffer[ tail ] := x;
     tail := (tail mod size) + 1;
     signal( mutextail ) { end critical section };
     signal( data ) { signal that there is data };
end { produce };

procedure consume( var x: item );
{ called by the consumer when an item from the producer is needed }
begin
     wait( data ) { wait for data to be available in the queue };
     wait( mutexhead ) { begin critical section };
     x := buffer[ head ];
     head := (head mod size) + 1;
     signal( mutexhead ) { end critical section };
     signal( free ) { signal that there is free space };
end { consume };

Figure 33.9. Bounded buffers for multiple producers and consumers.

Note, in this solution, that there are two separate mutual exclusion semaphores, one for the head of the queue, and one for the tail; thus, producers do not compete with consumers, but only with each other.

On multiprocessor systems, the code shown in Figure 33.9 should probably be coded using spin-locks to solve the mutual exclusion problem. The reason is that these critical sections are very short and it is unlikely that many processes will be contending for them. On the other hand, the semaphores used to track the free space and data in the queue should probably not use busy waiting unless the likelihood of the queue being either full or empty for very long is very low. The reason for this is that a full or empty queue could cause all of the processes at one end or the other of the queue to wait for some time, locking up a significant fraction of the processors which might have other uses.

Structured Concurrent Programming

Semaphores provide a much more organized approach to controlling the interaction of multiple processes than would be available if each user had to solve all interprocess communications using simple variables, but more organization is possible. In a sense, semaphores are something like the goto statement in early programming languages; they can be used to solve a variety of problems, but they impose little structure on the solution and the results can be hard to understand without the aid of numerous comments. Just as there have been numerous control structures devised in sequential programs to reduce or even eliminate the need for goto statements, numerous specialized concurrent control structures have been developed which reduce or eliminate the need for semaphores.

One of the oldest proposals for a clean notation for parallel programming is based on the idea that concurrent programs should be described in terms of flowcharts where single flow lines may fork into more than one flow line, and where multiple lines may be joined into one. These basic fork and join operations are illustrated in Figure 33.10.

              |                      -----           -----
            -----                ---|  D  |---   ---|  E  |---
           |  A  |                   -----    | |    -----
            -----                            _V_V_
              | Fork                    Join   |  
            __V__                            -----
    -----    | |    -----                   |  F  |
---|  B  |---   ---|  C  |---                -----
    -----           -----                      |

Figure 33.10. The fork and join operations.

In the fork operation illustrated here, operations B and C may begin concurrently as soon as A finishes, while in the join operation, operation F may only begin when both D and E have finished. The notation shown in Figure 33.10 was developed in an era when graphical flowcharting was the dominant approach to high level program documentation, but it is hard to capture in textual form. Furthermore, the use of flowchart notation does nothing to prevent disorderly control structures, it simply makes them somewhat more obvious than they would be if written in terms of assembly language branches and labels.

Much of the work done on concurrent programming since the mid 1960's has centered on the search for the ideal solution to the problem of interprocess communication. This has resulted in proposals for a diverse variety of control structures, mostly proposed as additions to languages descended from Algol '60 such as Algol '68, Pascal, and later Ada. Among the more important proposals were the concurrent block, the monitor, and the Ada rendezvous. These will be discussed below, but the reader should be warned that, although these structures are valuable contributions to concurrent programming, none of them are implemented in more than a few languages, and of these, only Ada has much chance of becoming widely used.

The reader should also beware that the search for the ideal solution to the problem of interprocess communication is as futile as the search for the ideal sequential control structure. It has been known for a long time that any sequential program can be reformulated totally in terms of while loops, although this sometimes requires the addition of auxillary variables. This knowledge in no way changes our desire for a variety of expressive control structures in the languages we use, including if-then, if-then-else and case statements, for loops, until loops, and others. Similarly, although semaphores can be used to implement and can usually be implemented by any of the more recently proposed concurrent control structures, different problems are more naturally using different control structures.

Concurrent Blocks

Concurrent blocks provide a clean notation for one of the simplest common uses of concurrent programming where a number of independent operations can be done in parallel. A concurrent block consists of a fork where one process splits into many; these processes perform their computations and then join back into a single process. Concurrent block notation provides no help in dealing with interprocess interactions, in fact, some languages which provide this notation define it as being illegal for the processes to interact.

Textual representations of the concurrent blocks in a number of languages are illustrated in Figure 33.11, along with examples of sequential blocks in the same languages.


     LANGUAGE             CONCURRENT BLOCK	 SEQUENTIAL BLOCK

Algol '60 pseudocode    cobegin A; B; C coend    begin A; B; C end

     Algol '68            begin A, B, C end      begin A; B; C end

   the Unix shell           ( A & B & C )          ( A ; B ; C )

    C with Unix         if (fork()==0) {           { A ; B ; C }
                            A; exit();
                        } else if (fork()==0) {
                            B; exit();
                        } else {
                            C; wait(); wait();
                        }

Figure 33.11. Concurrent blocks in various languages.

The keywords cobegin and coend have been widely used in pseudocode, so every programmer should recognize them; even so, they do not appear in any major programming language. The Algol '68 notation is interesting because it applies to parameter lists as well as blocks. The Unix shell notation wasn't originally intended as a programming language, but it also fits well in this context.

The last example illustrates how these ideas can be expressed, clumsily, in a common programming environment, C (or C++) under Unix. Here, the fork() system call creates processes, the exit() call is used to self-terminate a process, and the wait() call allows a process to await termination of a process it created. The combination of exit() and wait() is effectively a join.

Concurrent block notation does not help with interprocess communication; its merely allows programmers to avoid expressing unneeded sequencing relationships within a program so parts could be done in parallel. In fact, most Algol '68 compilers did not start concurrent processes when they encountered a concurrent block; instead, sucy blocks were treated as giving permission reorder code arbitrarily for faster execution by a single sequential process.

All of these notations for concurrent blocks can be implemented identically, since all describe the same flowchart. Most implementations follow the Unix model, having the parent execute one statement while creating a new child process to run each of the others. If each child signals a semaphore as it terminates, the parent process can use this to wait for the children to finish.

C. A. R. Hoare's quicksort algorithm provides a good example of the use of concurrent blocks to utilize multiple processors. On a sequential machine, quicksort divides the array to be sorted into two subarrays, where all elements of one subarray are less than all elements of the other, and then calls itself recursively to sort the subarrays. Since the two subarrays are disjoint, there is no reason not to sort them in parallel! This leads to the parallel quicksort code in Figure 33.12.

procedure quicksort( low, high: address {bounds on array to sort} );
var lowup, highdn: address {used to split array};
    median: value {used to decide what goes in each part};

begin
     median := M[low];
     lowup := low;
     highdn := high;

     while lowup \(<= highdn do begin
          cobegin
               while M[lowup] < median do lowup := lowup + 1;
               while M[highdn] > median do highdn := highdn - 1;
          coend;

          if lowup < highdn then begin
               exchange( M[lowup], M[highdn] );
               lowup := lowup + 1;
               highdn := highdn - 1;
          end;
     end;

     cobegin
          if low < highdn then quicksort( low, highdn );
          if lowup < high then quicksort( lowup, high );
     coend;
end {quicksort};

Figure 33.12. A concurrent version of quicksort.

The overhead of starting a new process and waiting for it to finish is not insignificant. Since these costs are implicit in the "cobegin coend" block, such blocks should not be used except when the processes involved will take some time. The first concurrent block in the above program, for example, should probably be done sequentially. Although it contains two loops, these will usually terminate very quickly. The second concurrent block in the above program poses more difficult problems. When the sub-arrays are large, there will be significant advantages to sorting them concurrently, but when they are small, it will probably be faster to sort them sequentially.

The Rendezvous Construct

There are times when the different branches of a parallel program must communicate. When branches can be easily identified as either producers or consumers, queues could be used, but this is not always possible. The inventors of the Ada programming language have provided another model for process interactions, the rendezvous.

The French word rendezvous (pronounced ron-day-voo) literally means "give (to) you," but meeting is the most appropriate one-word translation into English, and the idomatic get together may come even closer.

A rendezvous can be described by the following Flowchart shown in Figure 33.13.

Process X                   Process Y   -- code of process X
                                              A;
    |    -----         -----    |             accept R do
      --|  A  |--   --|  D  |--                  B;
         -----   | |   -----                  end;
                _V_V_                         C;
                  |
                -----
               |  B  |
                -----
                  |  
                __V__                   -- code of process Y
         -----   | |   -----                  D;
      --|  C  |--   --|  E  |--               R; -- looks like a call
    |    -----         -----    |             E;

Figure 33.13. An Ada rendezvous.

Essentially, a rendezvous is a temporary joining of two streams of control after which they fork to continue independently. The intent is that the operations done during the rendezvous will be used to move information between the processes which have joined. Although the flowchart notation is symmetrical, such symmetry cannot be obtained by a textual notation, so the body of the rendezvous must be coded in the body of one of the tasks, as the body of an "accept" statement, while the other task has what looks syntactically like a procedure call to the "accept" statement. In Ada, the code of the body of an "accept" statement may directly access the local variables of the process in which it is embedded, and the accept statement may have parameters passed to it from the other process.

A rendezvous may be implemented with the aid of two binary semaphores, one to make sure that the first process waits until the second calls the rendezvous, and one to suspend the second process until the rendezvous has completed. In fact, the Ada rendezvous is more complex than this, since it has options to allow timed entry calls and accept statements, where if the rendezvous is not executed after a certain interval alternate code is executed, and conditional entry calls and accept statements, where, if the rendezvous cannot be immediately executed, alternate code is executed. The implementation of these constructs is beyond the scope of this work.

Monitors

An early proposal for organizing the operations required to establish mutual exclusion is the explicit critical section statement. In such a statement, usually proposed in the form "critical x do y", where "x" was the name of a semaphore and "y" was a statement, the actual wait and signal operations used to ensure mutual exclusion were implicit and automatically balanced. This allowed the compiler to trivially check for the most obvious errors in concurrent programming, those where a wait or signal operation was accidentally forgotten. The problem with this statement is that it is not adequate for many critical sections, for example, the one in Figure 33.2.

A common observation about critical sections is that many of the procedures for manipulating shared abstract data types such as files have critical sections making up their entire bodies. Such abstract data types have come to be known as monitors, a term coined by C. A. R. Hoare. Hoare proposed a programming notation where the critical sections and semaphores implicit in the use of a monitor were all implicit. All that this notation requires is that the programmer enclose the declarations of the procedures and the representation of the data type in a monitor block; the compiler supplies the semaphores and the wait and signal operations that this implies. Using Hoare's suggested notation, shared counters might be implemented as shown in Figure 33.14.

counter: monitor begin
              var value: integer;

              procedure increment;
              begin
		  value := value + 1;
              end { increment };
	 end { counter };

var i, j: counter;

Figure 33.14. Shared counters implemented using monitors.

Calls to procedures within the body of a monitor are done using record notation; thus, to increment one of the counters declared in Figure 33.14, one would call "i.increment". This call would implicitly do a wait operation on the semaphore implicitly associated with "i", then execute the body of the "increment" procedure before doing a signal operation on the semaphore. Note that the call to "i.increment" implicitly passes a specific instance of the monitor as a parameter to the "increment" procedure, and that fields of this instance become global variables to the body of the procedure, as if there was an implicit "with" statement.

There are a number of problems with monitors which have been ignored in the above example. For example, consider the problem of assigning a meaning to a call from within one monitor procedure to a procedure within another monitor. This can easily lead to deadlock, for example, when procedures within two different monitors each call procedures in the other. It has sometimes been proposed that such calls should never be allowed, but they are sometimes useful!

The most important problem with monitors is that of waiting for resources when they are not available. For example, consider implementing a queue monitor with internal procedures for the enqueue and dequeue operations. When the queue empties, a call to dequeue must wait, but this wait must not block further entries to the monitor through the enqueue procedure. In effect, there must be a way for a process to temporarily step outside of the monitor, releasing mutual exclusion while it waits for some other process to enter the monitor and do some needed action.

Hoare's suggested solution to this problem involves the introduction of condition variables which may be local to a monitor, along with the operations wait and signal. Essentially, if s is the monitor semaphore, and c is a semaphore representing a condition variable, "wait c" is equivalent to "signal(s);wait(c);wait(s)" and "signal c" is equivalent to "signal(c)". The details of Hoare's wait and signal operations were somewhat more complex than is shown here because the waiting process was given priority over other processes trying to enter the monitor, and condition variables had no memory; repeated signaling of a condition had no effect and signaling a condition on which no process was waiting had no effect.

Terminology

The reader should be familiar with the following general terms after reading this section:

binary semaphores           general semaphores
mutual exclusion            producer consumer
take exclusive use          pipelines
give exclusive use          concurrent blocks
wait                        cobegin
signal                      coend
P                           rendezvous
V                           monitors
test and set instructions   critical sections

Exercises

Why is it inappropriate to disable interrupts in order to ensure mutual exclusion in a long critical section which may involve input/output? Hint: Consider the effects this would have on a multi-user system.
To what extent does the semaphore used to implement mutual exclusion on a shared variable enforce that mutual exclusion? If a process wants to inspect or change the shared variable but does not use the semaphore, what would happen to that process?
Consider a test-and-decrement instruction which first compares its operand with zero, and then decrements its operand if it was greater than zero; the result of the comparison is returned in the condition codes. Packaged in a high level language, this might be a Boolean function testanddecrement(x) which returns true if x was decremented. Propose an implementation of general semaphores using this function; your implementation should use busy waiting in the style of the spin-lock implementation shown in Figure 33.4.
Given that a process can relinquish the remainder of its time-slice by calling a procedure called relinquish, modify the spin-lock code from Figure 33.4 to be appropriate for use on a uniprocessor with preemptive scheduling. Be careful to avoid unnecessary calls to "relinquish" which would slow down processes which successfully claim a lock on their first try.
Modify the semaphore implementation given in Figure 33.2 so that it uses spin-locks, as defined in Figure 33.4, to solve the mutual exclusion problem. Your solution should be such that, if two processes are waiting on two different semaphores, they never contend for the same spin-lock. Additionally, your solution should be coded for speed, with comments added, where necessary, to clarify anything this requirement would obscure.
Generalize on Dekkers solution from Figure 33.5 to support more than two processes; consider doing this by building a binary tree of processes using Dekkers solution at each level of the tree.
Write code for the produce and consume procedures compatible with that given in Figure 33.6, but using an array of buffers and status indicators.
What would happen if the code for produce in Figure 33.7 were modified so that the while loop read as follows:
```
while tail = head do { nothing };
```
Show an implementation of the code cobegin A; B coend using semaphores; assume that process creation is done by a service called create(p) which takes a procedure as a parameter and runs that procedure as a separate process; assume that that process dies (or kills itself) when it tries to return.
Starting a new process is relatively expensive on many computers; modify the code for quicksort shown in Figure 33.12 so that it only starts new processes if there is a substantial amount of work to be done, for example, when there are at least 500 items to be sorted.
What is the largest number of processors which the program shown in Figure 33.12 could utilize during the process of sorting an array of 65536 items?
How would you modify the code from Figure 33.12 so that it avoided creating more processes than there are available processors? Clearly, creating more processes than there are processors will not lead to faster execution, and it will add to the overhead of executing the program.
Write code using only semaphore operations to implement the example rendezvous from Figure 33.13.
Write code for the monitor shown in Figure 33.14 using explicit semaphores; to do this, depending on the language you use, you may replace calls to i.increment with calls to increment(i) where i is passed by reference.

Readings

Excellent coverage of the material in this chapter is provided by the books

Structured Concurrent Programming with Operating Systems Applications by Holt, Graham, Lazowska, and Scott (Addison Wesley, 1978).
Principles of Concurrent Programming by M. Ben-Ari (Prentice Hall, 1982).

The latter presents, in some detail, Dijkstra's development of Dekker's solution to the mutual exclusion problem, as well as detailed discussions of monitors and the Ada rendezvous.

Dijkstra's original presentation of Dekker's solution to the mutual exclusion problem, as well as a tutorial introduction to semaphores and to the Banker's algorithm appears in his paper "Co-operating Sequential Processes" which was published in

Programming Languages edited by F. Genuys (Academic Press, 1968).

The original paper on monitors is still worth reading, in part because it predates and anticipates later work on object-oriented programming.

"Monitors: An Operating System Structuring Concept" by C. A. R. Hoare, in Communications of the ACM, 17, 10 (October 1974), pages 549 to 557.

This paper includes details of how to implement monitors with semaphores, and it includes a number of well presented examples of various problems solved with monitors.