4. The User's View of a Process

Part of the 22C:116 Lecture Notes for Fall 2002
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Petri Nets, a Theoretical Tool

Conway, in his classic 1963 paper, was thinking about one of the first multiprocessor systems, and came up with a surprisingly general flowchart notation for expressing parallelism in application programs. Remarkably, his notation is formally equivalent to a subclass of Petri nets, and that notation is illustrated here:

       ___            ___
      |   |     |    |   |
      | A |---->|----| B |   A occurs before B
      |___|     |    |___|

       ___            ___
      |   |     |    |   |
      | A |---->|----| B |
      |___|     |    |___|
        |
      __V__     A occurs before either B or C, but
        |       not both; A could be a conditional
        _|_      test, with control flowing to either
      |   |     B or C depending on the condition.
      | C |           ___
      |___|          |   |
                     | A |
                     |___|
                       |
                     __V__
                       |    C will occur after A
       ___            _V_   and C will occur after B
      |   |     |    |   |  entirely independently.
      | B |---->|----| C |
      |___|     |    |___|

In Petri-net notation, the net is a graph whose vertices are places and transitions. Places correspond to the boxes in a flowchart, although in Petri-net notation they are more frequently drawn as circles. Transitions, drawn as bars, correspond to control transfers from place to place. In the "Petri-net game," we simulate the flow of control in the program that the net models by putting a token on the starting place (or, more generally, the starting places), and then advancing tokens from place to place through the transitions of the net.

The notations listed above are sufficient, with minor generalizations, to replicate the control structure of any sequential program. What Conway's paper describes is an extension of this to parallel programs:

            ___
           |   |
           | A |   The fork operation:
           |___|   A occurs before both
             |     B and C in parallel.
      _______V_______
        |         |
       _V_       _V_
      |   |     |   |
      | B |     | C |
      |___|     |___|

       ___       ___
      |   |     |   |
      | A |     | B |
      |___|     |___|
        |         |
      __V_________V__
             |
            _V_    The join operation:
            |   |   A and B must both
           | C |   occur before C happens.
           |___|   

If, during the token game, one token flows into the fork transition illustrated above, two tokens flow out. Similarly, only when two tokens are present on the inputs to the join transition does one token flow out. These generalize naturally to N-way forks and N-way joins, as well as to transitions with both multiple inputs and outputs. For the latter, only when all inputs have tokens are tokens passed to the outputs.
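
This firing rule is easy to make concrete. The following C sketch is ours, not Conway's: each place holds a count of tokens, and a transition fires only when every one of its input places holds a token.

     #include <stdbool.h>

     #define NPLACES 8
     int tokens[NPLACES];    /* tokens[p] = number of tokens on place p */

     /* fire the transition with the given input and output places;
        returns true only if the transition was enabled and fired */
     bool fire( int nin, const int in[], int nout, const int out[] )
     {
         int i;
         for (i = 0; i < nin; i++)      /* enabled only when all */
             if (tokens[in[i]] == 0)    /* inputs have tokens    */
                 return false;
         for (i = 0; i < nin; i++)      /* consume a token per input */
             tokens[in[i]]--;
         for (i = 0; i < nout; i++)     /* deposit a token per output */
             tokens[out[i]]++;
         return true;
     }

In this sketch, a fork is a transition with one input and two outputs, and a join is a transition with two inputs and one output.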

Expressing Parallelism in Programs

Consider a program with four statements, call them A, B, C and D, where we want A to be done first, then B and C in parallel, and then D. Conway used his graphical fork-join notation for this, and we can express it similarly with Petri nets as follows:

            ___
           |   |
            | A |   A occurs first, then
            |___|   B and C in parallel,
              |     and finally D.
      _______V_______
        |         |
       _V_       _V_
      |   |     |   |
      | B |     | C |
      |___|     |___|
        |         |
      __V_________V__
             |
            _V_
           |   |
           | D |
           |___|

As flowchart notations fell out of favor, programming language designers began devising alternative syntactic notations for introducing parallelism into programs. One popular notation was based on the begin-end blocks of the Algol programming language. A block of statements to be executed sequentially in Algol could be grouped with these keywords, and it was natural to invent new keywords for grouping a block of statements to be executed in parallel. The most popular keywords for this were cobegin and coend, and these are still common in modern pseudocode. The Petri net given above can be re-expressed in this notation as follows:

     A;
     cobegin
       B;
       C;
     coend
     D;

Note that Conway proposed these ideas in the context of his design for a multiprocessor computer system, at the same time that he was developing a clear idea of the distinction between a process (a software entity) and a processor (a hardware entity).

Other notations have been proposed at various times. For example, some have suggested that the statement delimiter be changed when parallelism is intended. Statements separated by semicolons have long been read as being executed sequentially, so perhaps we should use commas to indicate parallelism? This leads to the following possibility:

     A;
     B, C;
     D;

Implementing Parallelism

Both Conway's notation and the similar but much more recent Petri-net notations express the fork and join operations symmetrically, but Conway recognized that symmetrical implementations were unlikely in real systems. Instead, using the figures from the previous section as an example, one sequence of statements, perhaps A-B-D, must be in one process, and the other statement, perhaps C, must be in a second process; we will call these the parent process and the child process. Conway proposed an implementation based on the following outline:

     Parent process:
                              Child process:
           A;
           fork(L);               L: C
           B;                        join(V);
           await-join(V);
           D;

In Conway's proposed system, the fork(L) operation took a label as a parameter and created a new process that began execution at that label. The new process was, aside from the new value for the program counter, identical to the original, and Conway did not anticipate the use of any memory protection mechanism, so all of memory was shared by all processes.

Conway's join(V) and await-join(V) operations took the address of an appropriately initialized variable as a parameter, communicating through that variable in order to synchronize correctly. Today, we would say that the variable is similar to a semaphore, and we would say that Conway's join operation signals that semaphore and then terminates the process.
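
Conway's exact mechanism is not spelled out here, but the idea is easy to sketch. The following C fragment is our own illustration, not Conway's code; it assumes a shared-memory machine on which stores to an integer are atomic, and a hypothetical terminate_process() primitive:

     extern void terminate_process(void);  /* hypothetical */

     typedef volatile int join_var;  /* initialize to 0 before forking */

     void join( join_var *v )        /* done by the terminating branch */
     {
         *v = 1;                     /* signal: this branch is done */
         terminate_process();        /* and cease to exist */
     }

     void await_join( join_var *v )  /* done by the surviving branch */
     {
         while (*v == 0)             /* busy-wait for the signal */
             ;
     }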

In UNIX, this same asymmetry is present, and some of Conway's original terminology has been preserved. The following Unix code correctly implements the control-flow semantics of our running 4-statement example:

     A;
     if (fork() == 0) {
          B;               /* child */
          exit(0);         /* terminate child, signal parent */
     } else {
          C;               /* parent */
          wait(NULL);      /* await termination of the child */
     }
     D;

In the above, statements A, C and D are executed by the parent process, while the fork() primitive creates a child process that executes B. In Unix, each process has its own address space, and the fork primitive creates a child process with all variables copied from the parent process. As a result, statement B will see the results of statement A in memory, but any changes made by statement B will not be seen by C (in parallel) or D (after B finishes). The file system is, of course, shared by all processes, so if the statements make changes to files, these will be seen by all. Later versions of UNIX also introduced several different approaches to shared memory.

The fork() operation in Unix is seen, from C programs, as a function, and when it returns, the only difference seen by the parent process and the child is in the return value of this function. The child process sees a return value of zero, while the parent process gets back the process ID of the child.
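
For concreteness, here is a complete, compilable version of the fragment above, with the placeholder statements replaced by output statements; the relative order of the B and C lines of output may vary from run to run:

     #include <stdio.h>
     #include <stdlib.h>     /* exit() */
     #include <unistd.h>     /* fork() */
     #include <sys/wait.h>   /* wait() */

     int main(void)
     {
         printf( "A\n" );           /* A: first, in the parent */
         if (fork() == 0) {
             printf( "B\n" );       /* B: in the child */
             exit( 0 );             /* terminate child, signal parent */
         } else {
             printf( "C\n" );       /* C: in the parent, in parallel */
             wait( NULL );          /* await termination of the child */
         }
         printf( "D\n" );           /* D: last, in the parent */
         return 0;
     }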

Exercise: Look up the fork(), exit() and wait() primitives in the Unix Programmer's Reference Manual. They are in section 2 of the manual (system calls); the Unix man command under the Unix shell searches this manual. Try man man for help on this command.

Exercise: Consider the program A; cobegin B; C; D; coend; E. Write this in C to run under Unix. There are two classes of solutions, differing in whether one parent has two children, or whether the parent has one child that in turn has a child (the grandchild of the parent). Read the manual carefully! One of these variants is harder to program than the other.

Introduction to Mutual Exclusion

Consider the problem of counting the nonzero elements of a 1,000,000-element array. Here is a parallel solution, expressed informally using two processes:

     cobegin
        -- process A
        for i := 1 to 500,000 do
          if a[i] <> 0
            then count := count + 1;
     
        -- process B
        for j := 500,001 to 1,000,000 do
          if a[j] <> 0
            then count := count + 1;
     coend

After processes A and B have completed, you'd expect that count would correctly reflect the desired quantity, but there is a significant chance that it won't, because the operation of incrementing count in each of these processes will most likely be accomplished by some sequence of operations such as the following:
     load register with count
     increment register
     store register to count

Unfortunately, even on a uniprocessor, it is possible that processes A and B will both attempt to execute this same instruction sequence at the same time, and one of the possible interleavings is something like the following:
            process A           process B
         
       load rA with count
       preempt process A
                            load rB with count
                            increment rB
                            store rB in count
                            ...
                            preempt process B
       increment rA
       store rA to count

It is not hard to see that each time something like this occurs, one or more increment operations will be miscounted. On a uniprocessor using preemptive scheduling, once the CPU is preempted from process A and given to B as shown above, every increment of count that process B performs during its time slice will be lost when process A regains control!

To prevent this, it is essential that only one process at a time attempt to increment count. Therefore, we say that this sequence of instructions is a critical section, or that access by processes to count must be mutually exclusive. In general, whenever there is a shared variable, code that accesses that shared variable will potentially involve critical sections, and each of these must be guarded!
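
The lost updates are easy to demonstrate. The following sketch uses POSIX threads rather than Unix processes, since threads share memory with no extra setup; it behaves as if every element of the array were nonzero, so the correct final count is 1,000,000. Compiled with cc -pthread and run on a multiprocessor, it will usually print a much smaller number:

     #include <stdio.h>
     #include <pthread.h>

     long count = 0;                  /* the shared, unguarded counter */

     void *worker( void *arg )        /* one of our two processes */
     {
         int i;
         for (i = 0; i < 500000; i++)
             count = count + 1;       /* load, increment, store */
         return NULL;
     }

     int main(void)
     {
         pthread_t a, b;
         pthread_create( &a, NULL, worker, NULL );  /* process A */
         pthread_create( &b, NULL, worker, NULL );  /* process B */
         pthread_join( a, NULL );
         pthread_join( b, NULL );
         printf( "count = %ld, expected 1000000\n", count );
         return 0;
     }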

Simple Mutual Exclusion on Uniprocessors

On a uniprocessor, mutual exclusion may be provided by disabling interrupts on entry to a critical section and reenabling them on exit.

     disable interrupts   -- begin critical section
     load register with count
     increment register
     store register to count
     enable interrupts    -- end critical section

Disabling interrupts prevents preemption, and so long as the code within the critical section includes no relinquish or signal operations, we have a guarantee that no other processes will run. This approach is very common in systems such as small microcontrollers that have no operating system, and it is very common inside the kernel of an operating system, but there are two problems that limit its use:

First, the instructions to enable and disable interrupts are usually privileged, which is to say, the operating system kernel and hardware cooperate to prevent users from using these instructions. If this were not done, a user could execute the disable interrupt instruction and then go into an infinite loop, shutting down the system. Therefore, this approach cannot be used by user programs on protected systems.

Second, this approach is not selective. If one process is incrementing a particular shared variable, it only needs to exclude other processes that are intent on modifying the same variable. If there are many programs that operate on shared variables, and if each of them shuts down all scheduler activity when it enters a critical section, the net result can be serious performance problems.

Exercise: The code given in the previous lecture for implementing a scheduler relied on exactly this method! Find the critical sections in that code.

Simple Mutual Exclusion on Multiprocessors

Many CPUs designed for use in multiprocessors, particularly in the 1960's and 1970's, included instructions such as exchange(r,m) or test-and-set(m). These are atomic, which means that, if one CPU is executing one of these operations, no other CPU may intervene. The exchange operation swaps the contents of a register with the contents of a location in memory, while the test-and-set operation sets a memory location to zero while testing its previous value and reporting the results in the condition codes.

Given one or another of these operations, the variable count used in our running example may be protected with a new variable called a spin lock. As a convention, the spin lock protecting a variable will be named after that variable; for example, the lock protecting count will be called count-lock. Given this, we can implement our critical section as:

      lock( count-lock )    -- begin critical section
      load register with count
      increment register
      store register to count
      unlock( count-lock )  -- end critical section

The two states of a spin lock are locked and unlocked; any two values may be used to represent these states. However, we will use zero to represent locked and one to represent unlocked because of the way these values generalize to the use of semaphores. Initially, the lock should be one; lock sets it to zero, and unlock resets it to one. Lock and unlock can be implemented as follows:
      lock( v )
         register := 0;
         repeat
            exchange( register, v );
         until register = 1;

      unlock( v )
         v := 1;

Spin locks have the disadvantage that they involve busy waiting. Nonetheless, they are the preferred mutual exclusion mechanism for very short critical sections on shared-memory multiprocessors. Specifically, if the expected waiting time for entry to the critical section in the event of contention is shorter than the time it would take to do two context switches (switching the waiting process out of the CPU and back in again, under the control of the scheduler), this is the preferred method.
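
On a modern machine, the exchange-based lock above can be transcribed almost directly into C. This is our sketch, with the C11 atomic_exchange operation standing in for the exchange(r,m) instruction and keeping the convention that zero means locked:

     #include <stdatomic.h>

     atomic_int count_lock = 1;     /* one = unlocked, zero = locked */

     void lock( atomic_int *v )
     {
         /* atomically swap in a zero; we hold the lock the
            moment we swap a one out */
         while (atomic_exchange( v, 0 ) != 1)
             ;                      /* busy wait */
     }

     void unlock( atomic_int *v )
     {
         atomic_store( v, 1 );      /* reset to unlocked */
     }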

On some uniprocessor operating systems, including much of the older code in UNIX, the poor design of the scheduler or compatibility with older versions of the system forces the use of spin locks. Typically, in high-level, low-priority code, a relinquish operation of some type is inserted in the busy-waiting loop in order to let other ready processes use the CPU while one process waits for the lock.

In modern computer architectures, those designed in the 1980's or later, both the test-and-set and exchange operations are uncommon. Instead, it is common to find what is known as a snooping mechanism in each CPU, along with a pair of special instructions, load-locked and store-conditional. The load-locked instruction loads from a memory location to a register and sets the snooping mechanism to monitor the bus and note any attempt to change the contents of that location. The store-conditional instruction stores from register to memory, but only if the snooping mechanism is currently monitoring the destination address and there has been no change to that address since the most recent load-locked on this CPU. The store-conditional instruction always reports its success or failure, for example, using a condition code.

Exercise: Write code for the lock and unlock operations using test-and-set.

Exercise: Write code for the lock and unlock operations using load-locked and store-conditional.

Mutual Exclusion Without (much) Help from Hardware

The problem of mutual exclusion on multiprocessors where there are no atomic exchange or test-and-set operations has been the subject of years of study, during which many ingenious solutions to the critical section problem have been found. Dekker's solution is the oldest of these, solving the problem for 2 processes with ID's 0 and 1, and making no assumption about special hardware support other than the fact that the operations of reading or writing a variable are atomic.

Dekker represented each spin-lock by three variables:

      state[0..1] -- process i sets state[i] to enter section
      turn        -- indicates whose turn it is to try next

Given this, and that all variables are initially zero, the code for the lock and unlock operations is as follows, assuming that each process passes its process ID to the lock and unlock operations:

      other(pid)
          return( 1 - pid );

      lock(pid):
          state[pid] = 1;
          while state[other(pid)] = 1 do
              state[pid] = 0;
              while turn <> pid do nothing;
              state[pid] = 1;
          end loop;
          return;

      unlock(pid):
          state[pid] = 0;
          turn = other(pid);
          return;

When A alone has claimed the critical section, state[A] is one and state[B] is zero. If both try to enter at the same time, so that both state variables are set to one, each enters the loop waiting for the other to reset its state variable to zero. One of them, the one favored by turn, will immediately reassert its claim, while the other backs off, awaiting its turn. Furthermore, on exit from the critical section, each process sets turn to give priority to the other process, preventing either process from hogging the critical section.

This code is difficult to understand! Furthermore, it is difficult to generalize to more than two processes. One interesting generalization uses a binary tree of processes, where processes enter the tree at the leaves and use a different process ID at each internal node, for example, using 0 if they came to that node from the left branch and 1 if they came to that node from the right branch. There are many more solutions to this problem, some of which we will discuss as the semester continues.
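
For reference, here is a transcription of Dekker's solution into C; this sketch is ours, not Dekker's. On modern hardware, plain int variables would not be safe, so C11 atomic types, whose operations default to sequentially consistent ordering, stand in for the assumption that reads and writes are atomic:

     #include <stdatomic.h>

     atomic_int state[2];   /* state[i] = 1: process i wants the section */
     atomic_int turn;       /* whose turn to retry; initially 0 */

     static int other( int pid ) { return 1 - pid; }

     void lock( int pid )
     {
         atomic_store( &state[pid], 1 );
         while (atomic_load( &state[other(pid)] ) == 1) {
             /* conflict: back off, await our turn, reassert our claim */
             atomic_store( &state[pid], 0 );
             while (atomic_load( &turn ) != pid)
                 ;                  /* busy wait */
             atomic_store( &state[pid], 1 );
         }
     }

     void unlock( int pid )
     {
         atomic_store( &state[pid], 0 );
         atomic_store( &turn, other(pid) );
     }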

A General Mutual Exclusion Mechanism

Semaphores can be used to provide mutual exclusion! The variable count used in our running example may be protected with a semaphore; we'll call this count-sem. Given this, we can implement our example critical section as:

     P( count-sem )    -- begin critical section
     load register with count
     increment register
     store register to count
     V( count-sem )    -- end critical section

Initially, the semaphore should have a count of one; the P operation sets it to zero, and the V operation resets it to one.

The parallel between this and the spin-lock solution should be obvious! In fact, spin locks are considered to be one implementation of a special case of semaphores called binary semaphores. Binary semaphores are those that never take on values other than zero and one.
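
Under a POSIX system, this pattern can be written in C with the standard semaphore operations, where sem_wait corresponds to P and sem_post corresponds to V. A minimal sketch of the guarded increment:

     #include <semaphore.h>

     sem_t count_sem;       /* protects count */
     int count = 0;

     void setup(void)
     {
         sem_init( &count_sem, 0, 1 );  /* initial count of one */
     }

     void increment(void)
     {
         sem_wait( &count_sem );        /* P: begin critical section */
         count = count + 1;
         sem_post( &count_sem );        /* V: end critical section */
     }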

Semaphores implemented using the process scheduler are the preferred implementation of user-level and some system-level mutual exclusion on uniprocessors, and they are the preferred implementation on multiprocessors for those critical sections where the expected waiting time for entry to the section in the event of contention is greater than the time taken for two context switches.

Exercise: Suggest a rule for system programmers on uniprocessors: When should the system programmer use semaphores to guard a critical section, and when should the system programmer simply disable interrupts?