25. Other Models of Communication

Part of the 22C:116 Lecture Notes for Fall 2002
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

The Ada Rendezvous

The rendezvous construct in the Ada programming language provides an interesting example of how remote procedure call semantics can be presented to high level language programmers. Ada allows a program to be composed of many processes, called (rather conventionally) tasks.

Note: rendezvous is a French word meaning "meeting". The pronunciation is approximately "RON-day-voo", using English pronunciation with capitalization to indicate emphasis.

Communication between Ada tasks takes place when code in one task calls an entry of another. A call to an entry of a task is syntactically identical to a procedure call or a call to a method of an object, where the task itself is the object. Here is an example:
Calling Task                  Called Task

                                task T;
                                  entry E(formals);
                                  :       -- declaration
                                begin
  :                               :
  T.E(actuals) -- calls the       accept E(formals);
                  rendezvous        : -- body of rendezvous
                                  end accept;
                                  :
                                end;

The entry declaration of a task simply declares that the entry is part of the public interface of the task. The task may contain multiple entries, and the task body must contain one or more accept statements for each entry. The body of the accept statement (between the keywords accept and end accept) is only executed when the control flow of the task reaches the accept statement and when some other task has called that entry.

A rendezvous can be implemented using message passing through FIFO queues, as outlined below:

Calling Task                  Called Task
  -- has private queue
  -- named return               task T;
                                  entry E(formals);
                                    -- implemented as
                                    -- a message queue
                                  :     
                                begin
  :                               :
  T.E(actuals)                    accept E(formals);
    -- implemented as               -- implemented as
    -- enqueue <actuals, return>    -- await <formals, ret> in E
    -- in T.E and then              : -- body of rendezvous
    -- await <actuals>            end accept;
    -- in return                    -- implemented as
                                    -- enqueue <formals> to ret
                                  :
                                end;

The caller indicates an interest in initiating a particular rendezvous by placing a message in the queue for that entry of the called task. The message contains the actual parameters to the rendezvous, and when the rendezvous has completed, the called task places a message in the caller's queue containing the results. The accept statement can be implemented with a blocking wait for a message in the queue for that entry, and the end of the body can be implemented with code to send the reply message.
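The queue-based implementation outlined above can be sketched in a few lines of Python, using one thread per task. The entry name, the doubling body, and the value 21 are illustrative assumptions, not anything from Ada itself:

```python
# Sketch of a rendezvous implemented with FIFO message queues, as in
# the outline above.  One queue implements the entry; each caller
# creates a private return queue for the reply.
import queue
import threading

entry_E = queue.Queue()          # the queue implementing entry E of task T

def called_task():
    # accept E(formals): block until some caller has queued a request
    actuals, return_q = entry_E.get()
    result = actuals * 2         # body of the rendezvous (arbitrary example)
    return_q.put(result)         # end accept: send the reply message

def call_entry_E(actuals):
    # T.E(actuals): enqueue <actuals, return> and then await the reply
    return_q = queue.Queue()     # private "use once" return queue
    entry_E.put((actuals, return_q))
    return return_q.get()        # caller blocks until the rendezvous ends

t = threading.Thread(target=called_task)
t.start()
result = call_entry_E(21)
t.join()
```

Note that the caller blocks on its private return queue for the duration of the rendezvous body, exactly as the Ada semantics require.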

Another way to express the semantics of a rendezvous is with a Petri net. The following Petri net correctly expresses the semantics of a simple rendezvous between two processes:

Calling Task            Called Task

   |                        |
   |______________________  |
                          | |
                         _V_V_ join
                           |
                           V
                          Body
                           |
                         __V__ fork
                          | |
    ______________________| |
   |                        |
   V                        V

This Petri net model does not deal with the possibility of multiple callers, but the model can be extended to allow for this as follows:

One of several
Calling Tasks           Called Task

   |                        |
  _V___ send                |
   | |                      |
   | |_______message______  |
   |         queue        | |
   |                     _V_V_ receive
   |                        |
   V                        V
 waiting                   Body
   |                        |
   |                     ___V_ send
   |                      | |
   |  _______return_______| |
   | |       message        |
  _V_V_ receive             |
   |                        |
   V                        V

This model, if implemented by message passing using primitives comparable to those of Demos, is clearly suitable for a distributed system. Each task would have one incoming message queue per entry, and, on calling an entry, the caller would create a unique "use once" return link to be used for the reply from that rendezvous. Fault tolerant implementations can be derived using fault tolerant RPC protocols.

On a uniprocessor or a shared memory multiprocessor with no memory protection, this message passing model of rendezvous implementation can be considerably simplified. The entries for each task would still be implemented as interprocess message queues, and the accept statements for each entry would still be implemented as receive operations on the associated queue, but the outgoing message can be simplified to contain the bare minimum of information, just a pointer to the calling process description. The called process can extract the parameters from the caller's process description, for example, by following the caller's stack pointer to the parameters on the caller's stack top. Each process would have a semaphore used only for waiting for returns from rendezvous; after sending a call message, the calling process would block on this semaphore, and on completing the body of a rendezvous, the called process would signal the caller's semaphore.
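This simplified shared-memory scheme can be sketched as follows. The `Process` record stands in for a process description block, with `params` and `result` playing the role of values reached through the caller's stack pointer; the names and the add-one body are assumptions for illustration:

```python
import queue
import threading

class Process:
    # stand-in for a process description; "params" and "result" play
    # the role of values on the caller's stack
    def __init__(self):
        self.done = threading.Semaphore(0)   # per-process return semaphore
        self.params = None
        self.result = None

entry_E = queue.Queue()       # the entry is still a message queue

def called_task():
    caller = entry_E.get()            # message is just a pointer to the caller
    caller.result = caller.params + 1 # body reads/writes the caller's record
    caller.done.release()             # signal the caller's semaphore

def call_entry_E(proc, actuals):
    proc.params = actuals
    entry_E.put(proc)                 # message carries only the process pointer
    proc.done.acquire()               # block until the rendezvous body finishes
    return proc.result

t = threading.Thread(target=called_task)
t.start()
me = Process()
value = call_entry_E(me, 41)
t.join()
```

The message itself carries no data at all beyond the pointer; all parameter passing happens through shared memory.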

The actual implementation of the Ada rendezvous is complicated by two factors: First, Ada supports time limits; a timed call to a rendezvous will abort if the rendezvous is not entered within the time limit, and a timed accept statement will abort if no client calls that entry within the time limit. In order to prevent the execution of the body of a rendezvous when the caller aborts, it is essential to exchange additional messages in the implementation of a timed rendezvous.

Second, Ada supports accept statements that service more than one entry to a task. If an accept statement will accept calls to either entry A or entry B, it must wait on both queues until either one or the other (or both) contain data, accepting exactly one message addressed to either A or B each time it falls through. This form of the accept statement resembles a case statement, with one body for each entry that may be accepted.

Curiously, the Demos model provides exactly the primitives needed to implement an accept statement that handles multiple entries. This is provided by the version of the Demos receive primitive that takes a list of incoming message queues as a parameter and returns as soon as a message is available in any one of them.
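Python's queue module has no Demos-style "receive from any of several queues" primitive, so this sketch approximates one by multiplexing both entries onto a single queue, tagging each message with its entry name; with a true multi-queue receive the tag would be unnecessary. The entry names and reply values are illustrative assumptions:

```python
import queue
import threading

requests = queue.Queue()   # one multiplexed queue standing in for entries A and B

def call_entry(name, actuals):
    return_q = queue.Queue()
    requests.put((name, actuals, return_q))
    return return_q.get()

def called_task():
    # accept A | accept B: one body per entry, like a case statement
    for _ in range(2):
        name, actuals, return_q = requests.get()
        if name == "A":
            return_q.put(("served by A", actuals))
        elif name == "B":
            return_q.put(("served by B", actuals))

t = threading.Thread(target=called_task)
t.start()
ra = call_entry("A", 1)
rb = call_entry("B", 2)
t.join()
```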

Barriers

When a group of processes must cooperate to perform a single task, it is sometimes necessary for one process in the group to send messages to all others. In a shared memory environment, this is commonly implemented with what is called barrier synchronization. Consider the case where a group of processes alternately compute and exchange results; for example, each process may be updating one entry in a matrix, where each computation stage involves inspecting the values of neighboring entries from the previous computation phase.

The synchronization relationship between these processes may be summarized by a Petri net such as the following:

     _________________________
    |        _______________  |
    |       |        _____  | |
    |       |       |     | | |
    V       V       V     | | |
 Compute Compute Compute  | | |
    |       |       |     | | |
  __V_______V_______V__   | | |
    |       |       |     | | |
  Update  Update  Update  | | |
    |       |       |     | | |
  __V_______V_______V__   | | |
    |       |       |_____| | |
    |       |_______________| |
    |_________________________|

Recall that the horizontal bar in a Petri net means "wait until control arrives at the bar from all sources, and then deliver control to all destinations". For the simple single input, multiple output case, it models a process fork, a message transmission, or a V operation. For the simple multiple input, single output case, it models a process join, a message receive, or a P operation. Here, with the same number of inputs as outputs, it represents all processes waiting for all the others before continuing.

This synchronization operation between a group of processes is called barrier synchronization. Efficient implementations of barrier synchronization are essential to many distributed algorithms.
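The compute/update structure in the Petri net above can be demonstrated with Python's built-in threading.Barrier, which implements exactly this synchronization. The three-thread, three-phase loop and the trace list are assumptions for illustration:

```python
import threading

N = 3                              # number of cooperating processes
PHASES = 3
barrier = threading.Barrier(N)     # reusable N-way barrier
trace = []                         # phase numbers, in completion order
lock = threading.Lock()

def worker(i):
    for phase in range(PHASES):
        with lock:
            trace.append(phase)    # "compute": record which phase we are in
        barrier.wait()             # no thread starts "update" early
        # the "update" step would go here, reading neighbors' results
        barrier.wait()             # no thread computes ahead of the others

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because of the barriers, all three phase-0 entries appear in the trace before any phase-1 entry, and so on.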

The simple-minded implementation of barrier synchronization is to have each process send a message to each of the others and then await the messages each of the others sends back. In the simplest case where only synchronization is desired, these messages reduce to semaphore operations, where the signal or V operation can be thought of as sending an empty message, and the wait or P operation can be thought of as waiting for an empty message. A three way barrier, as needed in the above example, may be expressed using semaphores, as follows:

     Process a  Process b  Process c
         Sa         Sb         Sc      Semaphores

        V(Sb)      V(Sa)      V(Sa) \
        V(Sc)      V(Sc)      V(Sb)  \  Barrier
        P(Sa)      P(Sb)      P(Sc)  / Code
        P(Sa)      P(Sb)      P(Sc) /

In order to synchronize N processes, this simple-minded approach requires the exchange of N(N-1) messages! This is O(N²). If these messages can be exchanged in parallel, the critical path is short (one message transmission delay), but exchanging these messages in parallel is rarely possible. Implementing a barrier this way on a network with bottlenecks can be very inefficient. For example, on an ethernet or token ring network, all of these messages must be sent in sequence because there is only one shared message channel between all of the machines!
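The semaphore table above generalizes directly to N processes: each process signals every other process's semaphore once and then waits on its own semaphore N-1 times. A sketch, with the arrival counter added only to make the barrier property visible:

```python
import threading

N = 3
sems = [threading.Semaphore(0) for _ in range(N)]  # one semaphore per process
lock = threading.Lock()
arrived = 0
seen = []

def barrier(i):
    for j in range(N):             # "V" every other process: N-1 signals sent
        if j != i:
            sems[j].release()
    for _ in range(N - 1):         # "P" our own semaphore N-1 times
        sems[i].acquire()

def worker(i):
    global arrived
    with lock:
        arrived += 1               # announce arrival before signaling
    barrier(i)                     # N(N-1) signals exchanged in all
    with lock:
        seen.append(arrived)       # after the barrier, all N have arrived

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```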

The class of network protocols designed to solve this problem is usually called group communication protocols, particularly when the messages between processes carry information as well as synchronization.

Barrier Synchronization on a Ring

One way to cut down the number of messages exchanged during group communication is to arrange the processes into a ring and circulate a message around this ring to synchronize the group and exchange information.

On the first round, every process adds its message to the circulating message, saying, in effect, "I'm ready and here's my contribution to our shared venture". On the second round, each process receives permission to continue, along with the complete corpus of information submitted by all of the processes.

This two-round protocol on a ring network takes 2n-1 interprocess messages, O(n), so it scales very well on ethernet or token ring networks. On the other hand, the critical path through this protocol is of length n -- no process may continue until it receives the reply to the message it sent, and this takes one complete circuit of the ring.
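The two-round ring protocol can be sketched with one incoming queue per process standing in for the ring links. The process count, the numeric contributions, and the use of process 0 as originator are assumptions for illustration:

```python
import queue
import threading

N = 4
links = [queue.Queue() for _ in range(N)]   # links[i] is process i's incoming link
results = [None] * N

def worker(i, contribution):
    nxt = links[(i + 1) % N]
    if i == 0:
        nxt.put([contribution])             # round 1: originate, with our entry
        corpus = links[0].get()             # round 1 ends: all entries collected
        results[0] = corpus
        nxt.put(corpus)                     # round 2: circulate complete corpus
    else:
        msg = links[i].get()                # round 1: append our contribution
        msg.append(contribution)
        nxt.put(msg)
        corpus = links[i].get()             # round 2: permission to continue
        results[i] = corpus
        if i != N - 1:                      # the last process need not forward,
            nxt.put(corpus)                 # giving 2n-1 messages in all

threads = [threading.Thread(target=worker, args=(i, i * 10)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each process ends up holding the complete corpus, and the message count is 2n-1 as claimed: n messages in the first round and n-1 in the second.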

Barrier Synchronization on a Point-to-Point Net

On an arbitrarily structured store-and-forward point-to-point network, an effective way to synchronize a group of processes is to use a spanning tree of the graph representing the topology of that network. Given a precomputed spanning tree and a designated root process, a barrier can be implemented by having the leaves send messages towards the root when they reach the barrier.

When an internal node in the tree reaches the barrier, it awaits messages from all its subtrees, then sends a summary toward the root. When the root reaches the barrier, it awaits messages from all its subtrees and then forwards a complete summary back toward the leaves of each subtree before continuing.

When a subtree receives a summary message from the root, it sends this to each subtree under it before continuing. Leaves merely await their summary messages and then continue.

This protocol takes 2(N-1) messages or O(N). The proof of this is based on the fact that any tree of N vertices has N-1 edges, and that each edge is traversed by exactly 2 messages, one towards the root (we're ready), and one from the root (OK, you can go).
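The tree protocol can be sketched over a fixed five-node spanning tree, with one "up" and one "down" queue per node standing in for the tree edges. The tree shape and the squared contributions are assumptions for illustration:

```python
import queue
import threading

# assumed spanning tree:   0
#                        /   \
#                       1     2
#                      / \
#                     3   4
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
up = {i: queue.Queue() for i in children}    # toward-root link of node i
down = {i: queue.Queue() for i in children}  # from-root link of node i
summaries = {}
lock = threading.Lock()

def worker(i, value):
    ready = {i: value}
    for c in children[i]:          # await "we're ready" from each subtree
        ready.update(up[c].get())
    if i == 0:                     # root: every process has reached the barrier
        summary = ready
    else:
        up[i].put(ready)           # send our subtree's summary toward the root
        summary = down[i].get()    # await the complete summary
    for c in children[i]:          # forward it to each subtree before continuing
        down[c].put(summary)
    with lock:
        summaries[i] = summary

threads = [threading.Thread(target=worker, args=(i, i * i)) for i in children]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each of the 4 edges carries exactly two messages, one up and one down, for 2(N-1) = 8 messages total.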

Dynamic computation of the spanning tree is essential for a fault tolerant version of this protocol.

The critical path through this protocol clearly depends on the structure of the graph! If the graph is "stringy", for example, a straight line, the critical path is related to the length of this line. In a random network though, the critical path will be of length O(log N).

Assists for Group Communication on Rings or Ethernet

Some low level communications protocols include provisions to simplify group communications. For example, both Ethernet and ring networks are inherently broadcast or multicast networks -- messages from one station to another must be inspected by all other stations or at least by all intermediate stations between sender and receiver. Normally, we limit this inspection to an examination of the message header so each station can determine if the message is addressed to it, but we can easily take advantage of this to allow broadcast.

Typically, what we do is reserve a special destination address to mean "broadcast" and then use this to implement group communication protocols. Note that the hardware broadcast mechanism is at the link level in the protocol hierarchy; typically, what we need is barriers or group communication protocol support at the transport or connection level, but we can use these link-level protocols to help implement efficient higher-level protocols.

This scheme reduces the cost of the simple-minded barrier implementation from O(N²) point-to-point messages to O(N) broadcasts.
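The broadcast-based barrier can be sketched by simulating a link-level broadcast as a put into every station's inbox: each process broadcasts one "I'm ready" message and then continues once it has heard from all N processes. The three-process setup is an assumption for illustration:

```python
import queue
import threading

N = 3
inboxes = [queue.Queue() for _ in range(N)]  # every station sees every broadcast
seen_counts = [0] * N

def broadcast(sender):
    # one link-level broadcast: the message reaches every station's inbox
    for q in inboxes:
        q.put(sender)

def worker(i):
    broadcast(i)                 # one broadcast per process: O(N) in all
    heard = set()
    while len(heard) < N:        # continue once all N are heard from
        heard.add(inboxes[i].get())
    seen_counts[i] = len(heard)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```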