22C:116, Lecture 25, Spring 1997

Douglas W. Jones
University of Iowa Department of Computer Science

  1. The Ada Rendezvous

    The rendezvous construct in the Ada programming language provides an interesting example of how remote procedure call semantics can be presented to high level language programmers. Ada allows a program to be composed of many processes, called (rather conventionally) tasks.

    Note: rendezvous is a French word meaning "meeting". The pronunciation is approximately "RON-day-voo", using English pronunciation and capitalization to indicate emphasis.

    Communication between tasks takes place when code in one task calls an entry of another. A call to an entry of a task is syntactically identical to a procedure call. Here is an example:
    Calling Task            Called Task
    
                                task T is
                                  entry E(formals);
                                end T;

                                task body T is
                                  :       -- declarations
                                begin
      :                           :
      T.E(actuals); -- use        accept E(formals) do
                                    : -- body of rendezvous
                                  end E;
                                  :
                                end T;
    

    The entry declaration of a task simply declares that the entry is part of the public interface of the task. The task may contain multiple entries, and the task body may contain one or more accept statements for each entry. The body of the accept statement (between do and the matching end) is executed only when the control flow of the called task reaches the accept statement and some other task has called that entry.
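
    Filled out with a trivial body, a complete program using one rendezvous might look like the following sketch (the names Demo, Server and E, and the Integer parameter, are illustrative only):

      with Ada.Text_IO; use Ada.Text_IO;

      procedure Demo is
         task Server is
            entry E (X : in Integer);    -- the public interface of Server
         end Server;

         task body Server is
         begin
            accept E (X : in Integer) do -- wait here for a caller
               Put_Line ("in rendezvous, X =" & Integer'Image (X));
            end E;                       -- the caller resumes here
         end Server;
      begin
         Server.E (42);   -- syntactically, an ordinary procedure call
      end Demo;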

    A rendezvous can be implemented using message passing. The caller indicates an interest in initiating a particular rendezvous by sending a message containing the actual parameters to the queue for that entry of the called task; when the rendezvous has completed, the called task sends a message back to the caller containing the results. The accept statement can be implemented with a blocking wait for a message in the queue for that entry, and the end of the body can be implemented with code to send the reply message.
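
    As a sketch of this implementation, the following Ada program builds a rendezvous out of nothing but message passing, reduced to a single caller and a single integer parameter. The protected type Mailbox stands in for an interprocess message queue; Mailbox, Entry_E and Reply are all invented for the illustration, not part of any real run-time system:

      procedure Rendezvous_Via_Messages is

         --  A one-slot message queue; enough for a single caller.
         protected type Mailbox is
            procedure Send (V : in Integer);
            entry Receive (V : out Integer); -- blocking wait for a message
         private
            Slot : Integer;
            Full : Boolean := False;
         end Mailbox;

         Entry_E : Mailbox;  -- the queue for entry E of the called task
         Reply   : Mailbox;  -- the caller's "use once" return link
         Result  : Integer;

         task Called;

         protected body Mailbox is
            procedure Send (V : in Integer) is
            begin
               Slot := V;
               Full := True;
            end Send;
            entry Receive (V : out Integer) when Full is
            begin
               V := Slot;
               Full := False;
            end Receive;
         end Mailbox;

         task body Called is
            Actual : Integer;
         begin
            Entry_E.Receive (Actual); -- the accept: block awaiting a call
            --  ... body of the rendezvous goes here ...
            Reply.Send (Actual);      -- end of the body: send the results
         end Called;

      begin
         Entry_E.Send (42);       -- the call: send the actual parameters
         Reply.Receive (Result);  -- block until the rendezvous completes
      end Rendezvous_Via_Messages;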

    Another way to express the semantics of a rendezvous is with a Petri net. The following Petri net correctly expresses the semantics of a simple rendezvous between two processes:

    Calling Task            Called Task
    
       |                        |
       |______________________  |
                              | |
                             _V_V_ join
                               |
                               V
                              Body
                               |
                             __V__ fork
                              | |
        ______________________| |
       |                        |
       V                        V
    
    This Petri-net model does not deal with the possibility of multiple callers, but the following message-passing model can be extended to allow for this:

    Calling Task            Called Task
    
       |                        |
      _V___ send                |
       | |                      |
       | |_______message______  |
       |         queue        | |
       |                     _V_V_ receive
       |                        |
       V                        V
     waiting                   Body
       |                        |
       |                     ___V_ send
       |                      | |
       |  _______return_______| |
       | |       message        |
      _V_V_ receive             |
       |                        |
       V                        V
    
    This model, if implemented by message passing using primitives comparable to those of DEMOS, is clearly suitable for a distributed system. Each task would have one incoming message queue per entry, and, on calling an entry, the caller would create a unique "use once" return link to be used for the reply from that rendezvous. Fault-tolerant implementations can be derived using fault-tolerant RPC protocols.

    On a uniprocessor or a shared-memory multiprocessor with no memory protection, this message-passing model of rendezvous implementation can be considerably simplified. The entries for each task would still be implemented as interprocess message queues, and the accept statements for each entry would still be implemented as receive operations on the associated queue, but the outgoing message can be simplified to contain the bare minimum of information: just a pointer to the calling process description. The called process can extract the parameters from the caller's process description, for example, by following the caller's stack pointer to the parameters on the caller's stack top. Each process would have a semaphore used only for waiting for returns from rendezvous; after sending a call message, the calling process would block on this semaphore, and on completing the body of a rendezvous, the called process would signal the caller's semaphore.

    The actual implementation of the Ada rendezvous is complicated by two factors: First, Ada supports time limits; a timed call to a rendezvous will abort if the rendezvous is not entered within the time limit, and a timed accept statement will abort if no client calls that entry within the time limit. In order to prevent the execution of the body of a rendezvous when the caller aborts, it is essential to exchange additional messages in the implementation of a timed rendezvous.
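
    In Ada source code, both time limits are written with select statements containing delay alternatives. Here is a minimal sketch (the task, entry and parameter names are invented); each side gives up if its half of the rendezvous does not begin within the stated delay:

      with Ada.Text_IO; use Ada.Text_IO;

      procedure Timed_Demo is
         task Server is
            entry E (X : in Integer);
         end Server;

         task body Server is
         begin
            select
               accept E (X : in Integer) do
                  Put_Line ("rendezvous, X =" & Integer'Image (X));
               end E;
            or
               delay 1.0;               -- no caller within one second
               Put_Line ("accept timed out");
            end select;
         end Server;
      begin
         select
            Server.E (42);              -- a timed call to a rendezvous
         or
            delay 0.5;                  -- rendezvous not entered in time
            Put_Line ("call timed out");
         end select;
      end Timed_Demo;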

    Second, Ada supports accept statements that service more than one entry to a task; in Ada terms, this is the selective accept, a form of the select statement. If such a statement will accept calls to either entry A or entry B, it must wait on both queues and continue when one or the other (or both) contains data, accepting exactly one message addressed to either A or B each time it falls through. This form of the accept statement resembles a case statement, with one body for each entry that may be accepted.
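
    Here is a sketch of that form (the names are again invented); the select statement below waits on the queues for both A and B and accepts exactly one call each time it is executed:

      with Ada.Text_IO; use Ada.Text_IO;

      procedure Select_Demo is
         task Server is
            entry A (X : in Integer);
            entry B (X : in Integer);
         end Server;

         task body Server is
         begin
            for I in 1 .. 2 loop
               select                   -- wait on both entry queues
                  accept A (X : in Integer) do
                     Put_Line ("A called:" & Integer'Image (X));
                  end A;
               or
                  accept B (X : in Integer) do
                     Put_Line ("B called:" & Integer'Image (X));
                  end B;
               end select;              -- exactly one call accepted
            end loop;
         end Server;
      begin
         Server.B (1);
         Server.A (2);
      end Select_Demo;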

    Curiously, the DEMOS model provides exactly the primitives needed to implement an accept statement that handles multiple entries: the version of the receive primitive that takes a list of incoming message queues as a parameter and returns as soon as a message is available in any one of those queues.

  2. Barriers

    When a group of processes must cooperate to perform a single task, it is sometimes necessary for one process in the group to send messages to all others. (In a shared memory environment, this is commonly implemented with what is called barrier synchronization.) Consider the case where a group of processes alternately compute and exchange results; for example, each process may be updating one entry in a matrix, where each computation stage involves inspecting the values of neighboring entries from the previous computation phase.

    The synchronization relationship between these processes may be summarized by a Petri net such as the following:

         _________________________
        |        _______________  |
        |       |        _____  | |
        |       |       |     | | |
        V       V       V     | | |
     Compute Compute Compute  | | |
        |       |       |     | | |
      __V_______V_______V__   | | |
        |       |       |     | | |
      Update  Update  Update  | | |
        |       |       |     | | |
      __V_______V_______V__   | | |
        |       |       |_____| | |
        |       |_______________| |
        |_________________________|
    
    Recall that the horizontal bar in a Petri net means "wait until control arrives at the bar from all sources, and then deliver control to all destinations". For the simple single-input, multiple-output case, it models a process fork, a message transmission, or a V operation. For the simple multiple-input, single-output case, it models a process join, a message receive, or a P operation. Here, with the same number of inputs as outputs, it represents all processes waiting for all the others before continuing.

    This synchronization operation between a group of processes is called barrier synchronization. Efficient implementations of barrier synchronization are essential to many distributed algorithms.

    The simple-minded implementation of barrier synchronization is to have each process send a message to each of the others and then await the messages each of the others sends back. A three-way barrier, as needed in the above example, may be expressed using semaphores as follows:

    Process a  Process b  Process c
        Sa         Sb         Sc      Semaphores
    
       V(Sb)      V(Sa)      V(Sa) \
       V(Sc)      V(Sc)      V(Sb)  \  Barrier
       P(Sa)      P(Sb)      P(Sc)  / Code
       P(Sa)      P(Sb)      P(Sc) /
    

    In order to synchronize N processes, this requires the exchange of N(N-1) messages! This is O(N²). If these messages can be exchanged in parallel, the critical path is short (one message transmission delay), but exchanging these messages in parallel is rarely possible. Implementing a barrier this way on a network with bottlenecks, for example, on an Ethernet or token ring network, is very inefficient because all of these messages must be sent in sequence!
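
    For reference, the three-way barrier in the table above can be written out in Ada 95. Ada has no predefined semaphore type, so the sketch below builds a counting semaphore from a protected object; only the names Sa, Sb and Sc come from the table, the rest is invented:

      procedure Three_Way_Barrier is

         protected type Semaphore is
            entry P;        -- wait; blocks while the count is zero
            procedure V;    -- signal
         private
            Count : Natural := 0;
         end Semaphore;

         Sa, Sb, Sc : Semaphore;

         task A;
         task B;
         task C;

         protected body Semaphore is
            entry P when Count > 0 is
            begin
               Count := Count - 1;
            end P;
            procedure V is
            begin
               Count := Count + 1;
            end V;
         end Semaphore;

         task body A is
         begin
            --  compute ...
            Sb.V; Sc.V;     -- tell the other two we have arrived
            Sa.P; Sa.P;     -- wait until both of them have arrived
            --  update ...
         end A;

         task body B is
         begin
            Sa.V; Sc.V;
            Sb.P; Sb.P;
         end B;

         task body C is
         begin
            Sa.V; Sb.V;
            Sc.P; Sc.P;
         end C;

      begin
         null;
      end Three_Way_Barrier;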

    The class of network protocols designed to solve this problem is usually described as the class of group communication protocols.

  3. Barrier Synchronization on a Ring

    One way to cut down the number of messages exchanged during group communication is to arrange the processes into a ring and circulate a message around this ring to synchronize the group and exchange information.

    On the first round, every process adds its message to the circulating message, saying, in effect, "I'm ready and here's my contribution to our shared venture". On the second round, each process receives permission to continue, along with the complete corpus of information computed by all the other processes.

    This process takes 2N-1 interprocess messages, O(N), so it scales very well on Ethernet or token ring networks. On the other hand, the critical path through this protocol is of length N -- no process may continue until it receives the reply to the message it sent, and this takes one complete circuit of the ring.
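
    The ring protocol can be sketched in Ada itself, with each inter-node message modeled as an entry call. Everything here (the task Node, the entry Pass, and the use of a running sum as each process's "contribution") is invented for the illustration:

      with Ada.Text_IO; use Ada.Text_IO;

      procedure Ring_Barrier is
         N : constant := 4;

         task type Node is
            entry Start (Id : Natural);
            entry Pass (Sum : Natural);   -- the circulating message
         end Node;

         Nodes : array (0 .. N - 1) of Node;

         task body Node is
            My_Id : Natural;
            Total : Natural;
            Next  : Natural;
         begin
            accept Start (Id : Natural) do
               My_Id := Id;
            end Start;
            Next := (My_Id + 1) mod N;
            if My_Id = 0 then
               --  Round 1: the originator starts the circulating
               --  message with its contribution (here, its number).
               Nodes (Next).Pass (My_Id);
               --  Barrier: wait for the message to circle the ring.
               accept Pass (Sum : Natural) do
                  Total := Sum;
               end Pass;
               --  Round 2: distribute the combined result.
               Nodes (Next).Pass (Total);
            else
               --  Round 1: add our contribution and forward it.
               accept Pass (Sum : Natural) do
                  Total := Sum + My_Id;
               end Pass;
               Nodes (Next).Pass (Total);
               --  Round 2: receive the result; forward it unless we
               --  are the last node before the originator.
               accept Pass (Sum : Natural) do
                  Total := Sum;
               end Pass;
               if Next /= 0 then
                  Nodes (Next).Pass (Total);
               end if;
            end if;
            Put_Line ("node" & Natural'Image (My_Id) &
                      " past barrier, sum =" & Natural'Image (Total));
         end Node;

      begin
         for I in Nodes'Range loop
            Nodes (I).Start (I);
         end loop;
      end Ring_Barrier;

    Counting the entry calls, the first round costs N messages and the second N-1, giving the 2N-1 total claimed above.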

    By using an out-and-back protocol instead of a ring protocol, the number of messages can be reduced by one at the expense of almost doubling the length of the critical path for the process that originated the token.

  4. Barrier Synchronization on a Point-to-Point Net

    On an arbitrarily structured store-and-forward point-to-point network, an effective way to synchronize a group of processes is to use a spanning tree of the graph representing the topology of that network. Given a precomputed spanning tree and a designated root process, a barrier can be implemented by having the leaves send messages towards the root when they reach the barrier.

    When an internal node in the tree reaches the barrier, it awaits messages from all its subtrees, then sends a summary towards the root, that is, to its parent. When the root reaches the barrier, it awaits messages from all its subtrees and then forwards a summary to each subtree before continuing.

    When an internal node receives a summary message from its parent, it sends this on to each subtree under it before continuing. Leaves merely await their summary messages and then continue.

    This protocol takes 2(N-1) messages or O(N). The proof of this is based on the fact that any tree of N vertices has N-1 edges, and that each edge is traversed by exactly 2 messages, one towards the root (we're ready), and one from the root (OK, you can go).
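
    The protocol can be sketched in Ada by letting an array index encode the spanning tree in the usual way, with the parent of node I at (I-1)/2. The task Node and its entries Ready and Go are invented names, and each entry call again models one message:

      procedure Tree_Barrier is
         N : constant := 7;

         task type Node is
            entry Start (Id : Natural);
            entry Ready;   -- "we're ready", sent towards the root
            entry Go;      -- "OK, you can go", sent away from the root
         end Node;

         Nodes : array (0 .. N - 1) of Node;

         task body Node is
            My_Id : Natural;
         begin
            accept Start (Id : Natural) do
               My_Id := Id;
            end Start;
            declare
               Left  : constant Natural := 2 * My_Id + 1;
               Right : constant Natural := 2 * My_Id + 2;
            begin
               --  Await a Ready message from each subtree.
               if Left < N then
                  accept Ready;
               end if;
               if Right < N then
                  accept Ready;
               end if;
               --  Send a summary towards the root; await permission.
               if My_Id /= 0 then
                  Nodes ((My_Id - 1) / 2).Ready;
                  accept Go;
               end if;
               --  Forward permission to each subtree, then continue.
               if Left < N then
                  Nodes (Left).Go;
               end if;
               if Right < N then
                  Nodes (Right).Go;
               end if;
            end;
         end Node;

      begin
         for I in Nodes'Range loop
            Nodes (I).Start (I);
         end loop;
      end Tree_Barrier;

    Each tree edge carries exactly one Ready call and one Go call, giving the 2(N-1) messages counted above.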

    Dynamic computation of the spanning tree is essential for a fault-tolerant version of this protocol.

    The critical path through this protocol clearly depends on the structure of the graph! If the graph is "stringy", for example, a straight line, the critical path is related to the length of this line. In a more compact multiply connected network, the critical path will typically be of length O(log N).

  5. Assists for Rings or Ethernet

    Some low level communications protocols include provisions to simplify group communications. For example, both Ethernet and ring networks are inherently broadcast or multicast networks -- messages from one station to another are potentially seen by all other stations or by all intermediate stations.

    Given this, it is easy to design low-level broadcast protocols based on the assignment of a special network address that is accepted by all network receivers. Given such an addressing scheme, the higher level software may interpret part of the broadcast message as an indication of which group of processes is supposed to receive copies of the message.

    This scheme reduces the cost of the simple-minded barrier implementation from O(N²) point-to-point messages to O(N) broadcasts.

  6. Group Communication

    While this presentation has emphasized barrier synchronization, many system designers have instead emphasized multicast or group communication at a lower level, where one process sends a message to a group of others. The protocols for solving that problem are trivially contained within the protocols discussed above.