22C:116, Lecture 35, Spring 1999

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Server Implementation

    Consider a distributed system, with many machines scattered around a net. Some machines have disks, some have none. What are the alternative approaches for implementing file service on this system? Here, we are not so concerned with what service is offered by each server, but rather, with the internal architecture of the servers and the impact of the particular communications primitives used on how the servers function.

  2. Servers with One Incoming Queue

    The simplest form of message addressing we will consider involves addressing messages directly to a process. This implies that all messages to a process end up in the same incoming message queue; if the process is a server, for example, a file server, this implies a structure like the following:

            repeat
               await request
               process request
               send reply
            forever
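
A minimal Python sketch of this loop, using queue.Queue to stand in for the process's incoming message queue, and a second Queue carried in each message to stand in for the client's reply address (an assumption of this sketch; a real message system would carry the reply address in the message header):

```python
import queue
import threading

requests = queue.Queue()       # the server's single incoming queue

def server():
    while True:                             # repeat ... forever
        client, data = requests.get()       # await request
        client.put(data + ' [processed]')   # process request; send reply

threading.Thread(target=server, daemon=True).start()

reply = queue.Queue()          # stands in for the client's reply address
requests.put((reply, 'read block 3'))
print(reply.get())             # -> read block 3 [processed]
```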
    
    This simple model is sufficient for servers that provide very simple services; specifically, it may be the right model to use if the server never blocks during the processing phase. On the other hand, if the server must send messages to a subsidiary server, for example, when a file server communicates with a disk I/O server, problems arise:

            repeat
             * await request from client
               process step 1
               send request to sub server
             * await reply from sub server
               process step 2
               send reply to client
            forever
    
    As examples, consider a directory server that stores its directories as a file, using a low-level file server to do the work, or consider a low-level file server that uses a disk server to store user files. Both examples suggest the above structure.

    This structure fails if there is only one incoming FIFO queue of messages per process! In this case, the two starred lines (await) have no way of distinguishing between incoming requests from clients and replies from subsidiary servers.

  3. Event Processing in Single Queue Servers

    One solution to the problem outlined above is to rewrite the server so that it classifies incoming messages as either requests or replies. Each message must therefore have a prefix identifying the message type. The software structure of such a server can get quite complex:

            repeat
               await message
               case message-type of
                 request:
                   allocate transaction record (TR)
                   process step 1 relative to TR
                   send request + TR to sub server
                 reply:
                   get TR from reply
                   process step 2 relative to TR
                   send reply to client recorded in TR
                   deallocate TR
               end case
            forever
    
    This scheme, although complex, offers one important advantage! It allows multiple transactions to be processed at the same time. Consider the situation where two transactions arrive in quick succession: The server will process step 1 of both before it awaits answers from the subsidiary server on behalf of either. If there are multiple independent subsidiary servers, or if the subsidiary server can optimize the order in which it processes requests, it is possible for the resulting system to have higher throughput than a naive system that finishes one transaction before starting another.
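
A Python sketch of this event-processing loop follows. The single incoming queue carries tagged messages; the transaction records are entries in a dictionary keyed by a transaction id. The subsidiary server here is a hypothetical thread invented for the sketch, standing in for a disk or low-level file server:

```python
import queue
import threading

incoming = queue.Queue()          # the server's single incoming queue
to_sub = queue.Queue()            # the subsidiary server's incoming queue

def sub_server():
    # Hypothetical subsidiary server: does the real work, then sends its
    # reply back into the main server's one incoming queue, tagged 'reply'.
    while True:
        tr, data = to_sub.get()
        incoming.put(('reply', (tr, data + ' [done by sub]')))

def file_server():
    transactions = {}             # open transaction records, keyed by TR id
    next_tr = 0
    while True:
        kind, body = incoming.get()
        if kind == 'request':     # step 1: allocate TR, forward the work
            client, data = body
            tr = next_tr
            next_tr += 1
            transactions[tr] = client
            to_sub.put((tr, data))
        elif kind == 'reply':     # step 2: reply to the client in the TR
            tr, data = body
            transactions.pop(tr).put(data)   # deallocate TR

threading.Thread(target=sub_server, daemon=True).start()
threading.Thread(target=file_server, daemon=True).start()

reply_box = queue.Queue()
incoming.put(('request', (reply_box, 'read block 5')))
print(reply_box.get())            # -> read block 5 [done by sub]
```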

    Unfortunately, this "event processing loop" involves complex code, and most programmers would prefer to be able to write something simpler. This can be done, as follows:

            repeat
              ----
             | if deferred-queue empty
             |    await message
             |    (assert it is a request from a client)
             | else
             |    dequeue request from deferred-queue
             | endif
              ----
               process step 1
               send request to sub server
              ----
             | repeat
             |    await message
             |    if it is a request from a client
             |       enqueue it in the deferred-queue
             |    endif
             | until it is a reply from the sub server
              ----
               process step 2
               send reply to client
            forever
    
    This bit of code is also ugly, but the ugliness is confined to the marked sections of code, each of which substitutes for a starred line in the naive code. The resulting code does not offer the possibility of overlapping the processing of multiple transactions, so the resulting throughput may not be as good as in the previous example. On the other hand, it may be significantly easier to debug, and it should serve well for simple low-performance servers.
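
A Python sketch of this deferred-queue structure, with the same hypothetical subsidiary-server thread as before standing in for the real one; the two marked sections of the pseudocode appear as the two branches around the inner loop:

```python
import queue
import threading
from collections import deque

incoming = queue.Queue()       # the single physical incoming queue
to_sub = queue.Queue()

def sub_server():
    # Hypothetical subsidiary server (an assumption of this sketch).
    while True:
        data = to_sub.get()
        incoming.put(('reply', data + ' [done by sub]'))

def file_server():
    deferred = deque()         # requests that arrived mid-transaction
    while True:
        if deferred:
            client, data = deferred.popleft()
        else:
            kind, body = incoming.get()
            assert kind == 'request'      # must be a client request
            client, data = body
        to_sub.put(data)                  # step 1; send to sub server
        while True:                       # await the sub server's reply,
            kind, body = incoming.get()   # deferring any client requests
            if kind == 'request':         # that arrive in the meantime
                deferred.append(body)
            else:
                break
        client.put(body)                  # step 2; reply to client

threading.Thread(target=sub_server, daemon=True).start()
threading.Thread(target=file_server, daemon=True).start()

r1, r2 = queue.Queue(), queue.Queue()
incoming.put(('request', (r1, 'read block 1')))
incoming.put(('request', (r2, 'read block 2')))
print(r1.get())                # -> read block 1 [done by sub]
print(r2.get())                # -> read block 2 [done by sub]
```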

    In fact, the "ugliness" in the above code is a special case of a more general solution, in which the prefix on each message identifies the logical queue to which that message is directed. Thus, we might write a general wait routine as:

            typed_wait( msg-type )
               if empty( queue[ msg-type ] )
                  repeat
                     await an incoming message
                     t = type of received message
                     if not( t = msg-type )
                        enqueue message on queue[ t ]
                     endif
                  until t = msg-type
               else
                  dequeue message from queue[ msg-type ]
               endif
    
    This is, in effect, how systems like DEMOS implement multiple mailboxes for each process. Furthermore, it is not difficult to generalize the above to allow a selective wait for any of a set of message types. There are alternatives to this model; for example, on some systems, there are primitives that allow newly received messages to be inspected and put back on the queue if they don't fit some criteria, or they allow a wait for messages from a particular sender.
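
The typed_wait routine translates almost directly into Python, with one physical Queue and a dictionary of per-type logical queues (the names here are inventions of the sketch):

```python
import queue

incoming = queue.Queue()    # the one physical incoming queue
logical = {}                # per-type logical queues

def typed_wait(msg_type):
    # Wait for a message of the given type; messages of other types are
    # set aside on their own logical queues rather than being lost.
    q = logical.setdefault(msg_type, queue.Queue())
    if not q.empty():
        return q.get()
    while True:
        t, body = incoming.get()        # await an incoming message
        if t == msg_type:
            return body
        logical.setdefault(t, queue.Queue()).put(body)

incoming.put(('request', 'from client'))     # arrives first
incoming.put(('reply', 'from sub server'))
print(typed_wait('reply'))      # -> from sub server
print(typed_wait('request'))    # -> from client
```

Note that the out-of-order arrival is handled cleanly: the wait for a reply sets the earlier request aside, and a later typed_wait('request') finds it on its logical queue.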

  4. Multi Thread, Multi Queue Servers

    Given only one queue per thread instead of one queue per process, we can solve the problems outlined above in another way, using multiple threads per server (or equivalently, multiple processes sharing an address space, with one queue per process). This leads to the following server design:

            repeat
               await request from client
               create a new process to do the following
                  process step 1
                  send request to sub server
                  await reply from sub server
                  process step 2
                  send reply to client
                  exit
               end new process
            forever
    
    This piece of code has some interesting properties. First, note that the process that receives requests from clients is not the same process as the one that eventually sends replies to clients! The process that receives requests creates one short-lived process per request, and all further work resulting from that request is done by that new process.

    The second useful property of this code is that it allows for parallelism in exactly the same way as the event processing code. The processing of different requests can be carried on in parallel, but most of the benefit of this is likely to come from overlapping interactions with subsidiary servers.

    Note that the only processes that know about the short-lived request processing processes are the main process of the server, which creates them, and the sub-server, which learns their identity as part of the reply address that is provided with the request messages. As a result, the short-lived processes never need to worry about receiving any messages other than the expected replies.

    This code has a problem! If requests arrive faster than they can be processed, large numbers of short-lived processes may be created. In most systems, there is a limit on the number of processes that may be created, and to avoid exceeding this limit, the server must generally avoid creating too many processes, for example, by using a general semaphore to count processes:

            repeat V(pop) N times to initialize the semaphore to N,
                   the limit on concurrent short-lived processes
            repeat
               P(pop)
               await request from client
               fork a new process to do the following
                  process step 1
                  send request to sub server
                  await reply from sub server
                  process step 2
                  send reply to client
                  V(pop)
                  exit
               end new process
            forever
    
    On a UNIX-based server, the main server process can use process termination signals to await the termination of a short-lived process in the event that too many short-lived processes have been created.
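
In Python, the bounded fork-per-request pattern can be sketched with one thread per request and a counting semaphore; threading.Semaphore(N) replaces the "repeat V(pop) N times" initialization, since it can be created with its count already set:

```python
import queue
import threading

MAX_WORKERS = 4
pop = threading.Semaphore(MAX_WORKERS)   # counts free worker slots
requests = queue.Queue()

def handle(client, data):
    # Short-lived worker: steps 1 and 2 would go here, including any
    # exchange with a subsidiary server (omitted in this sketch).
    try:
        client.put(data + ' [served]')
    finally:
        pop.release()                    # V(pop): free a worker slot

def server():
    while True:
        pop.acquire()                    # P(pop): block if at the limit
        client, data = requests.get()    # await request from client
        threading.Thread(target=handle, args=(client, data),
                         daemon=True).start()

threading.Thread(target=server, daemon=True).start()

reply = queue.Queue()
requests.put((reply, 'read block 9'))
print(reply.get())                       # -> read block 9 [served]
```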

  5. Single Thread, Multi Queue Servers

    On systems where process creation is expensive, a multithreaded server is impractical. An alternative is to introduce multiple queues. This was done on DEMOS, and the following simple server structure results:

            repeat
               await message in main mailbox
               (assert it is a request from a client)
               process step 1
               create new mailbox
               send request+newbox to sub server
               await reply in new mailbox
               destroy new mailbox
               process step 2
               send reply to client
            forever
    
    This code is almost identical to the naive (and incorrect) code given earlier in these notes; the big change is the use of a new (short-lived) mailbox for each transaction with a subsidiary server.
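
A Python sketch of this structure, with a fresh queue.Queue playing the role of the short-lived mailbox and a hypothetical subsidiary-server thread doing the work:

```python
import queue
import threading

main_box = queue.Queue()       # the server's main mailbox
to_sub = queue.Queue()

def sub_server():
    # Hypothetical subsidiary server: replies directly into the
    # per-transaction mailbox it was handed with the request.
    while True:
        reply_box, data = to_sub.get()
        reply_box.put(data + ' [done by sub]')

def file_server():
    while True:
        client, data = main_box.get()   # must be a client request
        new_box = queue.Queue()         # create new mailbox
        to_sub.put((new_box, data))     # send request+newbox to sub server
        result = new_box.get()          # await reply in new mailbox
        client.put(result)              # send reply to client
        # the mailbox is destroyed by garbage collection

threading.Thread(target=sub_server, daemon=True).start()
threading.Thread(target=file_server, daemon=True).start()

reply = queue.Queue()
main_box.put((reply, 'read block 2'))
print(reply.get())             # -> read block 2 [done by sub]
```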

    The above code does not exploit the opportunities for parallelism that were exploited by the event processing code or by the multithreaded code, but this can be done if there is a "multiwait" operation, as on DEMOS. In this case, the control structure is that of the event processing code:

            boxset = { main mailbox }
            repeat
               await message in boxset
               case box id of
                 main mailbox:
                   create new mailbox B
                   boxset := boxset + { B }
                   client[B] := client
                   process step 1 relative to B
                   send request+B to sub server
                 other:
                   B := box id
                   process step 2 relative to B
                   send reply to client[B]
                   destroy B
               end case
            forever
    
    The advantage of this scheme over the pure event processing scheme is that the simple code can be used for simple servers, and only if these begin to limit system performance need the code be rewritten as above.
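
Python has no multiwait primitive over a set of queues, but the control structure above can be emulated by having every mailbox in the boxset share one physical queue, with the box id tagged onto each message; this makes the kinship with the event-processing loop explicit. The names below are inventions of the sketch:

```python
import itertools
import queue
import threading

events = queue.Queue()        # all mailboxes in the boxset share one
MAIN = 'main'                 # physical queue; the box id tag plays
to_sub = queue.Queue()        # the role of the multiwait

def sub_server():
    # Hypothetical subsidiary server: replies into mailbox B.
    while True:
        box, data = to_sub.get()
        events.put((box, data + ' [done by sub]'))

def file_server():
    client = {}               # client[B], as in the pseudocode
    ids = itertools.count(1)
    while True:
        box, body = events.get()         # await message in boxset
        if box == MAIN:                  # main mailbox: a client request
            who, data = body
            b = next(ids)                # create new mailbox B
            client[b] = who              # client[B] := client
            to_sub.put((b, data))        # step 1; send request+B to sub
        else:                            # other: reply for transaction B
            client.pop(box).put(body)    # step 2; reply; destroy B

threading.Thread(target=sub_server, daemon=True).start()
threading.Thread(target=file_server, daemon=True).start()

reply = queue.Queue()
events.put((MAIN, (reply, 'read block 7')))
print(reply.get())            # -> read block 7 [done by sub]
```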