22C:116, Lecture 26, Spring 2002

Douglas W. Jones
University of Iowa Department of Computer Science

Multicomputers
A multicomputer consists of a number of computers, typically identical computers, interconnected by a high-speed local network. In order to talk about interprocess communication in such a system, we must have some grounding in networks.
The Intel iPSC family of hypercubes, derived from the Caltech Cosmic Cube, is an example of a multicomputer family. The "stone-soup" cluster of IBM PC compatible machines is another example.
Communication Protocols
A communication protocol is an agreement between the users of a communication medium about how the medium is to be used to convey useful information. For example, a typical asynchronous serial communications protocol might be described as follows:
```
               time ___________________\
      V1      _ _ _ _ _ _ _ _ _ _      /
        _____| |_|_|_|_|_|_|_|_|_|_|_|____
      V0      | 1 2 3 4 5 6 7 8 P \ /
              |      data       |  Stop bits
          Start bit             |
                              Parity bit.
```
The time scale must be specified as part of the protocol, as must the voltages V1 and V0 used to encode one and zero. For example, if someone says they're using an asynchronous RS-232 protocol at 9600 baud, they mean V1 = -15 volts, V0 = +15 volts, and the time per bit is 1/9600 second. In addition, they should specify how many data bits are used per character, whether even or odd parity is used, and whether one or two stop bits follow each character.
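As a rough illustration of this kind of framing, the following Python sketch builds the sequence of logical bits sent for one character. The names frame_character and chars_per_second are hypothetical, and the sketch assumes the usual conventions that the start bit is a logical 0, the stop bits are logical 1, and the data bits are sent least significant bit first; the description above does not fix these details.

```python
def frame_character(data, data_bits=8, parity="even", stop_bits=2):
    """Return the list of logical bit values sent for one character."""
    # Assumption: data bits are sent least significant bit first.
    bits = [(data >> i) & 1 for i in range(data_bits)]
    ones = sum(bits)
    if parity == "even":
        parity_bit = ones % 2            # total number of ones becomes even
    elif parity == "odd":
        parity_bit = (ones + 1) % 2      # total number of ones becomes odd
    else:
        raise ValueError("parity must be 'even' or 'odd'")
    # Assumption: start bit is logical 0, stop bits are logical 1.
    return [0] + bits + [parity_bit] + [1] * stop_bits


def chars_per_second(baud, data_bits=8, stop_bits=2):
    """Maximum character rate: one start bit, the data, one parity bit, stop bits."""
    bits_per_char = 1 + data_bits + 1 + stop_bits
    return baud / bits_per_char


frame = frame_character(ord("A"))
rate = chars_per_second(9600)
```

Under these assumptions, a character with 8 data bits, a parity bit and 2 stop bits occupies 12 bit times, so a 9600 baud line carries at most 800 characters per second.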
The model 15 Teletype, perfected by Kleinschmidt in the 1920's, operated at 75 baud, with 5 bits per character, 1 start bit and 1.5 stop bits; it used a 15 milliamp current loop, where zero was indicated by current flowing, and one was indicated by breaks in the current (the 15 milliamp current loop was standard in telegraphy back into the era of Morse keys and the very first generation of relays).
It is not sufficient to give this low-level protocol. To use a communications line effectively, there must also be an agreement about the interpretation of data. For example, the protocol could specify that the ASCII character set is used.
Even specifying the character set is not sufficient, however. There must be a convention for identifying messages embedded in the string of characters. The ASCII code includes special control characters that were intended for this use. These were originally intended to be used to construct messages with the following format:

SOH -- start of heading
header -- for example, the address of the recipient
STX -- start of text
message text
ETX -- end of text
trailer -- for example, the message checksum
EOT -- end of transmission
Other characters provided by the ASCII code related to this kind of protocol issue are:

ENQ -- enquiry
ACK -- acknowledgement
NAK -- negative acknowledgement
These might be used to enquire about the previously transmitted block. An ACK might indicate that the block was received correctly, and a NAK might indicate that there was an error.
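A minimal sketch of how such a message might be built and checked is given below in Python. The control character values are the standard ASCII codes; everything else is an assumption made for illustration: the function names are hypothetical, the trailer is taken to be a single byte holding the sum of the header and text modulo 256, and the header and text are assumed not to contain any of the control characters.

```python
# Standard ASCII control character codes.
SOH, STX, ETX, EOT, ENQ, ACK, NAK = 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x15


def checksum(payload: bytes) -> int:
    """Hypothetical one-byte checksum: sum of the bytes modulo 256."""
    return sum(payload) % 256


def build_message(header: bytes, text: bytes) -> bytes:
    """Wrap header and text as SOH header STX text ETX trailer EOT."""
    trailer = bytes([checksum(header + text)])
    return (bytes([SOH]) + header + bytes([STX]) + text +
            bytes([ETX]) + trailer + bytes([EOT]))


def check_message(message: bytes) -> int:
    """Return ACK if the message is well formed and the checksum matches, else NAK."""
    if (len(message) < 5 or message[0] != SOH
            or message[-3] != ETX or message[-1] != EOT):
        return NAK
    stx = message.find(STX, 1, len(message) - 3)
    if stx < 0:
        return NAK
    header, text = message[1:stx], message[stx + 1:-3]
    return ACK if checksum(header + text) == message[-2] else NAK


good = build_message(b"B", b"hello")
bad = good[:4] + b"j" + good[5:]   # one text byte damaged in transit
```

Here check_message(good) yields ACK, while the damaged copy yields NAK, which the sender could take as a request to retransmit.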
Protocol Hierarchy
In general, a protocol takes a stream of data and embeds it in a stream containing not only that data but also other information.
```
              ____________________
               DATA   DATA   DATA    higher level data stream
              ____________________
          _________/|      |\_________
         /          |      |          \
  ______________________________________________
   DATA |   stuff   | DATA |   stuff   | DATA     data stream
  ______|___________|______|___________|________ using protocol
```
This added information is not necessarily anything the user wants sent, but it is necessary (under the protocol) for the correct conveyance of the data.
In the discussion of asynchronous data transmission, the start and stop bits, inserted between each byte of data, constituted a low-level protocol, while the SOH, header, STX, ETX, trailer and EOT were added to each block of data by a higher level protocol.
Protocols generally exist at many different levels of abstraction, although early protocols frequently mixed and confused these levels. The ISO (International Organization for Standardization) Open Systems Interconnection model protocol hierarchy clearly separates the levels and helps system designers avoid many of the problems of early protocols.
An example of a confused protocol is IBM's Word Processing EBCDIC protocol from the 1970's. This specified everything from the use of the BISYNC synchronous data transmission method to the use of type-balls on a Selectric printing mechanism. Issues of flow control, communications line management, and character set coding were completely mixed together.
If one says RS-232, one is specifying a low-level protocol for conveying ones and zeros. If one says Asynchronous, 56 kilobaud, one is specifying a somewhat higher level protocol for conveying a stream of bytes over some low level serial protocol. If one says ASCII, one is specifying an even higher level but somewhat mixed up protocol for the interpretation of those bytes. The problem with saying ASCII is that the ASCII (or ISO-7, as it is now officially known) character set specifies both a set of interpretations of bytes for printable characters and a set of control characters that were originally intended for various higher level protocols but are, today, rarely used as intended.
An open system is a system where the components are made by different manufacturers and combined by the user in ways not always anticipated by the manufacturers. Open systems only work well when there are standards for interconnecting the components. Some standards, like the Centronics printer interface (more widely known as the IBM PC parallel port) or PostScript, are the result of a single company leading the marketplace and being copied by others; other standards, such as ASCII and the ISO OSI suite, are the result of committee actions that span many users and manufacturers.
The ISO OSI Reference Model has seven layers, as follows:

Application
A protocol agreed on by the end users for their application.

Presentation
The mechanisms by which the lower level protocols are presented to the application. This is, to a large extent, a matter of packaging.

Session
Protocols for connecting one application to another, establishing data streams or user level addressing of data packets.

Transport
Protocols for moving data from logical sender to logical receiver. Note that there may be many logical senders, for example, processes, on one physical sender, for example, a computer.

Network
Protocols for moving data from physical sender to physical receiver. This may involve forwarding over several distinct network links.

Data Link
Protocols for moving data over a single link in the network.

Physical
Protocols at the electronic, electrical and mechanical levels.
The physical layer
Examples:
- RS-232 Asynchronous data at 9600 baud using DB25 connectors,
  - 1 start bit
  - 8 data bits
  - 1 parity bit
  - 2 stop bits
- 50 Ohm BNC baseband Ethernet (thin wire) with grounded shield, terminated, running at 10 megabaud.
In both of the above examples, the type of the physical network connection is given (DB25 is a 25-pin D-subminiature connector, and "50 ohm BNC" is a type of coaxial connector). Both specify voltages used for signalling (RS-232 specifies +15 volts for logical 0, -15 volts for logical 1; the baseband Ethernet standard includes a set of voltage assignments). Both specifications include a basic interpretation of the data on the line, in terms of how to identify the start and stop of data and how to encode the individual bits.
In fact, we can distinguish sublayers here! For example, Ethernet interface hardware that was designed for use with coaxial cable can also be used with 10BASE-T wiring by introducing a simple adapter, and the same RS-232 wiring that will work for asynchronous data can be used for synchronous data.
The link layer
Consider a point to point data format where data is formed into blocks with the following structure based on the protocol suggested by ASCII's control characters:
```
                >>--time-->>
   ________________________________________ 
  |_|_______|_|_______________|_|______|_|_|
  SOH   |   STX       |       ETX  |  EOT ENQ
       head      n data bytes      |
         n                       checksum
         (number of bytes)         CRC-16 computed over
                                   the head and data.
```
The problem with data that contains characters that are accidentally meaningful to the protocol is a consequence of in-band signalling. This term comes from telephony. A protocol using out-of-band signalling relies on two separate communications channels, one to send user data, and one to send the data necessary to control the communications link.
With in-band signalling, both user data and control information are sent over the same channel, and unless care is taken, there are problems that can arise when the two are confused.
In the above example, it is important to make sure that the following control characters are not present in the data:
```
           ENQ EOT ETX STX SOH ACK NAK
```
Typically, if these are found in the data, the transmission software must substitute something else, such as an escape sequence (for example, ESC-1 for ENQ, ESC-2 for EOT, and so on up to ESC-8 for ESC itself), in order to prevent the data from corrupting the protocol.
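The substitution can be sketched in Python as follows. Only the codes for ENQ, EOT and ESC are given above; the sketch assumes that the remaining digits follow the order in which the control characters were listed, and the names stuff and unstuff are hypothetical.

```python
ESC = 0x1B

# Reserved characters in the order ENQ EOT ETX STX SOH ACK NAK, then ESC.
# Assumption: the digit after ESC is the position in this list, '1' to '8'.
RESERVED = [0x05, 0x04, 0x03, 0x02, 0x01, 0x06, 0x15, ESC]
ENCODE = {ctrl: bytes([ESC, ord("1") + i]) for i, ctrl in enumerate(RESERVED)}
DECODE = {code[1]: ctrl for ctrl, code in ENCODE.items()}


def stuff(data: bytes) -> bytes:
    """Replace each reserved character with its two-byte escape sequence."""
    out = bytearray()
    for b in data:
        out += ENCODE.get(b, bytes([b]))
    return bytes(out)


def unstuff(data: bytes) -> bytes:
    """Undo stuff(): turn each escape sequence back into the original character."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            out.append(DECODE[next(it)])   # the byte after ESC selects the character
        else:
            out.append(b)
    return bytes(out)


raw = bytes([0x41, 0x05, 0x1B, 0x42])      # 'A', ENQ, ESC, 'B'
stuffed = stuff(raw)
```

After stuffing, the only reserved character that can appear in the data is ESC, and it always introduces a two-byte sequence, so the receiver can tell protocol characters from user data. The price is that the data grows by one byte for each reserved character it contains.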
The touch-tone signalling mechanism used in telephony is an example of in-band signalling. In the early 1970's, Cap'n Crunch cereal (made in Cedar Rapids) was sold with a small whistle in each box. This whistle, unfortunately, was tuned to a signalling frequency used in long distance telephone lines (2600 cycles). The effect of injecting this signal into a line was to cause the remote long-distance exchange to terminate the connection and listen for touch-tone signals encoding the new destination being called. Unfortunately for the telephone companies, the billing for the long-distance call ended with the 2600 cycle tone, and the new long-distance call was made at no charge to the customer.
When the telephone companies discovered their error, they got Quaker Oats to discontinue their promotional giveaway, and over the decade that followed, they moved to out-of-band signalling.
The old hacker magazine 2600 took its name from this bit of history.
The network layer
```
          * Absolute Addressing

            >>-- transmission direction -->>
             _______________________
            |______________|_|______|
               data         |  fixed size address
                        bytes
                       of data

          * Path Addressing

            >>-- transmission direction -->>
             _________________________________
            |______________|_|____|____|____|_|
               data         |   variable     number of
                        bytes   sized        address
                       of data  address      components
```
With absolute addressing, the sender specifies the name of the destination machine, and it is up to the network layer to find a route through the network to get to that destination. Each machine must therefore have some routing algorithm, typically based on routing tables, used to map the final destination address to the locally available set of outgoing links. Maintaining the routing tables is expensive.
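A minimal Python sketch of table-driven forwarding follows. The four-machine network A, B, C, D connected in a line, the table contents and the function name route are all invented for illustration.

```python
# Hypothetical network: A - B - C - D connected in a line.
# ROUTING_TABLES[here][destination] names the neighbor to forward to.
ROUTING_TABLES = {
    "A": {"B": "B", "C": "B", "D": "B"},
    "B": {"A": "A", "C": "C", "D": "C"},
    "C": {"A": "B", "B": "B", "D": "D"},
    "D": {"A": "C", "B": "C", "C": "C"},
}


def route(source, destination):
    """Return the machines a message visits on its way to the destination."""
    visited = [source]
    here = source
    while here != destination:
        # Each machine consults only its own table to pick an outgoing link.
        here = ROUTING_TABLES[here][destination]
        visited.append(here)
    return visited


hops = route("A", "D")
```

Note that every machine needs an entry for every destination, so any change in the network may require updating tables on many machines; this is one source of the expense mentioned above.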
With path addressing, the sender specifies a path to the destination, and the network layer sends the data to the first machine on the path, at which point the first machine strips off its address and forwards the message to the first machine on what is left of the path. The sending machine must therefore know the topology of the network in order to design appropriate paths. This is expensive knowledge.
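Path addressing can be sketched in Python in much the same way. The link table, the machine names and the function name send_along_path are hypothetical; the point is only that each machine looks at the first remaining address component and needs no routing table of its own.

```python
# Hypothetical network: A - B - C - D connected in a line.
LINKS = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}


def send_along_path(sender, path):
    """Forward a message along an explicit path; return the machines visited."""
    visited = [sender]
    here = sender
    remaining = list(path)
    while remaining:
        next_hop = remaining[0]
        if next_hop not in LINKS[here]:
            raise ValueError(f"{here} has no link to {next_hop}")
        here = next_hop              # transmit over the link to the next machine
        remaining = remaining[1:]    # that machine strips off its own address
        visited.append(here)
    return visited                   # the path is empty: the message has arrived


hops = send_along_path("A", ["B", "C", "D"])
```

A sender that gets the path wrong, for example by naming a machine that is not a neighbor of the previous one, finds out only when forwarding fails.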
In any case, the network layer is concerned with routing the data from one machine to another.
The transport layer
The transport layer deals with movement of data between logical senders and receivers. Thus, each machine on the network may have more than one logical destination for messages.
For example, data may be transported between processes, or it may be transported from a sending process to a named network socket, an abstract named destination -- a process may be able to receive information from more than one socket.
```
  A stream between sockets

   >>-- transmission direction -->>
    ______________________________
   |______________|_|______|______|
      data         |   |     socket
               bytes   |     number
              of data  |
                       sequence number
```
The layers between the hardware and the transport layer don't necessarily guarantee that packets of data will arrive at their destination in the order in which they were sent. If this order matters, the transport layer must add sequence numbers to each outgoing message and it must sort incoming messages into order.
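One simple way to sort incoming messages into order is sketched below in Python. The class name Resequencer is hypothetical, and the sketch assumes sequence numbers start at zero, never wrap around, and that no message is lost.

```python
class Resequencer:
    """Hold out-of-order messages until the gaps before them are filled."""

    def __init__(self):
        self.next_seq = 0     # sequence number of the next message to deliver
        self.pending = {}     # messages that arrived early, keyed by sequence number

    def receive(self, seq, data):
        """Accept one message; return the messages that can now be delivered in order."""
        self.pending[seq] = data
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


r = Resequencer()
first = r.receive(1, "b")     # arrives early, must wait for message 0
second = r.receive(0, "a")    # fills the gap, so both can be delivered
third = r.receive(2, "c")
```

A practical transport layer would also need a bound on the sequence number field and some way to recover from lost messages, neither of which is attempted here.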
The transport layer also multiplexes messages from multiple logical sources on one machine, and it demultiplexes messages addressed to different destinations on the machine. One way of identifying the source (or destination) of a message is by socket number. Sockets identify logical destinations of a message, not the machine to which the message is addressed. It is up to the transport layer to determine (for the network layer) where the message should go, physically.
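Demultiplexing by socket number might look like the following Python sketch. The class name TransportDemux and the representation of a message as a (socket number, data) pair are assumptions made for illustration.

```python
from collections import defaultdict, deque


class TransportDemux:
    """Sort messages arriving at one machine into per-socket queues."""

    def __init__(self):
        self.queues = defaultdict(deque)   # socket number -> waiting messages

    def deliver(self, packet):
        """Called for each message handed up by the network layer."""
        socket_number, data = packet
        self.queues[socket_number].append(data)

    def receive(self, socket_number):
        """Called by a process; returns the oldest waiting message, or None."""
        queue = self.queues[socket_number]
        return queue.popleft() if queue else None


demux = TransportDemux()
for packet in [(7, "to socket 7"), (9, "to socket 9"), (7, "more for 7")]:
    demux.deliver(packet)
```

Each process then reads only from its own sockets, so messages for different logical destinations on the same machine are kept apart.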
The session layer
The name of the session layer suggests that its inventors expected that this layer would implement interactive sessions between users on remote machines. Typically, the transport layer manages the delivery of messages from logical sender to logical receiver, but the session layer is given the job of organizing these into streams of bytes.
Not all applications need streams, though, so an alternate session layer might organize data into remote procedure calls or other transaction oriented structures.
Protocol Bloat
Protocol layering can cause a problem known as protocol bloat. This is because, at each layer in the protocol, it is tempting to include such information as block size, checksum, and similar information. If the same problem is solved at many different layers in the protocol, the result will be considerable extra network traffic!
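To get a feel for the cost, consider a hypothetical stack in which each layer adds its own length field and checksum. The numbers in this Python sketch (4 layers, 4 bytes of overhead per layer, a 32 byte payload) are invented for illustration and do not describe any particular protocol suite.

```python
def wire_size(payload_bytes, layers, overhead_per_layer):
    """Bytes actually transmitted when every layer adds its own fields."""
    return payload_bytes + layers * overhead_per_layer


def efficiency(payload_bytes, layers, overhead_per_layer):
    """Fraction of the transmitted bytes that is user data."""
    return payload_bytes / wire_size(payload_bytes, layers, overhead_per_layer)


# Hypothetical case: 32 bytes of data, 4 layers, each adding a
# 2 byte length and a 2 byte checksum.
total = wire_size(32, 4, 4)
fraction = efficiency(32, 4, 4)
```

With these made-up figures, 48 bytes are sent to convey 32, so a third of the traffic is overhead that duplicates work done at other layers.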
The ISO OSI model focuses on standardizing the data stream at each level in the protocol hierarchy. This is probably necessary for open systems, where components at each level may come from different sources, but it is not a good software engineering methodology for constructing an integrated system.
It is not good software engineering practice to focus on data structures, particularly at an early stage in the design. It is far better to focus on functional decomposition of a problem first, and then to concentrate on the abstract components needed.
Hierarchies are useful, but design in terms of functional layers! Think about the transparency or opacity of layers!
Transparent layers add function without adding overhead. The concept of transparency originated in a paper by D. L. Parnas in the early 1970's.
An opaque layer in a hierarchical design hides the details of lower levels in the design. Opacity is useful when one goal is to allow the lower level to be changed with no effect on upper levels.
A transparent layer allows the facilities of a lower level to be used directly by upper levels, without any need to, for example, call on procedures or functions at an intermediate level which forward the request to a lower level.
Transparent layers allow high performance. Typically, layers that add functionality but don't provide for implementation independence should be transparent. Layers that provide for implementation independence but add no functions should be opaque, and layers that do both should be partially opaque.
In the context of the ISO OSI protocol suite, a transparent session layer is a good idea. Once this establishes a new channel at the transport level, the application should be able to directly use the lower level protocol. With appropriate modularization, the transport layer at the sending end of a logical link can even let the sender deal directly with the network layer.
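The difference between the two styles can be sketched in Python as follows. All three classes are hypothetical stand-ins; the transport here merely records what it was asked to send.

```python
class Transport:
    """Stand-in for a transport-level channel; it just records what is sent."""

    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)


class OpaqueSession:
    """Opaque layer: the application never sees the transport.
    Every send passes through a forwarding call at this level."""

    def __init__(self, transport):
        self._transport = transport

    def send(self, data):
        self._transport.send(data)      # extra call on every message


class TransparentSession:
    """Transparent layer: sets up the channel, then hands the
    transport-level object directly to the application."""

    def __init__(self, transport):
        self._transport = transport

    def open(self):
        return self._transport          # application uses the lower level directly


transport = Transport()
OpaqueSession(transport).send(b"via forwarding call")
channel = TransparentSession(transport).open()
channel.send(b"sent directly")
```

The opaque version lets the transport be replaced without the application noticing, at the cost of a forwarding call per message; the transparent version avoids that cost but exposes the transport interface to the application.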