Homework 10

22C:116, Spring 1997

Due Friday Apr. 18, 1997, in class

Douglas W. Jones

Background: Consider the problem of distributed deadlock detection in a system where processes communicate by messages over Demos style links. Each process has a list of links over which it may send messages, and each link designates a particular incoming message queue of a process. The only way a user process may block is by awaiting a message on one or more incoming message queues.
Assume that there are multiple host machines running user processes, and that each has a local operating system kernel. Assume that, when a kernel detects that a process is blocked, it may initiate a deadlock detection algorithm. The deadlock detection algorithm must not involve communication with user processes, but must only involve communication between the kernels involved. The only process state information used by the algorithm must be information available to the kernel.
Assume that message delivery is instantaneous (the only delays are computational) and that all links of a process and the contents of all messages in message queues are available to the local kernel. Assume that all system components are reliable.
The Problem, part A: Describe a distributed deadlock detection algorithm that would work in this context. This algorithm must not rely on timeout, nor may it use any data structures other than those described above! It will not be efficient!
The Problem, part B: What improvements to the above algorithm are possible if kernels maintain an inverted data structure -- that is, in addition to the link table of each process, the kernels maintain a table, with each incoming message queue, of which processes have links to that queue.
The Problem, part C: Discuss the additional kernel to kernel message traffic required to maintain this inverted data structure as links are manipulated by user programs.
Background: See page 439-440 of the text.
The Problem: Explain how scatter-gather primitives in the DMA hardware used to send messages can significantly reduce the overhead imposed by a protocol hierarchy.
Background: Consider a system where failures are a regular occurance, for example, a space probe in the radiation belts of Jupiter, where radiation frequently causes the values of individual bits in registers or memory to change. The computer systems on such probes are usually composed of networks of independant machines, where all machines periodically check the correct operations of their neighbors and all machines are designed so that their neighbors can reboot them in case of transient failure. Every machine in such a system is expected to suffer many transient failures over the life of the system, but the system itself is expected to survive.
On such a system, multiple copies must be saved for any data that is to be preserved through any length of time, and these copies must constantly be monitored to assure that corrupted copies are discarded and replaced with replicas of surviving copies.
The problem: Propose a reasonable approach to implementing stable storage in such a setting.
Background: Consider a system where the only message passing primitives in the system are blocking, that is, the send(m,r) operation blocks until the message m has been correctly received by r, and the receive(m) operation blocks until a message is delivered to buffer m.
Assume the primitive spawn(p,x,y...) creates a new thread running procedure p(x,y...), and that the primitive die() allows a thread to kill itself. Threads may synchronize using thread semaphores with the operations thP(s) and thV(s). If no memory is available, spawn(...) will block until there is enough memory to create the new thread.
The problem, part A: Write a procedure called nbsend(...) that allows a program to send a message without being blocked until that message is received. Consider the following example application:
```
                repeat
                    i = i+1;
                    nbsend(i,...)
                forever
```
This should send all of the integers to the recipient, and it should block only if the memory capacity of the sending system is filled with copies of the message and possibly with other ephemera.
The problem, part B: What assumptions do you have to make about the underlying system for your answer to part A to guarantee that the messages are received in the same order that they were sent?
The problem, part C: Describe a protocol that could be used on the underlying system to implement the blocking send and receive primitives.