Homework 10 Solutions

22C:116, Spring 1997

Ioana M. Lungeanu
PROBLEM 1 PART A

        Each process has a QueueTable from which it may receive
    messages on links it has created, and a LinkTable which it
    may use to transmit messages.  Each link in the LinkTable
    designates some queue in the QueueTable of some process.
        Multiple processes may send to a queue, and a process waiting
    on a queue can be unblocked by a message from any one of them,
    so this is an or-model problem.  Therefore, the deadlock
    detection algorithm tries to find a knot in the wait-for graph.
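        As a rough illustration only, the per-process tables might
    look like the following C sketch; the structure and field names
    here are assumptions made for this illustration, not part of the
    problem statement.

    #define MAX_LINKS  64                   /* arbitrary size for the sketch  */

    struct link {                           /* one entry in a LinkTable       */
        int dest_machine;                   /* kernel managing the target     */
        int dest_process;                   /* process that owns the queue    */
        int dest_queue;                     /* index into its QueueTable      */
    };

    struct process_tables {
        struct link link_table[MAX_LINKS];  /* links this process may send on */
        int         queue_count;            /* queues it may receive on,      */
                                            /* numbered 0 .. queue_count-1    */
    };
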
        A kernel that detects that one of the processes on its
    machine has been waiting for some time initiates the deadlock
    detection algorithm.  The kernel constructs a link for each
    queue in that process's QueueTable on which the process is
    blocked and broadcasts these links to all the other kernels in
    the system, with a message saying "These links may be involved
    in a deadlock."  This message carries a unique ID that
    identifies this round of deadlock detection.
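        Reusing the link structure sketched above, the broadcast
    inquiry might carry something like the following; again, the
    field names are invented for this sketch.

    struct deadlock_inquiry {
        int         round_id;               /* unique ID for this round       */
        int         origin_kernel;          /* kernel that sent this inquiry  */
        int         link_count;             /* number of suspect links below  */
        struct link links[MAX_LINKS];       /* queues the process waits on    */
    };
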
        On receiving such a broadcast message, a kernel checks the
    LinkTable of each process it manages.  If one of the links from
    the message is in that table, it checks the state of that
    process.  If the process is running (not blocked), it sends a
    reply message saying "no deadlock".  If the process is blocked
    and has not been queried in this round of deadlock detection,
    it broadcasts a deadlock detection inquiry for all of the
    queues that process is blocked on.  In this inquiry, it uses
    the unique ID previously established.
        If a kernel receives a deadlock inquiry concerning a process
    for which it has already begun deadlock detection in this round,
    it records the origin of the inquiry (in order to send an
    eventual reply) but does not forward the inquiry again.
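        A sketch of the receiving kernel's handler, under the same
    assumptions as above; holds_link, is_blocked, already_queried,
    reply_no_deadlock, broadcast_waits_of and record_inquirer are
    hypothetical helpers standing in for kernel internals.

    extern int  process_count;              /* processes on this machine */
    extern int  holds_link(int p, const struct deadlock_inquiry *q);
    extern int  is_blocked(int p);
    extern int  already_queried(int p, int round_id);
    extern void reply_no_deadlock(int to_kernel, int round_id);
    extern void broadcast_waits_of(int p, int round_id);
    extern void record_inquirer(int kernel, int round_id);

    void on_inquiry(const struct deadlock_inquiry *q) {
        for (int p = 0; p < process_count; p++) {
            if (!holds_link(p, q))
                continue;                            /* not relevant here      */
            record_inquirer(q->origin_kernel, q->round_id); /* for a reply later */
            if (!is_blocked(p))
                reply_no_deadlock(q->origin_kernel, q->round_id);
            else if (!already_queried(p, q->round_id))
                broadcast_waits_of(p, q->round_id);  /* forward, same round ID */
            /* else: already handled this round; the origin is now recorded */
        }
    }
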
        Once a deadlock check is initiated for a process, the
    kernel on that machine awaits a reply of "no deadlock" from the
    other machines.  When such a reply arrives, the kernel forwards
    it to every kernel that had sent it an inquiry in this round.
    This guarantees that all processes not involved in a deadlock
    will learn this fact.
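        The forwarding of a "no deadlock" reply might then look
    like this sketch, where inquirer_count, inquirer and
    send_no_deadlock are again assumed helpers over the record of
    inquiry origins kept above.

    extern int  inquirer_count(int round_id);
    extern int  inquirer(int round_id, int i);   /* i-th kernel that asked us */
    extern void send_no_deadlock(int kernel, int round_id);

    void on_no_deadlock(int round_id) {
        /* pass the reply back along the chain of inquiries, so that
           everyone not involved in the deadlock eventually hears it */
        for (int i = 0; i < inquirer_count(round_id); i++)
            send_no_deadlock(inquirer(round_id, i), round_id);
    }
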
        It is harder to guarantee that all processes involved in
    a deadlock are informed of this in a finite time!  If no
    "no deadlock" message is received within some time limit,
    a deadlock could be declared, but this isn't formally
    adequate.
        If we require that every kernel reply to each broadcast
    inquiry involved in deadlock detection, we can get a formally
    correct solution.  The replies are either "not involved",
    meaning that this kernel has no processes holding any of the
    links named in the inquiry; "no deadlock", as above; or
    "blocked" if, for every matching blocked process running under
    this kernel, all replies have been received and none of them
    say "no deadlock".
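        The three possible replies of this refined scheme could be
    represented as follows; the names are invented for this sketch.

    enum inquiry_reply {
        NOT_INVOLVED,   /* no process here holds a link named in the inquiry  */
        NO_DEADLOCK,    /* some such process is running, not blocked          */
        BLOCKED         /* every such process is blocked, all of its own
                           replies are in, and none of them said NO_DEADLOCK  */
    };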
    
PROBLEM 1 PART B

        If the kernels maintain, for each incoming message queue, a
    table of which processes have links to that queue, then instead
    of broadcasting inquiries to every other kernel, messages will
    be transmitted only to those kernels that are relevant, and
    those messages will identify which processes are involved.
        The solution given in part A scales poorly because the
    amount of broadcast traffic grows with the system size.  The
    solution involving inverted link tables scales far better
    because each process will typically have to pass on deadlock
    inquiries to a limited number of other processes and await
    replies from only those processes.
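        A sketch of the inverted table, under the same assumptions
    as in part A: for each incoming queue, the kernel records which
    (kernel, process) pairs hold links to it, and sends inquiries
    only to those kernels.

    #define MAX_SENDERS 32                  /* arbitrary size for the sketch */

    struct sender { int kernel; int process; };

    struct queue_senders {
        int           count;
        struct sender holders[MAX_SENDERS]; /* who may send to this queue    */
    };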

PROBLEM 1 PART C

        When a link is moved to a new process's link table (by
    being included in a message), or when a link is deleted from
    the link table of some process, a message must be sent to the
    kernel managing the process that is the target of the link,
    informing it of the change so that the inverted tables of
    part B stay up to date.
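        The maintenance message might look like the following
    sketch; the field names are assumptions for this illustration.

    struct link_update {
        struct link which;              /* the link, hence the target queue     */
        int         holder_kernel;      /* kernel of the gaining/losing process */
        int         holder_process;
        int         added;              /* 1 = link gained, 0 = link deleted    */
    };
    /* the target queue's kernel applies this to its queue_senders table */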

PROBLEM 2

        If scatter-gather primitives in the DMA hardware are
    available, the actual data need not be copied several times as
    different headers and/or trailers are attached.  Headers,
    trailers and data can be picked up from different memory
    addresses and assembled or disassembled as necessary, without
    multiple copies of the common parts.  This only works for
    layers that do not change the actual data; for example, in the
    ISO OSI protocol stack, the presentation layer might perform
    data conversion, which forces a copy to be made.
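        As a minimal sketch, not any particular controller's
    format, a gather list for one outgoing frame might be built as
    follows; the buffer names are invented for this illustration.

    #include <stddef.h>

    struct sg_entry {
        void   *addr;                       /* start of this fragment */
        size_t  len;                        /* length of the fragment */
    };

    extern char   header[], payload[], trailer[];
    extern size_t header_len, payload_len, trailer_len;

    void build_frame(struct sg_entry frame[3]) {
        frame[0].addr = header;   frame[0].len = header_len;   /* protocol header */
        frame[1].addr = payload;  frame[1].len = payload_len;  /* untouched data  */
        frame[2].addr = trailer;  frame[2].len = trailer_len;  /* e.g. a checksum */
        /* the DMA engine gathers the three fragments into one frame on the wire */
    }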

PROBLEM 3

        In a system in which every machine is expected to suffer
    many transient failures, stable storage could be implemented
    as follows:
        In addition to duplicating the stable storage on machines
    that are unlikely to have correlated failures, the stable
    storage must be checked periodically.  If, on checking, one
    of the copies is found to have been corrupted (as evidenced
    by a parity error or an invalid checksum) a correct copy is used
    to fix the error.
        There must be at least two processes prepared to make
    periodic checks (since the transient failures could kill one
    of these processes), and if one of these processes detects
    that the other is dead, it must restart it.
        Given that failures of any particular memory cell occur
    randomly, on average once every f seconds, copies should be
    checked somewhat more frequently than once every f seconds.
    With a sufficiently high rate of checking or a sufficiently
    large number of copies, overall system reliability may be
    made arbitrarily high.
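        A sketch of one of the periodic checker processes, assuming
    ncopies replicas each protected by a checksum; copy_ok,
    repair_from, peer_alive, restart_peer and sleep_seconds are
    hypothetical helpers, and check_period should be well under f.

    extern int  copy_ok(int c);                  /* parity/checksum test      */
    extern void repair_from(int bad, int good);  /* overwrite bad with good   */
    extern int  peer_alive(void);                /* is the other checker up?  */
    extern void restart_peer(void);
    extern void sleep_seconds(int s);

    void checker(int ncopies, int check_period) {
        for (;;) {
            int good = -1;
            for (int c = 0; c < ncopies; c++)    /* find one intact copy      */
                if (copy_ok(c)) { good = c; break; }
            for (int c = 0; c < ncopies; c++)    /* repair any corrupted ones */
                if (good >= 0 && !copy_ok(c))
                    repair_from(c, good);
            if (!peer_alive())                   /* the two checkers also     */
                restart_peer();                  /* watch each other          */
            sleep_seconds(check_period);
        }
    }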

PROBLEM 4 PART A

        The pseudo-code for a nonblocking send using a blocking
    send and threads:

    nbsend(i,r) {
        spawn(proc, i, r);  /* start a thread to do the send, then return at once */
    }

    proc(i,r) { /* runs as a separate thread */
        send(i,r);          /* the blocking send; only this thread waits */
        die();              /* the thread terminates once the send completes */
    }
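        A concrete sketch of the same idea using POSIX threads;
    here the argument packing and blocking_send are assumptions
    standing in for the system's real blocking primitive.

    #include <pthread.h>
    #include <stdlib.h>

    extern void blocking_send(int link, void *msg);   /* assumed primitive */

    struct send_args { int link; void *msg; };

    static void *sender(void *arg) {
        struct send_args *a = arg;
        blocking_send(a->link, a->msg);     /* may block; only this thread waits   */
        free(a);
        return NULL;                        /* thread exits, like die() above      */
    }

    void nbsend(int link, void *msg) {
        struct send_args *a = malloc(sizeof *a);
        a->link = link;
        a->msg  = msg;
        pthread_t t;
        pthread_create(&t, NULL, sender, a);
        pthread_detach(t);                  /* the caller never waits for the send */
    }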

PROBLEM 4 PART B

        For the messages to be sent in the intended order, the
    underlying system must schedule the threads to run in the
    order of their creation.  For the messages to actually be
    received in the order of transmission, the communication
    links must also be FIFO.

PROBLEM 4 PART C

        Consider the following pseudo-code for send and receive on
    blocking message exchange ports:

        Blocking send( msg, queue )
            transmit [msg,return-address] to queue
            await reply

        Blocking receive( msg, queue )
            await [msg,return-address] from queue
            send reply to return-address

    A timeout-acknowledge scheme can be added to the send side, as
    sketched below; this requires attaching serial numbers to the
    messages so that retransmitted duplicates can be recognized
    and discarded.
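        A sketch of the send side with timeouts, serial numbers and
    retransmission; transmit and await_reply_with_timeout are
    hypothetical primitives assumed for this illustration.

    static int next_serial = 0;

    extern void transmit(int queue, int serial, void *msg);
    extern int  await_reply_with_timeout(int serial, int timeout_ms);

    void reliable_send(int queue, void *msg, int timeout_ms) {
        int serial = next_serial++;   /* lets the receiver recognize duplicates  */
        for (;;) {
            transmit(queue, serial, msg);
            if (await_reply_with_timeout(serial, timeout_ms))
                return;               /* acknowledgement (reply) arrived in time */
            /* timed out: retransmit with the same serial number */
        }
    }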