Homework 7

22C:116, Fall 2001

Due Friday Oct 12, 2001, in class

Douglas W. Jones

Always, on every assignment, please write your name legibly as it appears on your University ID and on the class list!

  1. Problems from the text:
    a) Do problem 5 on page 449.
    b) Do problem 9 on page 450.
    c) Do problem 19 on page 450.
    d) Do problem 24 on page 451.
    e) Do problem 30 on page 451.

  2. Background: The Xerox Distributed File System was an experimental file system with the following basic operations:

    read(file,sector,buffer)
    write(file,sector,buffer)
    -- file, the number of the file to access
    -- sector, the sector number of the file to access
    -- buffer, the address of the buffer in main memory
    The buffer size was implicit, equal to the sector size and fixed by the system. File and sector numbers were simple integers. Sector numbers were arbitrary. Writing a previously nonexistent file creates that file. Writing a previously nonexistent sector of a file creates that sector. Reading a nonexistent file returns an error indication, as does reading a nonexistent sector; in these cases, the buffer is cleared.
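    These semantics can be sketched with a small in-memory model (this is an illustration only, not the actual Xerox implementation; the dict-based store and the 4K sector size are assumptions):

```python
SECTOR_SIZE = 4096          # implicit, fixed by the system (assumed value)

store = {}                  # maps (file, sector) -> bytes; stands in for the disk

def write(file, sector, buffer):
    # Writing a nonexistent file or sector simply creates it.
    store[(file, sector)] = bytes(buffer[:SECTOR_SIZE])

def read(file, sector, buffer):
    # Reading a nonexistent file or sector is an error: the buffer
    # is cleared and an error indication is returned.
    data = store.get((file, sector))
    if data is None:
        buffer[:] = bytes(SECTOR_SIZE)   # all zero bytes
        return False
    buffer[:] = data
    return True
```

    Note that there is no separate "create" operation anywhere in this model; a file comes into existence as a side effect of the first write, exactly as the text describes.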

    Note that the basic file system had no textual file names and no directory structure! This was deliberate! In the implementation of this system, the prologue of each disk sector contained the file number and sector number of the associated data, so that, if performance were not the issue, a linear search of the entire disk could always be used to find the disk address of any sector of any file.
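    The fallback linear search can be sketched as a scan over per-sector prologues (the field names and the list-of-dicts disk model are assumptions for illustration):

```python
# Each physical sector carries a prologue naming the logical
# (file, sector) pair whose data it holds.
disk = [
    {"file": 3, "sector": 0, "data": b"directory data"},
    {"file": 7, "sector": 2, "data": b"payload"},
]

def find_sector(file, sector):
    """Linear search of the entire disk for one logical sector.

    Returns the physical disk address (here, a list index),
    or None if no sector claims that identity."""
    for addr, s in enumerate(disk):
        if s["file"] == file and s["sector"] == sector:
            return addr
    return None
```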

    The Question: A linear search for some sector of some file is inefficient! One alternative would be to treat the file number as the I-number of the file, and use an I-table, as in UNIX, to locate all of the sectors of each file. Another alternative, the one chosen by Xerox, was to concatenate the file number and sector number, and then use the resulting double integer as the index into a B-tree, stored on disk, used to gain fast access to the sectors of any file.
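    The concatenated key can be sketched as follows (a sketch of the key construction only, not of the Xerox on-disk B-tree). Because the file number occupies the high-order bits, all sectors of one file sort adjacently in the B-tree:

```python
def btree_key(file, sector):
    # Concatenate two 32-bit integers into one 64-bit B-tree key,
    # file number in the high half so a file's sectors sort together.
    assert 0 <= file < 2**32 and 0 <= sector < 2**32
    return (file << 32) | sector
```

    With this ordering, a range scan over the keys for one file number visits every sector of that file in sector-number order.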

    Assuming that the disk has just been mounted on the host computer, how many read accesses would you expect it to take to find some sector of some file? Assume a 1-gigabyte disk with 4K bytes per sector, that both the sector number and the file number are 32-bit integers, and that the disk is about half full, with the average file size being 8K but some few files being very large.

  3. The Xerox Distributed File System had a directory manager. The use of this directory manager was optional! It was not necessary to use this part of the system, but most users preferred to deal with files by their textual names! The basic directory manager services were:

    lookup(filename,directory)
    -- filename, the textual name of a file.
    -- directory, the file number of the directory to look in.
    If the indicated name is found in the indicated directory, the corresponding file number is returned. If it is not found, an error indication is returned.

    define(filename, filenumber, directory)
    -- filename, the textual name of a file.
    -- filenumber, the file number for that file.
    -- directory, the file number of the directory to define it in.
    The entry in the indicated directory is either created or updated so that the directory associates the indicated name with the indicated number. Define always stores the file name and the directory number in sector 0 of the file.
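    Taken together, the two services behave like a mutable name-to-number map per directory. A minimal model (the dict-per-directory representation is an assumption about the logical behavior only, not the on-disk form, and the sector-0 bookkeeping is not modeled):

```python
directories = {}    # directory number -> {filename: filenumber}

def define(filename, filenumber, directory):
    # Create or update the entry so the directory associates the
    # name with the number.  (The real system also records the name
    # and directory number in sector 0 of the file itself.)
    directories.setdefault(directory, {})[filename] = filenumber

def lookup(filename, directory):
    # Returns the file number, or None as the error indication.
    return directories.get(directory, {}).get(filename)
```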

    Directories on this system were just a type of file, referenced by the file number of that file. By convention, directory 0 was the root directory of the entire system. Because sector 0 of each file always contains the directory information for that file, lookup can always find any file by a linear search of the entire disk, checking all sectors that claim to be sector 0 of some file until there is a match, and then returning the file number of that sector. Of course, this is not a desirable implementation, so the contents of each directory were stored in the indicated file in a form allowing rapid lookup.
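    The recovery property this gives can be sketched directly: a linear scan of every sector that claims to be sector 0 rebuilds a directory's name-to-number mapping without consulting the directory file itself (the field names and the list-of-dicts disk model are assumptions for illustration):

```python
# Surviving sectors after a crash; only sector-0 prologues matter here.
surviving = [
    {"file": 5, "sector": 0, "name": "paper.txt", "parent": 0},
    {"file": 5, "sector": 1, "name": None, "parent": None},
    {"file": 9, "sector": 0, "name": "notes", "parent": 0},
]

def rebuild_directory(directory):
    """Linear scan of the disk: find every file whose sector 0
    names the given directory as its parent."""
    return {s["name"]: s["file"]
            for s in surviving
            if s["sector"] == 0 and s["parent"] == directory}
```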

    Note that most users of this file system used a slightly higher level service, open. Open parsed a pathname into component names, and then looked up each component, starting in directory zero with the first path component and continuing downward toward the indicated file.
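    The open service can be sketched as repeated lookup starting in directory 0 (the '/'-separated path syntax and None-as-error convention here are assumptions):

```python
def open_path(pathname, lookup):
    """Resolve a '/'-separated pathname to a file number by
    repeated lookup, starting in directory 0 (the root)."""
    directory = 0
    for component in pathname.strip("/").split("/"):
        directory = lookup(component, directory)
        if directory is None:
            return None         # some component did not exist
    return directory
```

    For example, resolving "/usr/doug" looks up "usr" in directory 0, then "doug" in the directory number that lookup returned.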

    The Question: Suppose a problem occurs that destroys some number of sectors somewhere on the disk, possibly including some sectors of file 0. How much of the file system can be reconstructed after this accident? Contrast this with a similar accident on a UNIX file system. Which system minimizes the damage? Assuming both systems use comparable RAM cache to speed disk access, what performance penalty would you expect to pay for this minimized damage?