Assignment 7 Solutions

Part of the homework for 22C:112, Spring 2012
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Background: The Xerox Distributed File System had some useful fault tolerance features that inspired the following suggested approach to constructing a file system -- one that completely ignores the traditions of DOS and Unix:

Thus, to read a particular sector of a particular file, a linear search of all sectors on the disk will be guaranteed to find that sector, if it is readable. Of course, this is not efficient, merely sufficient.

Files

The above suffices to organize all of the content of the disk into numbered files, but it does not provide for file names or directories.

Thus, given the file number of a directory and the name of a file in that directory, a linear search of the entire disk suffices to find the file number associated with that name in that directory. Simply examine each sector claiming to be sector zero of a file; if it lies in the right directory and carries the right name, then its file number is the file number of the file you are looking for.
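
As a concrete illustration, here is a minimal C sketch of that linear-scan lookup. The prologue layout, the read_prologue() routine, and num_sectors are hypothetical placeholders, since the on-disk format is not specified here:

    /* Sketch of the linear-scan lookup described above.  The prologue
       layout, read_prologue(), and num_sectors are hypothetical
       placeholders; the exercise does not specify the on-disk format. */
    #include <stdint.h>
    #include <string.h>

    #define NAME_LEN 31              /* assumed maximum file-name length */

    struct prologue {                 /* assumed per-sector prologue      */
        uint32_t file_number;         /* file this sector belongs to      */
        uint32_t sector_in_file;      /* logical sector number in file    */
        uint32_t directory;           /* for sector zero: its directory   */
        char     name[NAME_LEN + 1];  /* for sector zero: the file name   */
    };

    /* hypothetical driver call: read the prologue of one physical sector;
       returns 0 on success, nonzero if the sector is unreadable */
    extern int read_prologue(uint32_t physical, struct prologue *p);

    extern uint32_t num_sectors;      /* total sectors on the disk */

    /* Search the whole disk for sector zero of the file with the given
       name in the given directory; return its file number, or 0 if no
       such file is found (assuming 0 is never a valid file number). */
    uint32_t lookup(uint32_t directory, const char *name)
    {
        struct prologue p;
        for (uint32_t i = 0; i < num_sectors; i++) {
            if (read_prologue(i, &p) != 0) continue;   /* skip bad sectors */
            if (p.sector_in_file == 0
             && p.directory == directory
             && strcmp(p.name, name) == 0)
                return p.file_number;
        }
        return 0;
    }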

You might want to look up B-trees on Wikipedia, or hunt up any of the tutorials on B-trees before doing the rest of this assignment.

a) The data structures described above are sufficient, but not efficient. Assuming that there is sufficient working memory (not on disk), how many passes over the entire disk would be required to construct an efficient data structure for mapping logical disk addresses to physical disk addresses? Your answer should concisely describe the efficient data structure and how it is built in one or more passes over the disk. (0.6 points)

(Note: After this data structure is built, free space on disk can be used to store the data structure, allowing efficient access.)

A single linear scan of the headers on every sector requires computing the physical addresses of the sectors in order to read them, and can extract, for each sector, the logical address of that sector. One pass over the disk therefore suffices to build, for example, a B-tree allowing efficient mapping from logical to physical addresses.
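
A rough C sketch of this single pass follows. It reuses the hypothetical prologue declarations from the sketch above, and for brevity it collects the (logical, physical) pairs into a sorted array; the same pass could just as easily feed each pair to a B-tree insertion routine:

    /* Sketch of the single reconstruction pass.  It reuses the
       hypothetical struct prologue, read_prologue() and num_sectors
       declared in the earlier sketch.  For brevity the result is a
       sorted array rather than a B-tree; the same pass could feed each
       pair to a B-tree insertion routine instead. */
    #include <stdint.h>
    #include <stdlib.h>

    struct mapping {
        uint32_t file_number;         /* logical address: (file, sector) */
        uint32_t sector_in_file;
        uint32_t physical;            /* physical disk address           */
    };

    static int by_logical(const void *a, const void *b)
    {
        const struct mapping *x = a, *y = b;
        if (x->file_number != y->file_number)
            return x->file_number < y->file_number ? -1 : 1;
        if (x->sector_in_file != y->sector_in_file)
            return x->sector_in_file < y->sector_in_file ? -1 : 1;
        return 0;
    }

    /* One pass over every physical sector; returns the number of readable
       sectors and leaves a map sorted by logical address in *out. */
    size_t build_map(struct mapping **out)
    {
        struct mapping *map = malloc(num_sectors * sizeof *map);
        size_t n = 0;
        struct prologue p;

        for (uint32_t i = 0; i < num_sectors; i++) {
            if (read_prologue(i, &p) != 0) continue;   /* unreadable sector */
            map[n].file_number    = p.file_number;
            map[n].sector_in_file = p.sector_in_file;
            map[n].physical       = i;
            n++;
        }
        qsort(map, n, sizeof *map, by_logical);        /* now binary searchable */
        *out = map;
        return n;
    }

A sorted array is enough for a map that is only read; a B-tree becomes attractive once the map must also absorb updates as new sectors are written.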

b) How many passes over the entire disk would be required to rebuild the entire directory structure from scratch? What is done during each pass, and is it reasonable to combine any of this work with any of the passes you described in part a? (0.6 points)

A single traversal of the B-tree constructed in part a) suffices to locate sector zero of each file. Reading sector zero of a file identifies the directory in which that file should be listed, along with the file's name. From this, it is possible to verify that each file for which sector zero was found is correctly listed in the appropriate directory, and if a file is not listed, a directory entry for it can be added.

Once this is done, it is possible to traverse the directory tree in order to check, for each file named in the directory, that sector zero of that file exists. Where sector zero of a file does not exist, an appropriate sector zero can be reconstructed from the directory entry. The result is that the entire directory structure can be rebuilt after any one sector is destroyed, and many faults that destroy many sectors can be repaired.
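
The C sketch below outlines those two repair passes. Every routine it declares is a hypothetical placeholder, since the directory record format is not specified in the problem statement:

    /* Sketch of the two repair passes described above.  Every routine
       declared extern here is a hypothetical placeholder: the directory
       record format is not specified in the problem statement. */
    #include <stdint.h>
    #include <stddef.h>

    /* hypothetical accessors over the logical-to-physical map of part a */
    extern size_t   map_size;                 /* number of readable sectors    */
    extern int      is_sector_zero(size_t i); /* is map entry i a sector zero? */
    extern uint32_t file_at(size_t i);        /* file number of map entry i    */

    /* hypothetical directory and sector-zero operations */
    extern uint32_t directory_of(uint32_t file);   /* read from sector zero    */
    extern int  directory_lists(uint32_t dir, uint32_t file);
    extern void add_directory_entry(uint32_t dir, uint32_t file);
    extern void for_each_directory_entry(void (*visit)(uint32_t dir, uint32_t file));
    extern int  has_sector_zero(uint32_t file);
    extern void rebuild_sector_zero(uint32_t dir, uint32_t file);

    /* Pass 1: every surviving sector zero must be listed in its directory. */
    void relist_surviving_files(void)
    {
        for (size_t i = 0; i < map_size; i++) {
            if (!is_sector_zero(i)) continue;
            uint32_t file = file_at(i);
            uint32_t dir  = directory_of(file);
            if (!directory_lists(dir, file))
                add_directory_entry(dir, file);
        }
    }

    /* Pass 2: every directory entry must refer to a surviving sector zero;
       if not, reconstruct that sector zero from the directory entry. */
    static void check_entry(uint32_t dir, uint32_t file)
    {
        if (!has_sector_zero(file))
            rebuild_sector_zero(dir, file);
    }

    void rebuild_missing_sector_zeros(void)
    {
        for_each_directory_entry(check_entry);
    }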

c) Consider the act of opening a file and reading its first sector, given the name of that file as a path from the root of the file system. Do you expect that the file system described above would be faster or slower than the classic Unix file system? Give a high-level argument, justifying your answer. (A low level analysis could go on for pages and pages; if you undertake such an analysis, present an executive summary of your conclusions, do not turn in the whole analysis!) (0.6 points)

Assuming that the data structures described in parts a) and b) have been built, following a path from the root to a particular file will involve reading similar numbers of directory sectors in either file system. The difference, if any, must lie in the cost of finding the sectors of the directories when given the file numbers of those directories. In effect, we must ask whether the B-tree is more or less efficient than the i-node structure used by Unix.

The answer is tied closely to questions about the average directory size and whether the sectors of a typical directory are allocated sequentially or use some other scheme. Typical directories are small, under 100 directory entries, so directories will probably only occupy a few sectors, and there is no need to allocate these sectors in any way other than sequentially.

Therefore, the Unix i-node scheme will likely offer a small advantage because of the way it favors access to the first few sectors of each file or directory. B-trees have no such built-in favoritism -- they offer a roughly constant cost independent of the file number or the sector number within the file.

d) Lightning strikes the power lines while you are writing data to disk. The electrical surge causes the head positioning arm to twitch while the write is in progress. An unpredictable number of disk sectors are corrupted as a result. If this happens to a classical Unix file system, the fsck utility may be able to recover some data. With the above file system, it may also be possible to recover some data. Which file system is more resilient, and why? (0.6 points)

In both file systems, there is no way to recover corrupted data sectors of files. Where such sectors are lost, the loss is the same in either file system.

In the classic Unix file system, if the i-table is corrupted, all files referenced from the damaged sections of the i-table will be lost, and if index sectors for large files are damaged, the data sectors indexed from those index sectors will be lost. In contrast, the Xerox file system can completely recover from such damage by repeating the process described in part a).

If directories are corrupted, both file systems offer a degree of resilience. In Unix, files that are no longer referenced from any directory can be entered into the lost-and-found directory, and the back-links on directories allow significant reconstruction of the directory tree if directories are corrupted -- although file names will be lost. The Xerox system is stronger here: entire directories can be reconstructed so long as sector zero of each file referenced from that directory survives, and so long as the directory entry referring to the lost directory survives.

e) Why is the above file system impractical on today's commodity hardware? (0.6 points)

The Xerox file system relied on the ability of the system designers to control the content of the prologue on each disk sector. To do this, they had to build their own disk controllers, something that was reasonable in the 1970s, but is not practical with the commodity disk drives of today, where the controller is integrated into the drive and is not designed to be altered.