Assignment 6, due Mar 14

Part of the homework for 22C:112, Spring 2008
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Always, on every assignment, please write your name legibly as it appears on your University ID and on the class list! All assignments will be due at the start of class on the day indicated (usually a Friday). Exceptions will be made only by advance arrangement, or in the case of what insurance companies call "an act of God" - something outside your control. Homework must be turned in on paper and in class! Late work may be turned in to the teaching assistant's mailbox, but see the late work policy. Never push late work under someone's door!

  1. Background: Consider the classic Unix i-node as described in the notes for lecture 18. Each i-node holds 13 sector numbers, the first 10 of which refer to the first 10 sectors of the file, while the final 3 are used for successively deeper trees of index sectors (with data sectors as leaf nodes on the tree). In the classic system, sectors are 512 bytes and sector numbers are 32 bits.
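
    For concreteness, here is a minimal sketch of this addressing layout in C; the type and field names are illustrative, not the actual kernel declarations:

        #include <stdint.h>

        typedef uint32_t sector_t;      /* 32-bit sector number */

        /* The 13 disk addresses held in one i-node. Each 512-byte
           index sector holds 512/4 = 128 sector numbers. */
        struct inode_addrs {
            sector_t direct[10];        /* sectors 0..9 of the file   */
            sector_t single_indirect;   /* index sector -> data       */
            sector_t double_indirect;   /* index -> index -> data     */
            sector_t triple_indirect;   /* index -> index -> index -> data */
        };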

    a) What is the largest disk this file system can be used on? (0.3 points)

    b) What is the largest file this file system can support? (0.3 points)

    c) Suppose the system were changed to use the first 8 disk addresses in each i-node to refer to the first 8 sectors of the file, with the final 5 addresses used to refer to successively deeper trees of index sectors. Ignoring questions of disk size, what is the maximum file size this would permit? (0.3 points)

    d) Evaluate the prudence of using the alternative suggested in part c. (0.3 points)

    e) If the sector size were changed from 512 bytes to 1024 bytes, how many of the 13 disk addresses in each i-node should point to data sectors, and how many should point to successively deeper trees of index sectors? In answering this, take into account both the maximum file size and the maximum disk size; a sketch for checking the arithmetic follows these questions. (0.3 points)
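
    The following sketch may help in checking the arithmetic for parts b), c), and e). It is only a checking aid under the stated assumptions (32-bit sector numbers, so an index sector holds sector_size/4 addresses), not a substitute for showing your reasoning:

        #include <stdio.h>
        #include <stdint.h>

        /* Maximum file size, in sectors, for an i-node holding d direct
           sector numbers plus index-sector trees of depths 1, 2, ..., k,
           where each index sector holds fanout sector numbers. */
        uint64_t max_file_sectors(int d, int k, uint64_t fanout)
        {
            uint64_t total = d;
            uint64_t subtree = 1;   /* data sectors under one tree */
            for (int depth = 1; depth <= k; depth++) {
                subtree *= fanout;
                total += subtree;
            }
            return total;
        }

        int main(void)
        {
            /* classic layout: 10 direct, trees of depth 1..3, 512/4 = 128 */
            printf("%llu sectors\n",
                   (unsigned long long) max_file_sectors(10, 3, 128));
            return 0;
        }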

  2. Background: Google-scale file systems must handle some huge files, much bigger than a single volume, as well as huge numbers of small files. Developers of Google-scale file systems have generally been unimpressed by the foundations provided by conventional file models such as those of Unix or Windows. The Wikipedia writeup on the Google file system is worth reading.

    A Question: Discuss the relative benefits of implementing file "chunks" as files in the underlying file system, versus abandoning the underlying file system and building a new file system on top of raw disk drives. (0.5 points)

  3. Background: The Xerox Distributed File System (XDFS) was like the Unix file system in one critical aspect: Directories mapped file names to file numbers, and the low-level read and write operations then identified a sector by giving the file number of the selected file and the sector number within that file. In XDFS, the need for an elaborate tree structure to locate the sectors of each file was eliminated. Instead, the 32-bit sector number was simply concatenated with the 32-bit file number to make a 64-bit key, and the desired sector was located by looking up this 64-bit value in a global B-tree.

    You may have to look up B-trees -- the Wikipedia entry is well written. In the case of XDFS, each B-tree node was one sector, each search key was 64 bits, and the disk sector numbers stored with the keys were 32 bits. If we assume 512 bytes per sector, this means that each node would hold, on average, about 64 entries.
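
    A minimal sketch of the key construction in C, assuming the file number supplies the high-order bits so that all the sectors of one file sort together in key order (which half comes first is an assumption here, not something stated above):

        #include <stdint.h>

        /* Build the 64-bit B-tree search key from a 32-bit file number
           and a 32-bit sector number within that file. Placing the
           file number in the high-order bits -- an assumption -- keeps
           each file's sectors contiguous in key order. */
        uint64_t xdfs_key(uint32_t file_number, uint32_t sector_number)
        {
            return ((uint64_t) file_number << 32) | sector_number;
        }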

    a) Compare the efficiency (in terms of the total disk space needed for bookkeeping: i-nodes and index sectors in the one case, B-tree sectors in the other) of this file system with the efficiency of the classical Unix file system. Specifically, given that there are m files of n sectors each, which file system will have the higher storage overhead? (0.5 points)

    b) What is the expected run-time cost of the first access to sector i of file j, assuming that there are m files of n sectors each? (The expected run-time cost of the second access is not a problem if the system uses appropriate caching.) (0.5 points)