Assignment 7, due Mar. 9

Part of the homework for 22C:112, Spring 2012
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

On every assignment, write your name legibly as it appears on your University ID and on the class list! Assignments are due at the start of class on the day indicated (usually Friday). Exceptions will be by advance arrangement unless there is what insurance companies call "an act of God" (something outside your control). Homework must be turned in on paper, either in class or in the instructor's mailbox. Never push late work under someone's door!

  1. Background: The Xerox Distributed File System had some useful fault tolerance features that inspired the following suggested approach to constructing a file system -- one that completely ignores the traditions of DOS and Unix:

    Every sector on the disk records, along with its data, the number of the file it belongs to and its own sector number within that file. Thus, to read a particular sector of a particular file, a linear search of all sectors on the disk is guaranteed to find that sector, if it is readable. Of course, this is not efficient, merely sufficient.
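
    To make this concrete, here is a minimal sketch in C of a self-identifying sector and the brute-force read it permits. All of the names, the sizes, and the in-memory stand-in for the disk are illustrative assumptions, not part of the assignment's actual design:

        #include <stdint.h>
        #include <stdbool.h>

        #define SECTOR_DATA_BYTES 512  /* assumed payload size per sector */
        #define DISK_SECTORS 4096      /* assumed disk capacity in sectors */

        /* each physical sector is self-identifying: it records the file
           it belongs to and its position within that file */
        struct disk_sector {
            uint32_t file_number;     /* file this sector belongs to */
            uint32_t sector_in_file;  /* sector number within that file */
            uint8_t  data[SECTOR_DATA_BYTES];
        };

        static struct disk_sector disk[DISK_SECTORS]; /* stand-in for the disk */

        /* read one physical sector; false models an unreadable sector */
        static bool read_physical(uint32_t p, struct disk_sector *s) {
            if (p >= DISK_SECTORS) return false;
            *s = disk[p];
            return true;
        }

        /* read sector n of file f by scanning the whole disk until some
           sector claims to be the one we want: sufficient, not efficient */
        bool read_file_sector(uint32_t f, uint32_t n, struct disk_sector *out) {
            for (uint32_t p = 0; p < DISK_SECTORS; p++) {
                if (read_physical(p, out)
                 && out->file_number == f
                 && out->sector_in_file == n)
                    return true;
            }
            return false; /* not found, or unreadable */
        }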

    The above suffices to organize all of the content of the disk into numbered files, but it does not provide for file names or directories.

    Sector zero of each file holds the file's attributes, among them the file's name and the file number of the directory that contains it. Thus, given the file number of a directory and the name of a file in that directory, a linear search of the entire disk suffices to find the file number associated with that name in that directory. All you do is look at each sector claiming to be sector zero of a file; if that sector names the right directory and carries the right file name, then its file number is the file number of the file you are looking for.
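
    Continuing the sketch above (and assuming, purely for illustration, that the data area of sector zero begins with the containing directory's file number and a fixed-length name), the directory lookup is another whole-disk scan:

        #include <string.h>

        /* assumed layout of the start of sector zero's data area */
        struct file_header {
            uint32_t parent_dir; /* file number of the containing directory */
            char     name[32];   /* this file's name within that directory */
        };

        /* find the file number of 'name' in directory 'dir' by examining
           every sector that claims to be sector zero of some file */
        bool lookup(uint32_t dir, const char *name, uint32_t *file_number) {
            struct disk_sector s;
            struct file_header h;
            for (uint32_t p = 0; p < DISK_SECTORS; p++) {
                if (!read_physical(p, &s) || s.sector_in_file != 0)
                    continue; /* unreadable, or not sector zero of a file */
                memcpy(&h, s.data, sizeof h);
                if (h.parent_dir == dir
                 && strncmp(h.name, name, sizeof h.name) == 0) {
                    *file_number = s.file_number;
                    return true;
                }
            }
            return false; /* no such name in that directory */
        }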

    You might want to look up B-trees on Wikipedia, or hunt up any of the tutorials on B-trees before doing the rest of this assignment.

    a) The data structures described above are sufficient, but not efficient. Assuming that there is sufficient working memory (not on disk), how many passes over the entire disk would be required to construct an efficient data structure for mapping logical disk addresses to physical disk addresses? Your answer should concisely describe the efficient data structure and how it is built in one or more passes over the disk. (0.6 points)

    (Note: After this data structure is built, free space on disk can be used to store the data structure, allowing efficient access.)

    b) How many passes over the entire disk would be required to rebuild the entire directory structure from scratch? What is done during each pass, and is it reasonable to combine any of this work with any of the passes you described in part a? (0.6 points)

    c) Consider the act of opening a file and reading its first sector, given the name of that file as a path from the root of the file system. Do you expect that the file system described above would be faster or slower than the classic Unix file system? Give a high-level argument justifying your answer. (A low-level analysis could go on for pages and pages; if you undertake such an analysis, present an executive summary of your conclusions, do not turn in the whole analysis!) (0.6 points)

    d) Lightning strikes the power lines while you are writing data to disk. The electrical surge causes the head-positioning arm to twitch while the write is in progress. An unpredictable number of disk sectors are corrupted as a result. If this happens to a classical Unix file system, the fsck utility may be able to recover some data. With the above file system, it may also be possible to recover some data. Which file system is more resilient, and why? (0.6 points)

    e) Why is the above file system impractical on today's commodity hardware? (0.6 points)

Machine Problem IV

Due Monday, Mar. 26.

The disksim.c disk simulator is a program that models the behavior of a disk drive. It uses a discrete-event simulation to deliver disk I/O requests at random intervals, requesting access to random sectors on the disk, and it models disk rotation at a constant rate combined with head-positioning latency that is a quadratic function of the number of tracks moved.
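
As an illustration of such a latency model, the seek time might be computed as in the C fragment below. The coefficients and the function name are made up for illustration; they are not the ones used in disksim.c:

    /* illustrative quadratic seek-time model: a fixed settling time plus
       linear and quadratic terms in the seek distance, in simulated time
       units; the real coefficients in disksim.c differ */
    double seek_latency(int from_track, int to_track) {
        int d = to_track - from_track;
        if (d < 0) d = -d;       /* seek distance, in tracks */
        if (d == 0) return 0.0;  /* no head motion, no positioning delay */
        return 1.0 + 0.1 * (double)d + 0.001 * (double)d * (double)d;
    }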

At the center of the model is a disk request queue. In the model distributed on-line, this is a FIFO queue, implemented as a linked list of disk request blocks. Since no data is actually transferred, the request blocks do not contain buffer addresses, just disk addresses. The disk request blocks do contain extra information used to gather statistics about disk performance.
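
The FIFO discipline might look something like the following sketch. The type and field names here are assumptions made for illustration; the declarations actually used in disksim.c differ:

    #include <stddef.h>

    /* illustrative disk request block: no buffer address, because no
       data is actually transferred, just a disk address plus the
       timestamp used to gather queueing statistics */
    struct request {
        int             disk_address; /* encodes sector, track, cylinder */
        double          enqueue_time; /* when the request joined the queue */
        struct request *next;         /* link in the pending-request list */
    };

    static struct request *head = NULL; /* dequeue (service) end */
    static struct request *tail = NULL; /* enqueue (arrival) end */

    /* FIFO enqueue: append the new request at the tail */
    void enqueue(struct request *r) {
        r->next = NULL;
        if (tail == NULL) head = r; else tail->next = r;
        tail = r;
    }

    /* FIFO dequeue: remove and return the oldest request, NULL if none */
    struct request *dequeue(void) {
        struct request *r = head;
        if (r != NULL) {
            head = r->next;
            if (head == NULL) tail = NULL;
        }
        return r;
    }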

The output of the simulation model is a report on the number of disk requests processed during the simulation, the average and worst-case number of time units each request spent in the queue, and the average and worst-case queue length. The mean arrival time has been adjusted so that, with a FIFO queue, requests arrive at a rate that the disk can just barely handle. Sometimes, the queue gets fairly long, with well over ten pending requests.

Your job is to replace the FIFO disk queue with something better, trying to reduce the average time a request spends in the queue. To do this, you will have to change the queue implementation; this will likely involve changes to the queue's representation and to the enqueue and dequeue routines.

Your improved enqueue and dequeue routines may access the disk address field of the disk request block in order to determine the best order to service requests. You may use the macros provided for decoding sector, track and cylinder numbers from the disk address, if needed.
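
For instance, with a made-up encoding that packs the cylinder, track and sector into one integer, such macros might look like the following. The names and field widths below are invented for illustration; use whatever disksim.c actually defines:

    /* hypothetical disk-address encoding: low bits hold the sector,
       middle bits the track within the cylinder, high bits the
       cylinder; disksim.c's actual macros and field widths differ */
    #define SECTOR(addr)   ((addr) & 0xFF)        /* sector within track   */
    #define TRACK(addr)    (((addr) >> 8) & 0xF)  /* track within cylinder */
    #define CYLINDER(addr) ((addr) >> 12)         /* cylinder number       */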

Note that, if your code were included in a real disk I/O driver, the enqueue routine would be called from the user side of the driver, while the dequeue routine would be part of the interrupt service routine. Therefore, you should try to minimize the computational load of the dequeue code, shifting as much of the searching and sorting work as possible to the enqueue end of the queue.

Note also that the simulation results from running the FIFO version suggest that queue lengths above 20 are rare. An old rule of thumb for deciding between binary trees and linear searches is that trees do little good for data structures smaller than about 10 elements, and linear searches are only embarrassing for data structures larger than about 20 elements. The empirical data combined with this rule of thumb suggest that a linear search is an acceptable solution to this problem.
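
Putting these hints together, one plausible approach (a sketch of one option, not the required solution) is to keep the queue ordered by cylinder with a linear insertion scan at enqueue time, leaving dequeue as a constant-time removal from the head. This reuses the illustrative struct request, head, tail and CYLINDER() from the sketches above:

    /* keep the pending queue sorted by cylinder number, doing the
       linear scan at enqueue time so that dequeue stays trivial */
    void enqueue_sorted(struct request *r) {
        int c = CYLINDER(r->disk_address);
        struct request **p = &head;
        while (*p != NULL && CYLINDER((*p)->disk_address) <= c)
            p = &(*p)->next; /* linear search: fine for short queues */
        r->next = *p;
        *p = r;
        if (r->next == NULL) tail = r; /* insertion at the end moves tail */
    }

Note that a queue kept strictly sorted this way can starve requests near one edge of the disk under heavy traffic; an elevator-style variant that sweeps alternately inward and outward avoids that problem.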

You may not make modifications to the simulation model, except on an experimental basis for code you do not turn in. Do not change the disk parameters (that describe the disk geometry) and do not change the simulation code. If anyone finds an error in the code, please report it so that corrections can be issued.

Your solution will be graded on two different issues: whether your changes actually improve the queue statistics reported by the simulator, and how well the code you add is written.

You are not responsible for improving the mediocre quality of the simulation code you have been given, but you must make sure that all of the code you add is written very well. The programming guidelines in the Manual of style for C programmers should be followed in all code you add, and in modifying the program header to reflect changes you have made.

Submit your solution using the coursework submission tools documented at:

-- http://www.divms.uiowa.edu/help/msstart/submit.html

In short, you will type the shell command submit, with no arguments, and when it prompts for a file name, you will type disksim.c, and then an extra carriage return to indicate that there is only one file in your submission. When it prompts for a course, type c_112, and finally, when it prompts for an assignment directory, enter mp4.

BUG NOTICE

The original version of the disk simulator distributed with this assignment contained an error. The current on-line version has two changes: