Homework 4 Solutions

22C:116, Fall 1995

Douglas W. Jones

The problem, part A: Given a fixed CPU with a fixed mix of applications programs, where both hard disks and diskettes are in use, describe the expected benefit of using a different sector size on hard and floppy disks. Give an estimate of the order of magnitude of this benefit, and state the order of magnitude and sign of the difference in sector sizes that would give the largest expected benefit.
To answer this, the first step is to ask how the optimal sector size varies as a function of the disk transfer time Tt. This is an unreasonable question if the data production or consumption rate is higher than the transfer rate of the device, but if data is being produced or consumed at a lower rate, the problem becomes one of adjusting the block size until the expected delay for latency matches the time it takes to produce the next block.
The key to note is that, holding other things constant, as the fraction of each transfer devoted to latency rises, the optimal block size rises. Thus, if latency is expensive, fewer transfers are made, and thus, less time is wasted to latency.
So, assuming that the electromechanical head positioning mechanisms of a floppy disk drive are relatively slow, the optimal block size for a floppy disk would be larger than the optimal block size for a high performance hard drive.
Would the benefit be large? Yes! This is why, for many diskette based applications, it is faster to read an entire file in a single transaction then operate out of RAM, then write the entire file back to diskette. Doing this, the effective sector size for the diskette is equal to the file size!

The problem, part B: The determination of the optimal sector size is difficult, but it can be empirically measured! Given that you are the author of the software used to read and write sequential disk files, you can make the software read and write multi-sector blocks to gain the effect of larger sectors. If your software dynamically adjusts the number of sectors per block, it can approximate the optimal sector size for each particular application program it serves. Propose the variables the software could measure in the course of normal operation in order to tune the block size for optimal performance.
On input, the system could try to read ahead, anticipating application read requests. If the application requests input before the anticipatory read of the next "logical sector" is complete, the system would increment the logical sector size of its reads on behalf of that application. If the read finishes before the user requests it, the system could decrement the logical sector size.
On output, the system could set the logical sector size to the number of sectors provided by the user since the previous write was initiated. Of course, these schemes will only give a benefit if the sectors of the file are generally allocated sequentially on disk, thus minimizing the chance for seeks within a logical sector that would otherwise be transferred as a single uninterrupted multi-sector transfer.
The problem, part A: What effect would you expect this isolation to have on the system software, assuming that the software uses a typical effective disk scheduling algorithm such as the elevator algorithm, and assuming that sectors of files are allocated preferentially in the same cylinder?
A SCSI interface can prevent a file system from knowing the physical cylinder on which a sector is located. As a result, it might fail to correctly predict which sectors can be reached in sequence with minimal latency, and thus, the overall performance of the file system would be poorer than the performance of a file system in which the physical location of each sector was known.

The problem, part B: Propose (and describe) a set of empirical measurements you could perform, using software to measure the actual layout of sectors on a modern SCSI disk.
Assuming no cache, the measurement can be done by measuring the time delay between consecutive reads from various sectors. The rotational speed of the disk should be measured first, by measuring the time delay between consecutive reads of the same sector. Two different sectors which can be read in sequence such that one is read exactly one rotation after the other are most likely corresponding sectors of the same cylinder.
Such timing measurements can be used to determine the order and interleaving of sectors in a track, and the number of tracks within a cylinder. Once this is determined for each cylinder of the disk, the overall geometry is completely known and can be used for disk scheduling and storage allocation.

The problem, part C: Now assume the SCSI disk incorporates an on-board cache of recently referenced disk tracks. Can you propose modifications to your experiments that would still allow you to determine the physical layout of the actual disk?
If the cache holds entire tracks or entire cylinders, which is quite likely, then measurement of the geometry within a track or cylinder will be difficult and may be impossible.
On the other hand, once the cache capacity is known, inter-track or inter-cylinder head motion latencies can still be measured! Thus, the first thing to test is the cache size. Assuming an LRU replacement policy, which is a reasonable policy to use in a cache implemented by the software of the SCSI disk drive's controller, this measurement can be made by first filling the cache (for example, by reading the entire disk in sequence), and then reading first the most recently and then the second most recently sectors, working backwards until an obvious latency delay is encountered.
Given the cache capacity, seek times can be measured by first filling the cache, then reading one additional sector that you know isn't in the cache because it wasn't recently used. The latency measured will be the seek time between the last item read during the filling of the cache and the one read in order to force the cache to overflow.