22C:116, Lecture 21, Spring 2002

Douglas W. Jones
University of Iowa Department of Computer Science

File Systems -- CD-R and other WORM devices
CD-R devices pose an interesting challenge for the designers of file systems. These devices are WORM devices -- the acronym officially stands for Write Once Read Multiple, but some people say it's Write Once Read Mostly. It's not a read-only memory, merely a read-mostly memory. When delivered, a new CD-R disk contains all ones, and the "CROM Burner" can change any one bit to zero, but can never change zero bits back to one.
As a result, classical notions of directory and file structure must be changed! You cannot update the superblock in track 0 sector 0. You cannot change the pointer to the root directory, you cannot update an I-node in the I-table.
EPROM and DVD-R have similar WORM characteristics, and there are applications for file systems on these technologies too.
A very simple idea
One approach to using WORM devices is to build an entire working file system on a classical random access file system -- on magnetic disk or on CD-RW, for example, and then copy it in bulk, as a single operation, to the WORM device. Having done so, the CD-R version of the file system is read-only, but aside from that, all of the conventional matters of file organization apply.
We do need to make sure that the sector sizes on the CD-R device and the original random access device are the same, and we need to make sure that the disk addresses encoded in the file system are all simple sector numbers and not some kind of <cylinder,track,sector> tuples, because CD-R and other WORM technologies don't generally have the same geometry as classica magnetic disks. (In fact, in the specific case of CD-R, the device has a single spiral data track like a classical mechanical phonograph record!)
This approach works well enough that it is widely used in the microcomputer world for backup management, but it fails to fully exploit the possibilities of the WORM technology.
A Simple File System Model
Consider, for the purpose of this discussion, a simplified model of a hierarchical file system. In this model, files are either directories or data files. Both consist of a sequence of sectors, and the same mechanism describes both. A file or directory is described by an index sector that contains the physical sector numbers of the sectors of that file. A directory consists of a number of directory entries, each of which consists of the name of one directory entry and the sector number of the index sector for that file. One index sector is special, it describes the root directory of the entire file system.
This file system is a simple n-ary tree, where each internal node in this tree is either the index sector of some file or directory, or a sector of some directory. Internal nodes contain the sector numbers of children in this tree, and the leaves are all either sectors of data files or sectors of directories that do not happen to have children.
Modifying a Sector of a WORM device
As stated previously, all bits stored on a WORM are initially 1, and the only operation a write to a sector of a WORM can do is set bits to zero. We will assume that each sector includes some data bits and a checksum; therefore, each sector on our WORM device will initially look something like this:
```
        11111111111111111111111111111111111111111111111111111
        \___________________data__________________/\checksum/
```
Typically, we'll devise a checksum algorithm that declares this initial value to be valid. Our checksum computation is typically done with some algorithm that, after application to the entire block, data plus the recorded checksum on the sector, produces a checksum in the internal checksum accumulator of zero.
Suppose the sector is the last sector of a file, and some of it has been recorded:
```
        10110101110001111111111111111111111111111111011101010
        \__recorded__/\__________unused___________/\checksum/
```
Once we have recorded this much data, we cannot add new data to this sector unless we are extremely lucky to add data in such a way that the new checksum is that is consistant with the checksum already computed -- that is, it must be equal to the old checksum or have some more ones converted to zero. This won't do!
Fortunately, there is a way out! If we set aside several slots at the end of the block for the checksum, we can add data to the block and then compute a new checksum. We must decide in advance how many updates we want to allow for each block. Say we decide to allow each block to be updated exactly once after it is first recorded. We need to allow for two checksums to support this, so we use the following format:
```
        11111111111111111111111111111111111111111111111111111
        \______________data_____________/\checksum/\checksum/
```
After writing the block once, we have the following:
```
        10110101110001111111111111111111110111010101111111111
        \__recorded__/                   \checksum/
```
We can do this because our checksum algorithm declares the block to be correct if the sum at the end of the block is zero. It doesn't matter which bits we adjust to give this result, so we can record our checksum anywhere in the block, and the first time we update the block, we record it in the first slot reserved for the checksum. After our second update, we have the following:
```
        10110101110001111000101000111111110111010101111110101
        \__recorded__/\_recorded_/       \checksum/\checksum/
```
The new checksum takes into account the old data, the new data, the part of the block that has not yet been recorded (and never will) and the old checksum, so that the grand total is still zero.