Assignment 8, due Oct. 25

Part of the homework for 22C:112, Fall 2013
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

On all assignments, your name must be legible as it appears on your University ID card! Assignments are due at the start of class on the day indicated (usually Friday). Exceptions will be by advance arrangement unless there is what lawyers call "an act of God" (something outside your control). Homework must be turned in on paper, either in class or in the teaching assistant's mailbox. Never push late work under someone's door!

  1. Background: Consider the problem of writing a file system for a flash memory device with the following characteristics:

    A file system designed for this device must do wear leveling -- that is, it must not focus write operations on any one block, but must, instead, try to spread the wear uniformly over all blocks.

    a) One way to deal with this is to use a disk driver that aggressively caches things, keeping all changes in RAM until you ask to dismount the flash drive, and only then making any changes on the drive. This means that, typically, you will only change each sector a few times a day. 5000 days is over a decade, so most users will be satisfied. Or will they? What is the biggest risk of this scheme? (Consider the impact of USB devices and power failures.) (0.5 points)

    b) Another alternative is to avoid storing any indices. Instead, we use the extra space in each data sector of each file or directory to store: i) the sector number within the file, and ii) the i-number of the file. In sector number -1 of each file, we put the administrative contents of the i-node (date of last modification, file size, etc.). Now, a simple linear search of the entire flash device suffices to find any sector of any file, but the performance of that would be awful unless we build auxiliary index structures in RAM when we mount the flash file system. When it comes time to change the data in a sector of a file, what sector do we write that data into in order to level out the wear on the flash memory? (0.5 points)
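
    As a rough picture of what this scheme puts in each sector, a minimal sketch in C might look like the following; the field names and widths are invented for illustration and are not part of the assignment:

    #include <stdint.h>

    #define SECTOR_SIZE 512   /* assumed data payload per sector */

    /* hypothetical on-flash layout of one sector plus its spare area */
    struct flash_sector {
            uint8_t data[ SECTOR_SIZE ];  /* file or directory contents */
            int32_t sector_in_file;       /* -1 marks the i-node contents */
            int32_t i_number;             /* which file this sector belongs to */
    };

    Scanning every sector at mount time and recording each (i_number, sector_in_file) pair in a RAM table is what makes later lookups fast.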

  2. Background: Many MMUs have had user (well, system programmer) interfaces that present two registers to the outside world: MMUPTA, the page-table address register, holding the address of the base of the current page table, and MMUPTS, the page-table size register, holding the number of entries in that table.

    Suppose the virtual address is 32 bits, composed of a 20-bit page number and a 12-bit byte-in-page field, while page-table entries consist of one word each, with the frame number in the top 20 bits of each word and the access rights in the bottom 3 bits (the 9 bits in between are unused). If virtual address translation were done in software, the algorithm might look like this when written in C (using the type intptr_t to hold the integer representations of both virtual and physical addresses):

    intptr_t MMU( intptr_t va, int use ) {
            /* split the virtual address into its two fields */
            intptr_t byte_in_page = va & 0x00000FFF;
            intptr_t page_number = va >> 12;

            /* MMUPTS gives the number of page-table entries */
            if ( page_number >= MMUPTS ) throw_page_out_of_bounds();

            /* MMUPTA is the base address of the page table */
            intptr_t page_table_entry = MMUPTA[ page_number ];

            /* the access-rights field is the bottom 3 bits of the entry */
            intptr_t rights = page_table_entry & 0x00000007;
            if ( check_rights( rights, use ) ) access_rights_violation();

            /* the frame number is the top 20 bits of the entry */
            intptr_t frame_number = page_table_entry & 0xFFFFF000;
            intptr_t physical_address = frame_number | byte_in_page;
            return physical_address;
    }
    

    Both use and rights are 3-bit values where the top bit indicates read, the middle bit indicates write, and the bottom bit indicates execute. In use, just one of these bits is set, indicating the use the CPU is making of the memory address, while in rights, one bit is set for each permitted operation.
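
    As a concrete illustration of this encoding (the macro names below are invented, not part of the MMU specification), the three bits might be written as octal masks:

    #define ACCESS_READ  04   /* top bit: read */
    #define ACCESS_WRITE 02   /* middle bit: write */
    #define ACCESS_EXEC  01   /* bottom bit: execute */

    With these masks, a use value of ACCESS_WRITE means the CPU is trying to write, while a rights value of (ACCESS_READ | ACCESS_EXEC) marks a page that may be read or executed but not written.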

    a) Give appropriate C code for check_rights(). This may be in the form of a #define or a function of type bool. (0.5 points)

    b) An MMU built this way wouldn't perform well because it would take one memory cycle to access the page table for each useful memory cycle -- doubling the cost of every memory access. To speed things up, most MMUs contain a cache (the translation lookaside buffer or TLB) holding recently used page table entries. Only when there is a cache miss is there any need for the MMU to actually look at the page table in RAM. Why do we need a special MMU control to invalidate the TLB? (0.5 points)
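
    To make the idea concrete, here is a minimal software sketch of a TLB sitting in front of the page-table lookup from the code above; the TLB size, structure, and lookup function are invented for illustration (a real TLB is hardware and searches all of its entries in parallel):

    #define TLB_SIZE 64   /* assumed number of TLB entries */

    struct tlb_entry {
            intptr_t page_number;        /* tag: which page this entry maps */
            intptr_t page_table_entry;   /* cached copy of that page's entry */
            int      valid;              /* cleared when the TLB is invalidated */
    };
    struct tlb_entry TLB[ TLB_SIZE ];

    /* translate a page number, consulting the page table only on a miss */
    intptr_t lookup_entry( intptr_t page_number ) {
            for (int i = 0; i < TLB_SIZE; i++) {
                    if (TLB[i].valid && (TLB[i].page_number == page_number)) {
                            return TLB[i].page_table_entry;   /* TLB hit */
                    }
            }
            if ( page_number >= MMUPTS ) throw_page_out_of_bounds();
            intptr_t entry = MMUPTA[ page_number ];           /* TLB miss */
            /* some entry of TLB[] would then be replaced with this pair */
            return entry;
    }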

  3. Background: Suppose we have a computer with an MMU and a fast I/O device such as a disk or Ethernet connection. Fast devices typically use direct memory access, or DMA, input/output. This means that the device interface has a memory address register which the I/O driver loads with the buffer address before starting the I/O operation.

    Since the MMU is typically part of the CPU chip in a modern computer, while the DMA I/O devices are outside the CPU chip, the memory address used by the DMA interface is typically the physical address of the buffer.
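
    As a hypothetical picture of what the driver sees, the registers of a simple single-buffer DMA device might be memory-mapped like this; all of the names and the command code are invented for illustration:

    #include <stdint.h>

    #define DMA_START_READ 1   /* invented command code */

    /* hypothetical memory-mapped registers of a simple DMA device */
    struct dma_device {
            volatile intptr_t address;   /* physical address of the buffer */
            volatile int32_t  count;     /* number of bytes to transfer */
            volatile int32_t  command;   /* writing here starts the transfer */
            volatile int32_t  status;    /* completion and error bits */
    };

    /* start reading one 512-byte sector into the buffer at physical address pa */
    void start_read( volatile struct dma_device *dev, intptr_t pa ) {
            dev->address = pa;
            dev->count   = 512;
            dev->command = DMA_START_READ;
    }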

    Some DMA devices have what is called scatter-read gather-write capability. Instead of just having a single buffer-address register and a single buffer-size register, they allow a single DMA transfer to span multiple buffers, for example, reading 100 bytes into memory starting at location X and then 412 bytes into memory starting at location Y (using a device with a 512-byte sector size). Most of these devices allow an arbitrary array of buffer chunks to be specified, with an entry with a byte-count of zero marking the end of the array.
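
    A minimal sketch of such an array of buffer chunks, again with invented names, might look like this; the two entries correspond to the 100-byte and 412-byte pieces of the example above, and the entry with a zero byte-count ends the list:

    #include <stdint.h>

    /* one chunk of a scatter-read or gather-write transfer */
    struct dma_chunk {
            intptr_t address;   /* physical address of this chunk */
            int32_t  count;     /* bytes in this chunk; zero ends the array */
    };

    /* describe one 512-byte sector split across two buffers at X and Y */
    void describe_sector( intptr_t X, intptr_t Y, struct dma_chunk chunks[3] ) {
            chunks[0] = (struct dma_chunk){ X, 100 };   /* first 100 bytes at X */
            chunks[1] = (struct dma_chunk){ Y, 412 };   /* next 412 bytes at Y */
            chunks[2] = (struct dma_chunk){ 0, 0 };     /* terminator */
    }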

    a) Suppose the user calls read( fd, buf, 512 ) where the device has 512-byte sectors, and where the file position is a multiple of 512. If we didn't have virtual memory, we could just load the buffer address into the DMA address register and 512 into the byte count register and then start the transfer. What extra work must the software do because we are using virtual memory? For this part, assume the buffer is entirely contained in one page of RAM. (0.5 points)

    b) Now, assume that the buffer spans a page boundary. Explain how scatter-read gather-write can be used to simplify the work done by the I/O software. (0.5 points)