Midterm Answers and Results

22C:116, Fall 1995

Douglas W. Jones

Grade Distribution

The exam was graded on an 8 point scale, one point for each part of each problem. Grades were then multiplied by 2.5 and rounded to one place after the decimal point, so that the maximum possible score was brought up to 20 points.

The average exam score was 12.3; the grades were distributed as follows:

                    X     X   X
                    X     X   X
                    X X X X   X   X X
          X     X X X X X X   X X X X X
__________X_____X_X_X_X_X_X___X_X_X_X_X_X___
 0 . 2 . 4 . 6 . 8 . 10. 12. 14. 16. 18. 20

Answers

  1. Consider an application with a total memory requirement of 1 megabyte, a working set of 100 kilobytes, and pages that may be compressed to an average of 2/3 of their original size, but with considerable variance from one page to the next.

    Part A: If the heap manager used to store the compressed pages of the application wastes, on average, 10% of the heap to fragmentation, the minimum physical memory resources required would include:

    The most common error was to assume that, somehow, the pages outside the working set didn't need to be stored, and that the working set itself would be compressible, thus giving a total of around 67 KB.
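
    For comparison, one plausible accounting, assuming the working set is held uncompressed while the pages outside it are held compressed in the heap, comes to roughly three quarters of a megabyte:

        working set, kept uncompressed:                     100 KB
        roughly 900 KB outside it, compressed to 2/3:       600 KB of compressed data
        heap to hold it at 90% utilization (600 / 0.9):     about 667 KB
        total physical memory:                              roughly 767 KB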

    Part B: Consider the problem of storing compressed copies of pages.

    The main memory can be subdivided into two regions, one serving as a pool of page frames used to hold the expanded pages in the application's working set, and the other region used to hold a heap of compressed pages. Entries in the page table pointing to expanded pages would be marked as being valid, while entries pointing to compressed pages would be marked invalid, with most of the bits in the page-table entry being used to hold the physical address of the compressed page. If there are insufficient bits to hold such an address, the least significant bits of the address can be assumed to be zero and the entries in the heap can be aligned on the resulting block boundaries.
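
    A sketch of one possible page-table-entry layout along these lines, with illustrative field widths that are assumptions rather than those of any particular machine:

        /* Illustrative page-table entry: when valid, the frame field holds
           a physical page-frame number; when invalid, the same bits hold
           the address of the compressed copy, shifted right so that the
           low-order (assumed zero) alignment bits need not be stored. */
        struct pte {
            unsigned int valid : 1;
            unsigned int frame : 31;
        };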

    The heap can be managed by any management scheme, for example, the buddy system or one of the boundary tag systems. Boundary tag systems have a clearly defined storage overhead, in the form of the tags, and the internal fragmentation caused by the introduction of alignment constraints because of short pointer formats (with the low bits assumed zero) can also be considered a form of overhead. In any case, all heap management schemes for dealing with variable sized objects introduce external fragmentation, unless the software is made more complex by the inclusion of provisions for compacting the heap.
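
    For example, a boundary-tag heap might prefix each compressed page with a small header, which makes the space overhead mentioned above easy to see (a sketch, with illustrative field names):

        /* Illustrative boundary tag preceding each block in the heap of
           compressed pages; a matching tag may also follow the block. */
        struct tag {
            unsigned int size;     /* block size, in aligned allocation units */
            unsigned int in_use;   /* nonzero if the block holds a compressed page */
        };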

    Note that this question cannot be answered well without discussion. Many of the weak answers had scribbled keywords like fragmentation without the text needed to explain them. Other answers focused on peripheral issues, such as the run-time cost of compression; this issue is outside the problem domain, while heap management overhead is clearly within the domain of this course and therefore a central issue.

  2. Part A: With reference to typical modern hardware, what data structures would you use to represent an address space, how would you handle the allocation of storage for those address spaces, and how would you handle the problem posed by the potentially large size of some address spaces?

    Clearly, the address space described here must be paged and segmented, or equivalently, the page table describing the address space must itself be paged. Thus, at an abstract level, the address space is described by a segment table, each entry of which points to a page table, each entry of which points to a page. If a page is not currently allocated in main memory, it can be marked invalid, and if no pages in a segment are currently allocated in main memory, the entire page table for that segment can be removed from main memory and the corresponding entry in the segment table can be marked as invalid. Everything described above can be achieved with equal ease on an Intel 80386, a Motorola 68020, a DEC VAX, or an IBM Power processor.
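
    In outline, the translation performed by the hardware, or repeated in software by the fault handler, might look like the following sketch; the type and macro names here are illustrative assumptions, not features of any of the machines mentioned:

        /* Illustrative two-level lookup: segment table -> page table -> frame.
           fault() is assumed not to return to this code. */
        frame_t translate( vaddr_t va )
        {
            seg_entry *s = &segtable[ SEG( va ) ];
            if (!s->valid)
                fault( SEGMENT_FAULT, va );   /* page table not in main memory */
            page_entry *p = &s->pagetable[ PAGE( va ) ];
            if (!p->valid)
                fault( PAGE_FAULT, va );      /* page not in main memory */
            return p->frame;
        }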

    Part B: With reference to typical modern hardware, how would you detect and prevent attempts by users to apply conventional machine instructions to objects that are not pages?

    If a page or segment is not to be directly manipulated by machine instructions, the page table or segment table entry for that page or segment can be marked as invalid, and attempts to reference it will be trapped by the hardware. The page fault trap service routine is then free to interpret the page table or segment table entry in any way it wants. If the page or segment is currently on secondary memory, the page fault service routine can move it to main memory and update the table as needed.

    If the virtual address that caused the fault has been assigned to an object that is not a page or segment, for example, if it has been assigned to another address space or to a sealed object, all direct manipulation of that address by the CPU can simply be forbidden.
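
    A sketch of the resulting fault service routine; the entry tags and helper names are illustrative assumptions:

        /* Illustrative page-fault handler: the invalid table entry itself
           records what kind of object the faulting address names. */
        void page_fault_handler( vaddr_t va )
        {
            entry_t *e = lookup( va );
            switch (e->kind) {
            case ON_SECONDARY_MEMORY:
                page_in( e, va );      /* bring it in, mark the entry valid */
                break;
            case OTHER_ADDRESS_SPACE:  /* or a sealed object, and so on */
            default:
                signal_error( va );    /* direct manipulation is forbidden */
            }
        }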

    Part C: Describe the implementation of the Move Capability operation on typical modern hardware.

    Typically, the operation would be performed by a procedure within the system kernel, with the control transfer to the kernel accomplished by something like a supervisor call trap, although in theory, the kernel could interpret any trap that the user can deliberately cause as a kernel operation.

    In any case, a move capability operation must specify the C-lists (address spaces) to be used as the source and destination of the move operation, and it must specify the offset within each C-list of the capability to be moved. Either the source or destination may be the user's own C-list, and either may also be a C-list that is referenced by a capability in the user's C-list (anything less would be of doubtful utility).

    The kernel procedure that interprets these parameters must determine if the user has the rights needed to fetch a capability (page-table-entry) from the source C-list and store it in the destination C-list. Having determined that the operation is legal, the actual operation would be done by copying a page-table entry from the source page table to the destination.
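
    A sketch of such a kernel procedure; the rights checks and the entry layout are illustrative assumptions:

        /* Illustrative kernel implementation of Move Capability. */
        int move_capability( clist_t *src, int src_slot,
                             clist_t *dst, int dst_slot )
        {
            if ( !has_right( current_user, src, RIGHT_FETCH )
              || !has_right( current_user, dst, RIGHT_STORE ) )
                return ERROR;                  /* user lacks the needed rights */
            dst->entry[ dst_slot ] = src->entry[ src_slot ];  /* copy the entry */
            return OK;
        }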

  3. Background: Consider a multiuser database system where update operations frequently require that multiple records be locked prior to the update.

    Part A: Given that all lock operations must be completed before any record is read during a transaction, describe a simple deadlock avoidance algorithm that will prevent all deadlocks in this system.

    The easy solution to this problem is to impose any total ordering over all records in the database system, and then require that, within each transaction, records be locked in ascending order under that ordering, as sketched below. This trivially avoids deadlock.
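
    For example, if records carry integer record numbers, the record number itself provides such an ordering; the discipline, not the particular identifiers, is what matters (a sketch, with assumed sort and lock primitives):

        /* Illustrative: acquire every lock for the transaction in ascending
           record-number order, so no transaction ever waits for a record
           numbered lower than one it already holds, and no cycle can form. */
        void lock_all( int rec[], int n )
        {
            sort_ascending( rec, n );
            for (int i = 0; i < n; i++)
                lock_record( rec[i] );
        }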

    Some students suggested that the banker's algorithm be used. As originally stated, this algorithm is inapplicable (it can only manage one resource, all units of which are interchangeable). Furthermore, there is nothing analogous to a "credit limit" available in this application, so generalizations of the banker's algorithm are not only difficult but inapplicable.

    Part B: Given that lock operations are performed one at a time, and that, after locking each record, that record may be read to determine the next record that must be locked:

  4. Background: Consider the following process used as part of the implementation of FIFO queues in terms of semaphores and memoryless message exchange ports:
    head = 0
    tail = 0
    repeat
        receive( channel, msg )
        if msg = "enqueue"
           receive( channel, msg )
           buffer[tail] = msg
           tail = (tail + 1) mod size
        else assert msg = "dequeue"
           send( channel, buffer[head] )
           head = (head + 1) mod size
        endif
    forever
    

    The Problem: Write pseudocode, at this level of detail, for the procedures the user would use to enqueue and dequeue buffers full of data on this queue.

    mutex: semaphore, initially 1     -- one user at a time may talk to the server
    data: semaphore, initially 0      -- counts items currently in the queue
    space: semaphore, initially size  -- counts free slots remaining in the queue
    
    enqueue( item )
        P( space )                    -- wait until the queue has room
        P( mutex )                    -- claim exclusive use of the server
        send( channel, "enqueue" )
        send( channel, item )
        V( mutex )
        V( data )                     -- signal that one more item is available
    
    dequeue( item )
        P( data )                     -- wait until the queue holds an item
        P( mutex )                    -- claim exclusive use of the server
        send( channel, "dequeue" )
        receive( channel, item )
        V( mutex )
        V( space )                    -- signal that one more slot is free
    
    It is essential that the user wait for space or data to be available in the server before attempting to claim exclusive use of the server or to communicate with it. Because the communication channels are unbuffered, beginning a dialogue with the server before ensuring that the dialogue can be completed will deadlock the server.

  5. Background: Consider a system with many concurrent processes and a moving-head disk drive. After the sector size was optimized, a new improved disk scheduler was written for this system using the circular elevator algorithm.

    The Problem: Would you expect the use of the new improved disk scheduler to change the optimal sector size?

    The use of a decent disk scheduler reduces the latency time per transfer, and the optimum sector size is determined by the ratio of latency to data-transfer time. Thus, with a smaller latency time, the new optimal sector size will be smaller than the old one. Will this have much of an impact? It would, if we were dealing with a "sharp" optimum, but we are not. This is clear from the experience of the computer industry, where we frequently see the same sector size being used over a wide range of disk technologies over decades of technological evolution. If the payback were large, we would expect to see the problem of optimizing the sector size attracting considerable attention, and it simply doesn't attract much!
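
    One way to see the relationship is through a simple figure of merit, the fraction of each transfer's time spent actually moving data; the function below is an illustration of that ratio, not a model of any particular drive:

        /* Illustrative: fraction of time spent transferring data.  A scheduler
           that reduces the average latency lets a smaller sector reach the same
           efficiency, which is the sense in which the optimum moves, but the
           curve is flat enough near the optimum that the shift matters little. */
        double efficiency( double latency, double sector_bytes, double transfer_rate )
        {
            double transfer_time = sector_bytes / transfer_rate;
            return transfer_time / (latency + transfer_time);
        }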

    Some students wrote at length about the impact of adding the disk scheduler, but the question didn't ask about this! Indeed, the disk scheduler could have a significant impact in the application under consideration, but the question asked about sector-size optimization.

  6. Background: Consider the problem of implementing a file system to support very large files, some of which may span multiple disk devices.

    The Question: Would it be better to first build a large virtual disk out of multiple disk devices and then build a disk scheduler on top of this, or would it be better to insert separate disk schedulers between the large virtual disk implementation and the individual disks? Why?

    It is straightforward to build a scheduler for each disk drive that takes the characteristics of that drive into account. If the schedulers for each drive initiate seeks separately from initiating data transfers, seeks for each drive can be overlapped with seeks for the others, with no need to know how the drives are multiplexed into one large file system by the higher level software.
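
    A sketch of this structure, with one scheduler per physical drive and the virtual-disk layer above simply routing each request to the drive that holds the block (the names and the simple striping rule are illustrative assumptions):

        /* Illustrative: each physical drive has its own request queue, kept in
           elevator order, and its own notion of where its heads are. */
        struct drive {
            queue_t requests;
            int     current_cylinder;
        } drives[ NDRIVES ];

        void submit( long block )
        {
            int d = block / BLOCKS_PER_DRIVE;            /* which physical drive */
            elevator_insert( &drives[ d ].requests, block % BLOCKS_PER_DRIVE );
        }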

    Writing one scheduler to sit on top of a "virtual device" constructed from an array of disk drives is far more complex. In this case, the scheduler would not be effective if it used something like the elevator algorithm; instead, it would have to be written with knowledge of the opportunities for parallelism in the underlying hardware, and it might have to manage a different scheduling algorithm for each underlying disk.

    If all the underlying disks have identical structures, the possibility of overlapping seeks with I/O becomes the primary reason to favor the simple approach of using one scheduler per drive. In the unlikely event that the underlying disks are inhomogeneous, the opportunity to use different scheduling patterns on each drive would be a dominant consideration.