CS:3620, Homework 8 Solutions, Spring 2018

On all assignments, your name must be legible as it appears on your University ID card! Assignments are due at the start of class on the day indicated (usually Friday). Exceptions will be by advance arrangement unless there is what lawyers call "an act of God" (something outside your control). Homework must be turned in on paper, either in class or in the teaching assistant's mailbox. Never push late work under someone's door!

Background: If you solve MP4, mush still lacks some useful features. Among them, it has no support for while loops. Consider this syntax for while loops in mush:

while command
   echo this is in the loop body
endloop

This requires two new built-in commands, while and endloop. The while command interprets its arguments as a shell command and launches that command. If the launched command exits with EXIT_SUCCESS the shell continues executing successive commands of the loop body. If the launched command exits with EXIT_FAILURE, the shell skips over the loop body and the endloop before resuming normal execution. If the shell encounters endloop during normal execution, it skips back to the previous while command.

In many cases, the launched command would be the test command. If you're curious, you can look at its man page.

Note: This assignment does not ask you to actually implement anything, it only asks you to think about how to implement it.

a) Explain how ftell() and fseek() could play a role in the implementation of while and endloop. Ignore the possibility of nested loops in this part of the problem. (0.5 points)

Before reading each command, use ftell() to note the place in the input file, just in case that command turns out to be a while command. If the command turns out to be while (a built-in shell command), and if the loop condition is true then save the result returned by ftell as the "top of loop" marker.

When the endloop built in command is encountered, use fseek() to re-position the input file to the "top of loop" position.

Although it was not part of the question, it is worth noting that if the while condition turns out to be false, the while command must read from the input file until it finds and skips over an endloop command. In this case, there would be no use of the "top of loop" variable or fseek().

b) How would your answer to part a change if you allowed nested loops. a user can observe. (0.5 points)

The "top of loop" variable needs to be replaced with a stack; while would push a new "top of loop" value, and endloop would pop the value.

Background: Assume your computer offers the standard Unix gettimeofday() kernel call.

a) How could you use this system call to instrument part of your program, for example, the assignment statement i=j+1 to see whether it causes a page fault? (0.4 points)

Call gettimeofday() immediately before and after the statement and compare the times. If the time difference is zero or very small, no page fault occurred. If the time difference is big, there was probably a page fault.

The definitions of big and small may need a few experiments. The question does not ask about this, but the resolution of one microsecond means that that, on a modern machine, on the order of 1000 instructions can be executed without incrementing the time, but we don't know the overhead of a kernel call. It may well take hundreds of instructions just to make a kernel call. Experiments can show the range of time delays that you would normally get from the measurements, and from this, you can determine where to draw the line that indicates that one or more page faults occurred.

b) Can the instrumentation you proposed in part a) distinguish between page faults caused by instruction fetches and page faults caused by data references? Explain. (0.3 points)

No. All we have done is measure the time, so any page fault will contribute comparably to the delay.

It is worth noting (but not part of the answer) that some page faults will be served faster than others because some faults involve writing data back to disk in order to clear a page holding read-write data out of a page frame, while other faults are faster because the page frame selected contains read-only code or data and so does not need to be written back to disk. These two types of faults take different amounts of time, but the need to write data back to disk is determined by properties of a page that was not referenced by this page fault, so a fault on a read-only or code page could cause some other data page to be written back to disk. As a result, you can't use the time taken by a page fault to distinguish between faults caused by code or data references.

c) To what extent does the instrumentation you introduced to solve part a create the possibility of additional page faults? (0.3 points)

The code to call gettimeofday() takes up space, and so, where the statement you are instrumenting might have all fit on one page, adding the two calls might push it to span a page boundary, increasing the chance of a page fault.

When you call gettimeofday(), you save the result in a variable somewhere. References to this variable could cause a page fault that wouldn't have occurred if that variable wasn't being used.

Background: Performance of demand-paged virtual memory can be improved by having the MMU note whenever the CPU writes to a frame by setting the dirty bit on that frame. If a frame is "clean", that is, never written since the last time a page was copied into that frame from disk, there is no need to write that page back to disk when the time comes to replace that page. Only "dirty" pages need to be written back to disk.

The normal clock replacement algorithm has a mark bit used to record whether a frame has been touched by the CPU since the last time the clock-hand swept by. The easiest variant on the clock algorithm that uses the dirty bit operates just like the original clock algorithm to find a frame to replace, and then it inspects the dirty bit on the frame to see if it is necessary to copy that page back to disk before replacing it. Frames holding pages freshly brought in from disk are always marked as clean.

An alternative (and much more interesting) scheme is to have the following rule: When the clock hand passes a dirty frame, it schedules it for cleaning. That is, it schedules a write request for that frame in the disk queue. Only when the write is completed does the page in that frame become a candidate for replacement. The search for a frame to replace only terminates when a clean unmarked page is found. Here are the states of a page frame in this scheme:

clean-unmarked
clean-marked
dirty-marked
write-scheduled
write-in-progress

A Question: Identify all of the state transitions that can occur in this system, and for each state transition, indicate what causes it, that is, what activity of the CPU, the page-fault service routine, or the disk-interrupt service routine causes the markings on the frame to change. Do not forget the possibility that the CPU might touch a frame while that frame is in the disk queue or while the disk hardware is in mid write. (1.0 points)

clean-unmarked to clean-marked when a variable or instruction in the page in that frame is read by the CPU.
clean-unmarked to dirty-marked when a variable in the page in that frame is written by the CPU.
clean-marked to dirty-marked when a variable in the page in that frame is written by the CPU.
clean-marked to clean-unmarked when the clock hand sweeps by the frame in the page-fault service routine, replacing the contents of that frame with a different page.
dirty-marked to write-scheduled when the clock hand sweeps by the frame in the page-fault service routine, putting the page in that frame in the disk I/O request queue.
write-scheduled to write-in-progress when the disk interrupt service routine takes that request off the queue and starts the disk write operation.
write-in-progress to clean-unmarked when the disk interrupt service routine reports completion of the disk operation.
write-scheduled to dirty-marked when a variable or instruction in the page is written by the CPU.

By way of explanation, the write-scheduled and write-in-progress states are both conceptually sub-states of dirty-marked, so additional writes to the page during while the page is in one of these substates do not cause difficulty. Only when the page in a fram is successfully written without further modification during the write operation is the page reported as being clean and available for page replacement.

Assignment 8, due Mar. 23

Solutions