Exam 1: Midterm

Solutions and Commentary

Part of the homework for 22C:112, Spring 2012
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Grade Distributions

Exam

Mean   = 6.55
Median = 6.6                             X
                       X       X   X     X
  ___________X___X_X___X___X_X_X___X_____X_______X__________________
    0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15

Machine Problems 1 to 3

Mean   =  9.40
Median = 10.5                                  X       X
                           X         X         X       X
  _______________X_________X_______X_X___X_____X_____X_X___X________
    0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15

Homeworks 1 to 7

Mean   = 11.05
Median = 12.3
                                         X               X     X
  _____________________X_________X_X_X___X_X_________X_X_X_X___X___X____
    0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15. 16

Overall Total Scores

Mean   = 27.00
Median = 26.3
                           X X                   X   X
  ___________X_X_____X_____X_X_______X___________X___X_X_X_X____________
    10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32. 34. 36. 38. 40. 42.
           D             C             B             A

Exam Solutions and Commentary

This exam is worth 15 points. Each part of each question is worth one point.

Background: Consider this code from the posted solution to MP2:
```
 1  void launch(){
 2      if (fork() == 0) { /*child*/
 3          int i = 0; /* patha index */
 4          execve( argv[0], argv, myenvp ); /* first, try it as itself */
 5
 6          while (patha[i] != NULL) { /* try successive path prefixes */
 7              char filename[100];
 8              strncpy( filename, patha[i], 99 );
 9              filename[99] = '\000'; /* insurance */
10              strncat( filename, "/", 99 - strlen( filename ) );
11              strncat( filename, argv[0], 99 - strlen( filename ) );
12              execve( filename, argv, myenvp ); /* try one prefix */
13              i = i + 1;
14          }
15
16          printf("no such command\n");
17          exit( EXIT_FAILURE ); /* a minor bug fix */
18      } else { /*parent*/
19          wait( NULL );
20      }
21  }
```
a) Why does this need the strncpy() call on line 8? Why can't it directly concatenate to patha[i] instead of working with a copy?

The path component patha[i] is a reference to a string stored in an array of characters of an unknown size. We do not know that there is space at the end of this array to concatenate the suffix needed here.

b) If the first execve() on line 4 was omitted, leaving only the one on line 12, what change in behavior would shell users notice?

Users could not type absolute file names such as /bin/ls, but could only type file names relative to path components.

c) A more subtle question: On the original version of the Mush shell, line 17 above was omitted, yet the shell seemed to work normally. Suppose a user of that version repeatedly typed incorrect file names. What would be the consequences, from the point of view of system resource consumption?

Each time the user mistypes a command name, the child process would not exit, signalling the parent that it can wake up and continue. Instead, the parent would remain waiting and the child process would continue processing the next command in sequence. The user might not notice, but each time a bad command was typed, the system would create anther process, gradually exhausting the system's resources.
Background: Programs running on Unix-like systems typically have three segments: a read-execute only code segment, a read-write static segment and a read-write stack segment. Consider the following program fragment:
```
  1 #include <stdio.h>
  2 char cha = " ";  /* a global variable */
  3
  4 int main() {
  5     char chb = " ";  /* a local variable */
```
The identifiers cha and chb name a global and a local variable. The C code initializing these two variables is identical, yet these declarations are treatede very differently by the compiler and loader, and the machine code used to create them is very different at run-time.
a) How would the compiler typically handle the declaration on line 2? Where would the initial value of cha be stored in the object code? When would the corresponding variable be initialized in RAM, and what code would do this initialization?

The variable cha is global; as such, it is statically allocated in the static segment, which is initialized by the loader when the program is loaded in RAM before the program is executed.
Note that there is an error above. Either the initial values should be ' ' (single quoted character constants) or the data type should be char *. This error has no impact on the correct answer.

b) How would the compiler typically handle the declaration on line 5? Where would the initial value of chb be stored in the object code? When would the corresponding variable be initialized in RAM, and what code would do this initialization?

The variable chb is local; as such, it is dynamically allocated in the activation record on the stack, so it must be initialized by code in the function main(), when that function is called.
Background: Consider this C code fragment:
```
  1  char * a;
  2  char * b;
  3  a = "this is a test";
  4
  5  b = malloc( strlen( "this is a test" ) + 1 );
  6  strcpy( b, "this is a test" );
```
A Question: What is the semantic difference between lines 3 and 6? Curiously, this question is intimately related to the previous question.

Line 3 sets the character pointer a to point to a string constant. Line 6 sets the character pointer b to point to a dynamically allocated string variable that is initialized by what could well be the same string constant.
Background: Consider the path between an application and a disk drive. For this problem, assume that there is no file system, but rather, that the application directly uses the entire drive as a single random-access device.
- The program requests I/O by specifying a disk address, a buffer address, and a transfer direction. The program blocks until the I/O is completed, so from the application viewpoint, there is never more than one pending disk I/O request.
- The disk hardware requests an interrupt (if enabled) whenever the disk completes the previous I/O transfer.
Given only the information above, there would never be more than one pending disk request. To improve performance, the I/O driver is maintains a cache. When the user writes a sector to disk, the driver takes a copy, schedules the I/O transfer from that copy, and immediately returns to the user. On input, if the driver finds a pending output for the requested disk address, the driver copies from the buffer scheduled for output instead of waiting for input from disk.
a) How many input requests and how many output requests could be in the disk request queue at any one time with the above cache scheme?

There can only be one pending read request, but because of the way the driver manages the cache, any number of write requests may be pending.

b) How much of the cache management code for this scheme is done on the user side of the I/O driver, and how much is done on the interrupt-service-routine side?

The entire cache management function is on the user side of the driver.
Background: The C and C++ stream model, used with variables of type FILE * (pointer to file). The type FILE is typically declared something like the following (deliberately oversimplified):
```
struct FILE {
    int fd;     /* file descriptor to use for this file */
    int size;   /* buffer size */
    char * buf; /* pointer to the buffer */
    int pos;    /* position in buffer */
};
```
When fopen() is used to open a stream, it uses open() to open the the file on the underlying operating system, stores the returned file descriptor in fd, queries the newly opened file to find the best buffer size for I/O to that file, allocates a buffer and initializes the buffer position appropriately. In the context of the above, fgetc() might be implemented as follows:
```
char fgetc( FILE * s ) {
    char ch;
    if (s->pos >= s->size) {
        read( s->fd, s->buf, s->size );
        s->pos = 0;
    }
    ch = s->buf[ s->pos ]
    s->pos = s->pos + 1;
    return ch;
}
```
a) How should the position within buffer be initialized on a stream file used for input only?
The code only reads from the device when s->pos>=s->size. Since you need to read into the buffer from the device as soon as the user first calls fgetc(), this implies that the initial value of s->pos should be greater than or equal to s->size.
Many students didn't figure out that the whole point of this problem was blocking and deblocking of stream input/output. They assumed that s->size was the file size, and s->pos was the position in the file, as opposed to the position in the buffer. The net result of this misunderstanding was to conclude that the initial value was zero.
Note that the problem statement deliberately avoided saying that fopen() did any I/O. It only said that the optimal buffer size is selected -- not that the buffer is actually read or written.

b) If a file is used for output only, how should the position within the buffer be initialized?

In this case, the buffer is initially empty and is being filled by the user, so s->pos should be initially zero.
Most got this.

c) Actual stream I/O in C and C++ permits mixed reading and writing in a stream, so a stream may be read up to a point, and then written from there on. When you open a stream, you do not necessarily know whether the first operation on that stream will be a read or a write. How would you modify the stream data structure to allow for this?

You need to keep track of the current use of the buffer. This requires a new field in the FILE structure; if there has been no I/O yet on the file, this would be UNKNOWN, and it would be set to INPUT or OUTPUT depending on the most recent use of the stream.
Most students got this.
The question did not ask what to do with this field, but it is fairly clear that, on either fput() or fget(), the first thing you do is check the current buffer use. If the same as the current intended use, no special action is required. If unused, you initialize the pointer appropriately for the current transfer direction. If different from the intended use, you must "turn around the stream", for example, writing out a partially filled buffer before setting up the stream for input. The code to do this is fun.

d) When a C or C++ stream s is opened to a disk device or disk file, the fseek(s,p,SEEK_SET) operation may be used to position the stream so that the next byte read or written will be the pth byte from the start of the stream. How would you modify the stream data structure to support this?

Fairly obviously, we need to track the file position in the random-access file, this is a separate problem from tracking the position in the buffer being used for blocking or deblocking.
Students who didn't catch on that the entire problem was about blocking and deblocking were at a serious disadvantage here.
Background: For a bounded buffer implementation of a FIFO queue, the bound on the queue size is fixed. This is not always appropriate. Whether the queue is for input or for output, both producers and consumers of data my operate at a steady rate, or they may process data in bursts, varying between a high rate and a low rate. Of course, the average rate at which data is enqueued must equal average rate dequeues, since the queue length must be non-negative and finite.
a) Give a general characterization of the circumstances under which a small queue is sufficient, and contrast this with the curcumstances under which a larger queue leads to better performance.

A small queue is sufficient when there is little variation in the rate of data production and consumption. A large queue improves performance if there is significant variation, which is to say, when the producer or the consumer operates in a bursty manner.
The problem statement deliberately hinted at this answer, but a remarkable number of students didn't take the hint. The problem statement also said something obvious, that the long term average rates must be equal. Despite this, many students said that short queues work best for low data rates and big queues for high data rates.

b) The term "better performance" was used above in a rather careless way. Exactly what will be reduced by increasing the bound on the queue size in a system that benefits by a larger queue. Note: There may be more than one thing!

The amount of time that the system spends blocked, waiting on a full or empty queue will be reduced. And, the amount of time that the I/O device spends idle will be reduced (on output) and the likelihood that data will be lost on input will be reduced.

c) Consider the possibility of adaptive optimization of the queue size. Every time the consumer blocks because of an empty queue, how should the queue size be adjusted? Every time the producer blocks because of a full queue, how should the queue size be adjusted? Is there some third circumstance where the queue size should be adjusted? If so, how and when?

If the queue fills, increase the queue size. If the queue empties, increase the queue size. Periodically decrease the queue size.
A majority of students suggested increasing the queue size if it filled, but many thought the size should be decreased if it empties. In fact, an empty queue means that you didn't let enough data accumulate during the last burst of enqueues -- if your queue had been bigger, enough data might have accumulated to permit the next burst of dequeues to run without emptying the queue.
The tricky part is refining the solution so that the queue doesn't grow to infinity. To do this, you have to limit the rate of queue growth. For example, don't increase the queue size as soon as one of the growth conditions is met, but rather, wait for N requests for increased queue size before you respond, where N requests, on average, accumulate in about the same time as the interval between the periodic decrements to the queue size.