Homework 2 Solutions

22C:116, Fall 1997

Write a program on any UNIX system that uses the fork(), exit() and wait primitives to construct the following program structure:

                 A
                _|_
          parent/ \child
               B   \
            __/     \
      parent/ \child \
           C   D      E
            \_/      /
              \     /  
               F   /
                \_/
                 |
                 G

Here is a solution that uses a formulation that can be applied to any UNIX fork-join problem:

	void main()
	{
		int child1,child2;
		write(1,"A",1);
		child1 = fork();
		if (child1 == 0) { /* child */
			write(1,"E",1);
			exit(0); /* child joins */
		} else { /* parent */
			write(1,"B",1);
			child2 = fork();
			if (child2 == 0) { /* child */
				write(1,"D",1);
				exit(0); /* child joins */
			} else { /* parent */
				write(1,"C",1);
				while (child2 != 0) { /* parent joins */
					int pid,stat;
					pid = wait( &stat );
					if (pid == child1) child1 = 0;
					if (pid == child2) child2 = 0;
				}
			}
			write(1,"F",1);
			while (child1 != 0) { /* parent joins */
				int pid,stat;
				pid = wait( &stat );
				if (pid == child1) child1 = 0;
			}
		}
		write(1,"G",1);
	}

Just for fun, here is a sample of the actual outputs from different runs of this program:

ABEDCFG
AEBCDFG
ABCEDFG

Propose a series of experiments you could conduct to determine the item size and the number of items in a Unix pipe?
Consider the following program fragment:
```
	{
		int fildes[2];
		pipe(fildes)
		write(fildes[1],buf,N);
		read(fildes[0],buf,N);
	}
```
If N in the above code is smaller than the grain size of the pipe, this code will deadlock (write will not put sufficient data in the pipe to fill one grain, so read will block). So, to find the granularity of the pipe, run the above code with successively larger values of N until it does not block.
For values of N larger than the bound on the bounded buffer, the above code will block because the first write cannot complete until some read is done. So, to find the size of the bounded buffer, increase N until the program blocks again.
Section 2.2.4 of Tannenbaum suggests using the primitives sleep and wakeup(p). Sleep causes the caller's process to be suspended (put in the waiting state), and wakeup(p) causes process p to become ready -- wakeup has no effect on a ready process, but it undoes the effect of sleep for a waiting process. Can you use sleep and wakeup to create a working mutual exclusion mechanism that is not based on busy waiting? If not, why not?
You cannot! What you would hope is that:
1. when a process determines that it must block,
2. it registers itself as needing awakening,
3. then uses the sleep service to block itself.
The problem is that there are three steps in the above list, and these must all be done as one atomic operation. If not, it is possible that a process may have decided to block but not registered itself at the time another process exits the critical section, or it may have registered itself but not gone to sleep at the time another process discovers its registration record and attempts to wake it up.
Priority schedulers generally assign a numerical priority to each process, and when a scheduling decision must be made, they select the highest priority ready process as the next process to be run. In terms of abstract data types, this involves replacing the FIFO ready queue with a non-FIFO priority queue. Refer to the material you have studied in data structures and algorithms, and identify an appropriate and efficient priority queue implementation.
The best alternatives here are O(log n) data structures based in one way or the other on binary trees. Consider using a heap, as in heapsort. The enqueue operation on a heap sorts an item into the queue in O(log n) time, while the dequeue operation finds the head in constant time and then promotes a replacement into the head position in O(log n) time.
For this implementation, discuss the problems you would encounter in a large scale shared memory multiprocessor where many CPU's (under the direction of the operating system) were competing to take processes from and add processes to the ready queue.
The problem here is that the operations on a heap take O(log n) time, not constant time. For the duration of each operation, the entire heap must be locked, and for large systems, the delays caused by this will eventually interfere with parallelism because the frequency of scheduling operations for each process is not expected to decline as the system grows, so on a parallel system, we would expect O(n) scheduling operations per second. Thus, the fraction of the time the queue is locked will grow as O(n log n). For some size system, this will be 1, and addition of processors above this point will give no performance improvement.