Homework 10 Solutions

22C:116, Fall 1997

Douglas W. Jones

Background A user-level thread package for UNIX might begin with a call to the following routine:

	void thsetup(int n, void (*p)(void), int s)
	{
		int i;
		thheapinit();
		thschedinit();
		thcreate(p, s);
		for (i = 1; i < n; i++) {
			if (fork() == 0) { /* child */
				thdispatch();
			}
		}
		thdispatch();
	}
This converts the parent process into n parallel processes, all running thdispatch(), a routine that takes ready threads off the thread ready list and runs them. Before forking, thsetup() initializes the shared heap (the only memory area shared by the n parallel processes) and the thread ready list, then creates a single thread running the function p with a stack of size s. New threads are created by the following routine:
	void thcreate(void (*p)(void), int s);
This creates and schedules one thread in the thread ready list. This thread will execute the code in function p, using a stack of size s bytes to execute that code.

The following additional thread scheduling and synchronization operations are in our elementary UNIX-based user-level thread package:

	thsem *thseminit(void);
		-- create and return a pointer to a thread-semaphore
		-- in the shared heap
	void thwait(thsem *s);
		-- wait on a thread-semaphore
	void thsignal(thsem *s);
		-- signal on a thread-semaphore
	void thexit(thsem *s);
		-- exit on a thread-semaphore
	void *thmalloc(int n);
		-- allocate n bytes of shared heap
	void thfree(void *p);
		-- deallocate a block of shared heap

  1. thmalloc() and thfree() must be used instead of the standard C malloc() and free() primitives or the standard C++ object creation for at least 2 reasons:

    First, the standard heap manager operates on a heap that is private to each process and does not operate on a shared memory region common to the processes that are acting as compute servers for the thread package.

    Second, the shared heap is shared by multiple processes, each possibly running on a different CPU. Therefore, mutual exclusion is required, and no mutual exclusion is included in the standard heap manager.

  2. thheapinit() was called prior to thcreate() inside thsetup() because the default memory address space of a process (if the system is anything like the conventional UNIX or Windows world) does not include any shared memory. Therefore, it is necessary to create a shared heap before any allocation can be done from it, and thcreate() does exactly such an allocation: the new thread's stack comes from the shared heap!

  3. The initial thread created by thsetup could not simply be the original main program started by the user because all threads must have their stacks inside the shared heap -- this is necessary to allow a thread to be run by one process at one instant, and by another at some later instant. The main program's stack is not part of the shared heap! Therefore, the main program's stack must be abandoned after the initial set of threads are launched.

  4. What scope rules apply to our system? To solve this, let's enumerate the classes of memory (and non-memory resources) a thread may access:

    Globals (statically allocated at link time).
    The fork operations replicate these. As a result, the semantics of assignment to globals within the thread package is quite ambiguous. On the other hand, assignments prior to launching the thread package can be read by the threads after launch, so globals are useful for parameters that are computed at launch time, and they are useful for constants.

    Locals on the stack prior to the launch of the thread package.
    These are inaccessible from the launched threads.

    Locals on the stack of a running thread.
    These are accessible and work normally.

    In addition, note that there is no way to pass a parameter to a thread at the time it is launched. As a result, there is no way to establish communication between threads, except, perhaps, through files. This is pretty miserable!

  5. This implementation omits one detail: after a thread closes a file, the file must also be closed separately in each thread support process! To do this, we need some function in the thread system that searches the thread file table for files that are officially closed (at the thread level) but not yet closed in the current thread support process. When such a file is found, it can be moved one step closer to final closure by closing it, at the UNIX level, in that process.

    The best time to do this might be when a new file is opened. As the open service searches for an available thread file descriptor table entry, it can finish closing any file that is not yet closed all the way. If the open service relinquishes the processor each time it fails to find a completely closed entry, it will eventually (at random) get a chance to run in every thread support process, and this will give it a chance to finish closing some file after some number of tries, thus allowing the open operation to succeed eventually.

  6. Consider a system with 16 CPUs sharing access to a single main memory and running a UNIX-like operating system that supports threads at either the user or kernel level. Many observers of such systems would not consider load balancing to be a problem, because the scheduling algorithms on a typical shared-memory system allow any idle CPU to run any ready process; as a result, the load balances trivially except when there are not enough processes in the system to use all the CPUs.

    On the other hand, this same natural approach to achieving a balanced load can be viewed as a very active and effective load balancing algorithm; every time a process changes state from ready to running, it migrates to a specific CPU, so this environment involves very frequent process migration!