Assignment 10 Solutions

Part of the homework for 22C:112, Spring 2011
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Background Consider the problem of writing a thread manager on a Unix system that supports only the heavyweight primitives fork(), exit(), and wait(), plus the shmat() and shmdt() primitives for attaching and detatching shared memory segments. Your system supports multicore CPUs by running processes on separate cores of the CPU whenever possible.
Also, suppose you have a lightweight thread manager such as the user-level thread package that was the subject of last week's homework. This thread manager multiplexes a single user process to support multiple threads. As such, it cannot exploit the resources of a multicore CPU. This thread manager creates thread data structures on a heap, maintains a ready list, and provides a relinquish() operation by which a thread may allow another thread to run.
Your goal is to design (but not write the code for) a lightweight thread manager. This thread manager should allow lightweight threads to run on different cores of the CPU when possible.
One way (not very smart, but easy to debug) to write the thread manager would be for it to start by launching one idle thread per core of the CPU and then fork multiple times so that there is one process per core. Then, and only then, would it begin launching and running user threads. All of the thread manager processes would run concurrently until the thread manager terminates, usually when the program itself terminates. The disadvantage of this is obvious: It uses all of the cores as much of the time as the operating system permits whether or not the user needs that many threads.
a) Suggest an alternative rule for when the thread manager should fork another process to run threads. (0.5 points)

Base the decision on data collected by the idle threads. Or, base it on the average length of the ready list. First, let's be vague about the colleciton mechanism and simply say, if all thread manager processes are usually busy (not running the idle thread) or there is is usually at least one ready thread, increase the number of processes running threads -- but never use more processes than there are processors.
Now, to the details: You should not increase and decrease the number of processes too frequently, since process creation and destruction are expensive. Therefore, always base the decision on some kind of running average of the load, not the current instantaneous load.
For example, every time a thread is taken from the ready list, the length of the list could be sampled to compute the average length. After every thousandth updat (or some other interval) the average could be used to decide whether to create or destroy one of the server processes.

b) As a companion to part a, suggest an alternative rule for when one process of the thread manager should terminate. (0.5 points)

If there is usually an idle thread-manager process -- that is, there is usually a thread-manager process that is running an idle thread, then there are too many thread manager processes. Again, this decision should not be made too frequently.

c) Consider the problem of supporting semaphores in this multicore thread manager. The semaphore implementation in the user-level thread package discussed in last week's homework would suffice on a uniprocessor. Why would this fail on a multithreaded processor? (0.5 points)

Because that thread manager assumed only one CPU, it did not need to guard its critical sections involving things such as enqueueing and dequeueing in the ready list or incrementing or decrementing counters.
Background: Consider the following approaches to mutual exclusion:
- Disable interrupts, then critical seciton, then enable interrupts.
- Claim a spin-lock, then critical seciton, then release a spin lock.
- Wait on a semaphore, then critical seciton, then signal that semaphore.
- Use flock(fd,LOCK_EX), then critical seciton, then flock(fd,LOCK_UN).
- Create a lock file, then critical seciton, then delete the lock file.
a) For each of these, clearly identify the setting in which it is the most appropriate alternative. (0.5 points)
- Disable interrupts, then critical seciton, then enable interrupts.
  -- works on a uniprocessor, appropriate for short critical sections such as those involved in communication with interrupt service routines.
- Claim a spin-lock, then critical seciton, then release a spin lock.
  -- works on a multiprocessor or multicore machine, appropriate for short critical sections in both application and system code.
- Wait on a semaphore, then critical seciton, then signal that semaphore.
  -- requires low level primitives so it is appropriate primarily for long critical sections or critical secitons that have high contention.
- Use flock(fd,LOCK_EX), then critical seciton, then flock(fd,LOCK_UN).
  -- works in any environment that includes these primitives, but because of the use of a file, it is most appropriate for locking access to that file, for example, when the file holds a shared database.
- Create a lock file, then critical seciton, then delete the lock file.
  -- works in any environment that includes the necessary primitives, but because of the overhead of file creation and destruction, this is the highest overhead of the primitives discussed here, and therefore it is only appropriate for very long critical secitons.
b) For each of these, clearly identify the major disadvantage or limitaiton of the approach that makes it inappropriate in many settings. (0.5 points)
- Disable interrupts, then critical seciton, then enable interrupts.
  -- User programs cannot be allowed to disable interrupts, since if they could, they could effectively shut out all other programs. Therefore, this model of critical section can only be used within the operating system. Also, if the critical seciton being served is close to the shortest required response time to an interrupt request, this type of critical section can cause the system to miss the real-time constraints on interrupt service. Therefore, this type of locking should only be used for very brief critical sections.
- Claim a spin-lock, then critical seciton, then release a spin lock.
  -- Claiming a spin lock monopolizes the resources of the CPU, so it is only appropriate for low-contention locks (where the likelihood of having to wait at all is low) with brief critical sections (where, if there is a need to wait, it is not for long).
- Wait on a semaphore, then critical seciton, then signal that semaphore.
  -- Semaphore implementation usually requires use of some kind of lower level locking prmitives (such as the above), so semaphores are necessarily high level operations. Testing a semaphore can take several tens of instructions, so it is not appropriate for brief critical sections such as those involving incrementing a shared variable.
- Use flock(fd,LOCK_EX), then critical seciton, then flock(fd,LOCK_UN).
  -- As with semaphores, these rest on lower level primitives, so they can only be used in high-level code. These primitives are more expensive than semaphores because they involve the complexity of the open file descriptor mechanism. Therefore, they should only be used for long critical sections.
- Create a lock file, then critical seciton, then delete the lock file.
  -- Creating and deleting files is even more expensive, and it involves underlying synchronization. Therefore, lock files cannot possibly be used for internal synchronization at low levels in an operating system.
c) Each of the above approaches rests on mechanisms. Some of the mechanisms used in some of these approaches require mutual exclusion to implement them. Others are stand-alone mechanisms. For each such mechanism mentioned in the above list, which approach (or approaches) will likely be used to implement that mechanism. (0.5 points)
- Disable interrupts, then critical seciton, then enable interrupts.
  -- This rests directly on hardware to disable and re-enable interrupts.
- Claim a spin-lock, then critical seciton, then release a spin lock.
  -- This rests directly on hardware to implement the load and and store (or test and set, or exchange memory and register) instructions.
- Wait on a semaphore, then critical seciton, then signal that semaphore.
  -- Implementations of semaphores typically rest on disabled interrupts plus, on multiprocessors, spin locks.
- Use flock(fd,LOCK_EX), then critical seciton, then flock(fd,LOCK_UN).
  -- File locking probably rests on semaphores or some similar lower level primitives within the operating system.
- Create a lock file, then critical seciton, then delete the lock file.
  -- File creation and opening probably rest on semaphores or some similar lower level primitives within the operating system.