Homework 11 Solutions
22C:116, Spring 1997
Ioana M. Lungeanu
PROBLEM 1 PART A
The performance of a load balancing algorithm is mainly
determined by two factors: how well it balances the load and
how much traffic it generates.
As Taccept-Toffer decreases towards zero, a machine tends
to make offers and accept load at the same time (when it
reaches a load equal to Taccept and practically to Toffer).
After both machines reach this load, they will bounce small
processes back and forth. The load will be balanced between
the two machines, but the bandwidth used is very high.
Besides, the machines will spend more time transferring
processes between one another than doing useful work.
At the other extreme, when Taccept-Toffer approaches one,
hardly any offers or accepts take place. Load balancing is
practically not performed, so we get poor balancing along
with low bandwidth utilization.
Poor performance is obtained at both extremes. Good
performance is obtained for Taccept-Toffer around 0.5. The
center of the interval should be chosen in accordance with
the average process lifetime.
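As a concrete illustration, here is a minimal sketch of one
plausible reading of the two threshold rules, assuming load is
normalized to [0,1] and Toffer < Taccept; the helper routines,
handler names, and threshold values are illustrative
assumptions, not part of the problem statement.

    /* sketch of the threshold rules; BroadcastOffer and
       MigrateProcess are assumed communication primitives */
    extern void BroadcastOffer(void);
    extern void MigrateProcess(int neighbor);

    float Toffer  = 0.25; /* below this load, advertise availability */
    float Taccept = 0.75; /* above this load, ship a process to an
                             offering neighbor (Taccept-Toffer = 0.5) */

    void OnTimer(float load) {
        /* runs every 1/10 of the average process lifetime */
        if (load < Toffer)
            BroadcastOffer(); /* tell the neighbors we want work */
    }

    void OnOfferReceived(float load, int neighbor) {
        if (load > Taccept)
            MigrateProcess(neighbor); /* the ping-pong described above
                                         is avoided only if the two
                                         thresholds are kept apart */
    }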
PROBLEM 1 PART B
As the diameter of the network grows, the ability of this
algorithm to balance the load decreases. To see this, consider
the following situation: if most of the load is generated at
a machine at one end of the diameter (of length D), then the
neighboring machines will soon be overloaded, while the ones
at the far end of the network may be close to idle. A process
has to be sent over D hops to reach the other end. As the
diameter of the network increases, the load balance can be
poor in some situations, and the traffic becomes heavier.
When offers are made every 1/10 of the average process
lifetime, a process can be moved 10 times, on average. If
unloaded machines are more than 10 hops away, they will not
have a chance to take over the extra load. So for diameters
larger than 10, this algorithm cannot hope to achieve any
global load balance!
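A quick back-of-the-envelope check of this bound (the figures
are the ones assumed above; the program is only illustrative):

    #include <stdio.h>

    int main(void) {
        int offersPerLifetime = 10; /* offers made every 1/10 of the
                                       average process lifetime */
        int D;                      /* network diameter, in hops */
        /* a process lives through about 10 offer rounds, and each
           round can move it at most one hop, so it travels at most
           about 10 hops before it terminates */
        for (D = 8; D <= 12; D++)
            printf("diameter %2d: %s\n", D,
                   D <= offersPerLifetime ? "balance reachable"
                                          : "far machines never reached");
        return 0;
    }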
PROBLEM 1 PART C
The given algorithm is an example of a distributed
heuristic algorithm, but it is not the same heuristic as the
one given by Tanenbaum.
The main difference is that Tanenbaum's example
distributed heuristic algorithm is sender initiated (the
overloaded machine tries to find an unloaded one), while the
given algorithm is receiver initiated (the underloaded machine
advertises its availability).
Other differences are: the point in a process's life at
which it can be moved to another machine, the number of
thresholds used, and the decision of which process to move.
PROBLEM 2 PART A
Pseudo-code for the file management services on the user's
machine:
#define MAXOPEN 20 /* illustrative size for the table of open files */
struct openFile {
    int inode; /* file number, obtained from the directory server */
    int pos;   /* current position in the file */
} openFiles[MAXOPEN];
int open(char * filename) {
    int i = FindFreeEntry(); /* in the table of open files */
    openFiles[i].inode = DirServerRPC(filename);
    /* call to the directory server, which in turn will
       call the file server until it has obtained
       the file number (inode) */
    openFiles[i].pos = 0; /* initialize the current position */
    /* some other information could be stored here,
       such as access rights */
    return i; /* the file handle */
}
int read(int fileHandle, char * buf, int len) {
    int c = FileServerReadRPC(openFiles[fileHandle].inode,
                              openFiles[fileHandle].pos, /* start */
                              len, buf);
    /* the number of bytes actually read is returned
       by the file server's read procedure */
    openFiles[fileHandle].pos += c; /* update the current
                                       position in the file */
    return c;
}
int write(int fileHandle, char * buf, int len) {
    int c = FileServerWriteRPC(openFiles[fileHandle].inode,
                               openFiles[fileHandle].pos, /* start */
                               len, buf); /* write at the current
                                             position */
    /* the number of bytes actually written is returned
       by the file server's write procedure */
    openFiles[fileHandle].pos += c; /* update the current
                                       position in the file */
    return c;
}
void close(int fileHandle) {
    openFiles[fileHandle].inode = -1; /* just mark the corresponding
                                         entry in the table as free */
}
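For illustration, a small usage sketch of the stubs above; the
file name is made up, and the RPC stubs are assumed to be
provided elsewhere:

    int main(void) {
        char buf[128];
        int fh = open("/dir1/dir2/filename");
        int n = read(fh, buf, sizeof(buf)); /* read from position 0 */
        write(fh, buf, n); /* write the same bytes back, starting at
                              the position left by the read */
        close(fh);
        return 0;
    }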
PROBLEM 2 PART B
The directory server is given a file name that looks like
this: /dir1/dir2/ ... /filename. Assuming the directory server
knows the file holding the root directory, it repeatedly calls
the file server, once for each directory name in the path,
until it finds the inode for the actual file name.
struct {
    char name[14]; /* file name within the directory */
    short nr;      /* inode number of that file */
} fileEntry;
int DirServerRPC(char * filename) {
    int currentDirNr, pos;
    char * subDir;
    if(filename[0] == '/') { /* absolute path: start at the root */
        currentDirNr = rootDirNr;
        filename = afterSlash(filename);
    } else {                 /* relative path: start at the working
                                directory */
        currentDirNr = workDirNr;
    }
    while(strstr(filename, "/")) {
        subDir = beforeSlash(filename);
        pos = 0;
        while(1) {
            if(sizeof(fileEntry)
               != FileServerReadRPC(currentDirNr, pos,
                                    sizeof(fileEntry),
                                    (char *)&fileEntry))
                return -1; /* directory exhausted, component
                              not found */
            pos += sizeof(fileEntry);
            if(strcmp(fileEntry.name, subDir) == 0) {
                currentDirNr = fileEntry.nr;
                break;
            }
        }
filename = afterSlash(filename);
}
pos = 0;
    while(1) {
        if(sizeof(fileEntry)
           != FileServerReadRPC(currentDirNr, pos,
                                sizeof(fileEntry),
                                (char *)&fileEntry))
            return -1; /* directory exhausted, file not found */
        pos += sizeof(fileEntry);
        if(strcmp(fileEntry.name, filename) == 0)
            return fileEntry.nr;
    }
}
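The helpers beforeSlash and afterSlash used above are assumed
primitives; a minimal sketch of what they might look like (the
14-character limit matches the directory entry structure above):

    #include <string.h>

    static char component[15]; /* one path component plus terminator */

    char * beforeSlash(char * path) {
        /* copy the first path component into a static buffer */
        char * slash = strchr(path, '/');
        int len = slash ? (int)(slash - path) : (int)strlen(path);
        if (len > 14) len = 14; /* names are at most 14 characters */
        strncpy(component, path, len);
        component[len] = '\0';
        return component;
    }

    char * afterSlash(char * path) {
        /* return the rest of the path, past the first '/' */
        char * slash = strchr(path, '/');
        return slash ? slash + 1 : path;
    }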
PROBLEM 2 PART C
Adding state information in the server can marginally
lower the communication overhead between client and server
and marginally simplify the client code because clients
would no longer have to track file positions. This benefit
is small, and the cost -- adding state information to the
server and using a connection-based client-server protocol
instead of the connectionless protocol possible with the
original design, does not justify such a change!
The big benefit of adding state information to the server
lies in the server's ability to track client activity, for
example, by supporting aggressive read-ahead on files that
clients are actively reading.
A server that maintains a local cache of recently
referenced blocks can attain most of the benefits discussed
above, while still using a stateless client-server protocol.
LRU replacement combined with read-ahead into the cache
will do this.
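A minimal sketch of such a cache, assuming a DiskRead primitive
and illustrative sizes; read-ahead is modeled here by prefetching
the next block on every access:

    #include <string.h>

    #define NBLOCKS   64  /* cache capacity, in blocks (illustrative) */
    #define BLOCKSIZE 512

    extern void DiskRead(int inode, int block, char * buf); /* assumed */

    struct cacheEntry {
        int  inode, block; /* which file block this slot holds */
        long lastUsed;     /* logical timestamp for LRU replacement */
        char data[BLOCKSIZE];
    } cache[NBLOCKS];

    long lruClock = 0;

    static int lookup(int inode, int block) {
        /* (a real implementation would also mark empty slots;
           that bookkeeping is omitted in this sketch) */
        int i, victim = 0;
        for (i = 0; i < NBLOCKS; i++)
            if (cache[i].inode == inode && cache[i].block == block) {
                cache[i].lastUsed = ++lruClock; /* hit: refresh stamp */
                return i;
            }
        for (i = 1; i < NBLOCKS; i++) /* miss: evict the LRU entry */
            if (cache[i].lastUsed < cache[victim].lastUsed)
                victim = i;
        cache[victim].inode = inode;
        cache[victim].block = block;
        cache[victim].lastUsed = ++lruClock;
        DiskRead(inode, block, cache[victim].data);
        return victim;
    }

    void CacheRead(int inode, int block, char * buf) {
        memcpy(buf, cache[lookup(inode, block)].data, BLOCKSIZE);
        lookup(inode, block + 1); /* read-ahead: prefetch next block */
    }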
PROBLEM 2 PART D
While client caching is employed users will observe the
changes made on a file like in session semantics: changes
are visible to other users only after the file was closed by
the process who modified it.
When server side caching is done changes are viewed
according to Unix semantics: a read performed after two
consecutive writes will return the value set by the last
write.
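To make the session-semantics behavior concrete, here is a
minimal sketch assuming whole-file client caching; the structure
and names are illustrative, and FileServerWriteRPC is the stub
from Part A:

    #include <string.h>

    #define MAXFILE 4096 /* illustrative whole-file cache size */

    struct clientCache {
        int  inode;
        int  size;  /* current length of the cached copy */
        int  dirty; /* set by write, cleared on close */
        char data[MAXFILE];
    };

    void CachedWrite(struct clientCache * c, int pos,
                     char * buf, int len) {
        /* local copy only: other users still see the old file
           (bounds checks omitted in this sketch) */
        memcpy(c->data + pos, buf, len);
        if (pos + len > c->size) c->size = pos + len;
        c->dirty = 1;
    }

    void CachedClose(struct clientCache * c) {
        if (c->dirty) /* only now do the changes reach the server
                         and become visible to other users */
            FileServerWriteRPC(c->inode, 0, c->size, c->data);
    }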
Users can observe better response time while client
caching is used, since network transfers (not only disk
transfers) can be reduced.
Therefore, client-side caching is likely to be preferred
for read-only files, while server-side caching is probably
preferred for read-write files, particularly those
representing such things as shared databases.