CS:2820 Notes, Lecture 15

In our program, we have sent error messages to System.err and normal data output to System.out.

By default, when running under the Unix/Linux shell (and under the DOS command line under Windows), output to System.err is mixed in with output to System.out, but they can be separated. Here is a Unix/Linux example to illustrate this:

[HawkID@serv15 ~/project]$ java RoadNetwork roads > t
Intersection A redefined.
[HawkID@serv15 ~/project]$ cat t
Intersection A
Intersection B
Intersection A
Road A B 10
Road B A 20
[HawkID@serv15 ~/project]$ rm t

The added > t at the end of the command running our program diverts System.out (or rather, the Linux/Unix standard output stream) to the file named t. So, when our program runs, the only thing we see on the screen is the error message. Then, we use the command cat t to dump the file t to the screen. We could just as easily have used any text editor to examine the file, and finally, although nothing required us to do so, we deleted the file with the rm command.
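
The separation between the two streams can also be seen from inside Java. The following is a minimal sketch, not part of the road network code; the class StreamDemo and its methods are hypothetical names introduced only for illustration. It temporarily replaces System.out and System.err with in-memory buffers to show that the two kinds of output land in different places, much as > t diverts only one of them.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class StreamDemo {
    /** Stand-in for our program: one error message plus normal output. */
    static void report() {
        System.err.println( "Intersection A redefined." );
        System.out.println( "Intersection A" );
        System.out.println( "Intersection B" );
    }

    /** Run report() with both streams captured.
     *  @return an array holding { standard output text, standard error text }
     */
    static String[] capture() {
        PrintStream oldOut = System.out;
        PrintStream oldErr = System.err;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        try {
            System.setOut( new PrintStream( out, true ) );
            System.setErr( new PrintStream( err, true ) );
            report();
        } finally {
            // flush the temporary streams, then restore the real ones
            System.out.flush();
            System.err.flush();
            System.setOut( oldOut );
            System.setErr( oldErr );
        }
        return new String[] { out.toString(), err.toString() };
    }

    public static void main( String[] args ) {
        String[] captured = capture();
        System.out.print( "standard output got:\n" + captured[0] );
        System.out.print( "standard error got:\n" + captured[1] );
    }
}
```

In normal use we would never do this inside the program; the shell does the job for us. The point is only that the error message never reaches the stream that holds the normal output.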

Under some Unix/Linux command shells, it is almost as easy to divert standard error (System.err) to a file. Initially, however, the designers of the Unix shell assumed that users would always want to see error messages immediately, while they might want to save other output. As a result, the shell tools for redirecting standard error were afterthoughts, and they differ from one shell to the next.

The two most common families of Unix/Linux shells are sh (the Bourne shell) and its open-source replacement bash (the Bourne-again shell), on the one hand, and csh (the C shell) and its open-source replacement tcsh (the TENEX-inspired rewrite of csh), on the other. To find out what shell you are using, type echo $SHELL. This will output the file name of your login shell, which is normally the shell you are running.

In sh and bash, typing >f after a shell command redirects standard output to a file named f while leaving standard error directed to the terminal. In contrast, typing 2>f redirects standard error and leaves standard output unchanged. This strange use of the numeral 2 is based on the fact that, in Unix and Linux, all open files are numbered, and by default, file 0 is standard input, file 1 is standard output, and file 2 is standard error. This is a really odd design, but it works. If you want to redirect standard output and standard error to different files, you can write >f 2>g.

In csh and tcsh, typing >f after a shell command works as it does in sh. Typing >&f after a shell command redirects both standard output and standard error to the same file. There is no direct way to redirect only standard error, so if you want to split the two into different files, you must run the command in a subshell, writing (command >f) >&g. This works because the redirection inside the parentheses takes only standard output, so all that is left for the outer redirection is standard error. In effect, the outer >& captures whatever the inner redirection did not already take.

In all of these shells, typing >f after a shell command will overwrite the contents of file f if that file already exists. In contrast, typing >>f after the command will append that command's output to the existing file.

A cute trick with the shell

If you want to write a shell script that does one thing if a program terminates normally and another thing if it doesn't, you can write shell scripts with if statements in them:

#!/bin/bash
# shell script to run the road network with input from testfile
java RoadNetwork testfile
if [ $? -eq 0 ]
then
    echo "--exit success"
else
    echo "--exit failure"
fi

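The notation for boolean expressions on the if statement is awful, and to understand the above, you need to know that $? is the exit status of the most recently executed command. For our program, that is the parameter passed to System.exit() inside the RoadNetwork program (or 0 if main simply returns).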

You can even make the script check the output of your program against the expected output. For example, suppose you have a file called testfile and you know the program ought to produce output identical to what you stored in expectedoutput, and it ought to exit with a success indication (exit code 0):

#!/bin/bash
# shell script to run the road network with input from testfile
java RoadNetwork testfile > output
if [ $? -eq 0 ] ; then
    echo "--Exit success"
    diff output expectedoutput
    if [ $? -eq 0 ] ; then
        echo "--Output was as expected"
    else
        echo "--Output differences noted above"
    fi
else
    echo "--Exit failure with error messages noted above"
fi

The script's first mention of "noted above" is because the diff shell command outputs all lines where output differs from expectedoutput.

The script's second mention of "noted above" is because the error messages output by RoadNetwork were sent to System.err and therefore went straight to the screen instead of being mingled with normal text that was sent to System.out and captured in the file output.

Path Testing the Input Parser

In the previous discussion of testing, we suggested that one rational way to design tests for a program is to attempt to exercise each distinct error message or regular output message in the code. This is a limited case of path testing. In complete form, path testing involves trying to get each instruction in the program to execute at least once. For the road network example, consider these input files without worrying about scripts and automation:

intersection A

If we make the program produce test output simply by listing all the intersections and then all the roads, we can make the output appear identical to the input, or at least, very similar to it, making it easy to see that the program built a data structure correctly. If we pass this initial test, we can build on it, adding more intersections and roads, working up to something like this:

intersection A
intersection B
road A B 10
road B A 20

This is not particularly interesting unless we uncover some bugs. The next step is to start making some errors. Consider this input file:

intersection A
intersection B
intersection A
road A B 10
road B A 20

Here, we've deliberately inserted a duplicate intersection definition. When we run the program over this input (stored in the file roads), we get this output:

[HawkID@serv15 ~/project]$ java RoadNetwork roads
Intersection A redefined.
Intersection A
Intersection B
Intersection A
Road A B 10
Road B A 20

This is correct, as far as it goes, but the output is not very readable. The problem is that the error message is not cleanly distinguished from the output. Our current version of the errors package is at fault, with code something like this:

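class Errors {
    // report a non-fatal error on standard error
    static void warning( String msg ) {
        System.err.println( msg );
    }
    // report a fatal error and terminate with a failure exit code
    static void fatal( String msg ) {
        warning( msg );
        System.exit( 1 );
    }
}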

What we need is simple: a standard prefix on each error message that distinguishes it from the normal output of the program. Consider this:

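class Errors {
    // report a non-fatal error, prefixed so it stands out from normal output
    static void warning( String msg ) {
        System.err.println( "Error: " + msg );
    }
    // report a fatal error (printed as "Fatal Error: ...") and terminate
    static void fatal( String msg ) {
        System.err.print( "Fatal " );
        warning( msg );
        System.exit( 1 );
    }
}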

Another obvious error to explore occurs when a road is defined in terms of an undefined intersection. Consider this input file:

intersection A
intersection B
intersection A
road A B 10
road B A 20
road A C 2000

When we run this, we get the expected error messages, but when the program tries to output the road from A to C, we get a null pointer exception.

What is the problem? There are some bug notices in our code that are closely related to this. Specifically, in the initializer for Road, when we output the warning about an undefined intersection, we wrote this:

        if (destination == null) {
            Errors.warning(
                "In road " + sourceName + " " + dstName +
                ", Intersection " + dstName + " undefined"
            );
            // Bug:  Should we prevent creation of this object?
        }

We did not prevent creation of the object when the declaration of that object contained an undefined destination intersection name. Instead, we left the object with a null destination field. This caused no problem until later when we tried to output the road description using the toString() method:

    public String toString() {
        return (
            "Road " +
            source.name + " " +
            destination.name + " " +
            travelTime
        );
    }

In this code, we blindly reached for the name fields of the source and destination intersections without checking to see if they exist. We need to add this check. Perhaps the ugliest but most compact way to do so is to use Java's conditional operator, embedded in the expression:

    public String toString() {
        return (
            "Road " +
            (source != null ? source.name : "---" ) +
            " " +
            (destination != null ? destination.name : "---" ) +
            " " +
            travelTime
        );
    }

This code works, substituting --- for any names that were undefined in the input file, but it is maddeningly difficult to format this code so that it is easy to read. C, C++ and Java all share the same basic syntax for the conditional operator (a?b:c), and some critics consider this operator to be so unreadable that they advise never using it. It might be better to add a private method that is easier to read and packages the safe return of either the name or dashes if there is no name. We'll worry about this later.
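
As a rough illustration of the private-method alternative, here is one way it might look. This is only a sketch: the helper name nameOf is hypothetical, the constructors are simplified stand-ins for the scanner-based ones in our real code, and travelTime is assumed to be an int.

```java
class Intersection {
    final String name;

    Intersection( String name ) {
        this.name = name;
    }
}

class Road {
    final Intersection source;       // null if undefined in the input
    final Intersection destination;  // null if undefined in the input
    final int travelTime;            // assumption: an integer travel time

    Road( Intersection source, Intersection destination, int travelTime ) {
        this.source = source;
        this.destination = destination;
        this.travelTime = travelTime;
    }

    /** Hypothetical helper: the intersection's name, or dashes if there is none. */
    private static String nameOf( Intersection i ) {
        if (i == null) return "---";
        return i.name;
    }

    public String toString() {
        return (
            "Road " +
            nameOf( source ) + " " +
            nameOf( destination ) + " " +
            travelTime
        );
    }
}
```

With this helper, a road whose destination was never defined prints with --- in place of the missing name, and toString() itself stays as readable as the original unsafe version.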

A More Complex Model

The code we have focused on up to this point works, but it is vastly oversimplified. For example, a major detail in real road networks is that there are many types of intersection. We have at least the following variants:

Stop lights have several characteristics, but one of the most significant is that the simplest ones turn green in two directions while they are red in the other two directions. This means that, for example, the lights facing both north and south are green when the east and west lights are red, and vice versa. More complex stoplights have turn arrows, but for all varieties of stoplights, roads into or out of that intersection must have labels indicating the direction from which they enter or leave.

Similarly, in the neural net example, we have several kinds of synapses. There are excitatory synapses where an action potential traveling down an axon to that synapse causes a positive change in the receiving neuron, pushing it closer to the threshold that would cause it to fire, and there are inhibitory synapses that cause a negative change in the receiving neuron, making it less likely to fire. There are also axosynaptic interfaces where a secondary synapse transmits signals to a primary synapse, activating or inhibiting the primary synapse.

In a logic simulator, there are several kinds of gates. We typically speak of and, or, and not gates, but there are also nand, nor, and exclusive-or gates, as well as asymmetric gates that perform functions such as a and not b. This means that we must document each wire leading to a gate by indicating which input it connects to. In the general case, gates may have multiple outputs, so wires from a gate must also be tagged with which output they connect to.

In an epidemic simulator, we need to worry about classes of people. There are employees and students, for example. Employees have a workplace as well as a home. Students have a school as well as a home. This also leads us to think about multiple categories of place, homes, schools and workplaces, where a school is a type of workplace that also has both students and employees. We'll stop here, but we could add students with part-time jobs and homes as workplaces for domestic labor or high-tech startups run out of people's basements.

Impact on the Road Network Description Language

This has an immediate impact on our road network description. Where we formerly just said:

intersection A
intersection B

we might now say:

intersection A stoplight
intersection B

We've made a decision above, a decision that has several consequences. That is, for specialized types of intersection, we explicitly name the intersection type, but there is also a default where there is no explicit name. We could have required intersection B above to be declared as a simple intersection or something like that. The primary problem with this design decision is that it complicates the problem of parsing the input file.

A second consequence is that for roads, we need to document how the road connects to the intersections it joins, for example, using a notation like this:

road A north B south

This means that there is a road leaving intersection A going north to intersection B where it enters from the south. We are not yet committed to this notation, but we will have to come up with something like it when we reach the point of developing stop-light control for intersections.

Impact on the Epidemic Model

In our epidemic model, we need to add new classes to our model, but we also need to extend the model description. A quick session with a search engine shows that both school sizes and household sizes are reasonably approximated by log-normal distributions, although Poisson distributions may be technically better. A log-normal distribution can be generated from a normal distribution, and such a distribution has just two parameters, the mean of the normal distribution and its standard deviation. The Wikipedia page on the log-normal distribution shows how to derive these from the median size and variance.

So, we should use median and variance as our parameters on the description of our population:

family 3,4
school 200,100

This describes families with a median size of 3 and a variance of 4, and schools with a median size of 200 and a variance of 100. These figures are not necessarily right, but they are good enough. We could do workplaces similarly, but note that schools are a special case of workplace, characterized by student-teacher ratios.
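
To make the derivation concrete, here is a minimal sketch of how the two numbers on a line like family 3,4 could be turned into the parameters of the underlying normal distribution. The class LogNormal and its method names are hypothetical, not part of our simulator. It relies on two standard facts about the log-normal distribution: the median is exp(mu), and the variance is (exp(sigma^2) - 1) exp(2 mu + sigma^2).

```java
import java.util.Random;

class LogNormal {
    final double mu;     // mean of the underlying normal distribution
    final double sigma;  // standard deviation of the underlying normal

    /** Derive mu and sigma from the median and variance of the log-normal. */
    LogNormal( double median, double variance ) {
        // median = exp(mu), so mu = ln(median)
        mu = Math.log( median );
        // with x = exp(sigma^2), variance = (x - 1) * x * median^2;
        // solve x^2 - x - variance/median^2 = 0 for the root with x >= 1
        double ratio = variance / (median * median);
        double x = (1.0 + Math.sqrt( 1.0 + 4.0 * ratio )) / 2.0;
        sigma = Math.sqrt( Math.log( x ) );
    }

    /** Variance implied by mu and sigma, useful as a sanity check. */
    double variance() {
        double s2 = sigma * sigma;
        return (Math.exp( s2 ) - 1.0) * Math.exp( 2.0 * mu + s2 );
    }

    /** Draw one sample by exponentiating a normally distributed value. */
    double sample( Random rand ) {
        return Math.exp( mu + sigma * rand.nextGaussian() );
    }

    public static void main( String[] args ) {
        LogNormal family = new LogNormal( 3.0, 4.0 );
        Random rand = new Random();
        for (int i = 0; i < 5; i++) {
            // household sizes must be whole numbers of at least 1
            long size = Math.max( 1, Math.round( family.sample( rand ) ) );
            System.out.println( "family of " + size );
        }
    }
}
```

Rounding the samples to whole numbers, as main() does, distorts the distribution slightly. For a model where the figures are only "good enough" that is probably acceptable, but it is an assumption of this sketch.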

Polymorphism

Java supports polymorphic classes, that is, classes where there are multiple possible implementations. In fact, polymorphism was added to Simula '67, the original object-oriented programming language, precisely to allow for the kind of variation that we need.

Specifically, Java, C++ and their ancestor Simula '67 all allow us to introduce new classes that extend an existing class. For example, in Java we could have:

/** Intersections with a stop light
 *  @see Intersection
 */
class StopLight extends Intersection {
}

How do these new subclasses differ from the parent class? The simplest place they differ is in the toString method, so we can immediately create new methods for that:

/** Intersections with a stop light
 *  @see Intersection
 */
class StopLight extends Intersection {
    public String toString() {
        return (
                "intersection " +
                name +
                " stoplight"
        );
    }
}

In the above, we've made the output of the toString() method recreate our input text, unless the input contained tabs or multiple spaces between the words. Information about those details is lost (deliberately) by the scanner we are using.

Note that wherever it is legal to have an Intersection, it is now legal to have a StopLight. Consider the following declarations in a hypothetical bit of Java code:

Intersection i;
StopLight s;

Here, the assignment i=s is legal because i can hold any kind of intersection. It is also legal to write i=new StopLight(). In the opposite direction, you cannot be so free. The assignment s=i is illegal. What you have to write if you want this is s=(StopLight)i, which means "check to see that i is actually a StopLight and then, if it is, do the assignment; if it isn't, throw an exception."
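
The following sketch shows these rules in action. The constructors taking a name and the class CastDemo are simplifications introduced here for illustration; they are not the scanner-based constructors in our real code. The exception Java throws on a failed cast is ClassCastException.

```java
class Intersection {
    String name;

    Intersection( String name ) {
        this.name = name;
    }

    public String toString() {
        return "intersection " + name;
    }
}

class StopLight extends Intersection {
    StopLight( String name ) {
        super( name );
    }

    public String toString() {
        return "intersection " + name + " stoplight";
    }
}

class CastDemo {
    /** Try the downcast and report what happened. */
    static String tryCast( Intersection i ) {
        try {
            StopLight s = (StopLight) i;  // checked at run time
            return "ok: " + s;
        } catch (ClassCastException e) {
            return "not a StopLight: " + i;
        }
    }

    public static void main( String[] args ) {
        Intersection i = new StopLight( "A" );  // legal, no cast needed
        System.out.println( tryCast( i ) );
        System.out.println( tryCast( new Intersection( "B" ) ) );
    }
}
```

Note that the first call prints the stoplight form of the description even though the variable i is declared as an Intersection. The method that runs is determined by the actual class of the object, not by the declared type of the variable.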

Note that we have a challenge here: Should uncontrolled intersections be the default class, or should uncontrolled intersections be a subclass of a generic intersection class? In the latter case, no instances of the generic class would ever be created. We can enforce this by declaring class Intersection to be an abstract class.

Constructors for Polymorphic Objects

When you create a new object, you must pick its actual class. Once an object is created, you cannot change its class. So, we must change the code to create intersections. Here is the old code for readNetwork() that called the constructor:

static void readNetwork( Scanner sc ) {
    while (sc.hasNext()) {
        // until the input file is finished
        String command = sc.next();
        if ("intersection".equals( command )) {
            new Intersection( sc );
        } else if ("road".equals( command )) {
            new Road( sc );
        } else {
            // Bug: Should probably support comments
            Errors.warning(
                "'"
                + command
                + "' not a road or intersection"
            );
        }
    }
}

We need to change the part for creating intersections. Either we must have readNetwork() decide what kind of intersection to construct, or it must call something in class Intersection that makes that decision. What it calls in Intersection cannot be a constructor, because if it is, then that very fact makes it too late to decide what kind of intersection to construct.

This creates a problem. Our current idea for an input language is arranged like this:

intersection X stoplight

In our current code, the Intersection constructor is responsible for scanning the identifier X from the above as well as anything that follows it. Perhaps we should change the language to something like this:

intersection stoplight X

If we do this, readNetwork() can learn the class it is supposed to create before calling a constructor. We could go further and completely eliminate the keyword intersection from the input language and just have a large collection of keywords like stoplight and perhaps stopsign and roundabout. We won't do that.

If we use the form intersection stoplight X for complicated intersections and intersection X for default ones, we could call the right constructors in readNetwork() with code like this:

        String command = sc.next();
        if ("intersection".equals( command )) {
            if (sc.hasNext( "stoplight" )) {
                sc.next(); // discard the keyword
                new StopLight( sc );
            } else if (sc.hasNext( "stopsign" )) {
                sc.next(); // discard the keyword
                new StopSign( sc );
                ...
            } else {
                new NoStop( sc );
            }
        } else if ("road".equals( command )) {
            ...

This solution would force readNetwork() to know about every subclass of Intersection. As a general rule, one way to improve the maintainability of large programs is to limit the need for one part of the program to know anything about internal details of another part. From the top level, all we need to know is that there are intersections. Only within class Intersection is there any reason to know that there are subclasses. Therefore, we will abandon this solution.

Subclass Constructors

We never intend to allow anyone to create an object of class Intersection. There are two ways to enforce this in Java. One approach is to prevent anyone outside the class from calling its constructor. For example, we could declare the constructor to be private:

        // constructor
        private Intersection() {}

Declaring the constructor to be private prevents any code from outside class Intersection from creating any objects of this class, but it still allows code within the class to do so. Another approach to solving the problem is to declare the class to be abstract like this:

abstract class Intersection {
        String name;
        ...
}

Declaring the class to be abstract makes it illegal to call new Intersection() anywhere in the program. The only way to create an instance of an abstract class is to create an instance of one of its subclasses. So, if StopLight is a subclass of Intersection we can call new StopLight(), and the new StopLight will contain all the fields of an Intersection.

It is legal for an abstract class to have a constructor, but that constructor can only be used from within the constructor of one of the subclasses. That is, the constructor for StopLight can use the constructor for Intersection to initialize the fields that it inherits from Intersection.
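
Here is a small sketch of how the pieces fit together. The constructors taking a plain name are a simplification for illustration (our real constructors take a scanner), and AbstractDemo is a hypothetical class used only to exercise the others.

```java
abstract class Intersection {
    final String name;

    /** Usable only through super( ... ) in a subclass constructor. */
    protected Intersection( String name ) {
        this.name = name;
    }
}

class StopLight extends Intersection {
    StopLight( String name ) {
        super( name );  // the inherited field is initialized by the parent
    }

    public String toString() {
        return "intersection " + name + " stoplight";
    }
}

class NoStop extends Intersection {
    NoStop( String name ) {
        super( name );
    }

    public String toString() {
        return "intersection " + name;
    }
}

class AbstractDemo {
    public static void main( String[] args ) {
        Intersection a = new StopLight( "A" );
        Intersection b = new NoStop( "B" );
        // Intersection c = new Intersection( "C" );  // would not compile
        System.out.println( a );
        System.out.println( b );
    }
}
```

The commented-out line is the whole point: with the class declared abstract, the compiler rejects any attempt to create a naked Intersection, while variables of type Intersection can still refer to instances of any subclass.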

No matter how we prevent naked instances of Intersection from being created, we must add constructors for its subclasses. For example, we can begin with a constructor for NoStop that looks something like this:

    public NoStop( Scanner sc ) {
        // scan and process one intersection
        String name = sc.getNext( "???", "intersection missing name" );
        if (RoadNetwork.findIntersection( name ) != null) {
            Errors.warning( "Intersection " + name + " -- redefined." );
            // Bug:  Can we prevent creation of this object?
        }
    }

The code for StopLight is similar. Before we give this, though, note that we have a bug notice that is repeated three times in our code. In class Road and again, in classes NoStop and StopLight, we have repeated the same basic bug notice asking how we detect improper end of line. What we need is a service method to solve this problem in one place, instead of duplicating code everywhere.

Initializing Subclass Instances

    public StopLight( Scanner sc ) {
        // scan and process one stop-light intersection
        String name = sc.getNext( "???", "intersection missing name" );
        if (RoadNetwork.findIntersection( name ) != null) {
            Errors.warning( "Intersection " + name + " redefined." );
            // Bug:  We should prevent creation of this object!
        }
        // Bug:  Excessive code duplication if all intersections start as above!

        // Bug:  Missing anything specific to stop lights
    }

This constructor will be the basis of some extended discussion, first because of the bug notices. In the event that an attempt is made to redefine an intersection that already exists, we need to suppress the definition. Second, it is foolish to do computationally complex things like string concatenation in order to construct error messages that will never be printed. This issue shows up all over our code in constructing messages passed to the MyScanner get methods. String concatenation is expensive, and we should not do it until we learn that the string will actually be used! In real use, most of those error messages we computed and passed as parameters are never ever used.
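
One way to defer the cost, sketched here under assumptions, is to pass something that can compute the message instead of the message itself. The sketch uses java.util.function.Supplier and a lambda expression. The class LazyMessageDemo, its getNext() method and the counter are hypothetical stand-ins, not the actual MyScanner interface.

```java
import java.util.function.Supplier;

class LazyMessageDemo {
    static int messagesBuilt = 0;  // counts messages actually constructed

    /** Hypothetical stand-in for a scanner get method.
     *  @param token the text found in the input, or null if it was missing
     *  @param def the value to return when the token is missing
     *  @param msg computes the error message, called only when needed
     */
    static String getNext( String token, String def, Supplier<String> msg ) {
        if (token != null) return token;
        System.err.println( "Error: " + msg.get() );
        return def;
    }

    /** Build a message by concatenation, counting each time we do so. */
    static String describe( String context ) {
        messagesBuilt = messagesBuilt + 1;
        return context + " missing name";
    }

    public static void main( String[] args ) {
        // normal case: the lambda is passed but never called
        getNext( "A", "???", () -> describe( "intersection" ) );
        System.out.println( "messages built: " + messagesBuilt );

        // error case: the message is built only now
        getNext( null, "???", () -> describe( "intersection" ) );
        System.out.println( "messages built: " + messagesBuilt );
    }
}
```

The caller still writes the concatenation at the point of the call, where it is easy to read, but the work is done only on the path where the message is printed.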

This constructor still doesn't do anything specifically related to stoplights, and we've added some bug notices since we really don't want to have to write duplicate code at the head of every constructor for subclasses of Intersection. The code at the end of each constructor says what kind of intersection it is, and it is just one method call, so that duplication is not flagged as a bug, even though we write out the code as 4 lines, or 5 if you include the bug notice.

Furthermore, it really isn't the constructor we want, because the syntax we want is intersection X stoplight, and we really want to call something like a constructor in class Intersection that will take care of both grabbing the name X and creating the right subclass.

Factories, not Constructors

In reality, a Java constructor is just a static method that implicitly gets an uninitialized object, initializes it, does anything else, and returns the new object. Any constructor can be rewritten as a static method that uses the default constructor (the one that does no initializations) to do its job, although this may require that final be removed from some variable declarations. Removing final tags from working code leaves working code, so this doesn't change anything.

We call a method that is used to construct objects a factory method. Sometimes, we will even create factory classes, where each instance of that class can be used to construct objects. In that case, the factory instances are usually initialized with the customizations they will apply to the objects they manufacture.
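
As a sketch of where this is heading, here is a factory method that handles the syntax intersection X stoplight, scanning the name first and only then deciding which subclass to create. The method name newIntersection is hypothetical, the subclass constructors are simplified to take a name, and the duplicate-definition and missing-name checks from our real code are omitted.

```java
import java.util.Scanner;

abstract class Intersection {
    final String name;

    protected Intersection( String name ) {
        this.name = name;
    }

    /** Hypothetical factory method: called after the keyword
     *  intersection has been scanned; reads the name and, if present,
     *  the kind, then creates an instance of the right subclass.
     */
    static Intersection newIntersection( Scanner sc ) {
        String name = sc.hasNext() ? sc.next() : "???";
        if (sc.hasNext( "stoplight" )) {
            sc.next(); // discard the keyword
            return new StopLight( name );
        }
        return new NoStop( name );  // the default kind
    }
}

class StopLight extends Intersection {
    StopLight( String name ) {
        super( name );
    }
}

class NoStop extends Intersection {
    NoStop( String name ) {
        super( name );
    }
}

class FactoryDemo {
    public static void main( String[] args ) {
        Scanner sc = new Scanner( "intersection A stoplight\nintersection B\n" );
        while (sc.hasNext()) {
            if ("intersection".equals( sc.next() )) {
                Intersection i = Intersection.newIntersection( sc );
                System.out.println( i.name + " is a " + i.getClass().getName() );
            }
        }
    }
}
```

With this arrangement, the top-level code only needs to know that there are intersections. All knowledge of the subclasses stays inside class Intersection, which is the property we gave up on when we tried to make the top-level code pick the constructor.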

15. Using the Class Hierarchy

Aside: Standard Error versus Standard Output