19. The Road to Lambda

Part of CS:2820 Object Oriented Software Development Notes, Spring 2021
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Background

Consider this method call:

        dst = sc.getNext( "???", "road " + src + " to missing destination" );

Java treats this as something equivalent to the following:

        dst = sc.getNext(
            "???",
            new StringBuilder( "road " )
                    .append( src )
                        .append( " to missing destination" )
                            .toString()
        );

The net result is that, for the above call to getNext(), we must construct a new StringBuilder object, call several methods on that object to append the concatenated strings, and then construct a new String from the StringBuilder.
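Java evaluates every argument before the call begins, whether or not the called method ever looks at it. The following minimal sketch makes this visible by counting how often the message is built. The class EagerDemo, the helper buildMessage and the simplified getNext are hypothetical stand-ins invented for this illustration; they are not part of our MyScanner.

```java
import java.util.Scanner;

class EagerDemo {
    static int builds = 0; // how many times the message has been built

    // hypothetical helper standing in for the concatenation expression
    static String buildMessage( String src ) {
        builds = builds + 1;
        return "road " + src + " to missing destination";
    }

    // simplified stand-in for getNext(); msg is used only on failure
    static String getNext( Scanner sc, String def, String msg ) {
        if (sc.hasNext()) return sc.next();
        System.err.println( msg );
        return def;
    }

    public static void main( String[] args ) {
        Scanner sc = new Scanner( "A B" ); // well-formed input, no errors
        String src = getNext( sc, "???", "road: from missing source" );
        String dst = getNext( sc, "???", buildMessage( src ) );
        // the message was built even though it was never printed
        System.out.println( dst + " builds=" + builds );
    }
}
```

With well-formed input, no warning is ever printed, yet the counter shows that the message string was constructed anyway.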

As a result, the total cost of this call to getNext() is likely to be dominated by the cost of computing the second parameter, particularly for some of the more ornate error messages in our code. So we proposed this rewrite of some of our code:

String errMsg = "road";
srcname = in.getNext( "???", errMsg + ": source missing" );
errMsg = errMsg + " " + srcname;
dstname = in.getNext( "???", errMsg + ": destination missing" );
errMsg = errMsg + " " + dstname;
travelTime = in.getNextFloat( 99999.9F, errMsg + ": travel time missing" );
errMsg = errMsg + " " + travelTime;
in.getNextLiteral( ";", errMsg + ": semicolon missing" );

This solution takes advantage of the particular pattern of information used in this context, and it cuts the number of concatenations drastically, while making it significantly harder to see what exactly is concatenated into each error message.

A General Solution

The most general solution involves replacing the data parameter with a parameter that conveys a computation. Don't pass the string you might need; pass the tools to compute that string if it is needed. In Java, the way we do this is to construct an object and pass that object. If the called routine needs the value, it will call a method of that object, and that method will do the work. Consider this new version of the MyScanner.getNext() method, the prototype for all the other getNext methods:

public abstract class ErrorMessage {
    public abstract String myString();
}

public String getNext( String def, ErrorMessage msg ) {
    if (self.hasNext()) {
        return self.next();
    } else {
        Error.warn( msg.myString() );
        return def;
    }
}

Now, all we have to do to call our syntax-check method is first create a new subclass of ErrorMessage with the appropriate myString() method.

This sounds awful, but Java provides some shorthand to make it easy. We'll do the awful long-winded solution first before we look at the shorthand notation.

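Note, we really wanted to use toString() as the name of the method above, but that doesn't work out. Every class already inherits a concrete toString() from class Object. An abstract class is allowed to redeclare toString() as abstract, but the interfaces and lambda expressions we are heading toward are not so forgiving: a method that every object already inherits from Object cannot serve as the one method that a lambda expression supplies.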

A Preliminary Approach

Where our original call looked like this:

travelTime = in.getNextFloat(
    99999.9F,
    "road " + srcname + " " + dstname + ": travel time missing"
);

We write this new supporting class:

class MissTravelTime extends ErrorMessage {
    private final String sname;
    private final String dname;
    public MissTravelTime( String s, String d ) {
        sname = s;
        dname = d;
    }
    public String myString() {
        return "road " + sname + " " + dname + ": travel time missing";
    }
}

With this groundwork done, we rewrite the original call to getNextFloat() as:

ErrorMessage msg = new MissTravelTime( srcname, dstname );
travelTime = in.getNextFloat( 99999.9F, msg );

Of course, we don't need to add a new variable, we can shorten this code to this:

travelTime = in.getNextFloat(
    99999.9F, new MissTravelTime( srcname, dstname )
);

In this context, this led to an extra line of code, but it might be just as easy to read because you don't have to wonder if the variable msg gets used anywhere else.

The above code is hardly convenient! We had to create a new class with its own fields and constructor, as well as the method that encapsulates our delayed computation, all to pass a simple but long expression. Doing this over and over, once for each call to sc.getNext(), promises to make a totally unreadable program. Fortunately, Java offers more convenient notation, but it is worth understanding from the start that these are merely shorthand notations for the ideas presented above.
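For readers who want to run the long-winded version, here is a minimal self-contained sketch. It assumes a simplified getNextFloat that takes a java.util.Scanner directly, in place of our MyScanner, and it adds a counter named builds to MissTravelTime purely to show when the message is computed. Both are hypothetical additions for this illustration.

```java
import java.util.Scanner;

abstract class ErrorMessage {
    public abstract String myString();
}

class MissTravelTime extends ErrorMessage {
    static int builds = 0; // counts calls to myString, for demonstration only
    private final String sname;
    private final String dname;

    MissTravelTime( String s, String d ) {
        sname = s;
        dname = d;
    }

    public String myString() {
        builds = builds + 1;
        return "road " + sname + " " + dname + ": travel time missing";
    }
}

class DeferredDemo {
    // hypothetical stand-in for MyScanner.getNextFloat
    static float getNextFloat( Scanner sc, float def, ErrorMessage msg ) {
        if (sc.hasNextFloat()) return sc.nextFloat();
        System.err.println( msg.myString() ); // message computed only here
        return def;
    }

    public static void main( String[] args ) {
        ErrorMessage msg = new MissTravelTime( "A", "B" );
        float ok = getNextFloat( new Scanner( "12" ), 99999.9F, msg );
        float bad = getNextFloat( new Scanner( "" ), 99999.9F, msg );
        System.out.println( ok + " " + bad + " builds=" + MissTravelTime.builds );
    }
}
```

The first call finds a number and never touches msg. Only the second call, on empty input, pays for the concatenation.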

Inner Classes

The long-winded code we just gave would work equally well if we declare the class ErrorMessage at the outer level of the program, but putting lots of little classes at the outer level leads to a very messy program. Fortunately, Java provides an alternative: We can declare class ErrorMessage as an inner class inside MyScanner.

class MyScanner {
    Scanner self; // the scanner this object wraps

    /**
     * Parameter carrier class for deferred string construction
     * used only for error message parameters to getXXX() methods
     */
    public static abstract class ErrorMessage {
        abstract String myString();
    }

    ... deleted code ...

    public String getNext( String def, ErrorMessage msg ) {
        if (self.hasNext()) return self.next();
        Error.warn( msg.myString() );
        return def;
    }

Note that the inner class here is defined as public, so that code outside class MyScanner can use it, and it is defined to be static so that instances of this class have no access to anything other than static components of MyScanner. In fact, class ErrorMessage makes no use of any access it has to fields of MyScanner, but Java does allow such uses in a limited way. Finally, class ErrorMessage is abstract so you have to create a specific subclass for each kind of error message, and it commits those subclasses to providing a myString method by declaring that to be an abstract method.
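The difference between a static and a non-static inner class is easiest to see side by side. In this small sketch, the names Outer, Nested, Inner and NestingDemo are hypothetical and have nothing to do with our road network code.

```java
class Outer {
    private final String tag;

    Outer( String t ) {
        tag = t;
    }

    // static: no implicit link to any particular Outer object
    static class Nested {
        String describe() {
            return "nested";
        }
    }

    // non-static: each instance is tied to the Outer object that created it
    class Inner {
        String describe() {
            return "inner of " + tag; // uses a field of the enclosing object
        }
    }
}

class NestingDemo {
    public static void main( String[] args ) {
        Outer.Nested n = new Outer.Nested(); // no Outer object required
        Outer o = new Outer( "o1" );
        Outer.Inner i = o.new Inner();       // must be created from an Outer
        System.out.println( n.describe() );
        System.out.println( i.describe() );
    }
}
```

Because ErrorMessage is declared static inside MyScanner, code such as class Road can subclass and instantiate it without having a MyScanner object in hand.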

Similarly, while we could declare each subclass of ErrorMessage at the outer level, each of those subclasses is likely to be needed in only one place, so it is better to declare them as inner classes right at the point of use. So, in Road, we can write code like this:

class Road {

    ... several lines deleted ...

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String srcname;       // where does it come from
        final String dstname;       // where does it go

        class MissingSource extends MyScanner.ErrorMessage {
            String myString() {
                return "road: from missing source";
            }
        }
        srcname = sc.getNext( "???", new MissingSource() );

        class MissingDestination extends MyScanner.ErrorMessage {
            final private String src;
            MissingDestination( String s ) {
                src = s;
            }
            String myString() {
                return "road " + src + ": to missing destination";
            }
        }
        dstname = sc.getNext( "???", new MissingDestination( srcname ) );

In this code, classes MissingSource and MissingDestination are each used in just one place, the line immediately following the class declaration. Each of them extends MyScanner.ErrorMessage, referring to the inner class ErrorMessage of class MyScanner. Of these two, MissingSource is trivial. Its myString method just returns a constant string, making the entire mechanism just an expensive way to pass a string constant to getNext().

MissingDestination is more interesting. This has an instance variable src that is initialized from a parameter to the constructor. Here, the actual parameter passed to the constructor for MissingDestination is srcname, the string holding the name of the source intersection. The myString method is no longer trivial; it concatenates two string constants, one before src and one after.

Java allows code in an inner block to reference items declared in outer blocks, so we can simplify the above code, writing just this:

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String srcname;       // where does it come from
        final String dstname;       // where does it go

        srcname = ... some deleted code ...

        class MissingDestination extends MyScanner.ErrorMessage {
            String myString() {
                return "road " + srcname + ": to missing destination";
            }
        }
        dstname = sc.getNext( "???", new MissingDestination() );

In the above code, the variable srcname used in MissingDestination.myString() appears to be a direct reference to the variable srcname that is a local variable of the constructor Road.

The truth is more complicated. Java imposes some very strict limits on uses of outer variables from within inner classes. Specifically, Java requires that such "up-level references" be confined to variables that are "final or effectively final." In our case, srcname was declared to be final, so we have trivially met this constraint.
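A variable does not need the final keyword to qualify; it is enough that it is assigned exactly once. The following sketch illustrates the "effectively final" case. The class EffectivelyFinalDemo and its demo method are hypothetical names used only for this example.

```java
abstract class ErrorMessage {
    abstract String myString();
}

class EffectivelyFinalDemo {
    static String demo( String name ) {
        String srcname = name.trim(); // not declared final, but assigned only once

        class MissingDestination extends ErrorMessage {
            String myString() {
                return "road " + srcname + ": to missing destination";
            }
        }

        // srcname = "other";
        // Uncommenting the assignment above would make srcname no longer
        // effectively final, and the reference to it inside myString()
        // would then be rejected by the compiler.

        return new MissingDestination().myString();
    }

    public static void main( String[] args ) {
        System.out.println( demo( " A " ) );
    }
}
```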

Why does Java have this restriction? The answer has to do with the history of Java and its antecedents, C++ and C. Inner classes are an afterthought in Java, and similar nesting relationships are an afterthought in C++. Classical C does not support any kind of nesting of one function definition within another.

So, how did the implementors of Java add inner classes with up-level variable references? The answer is, they cheated and made the compiler convert all inner classes to outer classes. Wherever an inner class contains a reference to a variable declared in the enclosing context, the compiler turns that into an implicit final instance variable of the inner class, adding an implicit parameter to the constructor to initialize that implicit variable. In short, the notation above without an explicit constructor for MissingDestination is merely shorthand, and the Java compiler actually generates the code described by the original version, where the constructor for MissingDestination had a parameter used to initialize an instance variable.

When you write code with an explicit constructor and explicit initialization of the instance variable, you can pass anything you want to the constructor. The designers of Java did not want to advertise what they were doing, so instead of explaining it, they simply made the compiler enforce the rule that the only outer variables you can use from a class are those that are final or effectively final. With this rule, they do not need to explain how they passed the value because all the possible implementations would produce the same result.
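To make the equivalence concrete, the following sketch places the shorthand and a hand-written expansion side by side. The class ExplicitMissingDestination is our own approximation of what the compiler generates, not its literal output, and CaptureDemo with its two methods is a hypothetical harness for comparing the results.

```java
abstract class ErrorMessage {
    abstract String myString();
}

// hand-written approximation of the compiler's expansion: an outer-level
// class with a final field set from a constructor parameter
class ExplicitMissingDestination extends ErrorMessage {
    private final String src;

    ExplicitMissingDestination( String s ) {
        src = s;
    }

    String myString() {
        return "road " + src + ": to missing destination";
    }
}

class CaptureDemo {
    // shorthand: the inner class refers directly to the local variable
    static ErrorMessage implicitVersion( String name ) {
        final String srcname = name;
        class MissingDestination extends ErrorMessage {
            String myString() {
                return "road " + srcname + ": to missing destination";
            }
        }
        return new MissingDestination();
    }

    // longhand: the value is passed explicitly to the constructor
    static ErrorMessage explicitVersion( String name ) {
        return new ExplicitMissingDestination( name );
    }

    public static void main( String[] args ) {
        System.out.println( implicitVersion( "A" ).myString() );
        System.out.println( explicitVersion( "A" ).myString() );
    }
}
```

Note that implicitVersion returns its object after the local variable srcname has ceased to exist. The object still works because it holds its own copy of the value, which is only safe when the variable can never change.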

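Aside: When you compile a program containing inner classes, a separate .class file will be created for each inner class. The names of these files are constructed from the name of the outer class, a dollar sign, and the name of the inner class, so the inner class ErrorMessage of MyScanner is compiled to MyScanner$ErrorMessage.class. (For classes declared inside a method, the compiler also inserts a number before the inner class name.) This allows inner classes in two different outer classes to have the same name without creating any name conflicts in the file names.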

Aside: The general solution to the up-level addressing problem was developed back in the 1960s for implementations of the Algol 60 programming language, first released in 1961. This solution was also used in Simula 67, the first object-oriented programming language and the direct ancestor of the object-oriented features of C++ and Java. Sadly, the general solution to the up-level addressing problem never made it to C++ and Java.

The most general implementation works as follows: Except at the outermost nesting level, each object has an implicit final field that is never explicitly mentioned in your code. It is common to call this the enclosing scope pointer or the uplink, but we'll just call it up. Whenever a new object is created, the uplink in that new object is set to point to the object that encloses this object.

We can rewrite the above code with these explicit uplinks as follows:

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String src;       // where does it come from
        final String dst;       // where does it go

        src = ... some deleted code ...

        class MissingDestination extends MyScanner.ErrorMessage {
            private final BlockReference up;
            MissingDestination( BlockReference u ) {
                up = u;
            }
            String myString() {
                return "road " + up.src + ": to missing destination";
            }
        }
        dst = sc.getNext( "???", new MissingDestination( this.Road ) );

In the above, the this.Road in the call to the constructor for MissingDestination() is not legal Java, but it is an attempt to suggest that the block of memory holding the local variables of the constructor Road is actually an object (sometimes called an activation record or a stack frame), and the handle for that object is passed to MissingDestination(). All up-level addressing can be done this way.
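We can imitate uplinks in legal Java by making the activation record an explicit object. In this sketch, the class RoadFrame is a hypothetical stand-in for the block of local variables belonging to the constructor Road, and UplinkDemo is an invented harness. Nothing here is what the Java compiler actually does.

```java
abstract class ErrorMessage {
    abstract String myString();
}

// explicit stand-in for the activation record of the constructor Road
class RoadFrame {
    String src; // where does it come from
    String dst; // where does it go
}

class MissingDestination extends ErrorMessage {
    private final RoadFrame up; // the uplink

    MissingDestination( RoadFrame u ) {
        up = u;
    }

    String myString() {
        return "road " + up.src + ": to missing destination";
    }
}

class UplinkDemo {
    static String demo() {
        RoadFrame frame = new RoadFrame();
        frame.src = "A";
        ErrorMessage msg = new MissingDestination( frame );
        frame.src = "B";       // change the "local variable" afterward
        return msg.myString(); // follows the uplink to the current value
    }

    public static void main( String[] args ) {
        System.out.println( demo() );
    }
}
```

Because the object holds a reference to the frame instead of a copy of the value, it sees later changes to src. With true uplinks there would be no need for the final or effectively final restriction.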

Anonymous Inner Classes

If we only use a class name in one place, why not just put the class definition there instead of giving it a name? We do this with variables all the time. We can write this, and in fact, the following code is close to what actually gets executed inside the computer, where each line corresponds roughly to one machine instruction:

int t1 = a + b;
int t2 = t1 * 5;
methodCall( t2 );

Most programmers won't write that. Instead, they eliminate the variables t1 and t2 and simply put the expressions together into the place where the final value is needed:

methodCall( (a + b) * 5 );

Inside the computer, the temporary variables t1 and t2 still exist, but they are now anonymous. When a is added to b the result still has to be put somewhere before it is multiplied by 5, but the variable holding this intermediate value now has no name.

Java lets us write code with single-use inner classes abbreviated the same way. We can write this:

class Road {

    ... several lines deleted ...

    // the constructor
    public Road( MyScanner sc ) throws ConstructorFail {
        final String srcname;       // where does it come from
        final String dstname;       // where does it go

        srcname = sc.getNext(
            "???",
            new MyScanner.ErrorMessage() {
                String myString() {
                    return "road: from missing source";
                }
            }
        );

        dstname = sc.getNext(
            "???",
            new MyScanner.ErrorMessage() {
                String myString() {
                    return "road " + srcname + ": to missing destination";
                }
            }
        );

In the above, constructs like new MyScanner.ErrorMessage() followed by a class body mean: call the constructor of an anonymous subclass of MyScanner.ErrorMessage, where that class has the given body. That is to say, the notation just introduced is just a short-hand notation for what we have already done with explicit non-anonymous inner classes, and that is just a short-hand for conventional outer classes that are only visible in one part of the program and may have implicit constructors and hidden fields to handle up-level addressing. It's all syntactic sugar, but the result is a reasonably compact notation.
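Here is a minimal runnable sketch of the anonymous inner class notation. It assumes a simplified getNext that takes a java.util.Scanner directly and records its last warning in a field named lastWarning. Both are hypothetical simplifications made so that the example stands alone, as is the helper readDestination.

```java
import java.util.Scanner;

abstract class ErrorMessage {
    abstract String myString();
}

class AnonymousDemo {
    static String lastWarning = null; // most recent warning, for demonstration

    // simplified stand-in for MyScanner.getNext
    static String getNext( Scanner sc, String def, ErrorMessage msg ) {
        if (sc.hasNext()) return sc.next();
        lastWarning = msg.myString(); // the only place the message is computed
        System.err.println( lastWarning );
        return def;
    }

    static String readDestination( Scanner sc, final String srcname ) {
        return getNext(
            sc,
            "???",
            new ErrorMessage() { // anonymous subclass of ErrorMessage
                String myString() {
                    return "road " + srcname + ": to missing destination";
                }
            }
        );
    }

    public static void main( String[] args ) {
        System.out.println( readDestination( new Scanner( "B" ), "A" ) );
        System.out.println( readDestination( new Scanner( "" ), "A" ) );
    }
}
```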

Aside: When you compile a program containing anonymous inner classes, a separate .class file will be created for each one. The compiler makes up names for these files by tacking a number onto the name of the outer class. Since the outer class name is part of each file name, anonymous classes in different outer classes never produce conflicting file names.

It can be confusing, though, to see the large number of .class files produced when compiling a program that makes extensive use of inner classes and anonymous inner classes.
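You can observe these naming schemes without looking at the file system, because the name reported by getClass().getName() is the same name the compiler uses for the .class file. In this sketch, NamesDemo and its members are hypothetical names. The comments show what compilers typically produce, but the exact numbers are the compiler's choice.

```java
class NamesDemo {
    static class Member { }

    // named inner class declared at class level
    static String memberName() {
        return new Member().getClass().getName(); // typically NamesDemo$Member
    }

    // named inner class declared inside a method
    static String localName() {
        class Local { }
        return new Local().getClass().getName(); // typically NamesDemo$1Local
    }

    // anonymous inner class
    static String anonymousName() {
        Object o = new Object() { };
        return o.getClass().getName(); // typically NamesDemo$1
    }

    public static void main( String[] args ) {
        System.out.println( memberName() );
        System.out.println( localName() );
        System.out.println( anonymousName() );
    }
}
```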

Interfaces

Before we take the final step, let's look back at the code for our abstract class ErrorMessage:

public abstract class ErrorMessage {
    public abstract String myString();
}

Notice that this class has no fields and no methods that are not abstract. All it does is define the interface to a class that may have many implementations. In Java, we can use the keyword interface instead of abstract class in this context. When we declare an interface instead of an abstract class, Java forbids declaring any instance fields, and all of the methods are implicitly public and abstract unless they are explicitly given a body (for example, with the default keyword). So we can replace the above with this:

public interface ErrorMessage {
    public String myString();
}

When you declare something as an interface, classes that build on that interface are said to implement that interface, so you use the keyword implements instead of extends when you use the interface as the basis of a class. Interfaces are essential for the next step along our road to avoiding doing any computation until it is needed.
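The switch to an interface changes the code that builds on it only slightly. In the following sketch, the class InterfaceDemo and its report method are hypothetical. One detail is easy to trip over: because interface methods are implicitly public, every implementation of myString must be declared public as well.

```java
interface ErrorMessage {
    String myString(); // implicitly public and abstract
}

// a named class uses implements where it formerly used extends
class MissingSource implements ErrorMessage {
    public String myString() { // must be public to match the interface
        return "road: from missing source";
    }
}

class InterfaceDemo {
    // hypothetical consumer of an ErrorMessage
    static String report( ErrorMessage msg ) {
        return "warning: " + msg.myString();
    }

    public static void main( String[] args ) {
        final String srcname = "A";
        System.out.println( report( new MissingSource() ) );
        // anonymous inner classes work with interfaces just as before
        System.out.println( report( new ErrorMessage() {
            public String myString() {
                return "road " + srcname + ": to missing destination";
            }
        } ) );
    }
}
```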

Lambda

Once you shift from an abstract class to an interface, Java lets you abbreviate the call that passes a lazy parameter to a remarkably compact form:

        dst = sc.getNext(
            "???",
            () -> "road " + src + ": to missing destination"
        );

This is called a lambda expression or λ-expression. The text before the -> is the formal parameter list for the method we are passing, and the code after the -> is the body of the method. When the body is a single expression, as it is here, the value of that expression is returned with no need for the return keyword. Java only allows this notation if the type of the formal parameter is a functional interface. That is, the interface must have exactly one abstract method, and the parameter list of the lambda expression must match that method.

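All of Java's restrictions on uplevel addressing from within an inner class apply to references to variables from within the body of a λ expression, because a λ expression behaves like an instance of an anonymous inner class implementing the interface. (Current Java compilers do not literally emit a separate .class file for each λ expression, but the rules for variable capture are the same.) That is to say, to a first approximation, Java's λ notation is nothing but syntactic sugar.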
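To close the loop, here is a minimal runnable sketch of lazy error messages using λ expressions. As before, getNext is a simplified stand-in that takes a java.util.Scanner directly, and the counter builds and helper message are hypothetical additions that let us see when the message is computed. The second method, getNextSupplied, is our own illustration of the same idea using the standard interface java.util.function.Supplier, which can take the place of a home-made ErrorMessage interface.

```java
import java.util.Scanner;
import java.util.function.Supplier;

@FunctionalInterface
interface ErrorMessage {
    String myString();
}

class LambdaDemo {
    static int builds = 0; // counts message constructions, for demonstration

    // hypothetical helper so that we can count when the body runs
    static String message( String src ) {
        builds = builds + 1;
        return "road " + src + ": to missing destination";
    }

    // simplified stand-in for MyScanner.getNext
    static String getNext( Scanner sc, String def, ErrorMessage msg ) {
        if (sc.hasNext()) return sc.next();
        System.err.println( msg.myString() );
        return def;
    }

    // the same idea using the standard library's Supplier interface
    static String getNextSupplied( Scanner sc, String def, Supplier<String> msg ) {
        if (sc.hasNext()) return sc.next();
        System.err.println( msg.get() );
        return def;
    }

    public static void main( String[] args ) {
        String src = "A"; // effectively final, so the lambdas may use it
        String d1 = getNext( new Scanner( "B" ), "???", () -> message( src ) );
        String d2 = getNext( new Scanner( "" ), "???", () -> message( src ) );
        String d3 = getNextSupplied( new Scanner( "" ), "???", () -> message( src ) );
        System.out.println( d1 + " " + d2 + " " + d3 + " builds=" + builds );
    }
}
```

Three calls are made, but the message is built only for the two that actually fail.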