Lecture 32, Compiling Assignment Statements

Part of the notes for CS:4980:1
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Parsing Assignment Statements

As with the previous discussion of references, we'll begin with a parser. Here, there's no point in creating a class AssignmentStatement. Class statement is sufficient, and the assignment_compil() routine can easily be a local support routine that is part of statement.cpp. Aside from that, we'll follow the usual boilerplate approach to writing the code.

Statement * assignment_compile() {
        // <statement> ::= <reference> [ "=" <expression> ]
        // returns NULL until we discover a need for it to return more
        
        Reference::compile();

        if (lex_ispunc( LEX_THIS, PT_EQUALS )) {
                lex_advance(); // skip =

                Expression::compile();
        }
        lex_wantinset( FOLLOW_PUNCS, FOLLOW_KEYS, FOLLOW_LEXS, ER_WANT_SEMI );

        return NULL;
}

Compiling Assignment

Now, we're ready to seriously tackle the problem of compiling assignment statements and procedure calls. To do this, we need to recall the views of classes Reference and Expression. At minimum, we'll assume the return value for Reference::compile() includes a Declarator for the item that was referenced. The Declarator and the Expression object returned by Expression::compile() must carry a type, and in the case of constant-valued expressions, the Expression carries a value.

Statement * assignment_compile( Environment * e ) {
        // <statement> ::= <reference> [ "=" <expression> ]
        // returns NULL until we discover a need for it to return more
        
        Reference * r = Reference::compile( e );

        if (lex_ispunc( LEX_THIS, PT_EQUALS )) {
                lex_advance(); // skip =

                Expression * e Expression::compile( e );

                if (r == NULL) {
                        // =BUG= error, assignment to ill-defined reference
                        return NULL;
                }
                Declarator * d = r->refersto;
                if ( (d == NULL)  // =BUG= is this possible?
                ||   (d->kind != DECL_VAR) ) {
                        // =BUG= error, assignment to non variable 
                        return NULL;
                }
                // assert that the reference address is on the stack
                if (e == NULL) {
                        // Expression::compile already gave an error message
                        return NULL
                }
                // assert that we have <variable> '=' <expression>
                if (e->kind == EXPR_CONST) {
                        // constant expressions didn't push their values
                        cg_pushi( e->value );
                }
                // assert that the expression value is on the stack
                if (e->type == d->type) {
                        // easy case, no range checking needed, just assign
                        cg_pops();
                        return NULL
                }
                if ( (e->type->kind == TYPE_SUBRANGE)
                &&   (d->type->kind == TYPE_SUBRANGE)
                &&   (e->type->basetype == d->type->basetype)) {
                        // the subranges are compatible
                        cg_rangecheck( d->type->min, d->type->max );
                        // =BUG= can optimize above considerably so it only
                        //       checks when the source range exceeds the
                        //       destination range
                        cg_pops();
                        return NULL
                }
                // =BUG= what about assignment of records or arrays?

                // =BUG= error, the assignment involves incompatable types

        } else { // not an assignment statement
                if (r == NULL) {
                        // it was a procedure call or an appropriat
                        // error message has already been output
                        return NULL
                }
                // =BUG= error, assignment statement expected
        }
}

Note that the way the above was coded, it contains no code to check either the start set or the follow set of the assginment. This is because the code we already gave for Reference::compile() checks the start set for the reference that begins an assignment statement, and we can assume that Statement::compile() takes care of checking for the follow set on statements.

The above code got quite large compared to the parser because of the number of special cases it had to consider, and we've left out many of these cases. Type compatibility for subrange types is more complex than type compatability for simple types, and we've completely omitted consideration of array and record assignment. Kestrel allows entire arrays or records to be assigned at once, and Kestrel allows pointer assginment when the source pointer refers to an extension (subclass) of the type of the destination pointer.

Range Checking

In the above, we assumed a code-generator operation cg_rangecheck(min,max). This high-level machine instruction should raise a range exception if the value on the stack top exceeds max or is less than min.

This is subject to considerable optomization. First, note that it is only called when both the expression and the reference are of subrange types. Therefore, we can speak of the following values:

emin -- the minimum value of the source expression
emax -- the maximum value of the source expression
eminemax

rmin -- the minimum value of the destination reference
rmax -- the maximum value of the destination reference
rminrmax

Now, we can identify a number of distinct cases:

It is worth noting that exactly the same range checking computations apply to parameters passed by value to procedures and functions as to assignment. Therefore, code to pick which range checking code to generate should be partitioned into a subroutine so it can be used in multiple places.

It would make sense to include not only cg_rangecheck(min,max) in the high-level instruction set that is the interface to the code generator, but also cg_rangemin(min) and cg_rangemax(max), so that the compiler can tell the code generator exactly which bounds to check where checking is required.

Practical Range Checking

You might imagine that the code to check to see whether the value on the stack top was within bounds would look something like this:

if (top < min) | (top > max) raise range;

On some computers, strangitforward compiling of the above code does indeed produce range checking code that is about as good as is possible. This is not always the case. On a number of computers designed in the 1950s and 1960s, very clever instruction sequences could be used for range checking. Toda, the ARM also has some clever possibilities. Here is the definiton of a typical compare instruction from early computers that predate the invention of condition codes:

CMP a,b
if a < b then pc = pc + 1;
if a = b then pc = pc + 2;
if a > b then pc = pc + 3;

The typical use of this instruction was as follows:

CMP     a,b
BR      ALESS   ; executed if a < b
BR      ASAMEB  ; executed if a = b
BR      ABIGG   ; executed if a > b

A less than obvious use of this instruction was in bounds checking:

CMP     a,max
CMP     a,min   ; if a < max
BR      OUT     ; if a = max or (a < max and a < min)
BR      OUT     ; if a > max or (a < max and a = min)
                ; if a < max and a > min

The above code branches to OUT if a is in the range from min to max exclusive of the end points. This is the kind of trick you can fail to invent after years of using a machine.

The Fortran if Statement

The original Fortran language had a bizarre if statement that only makes sense in the context of this strange comparison instruction:

if (a) 10,20,30

This branches to the label 10 if a is less than zero, to the label 20 if a is zero, and to 30 if a is greater. Labels in Fortran were numeric, and if you left out a label in the above, it meant "fall through" to the next statement.

Efficient Bounds Checking on the ARM

The ARM instruction set has an equally strange but delightfully optimal coding trick for bounds checking. Every ARM instruction can optionally be made contingent on the condition codes. As a result, you can write bounds checking code as follows:

cmp     a,upper
cmple   lower,b
ble     inrange         @ (a <= upper) and (lower <= a)

This kind of tricky code is obvious for register-to-register operations, and within the rather tricky constraints of the ARM instruciton set concerning immediate operands, it may also be used.

Of course, to complete our treatment of bounds checking, we need to ask, how do we throw an exception. All code anywhere in the program that needs to throw a range exception can branch to a single point in the code to throw that exception, but what this code does depends on our implementation of exception semantics. A default bit of code that simply outputs the error message unhandled exception and then terminates the program would make an excellent temporary solution until we get around to worrying about catching exceptions.

Speaking of temporary solutions, an excellent way to make your Kestrel programs testable before you have generalized support for subroutine calls that includes support for calls to external C functions would be to define a new kind of statement.

<statement> ::== "putchar" "(" <expression> ")"

This would evaluate the expression, verify that it is a character expression, and then put that parameter in palce for a call to the external C routine, all handled as a special case instead of as a general mechanism.