Lecture 32, Compiling Kestrel References

Part of the notes for CS:4980:1
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Grammar Problem

The Kestrel grammar contains some ambiguities that are the result of organizing the grammar so that each section of the language description is headed by a production rule. One of the most interesting involves two subclasses of statements:

<statement> ::= <procedure call>
             |  <assignment>

<procedure call> ::= <reference>

<assignment> ::= <reference> "=" <expression>

Using our parsing methodology, our parser cannot distinguish a <procedure call> from an <assignment>. We can rewrite the grammar to allow parsing to proceed as follows:

<statement> ::= <reference> [ "=" <expression> ]

That is, if the statement begins with a reference, it is either a procedure call or an assignment statement. Only after parsing the reference do we discover the difference. It is an assignment statement if the reference is followed by an equals sign, otherwise it is a procedure call.

Note that we can also distinguish assignment statements from procedure calls by examining the type of the >reference<. If it refers to a variable, this must be an assignment statement and an equals sign is expected. If it refers to a procedure, this must be a procedure call and an equals sign is forbidden.

A parsing purist will naturally prefer to identify the statement type by its syntax and only then check to see if the semantic attributes are compatible with the statement type. If you are not a purist, and if the environment is fully constructed during the first parsing pass, the parser can use semantic attributes to distinguish between different statement types. Kestrel permits such a design.

Note also that the grammar given for <reference> poses problems. It is left recursive, and our parsing methodology has difficulty with left recusrion:

<reference> ::= <identifier>
             |  <reference> "@"
             |  <reference> "." <identifier>
             |  <reference> <expression list>

Fortunately, this is easy to rewrite in iterative form using the following EBNF, where the new nonterminal name <refinement> describes the operators on references that refine them to refer to more specific items:

<reference> ::= <identifier> { <refinement> }
<refinement> ::= "@"
              |  "." <identifier>
              |  <expression list>

Parsing References

A syntax-driven compiler for any construct begins with a parser, adding flesh to the parser to do the semantic processing. So, let's build a parser for the above first. The following code is written in the context of the compiling and parsing framework discussed previously, where each of the major nonterminals is associated with a class with a factory method called compile() that parses an instance of that nonterminal. Here, it returns NULL because the it does no semantic processing:

// set of lexemes that begin a reference
#define START_PUNCS SET32_EMPTY
#define START_KEYS  SET32_EMPTY
#define START_LEXS  to_set32( IDENT )

// set of lexemes that refine a reference
#define REFIN_PUNCS ( to_set32_2( PT_ATSIGN, PT_DOT )                   \
                    | to_set32_3( PT_LPAREN, PT_LBRAKT, PT_LBRACE )     \

static Reference::Reference * compile() {
        // factory method, but returns NULL
        
        lex_wantinset( START_PUNCS, START_KEYS, START_LEXS, ER_WANT_REFER );
        if (!lex_is(IDENT)) return NULL;

        lex_advance(); // skip the identifier

        while (lex_ispuncset( LEX_THIS, REFIN_PUNCS )) {
                if (lex_ispunc( LEX_THIS, PT_ATSIGN )) {
                        lex_advance(); // skip @

                } else if (lex_ispunc( LEX_THIS, PT_DOT )) {
                        lex_advance(); // skip .
			if (lex_is( IDENT ) {
                        	lex_advance(); // skip identifier
			} else {
				lex_gotbutwant( &lex_this, ER_WANT_IDENT );
			}

                } else { // assert lex_ispuncset( BEGINPARENS ))
			Expressionlist::compile
                } // there are no other alternatives
        }
        lex_wantinset( FOLLOW_PUNCS, FOLLOW_KEYS, FOLLOW_LEXS, ER_WANT_EQSEMI );
}

Compiling References

Now, we're ready to seriously tackle the problem of compiling references. First, we need to worry about the external view of the Reference class in our compiler. The definition of this belongs in reference.h. In the broadest use, a reference may refer to anything that can be declared. So, a reference may refer to a constant, type, exception, variable, procedure or function. Of course, in the context of assignment statements and procedure calls, some of these are illegal. Each of these is legal in some context, so we'll write the following code to permit references to all of them.

At minimum, this means that Reference::compile() must return the Declarator for the item that was referenced. Even if the referenced item is anonymous, for example, an anonymous type, a declarator can be returned that carries that type. This leads to the following:

// reference.h

// Author ...

// BNF
// <reference> ::= <identifier> { <refinement> }
// <refinement> ::= "@"
//               |  "." <identifier>
//               |  <expression list>

class Reference {
public:
        static Reference * compile( Environment * e );
        // factory method

        Declarator: refersto; // what does this reference refer to
}

Code in the outside world does not need to know any of the internal details of how references are compiled, but that code does need to know what the machine code generated by compiling a reference leaves on the stack. Since references may refer to a constant, type, exception, variable, procedure or function, we must account for each of these.

In the following, we'll assume that declarator.h exports an enumerated type that has values that distinguish the different kinds of declarators, and we'll assume that one of the fields of a Declarator is its kind, a value from this enumeration. We'll also assume that Environment has a method lookup() that takes a symbol_handle and returns the corresponding Declarator. With these assumptions, we can flesh out Reference::compile() to begin doing some real compiling as follows:

static Reference::Reference * compile( Environment * e ) {
        // factory method, but returns NULL
        
        lex_wantinset( START_PUNCS, START_KEYS, START_LEXS, ER_WANT_REFER );
        if (!lex_is(IDENT)) return NULL;

        Declarator * d = e->lookup( (symbol_handle)lex_this.value );
        // assume lookup complained and returned NULL if undefined
        lex_advance(); // skip the identifier
	
	if (d != NULL) switch (d->kind) {
	case DECL_CONST:
	case DECL_TYPE:
	case DECL_EXCEPT:
		// =BUG= need to understand more, perhaps nothing to do here?
		break;
	case DECL_VAR:
		// =BUG= must push the address of the variable
		break;
	case DECL_PROC:
	case DECL_FUNC:
		// =BUG= must push the mark and the up-link for it
		// =BUG= when do we do the call?  Depends on whether parameters
		break;
	}

        while (lex_ispuncset( LEX_THIS, REFIN_PUNCS )) {
                if (lex_ispunc( LEX_THIS, PT_ATSIGN )) {
			if ((d != NULL)
			&&  (d->kind == DECL_VAR)
			&&  (d->type->kind == TYPE_POINTER)) {
				cg_load(); // generate LOAD abstract instruction
				d = d->type->referent;
			} else {
				error_warn( ER_WANTPTR, lex_this.line );
			}
                        lex_advance(); // skip @

                } else if (lex_ispunc( LEX_THIS, PT_DOT )) {
			if ((d != NULL)
			&&  (d->kind == DECL_VAR)
			&&  (d->type->kind == TYPE_RECORD)) {
				// nothing to do here
			} else {
				error_warn( ER_WANTREC, lex_this.line );
			}
                        lex_advance(); // skip .

			if (lex_is( IDENT ) {
        			Declarator * f = NULL;
				if (d != NULL) {
					f = d->type->environment->lookup(
						(symbol_handle)lex_this.value
					);
				}
				if (f != NULL) cg_addi( f->offset );

				// =BUG= forgot some special cases
				//       specifically, type.min, type.max, etc.
				
                        	lex_advance(); // skip identifier

			} else {
				lex_gotbutwant( &lex_this, ER_WANT_IDENT );
			}

                } else { // assert lex_ispuncset( BEGINPARENS ))
			Expressionlist::compile
			// =BUG= if it was a parameter list, generate call here
                } // there are no other alternatives
        }
	Reference * v = new Reference();
	v->refersto = d;

        lex_wantinset( FOLLOW_PUNCS, FOLLOW_KEYS, FOLLOW_LEXS, ER_WANT_EQSEMI );
	return v;
}

The above is just a start.