Compiler Construction

Conside this code. Our emphasis here is on mixed references to local and non-local variables, so there are no parameters to any of the procedures.

x: procedure
    a: var int32;
    b: procedure
        c: var int32;
        c = a;
    end;
    d: procedure
        e: var int32;
        f: procedure
           e = a; a = a + 1; b;
        end;
        f;
        if (a < 2) then d;
    end;
    a = 0; d;
end;

The problem is, how do we do up-level addressing here. As we noted in Lecture 25, we will rely on an up-link at displacement up in each activation record. How does this get set? Let's look at how the above code is compiled, one step at a time, but first, note that we did not say that the outer level here was global. In order to give the most general answer, we wrapped the entire block in procedure x, but note that there is no up-level addressing from within x to anything outside of x.

Let us first look at the procedure f above. How does the up pointer in its own activation record get set? This procedure is called from within the recursive procedure d. Therefore, the value of the up pointer in any particular instance of f depends on which instance of d is active at the time of the call.

We solve this by adding an implicit parameter to every procedure and function, the value of the up-link for that procedure. We'll name the up links uniquely in the following example by prefixing them with the (one letter) procedure or function name. So, the up link of a procedure named p would be named pup.

For the purpose of rewriting the above code in something like Kestrel, we need to name the type of these uplinks. We'll type precord for all of the activation records of procedure p, and we'll allow this to be implicitly declared as a pointer to the current activation record in every context. With these conventions, we can rewrite the above code as follows:

x: procedure
    a: var int32;
    b: procedure(bup:@xrecord)
        c: var int32;
        c = bup@.a;
    end;
    d: procedure(dup:@xrecord)
        e: var int32;
        f: procedure(fup:@drecord);
           fup@.e = fup@.dup@.a; fup@.dup@.a = fup@.dup@.a + 1; b(fup@.dup);
        end;
        f(this);
        if (dup@.a < 2) then d(dup);
    end;
    a = 0; d(this);
end;

Note that we didn't bother declaring an uplink for x because we did not need it. Declaring such an uplink would be harmless, but because we don't know the name of the enclosing context here, we can't do so using the naming conventions we've adopted.

Note that, when calling a routine that is declared locally in the current block, a pointer to the current block, this is always the correct value for the uplink. When calling a routine declared in the immediately enclosing block, including simple recursive calls, the current uplink is passed.

Note also that there are a large number of common subexpressions in the above code. For example, inside f, the reference fup@.dup occurs a total of 4 times. A good compiler would recognize this and evaluate fup@.dup just once, saving its value in an anonymous local variable.

f:  -- procedure(fup:@drecord);
    -- fup@.e = fup@.dup@.a;
    PUSHLA fup
    LOAD
    ADDI   e    -- compute the address fup@.e
    PUSHLA fup
    LOAD
    ADDI   dup
    LOAD
    ADDI   a    -- compute the address fup@.dup@.a
    LOAD
    POPS        -- do the assignment
    -- fup@.dup@.a = fup@dup@a + 1;
    PUSHLA fup
    LOAD
    ADDI   dup
    LOAD
    ADDI   a    -- compute the address fup@.dup@.a
    PUSHLA fup
    LOAD
    ADDI   dup
    LOAD
    ADDI   a
    LOAD        -- get the value of fup@.dup@.a
    PUSHI  1
    ADD
    POPS        -- do the assignment
    -- b(fup@.dup);
    PUSHL  1    -- push space for the stack mark
    PUSHLA fup
    LOAD
    ADDI   dup
    LOAD        -- compute the up link to pass to b
    CALL   1, b -- call b with one parameter
    RETURN 1    -- return from b with one local variable, fup

d:  -- procedure(dup:@xrecord)
    PUSHL  1    -- allocate space for e
    -- f(this);
    PUSHL  0    -- allocate space for the stack mark
    PUSHLA 0    -- get address of my own activation record
    CALLI  f, 1 -- call f with one parameter
    -- if (dup@.a < 2) then d(dup);
    PUSHLA dup
    LOAD
    ADDI   a    -- compute the address dup@.a
    LOAD        -- get the value dup@.a
    PUSHI  2
    LT
    BFALSE eif
    PUSHL  0    -- allocate space for the stack mark
    PUSHLA dup
    LOAD
    CALL   1, d -- call d with one parameter
eif:
    RETURN 2    -- return from d, popping e and dup

General Principles

Note in the above, if we are at nesting level i and we reference any object (procedure, function, variable or parameter) declared at level j, where i > j, we always begin by following i – j up links (which requires i – j LOAD operations). In the case of a global variable x, the final instruction is ADDI x to compute the address of that variable, while in the case of a call to a procedure or function p, we then push any additional parameters before calling the function itself.

Relationship to Methods of Objects

A call to a method of an object operates in a manner identical to the above. The caller must have the ability to address the object. The object itself is the enclosing context for the method, so the caller passes the pointer to the enclosing object as the first parameter to the method. Thus, up-level addressing from methods of objects to the fields of that object is no different from up-level addressing from functions to variables global to those functions.

When an instance of an object is created, the instance must contain an up-pointer to the context in which the object's type is declared. Note that the object's initializer can be considered to be called like a procedure with a funny return sequence that does not deallocate the instance and a funny receiving sequence that allocates the space for the object somewhere other than the stack. Seen this way, the initializer must be passed a pointer to the static context of the type declaraton, and this context may then be used as the up-link for the object itself.

Passing Procedures and Functions Parameters

As currently defined, Kestrel does not support passing procedures and functions as parameters, but it could naturally do so if we added procedure and function references as an alternative type of <parameter declarer>. In that case, the data passed as a parameter must be a two-word quantity consisting of the pointer to the code of the procedure or function, one word, plus the up link for that procedure or function.

The up link that is passed depends on the context in which the caller references the procedure or function. The caller could, for example, pass a procedure or function defined locally and that is invisible to the called routine. In that case, the up link for the parameter will point to the current activation record of the caller.

Connection to Theory

Programming language theorists describe the binding of a block of code to the definitions for the identifiers in that code as a closure. This terminology is frequently used with the lambda calculus; consider this expression:

This is an unclosed expression, with both a and b unbound. In contrast, this expression is described as a closure of the above:

(λa b.a(b))(succ 1)

It is a closure because it provides values for both a and b, allowing lambda substitution to produce this equivalent expression:

Of course, we now have the problem of defining the succ function, probably the successor function that returns one plus its parameter. It is worth noting that in significant areas of computer science theory, succ is seen as a primitive that is never evaluated. Now consider this expression, a partial closure of our original expression:

λc.(λa b.a(b))(c 1)

This binds b to the value 1, while leaving a undefined, except that it has been renamed c. That renaming is not significant, since names of formal parameters are meaningless. In sum, the above partial binding is equivalent to this:

The point of this exercise is that, by the time the body of a Kestrel subroutine is executed, we must have a complete closure of the body, with every identifier in the body bound to some actual object. This binding involves two separate steps, first, binding the block to its static context, a partial closure, and second, binding the actual parameters to the formal parameters, a step that completes the closure.

Lecture 31, Nesting and Functions

Up-level Addressing

General Principles

Relationship to Methods of Objects

Passing Procedures and Functions Parameters

Connection to Theory