Lecture 36, Nesting and Functions
Part of the notes for CS:4908:0004 (22C:196:004)
Consider the compilation of this little bit of code, with variables temp and a being local (where local includes, potentially, parameters):
temp = a
If we compile this for the suggested stack architecture, a model for a potential intermediate code or an interface to the code generator, we might get this:
                        -- ///|
        PUSHA temp      -- ///| @temp |
        PUSHA a         -- ///| @temp | @a |
        LOAD            -- ///| @temp | a |
        POP             -- ///|
The comments in the above code show the contents of the stack (in symbolic form) before and after each instruction. Pushing adds items to the right. Note that the above uses the notation @temp to mean "the address of temp" while the notation temp is used to indicate the value stored in that location.
We emphasize, here, the meanings of the instructions: PUSHA x pushes the address formed by adding the displacement x to the frame pointer, LOAD replaces the address on the stack top with the contents of the location it names, and POP stores the value on the stack top into the address below it, popping both.
Also note that locations on the operand stack can themselves be treated as statically allocated variables in the activation record, since at any point in the code, the displacement of the stack pointer from the frame pointer can be computed as a compile-time constant.
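For example, in the code for temp = a above, suppose the operand stack begins at displacement local_size from the frame pointer (a name used here purely for illustration) and count in stack slots; then the displacement of the stack pointer is a compile-time constant at every point:

                        -- stack pointer at local_size      (operand stack empty)
        PUSHA temp      -- stack pointer at local_size + 1  (the slot holds @temp)
        PUSHA a         -- stack pointer at local_size + 2  (the slot holds @a)
        LOAD            -- stack pointer at local_size + 2  (the top slot now holds the value of a)
        POP             -- stack pointer at local_size      (operand stack empty again)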
Consider this code:
a: var int32;
b: procedure
    c: var int32;
    c = a;
end;
d: procedure
    e: var int32;
    f: procedure
        e = a;
        a = 1;
        b;
    end;
    if (a = 0) then d;
    f;
end;
a = 0;
d;
The problem is, how do we do up-level addressing here? As we noted in Lecture 29, we will rely on an up-link at displacement up in each activation record. How does this get set? Let's look at how the above code is compiled, one step at a time. But first, note that we did not say that the outer level here was global; in order to give the most general answer, we will assume that this code might itself be nested several levels in from the global level.
Let us first look at the procedure f above. How does the up pointer in its own activation record get set? This procedure is called from within the recursive procedure d. Therefore, the value of the up pointer in any particular instance of f depends on which instance of d is active at the time of the call.
We solve this by adding an implicit parameter to every procedure and function, the up-link to be used by the called procedure. Thus, we can compile f as follows:
f:      RECEIVE 1, 0, 0
        -- e = a
        PUSHA up
        LOAD
        ADDI e          -- compute the address up.e
        PUSHA up
        LOAD
        ADDI up
        LOAD
        ADDI a          -- compute the address up.up.a
        LOAD
        POP             -- do the assignment
        -- a = 1
        PUSHA up
        LOAD
        ADDI up
        LOAD
        ADDI a          -- compute the address up.up.a
        PUSHI 1
        POP             -- do the assignment
        -- b
        PUSHA up
        LOAD
        ADDI up
        LOAD            -- compute the address up.up.0, the up link for b
        CALLI b, 1
        RETURN 1, 0, 0
Note here that we have added a new instruction to our stack architecture:
Given the above, which illustrates the general case, we can now write code for the procedure d:
d:      RECEIVE 1, 1, 0
        PUSHA up
        LOAD
        ADDI a          -- compute the address up.a
        LOAD
        PUSHI 0
        CMP
        BFALSE eif
        PUSHA up
        LOAD
        CALLI d, 1
eif:
        PUSHA 0
        CALLI f, 1
Note in the above that if we are at nesting level i and we reference any object (procedure, function, variable or parameter) declared at level j, where i > j, we always begin by following i – j up links (which requires i – j LOAD operations). In the case of a variable x declared at level j, the final instruction is ADDI x to compute the address of that variable, while in the case of a call to a procedure or function p declared at level j, the frame pointer we have just computed serves as p's up link, and we then push any additional parameters before calling the procedure or function itself.
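As a sketch of the general pattern (the variable x here is hypothetical and not part of the example program), a reference to a variable x declared three levels out (i – j = 3) follows three up links and then adds the displacement of x:

        PUSHA up        -- the address of our own up link
        LOAD            -- follow the first up link
        ADDI up
        LOAD            -- follow the second up link
        ADDI up
        LOAD            -- follow the third up link; this is the frame pointer for level j
        ADDI x          -- compute the address of x within that frame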
For calls to local procedures or functions, that is, where we are at nesting level i and we reference a procedure or function declared at level i, we simply push a copy of our own frame pointer as the up pointer for the called procedure or function, using PUSHA 0 (that is, with a displacement of zero).
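To round out the example, the outermost statements a = 0 and d reference only objects declared at their own level, so they might compile along these lines (a sketch; the receiving and return sequences surrounding the outer block are ignored here):

        -- a = 0
        PUSHA a
        PUSHI 0
        POP             -- do the assignment
        -- d
        PUSHA 0         -- our own frame pointer is the up link for d
        CALLI d, 1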
As currently defined, Kestrel does not support passing procedures and functions as parameters, but it could naturally do so. In that case, the procedure or function parameter must be a two-word quantity consisting of the actual pointer to the procedure or function, plus the pointer to the procedure or function's enclosing block, that is, the up-link for that procedure or function.
Wherever in the code the procedure or function parameter is formed by direct reference to the required procedure or function, the up-link can be computed as described above. From that point onward, as the procedure or function parameter is passed from one place to another in the code, the code address and its context are always passed onward together as a pair.
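As a sketch of how this might look in the stack code (purely hypothetical, since Kestrel does not currently support it): suppose q, declared locally, expects a procedure parameter, and we pass it p, also declared locally. Writing PUSHI p as a stand-in for whatever instruction pushes the code address of p, and assuming the code address is pushed before its context, the call might be:

        -- q( p )       -- pass the locally declared procedure p to q
        PUSHA 0         -- the up link for q itself
        PUSHI p         -- stand-in: the code address of p
        PUSHA 0         -- p's static context, our own frame pointer
        CALLI q, 3      -- three words of parameters, assuming the second operand counts parameter words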
A call to a method of an object operates in a manner identical to the above. The caller must have the ability to address the object. The object itself is the enclosing context for the method, so the caller passes the pointer to the enclosing object as the first parameter to the method. Thus, up-level addressing from methods of objects to the fields of that object is no different from up-level addressing from functions to variables global to those functions.
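A sketch of such a call, assuming o is a local variable holding a reference to the object and that the code address of the method m is known statically:

        -- o.m          -- call method m of the object referred to by o (hypothetical example)
        PUSHA o
        LOAD            -- the pointer to the object, which is m's enclosing context
        CALLI m, 1      -- pass the object pointer as the method's up link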
When an instance of an object is created, the instance must contain an up-pointer to the context in which the object's type is declared. Note that the object's initializer can be considered to be called like a procedure with a funny return sequence that does not deallocate the instance and a funny receiving sequence that allocates the space for the object somewhere other than the stack. Seen this way, the initializer must be passed a pointer to the static context of the type declaration, and this context may then be used as the up-link for the object itself.
Genuinely global variables may always be statically allocated, so they may always be addressed using static addresses. This requires a new form of push instruction in our stack model, one that pushes a static address rather than an address relative to the frame pointer.
Given this, we can always generate direct references to genuinely global objects, and no object declared at the global level (including activation records for global procedures or functions) needs to have an up-link allocated within it or passed to it.
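As a sketch (both the mnemonic PUSHGA and the variable g below are hypothetical stand-ins), an assignment to a genuinely global variable involves no up-link traversal at all:

        -- g = 0        -- g is a genuinely global, statically allocated variable
        PUSHGA g        -- hypothetical: push the static address of g
        PUSHI 0
        POP             -- do the assignment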
Furthermore, any procedure or function that does not reference any variables that are global to it but not at the outermost (static) level does not need an up link, nor do calls to that procedure or function need to pass such a link.
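On the caller's side, this means that a call to such a procedure, say h (a hypothetical example that also takes no other parameters), passes nothing at all:

        -- h            -- h references only its own locals and genuinely global variables
        CALLI h, 0      -- no up link is passed, assuming the second operand counts parameter words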
Similarly, an object does not need to contain an up link if none of its code or methods address variables that are global to that object but not at the outermost level. Really good compilers can determine these properties and make these allocations accordingly.
Programming language theorists describe the binding of a block of code to the definitions for the identifiers in that code as a closure. This terminology is frequently used with the lambda calculus; consider this expression:
λa b.a(b)
This is an unclosed expression in the sense used here: neither a nor b has yet been bound to an actual value. In contrast, this expression is described as a closure of the above:
(λa b.a(b))(succ 1)
It is a closure because it provides values for both a and b, allowing lambda substitution to produce this:
succ(1)
Of course, we now have the problem of defining the succ function, although it is worth noting that in significant areas of computer science theory, this is seen as a primitive that is never evaluated. Now consider this expression, a partial closure of our original expression:
λc.(λa b.a(b))(c 1)
This binds b to the value 1 while leaving a unbound; a has been renamed c, but the renaming is not significant, since the names of formal parameters are meaningless. In sum, the above partial binding is equivalent to this:
λc.c(1)
The point of this exercise is that, by the time the body of a Kestrel subroutine is executed, we must have a complete closure of the body, with every identifier in the body bound to some actual object. This binding involves two separate steps: first, binding the block to its static context, a partial closure, and second, binding the actual parameters to the formal parameters, a step that completes the closure.
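In terms of the calling sequence developed above (the function g and its single parameter are hypothetical), both steps are visible in the caller's code: the up link pushed first is the partial closure, binding the body to its static context, and the actual parameters pushed after it complete the closure:

        -- g( 5 )       -- call g, declared in our immediate context, with actual parameter 5
        PUSHA 0         -- the up link: binds g's body to its static context (the partial closure)
        PUSHI 5         -- the actual parameter: completes the closure
        CALLI g, 2      -- two words of parameters, assuming the second operand counts parameter words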