Lecture 30, Subroutine Calls

Part of the notes for CS:4980:1
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Naive Model

A naive designer of stack computers might conclude that the following basic call and return instructions are appropriate:

CALL dst
CALL the subroutine at address dst, pushing the return address on the stack:
M[--sp] = pc;
pc = dst;
RETURN
Return to the caller by popping the stack top into the program counter.
pc = M[sp++];

To see what is wrong with these instructions, consider the following bit of code:

min: function int32 ( a:int32, b:int32 )
    temp: var int32
    temp = a
    if a > b then
        temp = b
    end
    return temp
end

Our first attempt to compile this requires that we have some way to push the addresses of local variables on the stack. We'll use the PUSHLA high-level instruction to do this, ignoring for the moment how it works.

MIN: PUSHL  1    -- allocate space for temp

     PUSHLA TEMP
     PUSHLA A
     LOAD
     POPS        -- temp = a

     PUSHLA A
     LOAD
     PUSHLA B
     LOAD
     GT
     BFALSE EI  -- if a > b then

     PUSHLA TEMP
     PUSHLA B
     LOAD
     POPS       -- temp = b

EI:             -- end
     PUSHLA RETURNVALUE
     PUSHLA TEMP
     LOAD
     POPS       -- return temp
     POPL   1   -- deallocate temp
     RETURN

This code has several problems.

The classic way to handle local variable addressing is to set aside a register called the frame pointer or fp. This lets us define

PUSHLA x
PUSH Local Address; the address of local variable x is pushed onto the stack.
M[--sp] = fp - x;

This definition shifts the problem from how do we push the value of a local variable to how does the frame pointer get set on entry to a subroutine and how does it get restored on return. An obvious idea is to make the CALL and RETURN instructions in our stack architecture do this. For example, we can define them as follows:

CALL dst
CALL the subroutine at address dst, pushing the return address and the previous frame pointer on the stack and defining a new frame pointer:
M[--sp] = pc;
M[--sp] = fp;
fp = sp;
pc = dst;
RETURN
Return to the caller by popping the frame pointer and program counter from the stack.
fp = M[sp++];
pc = M[sp++];

There are several awkward problems with the above. First, consider the expression x+min(y,z). The code we would like to generate looks like this:

     PUSHGA x
     LOAD
     PUSHGA y
     LOAD
     PUSHGA z
     LOAD
     CALL   min
     ADD

Unfortunately, this won't work because the return from the call did not pop the parameters off the stack. We could solve this by adding a POPL command to get rid of one parameter right after the CALL, but this is awkward.

There is a second problem. The parameters got pushed onto the stack before the return address and saved frame pointer, but the local variables get pushed after the return address. If the frame pointer points to the saved value of the frame pointer (that's how our definition of CALL worked out above) this means that the addresses of local variables and parameters have the opposite sign relative to the frame pointer. There's nothing difficult to implement about this, but it is awkward.

Finally, what is the local address where we store the return value? We can easily solve this problem: It is the same as the address of the first parameter.

Kluges

Numerous computers have gone to market with call and return instructions that follow the naive model described at the top of this section, including the Intel x86 architecture. On these machines, programmers typically must add instructions before each return operation to pop local variables from the stack, and after call instructions to pop parameters from the stack.

Sometimes, later models of the architecture include instructions, added as afterthoughts, to simplify stack cleanup. The PDP-11 computer is a good example. This machine has 8 registers; 6 are general purpose, while one is effectively reserved as the stack pointer and one serves as the program counter. Despite the fact that it has general purpose registers, the original PDP-11 model (the 11/20) had call and return instructions that basically followed the naive model given here (if you ignore irrelevant features).

Late in the history of the PDP-11, a new and truly remarkable instruction was added:

MARK n
What in heavens?
sp = pc
pc = M[sp++]
sp = sp + n

What is going on here? This instruction seems to be doing entirely random things! First, note that the PDP-11 stack grew down, just like the ARM and the Intel x86. Incrementing the stack pointer popped things from the stack, while push decremented the stack pointer. The key to understanding the mark instruction lies how it is used. Consider the expression x+min(y,z). The code we will generate for a call using a PDP-11ish MARK instruction looks like this:

     PUSHLA X
     PUSHLA Y
     LOAD      -- push parameter 1, y
     PUSHLA Z
     LOAD      -- push parameter 2, z
     CALL   MIN
               -- the MARK pops down to the result
     ADD       -- compute x+min(y, z)

Now look at how the called function is coded:

MIN: PUSHI (MARK 1) -- push mark instruction to pop one parameter
     PUSHI 0        -- allocate space for temp

     -- nothing in the body changes

     POPS           -- min = temp
     PUSHLA MARK_R
     RETURN         -- jump to the mark

Above, we used the naive RETURN instruction, but we used it in a strange way. It's not returning to the point of call because we just pushed the address of the MARK instruction itself (effectively a local variable) onto the stack. So, the RETURN is going to jump to an instruction on the stack. Executing the MARK instruction in that context puts everything right, loading the following word (the return address) into the program counter and then setting the stack pointer 2 words farther into memory. This combination pops all the local variables off of the stack as well as the parameters.

The PDP-11 MARK instruction didn't solve all of our problems. It forced us to reserve space on the stack just below the parameters for the return value of the function, and it didn't automate the handling of the frame pointer.

The MARK instruction is genuinely absurd, but it is an impressively effective afterthought. The way the MARK instruction works comes close to being the definitive kluge. Jackson W. Granholm's classic 1962 article in Datamation entitled "How to Design a Kludge" gave the following definition: "An ill-assorted collection of poorly-matching parts, forming a distressing whole." The New Hacker's Dictionary offered these definitions for the word kluge: "A Rube Goldberg (or Heath Robinson) device, whether in hardware or software. A clever programming trick intended to solve a particular nasty case in an expedient, if not clear, manner; often used to repair bugs and often involves ad-hockery and verges on being a crock. Something that works for the wrong reason."

The MARK instruction takes its name from the B5000 architecture, where the term "stack mark" was used to refer to the block of stuff pushed on the stack between the caller's local variables and the part of the stack used by the called code. The Burroughs stack marks were carefully packaged bundles containing the return address and saved frame pointer.

Activation Record Layout

The following table shows the layout of the activation record of the min function as we would expect in the naive model, at a point immediately after the values of the parameters a and b have been pushed on the stack, right before the GT instruction is executed.

     
caller's
 activation 
record
5x   a = returnvalue
45   b
3   return address
2x   temp
1x  
05 ← top of stack
free
space
 

In the above, we assumed that the stack grows down, toward low memory, and we assumed that the return value will go in the location occupied by the first parameter. The displacements from the top of stack are positive when the referent is in the stack. If the stack pointer points to the top item on the stack, the above displacements are the ones that could be used in the code assuming that PUSHL uses the stack pointer instead of the frame pointer. We have assumed word addressing here, for the sake of simplicity.

On a machine with a PDP-11 style MARK instruction, the activation record would look like this:

     
caller's
 activation 
record
6x   a = returnvalue
55   b
4   return address
3MARK 2 
2x   temp
1x  
05 ← top of stack
free
space
 

Again, we have used the location of the first parameter to hold the return value. It is crucial that the return value not be popped off the stack by the function return, so the MARK instruction only popped one of the two parameters from the stack.

Note that if the return value of the function is more than one word, it may be necessary to reserve space for the return value before pushing the parameters. Consider, for example, the case of a function that takes a one-word integer as a parameter and returns a two-word floating point value as a result.

Trying to do it right

The fundamental problem that led to the need for a MARK instruction on the PDP-11 was that the call and return instructions operated by directly pushing and popping the return address on the stack top. This is a fine way to do it on a pure stack machine, but on a machine with general purpose registers, it does not make sense. On machines with general purpose registers, reaching into the stack with indexed addressing is cheap. Therefore, we can use something like the following scheme:

CALL params,dst
CALL the subroutine at address dst, saving the return address on the stack below the parameters:
M[sp + params] = pc
pc = dst;
RETURN arsize
Return to the caller and pop the activation record off of the stack.
pc = M[sp + arsize]
sp = sp + locals + 1

The above definitions assume that arsize is the size of the activation record, in words, including parameters, local variables, and anything else that might be on the stack. The following picture shows the activation record structure for this model, with the snapshot taken at the same instant (after pushing copies of a and b for comparison) as was illustrated in the previous examples.

     
caller's
 activation 
record
6   returnvalue
5   return address
4x   a
35   b
2x   temp
1x  
05 ← top of stack
free
space
 

It is useful, with this model, to have a PUSHL instruction that pushes uninitialized data onto the stack. There is no need to waste a memory cycle pushing a zero in the spot where the return address or the return value will go. For example, before calling a function that returns a one-word value, we should do PUSHL 2 in order to reserve space for both the return address and the return value of the function. Similarly, if our programming language does not provide default initialization of variables, then all of the local variables can be allocated with a single PUSHL instruction that allocates them as a block.

If we are generating code on a general register machine, taking advantage of the fact that the activation record size is always statically known, along with the displacement of all fields within it, we can eliminate the stack pointer entirely and keep a single index register to point to the current activation record. In that case, PUSHL merely does a compile-time computation, incrementing the current size of the activation record without emitting any run-time code. On the ARM, we can't do this because the ARM coding standards ask us to assure that the stack pointer always points to the last word we are using on the stack.