Lecture 33, Calling Sequences
Part of the notes for 22C:196:002 (CS:4908:0002)
Consider the following C program:
        int main() {
            int i;
            i = 0;
        }
If we compile this with the command cc -S ... on a Raspberry Pi computer, the compiler outputs the following assembly code:
        .arch armv6
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu vfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 6
        .eabi_attribute 18, 4
        .file "tt.c"
        .text
        .align 2
        .global main
        .type main, %function
main:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 1, uses_anonymous_args = 0
        @ link register save eliminated.
        str fp, [sp, #-4]!
        add fp, sp, #0
        sub sp, sp, #12
        mov r3, #0
        str r3, [fp, #-8]
        mov r0, r3
        add sp, fp, #0
        ldmfd sp!, {fp}
        bx lr
        .size main, .-main
        .ident "GCC: (Debian 4.6.3-14+rpi1) 4.6.3"
        .section .note.GNU-stack,"",%progbits
We can break this down into a number of distinct pieces. The first and most obvious of these is made up of a prologue and an epilogue that surround the output that is specific to this program:
        .arch armv6
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu vfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 6
        .eabi_attribute 18, 4
        .file "tt.c"
        .text
        ------------- everything else -------------
        .ident "GCC: (Debian 4.6.3-14+rpi1) 4.6.3"
        .section .note.GNU-stack,"",%progbits
Some pieces of the prologue and epilogue are obvious: there are directives identifying the instruction set for which this code is to be assembled (armv6) and the specific floating point unit to be used (vfp), and there is a signature of the compiler (GCC 4.6.3). Other parts, the eabi_attribute directives, are more of a mystery, but they declare, to the assembler, various arcane options that are present or absent on the CPU. The attribute settings listed above are correct for the Raspberry Pi, so you can merely parrot them without understanding them. This is cargo-cult programming, but acceptable in this context.
The next bit of code we can identify above is a frame around the function main. This frame contains no machine code, and is all about informing the linker that the associated code is a function, how big it is, and its name:
        .align 2
        .global main
        .type main, %function
        ------------- the code for main -------------
        .size main, .-main
This declares the alignment required (align 2 means align to the next 4-byte boundary). Following this, it declares that the identifier main is globally defined (relative to the linker's scope system), and that it is a function. Finally, after the code for the function, the .size directive computes the size of the function (the expression .-main is the current location minus the address of main), so that the linker knows how much memory to set aside for the code. These assembly directives must frame every globally visible function. The linker does not need to know about local functions that are, effectively, the private property of one compilation unit, so long as their memory requirements are folded into one of the memory blocks known to the linker.
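As a small illustration of what "align to the next 4-byte boundary" means, here is a sketch in C. The function name align_up and its parameters are hypothetical, chosen for this example; the sketch assumes, as on ARM, that the operand of .align is the base-2 logarithm of the alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of 2^log2_align.
   With log2_align = 2 this mimics ".align 2": a 4-byte boundary.
   (Hypothetical helper, not part of any assembler.) */
uint32_t align_up(uint32_t addr, unsigned log2_align)
{
    assert(log2_align < 32);
    uint32_t mask = ((uint32_t)1 << log2_align) - 1;  /* low bits to clear */
    return (addr + mask) & ~mask;
}
```

An address that is already a multiple of 4 is left alone by align_up(addr, 2), while any other address is pushed up to the next multiple of 4.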
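For any subroutine, it is possible to identify three closely related sequences of instructions: the calling sequence, in the caller, which transfers control to the subroutine; the receiving sequence, at the head of the subroutine, which sets up its activation record; and the return sequence, at the end of the subroutine, which undoes that work and returns control to the caller.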
In the code in question, we can see the receiving and return sequences:
main:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 1, uses_anonymous_args = 0
        @ link register save eliminated.
        str fp, [sp, #-4]!
        add fp, sp, #0
        sub sp, sp, #12
        ------------- the code for the subroutine body -------------
        add sp, fp, #0
        ldmfd sp!, {fp}
        bx lr
Note that the ARM version of the GNU assembler, gas, uses the at-sign as a comment marker. The compiler has helpfully provided us with minimal comments, one of which notes that the receiving and return sequences given here are slightly optimized by eliminating the need to save the link register.
Most of the code here involves management of the activation record or stack frame of the subroutine. The sub instruction that subtracts 12 from the stack pointer allocates the activation record for this routine. The two add instructions save the former stack pointer in the frame pointer (on entry) and restore the stack pointer (before return). The str and ldmfd instructions save and restore the frame pointer. In the more general case, these also save and restore the link register.
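The bookkeeping described above can be traced with a small model written in C. This is only a sketch: the Machine structure, word_at and run_leaf are hypothetical names invented for this example, the simulated memory is a 256-byte array, and each C statement stands in for the ARM instruction named in its comment.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical model: a little stack memory plus the registers used above. */
typedef struct {
    uint32_t mem[STACK_WORDS];  /* simulated memory, one entry per 4-byte word */
    uint32_t sp, fp, r0, r3;    /* sp and fp hold byte addresses */
} Machine;

/* Return a pointer to the word at a word-aligned byte address. */
static uint32_t *word_at(Machine *m, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < STACK_WORDS);
    return &m->mem[addr / 4];
}

/* Trace of: int main() { int i; i = 0; } as compiled above. */
void run_leaf(Machine *m)
{
    /* receiving sequence */
    m->sp -= 4;
    *word_at(m, m->sp) = m->fp;      /* str fp, [sp, #-4]!  save old fp   */
    m->fp = m->sp + 0;               /* add fp, sp, #0      fp = old sp   */
    m->sp -= 12;                     /* sub sp, sp, #12     allocate      */

    /* subroutine body */
    m->r3 = 0;                       /* mov r3, #0                        */
    *word_at(m, m->fp - 8) = m->r3;  /* str r3, [fp, #-8]   i = 0         */
    m->r0 = m->r3;                   /* mov r0, r3                        */

    /* return sequence */
    m->sp = m->fp + 0;               /* add sp, fp, #0      deallocate    */
    m->fp = *word_at(m, m->sp);      /* ldmfd sp!, {fp}     restore fp    */
    m->sp += 4;
    /* bx lr would transfer control back to the caller here */
}
```

After run_leaf returns, sp and fp hold whatever values they had before the call. Balancing the stack this way is the whole job of the receiving and return sequences.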
Consider the following C program:
        void f(){
            int i;
            i = 1;
        }

        int main(){
            int i;
            i = 2;
            f();
            return i;
        }
The above code allows us to find out a bit more. Looking at the assembly code it produces, we find the following: First, the function f has the same optimized entry and return sequence as the main program did in our first example. Because it is a void function, it contains no code to return anything:
f:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 1, uses_anonymous_args = 0
        @ link register save eliminated.
        str fp, [sp, #-4]!
        add fp, sp, #0
        sub sp, sp, #12
        mov r3, #1
        str r3, [fp, #-8]
        add sp, fp, #0
        ldmfd sp!, {fp}
        bx lr
The main program now includes the general entry and return sequence, because it is forced to save its link register by the call to a subsidiary subroutine.
main:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 1, uses_anonymous_args = 0
        stmfd sp!, {fp, lr}
        add fp, sp, #4
        sub sp, sp, #8
        ------------- the code for the subroutine body -------------
        sub sp, fp, #4
        ldmfd sp!, {fp, pc}
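In the general case, the stmfd at the head of the subroutine is matched by an ldmfd at the end. The stmfd saves the frame pointer and link register. The ldmfd restores the frame pointer and loads the saved link register value directly into the program counter, which performs the return, so no separate bx lr is needed.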
The main program contains a call to a parameterless void function.
bl f
This illustrates that the rather complex entry and return sequence we have seen has a payoff! The calling sequence is, in this case, just one instruction, bl (branch and link).
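To see why main must save its link register when it calls f, here is a sketch in C of the general entry and return sequence wrapped around a bl. The Cpu structure and the functions word, push_fp_lr, pop_fp_pc, bl and run_nonleaf are hypothetical names invented for this example. As a simplifying assumption, pc is modeled as the address of the current instruction and is updated only at control transfers.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64

/* Hypothetical model of the registers involved in a call. */
typedef struct {
    uint32_t mem[MEM_WORDS];  /* simulated memory, one entry per 4-byte word */
    uint32_t sp, fp, lr, pc;
} Cpu;

static uint32_t *word(Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &c->mem[addr / 4];
}

/* stmfd sp!, {fp, lr}: push both registers; the lower-numbered
   register (fp is r11, lr is r14) lands at the lower address. */
void push_fp_lr(Cpu *c)
{
    c->sp -= 8;
    *word(c, c->sp) = c->fp;
    *word(c, c->sp + 4) = c->lr;
}

/* ldmfd sp!, {fp, pc}: pop the saved fp, and pop the saved lr
   straight into pc, which is what performs the return. */
void pop_fp_pc(Cpu *c)
{
    c->fp = *word(c, c->sp);
    c->pc = *word(c, c->sp + 4);
    c->sp += 8;
}

/* bl target: remember the following instruction in lr, then jump. */
void bl(Cpu *c, uint32_t target)
{
    c->lr = c->pc + 4;
    c->pc = target;
}

/* main's entry, a call that overwrites lr, and main's return. */
void run_nonleaf(Cpu *c, uint32_t callee_addr)
{
    push_fp_lr(c);        /* stmfd sp!, {fp, lr}                   */
    c->fp = c->sp + 4;    /* add fp, sp, #4                        */
    c->sp -= 8;           /* sub sp, sp, #8                        */
    bl(c, callee_addr);   /* bl f: lr now points back into main    */
    c->pc = c->lr;        /* f's "bx lr" brings control back       */
    c->sp = c->fp - 4;    /* sub sp, fp, #4                        */
    pop_fp_pc(c);         /* ldmfd sp!, {fp, pc}                   */
}
```

In this model the bl overwrites lr with a return address inside main, yet run_nonleaf still leaves pc holding the value lr had on entry, because that value was saved on the stack by the receiving sequence.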
The final statement of the main program is return i. This compiles to produce the following sequence of instructions:
        ldr r3, [fp, #-8]
        mov r0, r3
The first instruction, ldr, loads r3 with the value of the local variable i, addressed relative to the frame pointer with the displacement -8. The second instruction moves the return value to r0. The presence of this move instruction demonstrates that all optimization has been turned off in this compiler -- an optimizer would have loaded the return value directly into r0.