8. The SMAL Linker

Part of the SMAL Manual
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Index

  1. The Linker Control Sublanguage
  2. Linker Control Functions
  3. Producing an Object File as Linker Output
  4. Linker Control Macros
  5. Object Libraries
  6. Library Macros
  7. A Simplified Linker for Unix

When the SMAL assembler processes a source program, it produces an object file. Object files are encoded in the SMAL object sublanguage, and may be linked into a load file by the SMAL assembler. When the assembler is used for this linkage editing function, it is controlled by a linker control program, encoded in the SMAL32 linker control sublanguage. As a linker, the assembler may process object library files in addition to simple object files; such object library files must be encoded in the SMAL32 object library sublanguage. Finally, the output of the assembler when it is used as a linkage editor is either another object file or a load file encoded in the SMAL loader sublanguage.

As the terminology suggests, object files, linker control files, and load files are all legal SMAL programs, encoded in particular subsets of the language. Object files have less symbolic information in them than source files, and load files have almost no symbolic content (what remains is concerned only with establishing relocatability). SMAL32 linker control files and object library files consist primarily of information concerning how to find each part of the program to be linked and when to link it.

SMAL32 is targeted for 32 bit architectures, so the sizes of all relocatable object files and the sizes of all common blocks are rounded up to a 32 bit boundary (so that the last two bits of each size are zero).
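The rounding rule can be illustrated with a short Python sketch. The function name round_up_to_word is hypothetical and is not part of SMAL; the sketch assumes sizes are counted in bytes, so that a 32 bit boundary is a multiple of 4.

```python
WORD_BYTES = 4  # assumption: sizes are in bytes, and a 32 bit word is 4 bytes

def round_up_to_word(size):
    """Round a size up to the next multiple of 4, so its last two bits are zero."""
    return (size + WORD_BYTES - 1) & ~(WORD_BYTES - 1)

# Print a few sample sizes next to their rounded values.
for size in (0, 1, 4, 5, 10):
    print(size, "->", round_up_to_word(size))
```

A size that is already a multiple of 4 is left unchanged; any other size grows by at most 3 bytes.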

8.1. The Linker Control Sublanguage

A SMAL linker control program consists of an optional sequence of linker controls, followed by a sequence of object specifications which name the files to be linked, followed by an optional common specification.

<linker control program> ::= { <linker control> }
                             { <object specification> }
                             [ <common specification> ]
                               <end of file>

If the object files to be linked do not reference any common regions, and if the result to be produced is intended for loading only, then the linker control program need not contain anything but object specifications.

<object specification> ::= USE <quoted string> <line end>

Each object specification is a USE directive (see Section 7.1) which names either an object file or a library file. The object files named will be loaded sequentially in memory, while the library files will be searched to see if they contain definitions of any symbols referenced by the programs already loaded. If so, appropriate material from the library files is also loaded. If the files main.o, sub1.o, and sub2.o are SMAL object files, the following linker control program could be used to link them:

    USE "main.o"
    USE "sub1.o"
    USE "sub2.o"

If there is some external symbol used by one of these programs but not defined in another of them, an "undefined symbol" error will be produced on the terminal and in the listing file.

If any of the object files to be linked contain references to commons, the linker must be told where to place the commons. The linkage-time symbol C determines where commons will be placed. The common specification may take a number of forms. In the simplest form, the linker is told to place the commons in memory immediately after everything else by setting C (see Section 2.2) after all of the object specifications:

<common specification> ::= C = . <line end>

The following example linker control program illustrates this use of commons:

    USE "main.o"
    USE "sub1.o"
    USE "sub2.o"
    USE "library.o"
    C=.

Note that the common specification is placed after all object specifications, including any which reference libraries.

If programs are to be loaded in random access memory in a conventional way, the above should suffice. On the other hand, there are times when common regions should not be loaded in the same memory as normal object code, for example, on systems where one is to be in read-only memory and the other in read-write memory. In this case, the following form can be used:

<common specification> ::= C = <number> <line end>

Here, C is set (see Section 2.2) to the absolute address where the first common is to be placed. All other commons will be placed successively in memory after it. Note that this form of common specification may safely be placed before the sequence of object specifications, along with the linker controls. To assure word alignment of commons, the address given here must be word aligned.
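The two forms of common specification can be compared with a small Python model of memory layout. This is only a sketch under stated assumptions: the function layout, the file sizes, and the common names BUFFER and STATE are invented for illustration, sizes are taken to be in bytes, and commons are assumed to be placed in the order listed.

```python
def round_up(n, unit=4):
    """Round n up to a multiple of unit (a 32 bit word by default)."""
    return (n + unit - 1) // unit * unit

def layout(objects, commons, origin=0, common_base=None):
    """Assign an address to each object file and each common.

    objects, commons: lists of (name, size) pairs.
    common_base=None models C=. (commons follow everything else);
    a number models C=<number> (commons start at that address).
    """
    addresses = {}
    location = origin
    for name, size in objects:          # object files are loaded sequentially
        addresses[name] = location
        location += round_up(size)
    c = location if common_base is None else common_base
    for name, size in commons:          # commons are placed one after another
        addresses[name] = c
        c += round_up(size)
    return addresses

# Hypothetical sizes, chosen only to show the arithmetic.
objs = [("main.o", 0x1F2), ("sub1.o", 0x80), ("sub2.o", 0x44)]
coms = [("BUFFER", 0x100), ("STATE", 0x0A)]
print(layout(objs, coms))                       # models C=.
print(layout(objs, coms, common_base=0x8000))   # models C=#8000
```

In the first call the commons land immediately after sub2.o; in the second they start at the given absolute address, independent of where the object code ends.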

8.2. Linker Control Functions

The normal action of the SMAL assembler, when used as a linker, is to produce a relocatable load file. This normal behavior may be changed in a number of ways by the linker controls. For example, to create an absolute load file, the following control can be used:

<linker control> ::= . = <number> <line end>

To assure word alignment of object files, the address specified here must be word aligned.

For example, consider a tiny system with 4K bytes of read-write memory starting at hexadecimal address 0000 and with 16K bytes of read-only memory starting at hexadecimal address 1000. If the source programs for the object files involved were consistent in their use of common for all read-write variables, maintaining a strict separation between code and data, then the following linker control file could be used to prepare a load file to be used in "burning" the read-only memory:

    .=#1000
    USE "main.o"
    USE "sub1.o"
    USE "sub2.o"
    C=#0000

Note the use of the common specification to control the placement of all commons in the read-write region.

At times, there may be some commons or other external symbols which must be placed in specific memory locations. For example, a program may reference a block of peripheral device control registers as a common, allowing adjustments for the particular machine configuration to be made at linkage time. In this case, the following form can be used:

<linker control> ::= R<identifier> =: <number> <line end>

Note that an external name is a legal identifier which begins with the letter R. External names are formed by adding the R prefix to the identifier used internally in a source program as an operand of an EXT, INT, or COMMON directive (see Chapter 4). Definition using =: (see Section 2.2) guarantees that there will be a linkage time error message if the same external symbol is also defined by an INT directive in one of the object files being linked.

When it is necessary to specify the size of a common at linkage time instead of at assembly time, the following linker control may be used:

<linker control> ::= S<identifier> =: <number> <line end>

When this form is used, it will override the size of the given common declared in the programs being linked. To maintain alignment of data in commons, this size should be an even number of words. Definition using =: (see Section 2.2) guarantees that the size will be set only once. It is possible to decrease the size of a common below that which a program declared that it needed. When common sizes are given at linkage time, the data structure in common itself should begin with an external reference which the linker can set to indicate the size of the common actually allocated.
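When the addresses and sizes depend on the machine configuration, the corresponding linker controls can be generated mechanically. The following Python sketch shows one way to do so; the function config_controls and the names DEVICE and BUFFER are hypothetical, and the # prefix for hexadecimal numbers follows the usage in the examples above.

```python
def config_controls(placements, sizes):
    """Return linker control lines for a machine configuration.

    placements: internal name -> absolute address (emitted with the R prefix).
    sizes:      common name   -> size             (emitted with the S prefix).
    """
    lines = []
    for name, address in placements.items():
        lines.append(f"R{name} =: #{address:X}")   # fix the symbol's address
    for name, size in sizes.items():
        lines.append(f"S{name} =: #{size:X}")      # fix the common's size
    return lines

# Hypothetical configuration: one device register block, one buffer.
for line in config_controls({"DEVICE": 0x2000}, {"BUFFER": 0x200}):
    print(line)
```

The generated lines are ordinary linker controls and would be placed ahead of the object specifications.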

Finally, if the starting address (see Section 7.3) of a program is not specified internally in any of the routines being linked, it may be specified at linkage time as follows:

<linker control> ::= S R<identifier>

Here, the identifier is an internal symbol (see Section 4.2) defined in one of the files being linked. An error will result if this directive is used and one of the linked routines already has a specified starting address.

8.3. Producing an Object File as Linker Output

It is possible to direct the SMAL assembler, as a linker, to produce an object file as output instead of a load file. This is done, for example, when the set of object files being linked does not comprise an entire program, but only part of one; thus, the result may contain unresolved references to external symbols, and it may define internal symbols (see Chapter 4). In this case, linker control directives must be used to inform the assembler, as a linker, which external symbols are to be exported by or imported to the resulting object file. Export is done as follows:

<linker control> ::= <identifier> = [ : ] R<identifier> <line end>
                     INT <identifier> <line end>

Export using = allows redefinition, while export using =: is non-redefinable (see Section 2.2). The identifiers used on the two lines of this directive should be the same. Import is done as follows:

<linker control> ::= EXT <identifier> <line end>
                     R<identifier> = [ : ] <identifier> <line end>

Again, the identifiers must be the same, and the effect is to make all references to the given external name from within any of the object files being linked into references to a symbol external to the object file resulting from the linkage process. The following example SMAL linker control file illustrates these forms:

    EXT DATA
    RDATA = DATA
    ENTRY = RENTRY
    INT ENTRY
    USE "entry.o"
    USE "table.o"
    C=.

If the object file entry.o was generated from source containing EXT TABLE and INT ENTRY, and the object file table.o was generated from source containing INT TABLE and EXT DATA, then the result of processing this linkage control file will be a relocatable object file referencing the external name RDATA and defining the external name RENTRY.
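The reasoning behind this example can be checked with a small Python model. It is a sketch only: the function link_interface is hypothetical, and each object file is reduced to the sets of names it declares with EXT and INT.

```python
def link_interface(objects):
    """Return (unresolved, defined) for a set of object files linked together.

    objects: list of dicts with keys 'ext' and 'int', each a set of the
    identifiers used as operands of EXT and INT in that file's source.
    """
    defined = set()
    referenced = set()
    for obj in objects:
        defined |= obj["int"]
        referenced |= obj["ext"]
    # A reference is unresolved if no file in the group defines it.
    return referenced - defined, defined

entry = {"ext": {"TABLE"}, "int": {"ENTRY"}}   # entry.o
table = {"ext": {"DATA"}, "int": {"TABLE"}}    # table.o
unresolved, defined = link_interface([entry, table])
print("must be imported:", sorted(unresolved))
print("may be exported: ", sorted(defined))
```

In this model DATA is the only unresolved reference, so it is the one symbol that must be imported. TABLE and ENTRY are both defined within the group, but the control file above exports only ENTRY, so TABLE is not visible outside the resulting object file.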

When a common area declared in one of the programs being linked is to be preserved as a common area in the resulting object file, thus allowing it to be shared with other programs linked to the resulting object file, a special form of import/export must be used:

<linker control> ::= COMMON <identifier> , S<identifier> <line end>
                     R<identifier> =: <identifier> <line end>

Note that the identifiers must all be the same, and that the special identifier consisting of an S prefix attached to the common name gives the declared size of the first instance of that common.

Any commons used by the programs being linked which are not to be preserved in the final object file will usually be lumped onto the end of that file as local data by using a C=. common specification at the end of the linker control file. An alternative is to lump the commons which are not individually preserved into a single common in the resulting object file. This is useful, for example, when the program must be able to be stored in read-only memory and the commons must be stored in, or rather, must refer to read-write memory. When this is desired, the following form can be used:

<common specification> ::= COMMON <identifier> , C <line end>
                           C = <identifier> <line end>

Note, here, that all of the identifiers must be the same. The following modified version of the previous example uses this form to allow the result to be linked with commons separated from normal relocatable code:

    EXT DATA
    RDATA =: DATA
    ENTRY =: RENTRY
    INT ENTRY
    COMMON ENTRYCOM,C
    C=ENTRYCOM
    USE "entry.o"
    USE "table.o"

Note the use in the above example of the convention that the common used to store data for a particular routine (or, as in this case, collection of routines) has its name formed by appending the suffix COM to the name of the entry point for that routine.

It should be noted that there is an important restriction on the set of symbols which can be exported from or imported to an object file created as linker output: The symbols R and C have special meaning during the linkage process; the linker also reserves symbols formed with R as a prefix on an identifier being exported by one of the files being linked, or with R or S as a prefix on the name of a common.

8.4. Linker Control Macros

The last four linker control directives discussed above are complex enough that it is conventional to package them as macros (see Chapter 6). The definitions of these macros should be inserted with a USE directive (see Section 7.1) at the head of the linker control file. Typically, the macros might be packaged in a file called linkmacs.h, so the following new linker directive would be introduced:

<linker control> ::= USE "linkmacs.h"

Inclusion of this linker control directive at the head of a file would define the following new linker control directives:

<linker control> ::= IMPORTS <identifier>
                   | EXPORTS <identifier>
                   | PRESERVECOMMON <identifier>
                   | LUMPCOMMONS <identifier>

These would be defined by the following macros:

    MACRO IMPORTS NAME
      EXT NAME
      R'NAME =: NAME
    ENDMAC

    MACRO EXPORTS NAME
      NAME =: R'NAME
      INT NAME
    ENDMAC

    MACRO PRESERVECOMMON NAME
      COMMON NAME,S'NAME
      R'NAME =: NAME
    ENDMAC

    MACRO LUMPCOMMONS NAME
      COMMON NAME,C
      C=NAME
    ENDMAC

In terms of these macros, the last example above could be rewritten as follows:

    USE "linkmacs.h"
    IMPORTS DATA
    EXPORTS ENTRY
    LUMPCOMMONS ENTRYCOM
    USE "entry.o"
    USE "table.o"
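The equivalence between the macro form and the earlier hand-written form can be checked with a Python sketch that mimics the four macro definitions by textual substitution. This is only a model: the table MACROS and the function expand are hypothetical, and real macro expansion is done by the assembler itself.

```python
# Each macro body from linkmacs.h, with {n} standing for the NAME parameter.
MACROS = {
    "IMPORTS":        ["EXT {n}", "R{n} =: {n}"],
    "EXPORTS":        ["{n} =: R{n}", "INT {n}"],
    "PRESERVECOMMON": ["COMMON {n},S{n}", "R{n} =: {n}"],
    "LUMPCOMMONS":    ["COMMON {n},C", "C={n}"],
}

def expand(lines):
    """Replace each macro call with its body; pass other lines through."""
    out = []
    for line in lines:
        words = line.split()
        if len(words) == 2 and words[0] in MACROS:
            out.extend(t.format(n=words[1]) for t in MACROS[words[0]])
        elif line.strip() == 'USE "linkmacs.h"':
            continue  # the definitions themselves produce no directives
        else:
            out.append(line)
    return out

control = [
    'USE "linkmacs.h"',
    "IMPORTS DATA",
    "EXPORTS ENTRY",
    "LUMPCOMMONS ENTRYCOM",
    'USE "entry.o"',
    'USE "table.o"',
]
for line in expand(control):
    print(line)
```

The expanded lines are the same directives, in the same order, as the example at the end of Section 8.3.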

8.5. Object Libraries

A SMAL object library file controls linkage of object code files. Library files are textual and may be maintained by a conventional text editor. Unlike many other systems, library files do not contain object code, only references to object files. The text of the library file is written in the SMAL object library sublanguage.

<library file> ::= { <object reference> } <end of file>

Each object reference specifies the name of an object file and the names of one or more symbols which that object file defines:

<object reference> ::= IF <symbol list> <line end>
                         USE <quoted string> <line end>
                       ENDIF <line end>
<symbol list> ::= <symbol reference> { ! <symbol reference> }
<symbol reference> ::= FWD ( R<identifier> )

The mechanism used here rests on the FWD function (see Section 2.6) and simple conditional directives (see Section 5.1).

An object reference causes the SMAL assembler, when used as a linker, to link an object file if the previously loaded material has made references to any symbol in the symbol list and that symbol has not yet been defined. Each symbol in the symbol list takes the form of an external name, that is, it starts with an R prefix on an identifier used as an operand of a COMMON or INT directive (see Chapter 4) in the object file. See the following example:

    IF FWD(RSIN) ! FWD(RCOS) ! FWD(RTAN)
      USE "trigfunc.o"
    ENDIF
    IF FWD(REXP) ! FWD(RPOWER)
      USE "expfunc.o"
    ENDIF
    IF FWD(RLOG) ! FWD(RLN)
      USE "logfunc.o"
    ENDIF
    IF FWD(RFLOAT)
      USE "floatfunc.o"
    ENDIF
    IF FWD(RFIX)
      USE "fixfunc.o"
    ENDIF

Note that, if one of the object files in the library references another, that file should be placed before the file it relies on. It is legal to repeat an object reference any number of times in an object library file, and it is legal to have multiple object references with different symbol references which refer to the same object file.
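The ordering rule follows from the fact that the library file is read once, from top to bottom. The following Python sketch models that single pass. It is a simplified model, not the linker's implementation: the function search_library is hypothetical, FWD is modelled as membership in a set of referenced but undefined symbols, and the assumption that trigfunc.o references FLOAT is invented for the example.

```python
def search_library(library, objects, undefined, defined=()):
    """Model one top-to-bottom pass over a library file.

    library:   list of (symbols, filename), one per object reference.
    objects:   filename -> (symbols it defines, symbols it references).
    undefined: symbols referenced but not yet defined before the search.
    Returns (files loaded in order, symbols still undefined).
    """
    undefined = set(undefined)
    defined = set(defined)
    loaded = []
    for symbols, filename in library:
        # Models IF FWD(Ra) ! FWD(Rb) ... : true if any symbol is wanted.
        if any(s in undefined for s in symbols):
            defines, references = objects[filename]
            loaded.append(filename)
            defined |= defines
            undefined |= references
            undefined -= defined
    return loaded, undefined

# Hypothetical contents: trigfunc.o is assumed to call FLOAT.
objects = {
    "trigfunc.o": ({"SIN", "COS", "TAN"}, {"FLOAT"}),
    "floatfunc.o": ({"FLOAT"}, set()),
}
good = [(("SIN", "COS", "TAN"), "trigfunc.o"), (("FLOAT",), "floatfunc.o")]
bad = list(reversed(good))
print(search_library(good, objects, {"SIN"}))
print(search_library(bad, objects, {"SIN"}))
```

With the referencing file first, both files are loaded. With the order reversed, the entry for floatfunc.o is examined before anything has referenced FLOAT, so it is skipped and FLOAT remains undefined.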

When SMAL is run under a hierarchically structured file system, each library should be organized as a single directory, with the library file itself stored under that directory in a file named dir.o, and all other object files in the library stored in the same directory. If the USE directive (see Section 7.1) properly interprets relative file names, the library as a whole can be copied using a tree copy operation without the need to modify any internal file names.

8.6. Library Macros

The form of an object reference is complex enough that it is convenient to package it as a macro (see Chapter 6), thus introducing the following new form:

<object reference> ::= LIB <quoted string> , <identifier> <line end>

The macro to do this should be included in the file linkmacs.h used for linkage editor macros; this macro can be defined as follows:

    MACRO LIB FILE,NAME
      IF FWD(R'NAME)
        USE FILE
      ENDIF
    ENDMAC

Using this form, the above example library could be recoded as follows:

    LIB "trigfunc.o",SIN
    LIB "trigfunc.o",COS
    LIB "trigfunc.o",TAN
    LIB "expfunc.o",EXP
    LIB "expfunc.o",POWER
    LIB "logfunc.o",LOG
    LIB "logfunc.o",LN
    LIB "floatfunc.o",FLOAT
    LIB "fixfunc.o",FIX

Note that this sequence of macro calls does not expand to the same sequence of directives as was used in the previous example, but that it expands to a sequence of directives with the same effect.

8.7. A Simplified Linker for Unix

Unix (including Linux) users are generally familiar with the idea of a link command with the following description:

    % link xx.o yy.o zz.o

This links the object or library files xx.o, yy.o and zz.o and produces output to a file called link.o. (The name a.out would be even more in the Unix tradition, but because the file formats are incompatible, we don't use it.) The following Unix shell script can be used to implement a link command with this behavior:

    #!/bin/sh
    # sh script (Bourne or BASH shell)
    #    link xx.o yy.o zz.o
    #    produces output to link.o

    if [ "$1" = "-help" ]
    then
        echo "link a b c
         links SMAL object and library files named a, b and c
         into a new loadable file called link.o"
        exit 1
    fi

    echo TITLE smallink $* > link.a
    echo .=#0 >> link.a
    while [ $# -gt 0 ]
    do
        echo USE \"$1\" >> link.a
        shift
    done
    echo RUNUSED=:C >> link.a
    echo C=. >> link.a
    smal -D -L link.a
    rm link.a
    sort -k 3,3 link.d > link.map

This script first creates a file link.a, written in the linker control sublanguage described above, and then runs the SMAL assembler, as a linker, with this file as input. The script given above is intended only as an example and as a suggested starting point for development of scripts for specific machines. As a general rule, as illustrated above, the linkage script for a particular machine should offer help in response to the command line argument -help.

The script given above produces output in the loader sublanguage, starting at absolute address zero, and it places all common blocks immediately after all object code. In addition, it defines the external symbol UNUSED as the address of the next location after the last loaded common block. More complex scripts will typically be needed to account for machines with separate program and data segments.

This script leaves the linker's symbol table, sorted into numerical order, in a file named link.map.
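To see what the script hands to the linker, the contents of link.a can be reproduced with a short Python sketch. The function link_control is hypothetical; it only mirrors the echo commands in the script above and does not run the assembler.

```python
def link_control(files):
    """Return the text the script writes to link.a for the given files."""
    lines = ["TITLE smallink " + " ".join(files), ".=#0"]
    lines += ['USE "%s"' % name for name in files]   # one USE per argument
    lines += ["RUNUSED=:C", "C=."]
    return "\n".join(lines) + "\n"

# Control file for the command: link xx.o yy.o zz.o
print(link_control(["xx.o", "yy.o", "zz.o"]), end="")
```

The result is an ordinary linker control program of the kind described in Sections 8.1 and 8.2: a linker control setting the origin, a sequence of object specifications, and a common specification at the end.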