Assignment 4, due Sept. 20

Solutions

Part of the homework for 22C:112, Fall 2013
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

On all assignments, your name must be legible as it appears on your University ID card! Assignments are due at the start of class on the day indicated (usually Friday). Exceptions will be by advance arrangement unless there is what lawyers call "an act of God" (something outside your control). Homework must be turned in on paper, either in class or in the teaching assistant's mailbox. Never push late work under someone's door!

Background: Consider a computer with the following characteristics (patterned on the MIPS architecture, but not identical to it):
- A 32-bit instruction word, all instructions are word-aligned.
- no direct memory addressing mode (all addressing is indexed).
- PC relative addressing uses the program counter as an index register.
- Immediate operands are composed of a 16-bit constant and a 5-bit rotate count.
- Displacements for index registers are identical to immediate operands.
- The add instruction allows an immediate operand.
In detail, the constant field of an immediate operand is sign-extended and then rotated left from 0 to 31 places. A load-immediate followed by an add immediate or an or-immediate can load an arbitrary 32-bit constant.
There are several ways a relocating loader could deal with this machine:
1. It could insist that the only relocatable locations are full-word memory locations,
2. it could allow relocation of halword constants in the immediate field of an instruction word, taking into account the rotate count in that instruction, or
3. it could restrict relocation of halfword constants by requiring them to be in a fixed relationship, for example, the high half in the immediate field of the first instruction word and the low half in the immediate field of the next successive instruction word.
a) Just to see if you understand this instruction format, About how many different ways are there to load the hexidecimal constant 01234567 into some particular index register using only immediate operands and no more than two instructions? (Suggested answers range from "it can't be done" to "on the order of 64") (0.2 points)
We have to load the constant as 2 chunks of 16 bits each. We can do this with a load immediate followed by an add immediate in at least 32 distinct ways. Here are a few of them, expressed using something like C notation except that we use the &carr; symbol for a circular shift to the left:
```
r = (0x0123 &carr; 16) + (0x4567 &carr;  0)
r = (0x1234 &carr; 12) + (0x5670 &carr; 28)
r = (0x2345 &carr;  8) + (0x6701 &carr; 24)
r = (0x3456 &carr;  4) + (0x7012 &carr; 20)
r = (0x4567 &carr;  0) + (0x0123 &carr; 16)
r = (0x5670 &carr; 28) + (0x1234 &carr; 12)
r = (0x6701 &carr; 24) + (0x2345 &carr;  8)
r = (0x7012 &carr; 20) + (0x3456 &carr;  4)
```
In all of the above the 2 16-bit fields are non-overlapping. There are actually more solutions because of possible overlap. Consider these:
```
r = (0x0246 &carr; 15) + (0x4567 &carr;  0)
r = (0x048C &carr; 14) + (0x4567 &carr;  0)
r = (0x0918 &carr; 13) + (0x4567 &carr;  0)
r = (0x1230 &carr; 12) + (0x4567 &carr;  0)
r = (0x2460 &carr; 11) + (0x4567 &carr;  0)
r = (0x48C0 &carr; 10) + (0x4567 &carr;  0)
r = (0x9180 &carr; 09) + (0x4567 &carr;  0)
```
The above solutions involve overlaps in the 16-bit fields, but the nonzero parts of those fields do not overlap. There are even more solutions if you divide the nonzero parts of the fields differently and actually take advantage of the add instruction:
```
r = (0x1234 &carr; 12) + (0x0567 &carr;  0)
r = (0x1233 &carr; 12) + (0x1567 &carr;  0)
r = (0x1232 &carr; 12) + (0x2567 &carr;  0)
r = (0x1231 &carr; 12) + (0x3567 &carr;  0)
r = (0x122F &carr; 12) + (0x5567 &carr;  0)
r = (0x122E &carr; 12) + (0x6567 &carr;  0)
r = (0x122D &carr; 12) + (0x7567 &carr;  0)
r = (0x122C &carr; 12) + (0x8567 &carr;  0)
r = (0x122B &carr; 12) + (0x9567 &carr;  0)
r = (0x122A &carr; 12) + (0xA567 &carr;  0)
```
Needless to say, there are many more solutions. The answer 32 is the smallest reasonable answer, and those guessing higher numbers have a better intuitive grasp of this architecture. It is fair to guess that the actual answer could exceed 1024.
b) If relocation applies only to aligned 32-bit words (alternative i), how would a programmer write code (or a compatible compiler generate code) to load an arbitrary relocatable constant (the address of a statically allocated variable) into a register. (0.4 points)

The constant subject to relocation must be stored in a full word of memory, for example, just after an unconditional branch instruction near the point where the constant is needed. PC relative addressing can be used to load that one-word constant.

c) Alternative iii) given above works but it is restrictive. Why is the following implementation of alternative ii) wrong? (Hint: It almost works, the failure is subtle. Consider what it does when loading an arbitrary 32 bit value using a load-immediate and an or-immediate 2-instruction sequence.) (0.4 points)
```
// extract fields from word
value = word.16bit_const_field
shift = word.5bit_shift_field
// relocate
value = left_rotate( value, shift )
value = value + relocation_base
value = right_rotate( value, shift )
// pack result back into word
word.16bit_const_field = value
```
Suppose I want to load the constant (hexadecimal) ABCDE which is to be relocated. I could load this as follows, using the notation from part a):
```
r = (0x000A &carr; 16) + (0xBCDE &carr;  0)
```
Now, suppose the relocation base is (hexadecimal)8000. The relocating loader takes the value BCDE, adds 8000, and gets 13CDE, which it then truncates to 16 bits. As a result, the value loaded is:
```
r = (0x000A &carr; 16) + (0x3CDE &carr;  0)
```
When it should be:
```
r = (0x000B &carr; 16) + (0x3CDE &carr;  0)
```
In short, the problem with this solution is that it fails when the relocation produces a carry out of one part of the loaded constant into another.
A Quick Question: In Unix (and Linux), the cc command runs the C compiler. This is how the command is usually introduced to new C programmers, but it is false. Give an example use of the command that does not invoke any compiler. (0.5 points)
```
stackit: main.o stack.o
        cc -o stackit main.o stack.o
```
The above line, from the example makefile in the manual of C style, uses cc to invoke the linker.
Background: Consider this mess:
- The program table.c reads data from standard input, and writes a set of C constant definitons to standard output.
- The file data contains data in the form expected by table.c.
- The program application.c has an #include directive for the file table.h that is supposed to define a number of constants derived by table.c from data.
- The result of compiling applicaton.c should be in an executable file named applicaiton.
A Not-so Quick Question: Write a makefile that binds all of the above so that you can make application to put it all together. (1.0 points)
```
# Makefile

application: application.c table.h
        cc -o application application.c

table.h: data table
        table < data > table.h

table: table.c
        cc -o table table.c
```
Background: There are loaders, relocating loaders, and linking loaders, all are possible.
A Problem: From what you know about the Unix/Linux execve command, which kind of loader does it incorporate? The answer can be inferred from what you already know. (0.5 points)

It must be a relocating loader. The linkage was already done, for example, by a makefile, and at the time you exec the program, you do not have the option of specifying object libraries, multiple binaries to be linked, or or other things that linking loaders typically use. And, there must be relocation facilities because you do not specify where in RAM the object file goes.
What you can't tell from the external view of execve is whether it uses load-time relocation (a classic linking loader) or run-time relocation of the sort you can do with MMU hardware. It turns out that different Posix compatible systems can answer the question differently, but since we have not talked about MMU hardware, we can ignore this issue.