22C:18, Lecture 11, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Debugging Support in the Hawk Emulator

The Hawk emulator supports a number of commands to help in observing what a program does:

Formally, the i and p commands set what is known as a breakpoint, which is marked with an asterisk (*) in the memory listing. When the emulator reaches a breakpoint, it stops executing instructions. By default, the breakpoint is set at location zero (and p with no number entered restores that default).

The Hawk emulator will jump to location #0010 if a program attempts to address nonexistent memory or to write to ROM, and it will jump to location #0020 if a program attempts to execute an unimplemented instruction. These jumps are called traps, and the locations #0010 and #0020 are called the trap vector.

To simplify debugging of programs that "run wild" and cause a trap, we have provided a minimal "operating system" to monitor these addresses and recover some information on how your program got there. This system, monitor.o, may be included with your program in two ways. Either include it when you run the emulator:

	hawk /group/22c018/monitor.o yourcode.o
Or include it in your program directly, by starting your source file as follows:
	TITLE	yourcode
	USE	"/group/22c018/hawk.macs"
	USE	"/group/22c018/monitor.o"
With this monitor, when your program runs wild, the monitor will print an error message and show the program counter, offending memory address, and program status word at the time of the trap. The program counter shown is the address from which the instruction was fetched that committed the offence. The memory address, in the case of a bus trap, is the illegal memory address that was referenced. The monitor program terminates with a jump to zero; this will usually halt the emulator unless a breakpoint was set at the time.

Note that, when the monitor catches a trap, it uses registers 1 to 7 to print its error message, but before jumping to location zero (to force a halt, assuming no odd breakpoints are set), it restores these registers and the PSW to their state at the time of the trap. Thus, the display of those registers on the screen when the monitor halts is correct!

Hawk Data Alignment

When allocating an array or a record, it is natural to imagine the following:

Some compilers actually do this, on machines that support non-aligned memory references, but the cost is significant, and on the Hawk machine, the programming effort required to reference a word or halfword that is not aligned on a word or halfword boundary is very high.

As a result, the SMAL Hawk assembler includes an ALIGN directive (actually a macro in the hawk.macs file) that can be used to force alignment:

	ALIGN	1	; align to a byte boundary
	ALIGN	2	; align to a halfword boundary
	ALIGN	4	; align to a word boundary
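The arithmetic behind such a directive can be sketched in C. This is only an illustration of the expected effect on the assembler's location counter, not the actual definition of the ALIGN macro in hawk.macs; the function name align_up is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round a location counter up to the next multiple of n.
   Hypothetical helper: n is assumed to be 1, 2, or 4, matching
   the three ALIGN examples above. */
uint32_t align_up(uint32_t location, uint32_t n)
{
    assert(n == 1 || n == 2 || n == 4);
    /* Adding n-1 and clearing the low bits rounds up; a location
       that is already a multiple of n is left unchanged. */
    return (location + n - 1) & ~(n - 1);
}
```

For example, align_up(1, 4) gives 4, while align_up(4, 4) leaves the location at 4, so aligning an already aligned location costs no storage.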
Consider the following C declaration and its naive translation to SMAL:
	struct rec {
		char a;
		int b;
		char c;
		int d;
	} array[2] = { { 'x', 1, 'y', 2 },
		       { 'z', 3, 'w', 4 } };
This declares an array named array of 2 records of 4 fields each, with initial values given. A naive translation of this data structure to SMAL would be:
	array:	B	'x'	; array[0].a
		W	1	; array[0].b
		B	'y'	; array[0].c
		W	2	; array[0].d

		B	'z'	; array[1].a
		W	3	; array[1].b
		B	'w'	; array[1].c
		W	4	; array[1].d
In memory, the SMAL assembler would store the following:
		  byte
           3     2     1     0
	 -----------------------
	| #00 | #00 | #01 | 'x' | 1
	|-----------------------|
	| #00 | #02 | 'y' | #00 | 2
	|-----------------------|
	| #03 | 'z' | #00 | #00 | 3  word
	|-----------------------|
	| 'w' | #00 | #00 | #00 | 4
	|-----------------------|
	| #00 | #00 | #00 | #04 | 5
	 -----------------------
This is exactly 5 words (4 characters plus 4 full-word integers), but writing a program to access some field of an arbitrary array element is very messy! Even on machines that support non-aligned memory references, there is a significant performance penalty! Reading a non-aligned word operand from memory takes two memory cycles, and on many machines, writing a non-aligned word operand to memory takes 4 memory cycles (two reads and two writes, although many machines have hardware to speed up writes when they follow immediately after a read from the same location).
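To see where the two memory cycles come from, here is a minimal C sketch of a non-aligned read. It is not Hawk code; it models memory as an array of 32-bit words with byte 0 in the least significant position, as in the picture above, and the function name load_word_unaligned is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch a 32-bit word that starts at any byte address, using only
   aligned word reads from mem[] (a hypothetical memory model). */
uint32_t load_word_unaligned(const uint32_t *mem, size_t nwords,
                             size_t byte_addr)
{
    size_t   index = byte_addr / 4;                 /* word holding the first byte */
    unsigned shift = (unsigned)(byte_addr % 4) * 8; /* bit offset within that word */

    assert(index < nwords);
    if (shift == 0)
        return mem[index];                          /* aligned: one read suffices */

    assert(index + 1 < nwords);
    uint32_t low  = mem[index] >> shift;            /* first memory read */
    uint32_t high = mem[index + 1] << (32 - shift); /* second memory read */
    return low | high;                              /* splice the two pieces */
}
```

Applied to the 5 words shown above, reading from byte addresses 1, 6, 11 and 16 recovers the integer fields 1, 2, 3 and 4. Only the last of these is aligned and needs a single read.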

Because of this, even if the Hawk machine tried to make non-aligned memory references look inexpensive, we would be better off storing this array in memory as follows:

	array:	B	'x'	; array[0].a
		ALIGN	4
		W	1	; array[0].b
		B	'y'	; array[0].c
		ALIGN	4
		W	2	; array[0].d

		B	'z'	; array[1].a
		ALIGN	4
		W	3	; array[1].b
		B	'w'	; array[1].c
		ALIGN	4
		W	4	; array[1].d
The effect of this is to store the array in memory as follows:
		  byte
           3     2     1     0
	 -----------------------
	|/////|/////|/////| 'x' | 1
	|-----------------------|
	| #00 | #00 | #00 | #01 | 2
	|-----------------------|
	|/////|/////|/////| 'y' | 3
	|-----------------------|
	| #00 | #00 | #00 | #02 | 4
	|-----------------------|   word
	|/////|/////|/////| 'z' | 5
	|-----------------------|
	| #00 | #00 | #00 | #03 | 6
	|-----------------------|
	|/////|/////|/////| 'w' | 7
	|-----------------------|
	| #00 | #00 | #00 | #04 | 8
	 -----------------------
This wastes a significant amount of storage (8 words instead of 5 in this example, a 60 percent increase), but all of the fields of the array are easy to fetch and manipulate.

In the Pascal programming language, any array or record declaration may be preceded by the keyword packed. This tells the compiler that it is OK to pack the fields of the array or record as tightly as possible, even if this requires complex and slow code to access components of the resulting structure. C has nothing analogous to this! The semantics of C require that record fields be allocated in memory in the order they appear in the declaration, while in Pascal, the compiler may reorganize records for more efficient storage.

With the example record, to force more efficient storage allocation, a C programmer can group all character and short-integer fields together, or, at least, group character fields in groups of 4 and short-integer fields in groups of 2. Doing this for the example gives:

	struct rec {
		char a;
		char c;
		int b;
		int d;
	} array[2] = { { 'x', 'y', 1, 2 },
		       { 'z', 'w', 3, 4 } };
This would typically imply a structure such as the following on a machine that did not allow non-aligned words:
		  byte
           3     2     1     0
	 -----------------------
	|/////|/////| 'y' | 'x' | 1
	|-----------------------|
	| #00 | #00 | #00 | #01 | 2
	|-----------------------|
	| #00 | #00 | #00 | #02 | 3
	|-----------------------|   word
	|/////|/////| 'w' | 'z' | 4
	|-----------------------|
	| #00 | #00 | #00 | #03 | 5
	|-----------------------|
	| #00 | #00 | #00 | #04 | 6
	 -----------------------
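One way to check what a particular compiler does with the two declarations is to ask it, using sizeof and offsetof. The sketch below assumes a C11 compiler; the struct names rec_interleaved and rec_grouped and the two helper functions are hypothetical, and the actual sizes depend on the machine and compiler.

```c
#include <assert.h>
#include <stddef.h>

/* The original field order and the regrouped order from the text. */
struct rec_interleaved { char a; int b; char c; int d; };
struct rec_grouped     { char a; char c; int b; int d; };

/* Whatever the field order, the compiler places each int on a
   boundary acceptable for int. */
static_assert(offsetof(struct rec_interleaved, b) % _Alignof(int) == 0,
              "b is aligned in the interleaved record");
static_assert(offsetof(struct rec_grouped, b) % _Alignof(int) == 0,
              "b is aligned in the grouped record");

/* Bytes per record that hold no field data at all. */
size_t padding_interleaved(void)
{
    return sizeof(struct rec_interleaved) - (2 * sizeof(char) + 2 * sizeof(int));
}

size_t padding_grouped(void)
{
    return sizeof(struct rec_grouped) - (2 * sizeof(char) + 2 * sizeof(int));
}
```

On a machine with 4-byte integers aligned on word boundaries, one would expect the interleaved record to occupy 16 bytes and the grouped record 12, matching the 4-word and 3-word pictures above.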
If the machine allowed non-aligned words, the C programmer might be advised to write:
	struct rec {
		char a;
		char c;
		char pad1,pad2; /* unused fields for padding */
		int b;
		int d;
	} array[2] = { { 'x', 'y', '#', '#', 1, 2 },
		       { 'z', 'w', '#', '#', 3, 4 } };
This explicitly adds extra unused fields to force the integer fields to be aligned on a word boundary, thus allowing single-cycle access to those fields. C (or C++) programmers should generally avoid this kind of fiddling with the details of data structure allocation except when the last iota of speed or size must be squeezed out of a program! Furthermore, these kinds of fiddles depend immensely on the details of the CPU and compiler being used. What leads to a significant improvement on an Intel Pentium may do nothing for a DEC Alpha, or vice versa!

When programming in assembly language, on the other hand, you must be aware of how fields are packed. If one word is allocated for each character field in a structure, the code is simple. If multiple characters are packed per word, the code is somewhat more complex. If full-word variables are allocated so they straddle word boundaries in memory, the code required to read or write those variables is far more complex.
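The middle case, several characters packed per word, comes down to shifting and masking. The following C sketch shows the work involved; it is an illustration of the idea rather than Hawk code, and the names get_byte and put_byte are hypothetical. Byte 0 is taken to be the least significant byte, as in the pictures above.

```c
#include <assert.h>
#include <stdint.h>

/* Return byte i (0 = least significant) of a 32-bit word. */
uint8_t get_byte(uint32_t word, unsigned i)
{
    assert(i < 4);
    return (uint8_t)(word >> (8 * i));   /* shift down, keep the low 8 bits */
}

/* Return a copy of word with byte i replaced by value. */
uint32_t put_byte(uint32_t word, unsigned i, uint8_t value)
{
    assert(i < 4);
    uint32_t mask = (uint32_t)0xFF << (8 * i);      /* selects byte i */
    return (word & ~mask) | ((uint32_t)value << (8 * i));
}
```

Reading a packed character takes a shift and a truncation; writing one takes a read of the whole word, a mask, a merge, and a write back. With one character per word, each access is a single load or store.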