22C:18, Lecture 11, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Debugging Support in the Hawk Emulator

The Hawk emulator supports a number of commands to help in observing what a program does:

s - single step - execute one instruction.
r - run - execute instructions.
z - snore? - this command sets the internal timer that the emulator uses to determine how frequently it should "wake up" and display the state of the Hawk machine's registers. The hex value most recently entered from the keyboard determines how frequently the display is updated.
By default, it shows the registers rather frequently, seriously slowing the execution of Hawk programs with the effort needed to update the screen. By setting the internal timer to 1, the screen will be updated after almost every or every other instruction. Setting it to 1000 will give only a few updates per second, and by setting it to 10000, execution will be very fast. These numbers are in hex, and formally, what is being counted is memory cycles (32 bit fetches and stores).
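Because the interval is entered in hex, it is easy to misjudge how large it really is. The following is a minimal C sketch of the conversion, not code from the emulator; the function name interval_cycles is made up for illustration.

```c
#include <stdlib.h>

/* Convert a display interval, typed as hex digits, into a count of
   memory cycles.  interval_cycles is a hypothetical name; the emulator's
   own input routine is not shown in these notes. */
unsigned long interval_cycles(const char *hex_text)
{
    return strtoul(hex_text, NULL, 16);
}
```

Under this reading, the setting 1000 means 4096 memory cycles between screen updates, and 10000 means 65536.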
i - iterate - execute until the program returns to the current location. Typically, you might single step until control enters a loop, and then, after verifying one loop iteration the hard way, use the iterate command to repeatedly execute loop iterations.
Warning! If control does not return to the location where the iterate command was issued, the emulator will run until you hit control-C. On the HP machines, you may really have to bang on control-C to get it to notice. (We can hope HP has fixed the bugs that made catching control-C difficult.)
p - proceed - execute until the program reaches an address entered from the keyboard. This address is entered in hex, and must be entered before the proceed command. If no number is entered first, proceed will execute until the program reaches location 0.
To make effective use of the proceed command, open a second window and use it to look at the assembly listing of the program you are debugging. Then, you can note the location of some instruction that you want to execute up to, enter the address, and then hit p.

Formally, the i and p commands set what is known as a breakpoint; in the listing, the breakpoint is noted with an asterisk (*) in the memory listing. When the emulator reaches a breakpoint, it stops executing instructions. By default, the breakpoint is set at location zero (and p with no number entered resets that default).
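One way to picture the breakpoint mechanism is as a test inside the emulator's main loop. The C sketch below is an assumption about the general technique, not the Hawk emulator's source; struct machine, step and run_to_breakpoint are hypothetical names, and step is a stand-in that treats every instruction as a 2-byte instruction that falls through to the next.

```c
#include <stdint.h>

/* Hypothetical emulator state: only the fields this sketch needs. */
struct machine {
    uint32_t pc;          /* program counter */
    uint32_t breakpoint;  /* address at which to stop; 0 by default */
};

/* Stand-in for executing one instruction.  Assumption: every
   instruction is 2 bytes long and never branches. */
static void step(struct machine *m)
{
    m->pc += 2;
}

/* Execute instructions until the program counter reaches the
   breakpoint or the budget runs out.  The test comes after the step,
   so a breakpoint at the current location (the iterate command) lets
   the program leave that location and stops only when it returns.
   Returns the number of instructions executed. */
unsigned long run_to_breakpoint(struct machine *m, unsigned long budget)
{
    unsigned long executed = 0;
    do {
        step(m);
        executed++;
    } while (m->pc != m->breakpoint && executed < budget);
    return executed;
}
```

The budget parameter plays the role of control-C here: if control never reaches the breakpoint, something outside the program has to stop the loop.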

The Hawk emulator will jump to location #0010 if a program attempts to address nonexistent memory, or to write in ROM, and it will jump to location #0020 if a program attempts to execute an unimplemented instruction. These jumps are called traps, and the locations #0010 and #0020 are called the trap vector.

To simplify debugging of programs that "run wild" and cause a trap, the linker combines your program with a minimal "operating system" to monitor for traps and recover some information on how your program got there. This system, monitor.o, is automatically included with your program when you run the Hawk linker.

	link yourfile.o

With this monitor, when your program runs wild, the monitor will print an error message and show the program counter, offending memory address, and program status word at the time of the trap. The program counter shown is the address from which the instruction was fetched that committed the offence. The memory address, in the case of a bus trap, is the illegal memory address that was referenced. The monitor program terminates with a jump to zero; this will usually halt the emulator unless a breakpoint was set at the time.

Note that, when the monitor catches a trap, it uses registers 1 to 7 to print its error message, but before jumping to location zero (to force a halt, assuming no odd breakpoints are set), it restores these registers and the PSW to their state at the time of the trap. Thus, the display of those registers on the screen when the monitor halts is correct!

You will need to look at the assembler listing file to interpret error messages from the monitor and to use the debugging features discussed here! Furthermore, you will have to be aware that addresses and values marked in the assembly listing with a + are values that will be adjusted by the linker before the program is run. Consider the following assembly listing:

SMAL32, rev  6/97.              Hello World Program, by D.   09:01:34  Page
                                                             Mon Jun 16 199

                             1          TITLE   Hello World Program, by D. 
                             2          USE    "/group/22c018/hawk.macs"
                             3          USE    "/group/22c018/hawk.system"
+000000:+00000000            4
        +00000000
        +00000000
        +00000000
        +00000000
        +00000000
        +00000000
                             5          S       START           ; set start
                             6
                             7  START:
                             8          LOAD    R2,PSTACK       ; set up th
+00001C: F2E0  FFE0          9          CALL    DSPINI
+000020: F1E0  FFE0         10          LEA     R3,HELLO
         F131
+000026: F3C0  000A         11          CALL    DSPST
+00002A: F1E0  FFE2         12          CLR     R1
         F131
+000030: D100               13          JUMPS   R1              ; stop!
+000032: F031               14
+000034: 48  65  6C  6C     15  HELLO:  ASCII   "Hello World!",0
         6F  20  57  6F
         72  6C  64  21
         00
                            16
                            17          END
                    no errors

This listing shows the program starting at address +00001C, but the linker places this in memory starting at address 00001000₁₆, so the actual starting address of the program is 0000101C₁₆.
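The adjustment for addresses marked with a + amounts to adding the load address to the value shown in the listing. Here is a small C sketch of that arithmetic; LOAD_BASE and relocate are names invented for illustration, and the base of 1000 hex is the one quoted above for this program.

```c
#include <stdint.h>

/* Where the linker placed this program, as stated in the text.
   A different link could use a different base. */
#define LOAD_BASE 0x00001000u

/* Convert a relocatable address from the assembly listing (one marked
   with a +) into the absolute address seen in the emulator. */
uint32_t relocate(uint32_t listing_address)
{
    return LOAD_BASE + listing_address;
}
```

For example, the label HELLO at +000034 in the listing ends up at 00001034 hex in memory.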

Note that, unlike the Hello World program given previously, this one does not use the CALL macro or the header file for access to the system. Instead, it exposes, directly, the actual assembly code used to link to the operating system routines.

If you start the Hawk emulator on the link.o file resulting from linking the assembly output from the above, the initial emulator output will show that the program counter has exactly this value:

 HAWK EMULATOR
   /------------------CPU------------------\   /----MEMORY----\
   PC:  0000101C                R8: 00000000   001018: #0206
   PSW: 00000000  R1: 00000000  R9: 00000000   00101A: #0000
   NZVC: 0 0 0 0  R2: 00000000  RA: 00000000 ->00101C: LOAD    #2,#001000
                  R3: 00000000  RB: 00000000   001020: LOAD    #1,#001004
                  R4: 00000000  RC: 00000000   001024: JSRS    #1,#1
                  R5: 00000000  RD: 00000000   001026: LEA     #3,#001034
                  R6: 00000000  RE: 00000000   00102A: LOAD    #1,#001010
                  R7: 00000000  RF: 00000000   00102E: JSRS    #1,#1

 **HALTED**  r(run) s(step) q(quit) ?(help)

Furthermore, the emulator shows the code at this address is a LOAD instruction, loading register 2 with the contents of memory location 1000₁₆. This brings up another issue! The assembly listing shows the value +00000000 being stored in addresses +000000 to +000018. Since each of these is preceded by a + sign, each is subject to modification by the linker! The addresses are translated to 00001000₁₆ to 00001018₁₆ by the linker, and the contents of these addresses are adjusted so that they point to the operating system data areas specified by the file /group/22c018/hawk.system.

The program listing shown by the emulator is not created by examining the source code of your program! Instead, it is created by "disassembly" of the code in memory! As a result, macros in the source program are shown in expanded form. The sequence LOAD/JSRS at addresses 1020₁₆ and 1024₁₆, for example, is the result of expanding the CALL macro on line 9 of the source program! This macro was defined in the file /group/22c018/hawk.system.

If you wish to see the values the linker has assigned to symbols that weren't locally defined in your program, look at the file "link.map" produced by the linker. This is called the linkage map, or the map of the linker output, and the map for the example program above is as follows:

SDSPPTR=                #00000004
STRAPBUF=               #00000038
CT=             #00000171
RDSPINI=                #00000178
RDSPAT=         #00000186
RDSPCH=         #000001AE
RDSPST=         #000001C0
RDSPHX=         #000001D6
RDSPDEC=                #00000206
RKBGETC=                #0000024C
RKBGETS=                #00000260
RTIMES=         #000002AE
RDIVIDE=                #000002C8
SSTACK=         #00001000
R=              #00001044
RSTACK=         #00010000
RTRAPBUF=               #00011000
RDSPPTR=                #00011038
C=              #0001103C
RUNUSED=                #0001103C
RUNAVAIL=               #00020000

Note that the map file is sorted by value. An extra letter is added at the front of each identifier; those that start with R are normal external symbols. This map shows that the stack referenced on line 8 of the assembly program begins at location 00010000₁₆ (given by the value of RSTACK) and that the entry point for the DSPST routine in the operating system is at location 000001C0₁₆.
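If you want to look symbols up mechanically instead of by eye, each line of the map can be picked apart with sscanf. This C sketch assumes only the line format visible in the map above (a name, an equals sign, blanks, and a hex value after a # sign); parse_map_line is a hypothetical helper, not part of the Hawk tools.

```c
#include <stdio.h>

/* Split one line of link.map into the symbol name and its value.
   Returns 1 on success and 0 if the line is not in the expected form.
   Assumes the format shown in the example map: NAME= #HEXVALUE */
int parse_map_line(const char *line, char name[64], unsigned long *value)
{
    return sscanf(line, "%63[^=]= #%lx", name, value) == 2;
}
```

Given the line for RDSPST, this yields the name RDSPST and the value 1C0 hex, the entry point quoted above for the DSPST routine.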

Hawk Data Alignment

When allocating an array or a record, it is natural to imagine the following:

For an array, simply place all of the identical components in adjacent memory locations.
For a record or structure, simply place all of the non-identical fields in adjacent memory locations.

Some compilers actually do this, on machines that support non-aligned memory references, but the cost is significant, and on the Hawk machine, the programming effort required to reference a word or halfword that is not aligned on a word or halfword boundary is very high.

As a result, the SMAL Hawk assembler includes an ALIGN directive (actually a macro in the hawk.macs file) that can be used to force alignment:

	ALIGN	1	; align to a byte boundary
	ALIGN	2	; align to a halfword boundary
	ALIGN	4	; align to a word boundary
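The arithmetic behind alignment is just rounding the current location up to the next multiple of the boundary. The C sketch below illustrates that arithmetic only; it is not the definition of the ALIGN macro in hawk.macs, and align_up is a made-up name.

```c
#include <stdint.h>

/* Round location up to the next multiple of boundary.
   boundary must be a power of two (1, 2 or 4 in the examples above). */
uint32_t align_up(uint32_t location, uint32_t boundary)
{
    return (location + boundary - 1) & ~(boundary - 1);
}
```

A location that is already aligned is left unchanged, so aligning costs nothing when no padding is needed.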

Consider the following C declaration and its naive translation to SMAL:

	struct rec {
		char a;
		int b;
		char c;
		int d;
	} array[2] = { { 'x', 1, 'y', 2 },
		       { 'z', 3, 'w', 4 } };

This declares an array named array of 2 records of 4 fields each, with initial values given. A naive translation of this data structure to SMAL would be:

	array:	B	'x'	; array[0].a
		W	1	; array[0].b
		B	'y'	; array[0].c
		W	2	; array[0].d

		B	'z'	; array[1].a
		W	3	; array[1].b
		B	'w'	; array[1].c
		W	4	; array[1].d

In memory, the SMAL assembler would store the following:

		  byte
           3     2    1     0
	 -----------------------
	| #00 | #00 | #01 | 'x' | 1
	|-----------------------|
	| #00 | #02 | 'y' | #00 | 2
	|-----------------------|
	| #03 | 'z' | #00 | #00 | 3  word
	|-----------------------|
	| 'w' | #00 | #00 | #00 | 4
	|-----------------------|
	| #00 | #00 | #00 | #04 | 5
	 -----------------------

This is exactly 5 words, 4 characters plus 4 full word integers, but writing a program to access some field of an arbitrary array element is very messy! Even on machines that support non-aligned memory references, there is a significant performance penalty! Reading a non-aligned word operand from memory takes two memory cycles, and on many machines, writing a non-aligned word operand to memory takes 4 memory cycles (two reads and two writes, although many machines have hardware to speed up writes when they follow immediately after a read from the same location).
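To see where the two memory cycles come from, here is a C sketch of reading a non-aligned word using only aligned word fetches. It assumes the byte numbering of the diagram above, with byte 0 the least significant byte of each word; load_unaligned is a hypothetical name, and this is C for illustration, not Hawk code.

```c
#include <stdint.h>

/* Read a 32-bit word that starts at an arbitrary byte address, using
   only aligned word fetches from mem[].  Byte 0 of each word is taken
   to be the least significant byte, as in the diagram. */
uint32_t load_unaligned(const uint32_t *mem, uint32_t byte_addr)
{
    uint32_t index = byte_addr / 4;        /* word holding the first byte */
    uint32_t shift = (byte_addr % 4) * 8;  /* bit position within it */
    uint32_t low = mem[index];             /* first memory cycle */

    if (shift == 0)
        return low;                        /* aligned: one cycle is enough */

    uint32_t high = mem[index + 1];        /* second memory cycle */
    return (low >> shift) | (high << (32 - shift));
}
```

Applied to the 5 words of the naive layout, byte addresses 1, 6, 11 and 16 give back the values 1, 2, 3 and 4. Only the last of these, which happens to be aligned, needs a single fetch.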

Because of this, even if the Hawk machine tried to make non-aligned memory references look inexpensive, we would be better off storing this array in memory as follows:

	array:	B	'x'	; array[0].a
		ALIGN	4
		W	1	; array[0].b
		B	'y'	; array[0].c
		ALIGN	4
		W	2	; array[0].d

		B	'z'	; array[1].a
		ALIGN	4
		W	3	; array[1].b
		B	'w'	; array[1].c
		ALIGN	4
		W	4	; array[1].d

The effect of this is to store the array in memory as follows:

		  byte
           3     2    1     0
	 -----------------------
	|/////|/////|/////| 'x' | 1
	|-----------------------|
	| #00 | #00 | #00 | #01 | 2
	|-----------------------|
	|/////|/////|/////| 'y' | 3
	|-----------------------|
	| #00 | #00 | #00 | #02 | 4
	|-----------------------|   word
	|/////|/////|/////| 'z' | 5
	|-----------------------|
	| #00 | #00 | #00 | #03 | 6
	|-----------------------|
	|/////|/////|/////| 'w' | 7
	|-----------------------|
	| #00 | #00 | #00 | #04 | 8
	 -----------------------

This wastes a significant amount of storage (it comes close to doubling the amount of memory required, in this example), but all of the fields of the array are easy to fetch and manipulate.
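With every field in a word of its own, finding a field is simple arithmetic: each record occupies 16 bytes, and the fields sit at fixed offsets of 0, 4, 8 and 12 within it. A C sketch of that address computation follows; the constant and function names are made up for illustration.

```c
#include <stdint.h>

/* Layout of one record in the aligned version, in bytes. */
enum { REC_SIZE = 16, OFF_A = 0, OFF_B = 4, OFF_C = 8, OFF_D = 12 };

/* Byte offset, from the label array, of a field of array[i]. */
uint32_t field_offset(uint32_t i, uint32_t field)
{
    return i * REC_SIZE + field;
}
```

For array[1].b this gives byte offset 20, which is word 6 in the diagram above (the diagram counts words from 1). Every offset for b and d is a multiple of 4, so each integer can be fetched with one aligned load.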

In the Pascal programming language, any array or record declaration may be preceded by the keyword packed. This tells the compiler that it is OK to pack the fields of the array or record as tightly as possible, even if this requires complex and slow code to access components of the resulting structure. C and C++ have nothing analogous to this! The semantics of C requires that record fields be allocated in memory in the order they appear in the declaration, while in Pascal, the compiler may reorganize records for more efficient storage.

With the example record, to force more efficient storage allocation, a C programmer can group all character and short-integer fields together, or, at least, group character fields in groups of 4 and short-integer fields in groups of 2. Doing this for the example gives:

	struct rec {
		char a;
		char c;
		int b;
		int d;
	} array[2] = { { 'x', 'y', 1, 2 },
		       { 'z', 'w', 3, 4 } };

This would typically imply a structure such as the following on a machine that did not allow non-aligned words:

		  byte
           3     2     1     0
	 -----------------------
	|/////|/////| 'y' | 'x' | 1
	|-----------------------|
	| #00 | #00 | #00 | #01 | 2
	|-----------------------|
	| #00 | #00 | #00 | #02 | 3
	|-----------------------|   word
	|/////|/////| 'w' | 'z' | 4
	|-----------------------|
	| #00 | #00 | #00 | #03 | 5
	|-----------------------|
	| #00 | #00 | #00 | #04 | 6
	 -----------------------
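You can ask a C compiler what it actually did with the two declarations by using sizeof and offsetof. The sketch below renames the two versions of the record so both fit in one file; rec_declared, rec_grouped and bytes_saved are names invented here, and the sizes you get depend on your compiler and machine.

```c
#include <stddef.h>

/* The record as first declared: char and int fields interleaved. */
struct rec_declared { char a; int b; char c; int d; };

/* The same fields with the char fields grouped together. */
struct rec_grouped  { char a; char c; int b; int d; };

/* Bytes saved per record by grouping the char fields. */
size_t bytes_saved(void)
{
    return sizeof(struct rec_declared) - sizeof(struct rec_grouped);
}
```

On a machine with 4-byte integers aligned on word boundaries, one would expect the sizes to be 16 and 12 bytes, matching the 4-word and 3-word records in the diagrams, but check this on your own system; do not assume it.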

If the machine allowed non-aligned words, the C programmer might be advised to write:

	struct rec {
		char a;
		char c;
		char pad1,pad2; /* unused fields for padding */
		int b;
		int d;
	} array[2] = { { 'x', 'y', '#', '#', 1, 2 },
		       { 'z', 'w', '#', '#', 3, 4 } };

This explicitly adds extra unused fields to force the integer fields to be aligned on a word boundary, thus allowing single cycle access to those fields. C (or C++) programmers should generally avoid this kind of fiddling with the details of data structure allocation except when the last iota of speed or size must be squeezed out of a program! Furthermore, these kinds of fiddles depend immensely on the details of the CPU and compiler being used. What leads to a significant improvement on an Intel Pentium may do nothing for a DEC Alpha or vice versa!

When programming in assembly language, on the other hand, you must be aware of how fields are packed. If one word is allocated for each character field in a structure, the code is simple. If multiple characters are packed per word, the code is somewhat more complex. If full-word variables are allocated so they straddle word boundaries in memory, the code required to read or write those variables is far more complex.
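As an illustration of the "somewhat more complex" middle case, here is what getting at one character packed in a word involves, written in C. This is a sketch of the shifting and masking only, not Hawk assembly code, and get_byte and put_byte are hypothetical names. Byte 0 is taken to be the least significant byte, as in the diagrams above.

```c
#include <stdint.h>

/* Fetch byte number n (0 = least significant) from a word. */
uint8_t get_byte(uint32_t word, unsigned n)
{
    return (uint8_t)(word >> (8 * n));
}

/* Return a copy of word with byte number n replaced by value.
   The other three bytes are left unchanged. */
uint32_t put_byte(uint32_t word, unsigned n, uint8_t value)
{
    uint32_t mask = (uint32_t)0xFF << (8 * n);
    return (word & ~mask) | ((uint32_t)value << (8 * n));
}
```

Note that storing a character takes a fetch of the whole word, a mask, a merge, and a store of the whole word, which is why packed fields cost more to update than fields that each have a word to themselves.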