Assignment 9, Solutions

Part of the homework for 22C:60 (CS:2630), Fall 2011
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Background: To compute n mod 3, where n is a decimal number, just add up the digits of n and divide that by three. So, 128 mod 3 is the same as 1+2+8 mod 3 which is 11 mod 3 which is 1+1 mod 3 which is 2. This works in decimal because 10 mod 3 equals 1. It also works in any number base b for which b mod 3 equals 1. Using base 4, this gives us the following useful algorithm:

unsigned int mod3(unsigned int x) {
        while (x > 3) {
                x = sum of base 4 digits of x
        }
	if (x == 3) x = 0;
        return x;
}

A problem: Translate the above to Hawk code. You will have to work out how to do the loop body, perhaps with an inner loop, although there are tricks that may be even faster. (1 point)

First, the obvious solution:

MOD3:   ; given x in R3, returns x mod 3 in R3
        ; uses R4, R5 as tempories y and z
MOD3L1:                 ; for (;;) {
        CMPI   R3,3
        BLEU   MOD3Q1   ;   if (x ≤ 3) break;
        MOVE   R4,R3	;   y = x; -- t guaranteed to be nozero
        LIS    R3,0     ;   x = 0
MOD3L2:                 ;   do { -- x = sum of base 4 digits in y
        MOVE   R5,R4
        TRUNC  R5,2
        ADD    R3,R3,R5 ;     x = x + (y % 4)
        SRU    R4,2     ;     y = y / 4
        BZR    MOD3L2   ;   } while (y != 0)
        BR     MOD3L1
MOD3Q1:                 ; }
;       CMPI   R3,3     ; -- note, this instruction not needed
        BNE    MOD3QT   ; if (x == 3) {
        ADDSI  R3,-3    ;   x = x - 3
MOD3QT:                 ; }

Now, a tricky solution that has a much faster worst case, using strange logic with no loops and only one forward conditional branch:

MOD3:   ; given x in R3, returns x mod 3 in R3
        ; uses R4, R5 as tempories y and z

        LIW    R5,#33333333
        MOVE   R4,R3
        SRU    R4,2
        AND    R4,R5    ; y = (x >> 2) & #33333333
        AND    R3,R5    ; x = x & #33333333
        ADD    R3,R3,R4 ; x = x + y;
                        ; -- each 4-bit chunk sums 2 base-4 digits
                        ; -- the maximum value of each chunk is 6
        LIW    R5,#07070707
        MOVE   R4,R3
        SRU    R4,4
        AND    R4,R5    ; y = (x >> 4) & #07070707
        AND    R3,R5    ; x = x & #07070707
        ADD    R3,R3,R4 ; x = x + y;
                        ; -- each 8-bit chunk sums 4 base-4 digits
                        ; -- the maximum value of each chunk is 12
        LIW    R5,#000F000F
        MOVE   R4,R3
        SRU    R4,8
        AND    R4,R5    ; y = (x >> 4) & #000F000F
        AND    R3,R5    ; x = x & #000F000F
        ADD    R3,R3,R4 ; x = x + y;
                        ; -- each 16-bit chunk sums 8 base-4 digits
                        ; -- the maximum value of each chunk is 24
        MOVE   R4,R3
        SRU    R4,16    ; y = (x >> 16)
        TRUNC  R3,5     ; x = x & #0000001F
        ADD    R3,R3,R4 ; x = x + y;
                        ; -- x sums 16 base-4 digits
                        ; -- the maximum value is 48 (16 digits times 3)
                        ; -- this has no more than 3 base-4 digits

	MOVE   R4,R3
        SRU    R4,2
        TRUNC  R4,2     ; y = (x >> 2) & #00000003
	MOVE   R5,R3
        SRU    R5,4
        TRUNC  R5,2     ; z = (x >> 4) & #00000003
        TRUNC  R3,2     ; x = x & #00000003
        ADD    R3,R3,R4
        ADD    R3,R3,R5 ; x = x + y + z
                        ; -- x sums the 3 base-4 digits of the previous sum
                        ; -- the maximum value is now 8 (< 3 digits times 3)
                        ; -- this has no more than 2 base-4 digits

	MOVE   R4,R3
        SRU    R4,2
        TRUNC  R4,2     ; y = (x >> 2) & #00000003
        TRUNC  R3,2     ; x = x & #00000003
        ADD    R3,R3,R5 ; x = x + y
                        ; -- x sums the 2 base-4 digits of the previous sum
                        ; -- the maximum value is now 4 (< 2 digits times 3)

        CMPI   R3,3
        BLT    MOD3QT   ; if (x >= 3) {
        ADDSI  R3,-3    ;   x = x - 3;
MOD3QT:                 ; }
        JUMPS  R1       ; return x -- the original value mod 3!

Yet another solution. This one doesn't sum the base 4 digits; instead, it relies on the fact that not only is i mod 3 the same as the sum of the base 4 digits mod 3, but also the sum of the base 4ⁱ digits, for all integer values of i. So, we can sum the base 16 digits or sum the base 64 digits or sum the base 256 digits. Specifically, we'll first sum the base 2¹⁶ digits (that is, 4⁸), then the base 2⁸ digits (that is), 4⁴, then the base 2⁴ digits, etc.

MOD3:   ; given x in R3, returns x mod 3 in R3
        ; uses R4, R5 as tempories y and z

        MOVE   R4,R3
        SRU    R4,16
        TRUNC  R3,16
        ADD    R3,R3,R4 ; x = (x & 0xFFFF) + (x >> 16);
                        ; -- sum the base 2**16 digits
                        ; -- the maximum value is 0x1FFFE
        MOVE   R4,R3
        SRU    R4,8
        TRUNC  R3,8
        ADD    R3,R3,R4 ; x = (x & 0xFF) + (x >> 8);
                        ; -- sum the base 2**8 digits
                        ; -- the maximum value is 0x2FD
        MOVE   R4,R3
        SRU    R4,4
        TRUNC  R3,4
        ADD    R3,R3,R4 ; x = (x & 0xF) + (x >> 4);
                        ; -- sum the base 2**4 digits
                        ; -- the maximum value is 0x3C
        MOVE   R4,R3
        SRU    R4,2
        TRUNC  R3,2
        ADD    R3,R3,R4 ; x = (x & 0x3) + (x >> 2);
                        ; -- sum the base 2**2 digits
                        ; -- the maximum value is 0x18
                        ; -- this has no more than 3 base-4 digits

	MOVE   R4,R3
        SRU    R4,2
        TRUNC  R4,2     ; y = (x >> 2) & #00000003
	MOVE   R5,R3
        SRU    R5,4
        TRUNC  R5,2     ; z = (x >> 4) & #00000003
        TRUNC  R3,2     ; x = x & #00000003
        ADD    R3,R3,R4
        ADD    R3,R3,R5 ; x = x + y + z
                        ; -- x sums the 3 base-4 digits of the previous sum
                        ; -- the maximum value is now 8 (< 3 digits times 3)
                        ; -- this has no more than 2 base-4 digits

	MOVE   R4,R3
        SRU    R4,2
        TRUNC  R4,2     ; y = (x >> 2) & #00000003
        TRUNC  R3,2     ; x = x & #00000003
        ADD    R3,R3,R5 ; x = x + y
                        ; -- x sums the 2 base-4 digits of the previous sum
                        ; -- the maximum value is now 4 (< 2 digits times 3)

        CMPI   R3,3
        BLT    MOD3QT   ; if (x >= 3) {
        ADDSI  R3,-3    ;   x = x - 3;
MOD3QT:                 ; }
        JUMPS  R1       ; return x -- the original value mod 3!

The display access routines putat() and putchar() implicitly operate on an output window. The Hawk monitor could have been written to support a more general windowing interface if this were made explicit, with a class window and potentially a large number of window objects. If w is a window object, then w.putchar('c') would put the character 'c' in the appropriate place in the window w, and w.putat(0,0) would make the next character output in window w appear in the upper left.
There's quite a bit missing from the above specification: How do you create new windows? How does an application determine the size of a window? Ignore this, but do assume that there may be multiple output devices, perhaps a character-only device such as the default Hawk display and also a graphic output device more akin to what you expect on a modern computer. This means that the class window is inherently polymorphic.
a) Give an appropriate interface specification file window.h to be used by SMAL Hawk programs. (0.5 points)
```
; window.h -- the interface spec for the window manager

; every window object w begins with the following fields:

PUTCHAR =       0       ; pointer ot the w.putchar(ch) routine
                        ;   expects R3 -- pointer to object w
                        ;   expects R4 -- the character ch

PUTAT   =       4       ; pointer ot the w.putat(x,y) routine
                        ;   expects R3 -- pointer to object w
                        ;   expects R4,R5 -- x and y coordinates
```
b) Give code for w.putat(x,y) assuming that w, x and y are all local variables of the calling subroutine. (1.0 points)
```
        LOAD    R3,R2,W         ; -- get w, object pointer
        LOAD    R4,R2,X         ; -- get parameter x
        LOAD    R5,R2,Y         ; -- get parameter y
        ADDI    R2,R2,ARSIZE
        LOAD    R1,R2,PUTAT     ; -- get w.putat() method pointer
        JSRS    R1,R1           ; w.putat(x,y)
        ADDI    R2,R2,-ARSIZE
```

Exercise from Chapter 11 of the Notes:

b) (0.5 points).

	LIL     R1,FPENAB + FPSEL
        COSET   R1,COSTAT       ; enable floating point single precision

	COSET   R4,FPA0         ; -- given R4 holds x
        COSET   R4,FPMUL+FPA0   ; a0 = x * x
	COSET   R5,FPA1         ; -- given R5 holds y
        COSET   R5,FPMUL+FPA1   ; a1 = y * y
        COGET   R3,FPA0
        COSET   R3,FPADD+FPA1   ; a1 = x*x + y*y
        COGET   R3,FPA1
        COSET   R3,FPSQRT+FPA0  ; a0 = sqrt( x*x + y*y )
        COGET   R3,FPA0         ; -- get result into R3, as required

        COSET   R1,COSTAT       ; disable floating point