Assignment 9, Solutions

Part of the homework for 22C:60 (CS:2630), Fall 2011
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

  1. Background: To compute n mod 3, where n is a decimal number, just add up the digits of n and divide that by three. So, 128 mod 3 is the same as 1+2+8 mod 3 which is 11 mod 3 which is 1+1 mod 3 which is 2. This works in decimal because 10 mod 3 equals 1. It also works in any number base b for which b mod 3 equals 1. Using base 4, this gives us the following useful algorithm:
    unsigned int mod3(unsigned int x) {
            while (x > 3) {
                    x = sum of base 4 digits of x
            }
    	if (x == 3) x = 0;
            return x;
    }
    

    A problem: Translate the above to Hawk code. You will have to work out how to do the loop body, perhaps with an inner loop, although there are tricks that may be even faster. (1 point)

    First, the obvious solution:
    MOD3:   ; given x in R3, returns x mod 3 in R3
            ; uses R4, R5 as tempories y and z
    MOD3L1:                 ; for (;;) {
            CMPI   R3,3
            BLEU   MOD3Q1   ;   if (x ≤ 3) break;
            MOVE   R4,R3	;   y = x; -- t guaranteed to be nozero
            LIS    R3,0     ;   x = 0
    MOD3L2:                 ;   do { -- x = sum of base 4 digits in y
            MOVE   R5,R4
            TRUNC  R5,2
            ADD    R3,R3,R5 ;     x = x + (y % 4)
            SRU    R4,2     ;     y = y / 4
            BZR    MOD3L2   ;   } while (y != 0)
            BR     MOD3L1
    MOD3Q1:                 ; }
    ;       CMPI   R3,3     ; -- note, this instruction not needed
            BNE    MOD3QT   ; if (x == 3) {
            ADDSI  R3,-3    ;   x = x - 3
    MOD3QT:                 ; }
    

    Now, a tricky solution that has a much faster worst case, using strange logic with no loops and only one forward conditional branch:

    MOD3:   ; given x in R3, returns x mod 3 in R3
            ; uses R4, R5 as tempories y and z
    
            LIW    R5,#33333333
            MOVE   R4,R3
            SRU    R4,2
            AND    R4,R5    ; y = (x >> 2) & #33333333
            AND    R3,R5    ; x = x & #33333333
            ADD    R3,R3,R4 ; x = x + y;
                            ; -- each 4-bit chunk sums 2 base-4 digits
                            ; -- the maximum value of each chunk is 6
            LIW    R5,#07070707
            MOVE   R4,R3
            SRU    R4,4
            AND    R4,R5    ; y = (x >> 4) & #07070707
            AND    R3,R5    ; x = x & #07070707
            ADD    R3,R3,R4 ; x = x + y;
                            ; -- each 8-bit chunk sums 4 base-4 digits
                            ; -- the maximum value of each chunk is 12
            LIW    R5,#000F000F
            MOVE   R4,R3
            SRU    R4,8
            AND    R4,R5    ; y = (x >> 4) & #000F000F
            AND    R3,R5    ; x = x & #000F000F
            ADD    R3,R3,R4 ; x = x + y;
                            ; -- each 16-bit chunk sums 8 base-4 digits
                            ; -- the maximum value of each chunk is 24
            MOVE   R4,R3
            SRU    R4,16    ; y = (x >> 16)
            TRUNC  R3,5     ; x = x & #0000001F
            ADD    R3,R3,R4 ; x = x + y;
                            ; -- x sums 16 base-4 digits
                            ; -- the maximum value is 48 (16 digits times 3)
                            ; -- this has no more than 3 base-4 digits
    
    	MOVE   R4,R3
            SRU    R4,2
            TRUNC  R4,2     ; y = (x >> 2) & #00000003
    	MOVE   R5,R3
            SRU    R5,4
            TRUNC  R5,2     ; z = (x >> 4) & #00000003
            TRUNC  R3,2     ; x = x & #00000003
            ADD    R3,R3,R4
            ADD    R3,R3,R5 ; x = x + y + z
                            ; -- x sums the 3 base-4 digits of the previous sum
                            ; -- the maximum value is now 8 (< 3 digits times 3)
                            ; -- this has no more than 2 base-4 digits
    
    	MOVE   R4,R3
            SRU    R4,2
            TRUNC  R4,2     ; y = (x >> 2) & #00000003
            TRUNC  R3,2     ; x = x & #00000003
            ADD    R3,R3,R5 ; x = x + y
                            ; -- x sums the 2 base-4 digits of the previous sum
                            ; -- the maximum value is now 4 (< 2 digits times 3)
    
            CMPI   R3,3
            BLT    MOD3QT   ; if (x >= 3) {
            ADDSI  R3,-3    ;   x = x - 3;
    MOD3QT:                 ; }
            JUMPS  R1       ; return x -- the original value mod 3!
    

    Yet another solution. This one doesn't sum the base 4 digits; instead, it relies on the fact that not only is i mod 3 the same as the sum of the base 4 digits mod 3, but also the sum of the base 4i digits, for all integer values of i. So, we can sum the base 16 digits or sum the base 64 digits or sum the base 256 digits. Specifically, we'll first sum the base 216 digits (that is, 48), then the base 28 digits (that is), 44, then the base 24 digits, etc.

    MOD3:   ; given x in R3, returns x mod 3 in R3
            ; uses R4, R5 as tempories y and z
    
            MOVE   R4,R3
            SRU    R4,16
            TRUNC  R3,16
            ADD    R3,R3,R4 ; x = (x & 0xFFFF) + (x >> 16);
                            ; -- sum the base 2**16 digits
                            ; -- the maximum value is 0x1FFFE
            MOVE   R4,R3
            SRU    R4,8
            TRUNC  R3,8
            ADD    R3,R3,R4 ; x = (x & 0xFF) + (x >> 8);
                            ; -- sum the base 2**8 digits
                            ; -- the maximum value is 0x2FD
            MOVE   R4,R3
            SRU    R4,4
            TRUNC  R3,4
            ADD    R3,R3,R4 ; x = (x & 0xF) + (x >> 4);
                            ; -- sum the base 2**4 digits
                            ; -- the maximum value is 0x3C
            MOVE   R4,R3
            SRU    R4,2
            TRUNC  R3,2
            ADD    R3,R3,R4 ; x = (x & 0x3) + (x >> 2);
                            ; -- sum the base 2**2 digits
                            ; -- the maximum value is 0x18
                            ; -- this has no more than 3 base-4 digits
    
    	MOVE   R4,R3
            SRU    R4,2
            TRUNC  R4,2     ; y = (x >> 2) & #00000003
    	MOVE   R5,R3
            SRU    R5,4
            TRUNC  R5,2     ; z = (x >> 4) & #00000003
            TRUNC  R3,2     ; x = x & #00000003
            ADD    R3,R3,R4
            ADD    R3,R3,R5 ; x = x + y + z
                            ; -- x sums the 3 base-4 digits of the previous sum
                            ; -- the maximum value is now 8 (< 3 digits times 3)
                            ; -- this has no more than 2 base-4 digits
    
    	MOVE   R4,R3
            SRU    R4,2
            TRUNC  R4,2     ; y = (x >> 2) & #00000003
            TRUNC  R3,2     ; x = x & #00000003
            ADD    R3,R3,R5 ; x = x + y
                            ; -- x sums the 2 base-4 digits of the previous sum
                            ; -- the maximum value is now 4 (< 2 digits times 3)
    
            CMPI   R3,3
            BLT    MOD3QT   ; if (x >= 3) {
            ADDSI  R3,-3    ;   x = x - 3;
    MOD3QT:                 ; }
            JUMPS  R1       ; return x -- the original value mod 3!
    

  2. The display access routines putat() and putchar() implicitly operate on an output window. The Hawk monitor could have been written to support a more general windowing interface if this were made explicit, with a class window and potentially a large number of window objects. If w is a window object, then w.putchar('c') would put the character 'c' in the appropriate place in the window w, and w.putat(0,0) would make the next character output in window w appear in the upper left.

    There's quite a bit missing from the above specification: How do you create new windows? How does an application determine the size of a window? Ignore this, but do assume that there may be multiple output devices, perhaps a character-only device such as the default Hawk display and also a graphic output device more akin to what you expect on a modern computer. This means that the class window is inherently polymorphic.

    a) Give an appropriate interface specification file window.h to be used by SMAL Hawk programs. (0.5 points)

    ; window.h -- the interface spec for the window manager
    
    ; every window object w begins with the following fields:
    
    PUTCHAR =       0       ; pointer ot the w.putchar(ch) routine
                            ;   expects R3 -- pointer to object w
                            ;   expects R4 -- the character ch
    
    PUTAT   =       4       ; pointer ot the w.putat(x,y) routine
                            ;   expects R3 -- pointer to object w
                            ;   expects R4,R5 -- x and y coordinates
    

    b) Give code for w.putat(x,y) assuming that w, x and y are all local variables of the calling subroutine. (1.0 points)

            LOAD    R3,R2,W         ; -- get w, object pointer
            LOAD    R4,R2,X         ; -- get parameter x
            LOAD    R5,R2,Y         ; -- get parameter y
            ADDI    R2,R2,ARSIZE
            LOAD    R1,R2,PUTAT     ; -- get w.putat() method pointer
            JSRS    R1,R1           ; w.putat(x,y)
            ADDI    R2,R2,-ARSIZE
    

  3. Exercise from Chapter 11 of the Notes:

    b) (0.5 points).

    	LIL     R1,FPENAB + FPSEL
            COSET   R1,COSTAT       ; enable floating point single precision
    
    	COSET   R4,FPA0         ; -- given R4 holds x
            COSET   R4,FPMUL+FPA0   ; a0 = x * x
    	COSET   R5,FPA1         ; -- given R5 holds y
            COSET   R5,FPMUL+FPA1   ; a1 = y * y
            COGET   R3,FPA0
            COSET   R3,FPADD+FPA1   ; a1 = x*x + y*y
            COGET   R3,FPA1
            COSET   R3,FPSQRT+FPA0  ; a0 = sqrt( x*x + y*y )
            COGET   R3,FPA0         ; -- get result into R3, as required
    
            COSET   R1,COSTAT       ; disable floating point