Assignment 10, due Nov 7

Solutions

Part of the homework for CS:2630, Fall 2019
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

  1. Background: Look at the IEEE floating point format documented in chapter 11. Imagine a very similar floating point format for 16-bit floating point values, with the following structure:

    15 | 14 13 12 11 | 10 09 08 07 06 05 04 03 02 01 00
     s |     exp     |              mant

    As with the IEEE format, the maximum exponent value (here, 1111) is reserved for NaN (not a number), and the minimum value (here, 0000) is used for un-normalized values. Otherwise, the exponent is biased by 7, so that 0111 means an exponent of 0.

    As with the IEEE format, there is a hidden bit, so the normalized mantissa 01001010010 represents a mantissa value of 1.01001010010, and as with the IEEE format, the hidden bit is zero when the exponent has its minimum value of 0000.
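To make the rules above concrete, here is a minimal C sketch of a decoder for this 16-bit format. It assumes bit 15 is the sign, bits 14 through 11 hold the exponent with a bias of 7, and bits 10 through 0 hold the stored mantissa. It also assumes that, as in IEEE, an exponent field of 0000 is read as 1 − 7 = −6 with a hidden bit of zero, which is the interpretation the answer to part (a) uses. The function name half16_to_double is hypothetical, not part of any library.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical decoder for the 16-bit format described above.
   Assumed layout: bit 15 = sign, bits 14-11 = exponent (bias 7),
   bits 10-0 = stored mantissa, with a hidden bit. */
double half16_to_double(uint16_t bits) {
    int sign = (bits >> 15) & 0x1;
    int exp  = (bits >> 11) & 0xF;   /* 4-bit biased exponent */
    int mant = bits & 0x7FF;         /* 11-bit stored mantissa */
    double value;

    if (exp == 0xF) {
        return NAN;                  /* 1111 is reserved for NaN */
    } else if (exp == 0) {
        /* un-normalized: hidden bit is 0, exponent read as 1 - 7 = -6;
           the 11 mantissa bits are all to the right of the point */
        value = ldexp((double)mant, -11 - 6);
    } else {
        /* normalized: prepend the hidden 1 bit, exponent is exp - 7 */
        value = ldexp((double)(0x800 | mant), (exp - 7) - 11);
    }
    return sign ? -value : value;
}
```

For example, 0x3800 (the pattern 0 0111 00000000000) decodes to 1.0, and 0x3A52 (0 0111 01001010010, the mantissa from the paragraph above) decodes to 1.2900390625, which is binary 1.01001010010.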

    a) What is the binary representation of the smallest positive nonzero value in this number system, and give an algebraic expression in decimal for the value it represents, along with a decimal expression in scientific notation for that value, to the appropriate number of significant figures. (0.5 points)

    0 0000 00000000001 = $(1.0 \times 2^{-11}) \times 2^{-6} = 1.0 \times 2^{-17} \approx 7.63 \times 10^{-6}$
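The same reasoning can be written as a closed-form C sketch that applies to any IEEE-style format with a hidden bit: the smallest positive value has only the last mantissa bit set, worth $2^{-f}$ for f fraction bits, scaled by the un-normalized exponent 1 − bias. The function name smallest_positive and its parameters are illustrative assumptions.

```c
#include <math.h>

/* Hypothetical helper: smallest positive value of an IEEE-style format.
   frac_bits: width of the stored mantissa field (11 in this format)
   bias:      exponent bias (7 in this format)
   The bit pattern is sign 0, exponent field all zeros, mantissa 00...01. */
double smallest_positive(int frac_bits, int bias) {
    /* 0.00...01 with frac_bits places to the right of the point */
    double mantissa = ldexp(1.0, -frac_bits);
    /* an all-zero exponent field is read as 1 - bias, hidden bit 0 */
    int exponent = 1 - bias;
    return ldexp(mantissa, exponent);
}
```

With 11 mantissa bits and a bias of 7 this gives $2^{-17}$; with 23 bits and a bias of 127 it gives $2^{-149}$, the smallest IEEE single-precision un-normalized value.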

    b) What is the binary representation of the largest positive legitimate value in this number system, and give an algebraic expression in decimal for the value it represents, along with a decimal expression in scientific notation for that value, to the appropriate number of significant figures. (0.5 points)

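    0 1110 11111111111 = $(2.0 - 2^{-11}) \times 2^{7} = 255.9375 \approx 2.56 \times 10^{2}$ (the mantissa 1.11111111111 has eleven 1 bits after the point, so it falls short of 2.0 by $2^{-11}$)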
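A companion C sketch for the largest finite value, under the same assumptions: the largest usable exponent field is one below the reserved all-ones pattern, and a mantissa of all ones falls short of 2 by $2^{-f}$ for f fraction bits. The function name largest_finite is hypothetical.

```c
#include <math.h>

/* Hypothetical helper: largest finite value of an IEEE-style format.
   exp_bits:  width of the exponent field (4 in this format)
   frac_bits: width of the stored mantissa field (11 in this format)
   bias:      exponent bias (7 in this format) */
double largest_finite(int exp_bits, int frac_bits, int bias) {
    /* the all-ones exponent field is reserved, so use the next one down */
    int max_field = (1 << exp_bits) - 2;
    /* 1.11...1 with frac_bits ones after the point is 2 - 2^-frac_bits */
    double mantissa = 2.0 - ldexp(1.0, -frac_bits);
    return ldexp(mantissa, max_field - bias);
}
```

largest_finite(4, 11, 7) evaluates to 255.9375. With the IEEE single-precision parameters (8, 23, 127) the same formula gives about 3.4028 × 10^38, the familiar FLT_MAX.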

  2. Background: Consider the problem of adding two fixed point numbers, one in R3 with 4 places right of the point, one in R4 with 7 places right of the point, where we want the sum in R5 with 4 places right of the point. We could do this:
            MOVE    R5,R4		; bug fixed from original version
            SR      R5,--?--
            ADD     R5,R5,R3
    

    a) Give the shift count that should be used on the SR instruction to replace the --?--. (0.5 points)

    7 − 4 = 3
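The shift count is the difference in the number of fraction bits. The C sketch below mirrors the MOVE, SR, ADD sequence, with 32-bit integers standing in for the Hawk registers; the function name add_q4_q7 is hypothetical. It assumes non-negative operands, because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: add a (4 fraction bits, like R3) to b (7 fraction
   bits, like R4), giving a result with 4 fraction bits (like R5).
   Assumes non-negative operands: >> on a negative signed value is
   implementation-defined in C. */
int32_t add_q4_q7(int32_t a_q4, int32_t b_q7) {
    int32_t r5 = b_q7;     /* MOVE R5,R4 */
    r5 = r5 >> (7 - 4);    /* SR   R5,3: discard 3 fraction bits (truncates) */
    r5 = r5 + a_q4;        /* ADD  R5,R5,R3 */
    return r5;
}
```

For example, 2.5 with 4 fraction bits is 40 and 1.25 with 7 fraction bits is 160, so add_q4_q7(40, 160) returns (160 >> 3) + 40 = 20 + 40 = 60, which is 3.75 with 4 fraction bits. The truncation is visible with add_q4_q7(0, 127): 127/128 becomes 15/16.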

    b) Give the instruction and its operand(s) that should be added to the above code (and indicate where this goes relative to the SR instruction) so that the result of the SR is rounded and not truncated. Note: There are two ways to do this, one with an instruction before the SR and one with a different instruction after the SR. (0.5 points)

    Solution 1:

            ADDSI   R5,4
            SR      R5,3
    

    Solution 2:

            SR      R5,3
            ADDC    R5,R0
    

    The second solution is more obscure and rests on the fact that the last bit shifted out of R5 is left in the C condition code, so adding the C bit to R5 increments it only if the most significant of the discarded bits was 1.
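The two rounding methods can be compared in C. In this sketch (both function names are hypothetical), the Hawk C bit is modeled as bit 2 of the original value, the last bit that a 3-place shift discards, and non-negative values are again assumed. The two functions agree for every such value, because adding 4 (binary 100) before the shift carries into the kept bits exactly when that discarded bit 2 is 1.

```c
#include <stdint.h>

/* Solution 1: add half of the new least significant bit (binary 100,
   that is, 4) before shifting, like ADDSI R5,4 then SR R5,3. */
int32_t round_add_before(int32_t x) {
    return (x + 4) >> 3;
}

/* Solution 2: shift, then add the last bit shifted out, which the Hawk
   leaves in C, like SR R5,3 then ADDC R5,R0. Bit 2 of x stands in for C. */
int32_t round_carry_after(int32_t x) {
    int32_t last_out = (x >> 2) & 1;
    return (x >> 3) + last_out;
}
```

For example, 123/8 = 15.375 rounds to 15 and 124/8 = 15.5 rounds up to 16 under both functions.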

  3. Background: The vector dot product is the sum of the products of corresponding vector elements. We can write this in C as:

    float dotprod( const float * a, const float * b, int len ) {
        float acc = 0.0;
        while (len > 0) {
            acc = acc + ((*a) * (*b));
            a = a + 1;
            b = b + 1;
            len = len - 1;
        }
        return acc;
    }
    

    A problem: Write the equivalent SMAL Hawk code. (1.0 points)

    DOTPROD:; expects R3 = a -- pointer to first element of an array
            ;         R4 = b -- pointer to first element of an array
            ;         R5 = len -- count of elements in arrays a and b
            ;         the floating point unit is already on and selected
            ; returns R3 = the vector dot product of a and b
            ; uses    R6,7 = temporaries for accessing a and b
            ;         FPA0 = acc, the accumulator for the sum of products
            ;         FPA1 = term, holds each term
            LIS     R6,0
            COSET   R6,FPA0         ; float acc = 0
    
            TESTR   R5
            BLE     DOTPQT          ; if (len > 0) {
    DOTPLP:                         ;   do {
            LOADS   R6,R3
            LOADS   R7,R4
            COSET   R6,FPA1
            COSET   R7,FPA1+FPMUL   ;     float term = (*a) * (*b)
    
            ADDSI   R3,4            ;     a = a + 1  -- move to next element
            ADDSI   R4,4            ;     b = b + 1 
    
            COGET   R6,FPA1
            COSET   R6,FPA0+FPADD   ;     acc = acc + term
    
            ADDSI   R5,-1           ;     len = len - 1
            BGT     DOTPLP          ;   } while (len > 0)
    DOTPQT:                         ; }
            COGET   R3,FPA0
            JUMPS   R1              ; return acc
    

    There are, of course, many possible solutions. The above solution makes an effort to give the floating-point coprocessor some time after each operation is initiated before it asks for a result.
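To relate the Hawk code back to the original C, the sketch below rewrites the loop with the control structure that the Hawk comments describe: one test before the loop, then a bottom-tested do-while. It computes the same result as the original dotprod for any len, including len ≤ 0. The name dotprod_rotated is hypothetical, and the register comments are only a rough mapping to the Hawk instructions.

```c
/* Hypothetical restructuring of dotprod to match the Hawk solution:
   test once at the top, then loop with the test at the bottom. */
float dotprod_rotated(const float *a, const float *b, int len) {
    float acc = 0.0f;                 /* LIS R6,0 ; COSET R6,FPA0 */
    if (len > 0) {                    /* TESTR R5 ; BLE DOTPQT */
        do {
            float term = (*a) * (*b); /* loads, then COSET ... FPA1+FPMUL */
            a = a + 1;                /* ADDSI R3,4 (4 bytes per float) */
            b = b + 1;                /* ADDSI R4,4 */
            acc = acc + term;         /* COGET FPA1 ; COSET ... FPA0+FPADD */
            len = len - 1;            /* ADDSI R5,-1 */
        } while (len > 0);            /* BGT back to DOTPLP */
    }
    return acc;                       /* COGET R3,FPA0 ; JUMPS R1 */
}
```

For the vectors {1, 2, 3} and {4, 5, 6} this returns 32, and for a length of zero or less it returns 0 without touching either array. Testing once at the top and again at the bottom means each iteration executes only one conditional branch, which is why the Hawk code is arranged this way.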