22C:18, Lecture 17, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Fractions

Not all numbers on a computer can be represented as integers! As a result, we need ways to represent fractions. There are many possible ways of doing this; consider, for example, representing fractions using a numerator and a denominator, as:

	typedef struct rational {
		int numerator, denominator;
	} rational;
Given this, we could define addition, subtraction, multiplication and division functions. For example, we might define addition as:
	void rational_add( rational * a, rational * b, rational * c )
	{
		a->denominator = b->denominator * c->denominator;
		a->numerator = b->numerator * c->denominator
			     + c->numerator * b->denominator;
	}
This is ugly, but in high level languages that allow operator overloading, once the definitions are finished, the cumbersome problems with this number representation can be hidden from view. There is, however, one problem with this number system, that of reducing rational numbers to their simplest form; so for example, we would rather see 3/4 instead of 36/48.

Euclid, one of the great mathematicians of ancient Greece, devised an algorithm for computing the greatest common divisor of two numbers. Euclid's GCD algorithm can be used to simplify rational numbers after arithmetic operations, but this is not a fast way to do arithmetic. For the curious, Euclid's algorithm operates by noting that GCD(a,b) is the same as GCD(a,b-a) where a is the smaller of the two numbers. Repeatedly applying this rule will eventually reduce every GCD(a,b) problem to GCD(0,x) where x=GCD(a,b).
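
In C, a minimal sketch of this subtractive form might look as follows, together with a hypothetical helper (the names are mine, not part of the lecture) that uses it to reduce a rational to lowest terms; both assume positive operands:
	int gcd( int a, int b )
	{
		while (a != 0) {
			if (a > b) {		/* keep a the smaller of the two */
				int t = a; a = b; b = t;
			}
			b = b - a;		/* GCD(a,b) = GCD(a,b-a) */
		}
		return b;			/* GCD(0,x) = x */
	}

	void rational_simplify( rational * r )
	{
		int g = gcd( r->numerator, r->denominator );
		r->numerator = r->numerator / g;
		r->denominator = r->denominator / g;
	}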

Because of the complexity of computing greatest common divisors, most work done on computers uses a different approach to representing fractional values, either using fixed point binary numbers or using floating point binary numbers. Of these, floating point is by far the most common approach, but floating point numbers are based on fixed point numbers, so it is best to explain the latter first.

Fixed Point Numbers

Fixed point number representation is based on a simple idea. Instead of counting units, so that the least significant digit of the number is the one's place, count some fraction. For example, in the fixed point decimal system we use to describe financial transactions, the least significant digit counts in units of 1/100 of a dollar. So, when I look at my postal receipt for $10.65, I have:

       place   10     1    1/    1/
       values              /10   /100
              _____ _____ _____ _____
             |     |     |     |     |
             |  1  |  0  |  6  |  5  |
             |_____|_____|_____|_____|
In printing fixed point numbers on paper, we use a period to indicate where the 1's place is. We call this a decimal point, but in fact, the period has nothing to do with the number base and everything to do with telling where the 1's place is.

Fixed point binary numbers are just as useful as fixed point decimal. For example, we can use the following representation of 2.75:

       place    2     1    1/    1/
       values              /2    /4
              _____ _____ _____ _____
             |     |     |     |     |
             |  1  |  0  |  1  |  1  |
             |_____|_____|_____|_____|
In print, we would represent this as 10.11₂. Formally, this is a 4-bit fixed point binary number with 2 places after the point, or in slightly more compact terminology, a 4-bit fixed point binary number with a 2-bit fraction.
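
C has no fixed point types, so such a value would simply be stored in an integer holding the number scaled by 2 to the number of fraction bits; as a small sketch of my own, not from the lecture:
	int x = 11;		/* 1011 binary, 2.75 with a 2-bit fraction */
	double v = x / 4.0;	/* recover 2.75 by undoing the scale factor of 4 */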

On the Hawk computer, with its 32 bit word size, we would typically use 32 bit fixed point numbers, and an obvious question comes up: How does the programmer select the number of bits used to represent the fraction? The answer is that this depends on the problem.

Consider a problem in computer aided design, where you are representing dimensions of the parts of some machine. The precision needed for these dimensions depends on the manufacturing technology! If, for example, the tools used to measure and cut machine parts can only operate to a precision of 1/100 millimeter, then representing dimensions in millimeters to the nearest 1/128 millimeter will probably be close enough. However, for such an application, it is tempting to use an 8 bit fraction and round dimensions to the nearest 1/256 millimeter just to be on the safe side. Precision beyond this is unlikely to be needed in such an application.

So, we come up with the following binary number representation for our computer aided design system:

 _______ _______ _______ _______ _______ _______v_______ _______
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
               24 bit integer part              ^ 8 bit fraction
This allows representation of dimensions up to 16 million millimeters (2 to the 24th) to the nearest 1/256 millimeter (2 to the minus 8th). How big is this? 16 million millimeters is 16 thousand meters or 16 kilometers; this is definitely larger than any known machine tool, so it ought to be OK.
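
As a sketch of how dimensions might enter and leave this format, scaling by 256 converts between millimeters and the representation; the macro names here are hypothetical, not from the lecture:
	/* conversions for the 24 bit integer, 8 bit fraction format */
	#define MM_TO_FIX(mm)	((int)((mm) * 256.0 + 0.5))	/* nonnegative mm only */
	#define FIX_TO_MM(f)	((f) / 256.0)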

Cumulative Precision

There are problems with the above representation. Yes, it does allow representation to better than the 0.01 millimeter precision required by the application, but it does not allow for cumulative error. Consider the following problem:

A part requires 101 holes drilled in a line. Each hole is to be 1.1 mm from its neighbor and 1 mm in diameter.
Note that 0.1₁₀ cannot be accurately represented as a binary fraction in any finite precision. It is a repeating fraction in binary:
           ____
0.0001100110011
The closest fixed point binary representations to 1.1, using an 8 bit fraction, are 1.00011010 and 1.00011001, which are about 1.10156 and 1.09766; thus, our error in representing the space between consecutive holes will be either +.00156 or -.00234, both well within the acceptable precision of 0.01 millimeter.

The problem comes up with the specification that there are to be 101 uniformly spaced holes. If these holes are all 1.10156 mm apart, the distance from the first hole to the last will be 110.156 mm, while our specification calls for a distance of 110.00 mm. Here, the error is .156 mm, quite a bit larger than the 0.01 mm precision we are required to keep, and quite likely a sufficient error to cause real problems!

Short of designing all dimensions in the products of our machine shop to conform to the system of number representation used inside our computer system, there is a solution to our problem! While the specifications call for 101 uniformly spaced holes, we must take into account the inaccuracy in measurement and use alternating spacings of 1.10156 and 1.09766 mm; if we alternate these spacings appropriately, it is easy to keep the error below 1/100 mm.

How do we compute this alternation? There are exact methods, but the easiest is to simply assign extra precision to all of our fractions whenever we are computing extended summations, and then round the result to the specified precision. For example, if we use a 16 bit fraction, we can add up 512 numbers before errors in the least significant bits of the numbers being added could accumulate to creep into the 0.01 mm range.
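
As an illustration (my sketch, not code from the lecture), the hole positions for the example above can be accumulated with a 16 bit fraction and rounded to the tool's 8 bit fraction only as each position is put out:
	#include <stdio.h>

	int main( void )
	{
		int spacing = (int) (1.1 * 65536 + 0.5);	/* 1.1 mm, 16 bit fraction */
		int position = 0;	/* running sum, 16 bit fraction */
		int i;
		for (i = 0; i <= 100; i++) {	/* 101 holes, 100 spacings */
			/* round to an 8 bit fraction only on output */
			printf( "hole %3d at %d/256 mm\n", i, (position + 128) >> 8 );
			position = position + spacing;
		}
		return 0;
	}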

Can we afford to use a 16 bit fraction? To use this fraction, we had to steal 8 bits from the integer part of our originally proposed number representation. How big an integer part do we need? The original problem statement was in terms of a machine shop, and our original proposal allowed for a 24 bit integer part, representing millimeters! Recall that one thousand is roughly 2 to the 10th, and one million is roughly 2 to the 20th. Thus, a 24 bit number allows us to represent values on the order of 16 million. 16 million millimeters is 16 thousand meters or 16 kilometers, far larger than any machine shop!

We have precision to spare, but can we afford to give up 8 bits? Instead of 24 bits, we are proposing to use only 16 bits for the integer part of our dimensions. 16 bits allows a maximum dimension of about 64 thousand millimeters, or 64 meters. This is large enough for a machine shop that builds toasters or locomotives, but it may not be big enough for use in a shipyard; many ships are well over 64 meters long!

This brings us full circle! Selecting a number representation requires a clear understanding of the problem being solved! A number representation that is more than sufficient for the auto industry may not work for shipyards or aerospace applications. Furthermore, selecting a number representation requires an understanding of the impact of any arithmetic to be performed as well as of the precision of individual numbers.

Fixed Point Arithmetic

The rules for addition, subtraction, multiplication and division of fixed point binary numbers are the same rules students learn in elementary school for dealing with decimal fractions!

For addition or subtraction, you first align the points, then add. In decimal, to add 3.75, 4.6 and 12.2, we would align the points and add as follows:

	   3.75
	   4.6
	+ 12.2
	-------
	  20.55
In binary, programming in C or C++, we might do a similar problem as follows:
	int a; /* fixed point with a 4 bit fraction */
	int b; /* fixed point with an 8 bit fraction */
	int c; /* fixed point with a 12 bit fraction */

	/* -- some time later in the program -- */

	c = (a << 8) + (b << 4) + c;
	/* shift operations above are used to align the points */
Note first that C does not provide any tools for us to use to declare that our numbers have fractional parts. We therefore rely on comments to indicate how many bits of each binary integer are fractional. The shift operations shown above (using the << operators) are there to align the points in the binary fractions. The points are not explicitly represented anywhere in the code seen by the compiler; they are there only in the comments and in the programmer's mind. Since a had 4 places to the right of the point, a<<8 has 12 places to the right of the point and is therefore compatible with c for the purpose of addition.

A few languages allow users to declare fixed point numbers with their precision, and the compilers automatically do the appropriate shifting before addition. Among these are IBM's PL/I, defined in the mid 1960's and widely used in the early 1970's, and Ada, defined in the late 1970's and widely used for defense and avionics applications today.

The same example, coded in SMAL Hawk assembly code, might be written as:

	LOAD	R1,A	; get number with 4 bit fraction
	LOAD	R2,B	; get number with 8 bit fraction
	LOAD	R3,C	; get number with 12 bit fraction
	ADDSL	R1,R2,4	; add R2 to (R1 << 4)
	ADDSL	R1,R3,4	; add R3 to (R1 << 4)
	STORE	R1,C	; save result
	; shifts above are required to align the points!
Multiplication is slightly more complex. The rule, when multiplying decimal fractions, is that the number of places after the point in the product is the sum of the number of places after the point in the multiplier and in the multiplicand. This is illustrated in the following decimal example:
	  1 2.7 5  -  2 places after the point
	 x  2.6    -  1 place after the point
	----------
	  7 6 5 0
	2 5 5 0
	----------
	3 3.1 5 0  -  3 places after the point
The same rule applies in binary! Consider the following example in C:
	int a; /* fixed point with a 4 bit fraction */
	int b; /* fixed point with an 8 bit fraction */
	int c; /* fixed point with a 12 bit fraction */

	/* -- some time later in the program -- */

	c =  a * b;       /* no shift needed */
	c = (a * a) << 4; /* product has an 8 bit fraction */
	c = (b * b) >> 4; /* product has a 16 bit fraction */
The final example above required a right shift to discard the least significant 4 bits of a 16 bit fraction before storing that number in a variable declared to have a 12 bit fractional part. As shown above, this code simply throws away the least significant bits, but to minimize loss of precision, the result should have been rounded.

To round, add 1 to the most significant of the bits that will be discarded. For example, to round the final product in the above example, use:

	c = ((b * b)+8) >> 4; /* product has a 16 bit fraction */
Here, note that 8₁₀ is 1000₂, and we are throwing away 4 bits, so this indeed adds 1 to the product in exactly the right position required by our rounding rule.

As a general rule, to retain the maximal fractional precision, rounding and right shifts to discard bits should be deferred as long as possible. Thus, for example, if a sum of numbers with 8 bit fractions is to be computed, and then stored in a number with a 4 bit fraction, first compute the sum using high precision, then round to the lower precision!
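
For example, a sum of values carrying 8 bit fractions, destined for a variable with a 4 bit fraction, might be computed as in the following sketch (the names are hypothetical):
	int sum_and_round( int v[], int n )
	{
		int i, sum = 0;		/* the sum keeps the full 8 bit fraction */
		for (i = 0; i < n; i++)
			sum = sum + v[i];	/* the v[i] have 8 bit fractions */
		return (sum + 8) >> 4;	/* round just once, to a 4 bit fraction */
	}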

Division

Division follows the same rules for binary fixed point numbers as it does for decimal fixed point numbers: If the divisor has n places after the point and the dividend has m places after the point, the quotient will have m - n places after the point. This follows from the rule that quotient times divisor equals dividend and the rule for computing how many places the product has after the point!
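
Continuing in the style of the earlier C examples, where a has a 4 bit fraction, b an 8 bit fraction and c a 12 bit fraction, the quotient b/a would naturally have only an 8-4 = 4 bit fraction; in this sketch of mine (which ignores the risk of overflow in the pre-shift), the dividend is given 8 extra fraction bits so the quotient matches c:
	c = (b << 8) / a; /* quotient has a (8+8)-4 = 12 bit fraction */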

In general, division is computationally difficult, so when dividing by a constant, it is far better to multiply by the reciprocal of that constant. For example, consider replacing a/10 with the product a*0.1 and then using the binary representation of the constant 0.1 as a fixed point number:

.00011001100110011001100110011001₂ = .19999999₁₆
The above is not rounded, and as a rule, we will get better results if we round; as such, we should use:
.00011001100110011001100110011010₂ = .1999999A₁₆
This number has 15 1-bits in it, so multiplication by this constant requires exactly 15 add-and-shift steps. If an integer (a fixed point number with no bits in the fractional part) is multiplied by this fraction (a fixed point number with no bits in the integer part), the most significant 32 bits of the 64 bit product will be the integer part of the product, and the least significant 32 bits will be the fractional part.
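
On a machine with 64 bit arithmetic available, the same computation might be sketched in C as follows; this example is mine, and the rounding built into the constant can make the quotient one too large for a few operands very near the top of the 32 bit range:
	unsigned int div10( unsigned int a )
	{
		/* multiply by the rounded reciprocal .1999999A (hex), keeping
		   the upper 32 bits of the 64 bit product; assumes unsigned
		   int is 32 bits and unsigned long long is 64 bits */
		return (unsigned int) (((unsigned long long) a * 0x1999999Au) >> 32);
	}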

The method discussed above is called reciprocal multiplication, and most modern high quality compilers use this method for division by a constant. On the Hawk machine, the instruction ADDSRU can be used to discard the least significant bits of a product as it is accumulated. As a result, the fastest way to divide an unsigned integer by 10 on the Hawk machine is as follows:

;       R3 = R2 / 10
;               reciprocal multiply
;               1/10 = .0001 1001 1001 1001 1001 1001 1001 1010

        MOVE    R3,R2
        ADDSRU  R3,R0,2 ; x .010
        ADDSRU  R3,R2,1 ; x .1010
        ADDSRU  R3,R2,3 ; x .0011010
        ADDSRU  R3,R2,1 ; x .10011010
        ADDSRU  R3,R2,3 ; x .00110011010
        ADDSRU  R3,R2,1 ; x .100110011010
        ADDSRU  R3,R2,3 ; x .001100110011010
        ADDSRU  R3,R2,1 ; x .1001100110011010
        ADDSRU  R3,R2,3 ; x .0011001100110011010
        ADDSRU  R3,R2,1 ; x .10011001100110011010
        ADDSRU  R3,R2,3 ; x .00110011001100110011010
        ADDSRU  R3,R2,1 ; x .100110011001100110011010
        ADDSRU  R3,R2,3 ; x .001100110011001100110011010
        ADDSRU  R3,R2,1 ; x .1001100110011001100110011010
        ADDSRU  R3,R2,4 ; x .00011001100110011001100110011010

Signed Fractions

Just as there are many options for representing signed integers, there are many options for representing signed fixed point numbers. On a modern binary computer, the most common approach is to use the 2's complement number system, exactly as is used for integers. Thus, for our example machine shop, we might use the following form:

 _v_____ _______ _______ _______ _______ _______v_______ _______
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
 s^            23 bit integer part              ^ 8 bit fraction
The following example numbers show the 32 bit binary equivalents of decimal numbers in this system:
00000000000000000000000100000000 =  1.00
11111111111111111111111100000000 = -1.00
00000000000000000000000111111111 =  1.99609375
11111111111111111111111000000001 = -1.99609375
00000000000000000000000000000001 =  0.00390625
11111111111111111111111111111111 = -0.00390625
To divide a signed number by 10, we must use ADDSR instructions instead of ADDSRU instructions. The ADDSR instruction preserves the sign of the number as it shifts, so the result will have the same sign as the original number. There is one problem with division of negative numbers by a positive divisor! Consider using
	SR	R1,1
to divide R1 by 2. This will produce exactly the result you expect if R1 is evenly divisible by 2, but if it is not, it will tend to be off by 1 from what you might expect. The reason is that the remainder after division using shifting is always positive, so the shift rounds toward minus infinity instead of truncating toward zero, and the quotient will be off by exactly what it takes to give the remainder a positive sign.
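
If the truncation toward zero of conventional integer division is what is wanted, the usual fix, sketched here in C (and assuming the compiler implements >> of a negative int as an arithmetic shift, which the C standard does not guarantee), is to add the divisor minus 1 to negative dividends before shifting:
	int div_by_2( int a )
	{
		if (a < 0) a = a + 1;	/* add divisor-1 so truncation is toward zero */
		return a >> 1;		/* the arithmetic shift now divides by 2 */
	}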