Assignment 10 Solutions
Part of the homework for 22C:60 (CS:2630), Spring 2012
|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|s|   exp   |     mantissa      |
If the exponent is greater than 00000, the mantissa should always be normalized in the range from 0.5 to just less than 1.0.
Each part is worth 0.2 points.
a) What is the approximate decimal equivalent of FFFF₁₆ in this number system?
FFFF₁₆, in binary, is 1111 1111 1111 1111. Breaking this into fields for the given number representation, we have:
s = 1 — the number is negative.
exp = 11111 corresponding to positive 15.
mant = 0.1111111111 which is almost, but not quite, 1.0
So, the number is approximately -1.0 × 2¹⁵. That is, approximately -32,768.
The error in this approximation is 0.0000000001 (binary) × 2¹⁵. That is, 2⁻¹⁰ × 2¹⁵, which is 2⁵, which is 32. So, the exact answer is -32,736.
We can estimate the error much more simply. A ten-bit binary fraction is accurate to one part in 2¹⁰, which is about the same as one part in 10³, so we expect our first approximation to be off by about 1/1000. That is, we need to correct it by about 32 or 33, depending on how you round 1/1000 times 32,768. Our initial approximation was good enough for most purposes, and this second-order approximation is close to perfect.
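The field breakdown above can be checked mechanically. The following is a minimal sketch, not part of the original solution; the function name decode1 is hypothetical, and the bias of 16 is an assumption inferred from the worked answers (11111 corresponds to +15).

```python
BIAS = 16  # assumed: exponent field 11111 (31) corresponds to +15


def decode1(word: int) -> float:
    """Decode a 16-bit word laid out as s | 5-bit exp | 10-bit mantissa.

    The mantissa is read as the binary fraction 0.mmmmmmmmmm (no hidden bit).
    """
    sign = -1.0 if word & 0x8000 else 1.0   # bit 15
    exp = ((word >> 10) & 0x1F) - BIAS      # bits 14-10, bias removed
    mant = (word & 0x3FF) / 1024.0          # bits 9-0 as a fraction in [0, 1)
    return sign * mant * 2.0 ** exp


print(decode1(0xFFFF))  # -32736.0
```

The printed value agrees with the first approximation of -32,768 corrected by the error term of 32.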
b) What is the exact decimal equivalent of 1234₁₆ in this number system?
1234₁₆, in binary, is 0001 0010 0011 0100. Breaking this into fields for the given number representation, we have:
s = 0 — the number is positive.
exp = 00100 corresponding to negative 12, that is, a scale factor of 2⁻¹² = 1/4096
mant = 0.1000110100 = 0.5 + 0.03125 + 0.015625 + 0.00390625 = 0.55078125
So, 0.55078125/4096 = 0.00013446807861328125 exactly; a calculator shows this as 0.000134468078613.
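Because the question asks for an exact value, it may help to redo the arithmetic with rational numbers instead of a calculator. This is a sketch using Python's standard fractions and decimal modules; the variable names are illustrative, and the bias of 16 is the same assumption as in the worked answers.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

word = 0x1234
mant = Fraction(word & 0x3FF, 1024)   # 0.1000110100 in binary = 141/256
exp = ((word >> 10) & 0x1F) - 16      # 00100 -> -12 with the assumed bias of 16
value = mant * Fraction(2) ** exp     # exact rational value

getcontext().prec = 40                # more digits than the exact expansion needs
exact = Decimal(value.numerator) / Decimal(value.denominator)

print(value)  # 141/1048576
print(exact)  # 0.00013446807861328125
```

The denominator is a power of two, so the decimal expansion terminates and the result is exact rather than rounded.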
c) What is the normalized binary representation of 1 in this number system?
It will be 0.5 × 2¹ since that is the only solution that puts the mantissa in the correct normalized range.
s = 0 — the number is positive.
exp = 10001 corresponding to positive 1.
mant = 1000000000 corresponding to 0.5
Put these pieces together and we get 0 10001 1000000000 or 4600₁₆.
d) What is the normalized binary representation of 10₁₀ in this number system?
s = 0 — the number is positive.
exp = 10100 corresponding to positive 4 because we need to multiply by 16 which is 2⁴.
mant = 1010000000 corresponding to 10/16 which is 5/8
Put these pieces together and we get 0 10100 1010000000 or 5280₁₆.
e) What is the normalized approximate binary representation of 0.1₁₀ in this number system?
Note that, in binary, 1/10 is 0.000110011001100... (the group 0011 repeats).
s = 0 — the number is positive.
exp = 01101 corresponding to negative 3, to account for the three leading zeros in the binary equivalent of one tenth.
mant = 1100110011 corresponding to the first ten bits of the normalized fraction part of one tenth.
Put these pieces together and we get 0 01101 1100110011 or 3733₁₆.
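Parts c through e all follow the same recipe: scale the value into the range from 0.5 to just under 1.0, record the power of two, and keep ten fraction bits. The sketch below illustrates that recipe; encode1 is a hypothetical name, the bias of 16 is assumed from the worked answers, extra mantissa bits are truncated rather than rounded, and no range checking is done on the exponent.

```python
import math

BIAS = 16  # assumed from the worked answers (10001 corresponds to +1)


def encode1(x: float) -> int:
    """Pack x as s | exp | mantissa with 0.5 <= mantissa < 1.0 (no hidden bit)."""
    if x == 0.0:
        return 0
    sign = 0x8000 if x < 0 else 0
    # frexp returns (frac, exp) with abs(x) == frac * 2**exp and 0.5 <= frac < 1
    frac, exp = math.frexp(abs(x))
    mant = int(frac * 1024)   # keep ten fraction bits, truncating the rest
    return sign | ((exp + BIAS) << 10) | mant


for x in (1.0, 10.0, 0.1):
    print(f"{x}: {encode1(x):04X}")
# 1.0: 4600
# 10.0: 5280
# 0.1: 3733
```

math.frexp happens to use the same 0.5 to 1.0 normalization as this format, which is why no adjustment of the exponent is needed.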
|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|s|   exp   |     mantissa      |
The special exponent value 00000 also represents -15, but indicates that the mantissa is not normalized.
The special exponent value 11111 means not a number or infinity.
For normalized numbers, there is a hidden bit just to the left of the point, so the mantissa 00 0000 0000 = 1.0 and 10 0000 0000 = 1.5 (the hidden bit is always one as a consequence of normalization).
For non-normalized numbers, the hidden bit is zero, so 00 0000 0000 = 0.0 and 10 0000 0000 = 0.5.
Each part is worth 0.2 points.
a) What is the binary equivalent of 1.0₁₀ in this number system?
It will be 1.0 × 2⁰ since that is the only solution that puts the mantissa in the correct normalized range.
s = 0 — the number is positive.
exp = 10000 corresponding to zero.
mant = .0000000000 corresponding to 1.0 (the one bit to the left of the point is hidden).
Put these pieces together and we get 0 10000 0000000000 or 4000₁₆.
b) What is the binary representation of the largest non-infinite positive number in this system?
s = 0 — the number is positive.
exp = 11110 (one less than 11111 which is not a number).
mant = .1111111111 which is almost 2 (counting the hidden one bit to the left of the point).
Put these pieces together and we get 0 11110 1111111111 or 7BFF₁₆.
c) What is the approximate decimal equivalent of your answer to part a? (That was a typo, this should have asked about part b!)
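The exponent field 11110 is 30; since 10000 represents zero, this corresponds to positive 14. So, the value is almost 2.0 × 2¹⁴ which is almost 32,768 (exactly 32,752).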
d) What is the binary representation of the smallest non-zero positive number in this system?
s = 0 — the number is positive.
exp = 00000 corresponding to a non-normalized -15.
mant = .0000000001 which is 2⁻¹⁰ (the hidden bit is zero because it is not normalized).
Put these pieces together and we get 0 00000 0000000001 or 0001₁₆.
e) What is the approximate decimal equivalent of your answer to part c? (That was a typo, this should have asked about part d!)
The exact value is 2⁻¹⁰ × 2⁻¹⁵ = 2⁻²⁵. Note that, in decimal, 2¹⁰ is approximately 1000, and 2²⁰ is approximately one million. So, we have approximately 2⁻⁵ × 0.000001 (that is, times one millionth). 2⁻⁵ is .03125, so the approximate answer is 0.00000003125; my calculator gives a more precise value of 0.000000029802322.
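The hidden-bit rules for this second format can be summarized in a short decoder. This is a sketch only; decode2 is a hypothetical name, the bias of 16 is taken from the statement that 10000 corresponds to zero, and the reserved exponent 11111 is simply rejected rather than mapped to infinity or not-a-number.

```python
def decode2(word: int) -> float:
    """Decode s | 5-bit exp | 10-bit mantissa with a hidden bit.

    Assumes a bias of 16, with exponent 00000 meaning a non-normalized
    value whose exponent is -15 and whose hidden bit is zero.
    """
    sign = -1.0 if word & 0x8000 else 1.0
    e = (word >> 10) & 0x1F
    frac = (word & 0x3FF) / 1024.0
    if e == 0b11111:
        raise ValueError("exponent 11111 is reserved for infinity or not a number")
    if e == 0:
        return sign * frac * 2.0 ** -15          # hidden bit is 0
    return sign * (1.0 + frac) * 2.0 ** (e - 16)  # hidden bit is 1


print(decode2(0x4000))                # 1.0
print(decode2(0x0001) == 2.0 ** -25)  # True
```

The first line checks the answer to part a, and the second confirms that the smallest non-zero positive value is exactly 2⁻²⁵.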
        SUBTITLE "FLTTOINT, float to int convert"
BIAS    =       16              ; the bias on the exponent field
FLTTOINT:                       ; given   R3 = f, a floating point number
                                ; returns R3 = i, the integer equivalent
                                ; uses    R4 = a copy of f
                                ; uses    R3 = mmmmmmmmmm the mantissa
                                ; uses    R5 = eeeee the exponent
                                ; the floating point format is s eeeee mmmmmmmmmm
        MOVE    R4,R3           ; -- copy f so we can check bit 15, the sign
        TRUNC   R3,10           ; mmmmmmmmmm = f, bits 9 - 0
        MOVE    R5,R4           ; -- start from the saved copy of f
        SR      R5,10           ; -- move bits 14-10 down to bits 4-0
        TRUNC   R5,5            ; eeeee = f, bits 14-10
        CMPI    R5,10+BIAS
        BEQ     FTOIDEN         ; if (eeeee != 10) { -- need to denormalize
        BGT     FTOIGT          ;   if (eeeee < 10) {
FTOILT:                         ;     do {
        SR      R3,1            ;       mmmmmmmmmm = mmmmmmmmmm / 2
        ADDSI   R5,1            ;       eeeee = eeeee + 1
        CMPI    R5,10+BIAS
        BLT     FTOILT          ;     } while (eeeee < 10)
        BR      FTOIDEN         ;     -- now eeeee = 10
FTOIGT:                         ;   } else {
                                ;     do {
        SL      R3,1            ;       mmmmmmmmmm = mmmmmmmmmm * 2
        ADDSI   R5,-1           ;       eeeee = eeeee - 1
        CMPI    R5,10+BIAS
        BGT     FTOIGT          ;     } while (eeeee > 10)
                                ;     -- now eeeee = 10
                                ;   }
FTOIDEN:                        ; }
                                ; -- now eeeee = 10 and mmmmmmmmmm is integer
        TBIT    R4,15
        BBR     FTOIQT          ; if (f < 0) {
        NEG     R3              ;   mmmmmmmmmm = -mmmmmmmmmm
FTOIQT:                         ; }
        JUMPS   R1              ; return i
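The idea behind the routine, as its comments describe it, is to shift the mantissa until the exponent equals 10, at which point the ten mantissa bits can be read as an integer. The following Python model is a sketch of that idea for checking expected results by hand; it is not a translation verified against a Hawk assembler, the name flt_to_int is hypothetical, and it assumes the first format (no hidden bit) with a bias of 16.

```python
BIAS = 16  # the bias on the exponent field, as in the assembly listing


def flt_to_int(f: int) -> int:
    """Model of the shift-based conversion for the format s eeeee mmmmmmmmmm."""
    m = f & 0x3FF            # mantissa, bits 9-0
    e = (f >> 10) & 0x1F     # biased exponent, bits 14-10
    target = 10 + BIAS       # exponent at which the mantissa is an integer
    while e < target:        # value too small: discard low-order bits
        m >>= 1
        e += 1
    while e > target:        # value too large: scale the mantissa up
        m <<= 1
        e -= 1
    return -m if f & 0x8000 else m   # apply the sign from bit 15 last


print(flt_to_int(0x5280))  # 10
print(flt_to_int(0xFFFF))  # -32736
print(flt_to_int(0x3733))  # 0
```

Because the sign is applied after shifting, fractions are truncated toward zero, so the representation of one tenth converts to 0.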