Review F: Floating-point representation
Problem F2.1
Represent each of the following using the 8-bit floating-point format we studied (which had 3 bits for the mantissa and 4 bits for the excess-7 exponent). Show your intermediate work.
a. 2.25
b. −80.0
c. 1/32
a. 2.25 becomes 0 1000 001 (from $2.25 = 10.01_2 = 1.001_2 \times 2^{1}$)
b. −80.0 becomes 1 1101 010 (from $-80 = -1010000_2 = -1.010_2 \times 2^{6}$)
c. 1/32 becomes 0 0010 000 (from $1/32 = 1.000_2 \times 2^{-5}$)
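The normalize-then-bias steps behind these answers can be checked mechanically. The following is a minimal Python sketch, not part of the original review: the function name `encode_f2` is hypothetical, and the code assumes the F2 format treats every exponent field (0000 through 1111) as an ordinary excess-7 exponent, with no denormalized or special cases, and that the input is nonzero and exactly representable.

```python
import math


def encode_f2(x: float) -> str:
    """Encode x in the F2 8-bit format: sign, 4-bit excess-7 exponent,
    3-bit mantissa, normalized numbers only (hypothetical helper).

    Raises ValueError if x is zero or not exactly representable.
    """
    if x == 0:
        raise ValueError("zero has no normalized representation")
    sign = 1 if x < 0 else 0
    frac, exp = math.frexp(abs(x))   # abs(x) == frac * 2**exp, 0.5 <= frac < 1
    sig, exp = frac * 2, exp - 1     # rewrite as 1.xxx(2) * 2**exp
    man_scaled = (sig - 1) * 8       # the 3 bits after the binary point
    if man_scaled != int(man_scaled) or not 0 <= exp + 7 <= 15:
        raise ValueError(f"{x} is not exactly representable")
    return f"{sign} {exp + 7:04b} {int(man_scaled):03b}"


for v in (2.25, -80.0, 1 / 32):
    print(v, "->", encode_f2(v))
```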
Problem F2.2
Consider the 8-bit floating-point format we studied, including 3 bits for the mantissa and 4 bits for the excess-7 exponent. Show your intermediate work.
a. What 8-bit pattern represents the number −0.625 = −5/8?
b. What base-10 integer or fraction does 01001001 represent?
a. $-101_2 \times 2^{-3} = -1.01_2 \times 2^{-1}$ → 1 0110 010
b. $1.001_2 \times 2^{2} = 100.1_2 = 4.5$
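Decoding runs the same steps in reverse: split off the sign, subtract 7 from the exponent field, and put the implicit 1 in front of the mantissa. A minimal sketch, assuming the same normalized-only F2 format; `decode_f2` is a hypothetical name, and `Fraction` keeps the results exact.

```python
from fractions import Fraction


def decode_f2(bits: str) -> Fraction:
    """Decode an 8-bit pattern in the F2 format (normalized numbers only).

    bits is a string such as "01001001" (spaces allowed): sign bit,
    4-bit excess-7 exponent, 3-bit mantissa with an implicit leading 1.
    """
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:5], 2) - 7                    # remove the excess-7 bias
    significand = 1 + Fraction(int(bits[5:8], 2), 8)    # 1.mmm in binary
    return sign * significand * Fraction(2) ** exponent


print(decode_f2("01001001"))     # part b
print(decode_f2("1 0110 010"))   # the pattern from part a
```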
Problem F2.3
Consider a 7-bit floating-point representation with 3 bits for the excess-3 exponent and 3 bits for the mantissa.
a. How would $0.375_{10}$ be represented in this 7-bit representation?
b. What decimal value does 0110110 represent? (If you like, you can express your solution to this question and the next as a fraction.)
c. What decimal value does 1001100 represent?
a. 0001100
b. $14_{10}$
c. $-3/8 = -0.375_{10}$
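Because this format has only 128 patterns, one way to double-check an encoding is to decode every pattern and search for the target value. The sketch below does that; `decode` and `encode` are hypothetical helpers, and it assumes the F2.3 format is normalized-only (no denormalized or special cases).

```python
from fractions import Fraction

EXP_BITS, MAN_BITS, BIAS = 3, 3, 3   # F2.3: 7 bits, excess-3 exponent


def decode(p: int) -> Fraction:
    """Decode a 7-bit pattern given as an integer (sign bit first)."""
    sign = -1 if p >> (EXP_BITS + MAN_BITS) else 1
    e = (p >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    m = p & ((1 << MAN_BITS) - 1)
    return sign * (1 + Fraction(m, 1 << MAN_BITS)) * Fraction(2) ** (e - BIAS)


def encode(x) -> str:
    """Return the pattern whose value is exactly x by trying all 128 patterns."""
    target = Fraction(x)
    for p in range(1 << (1 + EXP_BITS + MAN_BITS)):
        if decode(p) == target:
            return format(p, "07b")
    raise ValueError(f"{x} is not exactly representable")


print(encode(0.375), decode(0b0110110), decode(0b1001100))
```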
Problem F3.1
Explain the motivation behind the introduction of the denormalized case in IEEE's floating-point representation.
With only normalized numbers, the set of floating-point numbers would have a gap around 0: numbers become increasingly concentrated down to a certain point (1/128 in our eight-bit example), and then there are no numbers below that.
The denormalized case “spreads out” the numbers whose exponent bits are all zeroes, spacing them evenly all the way down to and including zero.
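The gap can be seen by listing the smallest values the 8-bit format can express with and without the denormalized case. A minimal sketch; `value` and `smallest_values` are hypothetical names, and the denormalized rule coded here (no implicit 1, exponent fixed at 1 − 7 = −6) is the usual IEEE-style convention, assumed to match the one studied.

```python
from fractions import Fraction

MAN_BITS, BIAS = 3, 7   # 8-bit format: 4-bit excess-7 exponent, 3-bit mantissa


def value(e: int, m: int, denormals: bool) -> Fraction:
    """Nonnegative value for exponent field e and mantissa field m."""
    if denormals and e == 0:
        # denormalized: no implicit 1, exponent fixed at 1 - BIAS
        return Fraction(m, 1 << MAN_BITS) * Fraction(2) ** (1 - BIAS)
    # normalized: implicit leading 1
    return (1 + Fraction(m, 1 << MAN_BITS)) * Fraction(2) ** (e - BIAS)


def smallest_values(denormals: bool, count: int = 10) -> list:
    """The `count` smallest values with exponent field 0000 or 0001."""
    vals = sorted(value(e, m, denormals) for e in (0, 1) for m in range(8))
    return vals[:count]


print([str(v) for v in smallest_values(False, 4)])   # normalized only
print([str(v) for v in smallest_values(True, 4)])    # with denormals
```

Without denormals the list starts at 1/128; with them it starts at 0 and climbs in equal steps of 1/512.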
Problem F3.2
Consider the 8-bit floating-point representation we studied in class, including support for denormalized numbers and nonnumeric values. It used 3 bits for the mantissa and 4 bits for the excess-7 exponent.
a. What 8-bit pattern represents the number $0.125_{10}$?
b. What 8-bit pattern represents the number $20_{10}$?
c. What base-10 integer or fraction does the bit pattern 11001100 represent?
d. What base-10 integer or fraction does the bit pattern 00000010 represent?
e. What bit pattern results from multiplying 240 and −240?
a. 0 0100 000
b. 0 1011 010
c. −6
d. 1/256
e. 11111000
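A decoder for the full format makes the special cases explicit: an all-ones exponent is nonnumeric, and an all-zeros exponent is denormalized. A minimal sketch; `decode8` is a hypothetical name, and the code assumes the IEEE-style rules used in these answers (denormalized value 0.mmm × 2^−6, infinity when the mantissa is 000, NaN otherwise).

```python
import math
from fractions import Fraction


def decode8(bits: str):
    """Decode the 8-bit format with denormalized and nonnumeric values.

    Returns a Fraction for numeric patterns, or a float for +/-inf and NaN.
    """
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    e, m = int(bits[1:5], 2), int(bits[5:8], 2)
    if e == 0b1111:                  # all-ones exponent: nonnumeric
        return sign * math.inf if m == 0 else math.nan
    if e == 0:                       # denormalized: 0.mmm * 2**(1 - 7)
        return sign * Fraction(m, 8) * Fraction(2) ** -6
    return sign * (1 + Fraction(m, 8)) * Fraction(2) ** (e - 7)


for pattern in ("0 0100 000", "0 1011 010", "11001100", "00000010", "11111000"):
    print(pattern, "->", decode8(pattern))
```

Since `decode8("0 1110 111")` is 240, the largest finite value, the exact product 240 × −240 = −57600 is far out of range and becomes −∞, pattern 11111000.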
Problem F3.3
Consider a 6-bit floating-point representation with 3 bits for the excess-3 exponent and 2 bits for the mantissa, including support for denormalized and nonnumeric values.
a. How would $0.75_{10}$ be represented in this 6-bit representation?
b. What decimal value does 011010 represent?
c. What decimal value does 000010 represent?
d. How would infinity (∞) be represented in this representation?
a. 001010
b. $12.0_{10}$
c. $0.125_{10}$
d. 011100
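The same decoding logic works for any IEEE-style width once the exponent and mantissa sizes are parameters. This sketch (with a hypothetical `decode` function) assumes the bias is $2^{k-1} - 1$ for a k-bit exponent, which gives excess-3 for 3 bits and excess-7 for 4 bits, as in this review.

```python
import math
from fractions import Fraction


def decode(bits: str, exp_bits: int, man_bits: int):
    """Decode an IEEE-style minifloat: 1 sign bit, exp_bits exponent bits
    (excess 2**(exp_bits - 1) - 1), man_bits mantissa bits, with denormalized
    and nonnumeric values. Returns a Fraction, or a float for inf/NaN.
    """
    bits = bits.replace(" ", "")
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1 if bits[0] == "1" else 1
    e = int(bits[1:1 + exp_bits], 2)
    m = int(bits[1 + exp_bits:], 2)
    frac = Fraction(m, 1 << man_bits)
    if e == (1 << exp_bits) - 1:     # all-ones exponent: infinity or NaN
        return sign * math.inf if m == 0 else math.nan
    if e == 0:                       # denormalized: no implicit 1
        return sign * frac * Fraction(2) ** (1 - bias)
    return sign * (1 + frac) * Fraction(2) ** (e - bias)


# F3.3 uses 3 exponent bits and 2 mantissa bits.
for pattern in ("001010", "011010", "000010", "011100"):
    print(pattern, "->", decode(pattern, 3, 2))
```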
Problem F3.4
Consider a 7-bit floating-point representation with 3 bits for the excess-3 exponent and 3 bits for the mantissa, including support for denormalized and nonnumeric values.
a. What values do 1010100 and 0000100 represent? Express each answer as a decimal number or a base-10 fraction.
b. What is the bit pattern of the smallest positive normalized number supported by this representation? Convert this to a decimal fraction or number.
c. What is the bit pattern of the largest denormalized number supported by this representation? Convert this to a decimal fraction or number.
d. Suppose we add 0101010 and 1111000 as 7-bit floating-point numbers. What is the bit pattern of the result?
a. $-0.75_{10}$, $0.125_{10}$
b. 0001000, which converts to 1/4 or 0.25
c. 0000111, which converts to 7/32 or 0.21875
d. 1111000 (0101010 is 5 and 1111000 is −∞; any finite number added to −∞ is −∞)
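Both extremes in parts (b) and (c) follow from the bias alone: the smallest normalized number is $1.000_2 \times 2^{1-\text{bias}}$, and the largest denormalized number is $0.111_2 \times 2^{1-\text{bias}}$, one step below it. A minimal sketch with a hypothetical `extremes` function, assuming an excess-$(2^{k-1}-1)$ exponent; for part (d) it borrows Python's own floats, which follow IEEE 754, to show the rule for adding to −∞.

```python
import math
from fractions import Fraction


def extremes(exp_bits: int, man_bits: int):
    """Return (smallest positive normalized, largest denormalized) for an
    IEEE-style format whose exponent is excess-(2**(exp_bits - 1) - 1)."""
    bias = (1 << (exp_bits - 1)) - 1
    # exponent field 00..01, mantissa 00..0: 1.000 * 2**(1 - bias)
    min_normal = Fraction(2) ** (1 - bias)
    # exponent field 00..00, mantissa 11..1: 0.111 * 2**(1 - bias)
    max_denormal = Fraction((1 << man_bits) - 1, 1 << man_bits) * Fraction(2) ** (1 - bias)
    return min_normal, max_denormal


min_normal, max_denormal = extremes(3, 3)   # F3.4: excess-3 exponent, 3-bit mantissa
print(min_normal, max_denormal, float(max_denormal))

# Part d: 0101010 is 1.010(2) * 2**(5 - 3) = 5, and 1111000 is -inf.
print(5.0 + -math.inf)                      # any finite value + (-inf) is -inf
print(math.isnan(math.inf + -math.inf))     # but +inf + (-inf) is NaN
```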
Problem F3.5
Using the 8-bit floating-point format we studied, describe a computation using only numeric floating-point values that would result in negative infinity.
-240.0 + -240.0: −240 (1 1110 111) is the most negative finite number representable in our 8-bit floating-point format; doubling it gives −480, well beyond that limit, so the result is “approximated” by negative infinity. Another example is -1.0 / 0.0.
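Overflow to infinity falls out of round-to-nearest: any exact result whose magnitude is 248 (halfway from 240 to 256) or more rounds to infinity. The sketch below assumes round-to-nearest with ties to an even mantissa (IEEE's default); `magnitude` and `round8` are hypothetical names. It finds the nearest pattern by brute force, letting pattern 1111 000 stand for 256 so that rounding to it means overflow. The division example is not shown, because Python raises ZeroDivisionError for -1.0 / 0.0 rather than returning −∞.

```python
import math
from fractions import Fraction

MAN_BITS, BIAS = 3, 7
INF_PATTERN = 0b1111_000   # exponent 1111, mantissa 000


def magnitude(p: int) -> Fraction:
    """Value of the 7 non-sign bits p; INF_PATTERN decodes to 2**8 = 256,
    the value just beyond the largest finite number, 240."""
    e, m = p >> MAN_BITS, p & 0b111
    if e == 0:
        return Fraction(m, 8) * Fraction(2) ** (1 - BIAS)
    return (1 + Fraction(m, 8)) * Fraction(2) ** (e - BIAS)


def round8(x) -> float:
    """Round x to the nearest 8-bit value (ties to an even mantissa),
    returning +/-inf when the magnitude overflows."""
    x = Fraction(x)
    best = min(range(INF_PATTERN + 1),
               key=lambda p: (abs(magnitude(p) - abs(x)), p & 1))
    result = math.inf if best == INF_PATTERN else float(magnitude(best))
    return -result if x < 0 else result


print(round8(Fraction(-240) + Fraction(-240)))   # the exact sum is -480
```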
Problem F3.6
Using the 8-bit floating-point format we studied, describe a computation using only numeric floating-point values that would result in the nonnumeric floating-point value NaN (“Not a Number”).
Finding the square root of −1 leads to NaN, as does dividing zero by zero, or adding 1/x and −1/x, where x is the smallest positive numeric value (and so 1/x results in infinity).
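The last example can be traced in two steps: 1/x overflows, then infinities of opposite sign are added. A minimal sketch, assuming the 8-bit format's special-value rules match IEEE 754, so that Python's 64-bit floats can stand in for the final step. (Python's `math.sqrt(-1)` raises ValueError and `0.0 / 0.0` raises ZeroDivisionError, so those two examples are not shown.)

```python
import math
from fractions import Fraction

# Smallest positive 8-bit value: 0 0000 001 = 0.001(2) * 2**-6 = 1/512.
x = Fraction(1, 8) * Fraction(2) ** -6
# Largest finite 8-bit value: 0 1110 111 = 1.111(2) * 2**7 = 240.
largest_finite = (1 + Fraction(7, 8)) * Fraction(2) ** 7

# 1/x = 512 exceeds 240, so in the 8-bit format 1/x overflows to +inf
# and -1/x overflows to -inf.
print(1 / x, 1 / x > largest_finite)

# Adding infinities of opposite sign gives NaN (shown with Python floats).
result = math.inf + -math.inf
print(result, math.isnan(result))
```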
Problem F3.7
Consider the 8-bit floating-point format we studied, including support for denormalized numbers and nonnumeric values. It included 3 bits for the mantissa and 4 bits for the excess-7 exponent. Show your intermediate work.
a. What 8-bit pattern represents the number $40_{10}$?
b. What 8-bit pattern represents the number $1/256 = 1 \times 2^{-8}$?
c. Give an 8-bit pattern representing “not a number.”
d. What base-10 integer or fraction does 10110100 represent?
a. 0 1100 010 (from $40 = 101000_2 = 1.01_2 \times 2^{5}$)
b. 0 0000 010 (from $1 \times 2^{-8} = 0.01_2 \times 2^{-6}$)
c. 0 1111 111 (or any pattern of the form x1111yyy where at least one y is 1)
d. −3/4 (from $-1.1_2 \times 2^{-1} = -0.11_2$)
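Encodings that may land in the denormalized range can be checked by searching the 120 numeric nonnegative patterns for an exact match. A minimal sketch; `encode8` is a hypothetical name, and it assumes the IEEE-style denormalized rule 0.mmm × 2^−6 for exponent field 0000.

```python
from fractions import Fraction


def encode8(x) -> str:
    """Return the 8-bit pattern (sign, 4-bit excess-7 exponent, 3-bit
    mantissa, denormals allowed) whose value is exactly x; raise if none."""
    x = Fraction(x)
    sign = "1" if x < 0 else "0"
    target = abs(x)
    for e in range(15):                  # exponent fields 0000..1110 (1111 is nonnumeric)
        for m in range(8):
            if e == 0:                   # denormalized: 0.mmm * 2**-6
                v = Fraction(m, 8) * Fraction(2) ** -6
            else:                        # normalized: 1.mmm * 2**(e - 7)
                v = (1 + Fraction(m, 8)) * Fraction(2) ** (e - 7)
            if v == target:
                return f"{sign} {e:04b} {m:03b}"
    raise ValueError(f"{x} is not exactly representable")


print(encode8(40))                 # part a
print(encode8(Fraction(1, 256)))   # part b
print(encode8(Fraction(-3, 4)))    # part d, run backwards
```

The last call gives 1 0110 100, the pattern from part (d), which confirms that 10110100 represents −3/4.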
Problem F3.8
Consider the 8-bit floating-point format we studied, including support for denormalized numbers and nonnumeric values. It included 3 bits for the mantissa and 4 bits for the excess-7 exponent. Show your intermediate work.
a. What 8-bit pattern represents the number −0.375 = −3/8?
b. What 8-bit pattern represents the number $3/256 = 3 \times 2^{-8}$?
c. Give an 8-bit pattern representing “negative infinity.”
d. What base-10 integer or fraction does 01100010 represent?
a. 1 0101 100 (from $-0.375 = -0.011_2 = -1.1_2 \times 2^{-2}$)
b. 0 0000 110 (from $11_2 \times 2^{-8} = 0.11_2 \times 2^{-6}$)
c. 1 1111 000
d. 40 (from $1.010_2 \times 2^{5} = 101000_2 = 32 + 8$)
Problem F4.1
Consider the 8-bit floating-point format we studied, including support for denormalized numbers and nonnumeric values. Give an example of values for a, b, and c where (a + b) + c is not the same as a + (b + c). Explain your answer, including the result for (a + b) + c and for a + (b + c).
Suppose a = −120, b = 120, and c = 1. Then notice the following.
a + (b + c) = −120 + (120 + 1) = −120 + 120 = 0
(We get 120 + 1 = 120 because $121 = 1111001_2$ needs 7 significant bits, but the format keeps only 4 (the implicit 1 plus 3 mantissa bits), so the sum rounds to the nearest representable value, 120.) On the other side, we get:
(a + b) + c = (−120 + 120) + 1 = 0 + 1 = 1
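The rounding in this example can be simulated by computing each sum exactly and then rounding once to the nearest 8-bit value, using the values −120, 120, and 1 from above. A minimal sketch, assuming round-to-nearest with ties to an even mantissa; `round8` and `add8` are hypothetical names, and overflow is deliberately ignored because every intermediate result here stays far below 240.

```python
from fractions import Fraction


def round8(x: Fraction) -> Fraction:
    """Round x to the nearest finite 8-bit value (4-bit excess-7 exponent,
    3-bit mantissa, denormals), ties to an even mantissa. Assumes no overflow."""
    values = [Fraction(m, 8) * Fraction(2) ** -6 for m in range(8)]        # denormalized
    values += [(1 + Fraction(m, 8)) * Fraction(2) ** (e - 7)
               for e in range(1, 15) for m in range(8)]                   # normalized
    # values[j] has bit pattern j, so j & 1 is the mantissa's last bit
    i = min(range(len(values)), key=lambda j: (abs(values[j] - abs(x)), j & 1))
    return values[i] if x >= 0 else -values[i]


def add8(p, q) -> Fraction:
    """Add exactly, then round once, as IEEE addition does."""
    return round8(Fraction(p) + Fraction(q))


a, b, c = -120, 120, 1
left = add8(add8(a, b), c)    # (a + b) + c
right = add8(a, add8(b, c))   # a + (b + c)
print(left, right)
```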
Problem F4.2
Give an example of three floating-point numbers x, y, and z, such that the distributive property x (y + z) = x y + x z does not hold. (Feel free to describe the values rather than give numerical values: For example, you might say “the largest denormalized number” rather than give a particular value.) Note: Your answer should include the values of x (y + z) and x y + x z for your values of x, y, and z.
One possibility is x = 0.5 and y = z = the largest possible finite number. In this case, y + z overflows to infinity, so x (y + z) is infinity, while x y + x z is a finite number (in our 8-bit format, 120 + 120 = 240).
Another possibility is x = ∞, y = 2, and z = −1. In this case, x (y + z) = ∞ ⋅ 1 = ∞, while x y + x z = ∞ + (−∞) = NaN.
While these answers are fine, they are somewhat dissatisfying because of their reliance on overflow and nonnumeric values. Another possibility, which avoids both, has x = 0.5 and y = z = the smallest positive number. In this case, x (y + z) is the smallest positive number, while x y and x z are each half the smallest positive number, which is too small to represent and rounds to 0, so x y + x z is 0.
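All three examples can be replayed by evaluating each operation exactly and rounding once. A minimal sketch, assuming round-to-nearest with ties to an even mantissa and IEEE 754 rules for infinities and NaN (delegated to Python's floats); `MAGS`, `round8`, `op8`, and `both_sides` are hypothetical names.

```python
import math
import operator
from fractions import Fraction

# Nonnegative magnitudes of the 8-bit format in bit-pattern order
# (index = the 7 non-sign bits). The extra last entry, 2**8 = 256, stands
# for pattern 1111 000 and is used only to detect overflow to infinity.
MAGS = (
    [Fraction(m, 8) * Fraction(2) ** -6 for m in range(8)]
    + [(1 + Fraction(m, 8)) * Fraction(2) ** (e - 7) for e in range(1, 15) for m in range(8)]
    + [Fraction(256)]
)


def round8(x: Fraction) -> float:
    """Round to the nearest value, ties to even mantissa; overflow gives +/-inf."""
    i = min(range(len(MAGS)), key=lambda j: (abs(MAGS[j] - abs(x)), j & 1))
    mag = math.inf if i == len(MAGS) - 1 else float(MAGS[i])
    return mag if x >= 0 else -mag


def op8(op, a: float, b: float) -> float:
    """Compute op(a, b) exactly, then round once to the 8-bit format.
    Infinities and NaN go to Python's IEEE double arithmetic, assuming
    the 8-bit format follows the same special-value rules."""
    if not (math.isfinite(a) and math.isfinite(b)):
        return op(a, b)
    return round8(op(Fraction(a), Fraction(b)))


def both_sides(x: float, y: float, z: float):
    """Return (x(y + z), xy + xz), each evaluated in the 8-bit format."""
    left = op8(operator.mul, x, op8(operator.add, y, z))
    right = op8(operator.add, op8(operator.mul, x, y), op8(operator.mul, x, z))
    return left, right


print(both_sides(0.5, 240.0, 240.0))      # overflow example
print(both_sides(math.inf, 2.0, -1.0))    # nonnumeric example
print(both_sides(0.5, 1 / 512, 1 / 512))  # underflow example
```

The first call gives (inf, 240.0), the second (inf, nan), and the third (0.001953125, 0.0), that is, the smallest positive number 1/512 on the left and 0 on the right.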