Review F: Floating-point representation


Problem F2.1

Represent each of the following using the 8-bit floating-point format we studied (which had 3 bits for the mantissa and 4 bits for the excess-7 exponent). Show your intermediate work.

a.	2.25
b.	−80.0
c.	1/32

a.	2.25 = 10.01₍₂₎ = 1.001₍₂₎ × 2¹	becomes	0 1000 001
b.	−80.0 = −1010000₍₂₎ = −1.01₍₂₎ × 2⁶	becomes	1 1101 010
c.	1/32 = 1.0₍₂₎ × 2⁻⁵	becomes	0 0010 000
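
The conversions above can be checked mechanically. The following is a minimal Python sketch, not part of the original review: the name `encode_normalized` and its parameters are hypothetical, and it assumes a nonzero value that fits exactly in the available mantissa bits (no rounding, and no denormalized or nonnumeric cases).

```python
import math

def encode_normalized(value, exp_bits=4, man_bits=3):
    """Return 'sign exponent mantissa' bits for a nonzero, exactly representable value."""
    bias = 2 ** (exp_bits - 1) - 1           # 7 for a 4-bit exponent (excess-7)
    sign = 1 if value < 0 else 0
    frac, exp = math.frexp(abs(value))       # abs(value) == frac * 2**exp, 0.5 <= frac < 1
    significand = frac * 2                   # rescale to the form 1.xxx
    exponent = exp - 1
    mantissa = (significand - 1) * 2 ** man_bits
    if mantissa != int(mantissa):
        raise ValueError("value needs more mantissa bits than are available")
    stored = exponent + bias                 # excess-notation exponent field
    if not 0 <= stored < 2 ** exp_bits:
        raise ValueError("exponent does not fit in the exponent field")
    return f"{sign} {stored:0{exp_bits}b} {int(mantissa):0{man_bits}b}"

for v in (2.25, -80.0, 1 / 32):
    print(v, "becomes", encode_normalized(v))
```

Under these assumptions the loop reproduces the three bit patterns listed in the answer.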

Problem F2.2

Consider the 8-bit floating-point format we studied, including 3 bits for the mantissa and 4 bits for the excess-7 exponent. Show your intermediate work.

a. What 8-bit pattern represents the number −0.625 = −5/8?

b. What base-10 integer or fraction does 01001001 represent?

a. −101₍₂₎ × 2⁻³ = −1.01₍₂₎ × 2⁻¹ → 1 0110 010

b. 1.001₍₂₎ × 2² = 100.1₍₂₎ = 4.5

Problem F2.3

Consider a 7-bit floating-point representation with 3 bits for the excess-3 exponent and 3 bits for the mantissa.

a.	How would 0.375₍₁₀₎ be represented in this 7-bit representation?
b.	What decimal value does 0110110 represent? (If you like, you can express your solution to this question and the next as a fraction.)
c.	What decimal value does 1001100 represent?

a.	0001100
b.	14₍₁₀₎
c.	−3/8 = −0.375₍₁₀₎
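
Decoding in the other direction follows the same steps each time: split off the sign, subtract the excess from the exponent field, and put the implicit 1 in front of the mantissa. A minimal sketch in Python, assuming the normalized-only formats of these F2 problems (the function name `decode_normalized` is hypothetical):

```python
from fractions import Fraction

def decode_normalized(bits, exp_bits, man_bits):
    """Decode a sign/exponent/mantissa bit string, treating every pattern as normalized."""
    bits = bits.replace(" ", "")
    if len(bits) != 1 + exp_bits + man_bits:
        raise ValueError("wrong number of bits for this format")
    bias = 2 ** (exp_bits - 1) - 1                 # excess-3 for 3 bits, excess-7 for 4 bits
    sign = -1 if bits[0] == "1" else 1
    stored = int(bits[1:1 + exp_bits], 2)          # exponent field as an unsigned integer
    mantissa = int(bits[1 + exp_bits:], 2)
    significand = 1 + Fraction(mantissa, 2 ** man_bits)   # implicit leading 1
    return sign * significand * Fraction(2) ** (stored - bias)

print(decode_normalized("0110110", 3, 3))
print(decode_normalized("1001100", 3, 3))
print(decode_normalized("0 1001 001", 4, 3))
```

Using `Fraction` keeps the results exact, so the answers come out as 14, −3/8, and 9/2 rather than as rounded decimals.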

Problem F3.1

Explain the motivation behind the introduction of the denormalized case in IEEE's floating-point representation.

With only normalized numbers, the set of floating-point numbers would have a gap around 0: the representable numbers become increasingly dense as they approach zero, but only down to a certain point (1/128 in our eight-bit example), and then there are no numbers below that.

The denormalized case “spreads out” the numbers whose exponent bits are all zeroes, spacing them evenly down to and including zero.
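
To make the gap concrete, the sketch below lists the values closest to zero in the eight-bit format under both interpretations of the all-zeroes exponent. It is an illustration only; the helper names `normalized_only` and `with_denormals` are hypothetical.

```python
from fractions import Fraction

BIAS = 7       # excess-7 exponent
MAN_BITS = 3   # mantissa bits

def normalized_only(stored_exp, mantissa):
    """Value when every exponent field, including 0000, has an implicit leading 1."""
    return (1 + Fraction(mantissa, 2 ** MAN_BITS)) * Fraction(2) ** (stored_exp - BIAS)

def with_denormals(stored_exp, mantissa):
    """Value when exponent field 0000 means 0.mmm x 2^-6 (the denormalized case)."""
    if stored_exp == 0:
        return Fraction(mantissa, 2 ** MAN_BITS) * Fraction(2) ** (1 - BIAS)
    return normalized_only(stored_exp, mantissa)

# Nonnegative values from the two lowest exponent fields, in increasing order.
without = sorted(normalized_only(e, m) for e in (0, 1) for m in range(8))
with_dn = sorted(with_denormals(e, m) for e in (0, 1) for m in range(8))

print("smallest without denormals:", without[0])
print("nearest zero with denormals:", [str(v) for v in with_dn[:4]])
```

Without denormals the smallest positive value is 1/128, even though its neighbors are only 1/1024 apart. With denormals the sixteen values step evenly by 1/512 from 0 upward.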

Problem F3.2

Consider the 8-bit floating-point representation we studied in class, including support for denormalized numbers and nonnumeric values. It used 3 bits for the mantissa and 4 bits for the excess-7 exponent.

a.	What 8-bit pattern represents the number 0.125₍₁₀₎?
b.	What 8-bit pattern represents the number 20₍₁₀₎?
c.	What base-10 integer or fraction does the bit pattern 11001100 represent?
d.	What base-10 integer or fraction does the bit pattern 00000010 represent?
e.	What bit pattern results from multiplying 240 and −240?

a.	0 0100 000
b.	0 1011 010
c.	−6
d.	1/256
e.	11111000
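
Once denormalized and nonnumeric values are included, decoding has three cases, selected by the exponent field. The following Python sketch shows one way to write them down; the function name `decode` is hypothetical, and for simplicity it does not distinguish negative zero from zero.

```python
import math
from fractions import Fraction

def decode(bits, exp_bits=4, man_bits=3):
    """Decode a bit string, handling normalized, denormalized, and nonnumeric cases."""
    bits = bits.replace(" ", "")
    if len(bits) != 1 + exp_bits + man_bits:
        raise ValueError("wrong number of bits for this format")
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1 if bits[0] == "1" else 1
    stored = int(bits[1:1 + exp_bits], 2)
    mantissa = int(bits[1 + exp_bits:], 2)
    fraction = Fraction(mantissa, 2 ** man_bits)
    if stored == 2 ** exp_bits - 1:              # exponent all ones: nonnumeric
        return sign * math.inf if mantissa == 0 else math.nan
    if stored == 0:                              # exponent all zeroes: denormalized
        return sign * fraction * Fraction(2) ** (1 - bias)
    return sign * (1 + fraction) * Fraction(2) ** (stored - bias)

print(decode("11001100"))   # part c
print(decode("00000010"))   # part d
print(decode("11111000"))   # the pattern from part e
```

The exact product 240 × −240 = −57600 lies far below the most negative finite value, −240, which is why part e overflows to the negative-infinity pattern.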

Problem F3.3

Consider a 6-bit floating-point representation with 3 bits for the excess-3 exponent and 2 bits for the mantissa, including support for denormalized and nonnumeric values.

a.	How would 0.75₍₁₀₎ be represented in this 6-bit representation?
b.	What decimal value does 011010 represent?
c.	What decimal value does 000010 represent?
d.	How would infinity (∞) be represented in this representation?

a.	001010
b.	12.0₍₁₀₎
c.	0.125₍₁₀₎
d.	011100

Problem F3.4

Consider a 7-bit floating-point representation with 3 bits for the excess-3 exponent and 3 bits for the mantissa, including support for denormalized and nonnumeric values.

a.	What values do 1010100 and 0000100 represent? Express each answer as a decimal number or a base-10 fraction.
b.	What is the bit pattern of the smallest positive normalized number supported by this representation? Convert this to a decimal fraction or number.
c.	What is the bit pattern of the largest denormalized number supported by this representation? Convert this to a decimal fraction or number.
d.	Suppose we add 0101010 and 1111000 as 7-bit floating-point numbers. What is the bit pattern of the result?

a.	−0.75₍₁₀₎, 0.125₍₁₀₎
b.	0001000, which converts to 1/4 or 0.25
c.	0000111, which converts to 7/32 or 0.21875
d.	1111000 (1111000 is −∞, and adding any finite number, such as 0101010 = 5, to −∞ gives −∞)
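
Parts b and c can be confirmed by brute force, since the 7-bit format has only a handful of positive values. The sketch below enumerates them; the variable names are illustrative and not from the original.

```python
from fractions import Fraction

EXP_BITS, MAN_BITS = 3, 3
BIAS = 3                                  # excess-3 exponent
SCALE = 2 ** MAN_BITS

# Exponent field 000: value is 0.mmm x 2^(1 - BIAS), with no implicit leading 1.
denormalized = {
    f"0000{m:03b}": Fraction(m, SCALE) * Fraction(2) ** (1 - BIAS)
    for m in range(1, SCALE)
}
# Exponent fields 001 through 110: value is 1.mmm x 2^(field - BIAS).
normalized = {
    f"0{e:03b}{m:03b}": (1 + Fraction(m, SCALE)) * Fraction(2) ** (e - BIAS)
    for e in range(1, 2 ** EXP_BITS - 1)
    for m in range(SCALE)
}

largest_denormalized = max(denormalized, key=denormalized.get)
smallest_normalized = min(normalized, key=normalized.get)
print(largest_denormalized, denormalized[largest_denormalized])
print(smallest_normalized, normalized[smallest_normalized])
```

The two values found this way, 7/32 and 1/4, differ by 1/32, the same spacing as between neighboring denormalized values, so there is no jump where the two ranges meet.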

Problem F3.5

Using the 8-bit floating-point format we studied, describe a computation using only numeric floating-point values that would result in negative infinity.

−240.0 + −240.0: −240 is the most negative finite number representable in our 8-bit floating-point format (1 1110 111); doubling it goes well beyond that limit, so the result is “approximated” by negative infinity.

Another example is −1.0 / 0.0.

Problem F3.6

Using the 8-bit floating-point format we studied, describe a computation using only numeric floating-point values that would result in the nonnumeric floating-point value NaN (“Not a Number”).

Finding the square root of −1 leads to NaN, as does dividing zero by zero, or adding 1/x and −1/x, where x is the smallest positive numeric value (and so 1/x results in infinity).
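
The same behavior can be observed in ordinary 64-bit IEEE doubles, which follow the same rules as the 8-bit format at a larger scale. The sketch below is an analogy rather than a simulation of the 8-bit format: it uses Python floats, with the largest finite double playing the role of 240 and the smallest positive double playing the role of x.

```python
import math
import sys

largest = sys.float_info.max          # largest finite double
neg_inf = -largest + -largest         # sum lies beyond the most negative finite value

smallest = 5e-324                     # smallest positive (denormalized) double
pos_inf = 1 / smallest                # the reciprocal is too large, so it overflows
not_a_number = pos_inf + (-1 / smallest)   # infinity plus negative infinity

print(neg_inf, pos_inf, not_a_number)
```

Note that Python itself raises an exception for `0.0 / 0.0` and for `math.sqrt(-1)` instead of returning NaN, so those two cases are not shown here.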

Problem F3.7

Consider the 8-bit floating-point format we studied, including support for denormalized numbers and nonnumeric values. It included 3 bits for the mantissa and 4 bits for the excess-7 exponent. Show your intermediate work.

a. What 8-bit pattern represents the number 40₍₁₀₎?

b. What 8-bit pattern represents the number 1/256 = 1 × 2⁻⁸?

c. Give an 8-bit pattern representing “not a number.”

d. What base-10 integer or fraction does 10110100 represent?

a.	0 1100 010 (from 101000₍₂₎ = 1.01₍₂₎ × 2⁵)
b.	0 0000 010 (from 1 × 2⁻⁸ = 0.01₍₂₎ × 2⁻⁶)
c.	0 1111 111 (or any of the form `x1111yyy` where at least one `y` is 1)
d.	−3/4 (from −1.1₍₂₎ × 2⁻¹ = −0.11₍₂₎)

Problem F3.8

a. What 8-bit pattern represents the number −0.375 = −3/8?

b. What 8-bit pattern represents the number 3/256 = 3 × 2⁻⁸?

c. Give an 8-bit pattern representing “negative infinity.”

d. What base-10 integer or fraction does 01100010 represent?

a. 1 0101 100 (−0.375 = −0.011₍₂₎ = −1.1₍₂₎ × 2⁻²)

b. 0 0000 110 (11₍₂₎ × 2⁻⁸ = 0.11₍₂₎ × 2⁻⁶)

c. 1 1111 000

d. 40 (1.010₍₂₎ × 2⁵ = 101000₍₂₎ = 32 + 8)
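
Encoding with denormalized and nonnumeric values adds two branches to the normalized procedure. The following is a minimal sketch under stated assumptions: the function name `encode` is hypothetical, it accepts only values that are exactly representable (it does not round), and it always emits the all-ones mantissa for NaN even though any nonzero mantissa would do.

```python
import math
from fractions import Fraction

def encode(value, exp_bits=4, man_bits=3):
    """Return 'sign exponent mantissa' bits for an exactly representable value."""
    bias = 2 ** (exp_bits - 1) - 1
    all_ones = 2 ** exp_bits - 1
    if isinstance(value, float) and math.isnan(value):
        return f"0 {all_ones:0{exp_bits}b} {'1' * man_bits}"
    sign = 1 if value < 0 else 0
    if isinstance(value, float) and math.isinf(value):
        return f"{sign} {all_ones:0{exp_bits}b} {'0' * man_bits}"
    mag = abs(Fraction(value))
    smallest_normal = Fraction(2) ** (1 - bias)
    if mag < smallest_normal:
        stored = 0                                    # denormalized: 0.mmm x 2^(1 - bias)
        scaled = mag / smallest_normal * 2 ** man_bits
    else:
        stored = 1                                    # find the field with 1 <= significand < 2
        while mag >= Fraction(2) ** (stored + 1 - bias):
            stored += 1
        scaled = (mag / Fraction(2) ** (stored - bias) - 1) * 2 ** man_bits
    if scaled.denominator != 1 or stored >= all_ones:
        raise ValueError("value is not exactly representable in this format")
    return f"{sign} {stored:0{exp_bits}b} {int(scaled):0{man_bits}b}"

print(encode(-0.375))
print(encode(Fraction(3, 256)))
print(encode(-math.inf))
print(encode(40))
```

The last call yields 0 1100 010, which agrees with part d above, where that pattern decodes to 40.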

Problem F4.1

Consider the 8-bit floating-point format we studied, including support for denormalized numbers and nonnumeric values. Give an example of values for a, b, and c where (a + b) + c is not the same as a + (b + c). Explain your answer, including the result for (a + b) + c and for a + (b + c).

Suppose a = −120, b = 120, and c = 1. Then notice the following.

`a` + (`b` + `c`)	=	−120 + (120 + 1)
	=	−120 + 120
	=	0

(We get 120 + 1 = 120 because the 1 can't be represented within the number's precision: representable values near 120 are 8 apart, so 121 rounds back to 120.) On the other side, we get:

(`a` + `b`) + `c`	=	(−120 + 120) + 1
	=	0 + 1
	=	1
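
The two evaluation orders can be simulated by doing each addition exactly and then rounding the sum to the nearest value the 8-bit format can hold. The sketch below does this for the values −120, 120, and 1 from the answer above. The helpers `round_to_format` and `fl_add` are hypothetical names, and round-to-nearest with ties to even is an assumption about the rounding rule.

```python
import math
from fractions import Fraction

def round_to_format(x, exp_bits=4, man_bits=3):
    """Round an exact value to the nearest representable value, ties to even."""
    x = Fraction(x)
    if x == 0:
        return x
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                       # exponent shared by denormalized values
    max_exp = (2 ** exp_bits - 2) - bias     # largest exponent of a finite value
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Find e with 2**e <= mag < 2**(e + 1), then clamp to the denormalized range.
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    e = max(e, min_exp)
    step = Fraction(2) ** (e - man_bits)     # spacing of representable values near mag
    rounded = round(mag / step) * step       # round() on a Fraction rounds ties to even
    if rounded >= Fraction(2) ** (max_exp + 1):
        return sign * math.inf               # too large for any finite pattern
    return sign * rounded

def fl_add(p, q):
    """Add two representable values exactly, then round the sum once."""
    return round_to_format(p + q)

a, b, c = Fraction(-120), Fraction(120), Fraction(1)
left = fl_add(fl_add(a, b), c)    # (a + b) + c
right = fl_add(a, fl_add(b, c))   # a + (b + c)
print(left, right)
```

The inner sum 120 + 1 is where the two orders diverge: 121 is not representable, and the nearest representable value is 120.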

Problem F4.2

Give an example of three floating-point numbers x, y, and z, such that the distributive property x (y + z) = x y + x z does not hold. (Feel free to describe the values rather than give numerical values: For example, you might say “the largest denormalized number” rather than give a particular value.) Note: Your answer should include the values of x (y + z) and x y + x z for your values of x, y, and z.

One possibility is x = 0.5, y = largest possible number, and z = largest possible number. In this case, y + z overflows to infinity, so x (y + z) is infinity, while x y + x z is a finite number (in fact, the largest possible number).

Another possibility is x = ∞, y = −1, and z = 2. In this case, x (y + z) is infinity (since ∞ ⋅ 1 = ∞), while x y + x z is NaN (since −∞ + ∞ = NaN).

While these answers are fine, they are somewhat dissatisfying because of their reliance on overflow and nonnumeric values. Another possibility, which does not resort to nonnumeric values, has x = 0.5, y = smallest positive number, and z = smallest positive number. In this case, x (y + z) is the smallest positive number, while x y and x z are each too small to represent and round to 0, so x y + x z is 0.
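
The last example above (x = 0.5, with y and z both the smallest positive number) carries over directly to 64-bit IEEE doubles. The sketch below uses Python floats as an analogy to the 8-bit format, with `tiny` standing in for the smallest positive number.

```python
tiny = 5e-324                 # smallest positive double (denormalized)
x, y, z = 0.5, tiny, tiny

left = x * (y + z)            # y + z is exactly 2 * tiny, and halving it gives tiny
right = x * y + x * z         # each product is half of tiny, which rounds to 0.0

print(left == tiny, right == 0.0)
```

Each product lands exactly halfway between 0 and the smallest positive value, and the default round-to-nearest, ties-to-even rule sends it to 0.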