Internally, computers represent all data using sequences of bits — pieces of memory that are either 0 or 1. Since integers constitute the most important type of data that a computer manipulates, we should understand how they are represented using bits. Understanding this is helpful in many ways, such as understanding how to manipulate bits directly using the bit operators found in C, Java, and many other programming languages.
We can already represent integers from zero to one using a single bit. To represent larger numbers, we need to use combinations of bits. The most convenient technique for assigning values to combinations is based on binary notation (also called base 2).
You're already familiar with decimal notation (also called base 10). You may remember the following sort of diagram from grade school.
That is, in representing the number 1024, we put a 4 in the ones place, a 2 in the tens place, a 0 in the hundreds place, and a 1 in the thousands place. This system is called base 10 because there are 10 possible digits for each place (0 through 9) and because the place values go up by factors of 10 (1, 10, 100, 1000,…).
In binary notation, we have only two digits (0 and 1) and the place values go up by factors of 2. So we have a ones place, a twos place, a fours place, an eights place, a sixteens place, and so on. The following diagram shows a number written in binary notation.
This value, 1011(2), represents a number with 1 eight, 0 fours, 1 two, and 1 one: 1 ⋅ 8 + 0 ⋅ 4 + 1 ⋅ 2 + 1 ⋅ 1 = 11(10). (The parenthesized subscripts indicate whether the number is in binary notation or decimal notation.)
We'll often want to convert numbers between their binary and decimal representations. We saw how to convert binary to decimal with 1011(2). Here's another example: Suppose we want to identify what 100100(2) represents. We first determine what places the 1's occupy.
We then add up the values of these places to get a base-10 value: 32 + 4 = 36(10).
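This binary-to-decimal conversion can be sketched in Python (the function name here is illustrative, not from the text): we walk the bits from right to left, doubling the place value at each step and adding the places that hold a 1.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum the place values of the 1 bits, as described in the text."""
    value = 0
    place = 1  # the rightmost bit is the ones place
    for bit in reversed(bits):
        if bit == "1":
            value += place
        place *= 2  # each place to the left is worth twice as much
    return value

print(binary_to_decimal("1011"))    # 11
print(binary_to_decimal("100100"))  # 36
```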
To convert a number from decimal to binary, we repeatedly determine the largest power of two that fits into the number and subtract it, until we reach zero. The binary representation has a 1 bit in each place whose value we subtracted. Suppose, as an example, we want to convert 88(10) to binary. We observe that the largest power of 2 less than 88 is 64, so we decide that the binary expansion of 88 has a 1 in the 64's place, and we subtract 64 to get 88 − 64 = 24. Then we see that the largest power of 2 less than 24 is 16, so we decide to put a 1 in the 16's place and subtract 16 from 24 to get 8. Now 8 is the largest power of 2 that fits into 8, so we put a 1 in the 8's place and subtract to get 0. Once we reach 0, we write down which places we filled with 1's.
We put a zero in each empty place and conclude that the binary representation of 88(10) is 1011000(2).
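The repeated-subtraction procedure translates directly into a short Python sketch (again, the function name is my own):

```python
def decimal_to_binary(n: int) -> str:
    """Repeatedly subtract the largest power of two that still fits."""
    if n == 0:
        return "0"
    # Find the largest place value needed.
    place = 1
    while place * 2 <= n:
        place *= 2
    # Walk down through the places, emitting a 1 where we subtract.
    bits = ""
    while place >= 1:
        if n >= place:
            bits += "1"
            n -= place
        else:
            bits += "0"
        place //= 2
    return bits

print(decimal_to_binary(88))  # 1011000
```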
We can now examine how integers are actually stored on the computer. (Recall that integers are numbers with no fractional part, like 2, 105, or −38.)
Modern computers usually represent numbers in a fixed amount of space. For example, we might decide that each byte represents a number. A byte, however, is very limiting: The largest number we can fit is 11111111(2) = 255(10), and we often want to deal with larger numbers than that.
Thus, computers tend to use groups of bytes called words. Different computers have different word sizes. Very simple machines have 16-bit words; today, most machines use 32-bit words, though some computers use 64-bit words. (The term word comes from the fact that four bytes (32 bits) is equivalent to four ASCII characters, and four letters is the length of many useful English words.) Thirty-two bits is plenty for most numbers, as it allows us to represent any integer from 0 up to 2³² − 1 = 4,294,967,295. But the limitation is becoming increasingly irritating, and so people are beginning to move to 64-bit computers. (This isn't because people are dealing with larger numbers today than earlier, so much as the fact that memory has become much cheaper, and so it seems silly to continue trying to save money by skimping on bits.)
The representation of an integer using binary representation in a fixed number of bits is called an unsigned representation. The term comes from the fact that the only numbers representable in the system have no negative sign.
But what about negative integers? After all, there are some perfectly respectable numbers below 0. We'll examine two techniques for representing integers both negative and positive: sign-magnitude representation and two's-complement representation.
Sign-magnitude representation is the more intuitive technique. Here, we let the first bit indicate whether the number is positive or negative (the number's sign), and the rest tells how far the number is from 0 (its magnitude). Suppose we are working with 8-bit sign-magnitude numbers.
3 would be represented as 00000011
−3 would be represented as 10000011
For −3, we use 1 for the first bit, because the number is negative, and then we place 3 into the remaining seven bits.
What's the range of integers we can represent with an 8-bit sign-magnitude representation? For the largest number, we'd want 0 for the sign bit and 1 everywhere else, giving us 01111111, or 127(10). For the smallest number, we'd want 1 for the sign bit and 1 everywhere else, giving us −127(10). An 8-bit sign-magnitude representation, then, can represent any integer from −127(10) to 127(10).
This range of integers includes 255 values. But we've seen that 8 bits can represent up to 256 different values. The discrepancy arises from the fact that the representation includes two representations of the number zero (0 and −0, represented as 00000000 and 10000000).
Arithmetic using sign-magnitude representation is somewhat more complicated than we might hope. When you want to see if two numbers are equal, you would need additional circuitry so that −0 is understood as equal to 0. Adding two numbers requires circuitry to handle the cases of when the numbers' signs match and when they don't match. Because of these complications, sign-magnitude representation is not often used for representing integers.
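To make the two-zeros complication concrete, here is a small Python sketch of 8-bit sign-magnitude encoding and decoding (the function names are my own, not from the text):

```python
def sign_magnitude(n: int, bits: int = 8) -> str:
    """Encode n as a sign bit followed by the magnitude."""
    assert -(2 ** (bits - 1) - 1) <= n <= 2 ** (bits - 1) - 1
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{bits - 1}b")

def sm_value(pattern: str) -> int:
    """Decode a sign-magnitude bit pattern back to an integer."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == "1" else magnitude

print(sign_magnitude(3))   # 00000011
print(sign_magnitude(-3))  # 10000011
# Two distinct patterns both decode to zero, so an equality test
# on raw bit patterns would need a special case:
print(sm_value("00000000") == sm_value("10000000"))  # True
```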
Nearly all computers today use the two's-complement representation for integers. In the two's-complement system, the topmost bit's value is the negation of its meaning in an unsigned system. For example, in an 8-bit unsigned system, the topmost bit is the 128's place.
In an 8-bit two's-complement system, then, we negate the meaning of the topmost bit to be −128 instead.
To represent the number −100(10), we would first choose a 1 for the −128's place, leaving us with (−100) − (−128) = 28. (We are using the repeated subtraction algorithm described in Section 3.2. Since the place value is negative, we subtract a negative number.) Then we'd choose a 1 for the 16's place, the 8's place, and the 4's place to reach 0.
Thus, the 8-bit two's-complement representation of −100(10) would be 10011100.
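A convenient shortcut in code: the two's-complement pattern of n is just the low 8 bits of n, which Python exposes via masking with 2⁸ − 1. This sketch (with a function name of my own choosing) confirms the encodings above:

```python
def twos_complement(n: int, bits: int = 8) -> str:
    """Return the two's-complement bit pattern for n."""
    assert -(2 ** (bits - 1)) <= n <= 2 ** (bits - 1) - 1
    # Masking keeps the low `bits` bits of n, which is exactly
    # the two's-complement pattern.
    return format(n & (2 ** bits - 1), f"0{bits}b")

print(twos_complement(-100))  # 10011100
print(twos_complement(3))     # 00000011
print(twos_complement(-3))    # 11111101
```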
3 would be represented as 00000011
−3 would be represented as 11111101
What's the range of numbers representable in an 8-bit two's-complement representation? To arrive at the largest number, we'd want 0 for the −128's bit and 1 everywhere else, giving us 01111111, or 127(10). For the smallest number, we'd want 1 for the −128's bit and 0 everywhere else, giving −128(10). In an 8-bit two's-complement representation, we can represent any integer from −128(10) up to 127(10). (This range includes 256 integers. There are no duplicates as with sign-magnitude representation.)
It's instructive to map out the bit patterns (in order of their unsigned value) and their corresponding two's-complement values.
bit pattern value 00000000 0 00000001 1 00000010 2 : 01111110 126 01111111 127 10000000 −128 10000001 −127 : 11111110 −2 11111111 −1
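The table's rule can be stated in one line: a pattern whose topmost bit is 1 is worth its unsigned value minus 2⁸, since the topmost bit counts as −128 rather than +128. A Python sketch (function name mine):

```python
def tc_value(pattern: str) -> int:
    """Interpret a bit pattern as a two's-complement integer."""
    bits = len(pattern)
    unsigned = int(pattern, 2)
    # The topmost bit is worth -(2**(bits-1)) rather than +(2**(bits-1)),
    # a difference of 2**bits.
    return unsigned - 2 ** bits if pattern[0] == "1" else unsigned

print(tc_value("01111111"))  # 127
print(tc_value("10000000"))  # -128
print(tc_value("11111111"))  # -1
```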
Notice that the two's-complement representation wraps around: If you take the largest number, 01111111, and add 1 to it as if it were an unsigned number, you get 10000000, the smallest number. This wrap-around behavior can lead to some interesting behavior. In one game I played as a child (back when 16-bit computers were popular), the score would go up progressively as you guided a monster through a maze. I wasn't very good at the game, but my little brother mastered it enough that the score would hit its maximum value and then wrap around to a very negative value! Trying to get the largest possible score — without wrapping around — was an interesting challenge.
One of the nice things about two's-complement numbers is that you can add them just as you add regular numbers. Suppose we want to add −3 and 5.
We can attempt to do this using regular addition, akin to the technique we traditionally use in adding base-10 numbers.
We get an extra 1 in the ninth bit of the answer, but if we ignore this ninth bit, we get the correct answer of 2(10).
We can reason that this is correct as follows: Say one of the two numbers is negative and the other is positive. That is, one of the two numbers has a 1 in the −128's place, and the other has 0 there. If there is no carry into the −128's place, then the answer is correct: the last seven bits hold the correct sum, and adding the −128's place preserves the −128 contributed by the 1 in that location in the negative number.
If there is a carry into the −128's place, then this represents a carry of 128 taken from summing the 64's column. This carry of 128 (represented by a carry of 1), added to the −128 in that column for the negative number (represented by a 1 in that column), should give us 0. This is exactly what we get when we add the carry of 1 into the leftmost column to the 1 of the negative number in this column and then throw away the carry (1 + 1 = 10(2), and the carried 1 is discarded).
A similar sort of analysis will also work when we are adding two negative numbers or two positive numbers. Of course, the addition only works if you end up with something in the representable range. If you add 120 and 120 with 8-bit two's-complement numbers, then the result of 240 won't fit. The result of adding 120 and 120 as 8-bit numbers would turn out to be −16!
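Both behaviors — adding as if unsigned and discarding the ninth bit, and overflow wrapping into the negative range — can be demonstrated with a short Python sketch (the function name is my own):

```python
def tc_add(a: int, b: int, bits: int = 8) -> int:
    """Add as unsigned, discard the carry, reinterpret the result."""
    mask = 2 ** bits - 1
    total = (a + b) & mask        # the ninth bit (the carry) is thrown away
    if total >= 2 ** (bits - 1):  # topmost bit set: value is negative
        total -= 2 ** bits
    return total

print(tc_add(-3, 5))     # 2
print(tc_add(120, 120))  # -16 — the true sum 240 doesn't fit in 8 bits
```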
The upshot of all this is that the adding circuit we built for adding unsigned integers also works for two's-complement integers. The extra carry bit output doesn't have an obvious meaning, but the sum is correct as long as we heed only the actual bits of the result.