by Carl Burch, Hendrix College, September 2012
In this document, we'll study how a program can manipulate computers' integer representation directly using bit operators. This is sometimes pejoratively called “bit twiddling,” but expert developers would agree that mastering bit operators is a key skill for proficient programmers.
We'll study the bit operators provided by C, though many other languages, including Python and Java, have copied the same symbols that C uses. The bit operators fall into two basic categories: the bitwise operators and the shift operators.
The most significant bit operators are the bitwise operators, which perform logic operations on all bits at once. The bitwise NOT operator ~ (also called the bitwise complement) takes the NOT of all the bits in an integer. For example, if b holds the int value 00110111, then ~b flips all these bits to arrive at the int value 11001000.
The other bitwise operators (bitwise AND &, bitwise OR |, and bitwise XOR ^) work with two operands. Each pairs corresponding bits in the two operands and performs the logical operation on them. For example, to compute the bitwise OR of 1100 and 1010, we take the OR of the first bit from each number (1, 1), followed by the OR of the second bits (1, 0), then the third bits (0, 1), then the fourth bits (0, 0). For each of these bit pairs, we include a 1 in the result if the first bit or the second bit in the pair is 1 (or if both are 1). Thus, we end up with an output of 1110.
Similarly, the bitwise AND includes a 1 in the result if both the first bit and the second bit in the pair are 1; the bitwise AND of 1100 and 1010 ends up being 1000. And the bitwise XOR (short for exclusive or) includes a 1 if the first bit or the second bit is 1, excluding the case where both bits are 1; the bitwise XOR of 1100 and 1010 ends up being 0110.
The following illustrates these operators at work.
int a = 0x23; /* 00100011 */
int b = 0x36; /* 00110110 */
printf("%x\n", ~a);
/* prints DC (hex for 11011100) */
printf("%x\n", a & b);
/* prints 22 (hex for 00100010) */
printf("%x\n", a | b);
/* prints 37 (hex for 00110111) */
printf("%x\n", a ^ b);
/* prints 15 (hex for 00010101) */
C provides two shift operators: the left-shift operator << and the right-shift operator >>. These are binary operators, with the value to be shifted on the left side of the operator and the distance to shift the value on the right side. For example, the value of 5 << 2 is 20, since the binary representation of 5 is 101₂, which we shift left two spaces (and fill the empty spots with zeroes) to get 10100₂ = 20₁₀. Bits that go off the left end are simply lost. (Many programmers would say they fall into the “bit bucket” (or, more verbosely, the “great bit bucket in the sky”), a mythical location where bits go when they disappear.)
Notice that left-shifting by n bits is equivalent to multiplying by 2ⁿ, since we've essentially multiplied the value of each 1 bit in the number by that much. In the example of left-shifting 5 by 2, we get 5 ⋅ 2² = 20.
The right-shift operator works analogously: The value of 20 >> 2 is 5, since 20's binary representation is 10100₂, which shifted right two spots is 101₂ = 5. Again, the bits pushed off the right end simply fall into the bit bucket, so 23₁₀ = 10111₂ shifted right twice is also 101₂ = 5. Note that right-shifting is similar to dividing by a power of 2 (and ignoring the remainder).
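We can confirm these examples in code, in the same style as before.
printf("%d\n", 5 << 2);
/* prints 20 (101 shifted left twice is 10100) */
printf("%d\n", 20 >> 2);
/* prints 5 (10100 shifted right twice is 101) */
printf("%d\n", 23 >> 2);
/* prints 5 (the low bits of 10111 fall into the bit bucket) */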
C doesn't specify how to handle right-shifting negative numbers. There are two common alternatives. In the first, called the logical right shift, a right-shift of a negative number inserts zeroes at the top end. The second, the arithmetic right shift, also does this for positive numbers and zero, but it inserts one bits at the top end for a negative number. Arithmetic right shifts are not as intuitive, but they correspond better to dividing by a power of 2, since negative numbers remain negative: If you take −24 and arithmetically right-shift it 2 places, you get −6. Because this is so useful, programmers usually prefer it, and so most compilers do an arithmetic right shift for >>.
(Java resolves this ambiguity by adding a logical right-shift operator >>> in addition to the arithmetic right-shift operator >>. Thus if c holds 11…111101100 representing −20₁₀, a Java compiler would compute c >> 2 to be 1111…1111011 representing −5₁₀, but it would compute c >>> 2 as 0011…1111011 representing 1073741819₁₀.)
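C has no >>> operator, but we can always get a logical shift by casting to an unsigned type first, since C shifts unsigned values logically. Here is a minimal sketch, assuming a 32-bit int.
int c = -20;
printf("%d\n", c >> 2);
/* on most compilers prints -5, via an arithmetic shift */
printf("%u\n", (unsigned int) c >> 2);
/* prints 1073741819: the cast forces a logical shift */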
The bit operators fit into the operator precedence hierarchy as follows, from highest precedence at the top to lowest at the bottom.
~ ! + -   (unary operators)
* / %
+ -
<< >>   (and, in Java, >>>)
< > <= >=
== !=
&
^
|
&&
||
?:   (conditional operator)
= *= /= %= += -= <<= >>= &= ^= |=
At the bottom level, you can see that the bitwise and shift operators can be combined with the assignment operator, much like +=.
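For example, the following fragment uses two of these compound operators.
int flags = 0x23;
flags &= 0x0F;
/* same as flags = flags & 0x0F; flags is now 0x03 */
flags <<= 2;
/* same as flags = flags << 2; flags is now 0x0C */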
So why would one want to use bitwise operators? One of the most important applications is to use an integer to hold a set of flags, where each bit corresponds to a different “fact.” A prime example of this comes from the Linux file system, where each file has a number called its mode that indicates who can access the file. If the 4's bit is set, then anybody can read the file; if the 2's bit is set, then anybody can modify the file; if the 128's bit is set, then the person who “owns” the file can modify it; and if the 256's bit is set, then the person who “owns” the file can read it. (The meanings for other bits are more complex.)
You might retrieve this integer in a program, but inevitably you will want to decode it. For example, what if our C program wants to check whether the owner has permission to modify a file? We would want to check whether the 128's bit is set. The most straightforward way to do this is to use the AND operator.
if ((mode & 128) != 0) {
By doing a bitwise AND with 128, we are making it so that only the 128's bit will remain: all other bits in 128 are 0, and anything ANDed with 0 is 0. The result will be 0 if mode's 128's bit is 0, since all other bits become 0 when ANDed with 128; and if mode's 128's bit is 1, the result will be nonzero. So our if statement compares the result of the bitwise AND with zero.
To a casual observer, the appearance of 128 here seems quite meaningless. So the header files define constants that correspond to each of the bits, allowing us to refer to the bits in a less obscure way. For owner write permission, the constant 128 is named S_IWUSR: the W stands for write permission, and USR stands for the user; the S and I are simply to ensure that the constant's name is distinct from all other constants' names. So we'd instead write the following.
if ((mode & S_IWUSR) != 0) {
When we create a new file, we might want to indicate the permissions for this new file. For this, we would use the OR operator to combine several bits together.
creat("new_file.txt",
S_IWUSR | S_IRUSR | S_IROTH);
(Yes, the function is named creat. Ken Thompson was once asked if he would do anything differently if he were to design UNIX again. He responded, “I'd spell creat with an e.”)
This application of binary representation and bit operators extends beyond just file permissions. Three additional examples: An integer representation can indicate which of the mouse buttons and modifier keys are down, so when a program is responding to pressing a mouse button, it can retrieve a single number indicating the current combination, like control-right-click or shift-alt-left-click. Java's libraries also use flags for indicating whether a font is plain, bold, italic, or bold italic. And Python's re library uses flags for options on how regular expressions should be interpreted.
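Building such a flag set of our own requires nothing more than constants occupying distinct bits. The STYLE_* names below are hypothetical, not taken from any library.
#define STYLE_BOLD   1  /* the 1's bit */
#define STYLE_ITALIC 2  /* the 2's bit */

int style = STYLE_BOLD | STYLE_ITALIC;
/* combine flags with OR */
if ((style & STYLE_ITALIC) != 0) {
    /* test a flag with AND, just as with the mode bits */
}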
Let's now ponder a popular programmers' puzzler: Write a function countOnes() that, given an int value, returns the number of bits in that number that are set to 1. For example, countOnes(21) should return 3, since 21's binary representation 10101₂ has three one bits in it. In fact, I once visited Microsoft to interview for an internship, and one interviewer spent our appointment discussing various solutions to this question. (Many software companies like to spend interviews gauging interest and mastery using problems like this.)
The simplest technique is to right-shift the number 32 times, counting the number of times we shift a one bit off.
int countOnesA(int num) {
int ret = 0;
int cur = num;
int i;
for (i = 0; i < 32; i++) {
ret += cur & 1;
cur >>= 1;
}
return ret;
}
A similar alternative checks each bit of the number from right to left.
int countOnesB(int num) {
int ret = 0;
int mask = 1;
int i;
for (i = 0; i < 32; i++) {
if ((num & mask) != 0) ret++;
mask <<= 1;
}
return ret;
}
Note the importance of the parentheses in the if condition: If they were omitted, then the compiler would interpret the expression as “num & (mask != 0)”, since the operator hierarchy places != above &. Because C treats Boolean expressions as integers, this would compile fine, but it would not be what is desired. Many programmers always place parentheses around the Boolean operators because their precedence can result in such unexpected behavior.
In my interview at Microsoft, I composed essentially countOnesB(), which impressed the interviewer well enough. But then he showed me a different technique, which I have to admit is pretty clever. It begins with the observation that, when you subtract 1 from a number, all of the lowest bits change, up to and including the lowest 1 bit, but the rest of the bits stay the same. So if I take the bitwise AND of n with n − 1, essentially I remove the lowest one bit from n. For example, if n is 10100₂, then n − 1 is 10011₂, and n & (n − 1) is 10000₂.
Once we observe this, we have only to write code that counts how many times we can remove the final bit in this way before we reach a number with no 1 bits at all (i.e., 0).
int countOnesC(int num) {
int ret = 0;
int cur = num;
while (cur != 0) {
cur &= cur - 1;
ret++;
}
return ret;
}
Much later, I heard of yet another technique, which is yet more clever.
int countOnesD(int num) {
int ret;
ret = (num & 0x55555555)
+ ((num >> 1) & 0x55555555);
ret = (ret & 0x33333333)
+ ((ret >> 2) & 0x33333333);
ret = (ret & 0x0F0F0F0F)
+ ((ret >> 4) & 0x0F0F0F0F);
ret = (ret & 0x00FF00FF)
+ ((ret >> 8) & 0x00FF00FF);
ret = (ret & 0x0000FFFF)
+ ((ret >> 16) & 0x0000FFFF);
return ret;
}
The first assignment statement begins with num & 0x55555555. The appearance of 0x55555555 may seem rather random, but recall that its binary representation alternates between zeroes and ones: 0101010101…. The bitwise AND of num with this number gives a version of num where every other bit (the 2's bit, the 8's bit, the 32's bit) has been removed and the remaining bits (1's, 4's, 16's, …) are kept. We'll call this result a.
The right side of this first addition, which we'll call b, involves right-shifting num once and then taking the bitwise AND of the result with 0x55555555. This includes all the bits omitted from a, but shifted so that they line up with a's bits, in the 1's place, 4's place, 16's place, and so on.
This first addition then adds these two results a and b together. The result, placed into ret, has each pair of bits holding the sum of that pair of bits in the original value of num.
Let's look at an 8-bit example, 01110010.
num                     01110010
a = num & 0x55          01010000
b = (num >> 1) & 0x55   00010001
ret = a + b             01100001
The first two bits from num are 0 and 1, and the first two bits from the result are 01, which is 0 + 1. The next two bits from num are 1 and 1, and the next two bits from the result are 10, which is 1 + 1. And so on.
Notice what has happened here: In a single 32-bit addition operation, we have managed to perform 16 different bit-additions at the same time.
Now we proceed to add pairs of bits. Finding ret & 0x33333333 zeroes out every other pair of bits, leaving only the bottom pair (1's and 2's places), the third pair from the bottom (16's and 32's places), and so on. And (ret >> 2) & 0x33333333 finds all the remaining pairs, shifted so they are parallel to ret & 0x33333333. We now add these two groups of pairs together, and we end up with each group of four bits containing the sum of the two pairs previously in those four bits. Let's continue with our example.
ret                     01100001
a = ret & 0x33          00100001
b = (ret >> 2) & 0x33   00010000
ret = a + b             00110001
Now the first four bits (0011) contain the sum of the first pair of bits from the original ret (01) and the second pair (10); and the second four bits (0001) contain the sum of the third pair (00) and the fourth pair (01). In the original value of num, three of the first four bits were 1, and now the first four bits contain the number 3 in binary; likewise, the lower four bits in the original num contained one 1 bit, and now the lower four bits contain the number 1 in binary. This is not a coincidence.
The countOnesD() function continues, adding the groups of four bits together, then the groups of eight bits together, and finally the two groups of sixteen bits together. At the end, we have the total number of 1 bits in the original number.
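In our 8-bit example, just one more step remains, adding the two four-bit counts; the final result 00000100₂ = 4 matches the four 1 bits in the original 01110010.
ret                     00110001
a = ret & 0x0F          00000001
b = (ret >> 4) & 0x0F   00000011
ret = a + b             00000100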
The countOnesD() technique is clever, but is it really faster? To compare the efficiency of each of the techniques, we'll count the total number of operations in each function, ignoring assignments.
For countOnesA(), each iteration includes five operations: a less-than comparison for the for loop, a bitwise AND and addition in the first line of the loop's body, a right-shift in the second line, and an addition as we increment i for the next iteration. With five operations per iteration for each of 32 bits, we have 32 ⋅ 5 = 160 operations.
For countOnesB(), each iteration includes five or six operations: a less-than comparison for the for loop, a bitwise AND and a not-equals comparison for the if statement, possibly an addition if we execute the if statement's body, a left-shift operation for the second line of the loop's body, and an addition as we increment i for the next iteration. We don't know how often the if statement's body will be executed, but let's estimate that half of the bits will be 1's. The number of operations then would be 16 ⋅ 5 + 16 ⋅ 6 = 176 operations.
For countOnesC(), each iteration includes four operations: a not-equals comparison for the while condition, a subtraction and a bitwise AND for the loop's first statement, and an addition as we increment ret. The number of iterations will match the number of 1 bits in the number, which we estimate will be true for 16 of the 32 bits. So there would be 16 ⋅ 4 = 64 operations.
Finally, for countOnesD(), each assignment statement involves four operations: two bitwise ANDs, a right-shift, and an addition. There are five assignment statements, for 5 ⋅ 4 = 20 operations.
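Summarizing these estimates:
countOnesA()   160 operations
countOnesB()   176 operations
countOnesC()    64 operations
countOnesD()    20 operations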
Based on this analysis, we'd surmise that countOnesD() would be about three times faster than countOnesC(), which itself is almost three times faster than countOnesA() or countOnesB().
That's the theoretical approach, but you might wonder about the actual performance. I ran a test involving running each of these four functions across 131,072 random numbers, using a Linux computer with an Intel Core i5 750 processor [test code]. Here is the average time per function call.
countOnesA()   68.8 ns
countOnesB()   83.1 ns
countOnesC()   16.0 ns
countOnesD()    0.9 ns
While the times here don't match the exact ratios our theory predicted, the conclusion remains the same: The cleverness of countOnesD() pays off.
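For the curious, a timing test along these lines reproduces the comparison. This is only a sketch of the methodology, not the linked test code itself; the 100 passes over the array are an arbitrary choice to make the elapsed time easily measurable.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define COUNT 131072
#define PASSES 100

int countOnesC(int num); /* any of the four functions, defined as above */

int main(void) {
    static int nums[COUNT];
    long total = 0;
    int i, pass;
    clock_t start, stop;

    for (i = 0; i < COUNT; i++) {
        nums[i] = rand(); /* note: rand() may not cover all 32 bits */
    }

    start = clock();
    for (pass = 0; pass < PASSES; pass++) {
        for (i = 0; i < COUNT; i++) {
            total += countOnesC(nums[i]);
        }
    }
    stop = clock();

    /* report the average time per call in nanoseconds */
    printf("%ld ones counted, %.1f ns/call\n", total,
        (double) (stop - start) * 1e9 / CLOCKS_PER_SEC / (COUNT * (double) PASSES));
    return 0;
}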