Counting Bits

24 Aug 2019

What's the best way of counting the number of 1-bits in an integer?

On modern hardware, have your CPU do it for you. On Intel CPUs, the popcnt instruction is what you want. I learned from reading Hacker's Delight that some architectures call counting the 1-bits "population count", which presumably inspired the name of the Intel instruction.

Let's try to count the number of 1-bits in a 64-bit integer on an Intel CPU.

I'm trying to keep this as simple as possible (for me; I'm not an assembly guru), so I'll purposefully keep the number of 1-bits low, so that I can just return the count as the exit value from my little assembler program.

The following went into a file named popcnt.s:

.section .text
.globl _start
_start:
# I'll put my test value in one of the new-ish 64-bit registers.
# I would have preferred to just use a literal arg to popcnt, but
# apparently the instruction does not support that. Also,
# thanks, GNU Assembler, for having binary literals!
movq $0b1000000010000000100000001000000010000000100000001000000010000000, %r11
# Count the number of 1-bits in %r11 and put the answer in %rdi
popcntq %r11, %rdi
# Move the number 60 to %rax in preparation for a syscall.
# Syscall number 60 is exit. The exit syscall takes the exit status
# in %rdi, which popcntq already filled in above.
movq $60, %rax
syscall

Build the little assembly program...

$ as --gstabs popcnt.s -o popcnt.o
$ ld popcnt.o -o popcnt

Run the little assembly program and ask for the exit status:

$ ./popcnt
$ echo $?
8

And that's how we use popcnt.

In Go 1.13rc1, we can also use a binary literal (thanks, Go Team!) and we can use bits.OnesCount64() to count the number of 1-bits:

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	var n uint64 = 0b10000000_10000000_10000000_10000000_10000000_10000000_10000000_10000000
	fmt.Printf("OnesCount64(%064b) = %d\n", n, bits.OnesCount64(n))
}

$ go1.13rc1 build
$ ./popcnt 
OnesCount64(1000000010000000100000001000000010000000100000001000000010000000) = 8

Let's ask Go to show us the assembly code for the program.

$ go1.13rc1 tool compile -S main.go > main.S

When we look in main.S, we do indeed see the popcnt instruction:

        0x0049 00073 (main.go:10)       POPCNTQ AX, CX

Cool!

Because of CPU instruction support for counting 1-bits, the rest of this discussion is moot. But the book Hacker's Delight really does show a neat way of counting the number of 1-bits in an integer. There was a clearer example on Stack Overflow that really made this click for me.

Let's say I have the byte 10110110, and I want to count the number of 1-bits in it.

The approach explained in Hacker's Delight is to count the 1-bits in each pair of bits, then add neighboring pair counts to get a count for each group of four bits (each quad), and finally add the two quad counts together. It's a basic divide-and-conquer algorithm, but seeing it work in practice makes it way clearer.

As I said, the first step is to count the number of 1-bits in each pair of bits. For our example number, 10110110, that means we want to count the number of 1-bits in each of these pairs:

10 11 01 10

The interesting thing is that we can store the sum of each pair using only two bits! After all, each pair can contain only 0, 1, or 2 1-bits, and 0, 1, and 2 in binary are 00, 01, and 10 respectively. So not only can we sum the 1-bits for each pair, we can overwrite each pair with the sum! And we can sum all 4 pairs in parallel.
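To convince myself that a pair's count really fits back into the pair, here is a minimal Go sketch that walks through all four possible pairs. The helper name pairCount is my own invention for illustration; it is not from Hacker's Delight.

```go
package main

import "fmt"

// pairCount is a hypothetical helper: it returns the number of 1-bits
// in the low two bits of p by adding the low bit to the high bit.
func pairCount(p uint8) uint8 {
	return (p & 1) + ((p >> 1) & 1)
}

func main() {
	// There are only four possible pairs: 00, 01, 10, and 11.
	for p := uint8(0); p < 4; p++ {
		c := pairCount(p)
		fmt.Printf("pair %02b has %d 1-bits, stored as %02b\n", p, c, c)
	}
}
```

The largest count, 2, is 10 in binary, so no pair ever needs a third bit.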

First, we need to arrange the bits in those 4 pairs so that we can add them together. Essentially, what we want to do is add all of the odd bits to all of the even bits. If we find a way to arrange the bits of this number

10 11 01 10

like so,

0  1  1  0
1  1  0  1

then we could just sum them together like so:

 0  1  1  0
 1  1  0  1
----------
01 10 01 01

And if we could only arrange this result like so:

10 01
01 01

then we could sum those together like so:

10 01
01 01
-----
11 10

And if we could only arrange that result like so:

10
11

then we could sum to get our final answer like so:

  10
  11
----
 101

which is 5 in decimal, and the number of 1-bits in the number we started with!

Happily, it turns out there is a way to do this. If we want to take our starting value, 10110110, and sum each pair of bits, we can do it like so.

First, isolate all the odd bits by masking them, like so:

   10110110
 & 01010101
   --------
   00010100 <-- odd bits

Second, shift all of the even bits so that we can place them "under" the odd bits for summing...

   10110110
      >> 1
   --------
   01011011

...but don't forget to isolate only the even bits! (Yes, I realize the even bits have now shifted into the odd place, but bear with me.)

   01011011
 & 01010101
   --------
   01010001 <-- even bits

Now we have two sets of bits, even and odd, which are "aligned", so we can now stack one on top of the other and sum them.

   00010100 <-- odd bits
 + 01010001 <-- even bits
   --------
   01100101 <-- result A

If we stack our original pairs of bits on top of their sums, we will see that each pair of bits in our result holds the number of 1-bits in the starting pair above it.

   10 11 01 10 <-- starting bits
   01 10 01 01 <-- result A, from above
    1  2  1  1 <-- each pair's sum in decimal
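The first step can be sketched in Go as a small function. The name pairSums is just my label for this step, and I've written the mask in hex (0x55 is 01010101) so the sketch doesn't depend on binary literals.

```go
package main

import "fmt"

// pairSums replaces every 2-bit pair in x with the number of 1-bits
// that pair contained. (Hypothetical helper name, for illustration.)
func pairSums(x uint8) uint8 {
	const mask = 0x55 // 01010101
	return (x & mask) + ((x >> 1) & mask)
}

func main() {
	start := uint8(0xB6) // 10110110, the example byte
	a := pairSums(start)
	fmt.Printf("start:    %08b\n", start)
	fmt.Printf("result A: %08b\n", a)
	// Walk the pairs from left to right and show each pair's sum.
	for shift := 6; shift >= 0; shift -= 2 {
		pair := (start >> uint(shift)) & 3
		sum := (a >> uint(shift)) & 3
		fmt.Printf("pair %02b -> %d\n", pair, sum)
	}
}
```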

Progress! The next step is to take each pair of sums, and sum those. The number of places we need to shift doubles, and the bitmask pattern used to isolate each sum changes too.

   01100101 <-- result A, from above
 & 00110011
   --------
   00100001 <-- odd sums

   01100101 <-- result A, from above
      >> 2
   --------
   00011001 <-- shift result

   00011001 <-- shift result from right above
 & 00110011
   --------
   00010001 <-- even sums

   00100001 <-- odd sums
 + 00010001 <-- even sums
   --------
   00110010 <-- result B

If we were successful, we added two pairs of sums together:

      binary | decimal
     10   01 | 2 1 <-- odd sums
 +   01   01 | 1 1 <-- even sums
   --------- | ---
   0011 0010 | 3 2

   1011 0110 <-- starting bits
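Here is the same second step as a hedged Go sketch. It starts from result A (01100101, or 0x65) rather than the original byte, and the name quadSums is again only my own label for this step.

```go
package main

import "fmt"

// quadSums takes a byte whose 2-bit pairs already hold per-pair counts
// and replaces each 4-bit quad with the sum of its two pair counts.
// (Hypothetical helper name, for illustration.)
func quadSums(a uint8) uint8 {
	const mask = 0x33 // 00110011
	return (a & mask) + ((a >> 2) & mask)
}

func main() {
	resultA := uint8(0x65) // 01100101, the pair sums for 10110110
	b := quadSums(resultA)
	fmt.Printf("result A: %08b\n", resultA)
	fmt.Printf("result B: %08b\n", b)
	fmt.Printf("left quad: %d, right quad: %d\n", b>>4, b&0x0F)
}
```

The left quad of the starting bits, 1011, has three 1-bits and the right quad, 0110, has two, which is what the two halves of result B should report.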

Looks good! Now all we have to do is add our final two sums together. Once more, we double our shift amount, and change the bit mask pattern.

   00110010 <-- result B from above
 & 00001111
   --------
   00000010 <-- right sum

   00110010 <-- result B from above
      >> 4
   --------
   00000011 <-- left sum shifted

   00000011 <-- left sum shifted from right above
 & 00001111
   --------
   00000011 <-- left sum

   00000010 <-- right sum
 + 00000011 <-- left sum
   --------
   00000101 <-- final sum, 5 in decimal

And there we are! The starting bits were 10110110, and there are indeed five 1-bits.

In Go 1.13 code, the above would look like this:

package main

import (
	"fmt"
)

func main() {
	var startBits uint8 = 0b10110110
	x := startBits
	x = (x & 0b01010101) + ((x >> 1) & 0b01010101)
	x = (x & 0b00110011) + ((x >> 2) & 0b00110011)
	x = (x & 0b00001111) + ((x >> 4) & 0b00001111)
	fmt.Printf("Number of 1-bits in %08b: %d\n", startBits, x)
}

$ go1.13rc1 build
$ ./popcnt 
Number of 1-bits in 10110110: 5

In the Go example, I only count the 1-bits in a byte. For the more common cases of 32-bit or 64-bit numbers, you can extend the pattern above: keep doubling the shift and widening the masks until a single sum fills the whole word. The code can also be simplified (and therefore sped up) further, which books like Hacker's Delight will show you how to do. Of course, I'll return to the advice from the top: modern CPUs have an instruction to do this, so always favor that.
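For what it's worth, here is a sketch of how I would extend the pattern to 64 bits, checked against bits.OnesCount64. The function names popcount64 and popcount64Short are my own. The second function is one commonly seen shortened form of the same idea; I'm not claiming it matches the book's listing line for line.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcount64 extends the byte-sized pattern to 64 bits: the shift
// doubles at every step and the mask widens to match.
func popcount64(x uint64) int {
	x = (x & 0x5555555555555555) + ((x >> 1) & 0x5555555555555555)
	x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
	x = (x & 0x0f0f0f0f0f0f0f0f) + ((x >> 4) & 0x0f0f0f0f0f0f0f0f)
	x = (x & 0x00ff00ff00ff00ff) + ((x >> 8) & 0x00ff00ff00ff00ff)
	x = (x & 0x0000ffff0000ffff) + ((x >> 16) & 0x0000ffff0000ffff)
	x = (x & 0x00000000ffffffff) + ((x >> 32) & 0x00000000ffffffff)
	return int(x)
}

// popcount64Short is a shortened variant (an illustration, not a quote
// from any book). It relies on three observations:
//   - subtracting the high bit of each pair from the pair leaves the count
//   - a byte's count is at most 8, so two quad sums can be added before masking
//   - multiplying by 0x0101... adds all eight byte counts into the top byte
func popcount64Short(x uint64) int {
	x -= (x >> 1) & 0x5555555555555555
	x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
	x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0f
	return int((x * 0x0101010101010101) >> 56)
}

func main() {
	tests := []uint64{0, 1, 0x8080808080808080, 0xffffffffffffffff}
	for _, n := range tests {
		fmt.Printf("%016x: pattern=%d short=%d bits.OnesCount64=%d\n",
			n, popcount64(n), popcount64Short(n), bits.OnesCount64(n))
	}
}
```

All three columns should agree on every line. If they ever don't, trust bits.OnesCount64.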