3.20.2017

Benford's law

Benford's law or the "First-digit" law states that for a naturally occurring set of numbers, the first digit of these numbers would follow a frequency distribution as -

Let us check this on a sample data set of catalogue data for all known exoplanets. If we calculate the percentage frequency distribution of the first digit for each planets period of revolution, this is what we get -
That's pretty close to the Benford's distribution!

3.18.2017

Mod

x mod y is defined as the remainder you get after dividing the largest number smaller than equal to 'x' that is perfectly divisible by 'y'. Division is a costly operation even on modern processor architectures.

For cases when y=2^n, we could remodel the mod operation using bitwise operations as -
Note:

Population count of a bitset

Population count of a bitset is defined as number of bits set to 1.

The most obvious approach would be to scan every bit and check for its status and maintain a counter for the set bits -

This code runs in O(k) where k is the number of bits. It can be little better by replacing the (i < 64) condition by (b!=0) to stop when b no longer has any set bits.

A better approach suggested by Brian Kernighan was to keep resetting LS1B (Lease Significant 1 Bit) until the bitset becomes 0.
We can reset the LS1B in O(1) which means this algorithm runs in O(r) where r is the number is set bits.

Another approach is to use pre-calculated lookup tables. For n bit long bitset, there are 2^n possible numbers that can be generated using them. If we pre-calculate the population count for the entire set of 2^64 numbers, it will pollute the cache and besides it won't be memory efficient. But we can use Divide and Conquer to create 8 groups of 8 bits each. And for each group of 8 bits, there are 2^8=256 possible numbers for which we can easily generate a pre-calculated lookup table. We can do this as - Now all we need is to get groups of 8 bits in the LS8B (Lease Significant 8 Bit) positions of the bit set (while the top (MSB) 56 bits are all reset) and then find its popcnt by using the lookup table as - Note: