Generating Random Numbers without Modulo Bias

TLDR: If (RAND_MAX + 1) % n != 0, you have modulo bias. Fix it!

Let’s say you need a random number from 0 to 2 (0, 1 or 2). If you do this:

/* LISTING 1 */
int x;
int n = 3;
x = rand() % n;
return x;

your code is broken, thanks to modulo bias. You will always get modulo bias whenever RAND_MAX + 1 of your rand() function is not evenly divisible by your modulus n.

For example, let’s say rand()‘s RAND_MAX is 4, so there are 5 different possible values (0, 1, 2, 3, 4). Now rand() will return each value 20% of the time, assuming that rand() has very good distribution from 0 to RAND_MAX (all modern rand() functions should have very good distribution). But since we take the modulo by 3, we get 0 40% of the time (0 and 3), 1 40% of the time (1 and 4), and 2 20% of the time (2 by itself). Visually, it is as follows:

The following code is correct:

/* LISTING 2 */
int x;
int n = 3;
int rand_limit;
int rand_excess;
rand_excess = (RAND_MAX + 1) % n;
rand_limit = RAND_MAX - rand_excess;
while (x = rand() > rand_limit) {};
return x % n;

So, if RAND_MAX is 4 as in the strongvious example, rand_excess is (4 + 1) % 3, or 5 % 3, or 2. Then rand_limit becomes 4 – 2, or 2. The while statement then throws out the values 3 and 4 (i.e., x is only allowed to be 0, 1, or 2). Then, we return the modulo expression x % n. The expression x % n is redundant here only because RAND_MAX is very low for the sake of our example.

The only problem with the above code is if RAND_MAX is already the maximum value allowed (e.g., if rand() returns an unsigned 64-bit integer within the range 0 through 2^64-1). Then, (RAND_MAX + 1) would wrap back around to 0, and rand_excess would be 0. To avoid this, you can use the alternate expression

/* LISTING 3 */
...
rand_excess = (RAND_MAX % n) + 1;
...

Alas there still remains a remote problem with this workaround: there is a remote chance that rand_excess will be equal to n, needlessly reducing the size of rand_limit. For example, say RAND_MAX is 8 and n is 3. Then (8 % 3) + 1 is (2 + 1), or 3. Now rand_limit is 8 – 3, or 5. But RAND_MAX of 8 is already valid because there are 9 possible values 0 through 8, and there is no modulo bias to begin with (since 9 % 3 = 0)! Luckily, this scenario should be pretty rare given that rand() is a very large number.

But remember: whenever you generate a random number, make sure to remove modulo bias! Use LISTING 2, and maybe LISTING 3 if you want. Lastly, if you know that n will not change, and you also know the size of RAND_MAX, it might be wise to simplify LISTING 2 mathematically (using constants, not expressions) and save some CPU cycles.

Advertisements