TLDR: If **(RAND_MAX + 1) % n != 0**, you have modulo bias. Fix it!

Let’s say you need a random number from 0 to 2 (0, 1 or 2). If you do this:

/* LISTING 1 */ int x; int n = 3; x = rand() % n; return x;

your code is broken, thanks to modulo bias. You will always get modulo bias whenever **RAND_MAX + 1** of your **rand()** function is not evenly divisible by your modulus **n**.

For example, let’s say **rand()**‘s **RAND_MAX** is 4, so there are 5 different possible values (0, 1, 2, 3, 4). Now **rand()** will return each value 20% of the time, assuming that **rand()** has very good distribution from 0 to **RAND_MAX** (all modern **rand()** functions should have very good distribution). But since we take the modulo by 3, we get **0** 40% of the time (0 and 3), **1** 40% of the time (1 and 4), and **2** 20% of the time (2 by itself). Visually, it is as follows:

The following code is correct:

/* LISTING 2 */ int x; int n = 3; int rand_limit; int rand_excess; rand_excess = (RAND_MAX + 1) % n; rand_limit = RAND_MAX - rand_excess; while (x = rand() > rand_limit) {}; return x % n;

So, if **RAND_MAX** is 4 as in the strongvious example, **rand_excess** is (4 + 1) % 3, or 5 % 3, or 2. Then **rand_limit** becomes 4 – 2, or 2. The **while** statement then throws out the values 3 and 4 (i.e., x is only allowed to be 0, 1, or 2). Then, we return the modulo expression x % n. The expression x % n is redundant here only because **RAND_MAX** is very low for the sake of our example.

The only problem with the above code is if **RAND_MAX** is already the maximum value allowed (e.g., if **rand()** returns an unsigned 64-bit integer within the range 0 through 2^64-1). Then, (**RAND_MAX** + 1) would wrap back around to 0, and rand_excess would be 0. To avoid this, you can use the alternate expression

/* LISTING 3 */ ... rand_excess = (RAND_MAX % n) + 1; ...

Alas there still remains a remote problem with this workaround: there is a remote chance that **rand_excess** will be equal to **n**, needlessly reducing the size of **rand_limit**. For example, say **RAND_MAX** is 8 and **n** is 3. Then (8 % 3) + 1 is (2 + 1), or 3. Now rand_limit is 8 – 3, or 5. But **RAND_MAX** of 8 is already valid because there are 9 possible values 0 through 8, and there is no modulo bias to begin with (since 9 % 3 = 0)! Luckily, this scenario should be pretty rare given that **rand()** is a very large number.

But remember: whenever you generate a random number, make sure to remove modulo bias! Use LISTING 2, and maybe LISTING 3 if you want. Lastly, if you know that **n** will not change, and you also know the size of **RAND_MAX**, it might be wise to simplify LISTING 2 mathematically (using constants, not expressions) and save some CPU cycles.