Benford’s law: learning to fraud or to detect frauds?

Originating author is Christiane Rousseau.
It is very risky to alter too many numbers in a financial statement if one does not know some mathematics. Indeed, the numbers appearing in financial statements most often follow a strange mathematical rule, called Benford’s law, or the law of the first significant digit. If one forgets to follow the rule, then the numbers will fail some statistical tests and are likely to be scrutinized with care. Benford’s law claims that if you collect numbers at random and calculate the frequencies of their first significant digits, the numbers with first significant digit $1$ should appear around $30$% of the time, while the numbers with first significant digit $9$ appear only about $4.6$% of the time. The same rule is observed in many other sets of numbers, like the powers of $2$ or the Fibonacci numbers.

Why?

We now have satisfying explanations. We are going to share them with you.

Benford’s law concerns the distribution of the first significant digits of numbers. The first significant digit of a positive number is the leftmost nonzero digit of its decimal expansion. For instance, the first significant digit of $\pi$ is $3$, that of $2371.5$ is $2$, and that of $0.00563$ is $5$. Another way to define it, which will be useful for our mathematical discussion, is to write a positive real number $x$ as a number $m \in [1, 10)$ times a power of $10$:

$x = m 10^n ~ , ~~ n \in \mathbb{Z}.$

Then the first significant digit of $x$ is the integer part of $m$ , which can be denoted by $\lfloor m \rfloor$ . The number $m$ is called the mantissa of $x$ . We now claim that if you collect numbers at random and compute the frequency $B(i)$ of the first significant digit $i$ , then $B(i)$ is approximately given by $\log_{10} (1 + \frac{1}{i})$ . This gives us the frequencies:

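Table 1: Frequencies of Benford’s law.

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| $B(i)$ | 30.1% | 17.6% | 12.5% | 9.7% | 7.9% | 6.7% | 5.8% | 5.1% | 4.6% |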

Figure 1: Frequencies B(i) of Benford’s law.
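To make these definitions concrete, here is a minimal Python sketch of the mantissa, the first significant digit and the frequencies $B(i)$. The function names `mantissa`, `first_digit` and `benford` are our own illustrative choices, and the small correction step only guards against floating-point rounding near powers of $10$.

```python
import math

def mantissa(x: float) -> float:
    """Return m in [1, 10) such that x = m * 10**n for some integer n (requires x > 0)."""
    if x <= 0:
        raise ValueError("x must be positive")
    n = math.floor(math.log10(x))
    m = x / 10.0**n
    # Floating-point rounding can push m just outside [1, 10); nudge it back.
    if m >= 10:
        m /= 10
    elif m < 1:
        m *= 10
    return m

def first_digit(x: float) -> int:
    """First significant digit of x > 0: the integer part of its mantissa."""
    return math.floor(mantissa(x))

def benford(i: int) -> float:
    """Benford frequency B(i) = log10(1 + 1/i) of the first significant digit i."""
    return math.log10(1 + 1 / i)

for x in (math.pi, 2371.5, 0.00563):
    print(x, first_digit(x))
for i in range(1, 10):
    print(i, f"{100 * benford(i):.1f}%")
```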

Let us now give a very brief historical note. The phenomenon was first discovered by the astronomer Simon Newcomb (1835-1909), who noticed that the first pages of logarithmic tables, corresponding to small first significant digits, looked much more worn than the later pages. His discovery was forgotten, and the law was rediscovered by Frank Benford (1883-1948) around 1938. Frank Benford collected tens of thousands of numbers from all sorts of sources, and they followed his law. The modern database of Simon Plouffe, which contains $215$ million mathematical constants, also follows Benford’s law.

Many sets of numbers that are not random also follow Benford’s law. This is the case for the populations of countries, the areas of countries, the lengths of rivers, etc. Maybe you will stop me and become skeptical… In which units are these lengths or areas measured? Are lengths in miles or in kilometers? This does not matter… If the lengths of rivers in kilometers follow Benford’s law, then the lengths in miles follow Benford’s law! A change of unit corresponds to a change of scale. We will see that Benford’s law is invariant under change of scale. Moreover, it is the only probability law on the mantissa which is invariant under change of scale.

Figure 2: Some data approximately following Benford’s law: areas of countries in square kilometers, areas of countries in square miles, and populations of countries.

I told you in the introduction that the Fibonacci numbers follow Benford’s law. But, in a sense, Benford’s law is subjective, since it depends on the base $10$ in which we write our numbers. In a base $b$ with $b \neq 10$, the nonzero digits are the elements of the set $\{1, \dots, b-1\}$, and Benford’s law in base $b$ says that the frequency of the first significant digit $i$ is $B_b (i) = \log_b (1+\frac{1}{i})$. Well! Fibonacci numbers follow Benford’s law in any base $b$! Benford’s law is invariant under change of base. And it is the only nontrivial probability law which is invariant under change of base.

Now it is time to come to the explanations. They will require you to remember some of your probability course. But you may prefer to experiment on your own first, before reading the more serious maths.
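If you want to experiment, a short Python sketch like the following counts the leading digits of the first $1000$ Fibonacci numbers in bases $10$ and $8$, and of the first $1000$ powers of $2$ in base $10$, and compares them with $B_b(i) = \log_b(1+\frac{1}{i})$. The helper names are hypothetical; exact integer arithmetic is used, so very large numbers pose no rounding problem.

```python
import math
from collections import Counter

def leading_digit(n: int, base: int = 10) -> int:
    """Leading digit of the positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n

def fibonacci(count: int) -> list:
    """The first `count` Fibonacci numbers 1, 1, 2, 3, 5, 8, ..."""
    fibs, a, b = [], 1, 1
    for _ in range(count):
        fibs.append(a)
        a, b = b, a + b
    return fibs

def digit_frequencies(numbers, base: int = 10) -> dict:
    """Relative frequency of each possible leading digit 1..base-1."""
    counts = Counter(leading_digit(n, base) for n in numbers)
    return {d: counts[d] / len(numbers) for d in range(1, base)}

def benford_base(i: int, base: int = 10) -> float:
    """Benford's law in base b: B_b(i) = log_b(1 + 1/i)."""
    return math.log(1 + 1 / i, base)

fibs = fibonacci(1000)
powers_of_two = [2**k for k in range(1000)]
for base in (10, 8):
    freq = digit_frequencies(fibs, base)
    print(f"Fibonacci, base {base}:")
    for d in range(1, base):
        print(f"  {d}: observed {freq[d]:.3f}, Benford {benford_base(d, base):.3f}")
freq = digit_frequencies(powers_of_two, 10)
print("Powers of 2, base 10:", {d: round(p, 3) for d, p in freq.items()})
```

One caveat worth trying: in base $8$ the powers of $2$ only ever start with $1$, $2$ or $4$, since $2^{3j+r} = 8^j \cdot 2^r$. So, unlike the Fibonacci numbers, the powers of $2$ do not follow Benford’s law in every base.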

1. Invariance under change of scale

Let us consider a simple change of scale obtained by multiplying all numbers of a set by $2$. The numbers with first significant digit $1$ are changed into numbers with first significant digit either $2$ or $3$. It is easy to check that $B(1) = B(2) + B(3)$. Indeed,

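$B(2) + B(3) = \log_{10} \frac{3}{2} + \log_{10} \frac{4}{3} = \log_{10} \left( \frac{3}{2} \cdot \frac{4}{3} \right) = \log_{10} 2 = B(1).$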
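The same kind of identity can be checked numerically for every digit affected by doubling. This small sketch assumes only the formula $B(i) = \log_{10}(1+\frac{1}{i})$: doubling sends a leading digit $k \leqslant 4$ to $2k$ or $2k+1$, and sends the leading digits $5$ to $9$ to $1$.

```python
import math

def benford(i: int) -> float:
    """Benford frequency B(i) = log10(1 + 1/i)."""
    return math.log10(1 + 1 / i)

# Doubling maps leading digit k (k = 1..4) onto leading digits 2k or 2k+1,
# so Benford's law requires B(k) = B(2k) + B(2k+1).
for k in range(1, 5):
    assert math.isclose(benford(k), benford(2 * k) + benford(2 * k + 1))

# Doubling maps leading digits 5..9 onto leading digit 1, so B(1) = B(5) + ... + B(9).
assert math.isclose(benford(1), sum(benford(d) for d in range(5, 10)))
print("doubling preserves the Benford frequencies")
```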

Similarly, you could check that $B(2) = B(4)+B(5)$, etc. But how do you manage if you change from miles to kilometers, i.e. multiply numbers by $1.6$? As stated, Benford’s law is too restrictive and we need to generalize it. What does it mean that the first significant digit is $i$? It means that the mantissa $m$ belongs to the interval $[i, i + 1)$. So Benford’s law only prescribes the probabilities of nine intervals for the mantissa; it is only a partial description of its distribution. The generalized Benford law (which, by abuse of language, we will also call Benford’s law) on the mantissa is given by a density function on the interval $[1, 10)$. When we pick a number at random, we can compute its mantissa. This gives us a random variable $M$ taking values in $[1, 10)$. We say that it follows Benford’s law if its density function is given by

$f(x) = \left\{ \begin{array}{ll} \frac{1}{x \ln 10}, & x \in [1 , 10 ), \\ 0, & \mbox{otherwise.} \end{array}\right.$

If $P(a \leqslant M < b)$ represents the probability that $a \leqslant M < b$ , then this means that we must have

$P( a \leqslant M < b ) = \int_a^b f(x) d x.$

It is really a generalization of Benford’s law since

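$P(i \leqslant M < i+1) = \int_i^{i+1} \frac{dx}{x \ln 10} = \frac{\ln (i+1) - \ln i}{\ln 10} = \log_{10} \left(1 + \frac{1}{i}\right) = B(i).$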
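As a sanity check of this integral, one can integrate the density numerically. The sketch below uses a plain composite midpoint rule (our choice; any quadrature rule would do) and compares the result with $\log_{10}(1+\frac{1}{i})$.

```python
import math

def benford_density(x: float) -> float:
    """Density f(x) = 1 / (x ln 10) on [1, 10), and 0 elsewhere."""
    return 1 / (x * math.log(10)) if 1 <= x < 10 else 0.0

def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

for i in range(1, 10):
    p = integrate(benford_density, i, i + 1)
    print(i, round(p, 6), round(math.log10(1 + 1 / i), 6))
```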

What does it mean to say that a random variable $X$ on $[1, 10)$ is invariant under change of scale? It means that if $c$ is a positive real number and $Y = cX$, then the mantissa $M$ of $Y$ has the same density function as $X$. It is not difficult to show that this is the case when $X$ follows Benford’s law, but there are several cases to distinguish. We will do one case and let you do the others. We can write $c = m 10^r$, with $m \in [1, 10)$ the mantissa of $c$ and $r \in \mathbb{Z}$. Since the mantissa of $cX$ is the same as that of $mX$, it suffices to consider the case $c \in [1, 10)$.
What tool do we use to show this? You may remember from your probability course that the (cumulative) distribution function is sometimes more useful than the density function for a continuous random variable. The distribution function of a random variable $M$ is defined as

$F(x) = P(M \leqslant x).$

If $X$ follows Benford’s law, then its distribution function is given by

$F(x) = \left\{ \begin{array}{ll} 0, & x < 1, \\ \log_{10} x, & x \in [1 , 10), \\ 1, & x \geqslant 10. \end{array}\right. \qquad (1)$

So we must show that if $X$ follows Benford’s law and $M$ is the mantissa of $cX$ for $c \in [1, 10)$ , then the distribution function of $M$ is given by (1).

For that purpose, we need to calculate $P(M \leqslant z)$ for $z \in [1, 10)$. Since $X \in [1, 10)$, the product $cX$ takes values inside $[c, 10c)$. So $M = cX$ when $cX < 10$, and $M = cX / 10$ when $cX \geqslant 10$. Let us treat the case $z < c$. For the mantissa of $cX$ to lie inside $[1, z] \subset [1, c)$, the only possibility is that $cX \in [10, 10z]$, and then the mantissa of $cX$ is equal to $cX / 10$. Hence,

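$P(M \leqslant z) = P(10 \leqslant cX \leqslant 10z) = P\left(\frac{10}{c} \leqslant X \leqslant \frac{10z}{c}\right) = \log_{10} \frac{10z}{c} - \log_{10} \frac{10}{c} = \log_{10} z,$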

as expected. The other cases are done in the same way.
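This computation can also be illustrated by simulation. The sketch below assumes we sample Benford mantissas by inverse transform sampling, $X = 10^U$ with $U$ uniform on $[0, 1)$ (valid because $u \mapsto 10^u$ is the inverse of $F(x) = \log_{10} x$ on $[1, 10)$). It multiplies them by $c = 1.6$ and compares the empirical distribution function of the new mantissa with $\log_{10} z$. The seed and the sample size are arbitrary choices.

```python
import math
import random

def sample_benford(n: int, rng: random.Random) -> list:
    """Draw n mantissas following Benford's law by inverse transform sampling:
    if U is uniform on [0, 1), then 10**U has distribution function log10 on [1, 10)."""
    return [10 ** rng.random() for _ in range(n)]

def mantissa(x: float) -> float:
    """Mantissa in [1, 10) of x > 0, via the fractional part of log10(x)."""
    return 10 ** (math.log10(x) % 1.0)

def empirical_cdf(values, z: float) -> float:
    """Fraction of the values that are <= z."""
    return sum(v <= z for v in values) / len(values)

rng = random.Random(2012)
xs = sample_benford(100_000, rng)
c = 1.6  # the miles-to-kilometers factor used in the text
ms = [mantissa(c * x) for x in xs]
for z in (1.5, 2, 3, 5, 8):
    print(f"z={z}: empirical P(M <= z) = {empirical_cdf(ms, z):.3f}, "
          f"log10(z) = {math.log10(z):.3f}")
```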

The converse is more exciting…

2. Benford’s law is the only probability law on the mantissa which is invariant under change of scale

This may sound like an impressive statement! Yet you will see that its proof is not much more complicated than the previous argument. Let $X$ be a random variable representing the mantissa and taking values inside $[1, 10)$. We look for its distribution function $F(x)$, under the hypothesis that $X$ is invariant under change of scale. So we need to compute

$F(x) = P(X \leqslant x) = P(1 \leqslant X \leqslant x).$

Hence, we must have $F(1) = 0$ and $F(10) = 1$ (assuming, as throughout, that $X$ has a density, so that $P(X = 1) = 0$).
The main difficulty of the proof lies in interpreting what it means that $X$ is invariant under change of scale. Since the events $1 \leqslant X \leqslant x$ and $c \leqslant cX \leqslant cx$ are the same, we have

$P(1 \leqslant X \leqslant x) = P(c \leqslant cX \leqslant cx) = F(x). \qquad (2)$

As before, consider some $c \in [1, 10)$ such that $cx < 10$ (so the admissible values of $c$ depend on $x$). Then, for $c \leqslant cX \leqslant cx$, $cX$ is equal to its mantissa. Since $X$ is invariant under change of scale, the mantissa of $cX$ has the same distribution function as $X$. Hence,

$P(c \leqslant cX \leqslant cx) = F(cx) - F(c).$

Combining with (2) we see that $F(x)$ satisfies

$F(x) = F(cx) - F(c),~~~ F(1) = 0,~~ F(10) = 1, \qquad (3)$

provided that $c \in [1, 10)$ satisfies $cx < 10$. We must find $F$ from the functional equation (3). Let’s see how to do this. If we let $c = 1 + \varepsilon$ with $\varepsilon > 0$ small, this yields

$F(x) = F(x(1 + \varepsilon)) - F(1 + \varepsilon)$

which can be written

$\frac{F(x(1 + \varepsilon)) - F(x)}{x \varepsilon} = \frac{F(1 + \varepsilon) - F(1)}{x \varepsilon},$

since $F(1) = 0$. Let us take the limit when $\varepsilon \longrightarrow 0$, assuming that $F$ is differentiable. We recognize on each side a quotient whose limit is a derivative. On the left side it is $\frac{F(x + x \varepsilon) - F(x)}{x \varepsilon}$, the limit of which is $F'(x)$, and on the right side it is $\frac{1}{x}$ times $\frac{F(1 + \varepsilon) - F(1)}{\varepsilon}$, which tends to $\frac{F'(1)}{x}$. Hence, we get the differential equation with separable variables

$F'(x) = \frac{F'(1)}{x},$

the solution of which is $F(x) = F'(1) \ln x + C$ . Since $F(1) = 0$ we have $C = 0$ , and since $F(10) = 1$ , then $F'(1) = \frac{1}{\ln 10}$ . Hence, $F(x) = \frac{\ln x}{\ln 10} = \log_{10} x$ and we are done!
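The functional equation (3) can also be tested directly. In this sketch, given purely for illustration, we compare $F(x) = \log_{10} x$ with the distribution function $(x-1)/9$ of a mantissa uniform on $[1, 10)$, on a grid of pairs $(x, c)$ with $cx < 10$; only the logarithm satisfies the equation.

```python
import math

def F_benford(x: float) -> float:
    """Distribution function of Benford's law on [1, 10)."""
    return math.log10(x)

def F_uniform(x: float) -> float:
    """Distribution function of a mantissa uniform on [1, 10), for comparison."""
    return (x - 1) / 9

def max_violation(F, grid: int = 20) -> float:
    """Largest |F(x) - (F(cx) - F(c))| over grid points x, c in [1, 10) with cx < 10."""
    worst = 0.0
    for a in range(grid):
        for b in range(grid):
            x = 1 + 9 * a / grid
            c = 1 + 9 * b / grid
            if c * x < 10:
                worst = max(worst, abs(F(x) - (F(c * x) - F(c))))
    return worst

print(max_violation(F_benford))  # tiny: only floating-point rounding
print(max_violation(F_uniform))  # clearly positive: the uniform law is not scale invariant
```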

3. Why do numbers collected from all sorts of sources follow Benford’s law?

An answer was provided by Theodore Hill in 1995, and we will briefly discuss his idea. Of course, not all sets of numbers follow Benford’s law. For instance, if you consider the heights of humans in meters then, with a few exceptions, only the first significant digits $1$ and $2$ will occur, and if you convert the heights into feet (a foot is approximately $30$ cm) you will change the distribution of the first significant digit. So this set of numbers is not invariant under change of scale. But suppose we consider a large set of numbers coming from all sorts of sources, and we change the scale. The set contains different subsets of numbers, each with its own particular scale. Because the set is large and the numbers come from all sorts of sources, most likely all scales are present. Multiplying all numbers in the set by a positive constant simply permutes the scales present in the set. So, as a whole, we can expect the set of numbers to behave as if it had no particular scale. Hence, since Benford’s law is the only scale-invariant law on the mantissa, the set will follow Benford’s law.
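Hill’s argument is a theorem about random mixtures of distributions; the following is only a toy simulation in that spirit, under our own assumptions. Each hypothetical "source" has its own scale $s$, spread over six orders of magnitude, and reports values uniform on $[s/2, 3s/2]$, a narrow range like human heights. A single such source is far from Benford, but the pooled data are close to it and stay close after converting meters to feet, whereas one narrow source (toy heights between $1.5$ and $2$ m) does not.

```python
import math
import random
from collections import Counter

def first_digit(x: float) -> int:
    """First significant digit of x > 0, via the fractional part of log10(x)."""
    return int(10 ** (math.log10(x) % 1.0))

def digit_freqs(values) -> list:
    """Relative frequencies of the first significant digits 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

def pooled_sample(rng: random.Random, sources: int = 20_000, per_source: int = 5) -> list:
    """Toy model (our assumption, not Hill's theorem): each source has its own scale s,
    spread over six orders of magnitude, and its values are uniform on [s/2, 3s/2]."""
    data = []
    for _ in range(sources):
        s = 10 ** rng.uniform(-3, 3)
        data.extend(rng.uniform(0.5 * s, 1.5 * s) for _ in range(per_source))
    return data

rng = random.Random(1995)
pooled = pooled_sample(rng)
heights_m = [rng.uniform(1.5, 2.0) for _ in range(10_000)]  # toy adult heights in meters
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
FOOT = 0.3048  # meters per foot

for name, data in [("pooled", pooled), ("heights", heights_m)]:
    print(name, "meters:", [round(p, 3) for p in digit_freqs(data)])
    print(name, "feet:  ", [round(p, 3) for p in digit_freqs([v / FOOT for v in data])])
print("Benford:       ", [round(p, 3) for p in benford])
```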

This explanation works for sets of numbers coming from all sorts of sources. But it does not explain why areas of countries, populations of countries, or lengths of rivers should follow Benford’s law. We will discuss very recent explanations (2008!) of this case, given by Gauvrit, Delahaye and Fewster. Their explanation is also valid for large sets of numbers coming from all sorts of sources.

4. Sets of numbers spreading over several orders of magnitude are likely to follow Benford’s law!

We are working in base $10$ and we have seen that positive numbers $x$ can be written as $x = m 10^n$, where $m \in [1, 10)$ and $n \in \mathbb{Z}$. We can consider $n$ as the order of magnitude of $x$, and we say that a set of numbers spreads over several orders of magnitude if several values of $n$ occur in it. (Note that this property is invariant under change of scale!) To simplify the explanation, suppose that the numbers lie within $[1, 10^6)$. Then the numbers with first significant digit $1$ are the ones in the set

$S_1 = [1, 2) \cup [10,20) \cup [100,200) \cup [1000, 2000) \cup [10^4 , 2 \times 10^4 ) \cup [10^5 , 2 \times 10^5),$

and similar sets $S_i$ for the other digits. It is more convenient to work with the base-$10$ logarithm of these numbers, $y = \log_{10} x$. Then $y = \log_{10} m + n$, with $\log_{10} m \in [0, 1)$. Let us show that if a random variable $M$ on $[1, 10)$ obeys Benford’s law, then the random variable $Z = \log_{10} M$ is simply uniform on $[0, 1)$. For that, it suffices to show that the distribution function of $Z$ is that of the uniform random variable on $[0, 1)$, namely

$F(z) = \left\{ \begin{array}{ll} 0, & z < 0, \\ z, & z \in [0 , 1), \\ 1, & z \geqslant 1. \end{array}\right.$

Indeed, when $z \in [0, 1)$ ,

$P(Z \leqslant z) = P(0 \leqslant \log_{10} M \leqslant z) = P(1 \leqslant M \leqslant 10^z) = \log_{10} 10^z = z.$

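If $x$ belongs to the set $S_1$, then $y = \log_{10} x$ belongs to the set $T_1 = \log_{10} S_1$:

$T_1 = [0, \log_{10} 2) \cup [1, 1 + \log_{10} 2) \cup [2, 2 + \log_{10} 2) \cup [3, 3 + \log_{10} 2) \cup [4, 4 + \log_{10} 2) \cup [5, 5 + \log_{10} 2),$
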
and similarly for the other digits. Suppose that picking a random number in our set is modeled by a random variable $X$ taking values inside $[1, 10^6)$. Then $Y = \log_{10} X$ takes values inside $[0, 6)$. Recall that the probability that a random variable belongs to some set is equal to the area under the graph of its density function over that set. If the density function $f$ of $Y$ over $[0, 6)$ were uniform, as in Figure 3(a), we would be done. Most often, however, it is not, as in Figure 3(b). This is why it is so important that the original set of numbers spreads over several orders of magnitude. The portions corresponding to a given first significant digit $i$ are spread horizontally over several segments, whose total length is exactly the fraction $\log_{10} (1 + \frac{1}{i})$ of the total width. So, even if the height of $f(y)$ is not the same from one segment to the other, one can hope that the mean height is roughly the same for the different digits. When this occurs, the data follow Benford’s law.

Figure 3: The areas corresponding to the frequencies of the first significant digits 1, 2, 3 and 4 for two different density functions of Y: (a) a uniform density function f; (b) a non-uniform density function f. The values of the corresponding areas are plotted in Figure 4.

Figure 4: (a) The density function f of Figure 3(b); (b) the areas under the curve for the first significant digits 1, 2, 3 and 4, for f and for the uniform density function. The areas for f are pretty close to those given by Benford’s law, which correspond to a uniform density function for Y.
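The mechanism of Figure 3 can be computed exactly for a concrete density. Purely as an assumption for illustration (this is not the density of the figures), take $Y = \log_{10} X$ normal with mean $3$ and standard deviation $\sigma$, so that $X$ is log-normal. The probability of first digit $i$ is then the sum, over the segments $[n + \log_{10} i, n + \log_{10}(i+1))$, of the normal probabilities. A narrow density (small $\sigma$) is far from Benford’s law, while one spread over a few orders of magnitude is extremely close to it.

```python
import math

def normal_cdf(y: float, mu: float, sigma: float) -> float:
    """Distribution function of a normal random variable with mean mu and std sigma."""
    return 0.5 * (1 + math.erf((y - mu) / (sigma * math.sqrt(2))))

def digit_probs_lognormal(mu: float, sigma: float) -> list:
    """P(first digit = i), i = 1..9, when Y = log10(X) is Normal(mu, sigma):
    sum the probability of Y over the pieces [n + log10(i), n + log10(i + 1))."""
    lo = math.floor(mu - 10 * sigma)   # beyond 10 sigma the tails are negligible
    hi = math.ceil(mu + 10 * sigma)
    probs = []
    for i in range(1, 10):
        a, b = math.log10(i), math.log10(i + 1)
        probs.append(sum(normal_cdf(n + b, mu, sigma) - normal_cdf(n + a, mu, sigma)
                         for n in range(lo, hi + 1)))
    return probs

benford = [math.log10(1 + 1 / i) for i in range(1, 10)]
for sigma in (0.1, 0.5, 1.0):
    probs = digit_probs_lognormal(3.0, sigma)
    gap = max(abs(p - q) for p, q in zip(probs, benford))
    print(f"sigma = {sigma}: max deviation from Benford = {gap:.2e}")
```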

5. How to test if a set of numbers follows Benford’s law?

If you have taken a statistics course, then you probably studied the $\chi^2$ goodness-of-fit test. This test allows you to check whether some data follow a given probability distribution. Suppose you want to run the test on a set of $n$ numbers. You just need to construct a table, in which $n_i$ is the count of numbers in your set which have first significant digit $i$. Of course, $n = n_1 + ... + n_9$. The $N_i$ are the expected counts of numbers with first significant digit $i$ if your set followed Benford’s law, namely $N_i = nB(i)$.

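| First significant digit $i$ | $1$ | $2$ | $\dots$ | $9$ |
|---|---|---|---|---|
| Observed count $n_i$ | $n_1$ | $n_2$ | $\dots$ | $n_9$ |
| Expected count $N_i = nB(i)$ | $N_1$ | $N_2$ | $\dots$ | $N_9$ |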

Table 2: The table for the $\chi^2$ goodness-of-fit test.

You then calculate

$\chi^2 = \sum \limits_{i=1}^9 {\frac{(n_i - N_i)^2}{N_i}},$

and you look in a table of the $\chi^2$ distribution at the row corresponding to $8$ degrees of freedom (nine digits minus one constraint, since the counts add up to $n$). If you make the test at the $5$% significance level, then you accept that the data fit Benford’s law if $\chi^2 < 15.51$, and you reject it otherwise. This is a quick recipe, but if you plan to run such tests with your students, take some time to become familiar with the details of the test and its meaning.
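The recipe above translates into a few lines of Python. This is a sketch: the function names are hypothetical, only positive values are counted, and the critical value $15.51$ is the one quoted in the text. As test data we use the powers of $2$, and, as a counterexample, numbers spread evenly over a single order of magnitude.

```python
import math
from collections import Counter

# Critical value of the chi-square distribution with 8 degrees of freedom
# at the 5% level, as quoted in the text.
CRITICAL_VALUE = 15.51

def first_digit(x) -> int:
    """First significant digit of a positive number, read off its decimal form."""
    if isinstance(x, int):
        return int(str(x)[0])   # exact for arbitrarily large integers
    return int(f"{x:e}"[0])     # scientific notation, e.g. '5.630000e-03'

def benford_chi2(values) -> float:
    """Chi-square statistic: sum over i of (n_i - N_i)**2 / N_i, with N_i = n B(i)."""
    digits = [first_digit(v) for v in values if v > 0]
    n = len(digits)
    observed = Counter(digits)
    chi2 = 0.0
    for i in range(1, 10):
        expected = n * math.log10(1 + 1 / i)  # N_i = n B(i)
        chi2 += (observed[i] - expected) ** 2 / expected
    return chi2

def fits_benford(values) -> bool:
    """Accept Benford's law at the 5% level when chi2 < 15.51."""
    return benford_chi2(values) < CRITICAL_VALUE

powers_of_two = [2**k for k in range(1, 1001)]
evenly_spread = [1 + 9 * k / 1000 for k in range(1000)]  # only one order of magnitude
for name, data in [("powers of 2", powers_of_two), ("evenly spread in [1, 10)", evenly_spread)]:
    print(f"{name}: chi2 = {benford_chi2(data):.2f}, fits Benford: {fits_benford(data)}")
```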

6. Invariance of Benford’s law under change of base

This can be modeled in a similar way to the invariance under change of scale. It is trickier, though, since we cannot limit our work to the mantissa. Indeed, if $x = m 10^n$, then the factor $10^n$ also needs to be converted to the new base. In fact, the main difficulty is to express in mathematical terms what it means for a random variable to be invariant under change of base. We skip the details.

7. Conclusion

Benford’s law is fascinating: it defies intuition, and it is something that you can test yourself and also adapt for a classroom activity. It used to be a curiosity, but it is now a standard tool to detect fraud. Of course, more and more tax evaders are learning about it. But pay attention: the first significant digit is not the only thing to take care of. The generalized Benford’s law allows one to derive a law for the second significant digit, the third significant digit, etc. You can try to find it by yourself: just think about the union of intervals in which the mantissa of a number must lie for its second significant digit to be $i$.
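If you want to check your answer to this exercise, here is a sketch that computes the resulting law under the generalized Benford law. The mantissa has second significant digit $d$ (now $d$ ranges over $0, \dots, 9$) exactly when it lies in the union over $k = 1, \dots, 9$ of the intervals $[k + \frac{d}{10}, k + \frac{d+1}{10})$, and each interval has probability $\log_{10}(k + \frac{d+1}{10}) - \log_{10}(k + \frac{d}{10})$. The function name is our own.

```python
import math

def second_digit_prob(d: int) -> float:
    """P(second significant digit = d), d in 0..9, under Benford's law on the mantissa:
    sum over first digits k = 1..9 of P(k + d/10 <= M < k + (d+1)/10)."""
    return sum(math.log10(k + (d + 1) / 10) - math.log10(k + d / 10) for k in range(1, 10))

for d in range(10):
    print(d, f"{100 * second_digit_prob(d):.2f}%")
```

The second digit is much closer to uniform than the first: the probabilities decrease only from about $12$% for $0$ to about $8.5$% for $9$.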

2 Responses to Benford’s law: learning to fraud or to detect frauds?

Steve Harrington says:

October 6, 2012 at 8:25 am

There’s an Excel spreadsheet to investigate Benford’s Law here

Andy Edwards says:

October 11, 2012 at 2:47 am

Nice vignette, this one. Maybe a bit too algebraic and generalised a bit too soon but the bases are all covered pretty well. If you think about it, Benford’s law is not all that counterintuitive providing the numbers are randomly selected across many orders of magnitude and no limits are applied to curtail the frequency of some of the leading digits.
Taking the lengths of all the rivers in the world as an example, this should follow Benford’s Law irrespective of whether you measure them in miles or kilometres. But how long does a watercourse have to be to be properly considered a river? Make the minimum cutoff point 100 km and the very large number of the shortest rivers between 100 and 160 km will all have 1 as their leading digit. But when their lengths are converted into miles, none of them will. All will be between 62.1 and 99.4 miles, distorting the leading-digit count in favour of 6, 7, 8 and 9 at the expense of 1, 2, 3 and especially 4 and 5. Benford’s Law, which may have seemed to apply in kilometres, will no longer apply. This is because placing the arbitrary minimum has de-randomised the numbers very powerfully.
The lengths of the half-lives of the radioactive isotopes of all elements follow Benford’s Law quite nicely, mainly because they vary from thousandths of a second up to trillions of years across about twenty orders of magnitude.
Beware the examples which are sometimes given of people whose fraud was detected because the cheques they presented or the total weekly takings figures of a store they managed contravened Benford’s Law. There are compelling reasons why the weekly takings might all start with the same digit (from 10000 to 19000 dollars) or vary across a range which doesn’t match the Benford’s Law proportions (from 50000 to 80000 dollars). This is because not enough orders of magnitude are spanned for the law to hold.
