Elsevier

Physics Letters A

Volume 380, Issues 22–23, 20 May 2016, Pages 1895-1899

Ubiquity of Benford's law and emergence of the reciprocal distribution

https://doi.org/10.1016/j.physleta.2016.03.045

Highlights

  • We apply the Law of Total Probability to the construction of scale-invariant pdf's, and require that probability measures be dimensionless and unitless under a continuous change of scales.

  • Iterating this procedure for an arbitrary set of normalized pdf's again produces scale-invariant distributions.

  • The invariant function of this iteration is given uniquely by the reciprocal distribution, suggesting a kind of universality.

  • Requiring maximum entropy for uniformly binned size-class distributions also leads uniquely to the reciprocal distribution.

  • We discuss some applications of the above to computation and to the evolution of genomes.

Abstract

We apply the Law of Total Probability to the construction of scale-invariant probability distribution functions (pdf's), and require that probability measures be dimensionless and unitless under a continuous change of scales. If the scale-change distribution function is scale invariant then the constructed distribution will also be scale invariant. Repeated application of this construction on an arbitrary set of (normalizable) pdf's results again in scale-invariant distributions. The invariant function of this procedure is given uniquely by the reciprocal distribution, suggesting a kind of universality. We separately demonstrate that the reciprocal distribution results uniquely from requiring maximum entropy for size-class distributions with uniform bin sizes.

Introduction

In 1881 [1] the astronomer and mathematician Simon Newcomb observed that the front pages of tables of logarithms were more worn than later pages. In other words mantissas corresponding to quantities that had a smaller first digit were more common than for quantities with a larger first digit. He argued that the distribution of “typical” mantissas was therefore logarithmic. The physicist Frank Benford [2] rediscovered this in 1938 and provided more detail, for which his name is now associated with this phenomenon.

By now it is well documented that the frequency of first digits $D$ in the values of quantities randomly drawn from an “arbitrary” sample follows Benford's Law of Significant Digits, namely,

$$B_b(D) = \frac{\ln(1+D) - \ln(D)}{\ln(b)} = \int_D^{1+D} \frac{dx}{x \ln(b)}, \tag{1}$$

where $b$ is the arbitrary base for the logarithms and is commonly taken to be 10. We note that the probability of first digit 1 for base 10 is $\log_{10}(2) \approx 0.30$, far exceeding that for a uniform distribution of digits. The rightmost expression in Eqn. (1) expresses Newcomb's and Benford's logarithmic distribution as the cumulative distribution function (cdf) based on the reciprocal probability distribution function (pdf), which has been normalized to 1. The pdf that underlies Benford's Law is therefore the reciprocal distribution, $r(x) = c/x$, with normalization constant $c = 1/\ln b$ when the random variable $x$ ranges between $1/b$ and 1. We note that Eqn. (1) is base invariant (i.e., invariant under a common change in the base of the various logarithms) and that the reciprocal pdf is scale invariant (a function $f(x)$ is said to be scale invariant if $f(\lambda x) = \lambda^{p} f(x)$ for any $p \in \mathbb{C}$). In this work we will concentrate on the emergence of the reciprocal distribution under a variety of conditions. The invariant (or fixed-point) function of an iterative procedure applied to distribution functions that are invariant under a continuous change of scales will be shown to be the reciprocal distribution. Additionally, requiring maximum entropy for size-class distributions with uniformly distributed bin sizes leads to the same function.
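As a quick numerical check of Eqn. (1), the following minimal Python sketch evaluates the first-digit probabilities in closed form and compares them with the integral of the reciprocal pdf normalized on [1/b, 1]. The function names (`benford_probability`, `reciprocal_pdf`, `integrate`) and the midpoint-rule quadrature are illustrative choices, not taken from the paper. The integration limits D/b and (D+1)/b are the limits of Eqn. (1) rescaled into [1/b, 1], which leaves the result unchanged because the measure dx/x is invariant under a change of scale.

```python
import math


def benford_probability(digit: int, base: int = 10) -> float:
    """Closed form of Eqn. (1): probability that the first digit equals `digit`."""
    if not 1 <= digit < base:
        raise ValueError("digit must satisfy 1 <= digit < base")
    return (math.log(1 + digit) - math.log(digit)) / math.log(base)


def reciprocal_pdf(x: float, base: int = 10) -> float:
    """Reciprocal pdf r(x) = c / x with c = 1 / ln(base), normalized on [1/base, 1]."""
    return 1.0 / (x * math.log(base))


def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule quadrature (an illustrative choice; adequate for c / x)."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Compare the closed form with the integral of r(x) over [D/10, (D+1)/10].
for d in range(1, 10):
    closed_form = benford_probability(d)
    from_pdf = integrate(reciprocal_pdf, d / 10, (d + 1) / 10)
    print(f"D={d}  closed form={closed_form:.6f}  integral of r(x)={from_pdf:.6f}")
```

The nine probabilities telescope to ln(b)/ln(b) = 1, which is the normalization of r(x) on [1/b, 1] stated in the text.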

Very relevant to the discussion above is T.P. Hill's proof in 1995 [3], [4], [5], [6] that random samples chosen from random probability distributions are collectively described by the reciprocal distribution, which is the pdf for the logarithmic or Benford distribution. In Hill's words: “If distributions are selected at random (in any “unbiased” way) and random samples are then taken from each of these distributions the significant digits of the combined sample will converge to the logarithmic (Benford) distribution.” Because of this, the latter has been appropriately characterized as “the distribution of distributions,” as Hill's theorem is in some sense the obverse (counterpart) of the Central Limit Theorem for probability distributions with large numbers of samples.
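The flavor of Hill's statement can be illustrated numerically. The sketch below is not Hill's construction or proof; it is a toy experiment under stated assumptions. It picks a distribution family at random (uniform, exponential or half-normal, all arbitrary choices), models the “unbiased” selection by drawing the scale of each distribution log-uniformly over a whole number of decades, pools a few samples from each distribution, and compares the pooled first-digit frequencies with Benford's law. All function names and parameter values are hypothetical.

```python
import math
import random


def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive float."""
    return int(f"{x:.15e}"[0])


def benford(d: int) -> float:
    """Benford probability of first digit d in base 10."""
    return math.log10(1 + 1 / d)


def draw_positive(rng: random.Random, family: str) -> float:
    """One strictly positive sample from a unit-scale member of `family`."""
    while True:
        if family == "uniform":
            x = rng.random()
        elif family == "exponential":
            x = rng.expovariate(1.0)
        else:  # "half-normal"
            x = abs(rng.gauss(0.0, 1.0))
        if x > 0.0:
            return x


def pooled_first_digits(n_distributions=5000, samples_each=20, decades=6, seed=1):
    """Pooled first-digit frequencies (index 1..9) over randomly chosen distributions."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n_distributions):
        family = rng.choice(["uniform", "exponential", "half-normal"])
        # Assumption: "unbiased" selection is modelled as a log-uniform scale
        # spanning a whole number of decades.
        scale = 10.0 ** rng.uniform(0.0, decades)
        for _ in range(samples_each):
            counts[leading_digit(scale * draw_positive(rng, family))] += 1
    total = n_distributions * samples_each
    return [c / total for c in counts]


freqs = pooled_first_digits()
for d in range(1, 10):
    print(f"D={d}  pooled frequency={freqs[d]:.4f}  Benford={benford(d):.4f}")
```

No single family used here follows Benford's law on its own at a fixed scale; in this toy model the agreement of the pooled sample comes from the scale-neutral way the distributions are selected.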

Benford's Law has been found to hold in an extraordinary number and variety of phenomena in areas as diverse as physics [7], [8], [9], [10], [11], [12], genomics [13], engineering [14] and, among many others, forensic accounting [15]. Recently the number of examples where it applies has been expanding rather rapidly.

In the 1960s, the need to understand the constraints that finite word length imposes on computation, and its impact on round-off errors, was behind the interest of many, including R. Hamming [16], [17], in Benford's law.

Importantly, Hamming argued that repeated application of any of the four basic arithmetic operations (addition, subtraction, multiplication and division) to numbers leads to results whose distribution of leading floating-point digits approaches the logarithmic (Benford) distribution. Hamming further argued that if any one arithmetic operation involves a quantity already distributed according to the reciprocal distribution, r(x), then this and all subsequent operations will yield quantities whose pdf for the leading floating-point digits is the reciprocal distribution. Hamming called this property the “persistence of the reciprocal distribution”, although a better word might be contagiousness, since contact with the reciprocal distribution at any point in a calculational chain modifies the remaining chain irrevocably.
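Both of Hamming's observations can be probed with a short Monte Carlo experiment. The sketch below is illustrative only and covers multiplication alone, not all four operations. It measures the largest deviation of the first-digit frequencies from Benford's law for (a) a single uniform random number, (b) a product of ten uniform random numbers, and (c) a reciprocal-distributed number multiplied by a uniform one. The helper names, the choice of uniform factors, the sample size and the seed are assumptions made for this example.

```python
import math
import random


def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive float."""
    return int(f"{x:.15e}"[0])


def benford(d: int) -> float:
    """Benford probability of first digit d in base 10."""
    return math.log10(1 + 1 / d)


def max_deviation(samples) -> float:
    """Largest |observed - Benford| first-digit frequency over digits 1..9."""
    counts = [0] * 10
    n = 0
    for x in samples:
        counts[leading_digit(x)] += 1
        n += 1
    return max(abs(counts[d] / n - benford(d)) for d in range(1, 10))


def unit_uniform(rng: random.Random) -> float:
    """Uniform sample on (0, 1]; avoids an exact zero."""
    return 1.0 - rng.random()


def product_of_uniforms(rng: random.Random, n_factors: int) -> float:
    """Product of n_factors independent uniform samples."""
    x = 1.0
    for _ in range(n_factors):
        x *= unit_uniform(rng)
    return x


def reciprocal_sample(rng: random.Random) -> float:
    """Inverse-cdf sample of r(x) = 1 / (x ln 10) on [0.1, 1)."""
    return 10.0 ** (rng.random() - 1.0)


def experiment(n: int = 100_000, seed: int = 2) -> dict:
    rng = random.Random(seed)
    return {
        "1 factor": max_deviation(product_of_uniforms(rng, 1) for _ in range(n)),
        "10 factors": max_deviation(product_of_uniforms(rng, 10) for _ in range(n)),
        "reciprocal x uniform": max_deviation(
            reciprocal_sample(rng) * unit_uniform(rng) for _ in range(n)
        ),
    }


for label, deviation in experiment().items():
    print(f"{label:>22}: max deviation from Benford = {deviation:.4f}")
```

A single uniform number has equal first-digit frequencies of 1/9, so case (a) deviates strongly from Benford's law. Cases (b) and (c) should agree with it to within sampling noise, the latter after only one multiplication, which is the “persistence” (or contagiousness) described above.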

In this paper we use elementary methods to explore the connection between Benford's law, Hill's theorem and the “contagiousness” property of the reciprocal distribution. We will demonstrate this by constructing a simple but comprehensive class of probability distributions that depends on a single random variable that is dimensionless and unitless under a continuous change of scales. This class depends on an underlying pdf that is arbitrary, and which can be sampled in a manner consistent with Hill's Theorem. We further generalize this into an iterative procedure whose invariant functions are shown uniquely to be the reciprocal distribution, and which demonstrates Hamming's “contagiousness”. Uniqueness obtains because the arbitrary (or “random” in this sense) underlying pdf eliminates any particular solutions in the invariant functions and leaves only the general solution. Our procedure generalizes the work of Hamming [16], and to the best of our knowledge is both new and useful. We show alternatively, by invoking maximum entropy for a size-class distribution function, that the reciprocal distribution again obtains as the unique solution. We conclude by speculating on the universality and applications of these results, with particular emphasis on minimizing errors in computations of various types.

Section snippets

Invariance under changes in units and the law of total probability

In most scientific applications a stochastic variable x is assigned to the random values of some physical quantity. This quantity carries either physical dimensions (e.g., length or volume) or units (such as the number of base pairs in a genome). However, because it refers to probabilities, the probability measure F(x)dx that characterizes x must be dimensionless and unitless.

Hence, in order to remove units or dimensions from the measure it is necessary to introduce a parameter that results

Selected applications to computing and to minimal truncation errors

Scale invariance (which only restricts $h(x)$ to a general power law, $x^{-s}$) is thus seen to be a necessary condition for the emergence of the reciprocal pdf (uniquely $s=1$) and its cumulative distribution function that leads to Benford's Law of Significant Digits. A sufficient condition is that the reciprocal pdf emerge when in contact with arbitrary members of the set of normalizable pdf's. Alternatively, requiring an invariant-function solution to the “contagion” process leads immediately and

Acknowledgements

One of us (JP-M) would like to thank the everis Foundation and Repsol for generous support, and the Theoretical Division of Los Alamos National Laboratory for its hospitality. This work was carried out in part under the auspices of the National Nuclear Security Administration of the U.S. Department of Energy at Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396.

References (30)

  • X.J. Liu et al., Eur. Phys. J. A (2011)
  • L. Pietronero et al., Physica A (2001)
  • Ed. Bormashenko et al., Physica A (2016)
  • J. von Neumann, The Computer and the Brain (2000)
  • S. Newcomb, Am. J. Math. (1881)
  • F. Benford, Proc. Am. Philos. Soc. (1938)
  • T.P. Hill, Proc. Am. Math. Soc. (1995)
  • T.P. Hill, Stat. Sci. (1995)
  • A. Berger et al., Math. Intell. (2011)
  • A. Berger et al., Probab. Surv. (2011)
  • J.C. Pain, Phys. Rev. E (2008)
  • M.A. Moret et al., Int. J. Mod. Phys. (2006)
  • M. Sambridge et al., Geophys. Res. Lett. (2010)
  • J.L. Friar et al., PLoS ONE (2012)
  • S.W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing (1997)