Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes

Wolpert, David H.; Kinney, David

doi:10.1007/978-3-030-70354-7_10


Part of the book series: The Frontiers Collection ((FRONTCOLL))


Abstract

We present a computational model of mathematical reasoning according to which mathematics is a fundamentally stochastic process. That is, in our model, whether or not a given formula is deemed a theorem in some axiomatic system is not a matter of certainty, but is instead governed by a probability distribution. We then show that this framework gives a compelling account of several aspects of mathematical practice. These include: 1) the way in which mathematicians generate research programs, 2) the applicability of Bayesian models of mathematical heuristics, 3) the role of abductive reasoning in mathematics, 4) the way in which multiple proofs of a proposition can strengthen our degree of belief in that proposition, and 5) the nature of the hypothesis that there are multiple formal systems that are isomorphic to physically possible universes. Thus, by embracing a model of mathematics as not perfectly predictable, we generate a new and fruitful perspective on the epistemology and practice of mathematics.


Notes

1. Note that just like the authors of all other papers written about mathematics, we believe that the deductive reasoning in this essay is correct. The fact that we acknowledge the possibility of erroneous deductive reasoning, and that in fact the unavoidability of erroneous reasoning is the topic of this essay, doesn’t render our belief in the correctness of our reasoning about that topic any more or less legitimate than the analogous belief by those other authors.
2. This is equivalent to requiring that an NDR machine is a “sequential information source” [8]. In the current context, it imposes restrictions on how likely the NDR machine is to remove claims from the claims tape.
3. Note that even if a claims set C is small, it might only arise with non-negligible probability in large claims lists, i.e., claims lists produced after many iterations of the NDR machine. For example, this might happen in the NDR machine of the community of mathematicians if the claims in C would not even make sense to mathematicians until the community of mathematicians has been investigating mathematics for a long time.
4. Note the implicit convention that \({\overline{P}}(v \;|\; q)\) concerns the probability of a claims list containing a single claim in which the answer v arises for the precise question q, not the probability of a claims list that has an answer v in some claim, and that also has the question q in some (perhaps different) claim.
5. In general, even if a mathematician updates their beliefs in a Bayesian manner, the priors and likelihoods they use to do so may be “wrong”, in the sense that they differ from the ones used by the far-future community of mathematicians. The use of purely Bayesian reasoning, by itself, provides no advantage over using non-Bayesian reasoning, unless the subjective priors and likelihoods of the current community of mathematicians happen to agree with those of the far-future community of mathematicians. In the rest of this section we assume that there is such agreement. See [5, 23] for how to analyze expected performance of a Bayesian decision-maker once we allow for the possibility that the priors they use to make decisions differ from the real-world priors that determine the expected loss of their decision-making.
6. Note that this argument doesn’t require the answer distribution of the far-future community of mathematicians to be mistake-free. (The possibility that “correct” mathematics contains inconsistencies with some nonzero probability is discussed below, in Sect. 10.5.) Note also that the simple algebra leading from Eq. (10.7) to Eq. (10.12) would still hold even if q and/or \(q'\) were not currently an open question, and in particular even if one or both of them were in the current claims list C. However, in that case, the conclusion of the argument would not concern the process of abduction narrowly construed, since the conclusion would also involve the probability that the far-future community of mathematicians overturns claims that are accepted by the current community of mathematicians.
7. Technically the update function only needs to be defined on the “finitary” subset of \(R \times {\mathbb {Z}}\times \Lambda ^\infty \), namely, those elements of \(R \times {\mathbb {Z}}\times \Lambda ^\infty \) for which the tape contents have a non-blank value in only finitely many positions.

References

1. S. Aaronson, Why philosophers should care about computational complexity, in Computability: Turing, Gödel, Church, and Beyond, pp. 261–327 (MIT Press, 2013)
2. S. Arora, B. Barak, Computational Complexity: A Modern Approach (Cambridge University Press, 2009)
3. J.D. Barrow, Theories of Everything: The Quest for Ultimate Explanation (Clarendon Press, Oxford, 1991)
4. J.D. Barrow, Gödel and physics, in Kurt Gödel and the Foundations of Mathematics: Horizons of Truth, p. 255 (2011)
5. J.L. Carroll, A Bayesian decision theoretical approach to supervised learning, selective sampling, and empirical function optimization (2010)
6. R. Fagin, Y. Moses, J.Y. Halpern, M.Y. Vardi, Reasoning About Knowledge (MIT Press, 2003)
7. K. Gödel, On Undecidable Propositions of Formal Mathematics Systems (Institute for Advanced Study, 1934)
8. P. Grunwald, P. Vitányi, Shannon information and Kolmogorov complexity. arXiv preprint arXiv:cs/0410002 (2004)
9. D. Hilbert, Die Grundlagen der Mathematik, in Die Grundlagen der Mathematik, pp. 1–21 (Springer, 1928)
10. D. Hume, A Treatise of Human Nature (Courier Corporation, 2012). Book 1, Part 4, Section 1
11. P. Hut, M. Alford, M. Tegmark, On math, matter and mind. Found. Phys. 36(6), 765–794 (2006)
12. D. Lewis, Counterfactuals (Basil Blackwell, Oxford, 1973)
13. C.S. Peirce, Collected Papers of Charles Sanders Peirce, vol. 2 (Harvard University Press, 1960)
14. H. Poincaré, Mathematical creation. The Monist 321–335 (1910)
15. J. Schmidhuber, A computer scientist’s view of life, the universe, and everything, in Foundations of Computer Science, pp. 201–208 (Springer, 1997)
16. B. Settles, Active learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences (2009)
17. M. Tegmark, Is “the theory of everything” merely the ultimate ensemble theory? Ann. Phys. 270(1), 1–51 (1998)
18. M. Tegmark, The mathematical universe. Found. Phys. 38(2), 101–150 (2008)
19. M. Tegmark, The multiverse hierarchy. arXiv preprint arXiv:0905.1283 (2009)
20. M. Tegmark, Our Mathematical Universe: My Quest for the Ultimate Nature of Reality (Vintage, 2014)
21. S. Viteri, S. DeDeo, Explosive proofs of mathematical truths. arXiv preprint arXiv:2004.00055 (2020)
22. E.P. Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences, vol. 13, pp. 1–14 (1960)
23. D.H. Wolpert, The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)
24. D.H. Wolpert, The stochastic thermodynamics of computation. J. Phys. A Math. Theor. 52(19), 193001 (2019)
25. D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–82 (1997)


Author information

Authors and Affiliations

Santa Fe Institute, Santa Fe, NM, USA
David H. Wolpert & David Kinney
Complexity Science Hub Vienna, Vienna, Austria
David H. Wolpert
Arizona State University, Tempe, AZ, USA
David H. Wolpert


Corresponding author

Correspondence to David H. Wolpert.

Editor information

Editors and Affiliations

Physics Department, UC Santa Cruz, Santa Cruz, CA, USA
Anthony Aguirre
Foundational Questions Institute, Decatur, GA, USA
Zeeya Merali
Physics Department, Lancaster University, Lancaster, UK
David Sloan

Appendix A: Probabilistic Turing Machines

Perhaps the most famous class of computational machines are Turing machines. One reason for their fame is that it seems one can model any computational machine that is constructable by humans as a Turing machine. A bit more formally, the Church-Turing thesis states that “a function on the natural numbers is computable by a human being following an algorithm, ignoring resource limitations, if and only if it is computable by a Turing machine.”

There are many different definitions of Turing machines (TMs) that are “computationally equivalent” to one another. For us, it will suffice to define a TM as a 7-tuple \((R,\Lambda ,b,v,r^\varnothing ,r^A,\rho )\) where:

1. R is a finite set of computational states;
2. \(\Lambda \) is a finite alphabet containing at least three symbols;
3. \(b \in \Lambda \) is a special blank symbol;
4. \(v \in {\mathbb {Z}}\) is a pointer;
5. \(r^\varnothing \in R\) is the start state;
6. \(r^A \in R\) is the halt state; and
7. \(\rho : R \times {\mathbb {Z}}\times \Lambda ^\infty \rightarrow R \times {\mathbb {Z}}\times \Lambda ^\infty \) is the update function. It is required that for all triples (r, v, T), if we write \((r', v', T') = \rho (r, v, T)\), then \(v'\) differs from v by at most 1, and the vector \(T'\) is identical to the vector T in all components, with the possible exception of the component with index v (see Note 7).

We sometimes refer to R as the states of the “head” of the TM, and refer to the third argument of \(\rho \) as a tape, writing a value of the tape (i.e., of the semi-infinite string of elements of the alphabet) as T.

Any TM \((R,\Lambda ,b,v,r^\varnothing , r^A, \rho )\) starts with \(r = r^\varnothing \), the pointer set to a specific initial value (e.g., 0), and with T consisting of a finite contiguous set of non-blank symbols, with all other symbols equal to b. The TM operates by iteratively applying \(\rho \) until the computational state equals \(r^A\), at which time it stops; that is, any triple (r, v, T) with the head in the halt state is a fixed point of \(\rho \).
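The iteration just described can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration rather than part of the chapter: the state labels `"start"` and `"halt"`, the blank symbol `"b"`, the dictionary representation of the tape, and the toy update rule `rho`, which flips bits while moving right and halts at the first blank.

```python
BLANK = "b"  # hypothetical blank symbol b


def make_tape(symbols, start=0):
    """Represent T as a dict from integer position to symbol; unset cells are blank."""
    return {start + i: s for i, s in enumerate(symbols)}


def rho(r, v, tape):
    """Toy update function (an assumption): flip bits moving right, halt at a blank."""
    if r == "halt":
        return r, v, tape  # triples with the head in the halt state are fixed points
    symbol = tape.get(v, BLANK)
    if symbol == BLANK:
        return "halt", v - 1, tape  # step back onto the last non-blank cell
    new_tape = dict(tape)
    new_tape[v] = "1" if symbol == "0" else "0"
    return "start", v + 1, new_tape


def run(update, tape, v=0, max_steps=10_000):
    """Iterate the update function from the start state until the halt state."""
    r = "start"
    for _ in range(max_steps):
        if r == "halt":
            return r, v, tape
        r2, v2, tape2 = update(r, v, tape)
        # Check the constraint of condition (7): the pointer moves by at most 1
        # and only the cell under the pointer may change.
        assert abs(v2 - v) <= 1
        cells = set(tape) | set(tape2)
        assert all(
            tape.get(k, BLANK) == tape2.get(k, BLANK) for k in cells if k != v
        )
        r, v, tape = r2, v2, tape2
    raise RuntimeError("no halt within max_steps")


final_state, final_pointer, final_tape = run(rho, make_tape("0110"))
print("".join(final_tape[i] for i in range(4)))  # 1001
```

The `max_steps` cap is a practical guard only; a TM in general need not halt, so the sketch raises an error rather than looping forever.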

If running a TM on a given initial state of the tape results in the TM eventually halting, the largest blank-delimited string that contains the position of the pointer when the TM halts is called the TM’s output. The initial state of T (excluding the blanks) is sometimes called the associated input, or program. (However, the reader should be warned that the term “program” has been used by some physicists to mean specifically the shortest input to a TM that results in it computing a given output.) We also say that the TM computes an output from an input. In general, there will be inputs for which the TM never halts. The set of all those inputs to a TM that cause it to eventually halt is called its halting set.
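As a small illustration of this definition of output, the following Python sketch extracts the largest blank-delimited string containing the pointer position from a halted tape. The dictionary tape representation, the blank symbol `"b"`, and the function name `output_string` are assumptions for illustration; the convention of returning an empty string when the pointer rests on a blank is likewise an assumption, since the text does not address that case.

```python
BLANK = "b"  # hypothetical blank symbol


def output_string(tape, v, blank=BLANK):
    """Return the maximal run of non-blank symbols that contains position v.

    `tape` maps integer positions to symbols; positions not present are blank.
    """
    if tape.get(v, blank) == blank:
        return ""  # assumed convention: pointer on a blank gives empty output
    left = v
    while tape.get(left - 1, blank) != blank:
        left -= 1
    right = v
    while tape.get(right + 1, blank) != blank:
        right += 1
    return "".join(tape[i] for i in range(left, right + 1))


# Two non-blank strings separated by a blank; the pointer selects one of them.
halted_tape = {0: "1", 1: "0", 2: "b", 3: "1", 4: "1", 5: "1"}
print(output_string(halted_tape, 4))  # 111
print(output_string(halted_tape, 0))  # 10
```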

The set of triples that are possible arguments to the update function of a given TM is sometimes called the set of instantaneous descriptions (IDs) of the TM. Note that as an alternative to the definition in (7) above, we could define the update function of any TM as a map over an associated space of IDs.

In one particularly popular variant of this definition of TMs, the single tape is replaced by multiple tapes. Typically one of those tapes contains the input, one contains the TM’s output (if and) when the TM halts, and there are one or more intermediate “work tapes” that are in essence used as scratch pads. The advantage of using this more complicated variant of TMs is that it is often easier to prove theorems for such machines than for single-tape TMs. However, there is no difference in their computational power. More precisely, one can transform any single-tape TM into an equivalent multi-tape TM (i.e., one that computes the same partial function), and vice versa, as shown by Arora and Barak [2].

A universal Turing machine (UTM), M, is one that can be used to emulate any other TM. More precisely, in terms of the single-tape variant of TMs, a UTM M has the property that for any other TM \(M'\), there is an invertible map f from the set of possible states of the tape of \(M'\) into the set of possible states of the tape of M, such that if we:

1. apply f to an input string \(\sigma '\) of \(M'\) to fix an input string \(\sigma \) of M;
2. run M on \(\sigma \) until it halts;
3. apply \(f^{-1}\) to the resultant output of M;

then we get exactly the output computed by \(M'\) if it is run directly on \(\sigma '\).

An important theorem of computer science is that universal TMs exist. Intuitively, this just means that there exist programming languages that are “universal”, in that we can use them to implement any desired program in any other language, after appropriate translation of that program from that other language. The physical Church-Turing (CT) thesis considers UTMs, and we implicitly restrict attention to them as well.

Suppose we have two strings \(s^1\) and \(s^2\) where \(s^1\) is a proper prefix of \(s^2\). If we run the TM on \(s^1\), it can detect when it gets to the end of its input, by noting that the following symbol on the tape is a blank. Therefore, it can behave differently after having reached the end of \(s^1\) from how it behaves when it reaches the end of the first \(\ell (s^1)\) bits in \(s^2\). As a result, it may be that both of those input strings are in its halting set, but result in different outputs. A prefix (free) TM is one in which this can never happen: there is no string in its halting set that is a proper prefix of another string in its halting set. For technical reasons, it is conventional in the physics literature to focus on prefix TMs, and we do so here.
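The prefix-free property can be checked directly on any finite collection of strings. The Python sketch below does this for a finite sample that stands in for a halting set; the function name `is_prefix_free` and the sample strings are assumptions for illustration, and the check says nothing about an actual halting set, which in general cannot be enumerated in full.

```python
def is_prefix_free(strings):
    """Return True if no string in the collection is a proper prefix of another.

    After sorting lexicographically, all strings that extend a given string s
    sit in a contiguous block starting immediately after s, so it suffices to
    compare each string with its immediate successor.
    """
    ordered = sorted(set(strings))
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free(["0", "10", "110"]))  # True
print(is_prefix_free(["0", "01"]))         # False: "0" is a proper prefix of "01"
```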

The coin-flipping distribution of a prefix TM M is the probability distribution over the strings in M’s halting set generated by IID “tossing a coin” to generate those strings, in a Bernoulli process, and then normalizing. So any string \(\sigma \) in the halting set has probability \(2^{-|\sigma |} / \Omega \) under the coin-flipping prior, where \(\Omega \) is the normalization constant for the TM in question.
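To make the formula concrete, the following Python sketch computes the coin-flipping probabilities for a small finite set of binary strings that stands in for a halting set. The three-string set and the function name `coin_flipping_distribution` are assumptions for illustration; a real halting set is typically infinite, so this is a toy calculation, not a way to obtain \(\Omega \) for an actual TM.

```python
from fractions import Fraction


def coin_flipping_distribution(halting_set):
    """Assign each string sigma the probability 2^(-|sigma|) / Omega.

    Omega is the sum of 2^(-|sigma|) over the given (finite, assumed) set.
    """
    weights = {s: Fraction(1, 2 ** len(s)) for s in halting_set}
    omega = sum(weights.values())
    return {s: w / omega for s, w in weights.items()}, omega


probs, omega = coin_flipping_distribution({"0", "10", "110"})
print(omega)         # 7/8
print(probs["0"])    # 4/7
print(probs["10"])   # 2/7
print(probs["110"])  # 1/7
```

Exact fractions are used so that the normalization can be verified without floating-point error. For a prefix-free set, the Kraft inequality guarantees that the unnormalized weights sum to at most 1.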

Finally, for our purposes, a Probabilistic Turing Machine (PTM) is a conventional TM as defined by conditions (1)–(7), except that the update function \(\rho \) is generalized to be a conditional distribution. The conditional distribution is not arbitrary however. In particular, we typically require that there is zero probability that applying such an update conditional distribution violates condition (7). Depending on how we use a PTM to model NDR machines, we may introduce other requirements as well.
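One way to picture this generalization is the Python sketch below, in which the update function returns a finite list of weighted successor triples instead of a single triple. The toy bit-flipping rule, the error probability `eps`, the state labels, and all function names are assumptions for illustration; they are not the chapter's NDR machine. The helper `respects_condition_7` checks that every outcome with nonzero probability obeys the locality constraint of condition (7).

```python
import random

BLANK = "b"  # hypothetical blank symbol


def rho_dist(r, v, tape, eps):
    """Toy stochastic update: a list of (probability, (r', v', T')) pairs.

    With probability 1 - eps the bit under the pointer is flipped;
    with probability eps it is (erroneously) left unchanged.
    """
    if r == "halt":
        return [(1.0, (r, v, tape))]
    symbol = tape.get(v, BLANK)
    if symbol == BLANK:
        return [(1.0, ("halt", v - 1, tape))]
    flipped = dict(tape)
    flipped[v] = "1" if symbol == "0" else "0"
    return [(1.0 - eps, ("start", v + 1, flipped)),
            (eps, ("start", v + 1, dict(tape)))]


def respects_condition_7(v, tape, outcomes):
    """True if every positive-probability outcome satisfies condition (7)."""
    for p, (_, v2, tape2) in outcomes:
        if p == 0:
            continue
        if abs(v2 - v) > 1:
            return False
        cells = set(tape) | set(tape2)
        if any(tape.get(k, BLANK) != tape2.get(k, BLANK)
               for k in cells if k != v):
            return False
    return True


def sample_run(tape, eps, rng, max_steps=10_000):
    """Sample one trajectory of the toy PTM until it halts."""
    r, v = "start", 0
    for _ in range(max_steps):
        if r == "halt":
            return v, tape
        outcomes = rho_dist(r, v, tape, eps)
        assert respects_condition_7(v, tape, outcomes)
        weights = [p for p, _ in outcomes]
        successors = [o for _, o in outcomes]
        r, v, tape = rng.choices(successors, weights=weights, k=1)[0]
    raise RuntimeError("no halt within max_steps")


rng = random.Random(0)
pointer, tape = sample_run({0: "0", 1: "1", 2: "1", 3: "0"}, eps=0.0, rng=rng)
print("".join(tape[i] for i in range(4)))  # 1001 when eps is 0 (no noise)
```

Setting `eps` to 0 recovers a deterministic TM, which is a convenient sanity check; for intermediate values the output becomes a random variable whose distribution depends on `eps`.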


About this chapter

Cite this chapter

Wolpert, D.H., Kinney, D. (2021). Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes. In: Aguirre, A., Merali, Z., Sloan, D. (eds) Undecidability, Uncomputability, and Unpredictability. The Frontiers Collection. Springer, Cham. https://doi.org/10.1007/978-3-030-70354-7_10


DOI: https://doi.org/10.1007/978-3-030-70354-7_10
Published: 21 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70353-0
Online ISBN: 978-3-030-70354-7
eBook Packages: Physics and Astronomy (R0)
