ABSTRACT
Bayesian inference, of posterior knowledge from prior knowledge and observed evidence, is typically defined by Bayes's rule, which says the posterior multiplied by the probability of an observation equals a joint probability. But the observation of a continuous quantity usually has probability zero, in which case Bayes's rule says only that the unknown times zero is zero. To infer a posterior distribution from a zero-probability observation, the statistical notion of disintegration tells us to specify the observation as an expression rather than a predicate, but does not tell us how to compute the posterior. We present the first method of computing a disintegration from a probabilistic program and an expression of a quantity to be observed, even when the observation has probability zero. Because the method produces an exact posterior term and preserves a semantics in which monadic terms denote measures, it composes with other inference methods in a modular way, without sacrificing accuracy or performance.
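To see why the observation has to be given as an expression rather than a predicate, consider a small illustration that is chosen here and does not come from the abstract. Draw a point (x, y) uniformly from the unit square and observe the zero-probability event y = 2x. Written as "y - 2x equals 0" it leads to one posterior over x; written as "y / x equals 2" it leads to another. The Python sketch below is not the paper's symbolic method. It assumes the two conditional densities have already been derived by hand through a change of variables, and it only integrates them numerically. All function names are hypothetical.

```python
def posterior_mean_x(weight, steps=20000):
    """Midpoint-rule estimate of E[x] for an unnormalized density weight(x) on (0, 1)."""
    h = 1.0 / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = weight(x)
        numerator += x * w
        denominator += w
    return numerator / denominator


def density_given_difference(x, t=0.0):
    """Unnormalized density of x given y - 2x = t (hand-derived assumption).

    Substituting y = t + 2x has Jacobian 1, so the density is the
    indicator that (x, y) stays inside the unit square.
    """
    y = t + 2.0 * x
    return 1.0 if 0.0 < x < 1.0 and 0.0 < y < 1.0 else 0.0


def density_given_ratio(x, r=2.0):
    """Unnormalized density of x given y / x = r (hand-derived assumption).

    Substituting y = r * x has Jacobian |dy/dr| = x, so the indicator
    is weighted by x.
    """
    y = r * x
    return x if 0.0 < x < 1.0 and 0.0 < y < 1.0 else 0.0


mean_difference = posterior_mean_x(density_given_difference)
mean_ratio = posterior_mean_x(density_given_ratio)
print(mean_difference, mean_ratio)
```

Both observations describe the same set of points, yet the posterior mean of x is 1/4 under the first expression and 1/3 under the second. The predicate alone therefore does not determine the posterior. The expression does.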
Exact Bayesian inference by symbolic disintegration
POPL '17