DOI: 10.1145/3375395.3387659
research-article

Generative Datalog with Continuous Distributions

Published: 14 June 2020

ABSTRACT

Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more foundational approach towards defining its semantics. It is based on standard notions from probability theory known as stochastic kernels and Markov processes. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and we show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.

References

  1. Charu C. Aggarwal and Philip S. Yu. A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering (TKDE), 21(5):609--623, 2009. http://dx.doi.org/10.1109/TKDE.2008.190
  2. Parag Agrawal and Jennifer Widom. Continuous Uncertainty in Trio. In Proceedings of the 3rd VLDB workshop on Management of Uncertain Data (MUD 2009), pages 17--32, Enschede, The Netherlands, 2009. Centre for Telematics and Information Technology (CTIT).
  3. Vince Bárány, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. Declarative Probabilistic Programming with Datalog. ACM Transactions on Database Systems (TODS), 42(4):22:1--22:35, 2017. http://dx.doi.org/10.1145/3132700
  4. Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul A. Szerlip, Paul Horsfall, and Noah D. Goodman. Pyro: Deep Universal Probabilistic Programming. J. Mach. Learn. Res., 20:28:1--28:6, 2019.
  5. Johannes Borgström, Andrew D. Gordon, Michael Greenberg, James Margetson, and Jurgen Van Gael. Measure Transformer Semantics for Bayesian Machine Learning. Logical Methods in Computer Science, 9(3), 2013. http://dx.doi.org/10.2168/LMCS-9(3:11)2013
  6. Zhuhua Cai, Zografoula Vagena, Luis Perez, Subramanian Arumugam, Peter J. Haas, and Christopher Jermaine. Simulation of Database-Valued Markov Chains using SimSQL. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD 2013), pages 637--648, New York, NY, USA, 2013. ACM. http://dx.doi.org/10.1145/2463676.2465283
  7. Bob Carpenter, Andrew Gelman, Matthew Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A Probabilistic Programming Language. Journal of Statistical Software, 76(1), 2017.
  8. D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure. Probability and its Applications. Springer, New York, NY, USA, 2nd edition, 2008. http://dx.doi.org/10.1007/978-0-387-49835-5
  9. Luc De Raedt, Angelika Kimmig, and Hannu Toivonen. ProbLog: A Probabilistic Prolog and Its Application in Link Discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), pages 2468--2473, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.
  10. Luc De Raedt, Kristian Kersting, Sriraam Natarajan, and David Poole. Statistical Relational Artificial Intelligence: Logic, Probability, and Computation, volume 10. Morgan & Claypool Publishers, 2016. http://dx.doi.org/10.2200/S00692ED1V01Y201601AIM032
  11. Daniel Deutch, Christoph Koch, and Tova Milo. On Probabilistic Fixpoint and Markov Chain Query Languages. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2010), pages 215--226, New York, NY, USA, 2010. ACM. http://dx.doi.org/10.1145/1807085.1807114
  12. David H. Fremlin. Measure Theory. Vol. II: Broad Foundations. Torres Fremlin, 2nd edition, 2010.
  13. David H. Fremlin. Measure Theory. Vol. IV: Topological Measure Spaces, Part I. Torres Fremlin, 2nd edition, 2013.
  14. Norbert Fuhr. Probabilistic Datalog -- A Logic for Powerful Retrieval Methods. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1995), pages 282--290, New York, NY, USA, 1995. ACM. http://dx.doi.org/10.1145/215206.215372
  15. Marie Gaudard and Donald Hadwin. Sigma-Algebras on Spaces of Probability Measures. Scandinavian Journal of Statistics, 16(2):169--175, 1989.
  16. Noah D. Goodman. The principles and practice of probabilistic programming. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2013), pages 399--402, New York, NY, USA, 2013. ACM. http://dx.doi.org/10.1145/2429069.2429117
  17. Noah D. Goodman, Vikash K. Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: A Language for Generative Models. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI 2008), pages 220--229, Arlington, VA, USA, 2008. AUAI Press.
  18. Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic programming. In Proceedings of the Future of Software Engineering (FOSE 2014), pages 167--181, New York, NY, USA, 2014. ACM. http://dx.doi.org/10.1145/2593882.2593900
  19. Georg Gottlob, Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari. Query Answering under Probabilistic Uncertainty in Datalog+/- Ontologies. Annals of Mathematics and Artificial Intelligence, 69(1):37--72, 2013. http://dx.doi.org/10.1007/s10472-013-9342-1
  20. Martin Grohe, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Peter Lindner. Generative Datalog with Continuous Distributions. arXiv e-prints, 2020. URL: https://arxiv.org/abs/2001.06358
  21. Martin Grohe and Peter Lindner. Probabilistic Databases with an Infinite Open-World Assumption. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2019), pages 17--31, New York, NY, USA, 2019. ACM. http://dx.doi.org/10.1145/3294052.3319681
  22. Martin Grohe and Peter Lindner. Infinite Probabilistic Databases. In Carsten Lutz and Jean Christoph Jung, editors, 23rd International Conference on Database Theory (ICDT 2020), volume 155 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1--16:20, Dagstuhl, Germany, 2020. Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik. URL: https://drops.dagstuhl.de/opus/volltexte/2020/11940, http://dx.doi.org/10.4230/LIPIcs.ICDT.2020.16
  23. Bernd Gutmann, Manfred Jaeger, and Luc De Raedt. Extending ProbLog with Continuous Distributions. In Inductive Logic Programming (ILP 2010), volume 6489 of Lecture Notes in Computer Science, pages 76--91, Berlin, Germany and Heidelberg, Germany, 2011. Springer. http://dx.doi.org/10.1007/978-3-642-21295-6_12
  24. Ravi Jampani, Fei Xu, Mingxi Wu, Luis Perez, Chris Jermaine, and Peter J. Haas. The Monte Carlo Database System: Stochastic Analysis Close to the Data. ACM Transactions on Database Systems (TODS), 36(3):18:1--18:41, 2011. http://dx.doi.org/10.1145/2000824.2000828
  25. Olav Kallenberg. Foundations of Modern Probability. Springer Series in Statistics. Probability and its Applications. Springer, New York, NY, USA, 2nd edition, 2002. http://dx.doi.org/10.1007/978-1-4757-4015-8
  26. Angelika Kimmig, Stephen Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. A Short Introduction to Probabilistic Soft Logic. In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications, pages 1--4, 2012.
  27. Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning. The MIT Press, Cambridge, MA, USA, 2009.
  28. Andrei Nikolajewitsch Kolmogorov. Foundations of the Theory of Probability. Chelsea Publishing Company, New York, NY, USA, 2nd English edition, 1956.
  29. C. Kuratowski and C. Ryll-Nardzewski. A General Theorem on Selectors. Bulletin of the Polish Academy of Sciences, 13:397--403, 1965.
  30. Eckhard Limpert, Werner A. Stahel, and Markus Abbt. Log-normal Distributions across the Sciences: Keys and Clues. BioScience, 51(5):341--352, 2001. http://dx.doi.org/10.1641/0006-3568(2001)051[0341:LNDATS]2.0.CO;2
  31. Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, David L. Ong, and Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), pages 1352--1359, San Francisco, CA, USA, 2005. Morgan Kaufmann.
  32. Aditya V. Nori, Chung-Kil Hur, Sriram K. Rajamani, and Selva Samuel. R2: An Efficient MCMC Sampler for Probabilistic Programs. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, pages 2476--2482. AAAI Press, 2014.
  33. Avi Pfeffer. Figaro: An Object-Oriented Probabilistic Programming Language. Technical report, Charles River Analytics, 2009.
  34. Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1--2):107--136, 2006. http://dx.doi.org/10.1007/s10994-006-5833-1
  35. Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambrusch, and Rahul Shah. Orion 2.0: Native Support for Uncertain Data. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008), pages 1239--1242, New York, NY, USA, 2008. ACM. http://dx.doi.org/10.1145/1376616.1376744
  36. Parag Singla and Pedro Domingos. Markov logic in infinite domains. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI 2007), pages 368--375, Arlington, VA, USA, 2007. AUAI Press.
  37. Sashi Mohan Srivastava. A Course on Borel Sets, volume 180 of Graduate Texts in Mathematics. Springer, New York, NY, USA, 1998.
  38. Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool, San Rafael, CA, USA, 1st edition, 2011. http://dx.doi.org/10.2200/S00362ED1V01Y201105DTM016
  39. David Tolpin, Jan-Willem van de Meent, and Frank Wood. Probabilistic Programming in Anglican. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), volume 9286 of Lecture Notes in Computer Science, pages 308--311, Cham, Switzerland, 2015. Springer International Publishing. http://dx.doi.org/10.1007/978-3-319-23461-8_36
  40. Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, and Frank Wood. An Introduction to Probabilistic Programming. arXiv e-prints, 2018. URL: https://arxiv.org/abs/1809.10756
  41. Brend Wanders, Maurice van Keulen, and Jan Flokstra. JudgeD: A Probabilistic Datalog with Dependencies. In The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence: Technical Reports WS-16-01 -- WS-16-15, Palo Alto, CA, USA, 2016. AAAI Press.
  42. Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, and Stuart Russell. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), volume 80 of Proceedings of Machine Learning Research, pages 5343--5352. PMLR, 2018.

Published in

PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
June 2020
480 pages
ISBN: 9781450371087
DOI: 10.1145/3375395
General Chair: Dan Suciu
Program Chair: Yufei Tao
Publications Chair: Zhewei Wei

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 642 of 2,707 submissions, 24%
