research-article

Generative Datalog with Continuous Distributions

Authors:
Martin Grohe (RWTH Aachen University, Aachen, Germany)
Benjamin Lucien Kaminski (RWTH Aachen University & University College London, Aachen, Germany)
Joost-Pieter Katoen (RWTH Aachen University, Aachen, Germany)
Peter Lindner (RWTH Aachen University, Aachen, Germany)

PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, June 2020, pages 347–360. https://doi.org/10.1145/3375395.3387659

Published: 14 June 2020

ABSTRACT

Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more foundational approach towards defining its semantics. It is based on standard notions from probability theory known as stochastic kernels and Markov processes. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and we show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.
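The abstract frames the semantics in terms of stochastic kernels and a Markov process driven by the chase. The Python sketch below is an illustration under our own assumptions, not the construction from the paper. It uses a hypothetical two-rule program written in an informal generative-Datalog notation: one rule samples a continuous value from a normal distribution, and the other is an ordinary deterministic rule. Each call to `fire` samples one successor instance, playing the role of a stochastic kernel. `chase` iterates such steps, and the order in which triggers fire is left to a pluggable `pick` function. All names (`triggers`, `fire`, `chase`, `first`, `at_random`, `prob_tall`) and the rule syntax in the comments are hypothetical.

```python
import math
import random
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical two-rule program, written in an informal generative-Datalog style:
#   (1) Height(x, Normal[170, 10]) <- Person(x)    (samples a continuous value)
#   (2) Tall(x) <- Height(x, h), h > 180            (ordinary deterministic rule)
Fact = tuple                  # e.g. ("Person", "alice") or ("Height", "alice", 172.3)
Instance = FrozenSet[Fact]
Trigger = Tuple[str, str]     # (rule name, value bound to x)
Picker = Callable[[List[Trigger], random.Random], Trigger]


def triggers(db: Instance, fired: FrozenSet[Trigger]) -> List[Trigger]:
    """Active triggers of rules (1) and (2) that have not fired yet."""
    found = [("height", f[1]) for f in db if f[0] == "Person"]
    found += [("tall", f[1]) for f in db if f[0] == "Height" and f[2] > 180.0]
    return sorted(t for t in found if t not in fired)


def fire(db: Instance, trigger: Trigger, rng: random.Random) -> Instance:
    """One chase step: draws a successor instance, i.e. samples from a kernel."""
    rule, x = trigger
    if rule == "height":
        # Rule (1): the head value is drawn from Normal(mean=170, sd=10).
        return db | {("Height", x, rng.gauss(170.0, 10.0))}
    # Rule (2) is deterministic: its kernel is a point mass on one successor.
    return db | {("Tall", x)}


def chase(db: Instance, rng: random.Random, pick: Picker) -> Instance:
    """Run the chase as a Markov process on (instance, fired triggers)."""
    fired: FrozenSet[Trigger] = frozenset()
    while True:
        pending = triggers(db, fired)
        if not pending:
            return db                # no trigger left: the process has stopped
        t = pick(pending, rng)       # the chase order is left to `pick`
        db = fire(db, t, rng)
        fired = fired | {t}


def first(ts: List[Trigger], rng: random.Random) -> Trigger:
    return ts[0]                     # a fixed, deterministic chase order


def at_random(ts: List[Trigger], rng: random.Random) -> Trigger:
    return rng.choice(ts)            # a randomised chase order


def prob_tall(pick: Picker, n: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Tall(alice)) on a two-person input."""
    rng = random.Random(seed)
    start = frozenset({("Person", "alice"), ("Person", "bob")})
    hits = sum(("Tall", "alice") in chase(start, rng, pick) for _ in range(n))
    return hits / n


# Exact value for comparison: P(h > 180) for h ~ Normal(170, 10), i.e. 1 - Phi(1).
exact = 0.5 * math.erfc((180.0 - 170.0) / (10.0 * math.sqrt(2.0)))
```

With the fixed seed, `prob_tall(first)` and `prob_tall(at_random)` should both come out close to `exact` (about 0.159). This is consistent with the abstract's claim that the chase order does not change the resulting distribution, although a Monte Carlo estimate for one small program only illustrates the point and does not prove it.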

References

Charu C. Aggarwal and Philip S. Yu. A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering (TKDE), 21(5):609–623, 2009. http://dx.doi.org/10.1109/TKDE.2008.190
Parag Agrawal and Jennifer Widom. Continuous Uncertainty in Trio. In Proceedings of the 3rd VLDB Workshop on Management of Uncertain Data (MUD 2009), pages 17–32, Enschede, The Netherlands, 2009. Centre for Telematics and Information Technology (CTIT).
Vince Bárány, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. Declarative Probabilistic Programming with Datalog. ACM Transactions on Database Systems (TODS), 42(4):22:1–22:35, 2017. http://dx.doi.org/10.1145/3132700
Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul A. Szerlip, Paul Horsfall, and Noah D. Goodman. Pyro: Deep Universal Probabilistic Programming. J. Mach. Learn. Res., 20:28:1–28:6, 2019.
Johannes Borgström, Andrew D. Gordon, Michael Greenberg, James Margetson, and Jurgen Van Gael. Measure Transformer Semantics for Bayesian Machine Learning. Logical Methods in Computer Science, 9(3), 2013. http://dx.doi.org/10.2168/LMCS-9(3:11)2013
Zhuhua Cai, Zografoula Vagena, Luis Perez, Subramanian Arumugam, Peter J. Haas, and Christopher Jermaine. Simulation of Database-Valued Markov Chains using SimSQL. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD 2013), pages 637–648, New York, NY, USA, 2013. ACM. http://dx.doi.org/10.1145/2463676.2465283
Bob Carpenter, Andrew Gelman, Matthew Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A Probabilistic Programming Language. Journal of Statistical Software, 76(1), 2017.
D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure. Probability and its Applications. Springer, New York, NY, USA, 2nd edition, 2008. http://dx.doi.org/10.1007/978-0-387-49835-5
Luc De Raedt, Angelika Kimmig, and Hannu Toivonen. ProbLog: A Probabilistic Prolog and Its Application in Link Discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), pages 2468–2473, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.
Luc De Raedt, Kristian Kersting, Sriraam Natarajan, and David Poole. Statistical Relational Artificial Intelligence: Logic, Probability, and Computation, volume 10. Morgan & Claypool Publishers, 2016. http://dx.doi.org/10.2200/S00692ED1V01Y201601AIM032
Daniel Deutch, Christoph Koch, and Tova Milo. On Probabilistic Fixpoint and Markov Chain Query Languages. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2010), pages 215–226, New York, NY, USA, 2010. ACM. http://dx.doi.org/10.1145/1807085.1807114
David H. Fremlin. Measure Theory. Vol. II: Broad Foundations. Torres Fremlin, 2nd edition, 2010.
David H. Fremlin. Measure Theory. Vol. IV: Topological Measure Spaces, Part I. Torres Fremlin, 2nd edition, 2013.
Norbert Fuhr. Probabilistic Datalog – A Logic for Powerful Retrieval Methods. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1995), pages 282–290, New York, NY, USA, 1995. ACM. http://dx.doi.org/10.1145/215206.215372
Marie Gaudard and Donald Hadwin. Sigma-Algebras on Spaces of Probability Measures. Scandinavian Journal of Statistics, 16(2):169–175, 1989.
Noah D. Goodman. The principles and practice of probabilistic programming. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2013), pages 399–402, New York, NY, USA, 2013. ACM. http://dx.doi.org/10.1145/2429069.2429117
Noah D. Goodman, Vikash K. Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: A Language for Generative Models. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI 2008), pages 220–229, Arlington, VA, USA, 2008. AUAI Press.
Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic programming. In Proceedings of the Future of Software Engineering (FOSE 2014), pages 167–181, New York, NY, USA, 2014. ACM. http://dx.doi.org/10.1145/2593882.2593900
Georg Gottlob, Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari. Query Answering under Probabilistic Uncertainty in Datalog+/- Ontologies. Annals of Mathematics and Artificial Intelligence, 69(1):37–72, 2013. http://dx.doi.org/10.1007/s10472-013-9342-1
Martin Grohe, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Peter Lindner. Generative Datalog with Continuous Distributions. arXiv e-prints, 2020. URL: https://arxiv.org/abs/2001.06358.
Martin Grohe and Peter Lindner. Probabilistic Databases with an Infinite Open-World Assumption. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2019), pages 17–31, New York, NY, USA, 2019. ACM. http://dx.doi.org/10.1145/3294052.3319681
Martin Grohe and Peter Lindner. Infinite Probabilistic Databases. In Carsten Lutz and Jean Christoph Jung, editors, 23rd International Conference on Database Theory (ICDT 2020), volume 155 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1–16:20, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum fuer Informatik. URL: https://drops.dagstuhl.de/opus/volltexte/2020/11940, http://dx.doi.org/10.4230/LIPIcs.ICDT.2020.16
Bernd Gutmann, Manfred Jaeger, and Luc De Raedt. Extending ProbLog with Continuous Distributions. In Inductive Logic Programming (ILP 2010), volume 6489 of Lecture Notes in Computer Science, pages 76–91, Berlin, Germany and Heidelberg, Germany, 2011. Springer. http://dx.doi.org/10.1007/978-3-642-21295-6_12
Ravi Jampani, Fei Xu, Mingxi Wu, Luis Perez, Chris Jermaine, and Peter J. Haas. The Monte Carlo Database System: Stochastic Analysis Close to the Data. ACM Transactions on Database Systems (TODS), 36(3):18:1–18:41, 2011. http://dx.doi.org/10.1145/2000824.2000828
Olav Kallenberg. Foundations of Modern Probability. Springer Series in Statistics. Probability and its Applications. Springer, New York, NY, USA, 2nd edition, 2002. http://dx.doi.org/10.1007/978-1-4757-4015-8
Angelika Kimmig, Stephen Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. A Short Introduction to Probabilistic Soft Logic. In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications, pages 1–4, 2012.
Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning. The MIT Press, Cambridge, MA, USA, 2009.
Andrei Nikolajewitsch Kolmogorov. Foundations of the Theory of Probability. Chelsea Publishing Company, New York, NY, USA, 2nd English edition, 1956.
C. Kuratowski and C. Ryll-Nardzewski. A General Theorem on Selectors. Bulletin of the Polish Academy of Sciences, 13:397–403, 1965.
Eckhard Limpert, Werner A. Stahel, and Markus Abbt. Log-normal Distributions across the Sciences: Keys and Clues. BioScience, 51(5):341–352, 2001. http://dx.doi.org/10.1641/0006-3568(2001)051[0341:LNDATS]2.0.CO;2
Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, David L. Ong, and Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), pages 1352–1359, San Francisco, CA, USA, 2005. Morgan Kaufmann.
Aditya V. Nori, Chung-Kil Hur, Sriram K. Rajamani, and Selva Samuel. R2: An Efficient MCMC Sampler for Probabilistic Programs. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, pages 2476–2482. AAAI Press, 2014.
Avi Pfeffer. Figaro: An Object-Oriented Probabilistic Programming Language. Technical report, Charles River Analytics, 2009.
Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1–2):107–136, 2006. http://dx.doi.org/10.1007/s10994-006-5833-1
Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambrusch, and Rahul Shah. Orion 2.0: Native Support for Uncertain Data. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008), pages 1239–1242, New York, NY, USA, 2008. ACM. http://dx.doi.org/10.1145/1376616.1376744
Parag Singla and Pedro Domingos. Markov logic in infinite domains. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI 2007), pages 368–375, Arlington, VA, USA, 2007. AUAI Press.
Sashi Mohan Srivastava. A Course on Borel Sets, volume 180 of Graduate Texts in Mathematics. Springer, New York, NY, USA, 1998.
Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool, San Rafael, CA, USA, 1st edition, 2011. http://dx.doi.org/10.2200/S00362ED1V01Y201105DTM016
David Tolpin, Jan-Willem van de Meent, and Frank Wood. Probabilistic Programming in Anglican. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), volume 9286 of Lecture Notes in Computer Science, pages 308–311, Cham, Switzerland, 2015. Springer International Publishing. http://dx.doi.org/10.1007/978-3-319-23461-8_36
Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, and Frank Wood. An Introduction to Probabilistic Programming. arXiv e-prints, 2018. URL: https://arxiv.org/abs/1809.10756.
Brend Wanders, Maurice van Keulen, and Jan Flokstra. JudgeD: A Probabilistic Datalog with Dependencies. In The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence: Technical Reports WS-16-01 – WS-16-15, Palo Alto, CA, USA, 2016. AAAI Press.
Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, and Stuart Russell. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), volume 80 of Proceedings of Machine Learning Research, pages 5343–5352. PMLR, 2018.

Index Terms

1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic representations
2. Theory of computation
  1. Logic
    1. Constraint and logic programming
  2. Theory and algorithms for application domains
    1. Database theory
      1. Database query languages (principles)
      2. Incomplete, inconsistent, and uncertain databases

Published in
PODS'20: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
June 2020
480 pages
ISBN: 9781450371087
DOI: 10.1145/3375395
General Chair: Dan Suciu (University of Washington, USA)
Program Chair: Yufei Tao (Chinese University of Hong Kong, China)
Publications Chair: Zhewei Wei (Renmin University of China, China)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
datalog
generative datalog
measure theory
probabilistic databases
probabilistic programming
stochastic kernels
Acceptance Rates
Overall Acceptance Rate: 642 of 2,707 submissions, 24%