research-article

Generative Datalog with Continuous Distributions

Authors:
Martin Grohe, RWTH Aachen University, Aachen, Germany (ORCID: 0000-0002-0292-9142)
Benjamin Lucien Kaminski, Saarland University, Saarland Informatics Campus, Germany and University College London, London, United Kingdom (ORCID: 0000-0001-5185-2324)
Joost-Pieter Katoen, RWTH Aachen University, Aachen, Germany (ORCID: 0000-0002-6143-1926)
Peter Lindner, RWTH Aachen University, Aachen, Germany (ORCID: 0000-0003-2041-7201)

Journal of the ACM, Volume 69, Issue 6, Article No. 46, pp. 1–52. https://doi.org/10.1145/3559102

Published: 17 November 2022

Abstract

Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a “purely declarative probabilistic programming language.” We revisit this language and propose a more principled approach towards defining its semantics based on stochastic kernels and Markov processes—standard notions from probability theory. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al.

We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, LMCS 2022) and show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.

REFERENCES

[1] Abiteboul Serge, Hull Richard, and Vianu Victor. 1995. Foundations of Databases (1st ed.). Addison-Wesley Publishing Company, Inc., Reading, MA.
[2] Aggarwal Charu C. and Yu Philip S. 2009. A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 21, 5 (2009), 609–623.
[3] Agrawal Parag and Widom Jennifer. 2009. Continuous uncertainty in trio. In Proceedings of the 3rd VLDB Workshop on Management of Uncertain Data (MUD’09). Centre for Telematics and Information Technology (CTIT), 17–32.
[4] Alviano Mario and Zamayla Arnel. 2021. A speech about generative datalog and non-measurable sets. In Proceedings of the International Conference on Logic Programming Workshops co-located with the 37th International Conference on Logic Programming (ICLP’21). 8. Retrieved from http://ceur-ws.org/Vol-2970/aspocpinvited2.pdf.
[5] Bárány Vince, Cate Balder ten, Kimelfeld Benny, Olteanu Dan, and Vagena Zografoula. 2017. Declarative probabilistic programming with datalog. ACM Trans. Datab. Syst. 42, 4 (2017).
[6] Bingham Eli, Chen Jonathan P., Jankowiak Martin, Obermeyer Fritz, Pradhan Neeraj, Karaletsos Theofanis, Singh Rohit, Szerlip Paul A., Horsfall Paul, and Goodman Noah D. 2019. Pyro: Deep universal probabilistic programming. J. Mach. Learn. Res. 20 (2019), 28:1–28:6.
[7] Booth T. L. and Thompson R. A. 1973. Applying probability measures to abstract languages. IEEE Trans. Comput. C-22, 5 (1973), 442–450.
[8] Borgström Johannes, Gordon Andrew D., Greenberg Michael, Margetson James, and Gael Jurgen Van. 2013. Measure transformer semantics for bayesian machine learning. Logic. Meth. Comput. Sci. 9, 3 (2013).
[9] Bournez Olivier and Garnier Florent. 2005. Proving positive almost-sure termination. In Proceedings of the 16th International Conference on Term Rewriting and Applications (RTA’05). Springer-Verlag, Berlin, Germany, 323–337.
[10] Cabibbo Luca. 1998. The expressive power of stratified logic programs with value invention. Inf. Computat. 147, 1 (1998), 22–56.
[11] Cai Zhuhua, Vagena Zografoula, Perez Luis, Arumugam Subramanian, Haas Peter J., and Jermaine Christopher. 2013. Simulation of database-valued Markov chains using SimSQL. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 637–648.
[12] Calì Andrea, Gottlob Georg, and Kifer Michael. 2013. Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. 48 (2013), 115–174.
[13] Carpenter Bob, Gelman Andrew, Hoffman Matthew, Lee Daniel, Goodrich Ben, Betancourt Michael, Brubaker Marcus, Guo Jiqiang, Li Peter, and Riddell Allen. 2017. Stan: A probabilistic programming language. J. Statist. Softw. 76, 1 (2017).
[14] Chatterjee Krishnendu, Fu Hongfei, and Novotný Petr. 2020. Termination Analysis of Probabilistic Programs with Martingales. Cambridge University Press, Cambridge, UK, 221–258.
[15] Cheng Reynold, Kalashnikov Dmitri V., and Prabhakar Sunil. 2003. Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’03). ACM, New York, NY, 551–562.
[16] Cheng Reynold and Prabhakar Sunil. 2003. Managing uncertainty in sensor databases. ACM SIGMOD Rec. 32, 4 (Dec. 2003), 41–46.
[17] Grau B. Cuenca, Horrocks I., Krötzsch M., Kupke C., Magka D., Motik B., and Wang Z. 2013. Acyclicity notions for existential rules and their application to query answering in ontologies. J. Artif. Intell. Res. 47 (2013), 741–808.
[18] Daley D. J. and Vere-Jones D. 2008. An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure (2nd ed.). Springer, New York, NY.
[19] Raedt Luc De, Kimmig Angelika, and Toivonen Hannu. 2007. ProbLog: A probabilistic prolog and its application in link discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07). Morgan Kaufmann Publishers Inc., San Francisco, CA, 2468–2473.
[20] Raedt Luc De, Kersting Kristian, Natarajan Sriraam, and Poole David. 2016. Statistical Relational Artificial Intelligence: Logic, Probability, and Computation, Vol. 10. Morgan & Claypool Publishers, 1–189.
[21] Deshpande Amol, Guestrin Carlos, Madden Samuel R., Hellerstein Joseph M., and Hong Wei. 2004. Model-driven data acquisition in sensor networks. In Proceedings of the VLDB Conference (VLDB’04). Morgan Kaufmann, 588–599.
[22] Deutch Daniel, Koch Christoph, and Milo Tova. 2010. On probabilistic fixpoint and Markov chain query languages. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’10). ACM, New York, NY, 215–226.
[23] Dwork Cynthia, McSherry Frank, Nissim Kobbi, and Smith Adam. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography (TCC 2006) (Lecture Notes in Computer Science), Vol. 3876. Springer, Berlin, Germany, 265–284.
[24] Fierens Daan, Broeck Guy Van den, Renkens Joris, Shterionov Dimitar, Gutmann Bernd, Thon Ingo, Janssens Gerda, and Raedt Luc De. 2015. Inference and learning in probabilistic logic programs using weighted boolean formulas. Theor. Pract. Logic Program. 15, 3 (2015), 358–401.
[25] Fremlin David H. 2013. Measure Theory. Vol. IV: Topological Measure Spaces, Part I (2nd ed.). Torres Fremlin.
[26] Fuhr Norbert. 1995. Probabilistic datalog—A logic for powerful retrieval methods. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’95). ACM, New York, NY, 282–290.
[27] Fuhr Norbert. 2000. Probabilistic datalog: Implementing logical information retrieval for advanced applications. J. Amer. Societ. Inf. Sci. 51, 2 (2000), 95–110.
[28] Gaudard Marie and Hadwin Donald. 1989. Sigma-algebras on spaces of probability measures. Scandin. J. Statist. 16, 2 (1989), 169–175.
[29] Gogacz Tomasz, Marcinkowski Jerzy, and Pieris Andreas. 2020. All-instances restricted chase termination. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’20). Association for Computing Machinery, New York, NY, 245–258.
[30] Goodman Noah D. 2013. The principles and practice of probabilistic programming. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’13). ACM, New York, NY, 399–402.
[31] Goodman Noah D., Mansinghka Vikash K., Roy Daniel, Bonawitz Keith, and Tenenbaum Joshua B. 2008. Church: A language for generative models. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI’08). AUAI Press, Arlington, VA, 220–229.
[32] Gordon Andrew D., Henzinger Thomas A., Nori Aditya V., and Rajamani Sriram K. 2014. Probabilistic programming. In Proceedings of the Conference on Future of Software Engineering (FOSE’14). ACM, New York, NY, 167–181.
[33] Gottlob Georg, Lukasiewicz Thomas, Martinez Maria Vanina, and Simari Gerardo I. 2013. Query answering under probabilistic uncertainty in datalog +/- ontologies. Ann. Math. Artif. Intell. 69, 1 (2013), 37–72.
[34] Gottlob Georg, Lukasiewicz Thomas, and Pieris Andreas. 2014. Datalog+/–: Questions and answers. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR’14). AAAI Press. Retrieved from https://www.aaai.org/ocs/index.php/KR/KR14/paper/view/7965.
[35] Grohe Martin, Kaminski Benjamin Lucien, Katoen Joost-Pieter, and Lindner Peter. 2020. Generative datalog with continuous distributions. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’20). Association for Computing Machinery, New York, NY, 347–360.
[36] Grohe Martin and Lindner Peter. 2019. Probabilistic databases with an infinite open-world assumption. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’19). ACM, New York, NY, 17–31.
[37] Grohe Martin and Lindner Peter. 2022. Infinite probabilistic databases. Logic. Meth. Comput. Sci. 18, 1 (2022).
[38] Gutmann Bernd, Jaeger Manfred, and Raedt Luc De. 2011. Extending problog with continuous distributions. In Inductive Logic Programming (ILP 2010) (Lecture Notes in Computer Science), Vol. 6489. Springer, Berlin, Germany, 76–91.
[39] Icard Thomas F. 2020. Calibrating generative models: The probabilistic Chomsky-Schützenberger Hierarchy. J. Math. Psychol. 95 (2020), 102308.
[40] Jacobs Jules. 2021. Paradoxes of probabilistic programming. In Proceedings of the 48th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL’21), Vol. 5. ACM, 26.
[41] Jampani Ravi, Xu Fei, Wu Mingxi, Perez Luis, Jermaine Chris, and Haas Peter J. 2011. The Monte Carlo database system: Stochastic analysis close to the data. ACM Trans. Datab. Syst. 36, 3 (2011).
[42] Kallenberg Olav. 2002. Foundations of Modern Probability (2nd ed.). Springer, New York, NY.
[43] Kaminski Benjamin Lucien and Katoen Joost-Pieter. 2015. On the hardness of almost-sure termination. In Mathematical Foundations of Computer Science 2015 (MFCS 2015) (Lecture Notes in Computer Science), Vol. 9234. Springer, Berlin, 307–318.
[44] Kimmig Angelika, Bach Stephen, Broecheler Matthias, Huang Bert, and Getoor Lise. 2012. A short introduction to probabilistic soft logic. In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications. 1–4.
[45] Koller Daphne and Friedman Nir. 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, Cambridge, MA.
[46] Kolmogorov Andrei Nikolajewitsch. 1956. Foundations of the Theory of Probability (2nd English ed.). Chelsea Publishing Company, New York, NY.
[47] Kozen Dexter. 1981. Semantics of probabilistic programs. J. Comput. Syst. Sci. 22, 3 (1981), 328–350.
[48] Kozen Dexter. 1983. A probabilistic PDL. In Proceedings of the ACM Symposium on Theory of Computing. ACM, 291–297.
[49] Kuratowski C. and Ryll-Nardzewski C. 1965. A general theorem on selectors. Bull. Polish Acad. Sci. 13 (1965), 397–403.
[50] Libkin Leonid. 2012. Elements of Finite Model Theory. Springer, Berlin.
[51] Limpert Eckhard, Stahel Werner A., and Abbt Markus. 2001. Log-normal distributions across the sciences: Keys and clues. BioScience 51, 5 (2001), 341–352.
[52] Lindner Peter. 2021. The Theory of Infinite Probabilistic Databases. Ph.D. Dissertation. RWTH Aachen University.
[53] Milch Brian. 2006. Probabilistic Models with Unknown Objects. Ph.D. Dissertation. University of California at Berkeley.
[54] Milch Brian, Marthi Bhaskara, Russell Stuart, Sontag David, Ong David L., and Kolobov Andrey. 2005. Blog: Probabilistic models with unknown objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI’05). Morgan Kaufmann, San Francisco, CA, 1352–1359.
[55] Nori Aditya V., Hur Chung-Kil, Rajamani Sriram K., and Samuel Selva. 2014. R2: An efficient MCMC sampler for probabilistic programs. In Proceedings of the 28th AAAI Conference on Artificial Intelligence. AAAI Press, 2476–2482.
[56] Pearl Judea. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Elsevier Inc., Imprint: Morgan Kaufmann Publishers Inc., San Francisco, CA.
[57] Pfeffer Avi. 2009. Figaro: An Object-Oriented Probabilistic Programming Language. Technical Report. Charles River Analytics.
[58] Richardson Matthew and Domingos Pedro. 2006. Markov logic networks. Mach. Learn. 62, 1–2 (2006), 107–136.
[59] Saheb-Djahromi N. 1978. Probabilistic LCF. In Mathematical Foundations of Computer Science 1978 (MFCS 1978) (Lecture Notes in Computer Science), Vol. 64. Springer, Berlin, Germany, 442–451.
[60] Singh Sarvjeet, Mayfield Chris, Mittal Sagar, Prabhakar Sunil, Hambrusch Susanne, and Shah Rahul. 2008. Orion 2.0: Native support for uncertain data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’08). ACM, New York, NY, 1239–1242.
[61] Singla Parag and Domingos Pedro. 2007. Markov logic in infinite domains. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI’07). AUAI Press, Arlington, VA, 368–375.
[62] Srivastava Sashi Mohan. 1998. A Course on Borel Sets. Graduate Texts in Mathematics, Vol. 180. Springer, New York, NY.
[63] Suciu Dan, Olteanu Dan, Ré Christopher, and Koch Christoph. 2011. Probabilistic Databases (1st ed.). Morgan & Claypool, San Rafael, CA.
[64] Tolpin David, Meent Jan-Willem van de, and Wood Frank. 2015. Probabilistic programming in Anglican. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015) (Lecture Notes in Computer Science), Vol. 9286. Springer International Publishing, Cham, Switzerland, 308–311.
[65] Meent Jan-Willem van de, Paige Brooks, Yang Hongseok, and Wood Frank. 2018. An introduction to probabilistic programming. arXiv e-prints (2018). Retrieved from https://arxiv.org/abs/1809.10756.
[66] Broeck Guy Van den and Suciu Dan. 2017. Query processing on probabilistic data: A survey. Found. Trends® Datab. 7, 3–4 (2017), 197–341.
[67] Wanders Brend, Keulen Maurice van, and Flokstra Jan. 2016. JudgeD: A probabilistic datalog with dependencies. In The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence: Technical Reports WS-16-01 – WS-16-15. AAAI Press, Palo Alto, CA.
[68] Wang Jue and Domingos Pedro. 2008. Hybrid Markov logic networks. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI’08). 1106–1111. Retrieved from https://www.aaai.org/Papers/AAAI/2008/AAAI08-175.pdf.
[69] Wu Yi, Srivastava Siddharth, Hay Nicholas, Du Simon, and Russell Stuart. 2018. Discrete-continuous mixtures in probabilistic programming: Generalized semantics and inference algorithms. In Proceedings of the 35th International Conference on Machine Learning (ICML’18) (Proceedings of Machine Learning Research), Vol. 80. PMLR, 5343–5352.

Index Terms

1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic representations
2. Theory of computation
  1. Logic
    1. Constraint and logic programming
  2. Theory and algorithms for application domains
    1. Database theory
      1. Incomplete, inconsistent, and uncertain databases

Published in
Journal of the ACM Volume 69, Issue 6
December 2022
302 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/3570966
Editor:
Venkatesan Guruswami
University of California, Berkeley, United States
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 November 2022
- Online AM: 30 August 2022
- Accepted: 25 July 2022
- Revised: 7 February 2022
- Received: 1 February 2021
Author Tags
Datalog
probabilistic databases
Generative Datalog
measure theory
stochastic kernels
probabilistic programming
Qualifiers
- research-article
- Refereed