ABSTRACT
Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more foundational approach to defining its semantics, based on two standard notions from probability theory: stochastic kernels and Markov processes. This approach allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al. We show that our semantics is fairly robust: it permits both parallel execution and arbitrary chase orders when a program is evaluated. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020) and show that it remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.
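The following is a minimal Python sketch of the chase-as-Markov-process view, written under simplifying assumptions. It hard-codes a hypothetical two-rule program with a continuous (normal) distribution, `Temp(c, Normal[mu, sigma]) <- City(c, mu, sigma)` and `Hot(c) <- Temp(c, t), t > 30`. Each chase step is treated as one transition over database instances: a pluggable chase order selects an active trigger, which is then applied, drawing a fresh sample whenever the head contains a distribution term. The relation names, the trigger encoding, and all helper functions (`active_triggers`, `apply_trigger`, `chase`, `estimate_hot`) are illustrative only. This is not the paper's formal construction, which works with stochastic kernels on measurable spaces of instances, nor does it reproduce Bárány et al.'s exact sample-sharing convention.

```python
import math
import random

# Hypothetical program (illustrative, not taken from the paper):
#   Temp(c, Normal[mu, sigma]) <- City(c, mu, sigma)
#   Hot(c)                     <- Temp(c, t), t > 30
# Facts are tuples: ("City", c, mu, sigma), ("Temp", c, t), ("Hot", c).
START = frozenset({("City", "aachen", 28.0, 4.0), ("City", "bonn", 25.0, 3.0)})


def active_triggers(db):
    """Rule applications whose head is not yet satisfied in db."""
    with_temp = {f[1] for f in db if f[0] == "Temp"}
    triggers = []
    for f in db:
        if f[0] == "City" and f[1] not in with_temp:
            triggers.append(("gen_temp", f[1:]))
        elif f[0] == "Temp" and f[2] > 30.0 and ("Hot", f[1]) not in db:
            triggers.append(("hot", f[1:]))
    # Sort so the candidate list does not depend on set iteration order.
    return sorted(triggers, key=repr)


def apply_trigger(db, trigger, rng):
    """One transition: extend db by the (possibly sampled) head fact."""
    rule, args = trigger
    if rule == "gen_temp":
        city, mu, sigma = args
        return db | {("Temp", city, rng.gauss(mu, sigma))}  # continuous sample
    city, _t = args
    return db | {("Hot", city)}


def chase(db, rng, choose):
    """Apply triggers, in the order picked by `choose`, until none is active."""
    while True:
        triggers = active_triggers(db)
        if not triggers:
            return db
        db = apply_trigger(db, choose(triggers, rng), rng)


def first_trigger(triggers, rng):
    """Deterministic chase order: always take the first candidate."""
    return triggers[0]


def random_trigger(triggers, rng):
    """Randomized chase order: pick any active trigger uniformly."""
    return rng.choice(triggers)


def estimate_hot(choose, runs=20000, seed=0):
    """Monte Carlo estimate of P(Hot(aachen)) under a given chase order."""
    rng = random.Random(seed)
    hits = sum(("Hot", "aachen") in chase(START, rng, choose) for _ in range(runs))
    return hits / runs


def exact_hot_probability(mu=28.0, sigma=4.0, threshold=30.0):
    """P(X > threshold) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1.0 - math.erf((threshold - mu) / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    print(exact_hot_probability())
    print(estimate_hot(first_trigger), estimate_hot(random_trigger))
```

For this toy program, both chase orders should give estimates of P(Hot(aachen)) close to the analytic value of about 0.31. This is consistent with the order-independence claimed above, although a simulation on one program only illustrates the claim and does not prove it.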
- Charu C. Aggarwal and Philip S. Yu. A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering (TKDE), 21(5):609–623, 2009. http://dx.doi.org/10.1109/TKDE.2008.190
- Parag Agrawal and Jennifer Widom. Continuous Uncertainty in Trio. In Proceedings of the 3rd VLDB workshop on Management of Uncertain Data (MUD 2009), pages 17–32, Enschede, The Netherlands, 2009. Centre for Telematics and Information Technology (CTIT).
- Vince Bárány, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. Declarative Probabilistic Programming with Datalog. ACM Transactions on Database Systems (TODS), 42(4):22:1–22:35, 2017. http://dx.doi.org/10.1145/3132700
- Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul A. Szerlip, Paul Horsfall, and Noah D. Goodman. Pyro: Deep Universal Probabilistic Programming. J. Mach. Learn. Res., 20:28:1–28:6, 2019.
- Johannes Borgström, Andrew D. Gordon, Michael Greenberg, James Margetson, and Jurgen Van Gael. Measure Transformer Semantics for Bayesian Machine Learning. Logical Methods in Computer Science, 9(3), 2013. http://dx.doi.org/10.2168/LMCS-9(3:11)2013
- Zhuhua Cai, Zografoula Vagena, Luis Perez, Subramanian Arumugam, Peter J. Haas, and Christopher Jermaine. Simulation of Database-Valued Markov Chains using SimSQL. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD 2013), pages 637–648, New York, NY, USA, 2013. ACM. http://dx.doi.org/10.1145/2463676.2465283
- Bob Carpenter, Andrew Gelman, Matthew Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A Probabilistic Programming Language. Journal of Statistical Software, 76(1), 2017.
- D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure. Probability and its Applications. Springer, New York, NY, USA, 2nd edition, 2008. http://dx.doi.org/10.1007/978-0-387-49835-5
- Luc De Raedt, Angelika Kimmig, and Hannu Toivonen. ProbLog: A Probabilistic Prolog and Its Application in Link Discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), pages 2468–2473, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.
- Luc De Raedt, Kristian Kersting, Sriraam Natarajan, and David Poole. Statistical Relational Artificial Intelligence: Logic, Probability, and Computation, volume 10. Morgan & Claypool Publishers, 2016. http://dx.doi.org/10.2200/S00692ED1V01Y201601AIM032
- Daniel Deutch, Christoph Koch, and Tova Milo. On Probabilistic Fixpoint and Markov Chain Query Languages. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2010), pages 215–226, New York, NY, USA, 2010. ACM. http://dx.doi.org/10.1145/1807085.1807114
- David H. Fremlin. Measure Theory. Vol. II: Broad Foundations. Torres Fremlin, 2nd edition, 2010.
- David H. Fremlin. Measure Theory. Vol. IV: Topological Measure Spaces, Part I. Torres Fremlin, 2nd edition, 2013.
- Norbert Fuhr. Probabilistic Datalog – A Logic for Powerful Retrieval Methods. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1995), pages 282–290, New York, NY, USA, 1995. ACM. http://dx.doi.org/10.1145/215206.215372
- Marie Gaudard and Donald Hadwin. Sigma-Algebras on Spaces of Probability Measures. Scandinavian Journal of Statistics, 16(2):169–175, 1989.
- Noah D. Goodman. The principles and practice of probabilistic programming. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2013), pages 399–402, New York, NY, USA, 2013. ACM. http://dx.doi.org/10.1145/2429069.2429117
- Noah D. Goodman, Vikash K. Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: A Language for Generative Models. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI 2008), pages 220–229, Arlington, VA, USA, 2008. AUAI Press.
- Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic programming. In Proceedings of Future of Software Engineering (FOSE 2014), pages 167–181, New York, NY, USA, 2014. ACM. http://dx.doi.org/10.1145/2593882.2593900
- Georg Gottlob, Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari. Query Answering under Probabilistic Uncertainty in Datalog+/- Ontologies. Annals of Mathematics and Artificial Intelligence, 69(1):37–72, 2013. http://dx.doi.org/10.1007/s10472-013-9342-1
- Martin Grohe, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Peter Lindner. Generative Datalog with Continuous Distributions. arXiv e-prints, 2020. URL: https://arxiv.org/abs/2001.06358.
- Martin Grohe and Peter Lindner. Probabilistic Databases with an Infinite Open-World Assumption. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2019), pages 17–31, New York, NY, USA, 2019. ACM. http://dx.doi.org/10.1145/3294052.3319681
- Martin Grohe and Peter Lindner. Infinite Probabilistic Databases. In Carsten Lutz and Jean Christoph Jung, editors, 23rd International Conference on Database Theory (ICDT 2020), volume 155 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1–16:20, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. URL: https://drops.dagstuhl.de/opus/volltexte/2020/11940, http://dx.doi.org/10.4230/LIPIcs.ICDT.2020.16
- Bernd Gutmann, Manfred Jaeger, and Luc De Raedt. Extending ProbLog with Continuous Distributions. In Inductive Logic Programming (ILP 2010), volume 6489 of Lecture Notes in Computer Science, pages 76–91, Berlin, Germany and Heidelberg, Germany, 2011. Springer. http://dx.doi.org/10.1007/978-3-642-21295-6_12
- Ravi Jampani, Fei Xu, Mingxi Wu, Luis Perez, Chris Jermaine, and Peter J. Haas. The Monte Carlo Database System: Stochastic Analysis Close to the Data. ACM Transactions on Database Systems (TODS), 36(3):18:1–18:41, 2011. http://dx.doi.org/10.1145/2000824.2000828
- Olav Kallenberg. Foundations of Modern Probability. Springer Series in Statistics. Probability and its Applications. Springer, New York, NY, USA, 2nd edition, 2002. http://dx.doi.org/10.1007/978-1-4757-4015-8
- Angelika Kimmig, Stephen Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. A Short Introduction to Probabilistic Soft Logic. In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications, pages 1–4, 2012.
- Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning. The MIT Press, Cambridge, MA, USA, 2009.
- Andrei Nikolajewitsch Kolmogorov. Foundations of the Theory of Probability. Chelsea Publishing Company, New York, NY, USA, 2nd English edition, 1956.
- C. Kuratowski and C. Ryll-Nardzewski. A General Theorem on Selectors. Bulletin of the Polish Academy of Sciences, 13:397–403, 1965.
- Eckhard Limpert, Werner A. Stahel, and Markus Abbt. Log-normal Distributions across the Sciences: Keys and Clues. BioScience, 51(5):341–352, 2001. http://dx.doi.org/10.1641/0006-3568(2001)051[0341:LNDATS]2.0.CO;2
- Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, David L. Ong, and Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), pages 1352–1359, San Francisco, CA, USA, 2005. Morgan Kaufmann.
- Aditya V. Nori, Chung-Kil Hur, Sriram K. Rajamani, and Selva Samuel. R2: An Efficient MCMC Sampler for Probabilistic Programs. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, pages 2476–2482. AAAI Press, 2014.
- Avi Pfeffer. Figaro: An Object-Oriented Probabilistic Programming Language. Technical report, Charles River Analytics, 2009.
- Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1–2):107–136, 2006. http://dx.doi.org/10.1007/s10994-006-5833-1
- Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambrusch, and Rahul Shah. Orion 2.0: Native Support for Uncertain Data. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008), pages 1239–1242, New York, NY, USA, 2008. ACM. http://dx.doi.org/10.1145/1376616.1376744
- Parag Singla and Pedro Domingos. Markov logic in infinite domains. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI 2007), pages 368–375, Arlington, VA, USA, 2007. AUAI Press.
- Sashi Mohan Srivastava. A Course on Borel Sets, volume 180 of Graduate Texts in Mathematics. Springer, New York, NY, USA, 1998.
- Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool, San Rafael, CA, USA, 1st edition, 2011. http://dx.doi.org/10.2200/S00362ED1V01Y201105DTM016
- David Tolpin, Jan-Willem van de Meent, and Frank Wood. Probabilistic Programming in Anglican. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), volume 9286 of Lecture Notes in Computer Science, pages 308–311, Cham, Switzerland, 2015. Springer International Publishing. http://dx.doi.org/10.1007/978-3-319-23461-8_36
- Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, and Frank Wood. An Introduction to Probabilistic Programming. arXiv e-prints, 2018. URL: https://arxiv.org/abs/1809.10756.
- Brend Wanders, Maurice van Keulen, and Jan Flokstra. JudgeD: A Probabilistic Datalog with Dependencies. In The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence: Technical Reports WS-16-01 – WS-16-15, Palo Alto, CA, USA, 2016. AAAI Press.
- Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, and Stuart Russell. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), volume 80 of Proceedings of Machine Learning Research, pages 5343–5352. PMLR, 2018.