
Generative Datalog with Continuous Distributions

Published: 17 November 2022

Abstract

Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a “purely declarative probabilistic programming language.” We revisit this language and propose a more principled approach towards defining its semantics based on stochastic kernels and Markov processes—standard notions from probability theory. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al.

We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, LMCS 2022) and show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.
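
The abstract's view of program evaluation as a Markov process over database instances can be pictured with a toy simulation. The sketch below is an illustration under stated assumptions, not the paper's formal semantics: the rule `Height(p, Normal<mu, sigma>) :- Person(p, mu, sigma)`, the relation names, and every function name are hypothetical. Each chase step fires one applicable rule and samples from a continuous (normal) distribution, and two different chase orders are compared by Monte Carlo estimates.

```python
import random

# Hypothetical generative rule, for illustration only:
#   Height(p, Normal<mu, sigma>) :- Person(p, mu, sigma)
# A database instance is a frozenset of facts (tuples). The next instance
# depends only on the current one, so a chase run is a Markov process.


def applicable_firings(instance):
    """Return the Person facts that have no Height fact yet, in sorted order."""
    have_height = {fact[1] for fact in instance if fact[0] == "Height"}
    return sorted(
        fact for fact in instance
        if fact[0] == "Person" and fact[1] not in have_height
    )


def chase_step(instance, rng, pick):
    """One transition: fire a single applicable rule, or return None if done."""
    firings = applicable_firings(instance)
    if not firings:
        return None
    _, person, mu, sigma = pick(firings)
    # Sample the new attribute value from a continuous distribution.
    return instance | {("Height", person, rng.gauss(mu, sigma))}


def chase(instance, rng, pick):
    """Apply chase steps until no rule is applicable."""
    while True:
        nxt = chase_step(instance, rng, pick)
        if nxt is None:
            return instance
        instance = nxt


def estimate_mean(db, person, pick, runs=2000, seed=0):
    """Monte Carlo estimate of the mean sampled Height of one person."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        result = chase(db, rng, pick)
        total += next(f[2] for f in result if f[0] == "Height" and f[1] == person)
    return total / runs


def first(firings):
    """Chase order A: always fire the first applicable rule instance."""
    return firings[0]


def last(firings):
    """Chase order B: always fire the last applicable rule instance."""
    return firings[-1]


DB = frozenset({
    ("Person", "alice", 170.0, 10.0),
    ("Person", "bob", 180.0, 5.0),
})

if __name__ == "__main__":
    print(estimate_mean(DB, "alice", first))
    print(estimate_mean(DB, "alice", last))
```

In this toy setting both chase orders should give estimates close to the configured mean of 170 for `alice`, which mirrors, informally, the robustness under arbitrary chase orders that the abstract claims for the actual semantics.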

REFERENCES

[1] Abiteboul Serge, Hull Richard, and Vianu Victor. 1995. Foundations of Databases (1st ed.). Addison-Wesley Publishing Company, Inc., Reading, MA.
[2] Aggarwal Charu C. and Yu Philip S. 2009. A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 21, 5 (2009), 609–623.
[3] Agrawal Parag and Widom Jennifer. 2009. Continuous uncertainty in Trio. In Proceedings of the 3rd VLDB Workshop on Management of Uncertain Data (MUD’09). Centre for Telematics and Information Technology (CTIT), 17–32.
[4] Alviano Mario and Zamayla Arnel. 2021. A speech about generative datalog and non-measurable sets. In Proceedings of the International Conference on Logic Programming Workshops co-located with the 37th International Conference on Logic Programming (ICLP’21). 8. Retrieved from http://ceur-ws.org/Vol-2970/aspocpinvited2.pdf.
[5] Bárány Vince, Cate Balder ten, Kimelfeld Benny, Olteanu Dan, and Vagena Zografoula. 2017. Declarative probabilistic programming with datalog. ACM Trans. Datab. Syst. 42, 4 (2017).
[6] Bingham Eli, Chen Jonathan P., Jankowiak Martin, Obermeyer Fritz, Pradhan Neeraj, Karaletsos Theofanis, Singh Rohit, Szerlip Paul A., Horsfall Paul, and Goodman Noah D. 2019. Pyro: Deep universal probabilistic programming. J. Mach. Learn. Res. 20 (2019), 28:1–28:6.
[7] Booth T. L. and Thompson R. A. 1973. Applying probability measures to abstract languages. IEEE Trans. Comput. C-22, 5 (1973), 442–450.
[8] Borgström Johannes, Gordon Andrew D., Greenberg Michael, Margetson James, and Gael Jurgen Van. 2013. Measure transformer semantics for Bayesian machine learning. Logic. Meth. Comput. Sci. 9, 3 (2013).
[9] Bournez Olivier and Garnier Florent. 2005. Proving positive almost-sure termination. In Proceedings of the 16th International Conference on Term Rewriting and Applications (RTA’05). Springer-Verlag, Berlin, Germany, 323–337.
[10] Cabibbo Luca. 1998. The expressive power of stratified logic programs with value invention. Inf. Computat. 147, 1 (1998), 22–56.
[11] Cai Zhuhua, Vagena Zografoula, Perez Luis, Arumugam Subramanian, Haas Peter J., and Jermaine Christopher. 2013. Simulation of database-valued Markov chains using SimSQL. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 637–648.
[12] Calì Andrea, Gottlob Georg, and Kifer Michael. 2013. Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. 48 (2013), 115–174.
[13] Carpenter Bob, Gelman Andrew, Hoffman Matthew, Lee Daniel, Goodrich Ben, Betancourt Michael, Brubaker Marcus, Guo Jiqiang, Li Peter, and Riddell Allen. 2017. Stan: A probabilistic programming language. J. Statist. Softw. 76, 1 (2017).
[14] Chatterjee Krishnendu, Fu Hongfei, and Novotný Petr. 2020. Termination Analysis of Probabilistic Programs with Martingales. Cambridge University Press, Cambridge, UK, 221–258.
[15] Cheng Reynold, Kalashnikov Dmitri V., and Prabhakar Sunil. 2003. Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’03). ACM, New York, NY, 551–562.
[16] Cheng Reynold and Prabhakar Sunil. 2003. Managing uncertainty in sensor databases. ACM SIGMOD Rec. 32, 4 (Dec. 2003), 41–46.
[17] Grau B. Cuenca, Horrocks I., Krötzsch M., Kupke C., Magka D., Motik B., and Wang Z. 2013. Acyclicity notions for existential rules and their application to query answering in ontologies. J. Artif. Intell. Res. 47 (2013), 741–808.
[18] Daley D. J. and Vere-Jones D. 2008. An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure (2nd ed.). Springer, New York, NY.
[19] Raedt Luc De, Kimmig Angelika, and Toivonen Hannu. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07). Morgan Kaufmann Publishers Inc., San Francisco, CA, 2468–2473.
[20] Raedt Luc De, Kersting Kristian, Natarajan Sriraam, and Poole David. 2016. Statistical Relational Artificial Intelligence: Logic, Probability, and Computation, Vol. 10. Morgan & Claypool Publishers, 1–189.
[21] Deshpande Amol, Guestrin Carlos, Madden Samuel R., Hellerstein Joseph M., and Hong Wei. 2004. Model-driven data acquisition in sensor networks. In Proceedings of the VLDB Conference (VLDB’04). Morgan Kaufmann, 588–599.
[22] Deutch Daniel, Koch Christoph, and Milo Tova. 2010. On probabilistic fixpoint and Markov chain query languages. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’10). ACM, New York, NY, 215–226.
[23] Dwork Cynthia, McSherry Frank, Nissim Kobbi, and Smith Adam. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography (TCC 2006) (Lecture Notes in Computer Science), Vol. 3876. Springer, Berlin, Germany, 265–284.
[24] Fierens Daan, Broeck Guy Van den, Renkens Joris, Shterionov Dimitar, Gutmann Bernd, Thon Ingo, Janssens Gerda, and Raedt Luc De. 2015. Inference and learning in probabilistic logic programs using weighted Boolean formulas. Theor. Pract. Logic Program. 15, 3 (2015), 358–401.
[25] Fremlin David H. 2013. Measure Theory. Vol. IV: Topological Measure Spaces, Part I (2nd ed.). Torres Fremlin.
[26] Fuhr Norbert. 1995. Probabilistic datalog—A logic for powerful retrieval methods. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’95). ACM, New York, NY, 282–290.
[27] Fuhr Norbert. 2000. Probabilistic datalog: Implementing logical information retrieval for advanced applications. J. Amer. Societ. Inf. Sci. 51, 2 (2000), 95–110.
[28] Gaudard Marie and Hadwin Donald. 1989. Sigma-algebras on spaces of probability measures. Scandin. J. Statist. 16, 2 (1989), 169–175.
[29] Gogacz Tomasz, Marcinkowski Jerzy, and Pieris Andreas. 2020. All-instances restricted chase termination. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’20). Association for Computing Machinery, New York, NY, 245–258.
[30] Goodman Noah D. 2013. The principles and practice of probabilistic programming. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’13). ACM, New York, NY, 399–402.
[31] Goodman Noah D., Mansinghka Vikash K., Roy Daniel, Bonawitz Keith, and Tenenbaum Joshua B. 2008. Church: A language for generative models. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI’08). AUAI Press, Arlington, VA, 220–229.
[32] Gordon Andrew D., Henzinger Thomas A., Nori Aditya V., and Rajamani Sriram K. 2014. Probabilistic programming. In Proceedings of the Conference on Future of Software Engineering (FOSE’14). ACM, New York, NY, 167–181.
[33] Gottlob Georg, Lukasiewicz Thomas, Martinez Maria Vanina, and Simari Gerardo I. 2013. Query answering under probabilistic uncertainty in datalog +/- ontologies. Ann. Math. Artif. Intell. 69, 1 (2013), 37–72.
[34] Gottlob Georg, Lukasiewicz Thomas, and Pieris Andreas. 2014. Datalog+/–: Questions and answers. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR’14). AAAI Press. Retrieved from https://www.aaai.org/ocs/index.php/KR/KR14/paper/view/7965.
[35] Grohe Martin, Kaminski Benjamin Lucien, Katoen Joost-Pieter, and Lindner Peter. 2020. Generative datalog with continuous distributions. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’20). Association for Computing Machinery, New York, NY, 347–360.
[36] Grohe Martin and Lindner Peter. 2019. Probabilistic databases with an infinite open-world assumption. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’19). ACM, New York, NY, 17–31.
[37] Grohe Martin and Lindner Peter. 2022. Infinite probabilistic databases. Logic. Meth. Comput. Sci. 18, 1 (2022).
[38] Gutmann Bernd, Jaeger Manfred, and Raedt Luc De. 2011. Extending ProbLog with continuous distributions. In Inductive Logic Programming (ILP 2010) (Lecture Notes in Computer Science), Vol. 6489. Springer, Berlin, Germany, 76–91.
[39] Icard Thomas F. 2020. Calibrating generative models: The probabilistic Chomsky-Schützenberger Hierarchy. J. Math. Psychol. 95 (2020), 102308.
[40] Jacobs Jules. 2021. Paradoxes of probabilistic programming. In Proceedings of the 48th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL’21), Vol. 5. ACM, 26.
[41] Jampani Ravi, Xu Fei, Wu Mingxi, Perez Luis, Jermaine Chris, and Haas Peter J. 2011. The Monte Carlo database system: Stochastic analysis close to the data. ACM Trans. Datab. Syst. 36, 3 (2011).
[42] Kallenberg Olav. 2002. Foundations of Modern Probability (2nd ed.). Springer, New York, NY.
[43] Kaminski Benjamin Lucien and Katoen Joost-Pieter. 2015. On the hardness of almost-sure termination. In Mathematical Foundations of Computer Science 2015 (MFCS 2015) (Lecture Notes in Computer Science), Vol. 9234. Springer, Berlin, 307–318.
[44] Kimmig Angelika, Bach Stephen, Broecheler Matthias, Huang Bert, and Getoor Lise. 2012. A short introduction to probabilistic soft logic. In Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications. 1–4.
[45] Koller Daphne and Friedman Nir. 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, Cambridge, MA.
[46] Kolmogorov Andrei Nikolajewitsch. 1956. Foundations of the Theory of Probability (2nd English ed.). Chelsea Publishing Company, New York, NY.
[47] Kozen Dexter. 1981. Semantics of probabilistic programs. J. Comput. Syst. Sci. 22, 3 (1981), 328–350.
[48] Kozen Dexter. 1983. A probabilistic PDL. In Proceedings of the ACM Symposium on Theory of Computing. ACM, 291–297.
[49] Kuratowski C. and Ryll-Nardzewski C. 1965. A general theorem on selectors. Bull. Polish Acad. Sci. 13 (1965), 397–403.
[50] Libkin Leonid. 2012. Elements of Finite Model Theory. Springer, Berlin.
[51] Limpert Eckhard, Stahel Werner A., and Abbt Markus. 2001. Log-normal distributions across the sciences: Keys and clues. BioScience 51, 5 (2001), 341–352.
[52] Lindner Peter. 2021. The Theory of Infinite Probabilistic Databases. Ph.D. Dissertation. RWTH Aachen University.
[53] Milch Brian. 2006. Probabilistic Models with Unknown Objects. Ph.D. Dissertation. University of California at Berkeley.
[54] Milch Brian, Marthi Bhaskara, Russell Stuart, Sontag David, Ong David L., and Kolobov Andrey. 2005. BLOG: Probabilistic models with unknown objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI’05). Morgan Kaufmann, San Francisco, CA, 1352–1359.
[55] Nori Aditya V., Hur Chung-Kil, Rajamani Sriram K., and Samuel Selva. 2014. R2: An efficient MCMC sampler for probabilistic programs. In Proceedings of the 28th AAAI Conference on Artificial Intelligence. AAAI Press, 2476–2482.
[56] Pearl Judea. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Elsevier Inc., Imprint: Morgan Kaufmann Publishers Inc., San Francisco, CA.
[57] Pfeffer Avi. 2009. Figaro: An Object-Oriented Probabilistic Programming Language. Technical Report. Charles River Analytics.
[58] Richardson Matthew and Domingos Pedro. 2006. Markov logic networks. Mach. Learn. 62, 1–2 (2006), 107–136.
[59] Saheb-Djahromi N. 1978. Probabilistic LCF. In Mathematical Foundations of Computer Science 1978 (MFCS 1978) (Lecture Notes in Computer Science), Vol. 64. Springer, Berlin, Germany, 442–451.
[60] Singh Sarvjeet, Mayfield Chris, Mittal Sagar, Prabhakar Sunil, Hambrusch Susanne, and Shah Rahul. 2008. Orion 2.0: Native support for uncertain data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’08). ACM, New York, NY, 1239–1242.
[61] Singla Parag and Domingos Pedro. 2007. Markov logic in infinite domains. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI’07). AUAI Press, Arlington, VA, 368–375.
[62] Srivastava Sashi Mohan. 1998. A Course on Borel Sets. Graduate Texts in Mathematics, Vol. 180. Springer, New York, NY.
[63] Suciu Dan, Olteanu Dan, Ré Christopher, and Koch Christoph. 2011. Probabilistic Databases (1st ed.). Morgan & Claypool, San Rafael, CA.
[64] Tolpin David, Meent Jan-Willem van de, and Wood Frank. 2015. Probabilistic programming in Anglican. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015) (Lecture Notes in Computer Science), Vol. 9286. Springer International Publishing, Cham, Switzerland, 308–311.
[65] Meent Jan-Willem van de, Paige Brooks, Yang Hongseok, and Wood Frank. 2018. An introduction to probabilistic programming. arXiv e-prints (2018). Retrieved from https://arxiv.org/abs/1809.10756.
[66] Broeck Guy Van den and Suciu Dan. 2017. Query processing on probabilistic data: A survey. Found. Trends® Datab. 7, 3–4 (2017), 197–341.
[67] Wanders Brend, Keulen Maurice van, and Flokstra Jan. 2016. JudgeD: A probabilistic datalog with dependencies. In The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence: Technical Reports WS-16-01 – WS-16-15. AAAI Press, Palo Alto, CA.
[68] Wang Jue and Domingos Pedro. 2008. Hybrid Markov logic networks. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI’08). 1106–1111. Retrieved from https://www.aaai.org/Papers/AAAI/2008/AAAI08-175.pdf.
[69] Wu Yi, Srivastava Siddharth, Hay Nicholas, Du Simon, and Russell Stuart. 2018. Discrete-continuous mixtures in probabilistic programming: Generalized semantics and inference algorithms. In Proceedings of the 35th International Conference on Machine Learning (ICML’18) (Proceedings of Machine Learning Research), Vol. 80. PMLR, 5343–5352.

Published in

Journal of the ACM, Volume 69, Issue 6
December 2022
302 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/3570966

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 17 November 2022
• Online AM: 30 August 2022
• Accepted: 25 July 2022
• Revised: 7 February 2022
• Received: 1 February 2021
