Abstract
Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a “purely declarative probabilistic programming language.” We revisit this language and propose a more principled approach towards defining its semantics based on stochastic kernels and Markov processes—standard notions from probability theory. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al.
We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, LMCS 2022) and show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.
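As a rough illustration of the Markov-process view, the sketch below treats a database instance as a state and each rule firing as one random transition that samples from a continuous distribution. The toy rule, the relation names `City` and `Temp`, and the helpers `applicable_triggers`, `chase_step`, and `chase` are hypothetical and not taken from the paper. The code simulates one small terminating program and makes no claim about the formal semantics.

```python
import random

# Hypothetical toy program (an assumption, not from the paper):
#   Temp(c, Normal<mu, sigma>) :- City(c, mu, sigma)
# Each applicable rule instance ("trigger") fires once and adds a sampled fact.


def applicable_triggers(instance):
    """Return the City facts that have no matching Temp fact yet."""
    done = {fact[1] for fact in instance if fact[0] == "Temp"}
    return sorted(
        fact for fact in instance if fact[0] == "City" and fact[1] not in done
    )


def chase_step(instance, pick, rng):
    """One transition: fire a single trigger chosen by the policy `pick`."""
    triggers = applicable_triggers(instance)
    if not triggers:
        return instance, False
    _, city, mu, sigma = pick(triggers)
    sample = rng.gauss(mu, sigma)  # draw from a continuous distribution
    return instance | {("Temp", city, sample)}, True


def chase(instance, pick, seed=0):
    """Apply chase steps until no trigger is applicable."""
    rng = random.Random(seed)
    instance = frozenset(instance)
    changed = True
    while changed:
        instance, changed = chase_step(instance, pick, rng)
    return instance


db = {("City", "a", 20.0, 5.0), ("City", "b", 10.0, 2.0)}
first = chase(db, pick=lambda ts: ts[0])   # fire triggers in sorted order
last = chase(db, pick=lambda ts: ts[-1])   # fire triggers in reverse order
```

Both chase orders end with exactly one `Temp` fact per city. The sampled values generally differ between the two runs, because the orders consume the random stream differently. Independence of chase order is a statement about the resulting distribution over instances, not about individual samples.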