Abstract
The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is far less understood under bag semantics. In particular, it is a long-standing open question whether the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the conjunctive query containment problem under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both of these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential-time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.
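The two notions the abstract connects can be made concrete on toy instances. The sketch below is illustrative only and is not the paper's algorithm or reduction. It assumes a simplification: a set-valued database, where the bag answer of a Boolean conjunctive query is its number of homomorphisms into the database (often called bag-set semantics), so Q1 is bag-contained in Q2 when Q1 never has more homomorphisms than Q2. It also evaluates one Shannon-type information inequality, submodularity H(XY) + H(YZ) ≥ H(XYZ) + H(Y), on the uniform distribution over the homomorphisms. The query encoding and the helper names `homomorphisms`, `bag_count`, `is_counterexample`, and `entropy` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

# Hypothetical encoding (assumption): a Boolean CQ is a list of atoms
# (relation, (var, ...)); a database maps each relation name to a set of tuples.

def homomorphisms(query, db):
    """Yield every variable assignment that maps each atom of `query` into `db`."""
    variables = sorted({v for _, args in query for v in args})
    domain = sorted({c for rel in db.values() for t in rel for c in t})
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all(tuple(h[v] for v in args) in db.get(rel, set()) for rel, args in query):
            yield h

def bag_count(query, db):
    """Assumed bag-set semantics: answer multiplicity = number of homomorphisms."""
    return sum(1 for _ in homomorphisms(query, db))

def is_counterexample(q1, q2, db):
    """True if db shows q1 is NOT bag-contained in q2 (a finite db can only refute)."""
    return bag_count(q1, db) > bag_count(q2, db)

def entropy(homs, subset):
    """Entropy in bits of the marginal on `subset` of the uniform distribution over homs."""
    n = len(homs)
    if n == 0:
        return 0.0
    counts = Counter(tuple(h[v] for v in subset) for h in homs)
    return -sum((c / n) * log2(c / n) for c in counts.values())

path = [("R", ("x", "y")), ("R", ("y", "z"))]  # R(x,y), R(y,z)
edge = [("R", ("u", "v"))]                      # R(u,v)
db = {"R": {(1, 2), (3, 2), (4, 2), (2, 5), (2, 6), (2, 7)}}

print(bag_count(path, db), bag_count(edge, db))  # 9 6
print(is_counterexample(path, edge, db))         # True

# Submodularity, a Shannon-type inequality, on the uniform distribution over hom(path, db):
# H(xy) + H(yz) >= H(xyz) + H(y); a small tolerance absorbs floating-point error.
homs = list(homomorphisms(path, db))
lhs = entropy(homs, ["x", "y"]) + entropy(homs, ["y", "z"])
rhs = entropy(homs, ["x", "y", "z"]) + entropy(homs, ["y"])
print(lhs + 1e-9 >= rhs)                         # True
```

Enumerating all assignments is exponential in the number of variables, so this only suits toy instances (and it assumes domain values are mutually sortable). A search over finite databases can find counterexamples to containment but cannot certify it, which is why decidability is the hard question the abstract addresses.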