Abstract
While formal semantics of theoretical languages underlying SQL have been provided in the past, they all made simplifying assumptions ranging from changes in the syntax to omitting bag semantics and nulls. This situation is reminiscent of what happens in the field of programming languages, where semantics of formal calculi underlying the main features of languages are abundant, but formal semantics of real languages that people use are few and far between.
We consider the basic class of SQL queries --- essentially SELECT-FROM-WHERE queries with subqueries, set/bag operations, and nulls --- and define a formal semantics for it, without any departures from the real language. This fragment already requires decisions related to the data model and handling variable names that are normally disregarded by simplified semantics. To justify our choice of the semantics, we validate it experimentally on a large number of randomly generated queries and databases.
We give two applications of the semantics. One is the first formal proof of the equivalence of basic SQL and relational algebra that extends to bag semantics and nulls. The other application looks at the three-valued logic employed by SQL, which is universally assumed to be necessary to handle nulls. We prove however that this is not so, as three-valued logic does not add expressive power: every SQL query in our fragment can be evaluated under the usual two-valued Boolean semantics of conditions.
- H. Abelson et al. Revised report on the algorithmic language Scheme. Higher-Order and Symbolic Computation, 11(1):7--105, 1998. Google ScholarDigital Library
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- J. Albert. Algebraic properties of bag data types. In 17th International Conference on Very Large Data Bases, pages 211--219, 1991. Google ScholarDigital Library
- T. Arvin. Comparison of different SQL implementations. http://troels.arvin.dk/db/rdbms, 2017.Google Scholar
- L. E. Bertossi and L. Bravo. Consistency and trust in peer data exchange systems. TPLP, 17(2):148--204, 2017.Google ScholarCross Ref
- L. Bolk and P. Borowik. Many-Valued Logics: Theoretical Foundations. Springer, 1992.Google Scholar
- S. Ceri and G. Gottlob. Translating SQL into relational algebra: Optimization, semantics, and equivalence of SQL queries. IEEE Trans. Software Eng., 11(4):324--345, 1985. Google ScholarDigital Library
- S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR, 2017.Google Scholar
- S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: Proving query rewrites with univalent SQL semantics. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 510--524. ACM, 2017. Google ScholarDigital Library
- J. Claußen, A. Kemper, G. Moerkotte, K. Peithner, and M. Steinbrunn. Optimization and evaluation of disjunctive queries. IEEE Trans. Knowl. Data Eng., 12(2):238--260, 2000. Google ScholarDigital Library
- E. F. Codd. A database sublanguage founded on the relational calculus. In Proceedings of 1971 ACM-SIGFIDET Workshop on Data Description, Access and Control, San Diego, California, November 11--12, 1971, pages 35--68, 1971. Google ScholarDigital Library
- F. Coelho. DataFiller --- generate random data from database schema. https://www.cri.ensmp.fr/people/coelho/datafiller.html.Google Scholar
- C. J. Date and H. Darwen. A Guide to the SQL Standard. Addison-Wesley, 1996.Google ScholarDigital Library
- C. Ellison. A Formal Semantics of C with Applications. PhD thesis, UIUC, 428pp, 2012.Google Scholar
- H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems - The Complete Book. Pearson Education, 2009. Google ScholarDigital Library
- S. Grumbach and T. Milo. Towards tractable algebras for bags. J. Comput. Syst. Sci., 52(3):570--588, 1996. Google ScholarDigital Library
- P. Guagliardo and L. Libkin. Making SQL queries correct on incomplete databases: A feasibility study. In Proceedings of the 35th ACM Symposium on Principles of Database Systems, pages 211--223, 2016. Google ScholarDigital Library
- C. Gunter. Semantics of Programming Languages: Structures and Techniques. MIT Press, 1992. Google ScholarDigital Library
- Y. Gurevich and J. K. Huggins. The semantics of the C programming language. In Computer Science Logic, 6th Workshop, CSL '92, pages 274--308, 1992. Google ScholarDigital Library
- R. Harper. Practical Foundations for Programming Languages. Cambridge University Press, 2016. Google ScholarDigital Library
- International Organization for Standardization. ISO/IEC 9075:2016: Information technology --- Database languages --- SQL, 2016.Google Scholar
- K. E. Kline and D. Kline. SQL in a Nutshell. O'Reilly & Associates, Inc., 2001. Google ScholarDigital Library
- L. Libkin and L. Wong. Query languages for bags and aggregate functions. J. Comput. Syst. Sci., 55(2):241--272, 1997. Google ScholarDigital Library
- J. G. Malecha, G. Morrisett, A. Shinnar, and R. Wisnesky. Toward a verified relational database management system. In Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 237--248, 2010. Google ScholarDigital Library
- G. Malinowski. Many-valued logic and its philosophy. In Handbook of the History of Logic, pages 13--94. Elsevier, 2007.Google Scholar
- R. Milner and M. Tofte. Commentary on standard ML. MIT Press, 1991. Google ScholarDigital Library
- R. Milner, M. Tofte, and R. Harper. Definition of standard ML. MIT Press, 1990. Google ScholarDigital Library
- J. C. Mitchell. Concepts in programming languages. Cambridge University Press, 2003.Google Scholar
- M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of SQL queries. ACM Trans. Database Syst., 16(3):513--534, 1991. Google ScholarDigital Library
- Neo Technlogy Inc. openCypher project, http://www.opencypher.org/.Google Scholar
- M. Norrish. C formalised in HOL. Univ. Cambridge Techreport UCAM-CL-TR- 453, 150pp., 1998.Google Scholar
- N. Papaspyrou. A Formal Semantics for the C Programming Language. PhD thesis, NTUA, 253pp, 1998.Google Scholar
- R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, 2003. Google ScholarDigital Library
- N. Rescher. Topics in Philosophical Logic. Reidel, 1969.Google Scholar
- Transaction Processing Performance Council. TPC Benchmark™ H Standard Specification, 2014. Revision 2.17.1.Google Scholar
- J. Van den Bussche and S. Vansummeren. Translating SQL into the relational algebra. Course notes, Hasselt University and Université Libre de Bruxelles, 2009.Google Scholar
- M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic SQL query explorer. In Logic for Programming, Artificial Intelligence, and Reasoning (LPAR), pages 425--446, 2010. Google ScholarDigital Library
Index Terms
- A formal semantics of SQL queries, its validation, and applications
Recommendations
A Coq mechanised formal semantics for realistic SQL queries: formally reconciling SQL and bag relational algebra
CPP 2019: Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and ProofsIn this article, we provide a Coq mechanised, executable, formal semantics for a realistic fragment of SQL consisting of "select [distinct] from where group by having" queries with null values, functions, aggregates, quantifiers and nested potentially ...
HoTTSQL: proving query rewrites with univalent SQL semantics
PLDI '17Every database system contains a query optimizer that performs query rewrites. Unfortunately, developing query optimizers remains a highly challenging task. Part of the challenges comes from the intricacies and rich features of query languages, which ...
Formal semantics of SQL queries
The semantics of SQL queries is formally defined by stating a set of rules that determine a syntax-driven translation of an SQL query to a formal model. The target model, called Extended Three Valued Predicate Calculus (E3VPC), is largely based on a set ...
Comments