skip to main content
research-article

A formal semantics of SQL queries, its validation, and applications

Published:01 September 2017Publication History
Skip Abstract Section

Abstract

While formal semantics of theoretical languages underlying SQL have been provided in the past, they all made simplifying assumptions ranging from changes in the syntax to omitting bag semantics and nulls. This situation is reminiscent of what happens in the field of programming languages, where semantics of formal calculi underlying the main features of languages are abundant, but formal semantics of real languages that people use are few and far between.

We consider the basic class of SQL queries --- essentially SELECT-FROM-WHERE queries with subqueries, set/bag operations, and nulls --- and define a formal semantics for it, without any departures from the real language. This fragment already requires decisions related to the data model and handling variable names that are normally disregarded by simplified semantics. To justify our choice of the semantics, we validate it experimentally on a large number of randomly generated queries and databases.

We give two applications of the semantics. One is the first formal proof of the equivalence of basic SQL and relational algebra that extends to bag semantics and nulls. The other application looks at the three-valued logic employed by SQL, which is universally assumed to be necessary to handle nulls. We prove however that this is not so, as three-valued logic does not add expressive power: every SQL query in our fragment can be evaluated under the usual two-valued Boolean semantics of conditions.

References

  1. H. Abelson et al. Revised report on the algorithmic language Scheme. Higher-Order and Symbolic Computation, 11(1):7--105, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Albert. Algebraic properties of bag data types. In 17th International Conference on Very Large Data Bases, pages 211--219, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Arvin. Comparison of different SQL implementations. http://troels.arvin.dk/db/rdbms, 2017.Google ScholarGoogle Scholar
  5. L. E. Bertossi and L. Bravo. Consistency and trust in peer data exchange systems. TPLP, 17(2):148--204, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  6. L. Bolk and P. Borowik. Many-Valued Logics: Theoretical Foundations. Springer, 1992.Google ScholarGoogle Scholar
  7. S. Ceri and G. Gottlob. Translating SQL into relational algebra: Optimization, semantics, and equivalence of SQL queries. IEEE Trans. Software Eng., 11(4):324--345, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR, 2017.Google ScholarGoogle Scholar
  9. S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: Proving query rewrites with univalent SQL semantics. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 510--524. ACM, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Claußen, A. Kemper, G. Moerkotte, K. Peithner, and M. Steinbrunn. Optimization and evaluation of disjunctive queries. IEEE Trans. Knowl. Data Eng., 12(2):238--260, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. E. F. Codd. A database sublanguage founded on the relational calculus. In Proceedings of 1971 ACM-SIGFIDET Workshop on Data Description, Access and Control, San Diego, California, November 11--12, 1971, pages 35--68, 1971. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. F. Coelho. DataFiller --- generate random data from database schema. https://www.cri.ensmp.fr/people/coelho/datafiller.html.Google ScholarGoogle Scholar
  13. C. J. Date and H. Darwen. A Guide to the SQL Standard. Addison-Wesley, 1996.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C. Ellison. A Formal Semantics of C with Applications. PhD thesis, UIUC, 428pp, 2012.Google ScholarGoogle Scholar
  15. H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems - The Complete Book. Pearson Education, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Grumbach and T. Milo. Towards tractable algebras for bags. J. Comput. Syst. Sci., 52(3):570--588, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. P. Guagliardo and L. Libkin. Making SQL queries correct on incomplete databases: A feasibility study. In Proceedings of the 35th ACM Symposium on Principles of Database Systems, pages 211--223, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Gunter. Semantics of Programming Languages: Structures and Techniques. MIT Press, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. Gurevich and J. K. Huggins. The semantics of the C programming language. In Computer Science Logic, 6th Workshop, CSL '92, pages 274--308, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. R. Harper. Practical Foundations for Programming Languages. Cambridge University Press, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. International Organization for Standardization. ISO/IEC 9075:2016: Information technology --- Database languages --- SQL, 2016.Google ScholarGoogle Scholar
  22. K. E. Kline and D. Kline. SQL in a Nutshell. O'Reilly & Associates, Inc., 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. Libkin and L. Wong. Query languages for bags and aggregate functions. J. Comput. Syst. Sci., 55(2):241--272, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. G. Malecha, G. Morrisett, A. Shinnar, and R. Wisnesky. Toward a verified relational database management system. In Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 237--248, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. Malinowski. Many-valued logic and its philosophy. In Handbook of the History of Logic, pages 13--94. Elsevier, 2007.Google ScholarGoogle Scholar
  26. R. Milner and M. Tofte. Commentary on standard ML. MIT Press, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. Milner, M. Tofte, and R. Harper. Definition of standard ML. MIT Press, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. C. Mitchell. Concepts in programming languages. Cambridge University Press, 2003.Google ScholarGoogle Scholar
  29. M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of SQL queries. ACM Trans. Database Syst., 16(3):513--534, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Neo Technlogy Inc. openCypher project, http://www.opencypher.org/.Google ScholarGoogle Scholar
  31. M. Norrish. C formalised in HOL. Univ. Cambridge Techreport UCAM-CL-TR- 453, 150pp., 1998.Google ScholarGoogle Scholar
  32. N. Papaspyrou. A Formal Semantics for the C Programming Language. PhD thesis, NTUA, 253pp, 1998.Google ScholarGoogle Scholar
  33. R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. N. Rescher. Topics in Philosophical Logic. Reidel, 1969.Google ScholarGoogle Scholar
  35. Transaction Processing Performance Council. TPC Benchmark H Standard Specification, 2014. Revision 2.17.1.Google ScholarGoogle Scholar
  36. J. Van den Bussche and S. Vansummeren. Translating SQL into the relational algebra. Course notes, Hasselt University and Université Libre de Bruxelles, 2009.Google ScholarGoogle Scholar
  37. M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic SQL query explorer. In Logic for Programming, Artificial Intelligence, and Reasoning (LPAR), pages 425--446, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A formal semantics of SQL queries, its validation, and applications
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Proceedings of the VLDB Endowment
          Proceedings of the VLDB Endowment  Volume 11, Issue 1
          Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, Brazil
          September 2017
          120 pages
          ISSN:2150-8097
          Issue’s Table of Contents

          Publisher

          VLDB Endowment

          Publication History

          • Published: 1 September 2017
          Published in pvldb Volume 11, Issue 1

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader