Abstract
We present a novel approach to context-free grammar parsing that is based on generating a sequence of grammars called derivative grammars from a given context-free grammar and input string. The generation of the derivative grammars is described by a few simple inference rules. We present an $O(n^2)$ space and $O(n^3)$ time recognition algorithm, which can be extended to generate parse trees in $O(n^3)$ time and $O(n^2 \log n)$ space. Derivative grammars can be viewed as a symbolic approach to implementing the notion of derivative languages, which was introduced by Brzozowski.
Might and others have explored an operational approach to implementing derivative languages in which the context-free grammar is encoded as a collection of recursive algebraic data types in a functional language like Haskell. Functional language implementation features like knot-tying and lazy evaluation are exploited to ensure that parsing is done correctly and efficiently in spite of complications like left-recursion. In contrast, our symbolic approach using inference rules can be implemented easily in any programming language and we obtain better space bounds for parsing.
Reifying derivative languages by encoding them symbolically as grammars also enables formal connections to be made for the first time between the derivatives approach and classical parsing methods like the Earley and LL/LR parsers. In particular, we show that the sets of Earley items maintained by the Earley parser implicitly encode derivative grammars and we give a procedure for producing derivative grammars from these sets. Conversely, we show that our derivative grammar recognizer can be transformed into the Earley recognizer by optimizing some of its bookkeeping. These results suggest that derivative grammars may provide a new foundation for context-free grammar recognition and parsing.
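As background for the notion of a derivative language used in the abstract, the following is a minimal Python sketch of Brzozowski's derivative for regular expressions. It is not the paper's derivative-grammar construction or its inference rules; it only illustrates the underlying idea that the derivative of a language with respect to a symbol `a` is the set of suffixes of its strings that start with `a`. All class and function names (`Regex`, `derive`, `nullable`, `matches`) are chosen here for illustration.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular-expression nodes (illustrative names)."""


@dataclass(frozen=True)
class Empty(Regex):      # the empty language: matches nothing
    pass


@dataclass(frozen=True)
class Eps(Regex):        # matches only the empty string
    pass


@dataclass(frozen=True)
class Char(Regex):       # matches the single symbol c
    c: str


@dataclass(frozen=True)
class Alt(Regex):        # union of two languages
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Seq(Regex):        # concatenation of two languages
    first: Regex
    second: Regex


@dataclass(frozen=True)
class Star(Regex):       # Kleene closure
    inner: Regex


def nullable(r: Regex) -> bool:
    """Return True if the language of r contains the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    raise TypeError(f"unknown node: {r!r}")


def derive(r: Regex, a: str) -> Regex:
    """Derivative of r with respect to symbol a: { w | a.w is in L(r) }."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, a), derive(r.right, a))
    if isinstance(r, Seq):
        head = Seq(derive(r.first, a), r.second)
        # If the first part can be empty, a may also be consumed by the second.
        return Alt(head, derive(r.second, a)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(derive(r.inner, a), r)
    raise TypeError(f"unknown node: {r!r}")


def matches(r: Regex, s: str) -> bool:
    """Recognize s by taking one derivative per input symbol."""
    for ch in s:
        r = derive(r, ch)
    return nullable(r)
```

For example, with `ab_star = Star(Seq(Char("a"), Char("b")))`, the call `matches(ab_star, "abab")` takes four successive derivatives and then checks nullability. The sketch performs no simplification of intermediate expressions, so they grow with the input; it is meant to show the recognition scheme, not an efficient implementation.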
- Michael D. Adams, Celeste Hollenbeck, and Matthew Might. 2016. On the Complexity and Performance of Parsing with Derivatives. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16).
- Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: principles, techniques, and tools. Addison Wesley.
- Jonathan Immanuel Brachthäuser, Tillmann Rendel, and Klaus Ostermann. 2016. Parsing with First-class Derivatives. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016).
- Janusz A. Brzozowski. 1964. Derivatives of regular expressions. Journal of the ACM 11, 4 (October 1964), 481–494.
- Nils Anders Danielsson. 2010. Total Parser Combinators. SIGPLAN Not. 45, 9 (Sept. 2010).
- Jay Earley. 1970. An efficient context-free parsing algorithm. Commun. ACM 13, 2 (Feb. 1970), 94–102.
- Matthew Flatt and PLT. 2010. Reference: Racket. Technical Report PLT-TR-2010-1. PLT Design Inc. https://racket-lang.org/tr1/
- John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages, and Computation (1st ed.). Addison-Wesley Publishing Co., Inc., Boston, MA, USA.
- Jeffrey Kegler. 2017. Marpa–R2. https://github.com/jeffreykegler/Marpa--R2
- Joop M.I.M. Leo. 1991. A general context-free parsing algorithm running in linear time on every LR(k) grammar without using lookahead. Theoretical Computer Science 82, 1 (1991), 165–176.
- Matthew Might, David Darais, and Daniel Spiewak. 2011. Parsing with derivatives: A functional pearl. In International Conference on Functional Programming.
- Terence Parr and Kathleen Fisher. 2011. LL(*): The Foundation of the ANTLR Parser Generator. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’11). ACM, New York, NY, USA, 425–436.
- Elizabeth Scott. 2008. SPPF-Style Parsing From Earley Recognisers. Electronic Notes in Theoretical Computer Science 203, 2 (2008), 53–67.
- Seppo Sippu and Eljas Soisalon-Soininen. 1988. Parsing theory. Springer-Verlag.
- Peter Thiemann. 2017. Partial Derivatives for Context-Free Languages. In Proceedings of the 20th International Conference on Foundations of Software Science and Computation Structures - Volume 10203. Springer-Verlag New York, Inc., New York, NY, USA, 248–264.
- Larry Wall. 2000. Programming Perl (3rd ed.). O’Reilly & Associates, Inc., Sebastopol, CA, USA.
Index Terms
- Derivative grammars: a symbolic approach to parsing with derivatives