Abstract
Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These linguistic annotations are often syntactic or prosodic in nature, and have a hierarchical structure. Query languages are used to select particular structures of interest, or to project out large slices of a corpus for external analysis. Existing languages suffer from a variety of problems in the areas of expressiveness, efficiency, and naturalness for linguistic query. We describe the domain of linguistic trees and discuss the expressive requirements for a query language. Then we present a language that can express a wide range of queries over these trees, and show that the language is first-order complete over trees.
Cite this article
Lai, C., Bird, S. Querying Linguistic Trees. J of Log Lang and Inf 19, 53–73 (2010). https://doi.org/10.1007/s10849-009-9086-9