ABSTRACT
Rhythm, defined here as the length of a constituent (the number of syllables in a word or the number of words in a phrase), plays an important role in Chinese syntax. This paper systematically surveys the distribution of rhythm across Chinese constructions, using statistics drawn from a shallow treebank. Based on this survey, we apply rhythm to a practical shallow parsing task, using it as a statistical feature to augment a PCFG model. Our results show that the probabilistic rhythm feature significantly improves the performance of our shallow parser.
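The abstract does not say how the rhythm feature enters the PCFG, so the following is only a minimal sketch under stated assumptions, not the authors' model. It assumes that a rule's score is the usual relative-frequency rule probability multiplied by the probability of the observed rhythm pattern (the tuple of child lengths in syllables) given that rule. The toy observations, the labels, and the names `estimate` and `score` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy observations, not data from the paper:
# (parent label, child labels, child lengths in syllables).
OBSERVATIONS = [
    ("NP", ("N", "N"), (2, 2)),
    ("NP", ("N", "N"), (2, 2)),
    ("NP", ("N", "N"), (2, 1)),
    ("NP", ("A", "N"), (1, 2)),
    ("VP", ("V", "N"), (1, 2)),
    ("VP", ("V", "N"), (2, 2)),
]


def estimate(observations):
    """Relative-frequency estimates of P(rule | parent) and P(rhythm | rule)."""
    parent_counts = Counter()
    rule_counts = Counter()
    rhythm_counts = Counter()
    for parent, children, lengths in observations:
        parent_counts[parent] += 1
        rule_counts[(parent, children)] += 1
        rhythm_counts[(parent, children, lengths)] += 1

    # P(parent -> children | parent)
    rule_prob = {
        rule: count / parent_counts[rule[0]]
        for rule, count in rule_counts.items()
    }
    # P(lengths | parent -> children)
    rhythm_prob = {
        key: count / rule_counts[(key[0], key[1])]
        for key, count in rhythm_counts.items()
    }
    return rule_prob, rhythm_prob


def score(parent, children, lengths, rule_prob, rhythm_prob):
    """Assumed combination: rule probability times rhythm probability.

    Unseen rules or rhythm patterns get 0.0 here; a real parser would smooth.
    """
    p_rule = rule_prob.get((parent, children), 0.0)
    p_rhythm = rhythm_prob.get((parent, children, lengths), 0.0)
    return p_rule * p_rhythm
```

With these toy counts, a 2+2 noun compound scores higher than a 2+1 one under the same rule, which is the kind of preference a rhythm feature is meant to capture. The unsmoothed product is a simplification chosen for brevity.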