ABSTRACT
We present a method for noun phrase (NP) chunking in Hebrew. We show that the traditional definition of base-NPs as non-recursive noun phrases does not apply in Hebrew, and propose an alternative definition of Simple NPs. We review syntactic properties of Hebrew related to noun phrases, which indicate that Hebrew SimpleNP chunking is harder than base-NP chunking in English. To confirm this, we apply methods known to work well for English to Hebrew data; they yield low scores (F-measure from 76 to 86). We then discuss our method, which applies SVM induction over lexical and morphological features. Morphological features improve average precision by ~0.5%, recall by ~1%, and F-measure by ~0.75, resulting in a system with average performance of 93% precision, 93.4% recall and 93.2 F-measure.
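The reported averages can be checked against each other with a short calculation. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's F-measure weights precision and recall equally.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Average precision and recall as reported in the abstract.
f = f_measure(0.93, 0.934)
print(f"{100 * f:.1f}")  # prints 93.2
```

Under that assumption, 93% precision and 93.4% recall are consistent with the stated 93.2 F-measure.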
Noun phrase chunking in Hebrew: influence of lexical and morphological features