DOI: 10.3115/1220175.1220262

Noun phrase chunking in Hebrew: influence of lexical and morphological features

Published: 17 July 2006

ABSTRACT

We present a method for Noun Phrase chunking in Hebrew. We show that the traditional definition of base-NPs as non-recursive noun phrases does not apply in Hebrew, and propose an alternative definition of Simple NPs. We review syntactic properties of Hebrew related to noun phrases, which indicate that the task of Hebrew SimpleNP chunking is harder than base-NP chunking in English. As a confirmation, we apply methods known to work well for English to Hebrew data. These methods give low results (F from 76 to 86) in Hebrew. We then discuss our method, which applies SVM induction over lexical and morphological features. Morphological features improve the average precision by ~0.5%, recall by ~1%, and F-measure by ~0.75, resulting in a system with average performance of 93% precision, 93.4% recall and 93.2 F-measure.
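The scores quoted in the abstract can be related to one another with a short sketch. It assumes the usual CoNLL-2000 style of chunk evaluation, which the abstract does not spell out: a predicted chunk counts as correct only if both of its boundaries match a gold chunk, and F is the balanced harmonic mean of precision and recall. The B/I/O tag names and the helper names `extract_chunks`, `f_measure`, and `evaluate` are illustrative and are not taken from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def extract_chunks(tags):
    """Return the set of (start, end) spans, end exclusive, encoded by B/I/O tags.

    Assumption: an "I" tag with no open chunk is treated leniently as
    starting a new chunk.
    """
    chunks = set()
    start = None
    for i, tag in enumerate(tags):
        starts_chunk = tag == "B" or (tag == "I" and start is None)
        if tag == "O" or starts_chunk:
            # Close the chunk that was open, if any.
            if start is not None:
                chunks.add((start, i))
                start = None
            if starts_chunk:
                start = i
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def evaluate(gold_tags, pred_tags):
    """Chunk-level precision, recall and F1 under exact boundary matching."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Toy example (invented tags, not data from the paper).
    gold = ["B", "I", "O", "B", "I", "I", "O", "B"]
    pred = ["B", "I", "O", "B", "I", "O", "O", "B"]
    print(evaluate(gold, pred))

    # Consistency check on the abstract's reported averages.
    print(round(f_measure(93.0, 93.4), 1))
```

Under these assumptions, the reported 93% precision and 93.4% recall combine to an F-measure of about 93.2, which agrees with the figure given in the abstract.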

Published in: ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, July 2006, 1214 pages

Publisher: Association for Computational Linguistics, United States

Overall acceptance rate: 85 of 443 submissions (19%)
