DOI: 10.5555/1642059.1642068
research-article

Semi-automated named entity annotation

Published: 28 June 2007

ABSTRACT

We investigate a way to partially automate corpus annotation for named entity recognition by requiring only binary decisions from an annotator. Our approach is based on a linear sequence model trained using a k-best MIRA learning algorithm. We ask an annotator to decide whether each mention produced by a high-recall tagger is a true mention or a false positive. We conclude that our approach can reduce the effort of extending a seed training corpus by up to 58%.
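The binary-decision step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Mention` type, the `review_candidates` function, and the scripted annotator are hypothetical names introduced here, and the candidate list is hard-coded rather than produced by a trained tagger.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Mention:
    """A candidate entity span proposed by a tagger (hypothetical type)."""
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span
    label: str  # entity type, e.g. "GENE"


def review_candidates(
    tokens: List[str],
    candidates: Iterable[Mention],
    is_true_mention: Callable[[List[str], Mention], bool],
) -> List[Mention]:
    """Keep only the candidates that the annotator accepts.

    Each candidate costs the annotator one yes/no decision; rejected
    candidates are treated as false positives and dropped.
    """
    return [m for m in candidates if is_true_mention(tokens, m)]


# Toy example: a scripted "annotator" stands in for a human (assumption).
tokens = ["The", "BRCA1", "gene", "was", "studied", "in", "mice"]
candidates = [Mention(1, 2, "GENE"), Mention(6, 7, "GENE")]
gold_spans = {(1, 2)}

accepted = review_candidates(
    tokens,
    candidates,
    lambda toks, m: (m.start, m.end) in gold_spans,
)
print(accepted)
```

In a scheme like this the annotator can only accept or reject what the tagger proposes, so a mention the tagger misses cannot be recovered. That is presumably why the abstract specifies a high-recall tagger.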

Published in: LAW '07: Proceedings of the Linguistic Annotation Workshop, June 2007, 210 pages

Publisher: Association for Computational Linguistics, United States
