DOI: 10.5555/1564131.1564136

Estimating annotation cost for active learning in a multi-annotator environment

Published: 05 June 2009

ABSTRACT

We present an empirical investigation of the annotation cost estimation task for active learning in a multi-annotator environment. We analyze the task from two perspectives: selecting examples to be presented to the user for annotation, and evaluating selective sampling strategies when the actual annotation cost is not available. We report results on a movie review classification task with rationale annotations. We demonstrate that a combination of instance, annotator, and annotation task characteristics is important for developing an accurate estimator, and argue that both the correlation coefficient and the root mean square error should be used when evaluating annotation cost estimators.
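The abstract's closing argument, that correlation coefficient and root mean square error capture complementary aspects of an estimator's quality (relative ranking of costs versus absolute error in predicted cost), can be made concrete with a small sketch. The following Python snippet is not from the paper; the predicted and actual annotation times are hypothetical, invented purely for illustration. It computes both metrics for one set of cost estimates:

    # Minimal sketch (not from the paper): evaluating an annotation cost
    # estimator with the two metrics the abstract argues for. The arrays
    # below are hypothetical predicted vs. actual annotation times (sec).
    import math

    def pearson_r(xs, ys):
        """Pearson correlation coefficient between two equal-length lists."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    def rmse(xs, ys):
        """Root mean square error between predicted and actual costs."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

    predicted = [30.0, 45.0, 60.0, 20.0, 50.0]  # estimator output (hypothetical)
    actual    = [28.0, 50.0, 66.0, 25.0, 47.0]  # observed annotation time (hypothetical)

    print(f"Pearson r: {pearson_r(predicted, actual):.3f}")
    print(f"RMSE:      {rmse(predicted, actual):.2f} s")

An estimator can score high on correlation while being badly miscalibrated in absolute terms (or vice versa), which is why reporting only one of the two metrics can be misleading.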


Published in

HLT '09: Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
June 2009, 72 pages

Publisher: Association for Computational Linguistics, United States

Qualifiers: research-article

Overall Acceptance Rate: 240 of 768 submissions, 31%
