DOI: 10.5555/1564131.1564136

Estimating annotation cost for active learning in a multi-annotator environment

Published: 05 June 2009

ABSTRACT

We present an empirical investigation of the annotation cost estimation task for active learning in a multi-annotator environment. We analyze the task from two perspectives: selecting examples to be presented to the user for annotation, and evaluating selective sampling strategies when the actual annotation cost is not available. We report results on a movie review classification task with rationale annotations. We demonstrate that a combination of instance, annotator, and annotation task characteristics is important for developing an accurate estimator, and argue that both the correlation coefficient and the root mean square error should be used when evaluating annotation cost estimators.
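The abstract's closing argument, that correlation coefficient and root mean square error capture complementary aspects of an estimator's quality (relative ranking of costs versus absolute error in predicted cost), can be made concrete with a small sketch. The following Python snippet is not from the paper; the predicted and actual annotation times are hypothetical, invented purely for illustration. It computes both metrics for one set of cost estimates:

    # Minimal sketch (not from the paper): evaluating an annotation cost
    # estimator with the two metrics the abstract argues for. The arrays
    # below are hypothetical predicted vs. actual annotation times (sec).
    import math

    def pearson_r(xs, ys):
        """Pearson correlation coefficient between two equal-length lists."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    def rmse(xs, ys):
        """Root mean square error between predicted and actual costs."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

    predicted = [30.0, 45.0, 60.0, 20.0, 50.0]  # estimator output (hypothetical)
    actual    = [28.0, 50.0, 66.0, 25.0, 47.0]  # observed annotation time (hypothetical)

    print(f"Pearson r: {pearson_r(predicted, actual):.3f}")
    print(f"RMSE:      {rmse(predicted, actual):.2f} s")

An estimator can score high on correlation while being badly miscalibrated in absolute terms (or vice versa), which is why reporting only one of the two metrics can be misleading.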


Published in

HLT '09: Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
June 2009, 72 pages

Publisher: Association for Computational Linguistics, United States

Qualifiers: research-article

Overall Acceptance Rate: 240 of 768 submissions, 31%
