ABSTRACT
We present an empirical investigation of the annotation cost estimation task for active learning in a multi-annotator environment. Our analysis is conducted from two perspectives: selecting examples to present to the user for annotation, and evaluating selective sampling strategies when the actual annotation cost is not available. We report results on a movie review classification task with rationale annotations. We demonstrate that a combination of instance, annotator, and annotation task characteristics is important for developing an accurate estimator, and argue that both the correlation coefficient and the root mean square error should be used to evaluate annotation cost estimators.
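The two evaluation measures named in the abstract capture complementary properties: the correlation coefficient reflects whether an estimator ranks instances correctly by cost, while the root mean square error reflects how far the predicted cost deviates from the true cost in absolute terms. A minimal sketch of both, using small illustrative numbers (the annotation times below are hypothetical, not from the paper):

```python
import math

# Hypothetical predicted vs. actual annotation times (seconds) for five
# instances; the values are illustrative only.
actual    = [12.0, 35.0, 20.0, 48.0, 27.0]
predicted = [15.0, 30.0, 22.0, 45.0, 25.0]

def pearson_r(x, y):
    """Pearson correlation coefficient: scale-invariant, so it rewards an
    estimator that orders costly instances correctly even if its absolute
    predictions are off."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    """Root mean square error: penalizes absolute deviation from the true
    cost, which matters when estimates feed a budget calculation."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

print(round(pearson_r(actual, predicted), 3))  # high correlation
print(round(rmse(actual, predicted), 3))       # average error in seconds
```

An estimator can score well on one measure and poorly on the other (e.g., systematically doubling every cost preserves perfect correlation but inflates RMSE), which is why the paper argues for reporting both.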