
Design and Implementation of Relevance Assessments Using Crowdsourcing

  • Conference paper
Advances in Information Retrieval (ECIR 2011)

Part of the book series: Lecture Notes in Computer Science (volume 6611)


Abstract

In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to run experiments extremely fast, with good results, and at low cost. However, as in any experiment, several details determine whether an experiment succeeds or fails. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and presenting the results of a series of experiments on TREC 8 with a fixed budget. Our findings indicate that workers are as good as TREC experts, and they even provide detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.
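
The abstract names inter-agreement metrics as one ingredient of the methodology. As a rough, hypothetical sketch only (not code from the paper), a chance-corrected agreement statistic such as Fleiss' kappa could be computed over the worker labels collected for each query-document pair; the example below assumes every pair receives the same number of judgments, and the label names and data are made up for illustration.

    # Hypothetical sketch: Fleiss' kappa over crowdsourced relevance labels.
    # Assumes each query-document pair received the same number of judgments.
    from collections import Counter

    def fleiss_kappa(judgments, categories):
        """judgments: list of per-item label lists, all of equal length."""
        n_items = len(judgments)
        n_raters = len(judgments[0])
        counts = [Counter(item) for item in judgments]  # per-item category counts
        # Observed agreement, averaged over items.
        p_bar = sum(
            (sum(c[cat] ** 2 for cat in categories) - n_raters)
            / (n_raters * (n_raters - 1))
            for c in counts
        ) / n_items
        # Chance agreement from the marginal category proportions.
        p_e = sum(
            (sum(c[cat] for c in counts) / (n_items * n_raters)) ** 2
            for cat in categories
        )
        return (p_bar - p_e) / (1 - p_e)

    # Made-up example: 3 query-document pairs, 5 worker judgments each.
    labels = [
        ["relevant", "relevant", "relevant", "not_relevant", "relevant"],
        ["not_relevant", "not_relevant", "relevant", "not_relevant", "not_relevant"],
        ["relevant", "relevant", "relevant", "relevant", "relevant"],
    ]
    print(fleiss_kappa(labels, ["relevant", "not_relevant"]))  # ~0.4

A value near 1 would indicate strong agreement among workers, while values near 0 suggest agreement no better than chance; the paper itself should be consulted for the specific metrics and thresholds it uses.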





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alonso, O., Baeza-Yates, R. (2011). Design and Implementation of Relevance Assessments Using Crowdsourcing. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_16


  • DOI: https://doi.org/10.1007/978-3-642-20161-5_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20160-8

  • Online ISBN: 978-3-642-20161-5

  • eBook Packages: Computer Science, Computer Science (R0)
