DOI: 10.1145/2675133.2675285
research-article

Turkers, Scholars, "Arafat" and "Peace": Cultural Communities and Algorithmic Gold Standards

Published: 28 February 2015

ABSTRACT

In just a few years, crowdsourcing markets like Mechanical Turk have become the dominant mechanism for building "gold standard" datasets in areas of computer science ranging from natural language processing to audio transcription. The assumption behind this sea change - an assumption that is central to the approaches taken in hundreds of research projects - is that crowdsourced markets can accurately replicate the judgments of the general population for knowledge-oriented tasks. Focusing on the important domain of semantic relatedness algorithms and leveraging Clark's theory of common ground as a framework, we demonstrate that this assumption can be highly problematic. Using 7,921 semantic relatedness judgements from 72 scholars and 39 crowdworkers, we show that crowdworkers on Mechanical Turk produce significantly different semantic relatedness gold standard judgements than people from other communities. We also show that algorithms that perform well against Mechanical Turk gold standard datasets do significantly worse when evaluated against other communities' gold standards. Our results call into question the broad use of Mechanical Turk for the development of gold standard datasets and demonstrate the importance of understanding these datasets from a human-centered point-of-view. More generally, our findings problematize the notion that a universal gold standard dataset exists for all knowledge tasks.

    • Published in

      CSCW '15: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing
      February 2015
      1956 pages
      ISBN:9781450329224
      DOI:10.1145/2675133

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CSCW '15 Paper Acceptance Rate: 161 of 575 submissions, 28%
      Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%
