DOI: 10.1145/2998181.2998248

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

Published: 25 February 2017

ABSTRACT

Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months repeating the exact same tasks, making it necessary to understand their behavior over these long periods of time. We utilize three large, longitudinal datasets of nine million annotations collected from Amazon Mechanical Turk to examine claims that workers fatigue or satisfice over these long periods, producing lower quality work. We find that, contrary to these claims, workers are extremely stable in their quality over the entire period. To understand whether workers set their quality based on the task's requirements for acceptance, we then perform an experiment where we vary the required quality for a large crowdsourcing task. Workers did not adjust their quality based on the acceptance threshold: workers who were above the threshold continued working at their usual quality level, and workers below the threshold self-selected themselves out of the task. Capitalizing on this consistency, we demonstrate that it is possible to predict workers' long-term quality using just a glimpse of their quality on the first five tasks.
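
The abstract's final claim, that long-term quality can be predicted from a worker's first five tasks, can be illustrated with a small sketch. The code below is not the authors' implementation; it assumes a hypothetical per-worker list of 0/1 correctness flags (e.g., agreement with gold-standard answers, in submission order) and simply uses accuracy on the first five tasks as the estimate of quality on the remainder.

# Minimal sketch under stated assumptions, not the paper's code: estimate each
# worker's long-term quality from accuracy on their first five tasks, then
# compare against accuracy on the rest of their work. Worker histories here
# are hypothetical 0/1 correctness flags.
from statistics import correlation  # requires Python 3.10+

GLIMPSE_SIZE = 5  # number of initial tasks used as the "glimpse"

def glimpse_estimate(outcomes):
    """Accuracy on the first GLIMPSE_SIZE tasks, in submission order."""
    glimpse = outcomes[:GLIMPSE_SIZE]
    return sum(glimpse) / len(glimpse)

def long_term_quality(outcomes):
    """Accuracy on everything after the glimpse."""
    rest = outcomes[GLIMPSE_SIZE:]
    return sum(rest) / len(rest)

# Hypothetical per-worker correctness histories (1 = correct annotation).
workers = {
    "w1": [1, 1, 0, 1, 1] + [1] * 40 + [0] * 5,
    "w2": [0, 1, 0, 0, 1] + [0, 1] * 20,
    "w3": [1, 1, 1, 1, 1] + [1] * 48 + [0] * 2,
}

estimates = [glimpse_estimate(h) for h in workers.values()]
actuals = [long_term_quality(h) for h in workers.values()]
print("glimpse estimates:", estimates)
print("long-term quality:", actuals)
print("Pearson correlation:", correlation(estimates, actuals))

If the two vectors correlate strongly across many workers, the first five tasks act as a usable screen; it is the paper's finding that worker quality is stable over long periods that makes such a short glimpse informative.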


    • Published in

      CSCW '17: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing
      February 2017
      2556 pages
      ISBN: 9781450343350
      DOI: 10.1145/2998181

      Copyright © 2017 ACM


      Publisher: Association for Computing Machinery, New York, NY, United States

      Published: 25 February 2017


      Acceptance Rates

      CSCW '17 Paper Acceptance Rate: 183 of 530 submissions, 35%. Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%.
