A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

ABSTRACT
Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months repeating the same tasks, making it necessary to understand their behavior over these long periods of time. We utilize three large, longitudinal datasets of nine million annotations collected from Amazon Mechanical Turk to examine claims that workers fatigue or satisfice over these long periods, producing lower-quality work. We find that, contrary to these claims, workers are extremely stable in their quality over the entire period. To understand whether workers set their quality based on the task's requirements for acceptance, we then perform an experiment in which we vary the required quality for a large crowdsourcing task. Workers did not adjust their quality based on the acceptance threshold: workers who were above the threshold continued working at their usual quality level, and workers below the threshold self-selected out of the task. Capitalizing on this consistency, we demonstrate that it is possible to predict workers' long-term quality using just a glimpse of their quality on the first five tasks.
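The abstract's final claim suggests a simple predictive procedure: score each worker on their first five tasks and use that score as an estimate of their long-term quality. Below is a minimal sketch of that idea in Python. It is not the authors' code; the worker histories are synthetic, and the assumption that each worker has a stable latent quality level is borrowed from the paper's own finding of long-term stability.

```python
import numpy as np

# Minimal sketch (not the authors' method): predict a worker's long-term
# quality from their accuracy on their first five tasks, assuming we have
# a per-worker array of binary task outcomes (1 = correct annotation).
# Worker histories here are synthetic, for illustration only.
rng = np.random.default_rng(0)
n_workers, n_tasks = 200, 500

# Assume each worker has a stable latent quality level, per the paper's
# finding that quality is consistent over long periods of work.
latent_quality = rng.uniform(0.5, 0.99, size=n_workers)
outcomes = rng.random((n_workers, n_tasks)) < latent_quality[:, None]

glimpse = outcomes[:, :5].mean(axis=1)    # quality on the first 5 tasks
long_term = outcomes[:, 5:].mean(axis=1)  # quality on all later tasks

# If the early glimpse predicts later quality, these correlate strongly.
r = np.corrcoef(glimpse, long_term)[0, 1]
print(f"correlation between first-5 glimpse and long-term quality: {r:.2f}")
```

Under the stability assumption, the first-five-task mean is an unbiased estimate of the worker's latent quality, which is why even such a small sample per worker yields a usable long-term prediction in this toy setup.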