A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

ABSTRACT
Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months repeating the same tasks, making it necessary to understand their behavior over these long periods of time. We utilize three large, longitudinal datasets of nine million annotations collected from Amazon Mechanical Turk to examine claims that workers fatigue or satisfice over these long periods, producing lower-quality work. We find that, contrary to these claims, workers are extremely stable in their quality over the entire period. To understand whether workers set their quality based on the task's requirements for acceptance, we then perform an experiment in which we vary the required quality for a large crowdsourcing task. Workers did not adjust their quality based on the acceptance threshold: workers who were above the threshold continued working at their usual quality level, and workers below the threshold self-selected out of the task. Capitalizing on this consistency, we demonstrate that it is possible to predict workers' long-term quality using just a glimpse of their quality on the first five tasks.
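The abstract's final claim suggests a simple predictive procedure: score each worker on their first five tasks and use that score as an estimate of their long-term quality. Below is a minimal sketch of that idea in Python. It is not the authors' code; the worker histories are synthetic, and the assumption that each worker has a stable latent quality level is borrowed from the paper's own finding of long-term stability.

```python
import numpy as np

# Minimal sketch (not the authors' method): predict a worker's long-term
# quality from their accuracy on their first five tasks, assuming we have
# a per-worker array of binary task outcomes (1 = correct annotation).
# Worker histories here are synthetic, for illustration only.
rng = np.random.default_rng(0)
n_workers, n_tasks = 200, 500

# Assume each worker has a stable latent quality level, per the paper's
# finding that quality is consistent over long periods of work.
latent_quality = rng.uniform(0.5, 0.99, size=n_workers)
outcomes = rng.random((n_workers, n_tasks)) < latent_quality[:, None]

glimpse = outcomes[:, :5].mean(axis=1)    # quality on the first 5 tasks
long_term = outcomes[:, 5:].mean(axis=1)  # quality on all later tasks

# If the early glimpse predicts later quality, these correlate strongly.
r = np.corrcoef(glimpse, long_term)[0, 1]
print(f"correlation between first-5 glimpse and long-term quality: {r:.2f}")
```

Under the stability assumption, the first-five-task mean is an unbiased estimate of the worker's latent quality, which is why even such a small sample per worker yields a usable long-term prediction in this toy setup.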