
FourEyes: Leveraging Tool Diversity as a Means to Improve Aggregate Accuracy in Crowdsourcing

Published: 09 August 2019

Abstract

Crowdsourcing is a common means of collecting image segmentation training data for use in a variety of computer vision applications. However, designing accurate crowd-powered image segmentation systems is challenging, because defining object boundaries in an image requires significant fine motor skills and hand-eye coordination, which makes these tasks error-prone. Typically, special segmentation tools are created and then answers from multiple workers are aggregated to generate more accurate results. However, individual tool designs can bias how and where people make mistakes, resulting in shared errors that remain even after aggregation. In this article, we introduce a novel crowdsourcing approach that leverages tool diversity as a means of improving aggregate crowd performance. Our idea is that, given a diverse set of tools, aggregating answers across tools can improve collective performance by offsetting the systematic biases induced by the individual tools themselves. To demonstrate the effectiveness of the proposed approach, we design four different tools and present FourEyes, a crowd-powered image segmentation system that aggregates answers across them. We then conduct a series of studies that evaluate different aggregation conditions and show that using multiple tools can significantly improve aggregate accuracy. Furthermore, we investigate post-processing for multi-tool aggregation as a correction mechanism. We introduce a novel region-based method for synthesizing more accurate bounds for image segmentation tasks by averaging surrounding annotations. In addition, we explore the effect of adjusting the threshold parameter of an EM-based aggregation method. Our results suggest that not only the individual tool’s design, but also the correction mechanism, can affect the performance of multi-tool aggregation. This article extends work presented at ACM IUI 2018 [46] by providing a novel region-based error-correction method and an additional in-depth evaluation of the proposed approach.
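
To make the cross-tool aggregation idea concrete, the sketch below shows a minimal per-pixel voting scheme over binary masks produced by different tools, written in Python with NumPy. It is an illustration only, not the FourEyes implementation: the names aggregate_masks, masks, and threshold are ours, and the article's EM-based method additionally estimates per-worker reliability, whereas this sketch weights every tool equally. The threshold argument merely plays a role analogous to the EM threshold parameter discussed in the abstract.

    import numpy as np

    def aggregate_masks(masks, threshold=0.5):
        # masks: list of H x W binary (0/1) arrays, one segmentation per tool or worker.
        # threshold: fraction of masks that must mark a pixel as foreground for it to
        # be kept in the aggregate (0.5 corresponds to simple majority voting).
        stack = np.stack(masks, axis=0).astype(float)  # shape: (n_tools, H, W)
        vote_fraction = stack.mean(axis=0)             # per-pixel agreement in [0, 1]
        return (vote_fraction >= threshold).astype(np.uint8)

    # Example: three hypothetical tools disagree on a 2 x 2 patch; voting resolves it.
    tool_a = np.array([[1, 1], [0, 0]])
    tool_b = np.array([[1, 0], [0, 0]])
    tool_c = np.array([[1, 1], [1, 0]])
    print(aggregate_masks([tool_a, tool_b, tool_c]))   # -> [[1 1] [0 0]]

Raising the threshold trades recall for precision in the aggregate mask, which is the kind of sensitivity the article examines when varying the threshold of the EM-based aggregation.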

References

  1. Amy Bearman, Olga Russakovsky, Vittorio Ferrari, and Li Fei-Fei. 2016. What’s the point: Semantic segmentation with point supervision. In Proceedings of the European Conference on Computer Vision. Springer, 549--565.
  2. Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. 2013. OpenSurfaces: A richly annotated catalog of surface appearance. ACM Trans. Graph. 32, 4 (2013), 111.
  3. Michael S. Bernstein, Greg Little, Robert C. Miller, Björn Hartmann, Mark S. Ackerman, David R. Karger, David Crowell, and Katrina Panovich. 2010. Soylent: A word processor with a crowd inside. In Proceedings of the 23rd ACM Symposium on User Interface Software and Technology. ACM, 313--322.
  4. Jonathan Bragg, Mausam, and Daniel S. Weld. 2013. Crowdsourcing multi-label classification for taxonomy creation. In Proceedings of the 1st AAAI Conference on Human Computation and Crowdsourcing.
  5. Axel Carlier, Vincent Charvillat, Amaia Salvador, Xavier Giro-i Nieto, and Oge Marques. 2014. Click’n’Cut: Crowdsourced interactive segmentation with object candidates. In Proceedings of the International ACM Workshop on Crowdsourcing for Multimedia. ACM, 53--56.
  6. Alexander Philip Dawid and Allan M. Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28, 1 (1979), 20--28.
  7. Thomas G. Dietterich et al. 2000. Ensemble methods in machine learning. Mult. Class. Syst. 1857 (2000), 1--15.
  8. Steven Dow, Anand Kulkarni, Scott Klemmer, and Björn Hartmann. 2012. Shepherding the crowd yields better work. In Proceedings of the ACM Conference on Computer Supported Cooperative Work. ACM, 1013--1022.
  9. Yoav Freund and Robert E. Schapire. 1995. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the European Conference on Computational Learning Theory. Springer, 23--37.
  10. Timnit Gebru, Jonathan Krause, Jia Deng, and Li Fei-Fei. 2017. Scalable annotation of fine-grained categories without experts. In Proceedings of the International Conference on Human Factors in Computing Systems. ACM, 1877--1881.
  11. Mitchell Gordon, Jeffrey P. Bigham, and Walter S. Lasecki. 2015. LegionTools: A toolkit + UI for recruiting and routing crowds to synchronous real-time tasks. In Adjunct Proceedings of the 28th ACM Symposium on User Interface Software & Technology. ACM, 81--82.
  12. Sai Gouravajhala, Jean Y. Song, Jinyeong Yim, Raymond Fok, Yanda Huang, Fan Yang, Kyle Wang, Yilei An, and Walter S. Lasecki. 2017. Towards hybrid intelligence for robotics. In Proceedings of the Collective Intelligence Conference (CI’17).
  13. Danna Gurari, Mehrnoosh Sameki, and Margrit Betke. 2016. Investigating the influence of data familiarity to improve the design of a crowdsourcing image annotation system. In Proceedings of the AAAI Conference on Human Computation & Crowdsourcing (HCOMP’16).
  14. Lars Kai Hansen and Peter Salamon. 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Machine Intell. 12, 10 (1990), 993--1001.
  15. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. 2017. Mask R-CNN. CoRR abs/1703.06870.
  16. Panagiotis G. Ipeirotis, Foster Provost, and Jing Wang. 2010. Quality management on Amazon Mechanical Turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation. ACM, 64--67.
  17. Alexandre Kaspar, Genevieve Patterson, Changil Kim, Yagiz Aksoy, Wojciech Matusik, and Mohamed Elgharib. 2018. Crowd-guided ensembles: How can we choreograph crowd workers for video segmentation? In Proceedings of the Conference on Human Factors in Computing Systems (CHI’18). ACM, New York, NY, Article 111, 12 pages.
  18. Harmanpreet Kaur, Mitchell Gordon, Yiwei Yang, Jeffrey P. Bigham, Jaime Teevan, Ece Kamar, and Walter S. Lasecki. 2017. CrowdMask: Using crowds to preserve privacy in crowd-powered systems via progressive filtering. In Proceedings of the AAAI Conference on Human Computation (HCOMP’17), Vol. 17.
  19. Juho Kim, Phu Tran Nguyen, Sarah Weir, Philip J. Guo, Robert C. Miller, and Krzysztof Z. Gajos. 2014. Crowdsourcing step-by-step information extraction to enhance existing how-to videos. In Proceedings of the 32nd ACM Conference on Human Factors in Computing Systems (CHI’14). ACM, New York, NY, 4017--4026.
  20. Aniket Kittur, Boris Smus, Susheel Khamkar, and Robert E. Kraut. 2011. CrowdForge: Crowdsourcing complex work. In Proceedings of the 24th ACM Symposium on User Interface Software and Technology. ACM, 43--52.
  21. Anand Kulkarni, Matthew Can, and Björn Hartmann. 2012. Collaboratively crowdsourcing workflows with Turkomatic. In Proceedings of the ACM Conference on Computer Supported Cooperative Work. ACM, 1003--1012.
  22. Walter Lasecki and Jeffrey Bigham. 2012. Self-correcting crowds. In CHI’12 Extended Abstracts on Human Factors in Computing Systems. ACM, 2555--2560.
  23. Walter S. Lasecki, Mitchell Gordon, Danai Koutra, Malte F. Jung, Steven P. Dow, and Jeffrey P. Bigham. 2014. Glance: Rapidly coding behavioral video with the crowd. In Proceedings of the 27th ACM Symposium on User Interface Software and Technology. ACM, 551--562.
  24. Walter S. Lasecki, Christopher Miller, Adam Sadilek, Andrew Abumoussa, Donato Borrello, Raja Kushalnagar, and Jeffrey Bigham. 2012. Real-time captioning by groups of non-experts. In Proceedings of the 25th ACM Symposium on User Interface Software and Technology. ACM, 23--34.
  25. Walter S. Lasecki, Christopher D. Miller, and Jeffrey P. Bigham. 2013. Warping time for more effective real-time crowdsourcing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’13). ACM, New York, NY, 2033--2036.
  26. Walter S. Lasecki, Kyle I. Murray, Samuel White, Robert C. Miller, and Jeffrey P. Bigham. 2011. Real-time crowd control of existing interfaces. In Proceedings of the 24th ACM Symposium on User Interface Software and Technology. ACM, 23--32.
  27. Walter S. Lasecki, Young Chol Song, Henry Kautz, and Jeffrey P. Bigham. 2013. Real-time crowd labeling for deployable activity recognition. In Proceedings of the Conference on Computer Supported Cooperative Work. ACM, 1203--1212.
  28. Matthew Lease, Jessica Hullman, Jeffrey P. Bigham, Michael S. Bernstein, Juho Kim, Walter S. Lasecki, Saeideh Bakhshi, Tanushree Mitra, and Robert C. Miller. 2013. Mechanical Turk is not anonymous. Soc. Sci. Res. Netw. (2013).
  29. Christopher Lin, Mausam, and Daniel S. Weld. 2012. Dynamically switching between synergistic workflows for crowdsourcing. In Proceedings of the AAAI Conference on Artificial Intelligence.
  30. Christopher H. Lin, Mausam, and Daniel S. Weld. 2012. Crowdsourcing control: Moving beyond multiple choice. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI’12).
  31. Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, and Jian Sun. 2016. ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3159--3167.
  32. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision. Springer, 740--755.
  33. Greg Little, Lydia B. Chilton, Max Goldman, and Robert C. Miller. 2010. TurKit: Human computation algorithms on Mechanical Turk. In Proceedings of the 23rd ACM Symposium on User Interface Software and Technology. ACM, 57--66.
  34. Ching Liu, Juho Kim, and Hao-Chuan Wang. 2018. ConceptScape: Collaborative concept mapping for video learning. In Proceedings of the Conference on Human Factors in Computing Systems. ACM, 387.
  35. Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15).
  36. Alan Lundgard, Yiwei Yang, Maya L. Foster, and Walter S. Lasecki. 2018. Bolt: Instantaneous crowdsourcing via just-in-time training. In Proceedings of the Conference on Human Factors in Computing Systems (CHI’18). ACM, New York, NY.
  37. Kurt Luther, Nathan Hahn, Steven P. Dow, and Aniket Kittur. 2015. Crowdlines: Supporting synthesis of diverse information sources through crowdsourced outlines. In Proceedings of the 3rd AAAI Conference on Human Computation and Crowdsourcing.
  38. Allan MacLean, Richard M. Young, Victoria M. E. Bellotti, and Thomas P. Moran. 1991. Questions, options, and criteria: Elements of design space analysis. Human--Comput. Interact. 6, 3--4 (1991), 201--250.
  39. Andrew Mao, Ece Kamar, Yiling Chen, Eric Horvitz, Megan E. Schwamb, Chris J. Lintott, and Arfon M. Smith. 2013. Volunteering versus work for pay: Incentives and tradeoffs in crowdsourcing. In Proceedings of the 1st AAAI Conference on Human Computation and Crowdsourcing.
  40. Christian A. Meissner and John C. Brigham. 2001. Thirty years of investigating the own-race bias in memory for faces: A meta-analytic review. Psychology, Public Policy, and Law 7, 1 (2001), 3.
  41. Tom Ouyang and Yang Li. 2012. Bootstrapping personal gesture shortcuts with the wisdom of the crowd and handwriting recognition. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’12). ACM, New York, NY, 2895--2904.
  42. Akshay Rao, Harmanpreet Kaur, and Walter S. Lasecki. 2018. Plexiglass: Multiplexing passive and active tasks for more efficient crowdsourcing. In Proceedings of the AAAI Conference on Human Computation. ACM.
  43. Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman. 2008. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 77, 1 (2008), 157--173.
  44. Jeffrey M. Rzeszotarski and Aniket Kittur. 2011. Instrumenting the crowd: Using implicit behavioral measures to predict task performance. In Proceedings of the 24th ACM Symposium on User Interface Software and Technology. ACM, 13--22.
  45. Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast—but is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 254--263.
  46. Jean Y. Song, Raymond Fok, Alan Lundgard, Fan Yang, Juho Kim, and Walter S. Lasecki. 2018. Two tools are better than one: Tool diversity as a means of improving aggregate crowd performance. In Proceedings of the 23rd International Conference on Intelligent User Interfaces (IUI’18). ACM, New York, NY, 559--570.
  47. Saiganesh Swaminathan, Raymond Fok, Fanglin Chen, Ting-Hao Kenneth Huang, Irene Lin, Rohan Jadvani, Walter S. Lasecki, and Jeffrey P. Bigham. 2017. WearMail: On-the-go access to information in your email with a privacy-preserving human computation workflow. In Proceedings of the 30th ACM Symposium on User Interface Software and Technology. ACM, 807--815.
  48. Shane Torbert. 2016. Applied Computer Science. Springer. 158 pages.
  49. Peter Welinder, Steve Branson, Pietro Perona, and Serge J. Belongie. 2010. The multidimensional wisdom of crowds. In Proceedings of the Conference on Advances in Neural Information Processing Systems. Curran Associates, Inc., 2424--2432.
  50. Jacob Whitehill, Ting Fan Wu, Jacob Bergsma, Javier R. Movellan, and Paul L. Ruvolo. 2009. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Proceedings of the Conference on Advances in Neural Information Processing Systems. Curran Associates, Inc., 2035--2043.
  51. Joseph Jay Williams, Juho Kim, Anna Rafferty, Samuel Maldonado, Krzysztof Z. Gajos, Walter S. Lasecki, and Neil Heffernan. 2016. AXIS: Generating explanations at scale with learnersourcing and machine learning. In Proceedings of the 3rd ACM Conference on Learning @ Scale. ACM, 379--388.


        • Published in

          ACM Transactions on Interactive Intelligent Systems, Volume 10, Issue 1
          Special Issue on IUI 2018
          March 2020
          347 pages
          ISSN: 2160-6455
          EISSN: 2160-6463
          DOI: 10.1145/3352585

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 9 August 2019
          • Revised: 1 July 2018
          • Accepted: 1 July 2018
          • Received: 1 May 2018
          Published in TiiS Volume 10, Issue 1


          Qualifiers

          • research-article
          • Research
          • Refereed
