Abstract
Crowdsourcing is a common means of collecting image segmentation training data for use in a variety of computer vision applications. However, designing accurate crowd-powered image segmentation systems is challenging because defining object boundaries in an image requires significant fine motor skill and hand-eye coordination, which makes these tasks error-prone. Typically, a special-purpose segmentation tool is created and answers from multiple workers are aggregated to generate more accurate results. However, an individual tool’s design can bias how and where people make mistakes, resulting in shared errors that remain even after aggregation. In this article, we introduce a novel crowdsourcing approach that leverages tool diversity as a means of improving aggregate crowd performance. Our idea is that, given a diverse set of tools, aggregating answers across tools can improve collective performance by offsetting the systematic biases induced by the individual tools themselves. To demonstrate the effectiveness of the proposed approach, we design four different tools and present FourEyes, a crowd-powered image segmentation system that aggregates across them. We then conduct a series of studies that evaluate different aggregation conditions and show that using multiple tools significantly improves aggregate accuracy. Furthermore, we investigate post-processing as a correction mechanism for multi-tool aggregation. We introduce a novel region-based method that synthesizes more accurate boundaries for image segmentation tasks by averaging surrounding annotations, and we explore the effect of adjusting the threshold parameter of an EM-based aggregation method. Our results suggest that not only an individual tool’s design, but also the correction mechanism, can affect the performance of multi-tool aggregation. This article extends work presented at ACM IUI 2018 [46] by providing a novel region-based error-correction method and additional in-depth evaluation of the proposed approach.
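To make the aggregation idea concrete, below is a minimal sketch of how masks gathered with multiple tools could be fused. It is not the authors' implementation: FourEyes relies on an EM-based method in the tradition of Dawid and Skene (1979), whereas this sketch substitutes simple per-pixel averaging, with a local region average standing in loosely for the region-based correction and a tunable decision threshold of the kind the article studies. All function and tool names here are hypothetical.

```python
# NOT the FourEyes implementation; a simplified, numpy-only stand-in for
# (1) aggregating per-worker masks across tools with a tunable threshold
# and (2) averaging surrounding annotations before thresholding.
import numpy as np

def region_average(prob_map, radius=1):
    """Replace each pixel by the mean of its (2*radius+1)^2 neighborhood."""
    padded = np.pad(prob_map, radius, mode="edge")
    h, w = prob_map.shape
    windows = [
        padded[dy:dy + h, dx:dx + w]
        for dy in range(2 * radius + 1)
        for dx in range(2 * radius + 1)
    ]
    return np.mean(windows, axis=0)

def aggregate_across_tools(masks_by_tool, threshold=0.5, radius=1):
    """Fuse binary worker masks gathered with several tools.

    masks_by_tool: dict tool_name -> (n_workers, H, W) array of {0, 1} masks.
    threshold:     per-pixel decision threshold.
    """
    # Average within each tool first, so each tool contributes exactly one
    # probability map regardless of how many workers used it ...
    tool_maps = [m.mean(axis=0) for m in masks_by_tool.values()]
    # ... then average across tools, letting tools with different
    # systematic error profiles offset one another.
    consensus = np.mean(tool_maps, axis=0)
    # Region-based smoothing: averaging surrounding annotations lets
    # isolated boundary mistakes get voted down by their neighbors.
    consensus = region_average(consensus, radius=radius)
    return (consensus >= threshold).astype(np.uint8)

# Toy usage: three workers per hypothetical tool annotating an 8x8 image.
rng = np.random.default_rng(0)
truth = np.zeros((8, 8), dtype=np.uint8)
truth[2:6, 2:6] = 1  # a square object
masks = {
    tool: np.clip(truth + (rng.random((3, 8, 8)) < 0.15), 0, 1)
    for tool in ("trace", "drag", "click", "outline")
}
print(aggregate_across_tools(masks, threshold=0.5))
```

Raising `threshold` trades recall for precision at object boundaries, which is roughly the trade-off the article's threshold-adjustment experiments explore for the EM-based variant.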
- Amy Bearman, Olga Russakovsky, Vittorio Ferrari, and Li Fei-Fei. 2016. What’s the point: Semantic segmentation with point supervision. In Proceedings of the European Conference on Computer Vision. Springer, 549--565.
- Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. 2013. OpenSurfaces: A richly annotated catalog of surface appearance. ACM Trans. Graph. 32, 4 (2013), 111.
- Michael S. Bernstein, Greg Little, Robert C. Miller, Björn Hartmann, Mark S. Ackerman, David R. Karger, David Crowell, and Katrina Panovich. 2010. Soylent: A word processor with a crowd inside. In Proceedings of the 23rd ACM Symposium on User Interface Software and Technology. ACM, 313--322.
- Jonathan Bragg, Mausam, and Daniel S. Weld. 2013. Crowdsourcing multi-label classification for taxonomy creation. In Proceedings of the 1st AAAI Conference on Human Computation and Crowdsourcing.
- Axel Carlier, Vincent Charvillat, Amaia Salvador, Xavier Giro-i Nieto, and Oge Marques. 2014. Click’n’Cut: Crowdsourced interactive segmentation with object candidates. In Proceedings of the International ACM Workshop on Crowdsourcing for Multimedia. ACM, 53--56.
- Alexander Philip Dawid and Allan M. Skene. 1979. Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28, 1 (1979), 20--28.
- Thomas G. Dietterich. 2000. Ensemble methods in machine learning. In Multiple Classifier Systems (Lecture Notes in Computer Science, Vol. 1857). Springer, 1--15.
- Steven Dow, Anand Kulkarni, Scott Klemmer, and Björn Hartmann. 2012. Shepherding the crowd yields better work. In Proceedings of the ACM Conference on Computer Supported Cooperative Work. ACM, 1013--1022.
- Yoav Freund and Robert E. Schapire. 1995. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the European Conference on Computational Learning Theory. Springer, 23--37.
- Timnit Gebru, Jonathan Krause, Jia Deng, and Li Fei-Fei. 2017. Scalable annotation of fine-grained categories without experts. In Proceedings of the International Conference on Human Factors in Computing Systems. ACM, 1877--1881.
- Mitchell Gordon, Jeffrey P. Bigham, and Walter S. Lasecki. 2015. LegionTools: A toolkit + UI for recruiting and routing crowds to synchronous real-time tasks. In Adjunct Proceedings of the 28th ACM Symposium on User Interface Software and Technology. ACM, 81--82.
- Sai Gouravajhala, Jean Y. Song, Jinyeong Yim, Raymond Fok, Yanda Huang, Fan Yang, Kyle Wang, Yilei An, and Walter S. Lasecki. 2017. Towards hybrid intelligence for robotics. In Proceedings of the Collective Intelligence Conference (CI’17).
- Danna Gurari, Mehrnoosh Sameki, and Margrit Betke. 2016. Investigating the influence of data familiarity to improve the design of a crowdsourcing image annotation system. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing (HCOMP’16).
- Lars Kai Hansen and Peter Salamon. 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, 10 (1990), 993--1001.
- Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. 2017. Mask R-CNN. CoRR abs/1703.06870 (2017).
- Panagiotis G. Ipeirotis, Foster Provost, and Jing Wang. 2010. Quality management on Amazon Mechanical Turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation. ACM, 64--67.
- Alexandre Kaspar, Genevieve Patterson, Changil Kim, Yagiz Aksoy, Wojciech Matusik, and Mohamed Elgharib. 2018. Crowd-guided ensembles: How can we choreograph crowd workers for video segmentation? In Proceedings of the Conference on Human Factors in Computing Systems (CHI’18). ACM, New York, NY, Article 111, 12 pages.
- Harmanpreet Kaur, Mitchell Gordon, Yiwei Yang, Jeffrey P. Bigham, Jaime Teevan, Ece Kamar, and Walter S. Lasecki. 2017. CrowdMask: Using crowds to preserve privacy in crowd-powered systems via progressive filtering. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing (HCOMP’17).
- Juho Kim, Phu Tran Nguyen, Sarah Weir, Philip J. Guo, Robert C. Miller, and Krzysztof Z. Gajos. 2014. Crowdsourcing step-by-step information extraction to enhance existing how-to videos. In Proceedings of the 32nd ACM Conference on Human Factors in Computing Systems (CHI’14). ACM, New York, NY, 4017--4026.
- Aniket Kittur, Boris Smus, Susheel Khamkar, and Robert E. Kraut. 2011. CrowdForge: Crowdsourcing complex work. In Proceedings of the 24th ACM Symposium on User Interface Software and Technology. ACM, 43--52.
- Anand Kulkarni, Matthew Can, and Björn Hartmann. 2012. Collaboratively crowdsourcing workflows with Turkomatic. In Proceedings of the ACM Conference on Computer Supported Cooperative Work. ACM, 1003--1012.
- Walter Lasecki and Jeffrey Bigham. 2012. Self-correcting crowds. In CHI’12 Extended Abstracts on Human Factors in Computing Systems. ACM, 2555--2560.
- Walter S. Lasecki, Mitchell Gordon, Danai Koutra, Malte F. Jung, Steven P. Dow, and Jeffrey P. Bigham. 2014. Glance: Rapidly coding behavioral video with the crowd. In Proceedings of the 27th ACM Symposium on User Interface Software and Technology. ACM, 551--562.
- Walter S. Lasecki, Christopher Miller, Adam Sadilek, Andrew Abumoussa, Donato Borrello, Raja Kushalnagar, and Jeffrey Bigham. 2012. Real-time captioning by groups of non-experts. In Proceedings of the 25th ACM Symposium on User Interface Software and Technology. ACM, 23--34.
- Walter S. Lasecki, Christopher D. Miller, and Jeffrey P. Bigham. 2013. Warping time for more effective real-time crowdsourcing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’13). ACM, New York, NY, 2033--2036.
- Walter S. Lasecki, Kyle I. Murray, Samuel White, Robert C. Miller, and Jeffrey P. Bigham. 2011. Real-time crowd control of existing interfaces. In Proceedings of the 24th ACM Symposium on User Interface Software and Technology. ACM, 23--32.
- Walter S. Lasecki, Young Chol Song, Henry Kautz, and Jeffrey P. Bigham. 2013. Real-time crowd labeling for deployable activity recognition. In Proceedings of the Conference on Computer Supported Cooperative Work. ACM, 1203--1212.
- Matthew Lease, Jessica Hullman, Jeffrey P. Bigham, Michael S. Bernstein, Juho Kim, Walter S. Lasecki, Saeideh Bakhshi, Tanushree Mitra, and Robert C. Miller. 2013. Mechanical Turk is not anonymous. Soc. Sci. Res. Netw. (2013).
- Christopher Lin, Mausam, and Daniel Weld. 2012. Dynamically switching between synergistic workflows for crowdsourcing. In Proceedings of the AAAI Conference on Artificial Intelligence.
- Christopher H. Lin, Mausam, and Daniel S. Weld. 2012. Crowdsourcing control: Moving beyond multiple choice. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI’12).
- Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, and Jian Sun. 2016. ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3159--3167.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision. Springer, 740--755.
- Greg Little, Lydia B. Chilton, Max Goldman, and Robert C. Miller. 2010. TurKit: Human computation algorithms on Mechanical Turk. In Proceedings of the 23rd ACM Symposium on User Interface Software and Technology. ACM, 57--66.
- Ching Liu, Juho Kim, and Hao-Chuan Wang. 2018. ConceptScape: Collaborative concept mapping for video learning. In Proceedings of the Conference on Human Factors in Computing Systems (CHI’18). ACM, Paper 387.
- Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15).
- Alan Lundgard, Yiwei Yang, Maya L. Foster, and Walter S. Lasecki. 2018. Bolt: Instantaneous crowdsourcing via just-in-time training. In Proceedings of the Conference on Human Factors in Computing Systems (CHI’18). ACM, New York, NY.
- Kurt Luther, Nathan Hahn, Steven P. Dow, and Aniket Kittur. 2015. Crowdlines: Supporting synthesis of diverse information sources through crowdsourced outlines. In Proceedings of the 3rd AAAI Conference on Human Computation and Crowdsourcing.
- Allan MacLean, Richard M. Young, Victoria M. E. Bellotti, and Thomas P. Moran. 1991. Questions, options, and criteria: Elements of design space analysis. Hum.-Comput. Interact. 6, 3--4 (1991), 201--250.
- Andrew Mao, Ece Kamar, Yiling Chen, Eric Horvitz, Megan E. Schwamb, Chris J. Lintott, and Arfon M. Smith. 2013. Volunteering versus work for pay: Incentives and tradeoffs in crowdsourcing. In Proceedings of the 1st AAAI Conference on Human Computation and Crowdsourcing.
- Christian A. Meissner and John C. Brigham. 2001. Thirty years of investigating the own-race bias in memory for faces: A meta-analytic review. Psychol. Public Policy Law 7, 1 (2001), 3.
- Tom Ouyang and Yang Li. 2012. Bootstrapping personal gesture shortcuts with the wisdom of the crowd and handwriting recognition. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’12). ACM, New York, NY, 2895--2904.
- Akshay Rao, Harmanpreet Kaur, and Walter S. Lasecki. 2018. Plexiglass: Multiplexing passive and active tasks for more efficient crowdsourcing. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing (HCOMP’18).
- Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman. 2008. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 77, 1 (2008), 157--173.
- Jeffrey M. Rzeszotarski and Aniket Kittur. 2011. Instrumenting the crowd: Using implicit behavioral measures to predict task performance. In Proceedings of the 24th ACM Symposium on User Interface Software and Technology. ACM, 13--22.
- Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast—but is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 254--263.
- Jean Y. Song, Raymond Fok, Alan Lundgard, Fan Yang, Juho Kim, and Walter S. Lasecki. 2018. Two tools are better than one: Tool diversity as a means of improving aggregate crowd performance. In Proceedings of the 23rd International Conference on Intelligent User Interfaces (IUI’18). ACM, New York, NY, 559--570.
- Saiganesh Swaminathan, Raymond Fok, Fanglin Chen, Ting-Hao Kenneth Huang, Irene Lin, Rohan Jadvani, Walter S. Lasecki, and Jeffrey P. Bigham. 2017. WearMail: On-the-go access to information in your email with a privacy-preserving human computation workflow. In Proceedings of the 30th ACM Symposium on User Interface Software and Technology. ACM, 807--815.
- Shane Torbert. 2016. Applied Computer Science. Springer. 158 pages.
- Peter Welinder, Steve Branson, Pietro Perona, and Serge J. Belongie. 2010. The multidimensional wisdom of crowds. In Proceedings of the Conference on Advances in Neural Information Processing Systems. Curran Associates, Inc., 2424--2432.
- Jacob Whitehill, Ting Fan Wu, Jacob Bergsma, Javier R. Movellan, and Paul L. Ruvolo. 2009. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Proceedings of the Conference on Advances in Neural Information Processing Systems. Curran Associates, Inc., 2035--2043.
- Joseph Jay Williams, Juho Kim, Anna Rafferty, Samuel Maldonado, Krzysztof Z. Gajos, Walter S. Lasecki, and Neil Heffernan. 2016. AXIS: Generating explanations at scale with learnersourcing and machine learning. In Proceedings of the 3rd ACM Conference on Learning @ Scale. ACM, 379--388.