research-article

Fluid Annotation: A Human-Machine Collaboration Interface for Full Image Annotation

Authors:
Mykhaylo Andriluka

Google, Zurich, Switzerland

Google, Zurich, Switzerland
View Profile

,
Jasper R. R. Uijlings

Google, Zurich, Switzerland

Google, Zurich, Switzerland
View Profile

,
Vittorio Ferrari

Google, Zurich, Switzerland

Google, Zurich, Switzerland
View Profile

MM '18: Proceedings of the 26th ACM international conference on MultimediaOctober 2018Pages 1957–1966https://doi.org/10.1145/3240508.3241916

Published:15 October 2018Publication History

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 1957–1966

ABSTRACT

We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid annotation is based on three principles:(I) Strong Machine-Learning aid. We start from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new regions to cover missing objects, and removing incorrect regions.The edit operations are also assisted by the model.(II) Full image annotation in a single pass. As opposed to performing a series of small annotation tasks in isolation [51,68], we propose a unified interface for full image annotation in a single pass.(III) Empower the annotator.We empower the annotator to choose what to annotate and in which order. This enables concentrating on what the ma-chine does not already know, i.e. putting human effort only on the errors it made. This helps using the annotation budget effectively.

Through extensive experiments on the COCO+Stuff dataset [11,51], we demonstrate that Fluid Annotation leads to accurate an-notations very efficiently, taking 3x less annotation time than the popular LabelMe interface [70].

References

D. Acuna, H. Ling, A. Kar, and S. Fidler. 2018. Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNNGoogle Scholar
. (2018).Google Scholar
P. Arbeláez, J. Pont-Tuset, J. T. Barron, F. Marques, and J. Malik. 2014. Multiscale Combinatorial Grouping. In CVPR .Google Scholar
A. Bearman, O. Russakovsky, V. Ferrari, and L. Fei-Fei. 2016. What's the point: Semantic segmentation with point supervision. ECCV .Google Scholar
S. Bell, P. Upchurch, N. Snavely, and K. Bala. 2015. Material Recognition in the Wild with the Materials in Context Database. In CVPR .Google Scholar
T. Berg and D.A. Forsyth. 2006. Animals on the web. In CVPR . Google ScholarDigital Library
H. Bilen and A. Vedaldi. 2016. Weakly Supervised Deep Detection Networks. In CVPR .Google Scholar
Arijit Biswas and Devi Parikh. 2013. Simultaneous active learning of classifiers & attributes via relative feedback. In CVPR . Google ScholarDigital Library
Y. Boykov and M. P. Jolly. 2001. Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images. In ICCV .Google Scholar
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder, Pietro Perona, and Serge Belongie. 2010. Visual recognition with humans in the loop. In ECCV . Google ScholarDigital Library
H. Caesar, J.R.R. Uijlings, and V. Ferrari. 2018. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR .Google Scholar
J. Carreira and C. Sminchisescu. 2010. Constrained Parametric Min-Cuts for Automatic Object Segmentation. In CVPR .Google Scholar
L. Castrejón, K. Kundu, R. Urtasun, and S. Fidler. 2017. Annotating object instances with a polygon-rnn. In CVPR .Google Scholar
L.-C. Chen, A. Hermans, F. Schroff G. Papandreou, P. Wang, and H. Adam. 2017. MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features. ArXiv (2017).Google Scholar
L-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A.L. Yuille. 2018. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. on PAMI (2018).Google Scholar
R.G. Cinbis, J. Verbeek, and C. Schmid. 2014. Multi-fold MIL Training for Weakly Supervised Object Localization. In CVPR . Google ScholarDigital Library
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. CVPR (2016).Google Scholar
J. Dai, K. He, Y. Li, S. Ren, and J. Sun. 2016. Instance-sensitive Fully Convolutional Networks. In ECCV .Google Scholar
Jia Deng, Olga Russakovsky, Jonathan Krause, Michael S. Bernstein, Alex Berg, and Li Fei-Fei. 2014. Scalable Multi-label Annotation. In Proceedings of the 32Nd Annual ACM Conference on Human Factors in Computing Systems (CHI '14). ACM, 3099--3102. Google ScholarDigital Library
T. Deselaers, B. Alexe, and V. Ferrari. 2010. Localizing Objects while Learning Their Appearance. In ECCV . Google ScholarDigital Library
I. Endres and D. Hoiem. 2014. Category-Independent Object Proposals with Diverse Ranking. IEEE Trans. on PAMI , Vol. 36, 2 (2014), 222--234. Google ScholarDigital Library
M. Everingham, S. Eslami, L. van Gool, C. Williams, J. Winn, and A. Zisserman. 2015. The PASCAL Visual Object Classes Challenge: A Retrospective. IJCV (2015). Google ScholarDigital Library
P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. 2010. Object Detection with Discriminatively Trained Part Based Models. IEEE Trans. on PAMI , Vol. 32, 9 (2010). Google ScholarDigital Library
R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. 2010. Learning Object Categories From Internet Image Searches. In Proceedings of the IEEE.Google Scholar
Y. Freund and R.E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences (1997). Google ScholarDigital Library
R. Girshick, J. Donahue, T. Darrell, and J. Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR . Google ScholarDigital Library
Greg Griffin, Alex Holub, and Pietro Perona. 2007. The Caltech-256. Technical Report. Caltech.Google Scholar
B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. 2014. Simultaneous Detection and Segmentation. In ECCV .Google Scholar
M. Haußmann, F.A. Hamprecht, and M. Kandemir. 2017. Variational Bayesian Multiple Instance Learning with Gaussian Processes. In CVPR .Google Scholar
K. He, G. Gkioxari, P. Dollár, and R. Girshick. 2017. Mask R-CNN. In ICCV .Google Scholar
K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. CVPR .Google Scholar
J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy. 2017. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR .Google Scholar
M. Huh, P. Agrawal, and A.A. Efros. 2016. What makes ImageNet good for transfer learning? NIPS LSCVS workshop .Google Scholar
S. Jain and K. Grauman. 2016. Click Carving: Segmenting Objects in Video with Point Clicks. In Proceedings of the Fourth AAAI Conference on Human Computation and Crowdsourcing .Google Scholar
Suyog Dutt Jain and Kristen Grauman. 2013. Predicting sufficient annotation strength for interactive foreground segmentation. In ICCV .Google Scholar
B. Jin, M.V. Ortiz-Segovia, and S. Süsstrunk. 2017. Webly supervised semantic segmentation. In CVPR .Google Scholar
Ajay J Joshi, Fatih Porikli, and Nikolaos Papanikolopoulos. 2009. Multi-class active learning for image classification. In CVPR .Google Scholar
V. Kantorov, M. Oquab, M. Cho, and I. Laptev. 2010. ContextLocNet: Context-aware Deep Network Models for Weakly Supervised Localization. In ECCV .Google Scholar
Ashish Kapoor, Kristen Grauman, Raquel Urtasun, and Trevor Darrell. 2007. Active learning with gaussian processes for object categorization. In ICCV .Google Scholar
A. Khoreva, R. Benenson, J. Hosang, M. Hein, and B. Schiele. 2017. Simple does it: Weakly supervised instance and semantic segmentation. In CVPR .Google Scholar
A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár. 2018. Panoptic Segmentation. In ArXiv.Google Scholar
A. Kolesnikov and C.H. Lampert. 2016. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In ECCV .Google Scholar
K. Konyushkova, J.R.R. Uijlings, C. Lampert, and V. Ferrari. 2018. Learning Intelligent Dialogs for Bounding Box Annotation. In CVPR .Google Scholar
Adriana Kovashka, Sudheendra Vijayanarasimhan, and Kristen Grauman. 2011. Actively selecting annotations among objects and attributes. In ICCV . Google ScholarDigital Library
I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, S. Kamali, M. Malloci, J. Pont-Tuset, A. Veit, S. Belongie, V. Gomes, A. Gupta, C. Sun, G. Chechik, D. Cai, Z. Feng, D. Narayanan, and K. Murphy. 2017. OpenImages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://storage.googleapis.com/openimages/web/index.html (2017).Google Scholar
A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS . Google ScholarDigital Library
A. Li, A. Jabri, A. Joulin, and L. van der Maaten. 2017. Learning Visual N-Grams from Web Data. ICCV .Google Scholar
X. Li, L. Chen, L. Zhang, F. Lin, and W-Y. Ma. 2006. Image Annotation by Large-scale Content-based Image Retrieval. In ACM Multimedia . Google ScholarDigital Library
J.H. Liew, Y. Wei, W. Xiong, S-H. Ong, and J. Feng. 2017. Regional interactive image segmentation networks. In ICCV .Google Scholar
D. Lin, J. Dai, J. Jia, K. He, and J. Sun. 2016. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation. In CVPR .Google Scholar
T-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C.L. Zitnick. 2014. Microsoft COCO: Common Objects in Context. In ECCV .Google Scholar
W. Liu, A. Rabinovich, and A.C. Berg. 2016. ParseNet: Looking Wider to See Better. In ICLR workshop .Google Scholar
J. Long, E. Shelhamer, and T. Darrell. 2015. Fully Convolutional Networks for Semantic Segmentation. In CVPR .Google Scholar
D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, and L. van der Maaten. 2018. Exploring the limits of weakly supervised pretraining. In ArXiv .Google Scholar
K.-K. Maninis, S. Caelles, J. Pont-Tuset, and L. Van Gool. 2018. Deep Extreme Cut: From Extreme Points to Object Segmentation. In CVPR .Google Scholar
Pascal Mettes, Jan C van Gemert, and Cees GM Snoek. 2016. Spot On: Action Localization from Pointly-Supervised Proposals. In ECCV .Google Scholar
R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille. 2014. The role of context for object detection and semantic segmentation in the wild. In CVPR . Google ScholarDigital Library
R. Mottaghi, S. Fidler, J. Yao, R. Urtasun, and D. Parikh. 2013. Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs. In CVPR . 3143--3150. Google ScholarDigital Library
N. S. Nagaraja, F. R. Schmidt, and T. Brox. 2015. Video Segmentation with Just a Few Strokes. In ICCV . Google ScholarDigital Library
Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. 2017a. Extreme clicking for efficient object annotation. In ICCV .Google Scholar
Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. 2017b. Training object class detectors with click supervision. In CVPR .Google Scholar
D. P. Papadopoulos, Jasper R. R. Uijlings, F. Keller, and V. Ferrari. 2016. We don't need no bounding-boxes: Training object class detectors using only human verification. In CVPR .Google Scholar
Amar Parkash and Devi Parikh. 2012. Attributes for classifier feedback. In ECCV . Google ScholarDigital Library
D. Pathak, P. Kr"ahenbuhl, and T. Darrell. 2015. Constrained convolutional neural networks for weakly supervised segmentation. In ICCV . Google ScholarDigital Library
Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, and Hong-Jiang Zhang. 2008. Two-dimensional active learning for image classification. In CVPR .Google Scholar
S. Ren, K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS . Google ScholarDigital Library
C. Rother, V. Kolmogorov, and A. Blake. 2004. GrabCut: interactive foreground extraction using iterated graph cuts. SIGGRAPH , Vol. 23, 3 (2004), 309--314. Google ScholarDigital Library
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, and L. Fei-Fei. 2015a. ImageNet Large Scale Visual Recognition Challenge. IJCV (2015). Google ScholarDigital Library
O. Russakovsky, L-J. Li, and L. Fei-Fei. 2015b. Best of both worlds: human-machine collaboration for object annotation. In CVPR .Google Scholar
B. Russel and A. Torralba. 2008. LabelMe: a database and web-based tool for image annotation. IJCV , Vol. 77, 1--3 (2008), 157--173. Google ScholarDigital Library
B. C. Russell, K. P. Murphy, and W. T. Freeman. 2008. LabelMe: a database and web-based tool for image annotation. IJCV (2008). Google ScholarDigital Library
F. Schroff, D. Kalenichenko, and J. Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In CVPR .Google Scholar
J. Shotton, J. Winn, C. Rother, and A. Criminisi. 2009. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Appearance, Shape and Context. IJCV , Vol. 81, 1 (2009), 2--23. Google ScholarDigital Library
A. Shrivastava, A. Gupta, and R. Girshick. 2016. Training region-based object detectors with online hard example mining. In CVPR .Google Scholar
Behjat Siddiquie and Abhinav Gupta. 2010. Beyond active noun tagging: Modeling contextual interactions for multi-class active learning. In CVPR .Google Scholar
K. Simonyan and A. Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR .Google Scholar
H. Su, J. Deng, and L. Fei-Fei. 2012. Crowdsourcing annotations for visual object detection. In AAAI Human Computation Workshop .Google Scholar
C. Sun, A. Shrivastava, S. Singh, and A. Gupta. 2017. Revisiting unreasonable effectiveness of data in deep learning era. In ICCV .Google Scholar
C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. 2017. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. AAAI .Google Scholar
Joseph Tighe and Svetlana Lazebnik. 2013. Superparsing - Scalable Nonparametric Image Parsing with Superpixels. IJCV , Vol. 101, 2 (2013), 329--349. Google ScholarDigital Library
J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. 2013. Selective search for object recognition. IJCV (2013). Google ScholarDigital Library
Sudheendra Vijayanarasimhan and Kristen Grauman. 2008. Multi-Level Active Prediction of Useful Image Annotations for Recognition. In NIPS . Google ScholarDigital Library
Sudheendra Vijayanarasimhan and Kristen Grauman. 2009. What's it going to cost you?: Predicting effort vs. informativeness for multi-label image annotations. In CVPR .Google Scholar
Sudheendra Vijayanarasimhan and Kristen Grauman. 2014. Large-scale live active learning: Training object detectors with crawled data and crowds. IJCV , Vol. 108, 1--2 (2014), 97--114. Google ScholarDigital Library
Catherine Wah, Grant Van Horn, Steve Branson, Subhrajyoti Maji, Pietro Perona, and Serge Belongie. 2014. Similarity comparisons for interactive fine-grained categorization. In CVPR . Google ScholarDigital Library
T. Wang, B. Han, and J. Collomosse. 2014. TouchCut: Fast image and video segmentation using single-touch interaction. CVIU (2014). Google ScholarDigital Library
Z. Wu, C. Shen, and A. van den Hengel. 2016. Bridging Category-level and Instance-level Semantic Image Segmentation. ArXiv (2016).Google Scholar
J. Xiao, K. Ehinger, J. Hays, A. Torralba, and A. Oliva. 2014. SUN Database: Exploring a Large Collection of Scene Categories. IJCV (2014), 1--20. Google ScholarDigital Library
J. Xu, A. G. Schwing, and R. Urtasun. 2015. Learning to Segment Under Various Forms of Weak Supervision. In CVPR .Google Scholar
N. Xu, B. Price, S. Cohen, J. Yang, and T.S. Huang. 2016. Deep interactive object selection. In CVPR .Google Scholar
Angela Yao, Juergen Gall, Christian Leistner, and Luc Van Gool. 2012. Interactive object detection. In CVPR . Google ScholarDigital Library
B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. 2017. Scene Parsing through ADE20K Dataset. In CVPR .Google Scholar
Y. Zhu, Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao. 2017. Soft Proposal Networks for Weakly Supervised Object Localization. In ICCV .Google Scholar

Index Terms

Fluid Annotation: A Human-Machine Collaboration Interface for Full Image Annotation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
      2. Computer vision tasks
        Scene understanding
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Graphical user interfaces

Recommendations

Panoptic Image Annotation with a Collaborative Assistant
MM '20: Proceedings of the 28th ACM International Conference on Multimedia

This paper aims to reduce the time to annotate images for panoptic segmentation, which requires annotating segmentation masks and class labels for all object instances and stuff regions. We formulate our approach as a collaborative process between an ...
Read More
A survey of methods for image annotation

In order to evaluate automated image annotation and object recognition algorithms, ground truth in the form of a set of images correctly annotated with text describing each image is required. In this paper, three image annotation approaches are reviewed:...
Read More
A survey on automatic image annotation
Abstract
Automatic image annotation is a crucial area in computer vision, which plays a significant role in image retrieval, image description, and so on. Along with the internet technique developing, there are numerous images posted on the web, resulting ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN:9781450356657
DOI:10.1145/3240508
General Chairs:
Susanne Boll
University of Oldenburg, Germany
,
Kyoung Mu Lee
Seoul National University, Korea
,
Jiebo Luo
University of Rochester, USA
,
Wenwu Zhu
Tsinghua University, China
,
Program Chairs:
Hyeran Byun
Yonsei University, Korea
,
Chang Wen Chen
State Univ. Of New York at Buffalo, USA
,
Rainer Lienhart
University of Augsburg, Germany
,
Tao Mei
JD AI, China
Copyright © 2018 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 October 2018
Check for updates
Author Tags
computer vision
human-machine collaboration
image annotation
Qualifiers
- research-article
Conference

Acceptance Rates
MM '18 Paper Acceptance Rate209of757submissions,28%Overall Acceptance Rate995of4,171submissions,24%
More
Upcoming Conference
MM '24

Sponsor:

sigmm

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne , VIC , Australia
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 41
  Total Citations
  View Citations
- 653
  Total Downloads
- Downloads (Last 12 months)58
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fluid Annotation: A Human-Machine Collaboration Interface for Full Image Annotation

MM '18: Proceedings of the 26th ACM international conference on Multimedia

ABSTRACT

References

Cited By

Index Terms

Recommendations

Panoptic Image Annotation with a Collaborative Assistant

A survey of methods for image annotation

A survey on automatic image annotation