ABSTRACT
We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid Annotation is based on three principles: (I) Strong machine-learning aid. We start from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new regions to cover missing objects, and removing incorrect regions. The edit operations are also assisted by the model. (II) Full image annotation in a single pass. As opposed to performing a series of small annotation tasks in isolation [51,68], we propose a unified interface for full image annotation in a single pass. (III) Empower the annotator. We empower the annotator to choose what to annotate and in which order. This lets the annotator concentrate on what the machine does not already know, i.e. spend human effort only on the errors it made, which helps use the annotation budget effectively.
Through extensive experiments on the COCO+Stuff dataset [11,51], we demonstrate that Fluid Annotation leads to accurate annotations very efficiently, taking 3x less annotation time than the popular LabelMe interface [70].
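The three edit operations named in principle (I) can be pictured with a minimal sketch. It is not the authors' implementation: the class names, the pixel-set stand-in for an outline, and the choice to return a removed region to the proposal pool are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One machine-proposed segment: a class label plus an outline placeholder."""
    region_id: int
    label: str
    # Hypothetical stand-in for a real mask or outline: a set of (row, col) pixels.
    pixels: frozenset = frozenset()


@dataclass
class FluidAnnotationState:
    """Annotation of one image, initialised from model output (assumed format)."""
    active: dict = field(default_factory=dict)     # regions currently in the annotation
    proposals: dict = field(default_factory=dict)  # model regions not currently used

    def change_label(self, region_id: int, new_label: str) -> None:
        """Correct the class label of an existing region."""
        self.active[region_id].label = new_label

    def add_region(self, region_id: int) -> None:
        """Add a model-proposed region to cover a missing object."""
        self.active[region_id] = self.proposals.pop(region_id)

    def remove_region(self, region_id: int) -> None:
        """Remove an incorrect region (assumption: it returns to the pool)."""
        self.proposals[region_id] = self.active.pop(region_id)


# Start from (made-up) model output, then apply one edit of each kind.
state = FluidAnnotationState(
    active={1: Region(1, "dog"), 2: Region(2, "sky")},
    proposals={3: Region(3, "frisbee")},
)
state.change_label(1, "cat")   # fix a wrong label
state.add_region(3)            # cover a missing object
state.remove_region(2)         # drop an incorrect region
print(sorted((r.region_id, r.label) for r in state.active.values()))
# prints [(1, 'cat'), (3, 'frisbee')]
```

Keeping unused proposals in a separate pool mirrors the idea that edits are assisted by the model: the annotator picks from machine-generated regions rather than drawing from scratch.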
- D. Acuna, H. Ling, A. Kar, and S. Fidler. 2018. Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++.
- . (2018).
- P. Arbeláez, J. Pont-Tuset, J. T. Barron, F. Marques, and J. Malik. 2014. Multiscale Combinatorial Grouping. In CVPR.
- A. Bearman, O. Russakovsky, V. Ferrari, and L. Fei-Fei. 2016. What's the point: Semantic segmentation with point supervision. ECCV.
- S. Bell, P. Upchurch, N. Snavely, and K. Bala. 2015. Material Recognition in the Wild with the Materials in Context Database. In CVPR.
- T. Berg and D.A. Forsyth. 2006. Animals on the web. In CVPR.
- H. Bilen and A. Vedaldi. 2016. Weakly Supervised Deep Detection Networks. In CVPR.
- Arijit Biswas and Devi Parikh. 2013. Simultaneous active learning of classifiers & attributes via relative feedback. In CVPR.
- Y. Boykov and M. P. Jolly. 2001. Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images. In ICCV.
- Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder, Pietro Perona, and Serge Belongie. 2010. Visual recognition with humans in the loop. In ECCV.
- H. Caesar, J.R.R. Uijlings, and V. Ferrari. 2018. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR.
- J. Carreira and C. Sminchisescu. 2010. Constrained Parametric Min-Cuts for Automatic Object Segmentation. In CVPR.
- L. Castrejón, K. Kundu, R. Urtasun, and S. Fidler. 2017. Annotating object instances with a polygon-rnn. In CVPR.
- L.-C. Chen, A. Hermans, F. Schroff, G. Papandreou, P. Wang, and H. Adam. 2017. MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features. ArXiv (2017).
- L-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A.L. Yuille. 2018. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. on PAMI (2018).
- R.G. Cinbis, J. Verbeek, and C. Schmid. 2014. Multi-fold MIL Training for Weakly Supervised Object Localization. In CVPR.
- M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. CVPR (2016).
- J. Dai, K. He, Y. Li, S. Ren, and J. Sun. 2016. Instance-sensitive Fully Convolutional Networks. In ECCV.
- Jia Deng, Olga Russakovsky, Jonathan Krause, Michael S. Bernstein, Alex Berg, and Li Fei-Fei. 2014. Scalable Multi-label Annotation. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems (CHI '14). ACM, 3099--3102.
- T. Deselaers, B. Alexe, and V. Ferrari. 2010. Localizing Objects while Learning Their Appearance. In ECCV.
- I. Endres and D. Hoiem. 2014. Category-Independent Object Proposals with Diverse Ranking. IEEE Trans. on PAMI, Vol. 36, 2 (2014), 222--234.
- M. Everingham, S. Eslami, L. van Gool, C. Williams, J. Winn, and A. Zisserman. 2015. The PASCAL Visual Object Classes Challenge: A Retrospective. IJCV (2015).
- P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. 2010. Object Detection with Discriminatively Trained Part Based Models. IEEE Trans. on PAMI, Vol. 32, 9 (2010).
- R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. 2010. Learning Object Categories From Internet Image Searches. In Proceedings of the IEEE.
- Y. Freund and R.E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences (1997).
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR.
- Greg Griffin, Alex Holub, and Pietro Perona. 2007. The Caltech-256. Technical Report. Caltech.
- B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. 2014. Simultaneous Detection and Segmentation. In ECCV.
- M. Haußmann, F.A. Hamprecht, and M. Kandemir. 2017. Variational Bayesian Multiple Instance Learning with Gaussian Processes. In CVPR.
- K. He, G. Gkioxari, P. Dollár, and R. Girshick. 2017. Mask R-CNN. In ICCV.
- K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. CVPR.
- J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, and K. Murphy. 2017. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR.
- M. Huh, P. Agrawal, and A.A. Efros. 2016. What makes ImageNet good for transfer learning? NIPS LSCVS workshop.
- S. Jain and K. Grauman. 2016. Click Carving: Segmenting Objects in Video with Point Clicks. In Proceedings of the Fourth AAAI Conference on Human Computation and Crowdsourcing.
- Suyog Dutt Jain and Kristen Grauman. 2013. Predicting sufficient annotation strength for interactive foreground segmentation. In ICCV.
- B. Jin, M.V. Ortiz-Segovia, and S. Süsstrunk. 2017. Webly supervised semantic segmentation. In CVPR.
- Ajay J Joshi, Fatih Porikli, and Nikolaos Papanikolopoulos. 2009. Multi-class active learning for image classification. In CVPR.
- V. Kantorov, M. Oquab, M. Cho, and I. Laptev. 2016. ContextLocNet: Context-aware Deep Network Models for Weakly Supervised Localization. In ECCV.
- Ashish Kapoor, Kristen Grauman, Raquel Urtasun, and Trevor Darrell. 2007. Active learning with gaussian processes for object categorization. In ICCV.
- A. Khoreva, R. Benenson, J. Hosang, M. Hein, and B. Schiele. 2017. Simple does it: Weakly supervised instance and semantic segmentation. In CVPR.
- A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár. 2018. Panoptic Segmentation. In ArXiv.
- A. Kolesnikov and C.H. Lampert. 2016. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In ECCV.
- K. Konyushkova, J.R.R. Uijlings, C. Lampert, and V. Ferrari. 2018. Learning Intelligent Dialogs for Bounding Box Annotation. In CVPR.
- Adriana Kovashka, Sudheendra Vijayanarasimhan, and Kristen Grauman. 2011. Actively selecting annotations among objects and attributes. In ICCV.
- I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, S. Kamali, M. Malloci, J. Pont-Tuset, A. Veit, S. Belongie, V. Gomes, A. Gupta, C. Sun, G. Chechik, D. Cai, Z. Feng, D. Narayanan, and K. Murphy. 2017. OpenImages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://storage.googleapis.com/openimages/web/index.html (2017).
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS.
- A. Li, A. Jabri, A. Joulin, and L. van der Maaten. 2017. Learning Visual N-Grams from Web Data. ICCV.
- X. Li, L. Chen, L. Zhang, F. Lin, and W-Y. Ma. 2006. Image Annotation by Large-scale Content-based Image Retrieval. In ACM Multimedia.
- J.H. Liew, Y. Wei, W. Xiong, S-H. Ong, and J. Feng. 2017. Regional interactive image segmentation networks. In ICCV.
- D. Lin, J. Dai, J. Jia, K. He, and J. Sun. 2016. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation. In CVPR.
- T-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C.L. Zitnick. 2014. Microsoft COCO: Common Objects in Context. In ECCV.
- W. Liu, A. Rabinovich, and A.C. Berg. 2016. ParseNet: Looking Wider to See Better. In ICLR workshop.
- J. Long, E. Shelhamer, and T. Darrell. 2015. Fully Convolutional Networks for Semantic Segmentation. In CVPR.
- D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, and L. van der Maaten. 2018. Exploring the limits of weakly supervised pretraining. In ArXiv.
- K.-K. Maninis, S. Caelles, J. Pont-Tuset, and L. Van Gool. 2018. Deep Extreme Cut: From Extreme Points to Object Segmentation. In CVPR.
- Pascal Mettes, Jan C van Gemert, and Cees GM Snoek. 2016. Spot On: Action Localization from Pointly-Supervised Proposals. In ECCV.
- R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille. 2014. The role of context for object detection and semantic segmentation in the wild. In CVPR.
- R. Mottaghi, S. Fidler, J. Yao, R. Urtasun, and D. Parikh. 2013. Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs. In CVPR. 3143--3150.
- N. S. Nagaraja, F. R. Schmidt, and T. Brox. 2015. Video Segmentation with Just a Few Strokes. In ICCV.
- Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. 2017a. Extreme clicking for efficient object annotation. In ICCV.
- Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. 2017b. Training object class detectors with click supervision. In CVPR.
- D. P. Papadopoulos, Jasper R. R. Uijlings, F. Keller, and V. Ferrari. 2016. We don't need no bounding-boxes: Training object class detectors using only human verification. In CVPR.
- Amar Parkash and Devi Parikh. 2012. Attributes for classifier feedback. In ECCV.
- D. Pathak, P. Krähenbühl, and T. Darrell. 2015. Constrained convolutional neural networks for weakly supervised segmentation. In ICCV.
- Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, and Hong-Jiang Zhang. 2008. Two-dimensional active learning for image classification. In CVPR.
- S. Ren, K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS.
- C. Rother, V. Kolmogorov, and A. Blake. 2004. GrabCut: interactive foreground extraction using iterated graph cuts. SIGGRAPH, Vol. 23, 3 (2004), 309--314.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, and L. Fei-Fei. 2015a. ImageNet Large Scale Visual Recognition Challenge. IJCV (2015).
- O. Russakovsky, L-J. Li, and L. Fei-Fei. 2015b. Best of both worlds: human-machine collaboration for object annotation. In CVPR.
- B. Russell and A. Torralba. 2008. LabelMe: a database and web-based tool for image annotation. IJCV, Vol. 77, 1--3 (2008), 157--173.
- B. C. Russell, K. P. Murphy, and W. T. Freeman. 2008. LabelMe: a database and web-based tool for image annotation. IJCV (2008).
- F. Schroff, D. Kalenichenko, and J. Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In CVPR.
- J. Shotton, J. Winn, C. Rother, and A. Criminisi. 2009. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Appearance, Shape and Context. IJCV, Vol. 81, 1 (2009), 2--23.
- A. Shrivastava, A. Gupta, and R. Girshick. 2016. Training region-based object detectors with online hard example mining. In CVPR.
- Behjat Siddiquie and Abhinav Gupta. 2010. Beyond active noun tagging: Modeling contextual interactions for multi-class active learning. In CVPR.
- K. Simonyan and A. Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR.
- H. Su, J. Deng, and L. Fei-Fei. 2012. Crowdsourcing annotations for visual object detection. In AAAI Human Computation Workshop.
- C. Sun, A. Shrivastava, S. Singh, and A. Gupta. 2017. Revisiting unreasonable effectiveness of data in deep learning era. In ICCV.
- C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. 2017. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. AAAI.
- Joseph Tighe and Svetlana Lazebnik. 2013. Superparsing - Scalable Nonparametric Image Parsing with Superpixels. IJCV, Vol. 101, 2 (2013), 329--349.
- J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. 2013. Selective search for object recognition. IJCV (2013).
- Sudheendra Vijayanarasimhan and Kristen Grauman. 2008. Multi-Level Active Prediction of Useful Image Annotations for Recognition. In NIPS.
- Sudheendra Vijayanarasimhan and Kristen Grauman. 2009. What's it going to cost you?: Predicting effort vs. informativeness for multi-label image annotations. In CVPR.
- Sudheendra Vijayanarasimhan and Kristen Grauman. 2014. Large-scale live active learning: Training object detectors with crawled data and crowds. IJCV, Vol. 108, 1--2 (2014), 97--114.
- Catherine Wah, Grant Van Horn, Steve Branson, Subhrajyoti Maji, Pietro Perona, and Serge Belongie. 2014. Similarity comparisons for interactive fine-grained categorization. In CVPR.
- T. Wang, B. Han, and J. Collomosse. 2014. TouchCut: Fast image and video segmentation using single-touch interaction. CVIU (2014).
- Z. Wu, C. Shen, and A. van den Hengel. 2016. Bridging Category-level and Instance-level Semantic Image Segmentation. ArXiv (2016).
- J. Xiao, K. Ehinger, J. Hays, A. Torralba, and A. Oliva. 2014. SUN Database: Exploring a Large Collection of Scene Categories. IJCV (2014), 1--20.
- J. Xu, A. G. Schwing, and R. Urtasun. 2015. Learning to Segment Under Various Forms of Weak Supervision. In CVPR.
- N. Xu, B. Price, S. Cohen, J. Yang, and T.S. Huang. 2016. Deep interactive object selection. In CVPR.
- Angela Yao, Juergen Gall, Christian Leistner, and Luc Van Gool. 2012. Interactive object detection. In CVPR.
- B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. 2017. Scene Parsing through ADE20K Dataset. In CVPR.
- Y. Zhu, Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao. 2017. Soft Proposal Networks for Weakly Supervised Object Localization. In ICCV.
Index Terms
- Fluid Annotation: A Human-Machine Collaboration Interface for Full Image Annotation