The Pascal Visual Object Classes Challenge: A Retrospective

Everingham, Mark; Eslami, S. M. Ali; Van Gool, Luc; Williams, Christopher K. I.; Winn, John; Zisserman, Andrew

doi:10.1007/s11263-014-0733-5

Published: 25 June 2014

Volume 111, pages 98–136, (2015)

International Journal of Computer Vision

Mark Everingham¹,
S. M. Ali Eslami²,
Luc Van Gool³,⁴,
Christopher K. I. Williams⁵,
John Winn² &
…
Andrew Zisserman⁶

Abstract

The Pascal Visual Object Classes (VOC) challenge consists of two components: (i) a publicly available dataset of images together with ground truth annotation and standardised evaluation software; and (ii) an annual competition and workshop. There are five challenges: classification, detection, segmentation, action classification, and person layout. In this paper we provide a review of the challenge from 2008–2012. The paper is intended for two audiences: algorithm designers, researchers who want to see what the state of the art is, as measured by performance on the VOC datasets, along with the limitations and weak points of the current generation of algorithms; and, challenge designers, who want to see what we as organisers have learnt from the process and our recommendations for the organisation of future challenges. To analyse the performance of submitted algorithms on the VOC datasets we introduce a number of novel evaluation methods: a bootstrapping method for determining whether differences in the performance of two algorithms are significant or not; a normalised average precision so that performance can be compared across classes with different proportions of positive instances; a clustering method for visualising the performance across multiple algorithms so that the hard and easy images can be identified; and the use of a joint classifier over the submitted algorithms in order to measure their complementarity and combined performance. We also analyse the community’s progress through time using the methods of Hoiem et al. (Proceedings of European Conference on Computer Vision, 2012) to identify the types of occurring errors. We conclude the paper with an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.
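The bootstrapping idea mentioned in the abstract can be illustrated with a minimal sketch. The paper's exact procedure is not reproduced on this page, so what follows is a generic paired percentile bootstrap over test images, not the authors' implementation. The function names, the non-interpolated definition of average precision, the number of rounds and the 95% level are all assumptions made for illustration.

```python
import random


def average_precision(scores, labels):
    """Non-interpolated AP: mean of the precision measured at each positive.

    Assumption: this is a simple AP variant chosen for brevity; it is not
    claimed to match the VOC evaluation software.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def bootstrap_ap_difference(scores_a, scores_b, labels,
                            n_rounds=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AP(a) - AP(b) on the same images.

    Images are resampled with replacement; both methods are scored on the
    same resample each round, so the comparison is paired.
    Returns (lower, upper, significant), where `significant` is True when
    the interval excludes zero.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if not any(lab):
            continue  # AP is undefined without positives; skip this round
        ap_a = average_precision([scores_a[i] for i in idx], lab)
        ap_b = average_precision([scores_b[i] for i in idx], lab)
        diffs.append(ap_a - ap_b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * len(diffs))]
    upper = diffs[min(len(diffs) - 1, int((1 - alpha / 2) * len(diffs)))]
    return lower, upper, not (lower <= 0.0 <= upper)
```

Calling `bootstrap_ap_difference(scores_a, scores_b, labels)` with per-image confidence scores from two methods and shared binary ground truth yields an interval for the AP gap. Under the assumptions above, an interval that excludes zero suggests the difference is unlikely to be an artefact of the particular test images.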

Notes

Pascal stands for pattern analysis, statistical modelling and computational learning. It was an EU Network of Excellence funded project under the IST Programme of the European Union.
Matlab® is a registered trademark of MathWorks, Inc.

References

Alexe, B., Deselaers, T., & Ferrari, V. (2010). What is an object? In Proceedings of Conference on Computer Vision and Pattern Recognition (pp. 73–80).
Alexiou, I., & Bharath, A. (2012). Efficient Kernels couple visual words through categorical opponency. In Proceedings of British Machine Vision Conference.
Bertail, P., Clémençon, S. J., & Vayatis, N. (2009). On bootstrapping the ROC curve. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (Vol. 21, pp. 137–144). Red Hook, NY: Curran Associates, Inc.
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Proceedings of European Conference on Computer Vision.
Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chen, Q., Song, Z., Hua, Y., Huang, Z., & Yan, S. (2012). Generalized hierarchical matching for image classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Csurka, G., Dance, C., Fan, L., Williamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Proceedings of ECCV2004 Workshop on Statistical Learning in Computer Vision (pp. 59–74).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). Decaf: A deep convolutional activation feature for generic visual recognition. CoRR abs/1310.1531.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88, 303–338.
Farhadi, A., Endres, I., Hoiem, D., & Forsyth, D. (2009). Describing objects by their attributes. In Proceedings of Conference on Computer Vision and Pattern Recognition, IEEE (pp. 1778–1785).
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Flickr website. (2013). http://www.flickr.com/.
Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Hall, P., Hyndman, R., & Fan, Y. (2004). Nonparametric confidence intervals for receiver operating characteristic curves. Biometrika, 91, 743–50.
Hoai, M., Ladicky, L., & Zisserman, A. (2012). Action Recognition from Still Images by Aligning Body Parts. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/workshop/segmentation_action_layout.pdf. Slides contained in the presentation by Luc van Gool on Overview and results of the segmentation challenge and action taster.
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In Proceedings of European Conference on Computer Vision.
Ion, A., Carreira, J., & Sminchisescu, C. (2011a). Image segmentation by figure-ground composition into maximal cliques. In Proceedings of International Conference on Computer Vision.
Ion, A., Carreira, J., & Sminchisescu, C. (2011b). Probabilistic joint image segmentation and labeling. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 24, pp. 1827–1835). Red Hook, NY: Curran Associates, Inc.
Karaoglu, S., Van Gemert, J., & Gevers, T. (2012). Object reading: Text recognition for object recognition. In Proceedings of ECCV 2012 Workshops and Demonstrations.
Khan, F., Anwer, R., Van de Weijer, J., Bagdanov, A., Vanrell, M., & Lopez, A. M. (2012a). Color attributes for object detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Khan, F., Van de Weijer, J., & Vanrell, M. (2012b). Modulating shape features by color attention for object recognition. International Journal of Computer Vision, 98(1), 49–64.
Khosla, A., Yao, B., & Fei-Fei, L. (2011). Combining randomization and discrimination for fine-grained image categorization. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 25, pp. 1106–1114). Red Hook, NY: Curran Associates, Inc.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of Conference on Computer Vision and Pattern Recognition (pp. 2169–2178).
Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Proceedings of ECCV Workshop on Statistical Learning in Computer Vision.
Lempitsky, V., & Zisserman, A. (2010). Learning to count objects in images. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems (Vol. 23, pp. 1324–1332). Red Hook, NY: Curran Associates, Inc. http://papers.nips.cc/paper/4043-learning-to-count-objects-in-images.pdf
Li, F., Carreira, J., Lebanon, G., & Sminchisescu, C. (2013). Composite statistical inference for semantic segmentation. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Nanni, L., & Lumini, A. (2013). Heterogeneous bag-of-features for object/scene recognition. Applied Soft Computing, 13(4), 2171–2178.
O’Connor, B. (2010). A response to “comparing Precision-Recall curves the Bayesian way?”. A comment on the blog post by Bob Carpenter on Comparing Precision-Recall Curves the Bayesian Way? http://lingpipe-blog.com/2010/01/29/comparing-precision-recall-curves-bayesian-way/.
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Russakovsky, O., Lin, Y., Yu, K., & Fei-Fei, L. (2012). Object-centric spatial pooling for image classification. In Proceedings of European Conference on Computer Vision.
Russell, B., Torralba, A., Murphy, K., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173. http://labelme.csail.mit.edu/
Salton, G., & Mcgill, M. J. (1986). Introduction to modern information retrieval. New York, NY: McGraw-Hill Inc.
Sener, F., Bas, C., & Ikizler-Cinbis, N. (2012). On recognizing actions in still images via multiple features. In Proceedings of ECCV Workshop on Action Recognition and Pose Estimation in Still Images.
Song, Z., Chen, Q., Huang, Z., Hua, Y., & Yan, S. (2011). Contextualizing object detection and classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Pascal VOC 2012 challenge results. (2012). http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/results/index.html.
Pascal VOC annotation guidelines. (2012). http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/guidelines.html.
Pascal VOC best practice guidelines. (2012). http://pascallin.ecs.soton.ac.uk/challenges/VOC/#bestpractice.
Pascal VOC evaluation server. (2012). http://host.robots.ox.ac.uk:8080/.
Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In Proceedings of Conference on Computer Vision and Pattern Recognition, IEEE (pp. 1521–1528).
Uijlings, J., Van de Sande, K., Gevers, T., & Smeulders, A. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
Van de Sande, K., Uijlings, J., Gevers, T., & Smeulders, A. (2011). Segmentation as selective search for object recognition. In Proceedings of International Conference on Computer Vision.
Van Gemert, J. (2011). Exploiting photographic style for category-level image classification by generalizing the spatial pyramid. In Proceedings of International Conference on Multimedia Retrieval.
Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In International Conference on Computer Vision.
Viola, P., & Jones, M. (2004). Robust real-time object detection. International Journal of Computer Vision, 57(2), 137–154.
Wang, X., Lin, L., Huang, L., & Yan, S. (2013). Incorporating structural alternatives and sharing into hierarchy for multiclass object recognition and detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Wasserman, L. (2004). All of statistics. Berlin: Springer.
Xia, W., Song, Z., Feng, J., Cheong, L. F., & Yan, S. (2012). Segmentation over detection by coupled global and local sparse representations. In Proceedings of European Conference on Computer Vision.
Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR abs/1311.2901.
Zhu, L., Chen, Y., Yuille, A., & Freeman, W. (2010). Latent hierarchical structural learning for object detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.
Zisserman, A., Winn, J., Fitzgibbon, A., Van Gool, L., Sivic, J., Williams, C., et al. (2012). In memoriam: Mark Everingham. Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2081–2082.

Acknowledgments

First, we thank all the groups that participated in the challenge; without these, VOC would just have been a dataset. Second, we would like to thank those who have been ‘friends of the challenge’, making helpful suggestions and criticisms throughout: Alyosha Efros, David Forsyth, Derek Hoiem, Ivan Laptev, Jitendra Malik and Bill Triggs. Third, we thank those who have given additional assistance in developing and maintaining the PASCAL challenge: Marcin Eichner, Sam Johnson, Lubor Ladicky, Marcin Marszalek, Arpit Mittal and Andrea Vedaldi. In particular, we thank Alexander Sorokin for the first version of the evaluation server, and Yusuf Aytar for subsequent versions. Fourth, we gratefully acknowledge the annotators from VOC2008 onwards: Yusuf Aytar, Lucia Ballerini, Jan Hendrik Becker, Hakan Bilen, Patrick Buehler, Kian Ming Adam Chai, Ken Chatfield, Mircea Cimpoi, Miha Drenik, Chris Engels, Basura Fernando, Adrien Gaidon, Christoph Godau, Bertan Gunyel, Hedi Harzallah, Nicolas Heess, Phoenix/Xuan Huang, Sam Johnson, Zdenek Kalal, Jyri Kivinen, Lubor Ladicky, Marcin Marszalek, Markus Mathias, Alastair Moore, Maria-Elena Nilsback, Patrick Ott, Kristof Overdulve, Konstantinos Rematas, Florian Schroff, Gilad Sharir, Glenn Sheasby, Alexander Sorokin, Paul Sturgess, David Tingdahl, Diana Turcsany, Hirofumi Uemura, Jan Van Gemert, Johan Van Rompay, Mathias Vercruysse, Vibhav Vineet, Martin Vogt, Josiah Wang, Ziming Zhang, Shuai Kyle Zheng. Fifth, we are grateful to the IST Programme of the EC under the PASCAL2 Network of Excellence, IST-2007-216886, which provided the funding for running the VOC challenge, and to Michele Sebag and John Shawe-Taylor, who coordinated the challenge programme and PASCAL2 respectively. Finally, we would like to thank the anonymous reviewers for their encouragement and feedback; their suggestions led to significant improvements to the paper.

Author information

Authors and Affiliations

University of Leeds, Leeds, UK
Mark Everingham
Microsoft Research, Cambridge, UK
S. M. Ali Eslami & John Winn
KU Leuven, Leuven, Belgium
Luc Van Gool
ETH, Zurich, Switzerland
Luc Van Gool
University of Edinburgh, Edinburgh, UK
Christopher K. I. Williams
University of Oxford, Oxford, UK
Andrew Zisserman

Corresponding author

Correspondence to S. M. Ali Eslami.

Additional information

Communicated by M. Hebert.

Mark Everingham, who died in 2012, was the key member of the VOC project. His contribution was crucial and substantial. For these reasons he is included as the posthumous first author of this paper. An appreciation of his life and work can be found in Zisserman et al. (2012).

About this article

Cite this article

Everingham, M., Eslami, S.M.A., Van Gool, L. et al. The Pascal Visual Object Classes Challenge: A Retrospective. Int J Comput Vis 111, 98–136 (2015). https://doi.org/10.1007/s11263-014-0733-5

Received: 12 September 2013
Accepted: 23 May 2014
Published: 25 June 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s11263-014-0733-5
