An Elastic Deformation Field Model for Object Detection and Tracking

Pedersoli, Marco; Timofte, Radu; Tuytelaars, Tinne; Van Gool, Luc

doi:10.1007/s11263-014-0736-2

Published: 24 June 2014

Volume 111, pages 137–152, (2015)

International Journal of Computer Vision

Abstract

Deformable Parts Models (DPM) are the current state of the art for object detection. Nevertheless, they seem sub-optimal in how they represent deformations: object deformations are often continuous and not confined to big parts. We therefore propose to replace the DPM star model, which is based on big parts, with a deformation field: a grid of small parts connected by pairwise constraints, which can better handle continuous deformations. A naive application of this model to object detection would be a bounded sliding window approach, in which, for each possible location in the image, the best part configuration within a limited bound around that location is found. This is computationally very expensive. Instead, we propose a different inference procedure, in which an iterative image-level search finds the best object hypothesis. We show that this approach is faster than bounded sliding windows yet produces comparable accuracy. Experiments further show that the deformation field can better approximate real object deformations and therefore, for certain classes, produces even better detection accuracy than state-of-the-art DPM. Finally, the same approach is adapted to model-free tracking, where it also shows improved accuracy.
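To make the idea of a deformation field concrete, the following is a minimal sketch of how a grid of small parts with pairwise constraints could be scored. It is an illustration under stated assumptions, not the model or solver of the article: the quadratic pairwise penalty, the weight `lam`, the array layout, and the function names `field_score` and `icm` are all hypothetical, and the greedy ICM (iterated conditional modes) update is a simple stand-in for the inference procedure described above.

```python
import numpy as np

def field_score(unary, disp, offsets, lam):
    """Score of one configuration of a grid of parts (illustrative only).

    unary:   (H, W, K) appearance score of part (i, j) under candidate offset k
    disp:    (H, W) integer index into `offsets` chosen for each part
    offsets: (K, 2) candidate displacements (dy, dx) shared by all parts
    lam:     weight of the pairwise penalty (assumed quadratic here)
    """
    H, W, _ = unary.shape
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    appearance = unary[ii, jj, disp].sum()
    d = offsets[disp].astype(float)  # (H, W, 2) displacement of each part
    # Penalise displacement differences between 4-connected neighbours.
    vert = ((d[1:, :] - d[:-1, :]) ** 2).sum()
    horiz = ((d[:, 1:] - d[:, :-1]) ** 2).sum()
    return appearance - lam * (vert + horiz)

def icm(unary, offsets, lam, sweeps=5):
    """Greedy coordinate ascent (ICM); a stand-in, not the article's solver."""
    H, W, _ = unary.shape
    disp = unary.argmax(axis=2)  # start from the best appearance-only label
    off = offsets.astype(float)
    for _ in range(sweeps):
        changed = False
        for i in range(H):
            for j in range(W):
                # Local objective of part (i, j) given its current neighbours.
                local = unary[i, j].astype(float).copy()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= ni < H and 0 <= nj < W:
                        diff = off - off[disp[ni, nj]]
                        local -= lam * (diff ** 2).sum(axis=1)
                best = int(local.argmax())
                if best != disp[i, j]:
                    disp[i, j] = best
                    changed = True
        if not changed:
            break
    return disp
```

Each ICM update maximises one part's local objective with its neighbours held fixed, so the total score never decreases. It can still stop in a local optimum, which is one reason a stronger solver is preferable in practice.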

Notes

In this work when referring to DPM we consider the specific implementation of Felzenszwalb et al. (2010).
Typically, when using alpha expansion the solution will not be exact. However, we experimentally found that the algorithm still works well, which is an indirect observation that the solution that is found is generally very close to the exact one. See Fig. 4 in the experimental results.
Fig. 4. Image-level inference versus bounded sliding window. We compare EDFM with image-level inference (with different numbers of hypotheses) against bounded sliding window in terms of AP and average computational time on Pascal VOC 2007 bicycles. In image-level inference, varying the number of hypotheses yields a different trade-off between precision and recall. In terms of time, our method is always much faster than the bounded sliding window.
\(S(\mathbf {l},\mathbf {x},\mathbf {w})\) is the maximization defined in Eq. (9), where we make explicit the dependency on the image \(\mathbf {x}\) and \(\mathbf {w}\).

References

Alahari, K., Kohli, P., & Torr, P. H. S. (2010). Dynamic hybrid algorithms for MAP inference in discrete MRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1846–1857.
Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1014–1021).
Babenko, B., Yang, M. H., & Belongie, S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1619–1632.
Batra, D., Yadollahpour, P., Guzman, A., & Shakhnarovich, G. (2012). Diverse M-best solutions in Markov random fields. In Proceedings of the European conference on computer vision.
Bergtholdt, M., Kappes, J., Schmidt, S., & Schnörr, C. (2010). A study of parts-based object class detection using complete graphs. International Journal of Computer Vision, 87(1–2), 93–117.
Bourdev, L. D., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In Proceedings of the European conference on computer vision (pp. 168–181).
Boykov, Y., Veksler, O., & Zabih, R. (1998). Markov random fields with efficient approximations. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 648–656).
Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.
Crandall, D., Felzenszwalb, P., & Huttenlocher, D. (2005). Spatial priors for part-based recognition using statistical models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 10–17).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 886–893).
Desai, C., Ramanan, D., & Fowlkes, C. (2009). Discriminative models for multi-class layout. In Proceedings of the IEEE international conference on computer vision.
Duchenne, O., Joulin, A., & Ponce, J. (2011). A graph-matching kernel for object categorization. p. 1056. Barcelona, Spain.
Everingham, M., Zisserman, A., Williams, C., & Van Gool, L. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.
Felzenszwalb, P. F., Girshick, R., & McAllester, D. (2010). Cascade object detection with deformable part models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2241–2248).
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Distance transforms of sampled functions. Technical report.
Felzenszwalb, P. F., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 264–271).
Glocker, B., Komodakis, N., Tziritas, G., Navab, N., & Paragios, N. (2008). Dense image registration through MRFs and efficient linear programming. Medical Image Analysis, 12(6), 731–741.
Hoiem, D., Rother, C., & Winn, J. M. (2008). 3D layout CRF for multi-view object class recognition and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Horn, B. K. P., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17, 185–203.
Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N learning: Bootstrapping binary classifiers by structural constraints. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 49–56).
Kapoor, A., & Winn, J. (2006). Located hidden random fields: Learning discriminative parts for object detection. In Proceedings of the European conference on computer vision (pp. 302–315).
Kohli, P., & Torr, P. H. S. (2007). Dynamic graph cuts for efficient inference in Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2079–2088.
Komodakis, N., Tziritas, G., & Paragios, N. (2008). Performance vs computational efficiency for optimizing single and dynamic MRFs: Setting the state of the art with primal-dual strategies. Computer Vision and Image Understanding, 112(1), 14–29.
Lades, M., Vorbruggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C., Wurtz, R. P., et al. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3), 300–311.
Ladicky, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. (2010). Where, what and how many? Combining object detectors and CRFs. In Proceedings of the European conference on computer vision (pp. 424–437).
Ladicky, L., Torr, P. H. S., & Zisserman, A. (2012). Latent SVMs for human detection with a locally affine deformation field. In Proceedings of the British machine vision conference (pp. 10.1–10.11).
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of the international joint conference on artificial intelligence (pp. 674–679).
Pedersoli, M., Vedaldi, A., & Gonzàlez, J. (2011). A coarse-to-fine approach for fast deformable object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1353–1360).
Quattoni, A., Collins, M., & Darrell, T. (2004). Conditional random fields for object recognition. In L. K. Saul, Y. Weiss & L. Bottou (Eds.), Advances in neural information processing systems (pp. 1097–1104). MIT Press.
Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1), 3–30.
Vedaldi, A., & Zisserman, A. (2009). Structured output regression for detection with partial occlusion. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams & A. Culotta (Eds.), Advances in neural information processing systems (pp. 1928–1936).
Vedaldi, A., & Zisserman, A. (2012). Sparse kernel approximations for efficient classification and detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2320–2327).
Wang, Y., Tran, D., & Liao, Z. (2011). Learning hierarchical poselets for human parsing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1705–1712).
Yang, Y., & Ramanan, D. (2012). Articulated human detection with flexible mixtures-of-parts. IEEE Transactions on Pattern Analysis and Machine Intelligence 99(PrePrints), 1.
Yuille, A. L., & Rangarajan, A. (2002). The concave-convex procedure (CCCP). In Advances in neural information processing systems (pp. 1033–1040).
Zhang, L., & van der Maaten, L. (2013). Structure preserving object tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Zhu, L., Chen, Y., Yuille, A., & Freeman, W. (2010). Latent hierarchical structural learning for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).

Acknowledgments

This work was partially supported by Toyota Motor Corporation and FP7 ERC Starting Grant 240530 COGNIMUND.

Author information

Authors and Affiliations

KU Leuven, ESAT-PSI-VISICS/iMinds, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
Marco Pedersoli, Radu Timofte, Tinne Tuytelaars & Luc Van Gool

Corresponding author

Correspondence to Marco Pedersoli.

Additional information

Communicated by M. Hebert.

About this article

Cite this article

Pedersoli, M., Timofte, R., Tuytelaars, T. et al. An Elastic Deformation Field Model for Object Detection and Tracking. Int J Comput Vis 111, 137–152 (2015). https://doi.org/10.1007/s11263-014-0736-2

Received: 16 October 2013
Accepted: 30 May 2014
Published: 24 June 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s11263-014-0736-2
