An Elastic Deformation Field Model for Object Detection and Tracking

International Journal of Computer Vision

Abstract

Deformable Parts Models (DPM) are the current state of the art for object detection. Nevertheless, they seem sub-optimal in their representation of deformations: object deformations are often continuous and not confined to big parts. We therefore propose to replace the DPM star model, which is based on big parts, with a deformation field. This consists of a grid of small parts connected by pairwise constraints, which can better handle continuous deformations. A naive application of this model to object detection would be a bounded sliding window approach: for each possible location in the image, the best part configuration within a limited bound around that location is found. This is computationally very expensive. Instead, we propose a different inference procedure, in which an iterative image-level search finds the best object hypothesis. We show that this approach is faster than bounded sliding windows yet produces comparable accuracy. Experiments further show that the deformation field can better approximate real object deformations and therefore, for certain classes, produces even better detection accuracy than state-of-the-art DPM. Finally, the same approach is adapted to model-free tracking, showing improved accuracy in this case as well.
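The idea of a grid of small parts tied together by pairwise constraints can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the names `field_score` and `icm`, the `unary` dictionaries of appearance scores, the quadratic penalty on neighbouring displacements, and the `weight` parameter are all hypothetical. The optimiser is plain iterated conditional modes (greedy coordinate ascent), used here as a simple stand-in; it is not the inference procedure of the paper.

```python
from itertools import product


def field_score(unary, disp, weight=1.0):
    """Score one deformation field on a rows x cols grid of parts.

    unary[i][j] maps a displacement (dy, dx) to a hypothetical appearance
    score for part (i, j); disp[i][j] is the displacement chosen for it.
    Neighbouring parts pay an assumed quadratic penalty when they disagree.
    """
    rows, cols = len(disp), len(disp[0])
    total = 0.0
    for i, j in product(range(rows), range(cols)):
        total += unary[i][j][disp[i][j]]
        # Visit only right and bottom neighbours so each edge counts once.
        for ni, nj in ((i, j + 1), (i + 1, j)):
            if ni < rows and nj < cols:
                dy = disp[i][j][0] - disp[ni][nj][0]
                dx = disp[i][j][1] - disp[ni][nj][1]
                total -= weight * (dy * dy + dx * dx)
    return total


def icm(unary, labels, weight=1.0, sweeps=5):
    """Greedy coordinate ascent (ICM), a stand-in optimiser for illustration.

    labels must contain (0, 0), which is the starting displacement.
    """
    rows, cols = len(unary), len(unary[0])
    disp = [[(0, 0)] * cols for _ in range(rows)]
    for _ in range(sweeps):
        changed = False
        for i, j in product(range(rows), range(cols)):
            current = disp[i][j]
            best, best_score = current, field_score(unary, disp, weight)
            for lab in labels:
                disp[i][j] = lab
                score = field_score(unary, disp, weight)
                if score > best_score:
                    best, best_score = lab, score
            disp[i][j] = best
            changed = changed or best != current
        if not changed:
            break  # no part moved during a full sweep
    return disp
```

In this toy setting a part with a strong appearance response can pull its neighbours along, because staying behind costs the pairwise penalty; that is the intuition behind letting many small parts deform together rather than a few big ones.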

Notes

  1. In this work, when referring to DPM, we mean the specific implementation of Felzenszwalb et al. (2010).

  2. Typically, when using alpha expansion, the solution will not be exact. However, we found experimentally that the algorithm still works well, which indirectly indicates that the solution found is generally very close to the exact one. See Fig. 4 in the experimental results.

    Fig. 4. Image-level inference versus bounded sliding window. We compare EDFM with image-level inference (with different numbers of hypotheses) against the bounded sliding window in terms of AP and average computational time on Pascal VOC 2007 bicycles. In image-level inference, varying the number of hypotheses yields a different trade-off between precision and recall. In terms of time, our method is always much faster than the bounded sliding window.

  3. \(S(\mathbf {l},\mathbf {x},\mathbf {w})\) is the maximization defined in Eq. (9), where we make explicit the dependency on the image \(\mathbf {x}\) and \(\mathbf {w}\).

References

  • Alahari, K., Kohli, P., & Torr, P. H. S. (2010). Dynamic hybrid algorithms for MAP inference in discrete MRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1846–1857.

  • Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1014–1021).

  • Babenko, B., Yang, M. H., & Belongie, S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1619–1632.

  • Batra, D., Yadollahpour, P., Guzman, A., & Shakhnarovich, G. (2012). Diverse M-best solutions in Markov random fields. In Proceedings of the European conference on computer vision.

  • Bergtholdt, M., Kappes, J., Schmidt, S., & Schnörr, C. (2010). A study of parts-based object class detection using complete graphs. International Journal of Computer Vision, 87(1–2), 93–117.

  • Bourdev, L. D., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In Proceedings of the European conference on computer vision (pp. 168–181).

  • Boykov, Y., Veksler, O., & Zabih, R. (1998). Markov random fields with efficient approximations. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 648–656).

  • Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.

  • Crandall, D., Felzenszwalb, P., & Huttenlocher, D. (2005). Spatial priors for part-based recognition using statistical models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 10–17).

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 886–893).

  • Desai, C., Ramanan, D., & Fowlkes, C. (2009). Discriminative models for multi-class layout. In Proceedings of the IEEE international conference on computer vision.

  • Duchenne, O., Joulin, A., & Ponce, J. (2011). A graph-matching kernel for object categorization. p. 1056. Barcelona, Spain.

  • Everingham, M., Zisserman, A., Williams, C., & Van Gool, L. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.

  • Felzenszwalb, P. F., Girshick, R., & McAllester, D. (2010). Cascade object detection with deformable part models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2241–2248).

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Distance transforms of sampled functions. Technical report.

  • Felzenszwalb, P. F., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).

  • Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 264–271).

  • Glocker, B., Komodakis, N., Tziritas, G., Navab, N., & Paragios, N. (2008). Dense image registration through MRFs and efficient linear programming. Medical Image Analysis, 12(6), 731–741.

  • Hoiem, D., Rother, C., & Winn, J. M. (2008). 3D layout CRF for multi-view object class recognition and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Horn, B. K. P., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17, 185–203.

  • Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N learning: Bootstrapping binary classifiers by structural constraints. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 49–56).

  • Kapoor, A., & Winn, J. (2006). Located hidden random fields: Learning discriminative parts for object detection. In Proceedings of the European conference on computer vision (pp. 302–315).

  • Kohli, P., & Torr, P. H. S. (2007). Dynamic graph cuts for efficient inference in Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2079–2088.

  • Komodakis, N., Tziritas, G., & Paragios, N. (2008). Performance vs computational efficiency for optimizing single and dynamic MRFs: Setting the state of the art with primal-dual strategies. Computer Vision and Image Understanding, 112(1), 14–29.

  • Lades, M., Vorbrüggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C., Würtz, R. P., et al. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3), 300–311.

  • Ladicky, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. (2010). Where, what and how many? Combining object detectors and CRFs. In Proceedings of the European conference on computer vision (pp. 424–437).

  • Ladicky, L., Torr, P. H. S., & Zisserman, A. (2012). Latent SVMs for human detection with a locally affine deformation field. In Proceedings of the British machine vision conference (pp. 10.1–10.11).

  • Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of the international joint conference on artificial intelligence (pp. 674–679).

  • Pedersoli, M., Vedaldi, A., & Gonzàlez, J. (2011). A coarse-to-fine approach for fast deformable object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1353–1360).

  • Quattoni, A., Collins, M., & Darrell, T. (2004). Conditional random fields for object recognition. In L. K. Saul, Y. Weiss & L. Bottou (Eds.), Advances in neural information processing systems (pp. 1097–1104). MIT Press.

  • Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1), 3–30.

  • Vedaldi, A., & Zisserman, A. (2009). Structured output regression for detection with partial occlusion. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams & A. Culotta (Eds.), Advances in neural information processing systems (pp. 1928–1936).

  • Vedaldi, A., & Zisserman, A. (2012). Sparse kernel approximations for efficient classification and detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2320–2327).

  • Wang, Y., Tran, D., & Liao, Z. (2011). Learning hierarchical poselets for human parsing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1705–1712).

  • Yang, Y., & Ramanan, D. (2012). Articulated human detection with flexible mixtures-of-parts. IEEE Transactions on Pattern Analysis and Machine Intelligence 99(PrePrints), 1.

  • Yuille, A. L., & Rangarajan, A. (2002). The concave-convex procedure (CCCP). In Advances in neural information processing systems (pp. 1033–1040).

  • Zhang, L., & van der Maaten, L. (2013). Structure preserving object tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).

  • Zhu, L., Chen, Y., Yuille, A., & Freeman, W. (2010). Latent hierarchical structural learning for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).

Acknowledgments

This work was partially supported by Toyota Motor Corporation and FP7 ERC Starting Grant 240530 COGNIMUND.

Author information

Corresponding author

Correspondence to Marco Pedersoli.

Additional information

Communicated by M. Hebert.

About this article

Cite this article

Pedersoli, M., Timofte, R., Tuytelaars, T. et al. An Elastic Deformation Field Model for Object Detection and Tracking. Int J Comput Vis 111, 137–152 (2015). https://doi.org/10.1007/s11263-014-0736-2

  • DOI: https://doi.org/10.1007/s11263-014-0736-2
