Abstract
Deformable Parts Models (DPM) are the current state of the art for object detection. Nevertheless, they seem sub-optimal in how they represent deformations: object deformations are often continuous and not confined to big parts. We therefore propose to replace the DPM star model based on big parts with a deformation field, which consists of a grid of small parts connected by pairwise constraints and can better handle continuous deformations. A naive application of this model to object detection would be a bounded sliding window approach: for each possible location in the image, the best part configuration within a limited bound around that location is found. This is computationally very expensive. Instead, we propose a different inference procedure, in which an iterative image-level search finds the best object hypothesis. We show that this approach is faster than bounded sliding windows yet produces comparable accuracy. Experiments further show that the deformation field can better approximate real object deformations and therefore, for certain classes, produces even better detection accuracy than state-of-the-art DPM. Finally, the same approach is adapted to model-free tracking, where it also improves accuracy.
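To make the idea of a grid of small parts with pairwise constraints concrete, the following is a minimal sketch of how one configuration of such a field could be scored. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `field_score`, the dictionary-based unary scores, the 4-connected neighbourhood, the quadratic penalty and the `pairwise_weight` parameter are all hypothetical choices.

```python
from typing import Dict, List, Tuple

Displacement = Tuple[int, int]  # (dy, dx) offset of a part from its rest position


def field_score(
    unary: List[List[Dict[Displacement, float]]],
    disp: List[List[Displacement]],
    pairwise_weight: float = 1.0,
) -> float:
    """Score one configuration of a grid of small parts.

    unary[r][c][d] is the (hypothetical) appearance score of part (r, c)
    when displaced by d; disp[r][c] is the displacement chosen for it.
    The quadratic pairwise penalty is an assumption made for illustration.
    """
    rows, cols = len(disp), len(disp[0])
    score = 0.0
    for r in range(rows):
        for c in range(cols):
            dy, dx = disp[r][c]
            score += unary[r][c][(dy, dx)]
            # Penalise disagreement with the right and bottom neighbours,
            # so each 4-connected edge is counted exactly once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols:
                    ndy, ndx = disp[nr][nc]
                    score -= pairwise_weight * ((dy - ndy) ** 2 + (dx - ndx) ** 2)
    return score
```

In this sketch, moving a single part away from its neighbours raises its own appearance score only at the price of a penalty on every edge it shares, whereas moving all parts together costs nothing, which is the sense in which pairwise constraints favour smooth, continuous deformations.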
Notes
In this work when referring to DPM we consider the specific implementation of Felzenszwalb et al. (2010).
Typically, when using alpha expansion the solution will not be exact. However, we found experimentally that the algorithm still works well, which indirectly suggests that the solution found is generally very close to the exact one. See Fig. 4 in the experimental results.
\(S(\mathbf {l},\mathbf {x},\mathbf {w})\) is the maximization defined in Eq. (9), where we make explicit the dependency on the image \(\mathbf {x}\) and \(\mathbf {w}\).
References
Alahari, K., Kohli, P., & Torr, P. H. S. (2010). Dynamic hybrid algorithms for MAP inference in discrete MRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1846–1857.
Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1014–1021).
Babenko, B., Yang, M. H., & Belongie, S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1619–1632.
Batra, D., Yadollahpour, P., Guzman, A., & Shakhnarovich, G. (2012). Diverse M-best solutions in Markov random fields. In Proceedings of the European conference on computer vision.
Bergtholdt, M., Kappes, J., Schmidt, S., & Schnörr, C. (2010). A study of parts-based object class detection using complete graphs. International Journal of Computer Vision, 87(1–2), 93–117.
Bourdev, L. D., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In Proceedings of the European conference on computer vision (pp. 168–181).
Boykov, Y., Veksler, O., & Zabih, R. (1998). Markov random fields with efficient approximations. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 648–656).
Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.
Crandall, D., Felzenszwalb, P., & Huttenlocher, D. (2005). Spatial priors for part-based recognition using statistical models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 10–17).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 886–893).
Desai, C., Ramanan, D., & Fowlkes, C. (2009). Discriminative models for multi-class layout. In Proceedings of the IEEE international conference on computer vision.
Duchenne, O., Joulin, A., & Ponce, J. (2011). A graph-matching kernel for object categorization. p. 1056. Barcelona, Spain.
Everingham, M., Zisserman, A., Williams, C., & Van Gool, L. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.
Felzenszwalb, P. F., Girshick, R., & McAllester, D. (2010). Cascade object detection with deformable part models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2241–2248).
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Distance transforms of sampled functions. Technical report.
Felzenszwalb, P. F., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 264–271).
Glocker, B., Komodakis, N., Tziritas, G., Navab, N., & Paragios, N. (2008). Dense image registration through MRFs and efficient linear programming. Medical Image Analysis, 12(6), 731–741.
Hoiem, D., Rother, C., & Winn, J. M. (2008). 3D layout CRF for multi-view object class recognition and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Horn, B. K. P., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17, 185–203.
Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N learning: Bootstrapping binary classifiers by structural constraints. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 49–56).
Kapoor, A., & Winn, J. (2006). Located hidden random fields: Learning discriminative parts for object detection. In Proceedings of the European conference on computer vision (pp. 302–315).
Kohli, P., & Torr, P. H. S. (2007). Dynamic graph cuts for efficient inference in Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2079–2088.
Komodakis, N., Tziritas, G., & Paragios, N. (2008). Performance vs computational efficiency for optimizing single and dynamic MRFs: Setting the state of the art with primal-dual strategies. Computer Vision and Image Understanding, 112(1), 14–29.
Lades, M., Vorbruggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C., Wurtz, R. P., et al. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3), 300–311.
Ladicky, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. (2010). Where, what and how many? Combining object detectors and CRFs. In Proceedings of the European conference on computer vision (pp. 424–437).
Ladicky, L., Torr, P. H. S., & Zisserman, A. (2012). Latent SVMs for human detection with a locally affine deformation field. In Proceedings of the British machine vision conference (pp. 10.1–10.11).
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of the international joint conference on artificial intelligence (pp. 674–679).
Pedersoli, M., Vedaldi, A., & Gonzàlez, J. (2011). A coarse-to-fine approach for fast deformable object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1353–1360).
Quattoni, A., Collins, M., & Darrell, T. (2004). Conditional random fields for object recognition. In L. K. Saul, Y. Weiss & L. Bottou (Eds.), Advances in neural information processing systems (pp. 1097–1104). MIT Press.
Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1), 3–30.
Vedaldi, A., & Zisserman, A. (2009). Structured output regression for detection with partial occlusion. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams & A. Culotta (Eds.), Advances in neural information processing systems (pp. 1928–1936).
Vedaldi, A., & Zisserman, A. (2012). Sparse kernel approximations for efficient classification and detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2320–2327).
Wang, Y., Tran, D., & Liao, Z. (2011). Learning hierarchical poselets for human parsing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1705–1712).
Yang, Y., & Ramanan, D. (2012). Articulated human detection with flexible mixtures-of-parts. IEEE Transactions on Pattern Analysis and Machine Intelligence 99(PrePrints), 1.
Yuille, A. L., & Rangarajan, A. (2002). The concave-convex procedure (CCCP). In Advances in neural information processing systems (pp. 1033–1040).
Zhang, L., & van der Maaten, L. (2013). Structure preserving object tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Zhu, L., Chen, Y., Yuille, A., & Freeman, W. (2010). Latent hierarchical structural learning for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Acknowledgments
This work was partially supported by Toyota Motor Corporation and FP7 ERC Starting Grant 240530 COGNIMUND.
Additional information
Communicated by M. Hebert.
About this article
Cite this article
Pedersoli, M., Timofte, R., Tuytelaars, T. et al. An Elastic Deformation Field Model for Object Detection and Tracking. Int J Comput Vis 111, 137–152 (2015). https://doi.org/10.1007/s11263-014-0736-2
DOI: https://doi.org/10.1007/s11263-014-0736-2