Abstract
The α-expansion algorithm has had a significant impact in computer vision due to its generality, effectiveness, and speed. It is commonly used to minimize energies that involve unary, pairwise, and specialized higher-order terms. Our main algorithmic contribution is an extension of α-expansion that also optimizes “label costs” with well-characterized optimality bounds. Label costs penalize a solution based on the set of labels that appear in it, for example by simply penalizing the number of labels in the solution.
Our energy has a natural interpretation as minimizing description length (MDL) and sheds light on classical algorithms like K-means and expectation-maximization (EM). Label costs are useful for multi-model fitting, and we demonstrate several such applications: homography detection, motion segmentation, image segmentation, and compression. Our C++ and MATLAB code is publicly available at http://vision.csd.uwo.ca/code/.
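To make the notion of a label cost concrete, the following is a minimal sketch of how such an energy could be evaluated for a given labeling. It is not the authors' released code. It assumes the simplest case mentioned in the abstract, where each label carries a fixed cost that is paid once if the label appears anywhere in the solution, and it assumes a Potts-style pairwise term. The function name `labeling_energy` and all parameter names are hypothetical.

```python
def labeling_energy(labels, unary, edges, smooth_weight, label_cost):
    """Evaluate an energy with unary, pairwise, and per-label cost terms.

    labels        : list with one label id per variable
    unary         : unary[p][l] is the cost of giving variable p the label l
    edges         : list of (p, q) pairs of neighboring variables
    smooth_weight : penalty paid when two neighbors disagree (assumed Potts form)
    label_cost    : dict mapping label id -> cost paid once if the label is used
    """
    # Data term: how well each variable fits its assigned label.
    data = sum(unary[p][l] for p, l in enumerate(labels))
    # Smoothness term: neighbors with different labels are penalized.
    smooth = sum(smooth_weight for p, q in edges if labels[p] != labels[q])
    # Label cost term: each distinct label in the solution is charged once.
    cost = sum(label_cost[l] for l in set(labels))
    return data + smooth + cost


# Toy example (assumed data): four variables in a chain, two candidate labels.
unary = [[0, 3], [0, 3], [0, 3], [2, 0]]
edges = [(0, 1), (1, 2), (2, 3)]
no_cost = {0: 0, 1: 0}
with_cost = {0: 2, 1: 2}

two_labels = [0, 0, 0, 1]
one_label = [0, 0, 0, 0]

print(labeling_energy(two_labels, unary, edges, 1, no_cost))    # 1
print(labeling_energy(one_label, unary, edges, 1, no_cost))     # 2
print(labeling_energy(two_labels, unary, edges, 1, with_cost))  # 5
print(labeling_energy(one_label, unary, edges, 1, with_cost))   # 4
```

Without label costs the two-label solution has the lower energy. Once each label costs 2, the single-label solution becomes cheaper, which is the kind of preference for fewer models that the abstract describes.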
Additional information
The authors assert equal contribution and joint first authorship.
Cite this article
Delong, A., Osokin, A., Isack, H.N. et al. Fast Approximate Energy Minimization with Label Costs. Int J Comput Vis 96, 1–27 (2012). https://doi.org/10.1007/s11263-011-0437-z
DOI: https://doi.org/10.1007/s11263-011-0437-z