Abstract
Learning spatial models from sensor data raises the challenging data association problem of relating model parameters to individual measurements. This paper proposes an EM-based algorithm that solves the model learning and data association problems in parallel. The algorithm is developed in the context of the structure from motion problem, which is the problem of estimating a 3D scene model from a collection of image data. To accommodate the spatial constraints in this domain, we compute virtual measurements as sufficient statistics to be used in the M-step. We develop an efficient Markov chain Monte Carlo sampling method, called chain flipping, to calculate these statistics in the E-step. Experimental results show that we can solve hard data association problems when learning models of 3D scenes, and that we can do so efficiently. We conjecture that this approach can be applied to a broad range of model learning problems from sensor data, such as the robot mapping problem.
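As a rough illustration of the E-step idea described in the abstract, the following Python sketch estimates marginal assignment probabilities by sampling over one-to-one assignments, then forms virtual measurements as marginal-weighted means of the raw measurements. It is only a sketch under stated assumptions: it uses a plain Metropolis sampler with a swap proposal, not the chain flipping proposal developed in the paper, and the function names, the `cost` matrix (a negative log-likelihood of assigning measurement k to feature j), and the equal number of measurements and features are all hypothetical choices made for the example.

```python
import math
import random


def sample_assignment_marginals(cost, n_samples=20000, burn_in=2000, seed=0):
    """Estimate P(measurement k is assigned to feature j).

    Assumption: assignments are permutations, with probability proportional
    to exp(-sum_k cost[k][perm[k]]). Requires at least two measurements.
    """
    rng = random.Random(seed)
    n = len(cost)
    perm = list(range(n))  # start from the identity assignment
    counts = [[0] * n for _ in range(n)]
    for step in range(burn_in + n_samples):
        # Symmetric proposal: swap the features assigned to two measurements.
        a, b = rng.sample(range(n), 2)
        old = cost[a][perm[a]] + cost[b][perm[b]]
        new = cost[a][perm[b]] + cost[b][perm[a]]
        delta = new - old
        # Metropolis rule: always accept improvements, else accept w.p. exp(-delta).
        if delta <= 0 or rng.random() < math.exp(-delta):
            perm[a], perm[b] = perm[b], perm[a]
        if step >= burn_in:
            for k, j in enumerate(perm):
                counts[k][j] += 1
    return [[c / n_samples for c in row] for row in counts]


def virtual_measurements(marginals, measurements):
    """For each feature j, average the measurements weighted by the marginals.

    Each column of `marginals` sums to one, so this is a weighted mean.
    """
    n = len(measurements)
    dim = len(measurements[0])
    return [
        [sum(marginals[k][j] * measurements[k][d] for k in range(n))
         for d in range(dim)]
        for j in range(n)
    ]


if __name__ == "__main__":
    measurements = [[0.0, 0.0], [10.0, 10.0]]
    predicted = [[0.1, 0.0], [10.0, 10.1]]  # hypothetical projected features
    cost = [
        [0.5 * sum((u - p) ** 2 for u, p in zip(meas, pred)) for pred in predicted]
        for meas in measurements
    ]
    marginals = sample_assignment_marginals(cost)
    print(virtual_measurements(marginals, measurements))
```

In an EM loop of the kind the abstract outlines, the virtual measurements would then be passed to the M-step in place of measurements with known correspondence; the M-step itself is not shown here.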
Cite this article
Dellaert, F., Seitz, S.M., Thorpe, C.E. et al. EM, MCMC, and Chain Flipping for Structure from Motion with Unknown Correspondence. Machine Learning 50, 45–71 (2003). https://doi.org/10.1023/A:1020245811187
DOI: https://doi.org/10.1023/A:1020245811187