From passive to interactive object learning and recognition through self-identification on a humanoid robot

Abstract

Service robots, working in evolving human environments, need the ability to continuously learn to recognize new objects. Ideally, they should act as humans do, by observing their environment and interacting with objects, without specific supervision. Taking inspiration from infant development, we propose a developmental approach that enables a robot to progressively learn object appearances in a social environment: first through observation alone, then through active object manipulation. We focus on incremental, continuous, and unsupervised learning that does not require prior knowledge about the environment or the robot. In the first phase, we analyse the visual space and detect proto-objects as units of attention that are learned and recognized as possible physical entities. The appearance of each entity is represented as a multi-view model based on complementary visual features. In the second phase, entities are classified into three categories: parts of the robot's body, parts of a human partner, and manipulable objects. The categorization approach is based on the mutual information between visual and proprioceptive data, and on the motion behaviour of entities. The ability to categorize entities is then used during interactive object exploration to improve the previously acquired object models. The proposed system is implemented and evaluated with an iCub and a Meka robot learning 20 objects. The system recognizes objects with 88.5 % success and creates coherent representation models that are further improved by interactive learning.
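To make the categorization idea concrete, the following is a minimal sketch, not the authors' implementation, of a mutual-information test between an entity's visual motion and the robot's proprioception; the histogram estimator, the thresholds, and the signal representations are all assumptions introduced here for illustration.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of the mutual information (in bits)
    between two 1-D signals of equal length."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()             # joint distribution
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def categorize_entity(visual_motion, joint_velocities,
                      mi_self=0.5, mi_other=0.1):
    """Label an entity from the MI between its image-plane motion and
    each joint velocity (thresholds are illustrative, not the paper's)."""
    mi = max(mutual_information(visual_motion, q) for q in joint_velocities)
    if mi >= mi_self:
        return "robot_body"          # moves with the robot's own joints
    if mi <= mi_other and visual_motion.std() > 0:
        return "human_part"          # moves, but independently of the robot
    return "manipulable_object"      # moves mainly when acted upon
```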


Notes

  1. http://openni.org.

  2. Implemented in the OpenCV library http://opencv.org.

  3. We tested 2-, 3-, and 4-connectedness of features and chose 4-connectedness based on our preliminary experiments with a set of 10 objects, as a compromise between performance and the computational cost required for interactive experiments. We also compared the use of low-level and mid-level features: the recognition rate (based on pure labels) improved from 84.33 to 97.83 % with mid-level features; a sketch of such a pairing scheme is given after these notes. More details can be found in Lyubova (2013), p. 84.

  4. http://eris.liralab.it/yarp.

  5. http://www.icub.org.

  6. http://en.wikipedia.org/wiki/Meka_Robotics.

  7. http://eigen.tuxfamily.org.

  8. The code used in these experiments is open-source. Details for installing the code are available at http://eris.liralab.it/wiki/UPMC_iCub_project/MACSi_Software while the documentation for running the experiments is at http://chronos.isir.upmc.fr/~ivaldi/macsi/doc/. The experiments can be reproduced directly with an iCub robot; for other robots with a different middleware (i.e., not based on YARP), the module encoding the action primitives has to be adapted. A minimal client sketch is given after these notes.
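To illustrate the mid-level feature pairing discussed in note 3, here is a minimal sketch under stated assumptions: low-level features are keypoints quantized into visual words, and each mid-level feature is an unordered pair of a word with one of its four nearest neighbours. The encoding is an analogy to, not a reproduction of, the method detailed in Lyubova (2013).

```python
import numpy as np
from scipy.spatial import cKDTree

def build_mid_level_features(keypoints, words, k=4):
    """Pair each low-level visual word with its k nearest neighbours
    (k=4 mirrors the 4-connectedness chosen in note 3)."""
    tree = cKDTree(keypoints)                # keypoints: (n, 2) array
    _, idx = tree.query(keypoints, k=k + 1)  # first hit is the point itself
    mid = set()
    for i, neighbours in enumerate(idx):
        for j in neighbours[1:]:
            # an order-independent pair of word ids is one mid-level feature
            mid.add(tuple(sorted((int(words[i]), int(words[j])))))
    return mid
```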
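And for note 8, a minimal sketch of how an external client could drive an action-primitive module over YARP; the port names and the command grammar ("push" plus a target position) are hypothetical, so the actual interface should be taken from the MACSi documentation linked above.

```python
import yarp

yarp.Network.init()

# Hypothetical RPC connection to the action-primitive module.
port = yarp.RpcClient()
port.open("/demo/actions:o")
yarp.Network.connect("/demo/actions:o", "/macsi/actionPrimitives/rpc:i")

cmd, reply = yarp.Bottle(), yarp.Bottle()
cmd.addString("push")    # name of the primitive (assumed vocabulary)
cmd.addFloat64(-0.30)    # hypothetical Cartesian target (m)
cmd.addFloat64(0.10)
cmd.addFloat64(0.05)
port.write(cmd, reply)   # blocking RPC call; reply holds the module's answer
print(reply.toString())

yarp.Network.fini()
```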

References

  • Aldavert, D., Ramisa, A., López de Mántaras, R., & Toledo, R. (2010). Real-time object segmentation using a bag of features approach. In Artificial Intelligence Research and Development (pp. 321–329).

  • Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110, 346–359.

  • Belongie, S., Carson, C., Greenspan, H., & Malik, J. (1998). Color- and texture-based image segmentation using EM and its application to content-based image retrieval. In IEEE International Conference on Computer Vision (pp. 675–682).

  • Beucher, S., & Meyer, F. (1993). The morphological approach to segmentation: The watershed transformation. In E. R. Dougherty (Ed.), Mathematical morphology in image processing (pp. 433–481). New York: Marcel Dekker.

  • Browatzki, B., Tikhanoff, V., Metta, G., Bülthoff, H., & Wallraven, C. (2012). Active object recognition on a humanoid robot. In IEEE International Conference on Robotics and Automation (ICRA) (pp. 2021–2028).

  • Burger, W., & Burge, M. J. (2008). Digital image processing. Berlin: Springer.

  • Chao, F., Lee, M. H., Jiang, M., & Zhou, C. (2014). An infant development-inspired approach to robot hand-eye coordination. International Journal of Advanced Robotic Systems, 11, 15.

  • Chinellato, E., Antonelli, M., Grzyb, B., & del Pobil, A. (2011). Implicit sensorimotor mapping of the peripersonal space by gazing and reaching. IEEE Transactions on Autonomous Mental Development, 3(1), 43–53.

  • Chu, V., McMahon, I., Riano, L., McDonald, C., He, Q., Martinez Perez-Tejada, J., Arrigo, M., Fitter, N., Nappo, J., Darrell, T., & Kuchenbecker, K. (2013). Using robotic exploratory procedures to learn the meaning of haptic adjectives. In IEEE International Conference on Robotics and Automation (ICRA) (pp. 3048–3055).

  • Crandall, D.J., Felzenszwalb, P.F., & Huttenlocher, D.P. (2005). Spatial priors for part-based recognition using statistical models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 10–17).

  • Dickscheid, T., Schindler, F., & Förstner, W. (2011). Coding images with local features. International Journal of Computer Vision, 94, 154–174.

  • Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern classification (2nd ed.). New York: Wiley-Interscience.

  • Everingham, M., Eslami, S., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2014). The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.

  • Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 264–271).

  • Fergus, R., Perona, P., & Zisserman, A. (2005). A sparse object category model for efficient learning and exhaustive recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (vol. 1, pp. 380–387).

  • Fiala, M. (2005). ARTag, a fiducial marker system using digital techniques. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (vol. 2, pp. 590–596).

  • Filliat, D. (2007). A visual bag of words method for interactive qualitative localization and mapping. In IEEE International Conference on Robotics and Automation (ICRA) (pp. 3921–3926).

  • Guennebaud, G., & Jacob, B. (2010). Eigen v3. http://eigen.tuxfamily.org

  • Gevers, T., & Smeulders, A. W. (1999). Color-based object recognition. Pattern Recognition, 32(3), 453–464.

  • Gold, K., & Scassellati, B. (2006). Learning acceptable windows of contingency. Connection Science, 18(2), 217–228.

  • Goldstein, E. B. (2010). Sensation and perception. Belmont: Wadsworth Publishing Company.

  • Grauman, K., & Leibe, B. (2011). Visual object recognition. Synthesis lectures on artificial intelligence and machine learning. San Rafael: Morgan & Claypool Publishers.

  • Griffith, S., Sukhoy, V., & Stoytchev, A. (2011). Using sequences of movement dependency graphs to form object categories. In IEEE-RAS International Conference on Humanoid Robots (Humanoids) (pp. 715–720).

  • Gupta, M., & Sukhatme, G. (2012). Using manipulation primitives for brick sorting in clutter. In IEEE International Conference on Robotics and Automation (ICRA) (pp. 3883–3889).

  • Harman, K. L., Humphrey, G., & Goodale, M. A. (1999). Active manual control of object views facilitates visual recognition. Current Biology, 9(22), 1315–1318.

  • Hoffmann, M., Marques, H., Hernandez Arieta, A., Sumioka, H., Lungarella, M., & Pfeifer, R. (2010). Body schema in robotics: A review. IEEE Transactions on Autonomous Mental Development, 2(4), 304–324.

  • Huang, T., Yang, G., & Tang, G. (1979). A fast two-dimensional median filtering algorithm. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(1), 13–18.

  • Hülse, M., McBride, S., & Lee, M. (2009). Robotic hand-eye coordination without global reference: A biologically inspired learning scheme. In IEEE International Conference on Development and Learning (ICDL) (pp. 1–6).

  • Ivaldi, S., Lyubova, N., Gérardeaux-Viret, D., Droniou, A., Anzalone, S.M., Chetouani, M., Filliat, D., & Sigaud, O. (2012). Perception and human interaction for developmental learning of objects and affordances. In IEEE International Conference on Humanoid Robots (Humanoids) (pp. 248–254).

  • Ivaldi, S., Nguyen, S., Lyubova, N., Droniou, A., Padois, V., Filliat, D., et al. (2014). Object learning through active exploration. IEEE Transactions on Autonomous Mental Development, 6(1), 56–72.

  • Jégou, H., Douze, M., & Schmid, C. (2010). Improving bag-of-features for large scale image search. International Journal of Computer Vision, 87(3), 316–336.

  • Kemp, C., & Edsinger, A. (2006). What can I control?: The development of visual categories for a robot's body and the world that it influences. In International Workshop on Epigenetic Robotics (Epirob) (pp. 33–40).

  • Kraft, D., Pugeault, N., Baseski, E., Popovic, M., Kragic, D., Kalkan, S., et al. (2008). Birth of the object: Detection of objectness and extraction of object shape through object-action complexes. International Journal of Humanoid Robotics, 5(2), 247–265.

  • Krainin, M., Henry, P., Ren, X., & Fox, D. (2011). Manipulator and object tracking for in-hand 3D object modeling. International Journal of Robotics Research, 30(11), 1311–1327.

  • Law, J., Shaw, P., Lee, M., & Sheldon, M. (2014). From saccades to grasping: A model of coordinated reaching through simulated development on a humanoid robot. IEEE Transactions on Autonomous Mental Development, 6(2), 93–109.

  • LeCun, Y., Huang, F.J., & Bottou, L. (2004). Learning methods for generic object recognition with invariance to pose and lighting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 97–104).

  • Lucas, B.D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence (IJCAI) (pp. 674–679).

  • Lyubova, N. (2013). Developmental approach of perception for a humanoid robot. Ph.D. thesis, ENSTA ParisTech.

  • Marjanovic, M.J., Scassellati, B., & Williamson, M.M. (1996). Self-taught visually-guided pointing for a humanoid robot. In From Animals to Animats 4: International Conference on Simulation of Adaptive Behavior (SAB) (pp. 35–44).

  • Metta, G., & Fitzpatrick, P. M. (2003). Better vision through manipulation. Adaptive Behavior, 11(2), 109–128.

  • Michel, P., Gold, K., & Scassellati, B. (2004). Motion-based robotic self-recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (vol. 3, pp. 2763–2768).

  • Micusik, B., & Kosecka, J. (2009). Semantic segmentation of street scenes by superpixel co-occurrence and 3D geometry. In IEEE International Conference on Computer Vision (ICCV) (pp. 625–632).

  • Modayil, J., & Kuipers, B. (2008). The initial development of object knowledge by a learning robot. Robotics and Autonomous Systems, 56, 879–890.

  • Nagi, J., Ducatelle, F., Di Caro, G.A., Ciresan, D., Meier, U., Giusti, A., Nagi, F., Schmidhuber, J., & Gambardella, L.M. (2011). Max-pooling convolutional neural networks for vision-based hand gesture recognition. In IEEE International Conference on Signal and Image Processing Applications (ICSIPA) (pp. 342–347).

  • Natale, L., Orabona, F., Berton, F., Metta, G., & Sandini, G. (2005). From sensorimotor development to object perception. In IEEE/RAS International Conference on Humanoid Robots (pp. 226–231).

  • Natale, L., Nori, F., Metta, G., Fumagalli, M., Ivaldi, S., Pattacini, U., et al. (2013). The iCub platform: A tool for studying intrinsically motivated learning. In G. Baldassarre & M. Mirolli (Eds.), Intrinsically motivated learning in natural and artificial systems (pp. 433–458). Berlin: Springer.

  • Needham, A., Barrett, T., & Peterman, K. (2002). A pick-me-up for infants' exploratory skills: Early simulated experiences reaching for objects using 'sticky mittens' enhances young infants' object exploration skills. Infant Behavior and Development, 25(3), 279–295.

  • Nguyen, S.M., Ivaldi, S., Lyubova, N., Droniou, A., Gérardeaux-Viret, D., Filliat, D., Padois, V., Sigaud, O., & Oudeyer, P.Y. (2013). Learning to recognize objects through curiosity-driven manipulation with the iCub humanoid robot. In International Conference on Development and Learning (pp. 1–8).

  • Orabona, F., Metta, G., & Sandini, G. (2007). A proto-object based visual attention model. In L. Paletta & E. Rome (Eds.), Attention in cognitive systems: Theories and systems from an interdisciplinary viewpoint (Lecture Notes in Computer Science). Berlin: Springer.

  • Piaget, J. (1999). Play, dreams and imitation in childhood. London: Routledge.

  • Prest, A., Leistner, C., Civera, J., Schmid, C., & Ferrari, V. (2012). Learning object class detectors from weakly annotated video. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 3282–3289).

  • Pylyshyn, Z. W. (2001). Visual indexes, preconceptual objects, and situated vision. Cognition, 80, 127–158.

  • Rensink, R. A. (2000). Seeing, sensing, and scrutinizing. Vision Research, 40(10–12), 1469–1487.

  • Russell, B.C., Freeman, W.T., Efros, A.A., Sivic, J., & Zisserman, A. (2006). Using multiple segmentations to discover objects and their extent in image collections. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (vol. 2, pp. 1605–1614).

  • Saegusa, R., Metta, G., & Sandini, G. (2012). Body definition based on visuomotor correlation. IEEE Transactions on Industrial Electronics, 59(8), 3199–3210.

  • Schiebener, D., Morimoto, J., Asfour, T., & Ude, A. (2013). Integrating visual perception and manipulation for autonomous learning of object representations. Adaptive Behavior, 21(5), 328–345.

  • Shi, J., & Tomasi, C. (1994). Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 593–600).

  • Shih, F. Y. (2009). Image processing and mathematical morphology: Fundamentals and applications. Boca Raton: CRC Press.

  • Shotton, J., Johnson, M., & Cipolla, R. (2008). Semantic texton forests for image categorization and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1–8).

  • Sinapov, J., Bergquist, T., Schenck, C., Ohiri, U., Griffith, S., & Stoytchev, A. (2011). Interactive object recognition using proprioceptive and auditory feedback. International Journal of Robotics Research, 30(10), 1250–1262.

  • Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In IEEE International Conference on Computer Vision (vol. 2, pp. 1470–1477).

  • Southey, T., & Little, J.J. (2006). Object discovery through motion, appearance and shape. In AAAI Workshop on Cognitive Robotics (p. 9).

  • Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14, 29–56.

  • Spelke, E. S., & Kinzler, K. D. (2007). Core knowledge. Developmental Science, 10(1), 89–96.

  • Torres-Jara, E., Natale, L., & Fitzpatrick, P. (2005). Tapping into touch. In International Workshop on Epigenetic Robotics (Epirob) (pp. 79–86). Lund: Lund University Cognitive Studies.

  • Ude, A., Omrčen, D., & Cheng, G. (2008). Making object learning and recognition an active process. International Journal of Humanoid Robotics, 5(2), 267–286.

  • van Hoof, H., Kroemer, O., & Peters, J. (2014). Probabilistic segmentation and targeted exploration of objects in cluttered environments. IEEE Transactions on Robotics, 30(5), 1198–1209.

  • Viola, P., & Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57, 137–154.

  • Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19(9), 1395–1407.

  • Weng, J., McClelland, J., Pentland, A., Sporns, O., Stockman, I., Sur, M., et al. (2001). Autonomous mental development by robots and animals. Science, 291(5504), 599–600.

  • Wersing, H., Kirstein, S., Götting, M., Brandl, H., Dunn, M., Mikhailova, I., et al. (2007). Online learning of objects in a biologically motivated visual architecture. International Journal of Neural Systems, 17(4), 219–230.

  • Yang, M.H., & Ahuja, N. (1999). Gaussian mixture model for human skin color and its application in image and video databases. In SPIE: Storage and Retrieval for Image and Video Databases (vol. 3656, pp. 458–466).

  • Zhang, Z. (2012). Microsoft kinect sensor and its effect. IEEE MultiMedia, 19(2), 4–10.

Acknowledgments

This work was supported by the French ANR program (ANR-10-BLAN-0216) through Project MACSi, and partly by the European Commission within the CoDyCo project (FP7-ICT-2011-9, No. 600716). The authors would like to thank the anonymous reviewers for their comments, which greatly helped improve the quality of the paper.

Author information

Corresponding author

Correspondence to Natalia Lyubova.

About this article

Cite this article

Lyubova, N., Ivaldi, S. & Filliat, D. From passive to interactive object learning and recognition through self-identification on a humanoid robot. Auton Robot 40, 33–57 (2016). https://doi.org/10.1007/s10514-015-9445-0
