Abstract
Agents (hardware or software) that act autonomously in an environment must integrate three basic behaviors: planning, execution, and learning. This integration is mandatory when the agent does not know how its actions affect the environment, how the environment reacts to those actions, or when it does not receive the goals it must achieve as explicit input. Without an a priori theory, an autonomous agent should be able to self-propose goals, set up plans for achieving those goals according to previously learned models of itself and the environment, and learn those models from past experiences of successful and failed plan executions. Planning involves selecting a goal and computing a set of actions that will allow the agent to achieve it. Execution deals with the interaction with the environment: applying the planned actions, observing the resulting perceptions, and verifying that the goals have been achieved. Learning is needed to predict the environment's reactions to the agent's actions, thus guiding the agent to achieve its goals more efficiently.
In this context, most learning systems applied to problem solving have been used to learn control knowledge for guiding the search for a plan; few systems have focused on acquiring planning operator descriptions. For example, one of the most widely used techniques for integrating (a form of) planning, execution, and learning is reinforcement learning. However, reinforcement learning methods usually do not maintain explicit representations of action descriptions, so they cannot reason in terms of goals and ways of achieving those goals.
In this paper, we present an integrated architecture, LOPE, that learns operator definitions, plans using those operators, and executes the plans to refine the acquired operators. The resulting system is domain-independent, and we have performed experiments in a robotic framework. The results clearly show that the integrated planning, learning, and executing system outperforms the basic planner in that domain.
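The plan–execute–learn cycle described in the abstract can be sketched as follows. This is a minimal illustrative toy, not the actual LOPE implementation: the `OperatorModel` class, the one-dimensional "line world" environment, and the breadth-first planner are all assumptions made for this example, and LOPE's real operators additionally carry conditions, effects, and success estimates.

```python
from collections import deque

class OperatorModel:
    """Toy learned action model: maps (state, action) to observed effects.
    Purely illustrative of the idea of learning operators from experience."""

    def __init__(self, actions):
        self.actions = actions
        self.transitions = {}  # (state, action) -> {next_state: count}

    def observe(self, s, a, s2):
        # Record one execution outcome, reinforcing the matching operator.
        counts = self.transitions.setdefault((s, a), {})
        counts[s2] = counts.get(s2, 0) + 1

    def predict(self, s, a):
        counts = self.transitions.get((s, a))
        if not counts:
            return None  # this action was never tried in this state
        return max(counts, key=counts.get)  # most frequently observed effect

    def plan(self, start, goal):
        """Breadth-first search over the learned model; returns an action
        sequence, or None when the model cannot connect start to goal."""
        frontier = deque([(start, [])])
        seen = {start}
        while frontier:
            s, path = frontier.popleft()
            if s == goal:
                return path
            for a in self.actions:
                s2 = self.predict(s, a)
                if s2 is not None and s2 not in seen:
                    seen.add(s2)
                    frontier.append((s2, path + [a]))
        return None

def line_world(s, a):
    """Deterministic toy environment: positions 0..4, actions -1 / +1."""
    return min(max(s + a, 0), 4)

# 1. Learning phase: experiment with each action in each state, recording
#    the observed effects as operator instances.
model = OperatorModel(actions=[-1, +1])
for s in range(5):
    for a in model.actions:
        model.observe(s, a, line_world(s, a))

# 2. Planning phase: compute a plan from the learned operators.
plan = model.plan(0, 4)

# 3. Execution phase: apply the plan, checking each prediction and feeding
#    the observed effect back into the model.
s = 0
for a in plan:
    predicted = model.predict(s, a)
    s2 = line_world(s, a)
    model.observe(s, a, s2)   # reinforce (or revise) the operator
    assert predicted == s2    # deterministic world: predictions hold
    s = s2

print(plan, s)  # -> [1, 1, 1, 1] 4
```

In a nondeterministic or partially known world, the execution phase is where a prediction failure would trigger operator revision rather than an assertion, which is the role plan execution plays in refining the acquired operators.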
Cite this article
García-Martínez, R., Borrajo, D. An Integrated Approach of Learning, Planning, and Execution. Journal of Intelligent and Robotic Systems 29, 47–78 (2000). https://doi.org/10.1023/A:1008134010576