Abstract
This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Processes) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent MDP, the 2MDP, whose value function, V*_{2MDP}, can be combined online with a 2-step lookahead search to provide a good POMDP policy. We prove that this yields an approximation to the POMDP's optimal value function that is at least as good as methods based on the optimal value function of the underlying MDP. We present experimental evidence that the method finds a good policy for a POMDP with 10,000 states and observations.
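To make the online phase concrete, here is a minimal sketch (not the authors' code) of the 2-step lookahead the abstract describes. It assumes a tabular POMDP with arrays T[a, s, s'] (transition probabilities), Z[a, s', o] (observation probabilities), and R[a, s] (expected immediate rewards), plus a precomputed 2MDP value function V2 over states; the function names and array layout are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes filter: posterior over states after doing a and observing o."""
    b_pred = b @ T[a]                 # predicted next-state distribution
    b_post = b_pred * Z[a][:, o]      # weight by observation likelihood
    norm = b_post.sum()               # norm == P(o | b, a)
    if norm == 0.0:
        return b_post, 0.0            # observation o impossible under (b, a)
    return b_post / norm, norm

def two_step_lookahead(b, T, Z, R, V2, gamma):
    """Choose the action maximizing a 2-step lookahead backed by V2.

    At the leaves, two steps ahead, the state is treated as fully
    observable (the even-odd assumption), so the leaf value is the
    belief-weighted V2.
    """
    n_actions, n_obs = T.shape[0], Z.shape[2]
    best_a, best_q = None, -np.inf
    for a in range(n_actions):
        q = float(b @ R[a])                       # expected immediate reward
        for o in range(n_obs):
            b2, p_o = belief_update(b, a, o, T, Z)
            if p_o == 0.0:
                continue
            # Second step: greedy backup into the 2MDP value function.
            inner = max(
                float(b2 @ R[a2]) + gamma * float((b2 @ T[a2]) @ V2)
                for a2 in range(n_actions)
            )
            q += gamma * p_o * inner
        if q > best_q:
            best_a, best_q = a, q
    return best_a, best_q
```

Bottoming out in V2 after two steps is where the even-odd approximation enters: a belief-weighted, fully-observable value stands in for the true POMDP value at the leaves, which the abstract claims is at least as good an approximation as one based on the underlying MDP's optimal value function.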
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zubek, V.B., Dietterich, T. (2000). A POMDP Approximation Algorithm That Anticipates the Need to Observe. In: Mizoguchi, R., Slaney, J. (eds) PRICAI 2000 Topics in Artificial Intelligence. PRICAI 2000. Lecture Notes in Computer Science, vol 1886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44533-1_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67925-7
Online ISBN: 978-3-540-44533-3