Abstract
We formalize structured prediction as a reinforcement learning task. We first define a Structured Prediction Markov Decision Process (SP-MDP), an instantiation of Markov Decision Processes for structured prediction, and show that learning an optimal policy for this SP-MDP is equivalent to minimizing the empirical loss. This link between the supervised learning formulation of structured prediction and reinforcement learning (RL) allows us to use approximate RL methods for learning the policy. The proposed model makes weak assumptions both on the nature of the structured prediction problem and on the supervision process: it assumes nothing about the decomposition of loss functions, the encoding of the data, or the availability of optimal policies for training. It can therefore cope with a wide range of structured prediction problems. Moreover, it scales well and can be used to solve both complex and large-scale real-world problems. We describe two series of experiments. The first provides an analysis of RL on classical sequence prediction benchmarks and compares our approach with state-of-the-art structured prediction algorithms. The second introduces a tree transformation problem on which most previous models fail; it is a complex instance of the general labeled tree mapping problem. We show that RL exploration is effective and leads to successful results on this challenging task, a clear confirmation that RL can be applied to large-scale and complex structured prediction problems.
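To make the SP-MDP view concrete, here is a minimal sketch (an illustration under our own assumptions, not the authors' implementation) for the sequence labeling case: a state is the input together with the partial output built so far, an action appends one label, the episode ends when the output is complete, and the terminal reward is the negative Hamming loss against the reference output. Learning a policy that maximizes this reward then corresponds to minimizing the loss.

```python
class SequenceSPMDP:
    """Illustrative SP-MDP for sequence labeling (hypothetical names).

    State: (input sequence x, partial output y).
    Action: append one label to y.
    Terminal reward: negative Hamming loss against the reference output.
    """

    def __init__(self, x, y_ref):
        self.x = x          # input sequence
        self.y_ref = y_ref  # reference output (supervision)
        self.y = []         # partial output built so far

    def done(self):
        return len(self.y) == len(self.x)

    def step(self, label):
        """Apply an action (append a label) and return the reward."""
        self.y.append(label)
        if self.done():
            # Terminal reward: negative Hamming loss vs. the reference.
            return -sum(a != b for a, b in zip(self.y, self.y_ref))
        return 0.0  # intermediate steps yield no reward


def rollout(mdp, policy):
    """Run one episode with the given policy; return the total reward."""
    total = 0.0
    while not mdp.done():
        total += mdp.step(policy(mdp))
    return total


# A policy that copies the current input token achieves zero loss
# on an identity-mapping task:
mdp = SequenceSPMDP(list("ABBA"), list("ABBA"))
print(rollout(mdp, lambda m: m.x[len(m.y)]))  # prints 0.0
```

An approximate RL method would replace the hand-written policy above with a learned, feature-based one; the point of the sketch is only that episode return and empirical loss coincide by construction.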
Additional information
Editors: Charles Parker, Yasemin Altun, and Prasad Tadepalli.
Maes, F., Denoyer, L. & Gallinari, P. Structured prediction with reinforcement learning. Mach Learn 77, 271–301 (2009). https://doi.org/10.1007/s10994-009-5140-8