Abstract
We formalize structured prediction as a reinforcement learning task. We first define a Structured Prediction Markov Decision Process (SP-MDP), an instantiation of Markov Decision Processes for structured prediction, and show that learning an optimal policy for this SP-MDP is equivalent to minimizing the empirical loss. This link between the supervised learning formulation of structured prediction and reinforcement learning (RL) allows us to use approximate RL methods for learning the policy. The proposed model makes weak assumptions both on the nature of the structured prediction problem and on the supervision process: it assumes nothing about the decomposition of loss functions, the encoding of the data, or the availability of optimal policies for training. It can therefore cope with a wide range of structured prediction problems. Moreover, it scales well and can be used to solve both complex and large-scale real-world problems. We describe two series of experiments. The first provides an analysis of RL on classical sequence prediction benchmarks and compares our approach with state-of-the-art structured prediction algorithms. The second introduces a tree transformation problem on which most previous models fail; it is a complex instance of the general labeled tree mapping problem. We show that RL exploration is effective and leads to successful results on this challenging task, a clear confirmation that RL can be applied to large-scale and complex structured prediction problems.
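To make the SP-MDP view concrete, here is a minimal sketch (an illustration under our own assumptions, not the authors' implementation) for the sequence labeling case: a state is the input together with the partial output built so far, an action appends one label, the episode ends when the output is complete, and the terminal reward is the negative Hamming loss against the reference output. Learning a policy that maximizes this reward then corresponds to minimizing the loss.

```python
class SequenceSPMDP:
    """Illustrative SP-MDP for sequence labeling (hypothetical names).

    State: (input sequence x, partial output y).
    Action: append one label to y.
    Terminal reward: negative Hamming loss against the reference output.
    """

    def __init__(self, x, y_ref):
        self.x = x          # input sequence
        self.y_ref = y_ref  # reference output (supervision)
        self.y = []         # partial output built so far

    def done(self):
        return len(self.y) == len(self.x)

    def step(self, label):
        """Apply an action (append a label) and return the reward."""
        self.y.append(label)
        if self.done():
            # Terminal reward: negative Hamming loss vs. the reference.
            return -sum(a != b for a, b in zip(self.y, self.y_ref))
        return 0.0  # intermediate steps yield no reward


def rollout(mdp, policy):
    """Run one episode with the given policy; return the total reward."""
    total = 0.0
    while not mdp.done():
        total += mdp.step(policy(mdp))
    return total


# A policy that copies the current input token achieves zero loss
# on an identity-mapping task:
mdp = SequenceSPMDP(list("ABBA"), list("ABBA"))
print(rollout(mdp, lambda m: m.x[len(m.y)]))  # prints 0.0
```

An approximate RL method would replace the hand-written policy above with a learned, feature-based one; the point of the sketch is only that episode return and empirical loss coincide by construction.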
Additional information
Editors: Charles Parker, Yasemin Altun, and Prasad Tadepalli.
Maes, F., Denoyer, L. & Gallinari, P. Structured prediction with reinforcement learning. Mach Learn 77, 271–301 (2009). https://doi.org/10.1007/s10994-009-5140-8