Abstract
Language models (LMs) capture the contextual dependencies of a language and assign higher probabilities to well-formed word sequences. For this reason, LMs are commonly used in general handwriting recognition, where they improve recognition results. In this paper, we present the integration of a language model, together with a dictionary, into a graph-based recognizer that aims to transcribe handwritten historical documents. Applied to our corpora, this integration yields a significant improvement in word accuracy.
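To make the idea of an LM favoring well-formed word sequences concrete, the following is a minimal sketch and not the method of the paper: it assumes an add-one smoothed bigram model and a simple log-linear combination with a recognizer score. The function names (`train_bigram_lm`, `log_prob`, `rerank`), the `lm_weight` parameter, and the candidate format are hypothetical and chosen only for illustration; the graph-based recognizer itself is not modeled here.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence (illustrative choice)."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rerank(candidates, dictionary, unigrams, bigrams, lm_weight=1.0):
    """Return the best candidate word sequence, or None if no candidate qualifies.

    candidates: list of (words, recognizer_log_score) pairs (hypothetical format).
    Candidates containing a word outside the dictionary are discarded; the rest
    are ranked by recognizer score plus the weighted LM log-probability.
    """
    best, best_score = None, -math.inf
    for words, rec_score in candidates:
        if not all(w in dictionary for w in words):
            continue
        score = rec_score + lm_weight * log_prob(words, unigrams, bigrams)
        if score > best_score:
            best, best_score = words, score
    return best
```

In this sketch the dictionary acts as a hard filter and the LM as a soft preference, so a candidate with a slightly better recognizer score can still lose to one that forms a more plausible word sequence. How the paper actually combines these components inside its graph is not stated in the abstract.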
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Meza-Lovón, G.L. (2014). A Language Model for Improving the Graph-Based Transcription Approach for Historical Documents. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_19
DOI: https://doi.org/10.1007/978-3-319-12027-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science, Computer Science (R0)