Abstract
Keyword spotting refers to the process of retrieving all instances of a given keyword in a document. This paper describes a novel keyword spotting system for handwritten documents, derived from a neural-network-based system for unconstrained handwriting recognition. As such, it performs template-free spotting, i.e., a keyword does not need to appear in the training set. Spotting is performed with a modification of the CTC Token Passing algorithm. We demonstrate that such a system has the potential for high performance: for example, a precision of 95% at 50% recall is reached for the 4,000 most frequent words on the IAM offline handwriting database.
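To make the reported figure concrete, the following is a minimal sketch of how "precision at a given recall" can be computed from a ranked list of spotting candidates. It is not the authors' evaluation code; the function name `precision_at_recall`, the `(score, is_true_match)` input format, and the toy data are assumptions made purely for illustration.

```python
from typing import List, Tuple


def precision_at_recall(scored: List[Tuple[float, bool]],
                        target_recall: float) -> float:
    """Precision at the first rank where recall reaches target_recall.

    scored: hypothetical (score, is_true_match) pairs, one per candidate;
            a higher score means the system is more confident it is a hit.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(1 for _, is_match in scored if is_match)
    if total_pos == 0:
        raise ValueError("no true matches in the candidate list")

    # Rank candidates from most to least confident.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for k, (_, is_match) in enumerate(ranked, start=1):
        if is_match:
            true_pos += 1
        # Recall = fraction of all true matches retrieved so far.
        if true_pos / total_pos >= target_recall:
            # Precision = fraction of retrieved candidates that are correct.
            return true_pos / k
    raise RuntimeError("unreachable for target_recall <= 1")


# Toy data (made up): six candidates, four of which are true matches.
candidates = [(0.9, True), (0.8, True), (0.7, False),
              (0.6, True), (0.4, False), (0.2, True)]
print(precision_at_recall(candidates, 0.5))
```

Lowering the acceptance threshold moves down the ranked list, which raises recall and typically lowers precision; the abstract's "95% at 50% recall" is one point on that trade-off curve.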
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Frinken, V., Fischer, A., Bunke, H. (2010). A Novel Word Spotting Algorithm Using Bidirectional Long Short-Term Memory Neural Networks. In: Schwenker, F., El Gayar, N. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2010. Lecture Notes in Computer Science, vol 5998. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12159-3_17
DOI: https://doi.org/10.1007/978-3-642-12159-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12158-6
Online ISBN: 978-3-642-12159-3
eBook Packages: Computer Science (R0)