Abstract
This paper describes an Optical Character Recognition (OCR) system for printed Oriya script. Developing OCR for this script is difficult because a large number of character shapes have to be recognized. In the proposed system, the document image is first captured using a flat-bed scanner and then passed through several preprocessing modules, including skew correction, line segmentation, zone detection, and word and character segmentation. These modules have been developed by combining conventional techniques with some newly proposed ones. Next, individual characters are recognized using a combination of stroke and run-number based features, along with features obtained from the concept of water overflow from a reservoir. The feature detection methods are simple and robust, and do not require preprocessing steps such as thinning and pruning. A prototype of the system has been tested on a variety of printed Oriya material, and currently achieves 96.3% character-level accuracy on average.
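The abstract names the processing stages but does not give their details. As an illustration only, the following minimal sketch shows two of the named ideas in their simplest textbook form: splitting a page into text lines wherever fully blank rows occur, and counting runs of ink pixels along each row and column as a run-number feature. It assumes a binary image stored as nested lists with 1 for ink and 0 for background. The function names are hypothetical and this is not the authors' implementation.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = ink, 0 = background


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Return (first_row, end_row_exclusive) for each band of rows containing ink."""
    lines = []
    start = None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # a blank row closes the line
            start = None
    if start is not None:                  # line running to the bottom edge
        lines.append((start, len(img)))
    return lines


def run_count(pixels: List[int]) -> int:
    """Number of maximal runs of consecutive ink pixels in one scan line."""
    runs = 0
    prev = 0
    for p in pixels:
        if p and not prev:                 # background-to-ink transition
            runs += 1
        prev = p
    return runs


def run_number_features(glyph: Image) -> Tuple[List[int], List[int]]:
    """Run counts for every row and every column of a character image."""
    rows = [run_count(r) for r in glyph]
    cols = [run_count(list(c)) for c in zip(*glyph)]
    return rows, cols


page = [
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
print(segment_lines(page))             # [(1, 3), (4, 5)]
print(run_number_features(page[1:3]))  # ([3, 1], [1, 1, 1, 0, 1])
```

Run counts of this kind are computed directly on the binary image, which is consistent with the abstract's remark that the features need no thinning or pruning. A real page would also need skew correction before blank-row line splitting can work reliably.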
Cite this article
Chaudhuri, B.B., Pal, U. & Mitra, M. Automatic recognition of printed Oriya script. Sadhana 27, 23–34 (2002). https://doi.org/10.1007/BF02703310
DOI: https://doi.org/10.1007/BF02703310