
Automatic recognition of printed Oriya script


Abstract

This paper describes an Optical Character Recognition (OCR) system for printed Oriya script. Developing OCR for this script is difficult because a large number of character shapes have to be recognized. In the proposed system, the document image is first captured using a flat-bed scanner and then passed through several preprocessing modules, including skew correction, line segmentation, zone detection, and word and character segmentation. These modules have been developed by combining conventional techniques with newly proposed ones. Next, individual characters are recognized using a combination of stroke and run-number based features, along with features obtained from the concept of water overflow from a reservoir. The feature detection methods are simple and robust, and do not require preprocessing steps such as thinning and pruning. A prototype of the system has been tested on a variety of printed Oriya material, and currently achieves 96.3% character-level accuracy on average.
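The abstract does not give the algorithms in detail, so the following is only a minimal sketch of two of the ideas it names, under stated assumptions: line segmentation is approximated by scanning for rows that contain no ink, and a run-number feature is taken to be the count of ink runs along each row and column of a character image. The function names (`segment_lines`, `run_count`, `run_number_features`) and the 0/1 image encoding are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed encoding: a binary image as rows of pixels, 1 = ink, 0 = background.
Image = List[List[int]]


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of text lines.

    A text line is taken to be a maximal block of consecutive rows
    that each contain at least one ink pixel.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:                  # line runs to the image bottom
        lines.append((start, len(img)))
    return lines


def run_count(pixels: List[int]) -> int:
    """Count maximal runs of consecutive ink pixels along one scan line."""
    runs = 0
    prev = 0
    for p in pixels:
        if p and not prev:                 # background-to-ink transition
            runs += 1
        prev = p
    return runs


def run_number_features(glyph: Image) -> Tuple[List[int], List[int]]:
    """Return run counts for every row and every column of a glyph."""
    rows = [run_count(r) for r in glyph]
    cols = [run_count(list(c)) for c in zip(*glyph)]
    return rows, cols
```

Because run counts are computed directly on the binary image, a feature of this kind needs no thinning or pruning step, which is consistent with the robustness claim in the abstract. A real system would also need skew correction before row-based line segmentation, since tilted lines rarely leave fully blank rows between them.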




Cite this article

Chaudhuri, B.B., Pal, U. & Mitra, M. Automatic recognition of printed Oriya script. Sadhana 27, 23–34 (2002). https://doi.org/10.1007/BF02703310


  • DOI: https://doi.org/10.1007/BF02703310
