
Ancient text recognition: a review

Artificial Intelligence Review

Abstract

Optical character recognition (OCR) is an important research area in the field of pattern recognition, and a great deal of research has been done on it over the last 60 years. Libraries and offices hold a large volume of paper-based data, and a wealth of knowledge survives in the form of ancient text documents. Maintaining and searching this paper-based data is a challenge, and efforts to digitize it are under way in many places. Paper-based documents are scanned to digitize them, but the scanned data is in pictorial form. Computers cannot interpret it directly, because they work with characters encoded in ASCII or some other code. Alphanumeric information must therefore be retrieved from the scanned images. An OCR system converts a document into electronic text that can be edited, searched, and otherwise processed. OCR is the machine replication of human reading and has been the subject of intensive research for more than six decades. This paper presents a comprehensive survey of the work done on the various phases of an OCR system, with a special focus on OCR for ancient text documents. It is intended to help novice researchers by providing a comprehensive study of the techniques required in those phases, namely segmentation, feature extraction, and classification, especially for ancient documents. It has been observed that only limited work has been done on the recognition of ancient documents, especially those in Devanagari script. This article also presents future directions for researchers in the field of ancient text recognition.
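The three phases named in the abstract can be illustrated with a toy pipeline. The sketch below is not taken from the paper: it assumes an already binarized image, uses a vertical projection profile for segmentation, zoning densities for feature extraction, and a nearest-neighbour rule for classification. These are common textbook choices picked here only for illustration, and all names (`segment_columns`, `zoning_features`, `GLYPHS`, `PAGE`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background


def parse(rows: Sequence[str]) -> Image:
    """Turn '#'/'.' text art into a binary image (stand-in for a binarized scan)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def segment_columns(img: Image) -> List[Tuple[int, int]]:
    """Segmentation: split glyphs at blank columns of the vertical projection profile."""
    width = len(img[0])
    profile = [sum(row[x] for row in img) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                      # a glyph begins at the first inked column
        elif not ink and start is not None:
            spans.append((start, x))       # a blank column closes the glyph
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def crop(img: Image, span: Tuple[int, int]) -> Image:
    """Cut the columns [span[0], span[1]) out of every row."""
    return [row[span[0]:span[1]] for row in img]


def zoning_features(glyph: Image, grid: int = 2) -> List[float]:
    """Feature extraction: ink density in each cell of a grid x grid zoning."""
    h, w = len(glyph), len(glyph[0])
    feats: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cells = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def classify(feats: List[float], templates: Dict[str, List[float]]) -> str:
    """Classification: nearest neighbour by squared Euclidean distance."""
    def dist(a: List[float], b: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(templates, key=lambda label: dist(feats, templates[label]))


def recognize(img: Image, templates: Dict[str, List[float]]) -> str:
    """Run segmentation, feature extraction and classification in sequence."""
    return "".join(
        classify(zoning_features(crop(img, span)), templates)
        for span in segment_columns(img)
    )


# Hypothetical 3x4 reference glyphs, used only to build the templates.
GLYPHS = {
    "L": ["#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "###"],
    "T": ["###", ".#.", ".#.", ".#."],
}
TEMPLATES = {label: zoning_features(parse(rows)) for label, rows in GLYPHS.items()}

# A tiny "scanned" line containing the three glyphs separated by blank columns.
PAGE = [
    "#...###.###",
    "#...#.#..#.",
    "#...#.#..#.",
    "###.###..#.",
]

print(recognize(parse(PAGE), TEMPLATES))
```

Real ancient documents break the assumptions this sketch relies on: touching or broken characters leave no blank column between glyphs, and degradation distorts the zone densities.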



Author information

Corresponding author

Correspondence to Munish Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Narang, S.R., Jindal, M.K. & Kumar, M. Ancient text recognition: a review. Artif Intell Rev 53, 5517–5558 (2020). https://doi.org/10.1007/s10462-020-09827-4

  • DOI: https://doi.org/10.1007/s10462-020-09827-4
