
Abstract.

The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity to supplement traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, as well as standalone still and video cameras, are highly mobile and easy to use. They can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, which makes them far more versatile than desktop scanners. If robust solutions for analyzing documents captured with such devices become available, there will clearly be demand for them in many domains. Traditional scanner-based document analysis techniques provide a good reference and starting point, but they cannot be applied directly to camera-captured images. Such images can suffer from low resolution, blur, and perspective distortion, as well as complex layouts and interaction between the content and the background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We then discuss document analysis from a single camera-captured image as well as from multiple frames, and we highlight some sample applications under development and feasible ideas for future work.
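As an illustration of the perspective-distortion problem raised in the abstract, the following sketch rectifies a photographed flat page by inverse warping through a plane-to-plane homography. It is a minimal sketch under stated assumptions, not the method of any surveyed paper: the four page corners are assumed to be already known (ordered top-left, top-right, bottom-right, bottom-left), the image is a grayscale list of rows, sampling is nearest-neighbour, and the function names `homography`, `apply_homography`, and `rectify` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Direct linear transform from exactly four point pairs, with h33 fixed to 1.

    Each pair (x, y) -> (u, v) gives two linear equations derived from
    u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1) and the analogous one for v.
    """
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("need exactly four point pairs")
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = _solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map a point through h using homogeneous coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def rectify(image: List[List[int]], corners: Sequence[Point],
            out_w: int, out_h: int, fill: int = 255) -> List[List[int]]:
    """Warp the page quadrilateral `corners` (assumed known, TL, TR, BR, BL)
    to an out_w x out_h fronto-parallel image."""
    rect = [(0.0, 0.0), (out_w - 1.0, 0.0),
            (out_w - 1.0, out_h - 1.0), (0.0, out_h - 1.0)]
    h = homography(rect, corners)  # output pixel -> photo (inverse warping)
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            sx, sy = apply_homography(h, (x, y))
            ix, iy = round(sx), round(sy)  # nearest-neighbour sampling
            inside = 0 <= iy < rows and 0 <= ix < cols
            row.append(image[iy][ix] if inside else fill)
        out.append(row)
    return out
```

Because every output pixel is mapped back into the photograph, the rectified page has no holes. A single homography only models a planar page; curled pages such as those of thick bound books need a 3D surface model instead.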



Author information


Correspondence to Jian Liang.

Additional information

Received: 18 December 2003, Accepted: 1 November 2004, Published online: 21 June 2005


About this article

Cite this article

Liang, J., Doermann, D. & Li, H. Camera-based analysis of text and documents: a survey. IJDAR 7, 84–104 (2005). https://doi.org/10.1007/s10032-004-0138-z


  • DOI: https://doi.org/10.1007/s10032-004-0138-z
