DOI: 10.1145/502187.502211
Article

Extraction of text areas in printed document images

Published: 9 November 2001

ABSTRACT

In this paper, we present a document analysis system that extracts regions of interest from greyscale document images. The system works in two steps. First, regions of interest are retrieved using cumulative gradient measures. Second, the collected areas are clustered into text zones and non-text areas using geometric and texture features; in this classification module, we introduce an entropy-based heuristic. Experiments on the MediaTeam Document Database show the relevance of this criterion.
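The abstract names the two stages but does not specify the gradient operator, the direction of accumulation, the thresholds, or the exact form of the entropic heuristic. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a horizontal pixel-difference gradient summed along each row (a cumulative gradient profile), a fixed threshold on that profile to find candidate bands, and Shannon entropy of a band's grey-level histogram as the entropy feature. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# Assumed representation: a greyscale image as rows of 0..255 integers.
Image = List[List[int]]


def horizontal_gradient(img: Image) -> Image:
    """Absolute difference between horizontally adjacent pixels (assumed operator)."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in img]


def row_profile(grad: Image) -> List[int]:
    """Cumulative gradient: sum of gradient magnitudes along each row."""
    return [sum(row) for row in grad]


def candidate_bands(profile: List[int], threshold: float) -> List[Tuple[int, int]]:
    """Maximal runs of rows whose cumulative gradient exceeds `threshold`,
    returned as inclusive (first_row, last_row) pairs."""
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y  # entering a high-gradient band
        elif value <= threshold and start is not None:
            bands.append((start, y - 1))  # leaving the band
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))  # band runs to the last row
    return bands


def grey_entropy(img: Image, top: int, bottom: int) -> float:
    """Shannon entropy, in bits, of the grey-level histogram of rows top..bottom."""
    counts = Counter(v for row in img[top:bottom + 1] for v in row)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy page: blank rows around two rows of alternating black/white "strokes".
    page = [[255] * 8, [255] * 8, [0, 255] * 4, [0, 255] * 4, [255] * 8, [255] * 8]
    profile = row_profile(horizontal_gradient(page))
    bands = candidate_bands(profile, 0.5 * max(profile))
    print(profile)  # [0, 0, 1785, 1785, 0, 0]
    for top, bottom in bands:
        print((top, bottom), grey_entropy(page, top, bottom))  # (2, 3) 1.0
```

On this toy page, only rows 2 to 3 exceed half of the maximum row profile, and their two-level histogram has an entropy of 1 bit, whereas the blank rows have an entropy of 0. A real system would also need a vertical pass to split bands into blocks, plus the geometric and texture features mentioned above; how the entropy value is turned into a text/non-text decision is left open here.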


Published in

DocEng '01: Proceedings of the 2001 ACM Symposium on Document Engineering
November 2001
174 pages
ISBN: 1581134320
DOI: 10.1145/502187

Copyright © 2001 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery, New York, NY, United States




Acceptance Rates

DocEng '01 paper acceptance rate: 18 of 55 submissions (33%). Overall acceptance rate: 178 of 537 submissions (33%).
