ABSTRACT
In this paper, we present a document analysis system designed to extract regions of interest from greyscale document images. The extracted regions are then clustered into text zones and non-text areas using geometric and texture features. The system works in two steps: regions of interest are first retrieved using cumulative gradient considerations, and a classification module then applies an entropy-based heuristic. Experiments on the MediaTeam Document Database show the relevance of this criterion.
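The abstract does not give the details of either step, so the following is only a minimal sketch of what the two ideas could look like, not the authors' method. It assumes the image is a list of rows of grey levels, that the "cumulative gradient" is the per-row sum of absolute horizontal differences, and that the entropic measure is the Shannon entropy of a region's grey-level distribution. The function names and the threshold are hypothetical.

```python
import math
from collections import Counter


def cumulative_gradient_profile(image):
    """Per-row sum of absolute horizontal grey-level differences (assumed definition)."""
    return [
        sum(abs(row[x + 1] - row[x]) for x in range(len(row) - 1))
        for row in image
    ]


def rows_of_interest(profile, threshold):
    """Group consecutive rows whose profile value exceeds the threshold.

    Returns a list of (start, end) bands with end exclusive.
    """
    bands, start = [], None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y
        elif value <= threshold and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of the values."""
    counts = Counter(values)
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Toy image: a blank row, a row of alternating black and white pixels, a blank row.
toy_image = [
    [255] * 8,
    [0, 255, 0, 255, 0, 255, 0, 255],
    [255] * 8,
]
toy_profile = cumulative_gradient_profile(toy_image)
toy_bands = rows_of_interest(toy_profile, threshold=100)
```

In this toy case only the middle row has a high cumulative gradient, so it is the only band retained. How an entropy value would then separate text from non-text is not stated in the abstract; a real classifier would combine it with the geometric and texture features mentioned there.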
Index Terms
- Extraction of text areas in printed document images