DOI: 10.1145/2037342.2037352
research-article

Automatic indexing of French handwritten census registers for probate geneaology

Published: 16 September 2011

ABSTRACT

This paper describes the complete indexing process for the registers of a French census dating back more than a hundred years, from image analysis to integration into the information system, in the context of probate genealogy. The documents of interest consist of a table of personal information from which the cells containing the first name, the surname and the relation to the head of household must be extracted and recognized. More than 30 million cells were processed, and their content was either integrated directly into the information system or sent to keyers for manual validation, yielding an automation rate of 80% while keeping the error rate below 15% on average. Based on this project, we have started developing a generic platform for processing table-based historical documents, including new functionalities and a more generic and user-friendly interface for defining table models.
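
The trade-off between automation rate and error rate described above can be illustrated with a minimal sketch. It assumes, as the abstract does not state, that each recognized cell carries a confidence score and that cells below a threshold are sent to keyers. The `Cell` type, the `route` and `rates` functions, the threshold value and the sample data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str          # recognizer's best hypothesis for the cell
    confidence: float  # hypothetical recognition score in [0, 1]
    truth: str         # ground-truth transcription, used only for evaluation


def route(cells, threshold):
    """Split cells into auto-accepted ones and ones sent to keyers."""
    accepted = [c for c in cells if c.confidence >= threshold]
    to_keyers = [c for c in cells if c.confidence < threshold]
    return accepted, to_keyers


def rates(cells, threshold):
    """Return (automation rate, error rate among auto-accepted cells)."""
    accepted, _ = route(cells, threshold)
    automation = len(accepted) / len(cells) if cells else 0.0
    errors = sum(1 for c in accepted if c.text != c.truth)
    error = errors / len(accepted) if accepted else 0.0
    return automation, error


# Illustrative data only; not results from the paper.
sample = [
    Cell("MARIE", 0.95, "MARIE"),
    Cell("DUPONT", 0.90, "DUPONT"),
    Cell("FILS", 0.85, "FILLE"),
    Cell("JEAN", 0.80, "JEAN"),
    Cell("MARTIN", 0.40, "MARTEN"),
]
automation, error = rates(sample, threshold=0.5)
print(f"automation rate: {automation:.0%}, error rate: {error:.0%}")
```

Raising the threshold in such a scheme sends more cells to keyers, which lowers the automation rate and usually the error rate as well. Here the error rate is measured only over automatically accepted cells, which is one possible definition and may differ from the one used in the paper.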


    • Published in

      HIP '11: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
      September 2011
      195 pages
      ISBN:9781450309165
      DOI:10.1145/2037342

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 52 of 90 submissions, 58%
