13 January 2003 Header and footer extraction by page association
Proceedings Volume 5010, Document Recognition and Retrieval X; (2003) https://doi.org/10.1117/12.472833
Event: Electronic Imaging 2003, Santa Clara, CA, United States
Abstract
This paper introduces a robust algorithm to extract headers and footers from a variety of electronic documents, such as image files, Adobe PDF files, and files generated from OCR. Compared with conventional methods based on page-level layout and format, the proposed strategy considers each page in the context of its neighboring pages. Through this page association, headers and footers in different patterns can be detected automatically, without human intervention or individual templates. In addition, fuzzy string matching makes the method robust against OCR errors.
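The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: compare the top and bottom lines of each page with those of nearby pages, using a fuzzy string comparison so that OCR errors do not break the match. Everything here is an assumption for illustration rather than the paper's actual method, including the function names `normalize`, `similar`, and `find_headers_footers`, the `window` and `threshold` parameters, the use of Python's `difflib.SequenceMatcher` as the fuzzy matcher, and the choice to treat only the first and last line of a page as candidates.

```python
import re
from difflib import SequenceMatcher


def normalize(line):
    # Assumption: digit runs are replaced by a placeholder so that
    # "Page 3" and "Page 4" compare as equal.
    return re.sub(r"\d+", "#", line.strip().lower())


def similar(a, b, threshold=0.8):
    # Fuzzy comparison; tolerates small OCR errors such as "m" read as "rn".
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


def find_headers_footers(pages, window=2, threshold=0.8):
    """pages: list of pages, each a list of text lines.

    Returns one (header_or_None, footer_or_None) tuple per page.
    A first (last) line counts as a header (footer) if it fuzzily
    matches the first (last) line of any page within `window` pages.
    """
    results = []
    for i, page in enumerate(pages):
        lo = max(0, i - window)
        hi = min(len(pages), i + window + 1)
        neighbors = [pages[j] for j in range(lo, hi) if j != i and pages[j]]
        header = footer = None
        if page:
            if any(similar(page[0], n[0], threshold) for n in neighbors):
                header = page[0]
            if any(similar(page[-1], n[-1], threshold) for n in neighbors):
                footer = page[-1]
        results.append((header, footer))
    return results


pages = [
    ["Document Recognition 2003", "Intro text about scanning.", "Page 1"],
    ["Docurnent Recognition 2003", "Completely different body.", "Page 2"],
    ["Document Recognition 2003", "Yet another paragraph here.", "Page 3"],
]
results = find_headers_footers(pages)
```

In this toy input the second page's running title contains a simulated OCR error ("Docurnent"), yet it is still associated with its neighbors, and the page numbers are matched despite differing digits. A fuller implementation would likely need to consider more than one candidate line per page and handle alternating left and right page layouts, which this sketch does not attempt.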
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xiaofan Lin, "Header and footer extraction by page association", Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); https://doi.org/10.1117/12.472833
KEYWORDS
Optical character recognition
Computing systems
Analytical research
Image processing software
Raster graphics
Statistical analysis
Physical phenomena