Abstract
Detecting and identifying tables in document images is crucial to any document image analysis and digital library system. In this paper we report a very simple but effective approach to detecting tables in document pages. The algorithm relies on the observation that tables have distinct columns, which implies that the gaps between fields are substantially larger than the gaps between words in ordinary text lines. This deceptively simple observation leads to a table detection system with low computational cost. Moreover, we establish a mathematical foundation for the approach, including the formation of a regular expression that eases implementation.
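The core observation (column gaps are much wider than word gaps) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes each text line has already been segmented into word extents `(x_start, x_end)`, uses the median gap on the page as a stand-in for the typical word gap, and flags a gap as a column gap when it exceeds a hypothetical multiple `factor` of that median. Each line is then encoded as `T` (contains a column gap) or `P` (plain text), and a hypothetical regular expression `T{min_rows,}` picks out runs of consecutive table-like lines. The names `encode_lines`, `find_tables`, `factor`, and `min_rows` are illustrative only; the regular expression derived in the paper may differ.

```python
import re
from statistics import median


def gaps(line):
    """Horizontal gaps between consecutive word extents on one line."""
    return [b[0] - a[1] for a, b in zip(line, line[1:])]


def encode_lines(lines, factor=3.0):
    """Encode each line as 'T' if it has a gap much wider than the typical
    word gap, else 'P'. Assumption: the page-wide median gap approximates
    the ordinary inter-word gap."""
    all_gaps = [g for line in lines for g in gaps(line)]
    if not all_gaps:
        return "P" * len(lines)
    threshold = factor * median(all_gaps)
    return "".join(
        "T" if any(g > threshold for g in gaps(line)) else "P"
        for line in lines
    )


def find_tables(lines, factor=3.0, min_rows=2):
    """Return (first_line, last_line) index pairs for runs of at least
    `min_rows` consecutive 'T' lines (a hypothetical table pattern)."""
    code = encode_lines(lines, factor)
    pattern = re.compile(f"T{{{min_rows},}}")  # e.g. "T{2,}"
    return [(m.start(), m.end() - 1) for m in pattern.finditer(code)]


# Toy page: word extents per line (hypothetical pixel coordinates).
page = [
    [(0, 40), (45, 90), (95, 130), (135, 180)],  # prose: gaps 5, 5, 5
    [(0, 30), (35, 80), (86, 120)],              # prose: gaps 5, 6
    [(0, 30), (80, 110), (160, 200)],            # table row: gaps 50, 50
    [(0, 25), (80, 105), (160, 190)],            # table row: gaps 55, 55
    [(0, 28), (80, 100), (160, 195)],            # table row: gaps 52, 60
    [(0, 50), (55, 100), (104, 150)],            # prose: gaps 5, 4
]

print(encode_lines(page))  # PPTTTP
print(find_tables(page))   # [(2, 4)]
```

Here the median gap is 6, so any gap wider than 18 counts as a column gap, and lines 2 to 4 form one detected table. Encoding lines as symbols keeps the detection step a plain pattern match, which is cheap; a real system would also need word segmentation, skew correction, and a more careful estimate of the word gap.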
Cite this article
Mandal, S., Chowdhury, S.P., Das, A.K. et al. A simple and effective table detection system from document images. IJDAR 8, 172–182 (2006). https://doi.org/10.1007/s10032-005-0006-5