A simple and effective table detection system from document images

  • Regular Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

Detecting and identifying tables in document images is crucial to any document image analysis or digital library system. In this paper we report a very simple but powerful approach to detecting tables in document pages. The algorithm relies on the observation that tables have distinct columns, which implies that the gaps between fields are substantially larger than the gaps between words in text lines. This deceptively simple observation has led to a table detection system with low computational cost. Moreover, the mathematical foundation of the approach is established, including the formulation of a regular expression for ease of implementation.
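The gap observation in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or their regular expression, neither of which is given on this page. It assumes that each text line is already available as a list of horizontal word spans, that a single pixel threshold separates word gaps from column gaps, and that a line counts as table-like when it contains at least one wide gap. The names `encode_line`, `is_table_like`, `TABLE_LINE`, and `gap_threshold` are hypothetical.

```python
import re

# Hypothetical pattern (an assumption, not the paper's expression):
# a field is one or more words joined by narrow gaps ("g"), and a
# table-like line is two or more fields joined by wide gaps ("G").
TABLE_LINE = re.compile(r"w(?:gw)*(?:Gw(?:gw)*)+")


def encode_line(word_spans, gap_threshold):
    """Encode a text line as a string of 'w' (word), 'g' (narrow gap), 'G' (wide gap).

    word_spans: list of (x_start, x_end) pixel spans, sorted left to right.
    gap_threshold: assumed cut-off; gaps at least this wide count as column gaps.
    """
    symbols = []
    for i, (x_start, _x_end) in enumerate(word_spans):
        if i > 0:
            gap = x_start - word_spans[i - 1][1]
            symbols.append("G" if gap >= gap_threshold else "g")
        symbols.append("w")
    return "".join(symbols)


def is_table_like(word_spans, gap_threshold):
    """Return True when the whole encoded line matches the hypothetical pattern."""
    return TABLE_LINE.fullmatch(encode_line(word_spans, gap_threshold)) is not None


# Illustrative spans only: an ordinary text line and a three-column row.
text_line = [(0, 40), (48, 90), (98, 150), (158, 200)]
table_line = [(0, 40), (120, 160), (168, 200), (300, 340)]

print(encode_line(text_line, 30), is_table_like(text_line, 30))
print(encode_line(table_line, 30), is_table_like(table_line, 30))
```

Encoding each line as a short string keeps the table test to a single pattern match, which fits the low computational cost the abstract emphasises. A real system would also need to choose the threshold from the page itself and to check that wide gaps align across consecutive lines.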

Author information

Corresponding author

Correspondence to S. Mandal.

About this article

Cite this article

Mandal, S., Chowdhury, S.P., Das, A.K. et al. A simple and effective table detection system from document images. IJDAR 8, 172–182 (2006). https://doi.org/10.1007/s10032-005-0006-5
