Image-Based Table Recognition: Data, Model, and Evaluation

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

Important information that relates to a specific topic in a document is often organized in tabular format to assist readers with information retrieval and comparison, which may be difficult to provide in natural language. However, tabular data in unstructured digital documents, e.g. Portable Document Format (PDF) and images, are difficult to parse into a structured machine-readable format, due to the complexity and diversity of their structure and style. To facilitate image-based table recognition with deep learning, we develop and release the largest publicly available table recognition dataset, PubTabNet (https://github.com/ibm-aur-nlp/PubTabNet), containing 568k table images with corresponding structured HTML representations. PubTabNet is automatically generated by matching the XML and PDF representations of the scientific articles in the PubMed Central Open Access Subset (PMCOA). We also propose a novel attention-based encoder-dual-decoder (EDD) architecture that converts images of tables into HTML code. The model has a structure decoder which reconstructs the table structure and helps the cell decoder to recognize cell content. In addition, we propose a new Tree-Edit-Distance-based Similarity (TEDS) metric for table recognition, which captures multi-hop cell misalignment and OCR errors more appropriately than the pre-established metric. The experiments demonstrate that the EDD model can accurately recognize complex tables relying solely on the image representation, outperforming the state of the art by 9.7% absolute TEDS score.
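
As a concrete illustration of the TEDS metric introduced above, the sketch below (not the released PubTabNet evaluation code) treats two HTML tables as trees and computes TEDS(Ta, Tb) = 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|). It assumes the third-party apted package (an implementation of the APTED tree edit distance of [25], pip install apted) and uses unit edit costs; the full metric in the paper additionally uses a normalized Levenshtein cost when substituting cell-content nodes, which is omitted here. Helper names such as html_to_tree and teds are our own.

```python
from html.parser import HTMLParser

from apted import APTED, Config
from apted.helpers import Tree


class TableTreeBuilder(HTMLParser):
    """Serialises a <table>...</table> string into APTED's bracket notation."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append("{" + tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Cell text becomes a leaf node, so OCR errors cost a rename.
            self.tokens.append("{" + text.replace("{", "").replace("}", "") + "}")

    def handle_endtag(self, tag):
        self.tokens.append("}")


def html_to_tree(table_html):
    builder = TableTreeBuilder()
    builder.feed(table_html)
    return Tree.from_text("".join(builder.tokens))


def tree_size(node):
    return 1 + sum(tree_size(child) for child in node.children)


def teds(pred_html, true_html):
    """1 minus the tree edit distance, normalised by the larger tree size."""
    t_pred, t_true = html_to_tree(pred_html), html_to_tree(true_html)
    distance = APTED(t_pred, t_true, Config()).compute_edit_distance()
    return 1.0 - distance / max(tree_size(t_pred), tree_size(t_true))


if __name__ == "__main__":
    truth = "<table><tr><td>cats</td><td>12</td></tr></table>"
    pred = "<table><tr><td>cats</td><td>13</td></tr></table>"
    print(round(teds(pred, truth), 3))  # one renamed leaf out of 6 nodes -> 0.833
```

In this toy example the only difference is a single renamed cell-content leaf among six nodes, so the score is 1 - 1/6 ≈ 0.83; a prediction that shifts cells across rows or columns also pays for the structural edits, which is what makes TEDS sensitive to multi-hop misalignment as well as OCR errors.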

Notes

  1. https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.

  2. https://github.com/Academic-Hammer/SciTSR.

  3. Both cells are identical and the direction matches.

  4. If the number of rows is greater than the number of columns, we shift the cells in the first column rightwards instead.

  5. v1.0.4 (https://github.com/tabulapdf/tabula-java).

  6. v1.0 (https://github.com/thoqbk/traprange).

  7. v0.7.3 (https://github.com/camelot-dev/camelot).

  8. v0.6.0-alpha (https://github.com/jsvine/pdfplumber).

  9. v2019.012.20040.

  10. WYGIWYS is trained on the same samples as EDD with truncated back-propagation through time (200 steps). WYGIWYS and EDD use the same CNN in the encoder to rule out the possibility that the performance gain of EDD is due to differences in the CNN (a minimal sketch of the EDD decoding loop is given after these notes).

  11. Tables without multi-column or multi-row cells.

  12. https://github.com/hassan-mahmood/TIES_DataGeneration.

  13. [26] does not describe how the adjacency relations can be converted to a unique HTML representation.
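
The notes above contrast EDD with the single-decoder WYGIWYS baseline. The sketch below is a minimal PyTorch illustration of the encoder-dual-decoder control flow described in the abstract, not the authors' implementation: the CNN is a toy stand-in for the ResNet encoder, the cell decoder is triggered at the batch level rather than per sample, and a single shared cell-token sequence replaces the per-cell sequences used in the paper. All class and parameter names (EDDSketch, AdditiveAttention, td_id, etc.) are illustrative.

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Soft attention over the grid of CNN features, in the spirit of [38]."""

    def __init__(self, feat_dim, hid_dim, att_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, att_dim)
        self.hid_proj = nn.Linear(hid_dim, att_dim)
        self.score = nn.Linear(att_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, N, feat_dim), hidden: (B, hid_dim) -> context: (B, feat_dim)
        scores = self.score(torch.tanh(self.feat_proj(feats) + self.hid_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)
        return (alpha * feats).sum(dim=1)


class EDDSketch(nn.Module):
    """Toy encoder-dual-decoder: one decoder emits structural HTML tokens and,
    whenever it emits a '<td>' token, hands its hidden state to a second decoder
    that generates that cell's content (teacher-forced here for brevity)."""

    def __init__(self, struct_vocab, cell_vocab, feat_dim=256, hid_dim=256, emb_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in for the ResNet feature extractor
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((14, 14)))
        self.struct_emb = nn.Embedding(struct_vocab, emb_dim)
        self.cell_emb = nn.Embedding(cell_vocab, emb_dim)
        self.struct_att = AdditiveAttention(feat_dim, hid_dim, hid_dim)
        self.cell_att = AdditiveAttention(feat_dim, hid_dim, hid_dim)
        self.struct_rnn = nn.LSTMCell(emb_dim + feat_dim, hid_dim)
        self.cell_rnn = nn.LSTMCell(emb_dim + feat_dim, hid_dim)
        self.struct_out = nn.Linear(hid_dim, struct_vocab)
        self.cell_out = nn.Linear(hid_dim, cell_vocab)

    def forward(self, image, struct_tokens, cell_tokens, td_id):
        feats = self.encoder(image).flatten(2).transpose(1, 2)   # (B, N, feat_dim)
        h = c = feats.new_zeros(image.size(0), self.struct_rnn.hidden_size)
        struct_logits, cell_logits = [], []
        for t in range(struct_tokens.size(1)):                   # structure decoder
            ctx = self.struct_att(feats, h)
            h, c = self.struct_rnn(torch.cat([self.struct_emb(struct_tokens[:, t]), ctx], 1), (h, c))
            struct_logits.append(self.struct_out(h))
            # The real EDD triggers per sample and per non-empty cell, with one
            # token sequence per cell; a shared sequence keeps this sketch short.
            if (struct_tokens[:, t] == td_id).any():
                hc = cc = h                                      # cell decoder starts from the structure state
                for s in range(cell_tokens.size(1)):
                    ctx_c = self.cell_att(feats, hc)
                    hc, cc = self.cell_rnn(torch.cat([self.cell_emb(cell_tokens[:, s]), ctx_c], 1), (hc, cc))
                    cell_logits.append(self.cell_out(hc))
        return torch.stack(struct_logits, 1), cell_logits


if __name__ == "__main__":
    model = EDDSketch(struct_vocab=10, cell_vocab=50)
    s_logits, c_logits = model(torch.randn(2, 3, 224, 224),
                               torch.randint(0, 10, (2, 6)),
                               torch.randint(0, 50, (2, 4)),
                               td_id=3)
    print(s_logits.shape, len(c_logits))                         # (2, 6, 10) and a list of cell-step logits
```

In the paper, the cell decoder is invoked once for every non-empty cell with its own target token sequence, and both decoders attend over the same encoder features; the single shared loop here only keeps the structure-decoder-triggers-cell-decoder pattern visible.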

References

  1. Cesarini, F., Marinai, S., Sarti, L., Soda, G.: Trainable table location in document images. In: Object Recognition Supported by User Interaction for Service Robots, vol. 3, pp. 236–240. IEEE (2002)

  2. Deng, Y., Kanervisto, A., Ling, J., Rush, A.M.: Image-to-markup generation with coarse-to-fine attention. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 980–989 (2017). JMLR.org

  3. Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific table recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 894–901. IEEE, September 2019. https://doi.org/10.1109/ICDAR.2019.00166

  4. Fan, M., Kim, D.S.: Table region detection on large-scale PDF files without labeled data. CoRR, abs/1506.08891 (2015)

  5. Fang, J., Tao, X., Tang, Z., Qiu, R., Liu, Y.: Dataset, ground-truth and performance metrics for table detection evaluation. In: 2012 10th IAPR International Workshop on Document Analysis Systems, pp. 445–449. IEEE (2012)

  6. Gao, L., et al.: ICDAR 2019 competition on table detection and recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1510–1515. IEEE, September 2019. https://doi.org/10.1109/ICDAR.2019.00166

  7. Gatos, B., Danatsas, D., Pratikakis, I., Perantonis, S.J.: Automatic table detection in document images. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds.) ICAPR 2005. LNCS, vol. 3686, pp. 609–618. Springer, Heidelberg (2005). https://doi.org/10.1007/11551188_67

  8. Gilani, A., Qasim, S.R., Malik, I., Shafait, F.: Table detection using deep learning. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 1, pp. 771–776. IEEE (2017)

  9. Göbel, M., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1449–1453. IEEE (2013)

  10. Hao, L., Gao, L., Yi, X., Tang, Z.: A table detection method for PDF documents based on convolutional neural networks. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pp. 287–292. IEEE (2016)

  11. He, D., Cohen, S., Price, B., Kifer, D., Giles, C.L.: Multi-scale multi-task FCN for semantic page segmentation and table detection. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 254–261. IEEE (2017)

  12. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

  13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  14. Hirayama, Y.: A method for table structure analysis using DP matching. In: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 2, pp. 583–586. IEEE (1995)

  15. Hu, J., Kashi, R.S., Lopresti, D.P., Wilfong, G.: Medium-independent table detection. In: Document Recognition and Retrieval VII, vol. 3967, pp. 291–302. International Society for Optics and Photonics (1999)

  16. Hurst, M.: A Constraint-based Approach to Table Structure Derivation (2003)

  17. Jimeno Yepes, A., Verspoor, K.: Literature mining of genetic variants for curation: quantifying the importance of supplementary material. Database 2014 (2014)

  18. Kasar, T., Barlas, P., Adam, S., Chatelain, C., Paquet, T.: Learning to detect tables in scanned document images using line information. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1185–1189. IEEE (2013)

  19. Kavasidis, I., et al.: A saliency-based convolutional neural network for table and chart detection in digitized documents. In: International Conference on Image Analysis and Processing, pp. 292–302. Springer (2019)

  20. Kieninger, T., Dengel, A.: The t-recs table recognition and analysis system. In: Lee, S.-W., Nakano, Y. (eds.) DAS 1998. LNCS, vol. 1655, pp. 255–270. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48172-9_21

  21. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of International Conference on Learning Representations (ICLR) (2015)

  22. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys. Doklady 10, 707–710 (1966)

  23. Morais, R., Le, V., Tran, T., Saha, B., Mansour, M., Venkatesh, S.: Learning regularity in skeleton trajectories for anomaly detection in videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11996–12004 (2019)

  24. Paliwal, S.S., Vishwanath, D., Rahul, R., Sharma, M., Vig, L.: TableNet: deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 128–133. IEEE (2019)

  25. Pawlik, M., Augsten, N.: Tree edit distance: robust and memory-efficient. Inf. Syst. 56, 157–173 (2016)

  26. Qasim, S.R., Mahmood, H., Shafait, F.: Rethinking table recognition using graph neural networks. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 142–147. IEEE, September 2019. https://doi.org/10.1109/ICDAR.2019.00166

  27. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  28. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  29. Riba, P., Dutta, A., Goldmann, L., Fornés, A., Ramos, O., Lladós, J.: Table detection in invoice documents by graph neural networks. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 122–127. IEEE, September 2019. https://doi.org/10.1109/ICDAR.2019.00028

  30. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: deep learning for detection and structure recognition of tables in document images. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1162–1167. IEEE (2017)

  31. Shafait, F., Smith, R.: Table detection in heterogeneous documents. In: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, pp. 65–72. ACM (2010)

  32. Shahab, A., Shafait, F., Kieninger, T., Dengel, A.: An open approach towards the benchmarking of table structure recognition systems. In: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, pp. 113–120. ACM (2010)

  33. Siegel, N., Lourie, N., Power, R., Ammar, W.: Extracting scientific figures with distantly supervised neural networks. In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, pp. 223–232. ACM (2018)

  34. e Silva, A.C.: Learning rich hidden Markov models in document analysis: table location. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 843–847. IEEE (2009)

  35. Staar, P.W., Dolfi, M., Auer, C., Bekas, C.: Corpus conversion service: a machine learning platform to ingest documents at scale. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 774–782. ACM (2018)

  36. Tensmeyer, C., Morariu, V.I., Price, B., Cohen, S., Martinez, T.: Deep splitting and merging for table structure decomposition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 114–121. IEEE (2019)

  37. Tupaj, S., Shi, Z., Chang, C.H., Alam, H.: Extracting Tabular Information from Text Files. EECS Department, Tufts University (1996)

  38. Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning, pp. 2048–2057 (2015)

  39. Zhong, X., Tang, J., Yepes, A.J.: PubLayNet: largest dataset ever for document layout analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1015–1022. IEEE, September 2019. https://doi.org/10.1109/ICDAR.2019.00166

  40. Zhou, Y.F., Jiang, R.H., Wu, X., He, J.Y., Weng, S., Peng, Q.: BranchGAN: unsupervised mutual image-to-image transfer with a single encoder and dual decoders. IEEE Transactions on Multimedia (2019)

Author information

Corresponding author

Correspondence to Xu Zhong.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 979 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhong, X., ShafieiBavani, E., Jimeno Yepes, A. (2020). Image-Based Table Recognition: Data, Model, and Evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58589-1_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58588-4

  • Online ISBN: 978-3-030-58589-1

  • eBook Packages: Computer Science, Computer Science (R0)
