Parsing Digitized Vietnamese Paper Documents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13052)

Abstract

In recent years, the need to exploit digitized document data has been increasing. In this paper, we address the problem of parsing digitized Vietnamese paper documents. Digitized Vietnamese documents are mainly scanned images with diverse layouts and special characters, which introduces many challenges. To this end, we first collect the UIT-DODV dataset, a novel Vietnamese document image dataset that includes scientific papers in Vietnamese drawn from different scientific conferences. We compile both images converted from PDF and images scanned by a smartphone in addition to a physical scanner, which poses many new challenges. Additionally, we leverage a state-of-the-art object detector along with a fused loss function to efficiently parse the Vietnamese paper documents. Extensive experiments conducted on the UIT-DODV dataset provide a comprehensive evaluation and insightful analysis.

Notes

  1. UIT-DODV published at https://uit-together.github.io/datasets/.

  2. https://github.com/DevashishPrasad/CascadeTabNet.

Acknowledgment

The research team would like to express our sincere thanks to the Multimedia Communications Laboratory (MMLab), University of Information Technology, VNU-HCM, for supporting this research. We also thank the Can Tho University Journal of Science for its assistance in the data collection. This project is partially funded by the National Science Foundation (NSF) under Grant No. 2025234 and by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number DSC2021-26-03.

Author information

Corresponding authors

Correspondence to Linh Truong Dieu, Thuan Trong Nguyen, Nguyen D. Vo, Tam V. Nguyen or Khang Nguyen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Dieu, L.T., Nguyen, T.T., Vo, N.D., Nguyen, T.V., Nguyen, K. (2021). Parsing Digitized Vietnamese Paper Documents. In: Tsapatsoulis, N., Panayides, A., Theocharides, T., Lanitis, A., Pattichis, C., Vento, M. (eds) Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science, vol 13052. Springer, Cham. https://doi.org/10.1007/978-3-030-89128-2_37

  • DOI: https://doi.org/10.1007/978-3-030-89128-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89127-5

  • Online ISBN: 978-3-030-89128-2

  • eBook Packages: Computer Science, Computer Science (R0)
