Structure Recognition Method of Invoice Document Image for Document Processing Automation

  • Dong-Seok Lee (AI Grand ICT Research Center, Dong-eui University) ;
  • Soon-Kak Kwon (Department of Computer Software Engineering, Dong-eui University)
  • Received : 2023.03.27
  • Accepted : 2023.04.25
  • Published : 2023.04.30

Abstract

In this paper, we propose a document structure recognition method for applying document processing automation to invoice document images, together with a method for outputting the recognition result as a spreadsheet document. A deep-learning optical character recognition (OCR) engine detects the word blocks in the document and returns the text and location of each block. Word blocks that lie in the same row or the same column are found from their coordinates. The document area is then divided into regions using the arrangement of the word blocks. Based on the document structure obtained from these regions, each character recognition result is entered at the corresponding position in the spreadsheet. In the simulation results, item placement with the proposed method achieves an average accuracy of 92.30%.
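
The abstract gives the pipeline only in outline, so the following is a minimal sketch of the row/column grouping and cell placement steps, not the authors' implementation. It assumes each OCR word block comes with an axis-aligned bounding box, and it treats two blocks as sharing a row (or column) when their vertical (or horizontal) extents overlap. That overlap rule, the `WordBlock` type, and the function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBlock:
    """Hypothetical OCR output: recognized text plus a bounding box."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def group_by_axis(blocks, axis):
    """Cluster blocks whose extents overlap along one axis.

    axis="row" compares vertical extents (y .. y+h);
    axis="col" compares horizontal extents (x .. x+w).
    This is plain interval merging; the overlap criterion is an assumption.
    """
    def span(b):
        return (b.y, b.y + b.h) if axis == "row" else (b.x, b.x + b.w)

    groups = []  # each entry: [start, end, [blocks]]
    for b in sorted(blocks, key=lambda blk: span(blk)[0]):
        start, end = span(b)
        if groups and start < groups[-1][1]:
            # Overlaps the current group: extend it.
            groups[-1][1] = max(groups[-1][1], end)
            groups[-1][2].append(b)
        else:
            groups.append([start, end, [b]])
    return [g[2] for g in groups]


def to_grid(blocks):
    """Place each block's text into a row-by-column grid of strings."""
    rows = group_by_axis(blocks, "row")
    cols = group_by_axis(blocks, "col")
    row_of = {b: i for i, g in enumerate(rows) for b in g}
    col_of = {b: j for j, g in enumerate(cols) for b in g}

    grid = [["" for _ in cols] for _ in rows]
    # Visit blocks left to right within each row so that words sharing
    # a cell are joined in reading order.
    for b in sorted(blocks, key=lambda blk: (row_of[blk], blk.x)):
        r, c = row_of[b], col_of[b]
        grid[r][c] = (grid[r][c] + " " + b.text).strip()
    return grid


if __name__ == "__main__":
    sample = [
        WordBlock("Item", 10, 10, 40, 12),
        WordBlock("Qty", 120, 11, 30, 12),
        WordBlock("Pen", 10, 30, 35, 12),
        WordBlock("2", 125, 31, 10, 12),
    ]
    for row in to_grid(sample):
        print(row)
```

For the four sample blocks, the grid comes out as `[['Item', 'Qty'], ['Pen', '2']]`. A grid like this can be handed to Python's standard `csv.writer(...).writerows(grid)` to produce a file that spreadsheet software opens directly. Real invoices would also need handling for skewed scans and for cells that span several columns, which this sketch does not attempt.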

Acknowledgement

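This paper was supported by the BB21+ Project in 2022, and was also carried out as a research result of the Regional Intelligence Innovation Talent Development (Grand ICT Research Center) program of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2023-2020-0-01791).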
