Fast End-to-End Deep Learning Identity Document Detection, Classification and Cropping

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12824)

Abstract

The growing use of Know Your Customer online services generates a massive flow of dematerialised personal Identity Documents (IDs) captured under variable conditions and qualities (e.g. webcam, smartphone, scan, or even handcrafted PDFs). IDs are designed, depending on their issuing country and model, with a specific layout (i.e. background, photo(s), fixed/variable text fields) along with various anti-fraud features (e.g. checksums, Optical Variable Devices) which are non-trivial to analyse. This paper tackles the problem of detecting, classifying, and aligning captured documents onto their reference model, a task that is essential to document reading and fraud verification. However, due to the high variation in capture conditions and model layouts, classical handcrafted approaches require deep knowledge of the documents and are hence hard to maintain. This work proposes a modular, multi-stage approach based entirely on deep learning. It accurately classifies the document and estimates its quadrilateral (localization). As opposed to approaches relying on a single end-to-end network, the proposed modular framework offers more flexibility and a potential for future incremental learning. All networks used in this work are derivatives of recent state-of-the-art ones. Experiments on both the MIDV-500 academic dataset and an industrial dataset show that, compared to handcrafted solutions, the proposed approach is faster while maintaining good accuracy.
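To make the notion of aligning a localized document onto its reference model concrete, the sketch below shows only the final geometric step: given a quadrilateral (four corners, however they were estimated), it computes the homography that maps the quadrilateral onto an upright reference rectangle. It is not the networks described in the paper. The function names, the corner order (top-left, top-right, bottom-right, bottom-left) and the reference size are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral (singular system)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_quad(quad: Sequence[Point], width: float, height: float) -> Matrix:
    """Homography sending quad corners (TL, TR, BR, BL) to a width x height rectangle.

    Assumes the bottom-right matrix entry can be fixed to 1, which holds for
    ordinary captures where the document is fully visible.
    """
    ref = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    a, b = [], []
    for (x, y), (u, v) in zip(quad, ref):
        # Two linear equations per corner correspondence.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(hm: Matrix, p: Point) -> Point:
    """Apply the homography to one image point."""
    x, y = p
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    u = (hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w
    v = (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w
    return (u, v)


if __name__ == "__main__":
    # Hypothetical detected corners in image pixels and a hypothetical reference size.
    detected = [(120.0, 80.0), (520.0, 110.0), (500.0, 370.0), (100.0, 330.0)]
    hm = homography_from_quad(detected, 856.0, 540.0)
    for corner in detected:
        print(corner, "->", project(hm, corner))
```

In practice the resulting matrix would be handed to an image-warping routine to produce the cropped, upright document. The sketch stops at the matrix so that it stays dependency-free.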

Notes

  1. Know Your Customer [16]: a set of laws, certifications and regulations preventing criminals from either impersonating other people or forging false IDs.

  2. References for local features are not exhaustively provided and can be found in [6].

References

  1. Abbas, S.A., ul Hussain, S.: Recovering homography from camera captured documents using convolutional neural networks. arXiv preprint arXiv:1709.03524 (2017)

  2. Arlazarov, V.V., et al.: MIDV-500: a dataset for identity documents analysis and recognition on mobile devices in video stream. CoRR (2018)

  3. Attivissimo, F., et al.: An automatic reader of identity documents. In: Systems, Man and Cybernetics (SMC). IEEE (2019)

  4. Awal, A.M., et al.: Complex document classification and localization application on identity document images. In: 14th IAPR International Conference on Document Analysis and Recognition, pp. 426–431 (2017)

  5. Bandyopadhyay, H., et al.: A gated and bifurcated stacked U-Net module for document image dewarping (2020). arXiv:2007.09824 [cs.CV]

  6. Bojanić, D., et al.: On the comparison of classic and deep keypoint detector and descriptor methods. In: 11th International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 64–69. IEEE (2019)

  7. Bulatov, K., et al.: MIDV-2019: challenges of the modern mobile based document OCR. In: ICMV 2019, vol. 11433 (2020)

  8. Burie, J.-C., et al.: ICDAR2015 competition on smartphone document capture and OCR (SmartDoc). In: 13th International Conference on Document Analysis and Recognition, pp. 1161–1165. IEEE (2015)

  9. Castelblanco, A., Solano, J., Lopez, C., Rivera, E., Tengana, L., Ochoa, M.: Machine learning techniques for identity document verification in uncontrolled environments: a case study. In: Figueroa Mora, K.M., Anzurez Marín, J., Cerda, J., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera-López, J.A. (eds.) MCPR 2020. LNCS, vol. 12088, pp. 271–281. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49076-8_26

  10. DeTone, D., et al.: Deep image homography estimation. arXiv preprint arXiv:1606.03798 (2016)

  11. DeTone, D., et al.: Superpoint: self-supervised interest point detection and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 224–236 (2018)

  12. Chiron, G., et al.: ID documents matching and localization with multi-hypothesis constraints. In: 25th International Conference on Pattern Recognition (ICPR). IEEE (2020)

  13. Ilg, E., et al.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2462–2470 (2017)

  14. Javed, K., Shafait, F.: Real-time document localization in natural images by recursive application of a CNN. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 105–110. IEEE (2017)

  15. das Neves Junior, R.B., et al.: A fast fully octave convolutional neural network for document image segmentation. arXiv preprint arXiv:2004.01317 (2020)

  16. Mullins, R.R., et al.: Know your customer: how salesperson perceptions of customer relationship quality form and influence account profitability. J. Mark. 78(6), 38–58 (2014)

  17. Nguyen, T., et al.: Unsupervised deep homography: a fast and robust homography estimation model. IEEE Rob. Autom. Lett. 3(3), 2346–2353 (2018)

  18. Puybareau, É., Géraud, T.: Real-time document detection in smartphone videos. In: 25th IEEE International Conference on Image Processing, pp. 1498–1502 (2018)

  19. Raguram, R., et al.: USAC: a universal framework for random sample consensus. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 2022–2038 (2012)

  20. Sarlin, P.-E., et al.: Superglue: learning feature matching with graph neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4938–4947 (2020)

  21. Shen, X., et al.: RANSAC-flow: generic two-stage image alignment. arXiv preprint arXiv:2004.01526 (2020)

  22. Sheshkus, A., et al.: Houghencoder: neural network architecture for document image semantic segmentation. In: IEEE International Conference on Image Processing (ICIP), pp. 1946–1950 (2020)

  23. Simon, M., et al.: Fine-grained classification of identity document types with only one example. In: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), pp. 126–129. IEEE (2015)

  24. Skoryukina, N., et al.: Fast method of ID documents location and type identification for mobile and server application. In: International Conference on Document Analysis and Recognition, pp. 850–857 (2019)

  25. Tan, M., et al.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)

  26. Tan, M., et al.: MnasNet: platform-aware neural architecture search for mobile. In: IEEE/CVPR, pp. 2820–2828 (2019)

  27. Tropin, D.V., et al.: Approach for document detection by contours and contrasts. arXiv preprint arXiv:2008.02615 (2020)

  28. Truong, P., et al.: GLU-Net: global-local universal network for dense flow and correspondences. In: IEEE/CVPR (2020)

  29. Viet, H.T., et al.: A robust end-to-end information extraction system for Vietnamese identity cards. In: NAFOSTED (2019)

  30. Zhang, J., et al.: Content-aware unsupervised deep homography estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 653–669. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_38

  31. Zhou, Q., Li, X.: STN-homography: estimate homography parameters directly. arXiv preprint arXiv:1906.02539 (2019)

  32. Zhu, A., Zhang, C., Li, Z., Xiong, S.: Coarse-to-fine document localization in natural scene image with regional attention and recursive corner refinement. Int. J. Doc. Anal. Recogn. (IJDAR) 22(3), 351–360 (2019). https://doi.org/10.1007/s10032-019-00341-0

Corresponding author

Correspondence to Guillaume Chiron.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Chiron, G., Arrestier, F., Awal, A.M. (2021). Fast End-to-End Deep Learning Identity Document Detection, Classification and Cropping. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_23

  • DOI: https://doi.org/10.1007/978-3-030-86337-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86336-4

  • Online ISBN: 978-3-030-86337-1

  • eBook Packages: Computer Science, Computer Science (R0)
