Fast End-to-End Deep Learning Identity Document Detection, Classification and Cropping

Chiron, Guillaume; Arrestier, Florian; Awal, Ahmad Montaser

doi:10.1007/978-3-030-86337-1_23


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12824)

Included in the following conference series:

International Conference on Document Analysis and Recognition


Abstract

The growing use of online Know Your Customer (KYC) services generates a massive flow of dematerialised personal Identity Documents (IDs) captured under variable conditions and qualities (e.g. webcam, smartphone, scan, or even handcrafted PDFs). Depending on their issuing country and model, IDs are designed with a specific layout (i.e. background, photo(s), fixed and variable text fields) along with various anti-fraud features (e.g. checksums, Optically Variable Devices) that are non-trivial to analyse. This paper tackles the problem of detecting, classifying, and aligning captured documents onto their reference model, a task that is essential to document reading and fraud verification. Because of the high variation in capture conditions and model layouts, classical handcrafted approaches require deep knowledge of the documents and are therefore hard to maintain. This work proposes a modular, multi-stage approach built entirely on deep learning, which accurately classifies the document and estimates its quadrilateral (localization). As opposed to approaches relying on a single end-to-end network, the proposed modular framework offers more flexibility and potential for future incremental learning. All networks used in this work are derived from recent state-of-the-art architectures. Experiments on both the MIDV-500 academic dataset and an industrial dataset show that the proposed approach is faster than handcrafted solutions while maintaining good accuracy.
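Geometrically, aligning a localized document onto its reference model reduces to a planar homography fixed by the four estimated corners of its quadrilateral. The NumPy sketch below illustrates only that final cropping step, under stated assumptions: corners are ordered top-left, top-right, bottom-right, bottom-left; the 856 × 540 px reference frame (close to the ISO/IEC 7810 ID-1 card aspect ratio) is a hypothetical choice; and `homography_from_quad`, `apply_homography` and `warp_to_reference` are illustrative helpers, not the authors' code. In the paper, the corner estimates themselves come from the deep networks.

```python
import numpy as np


def homography_from_quad(src, dst):
    """Return the 3x3 homography H mapping 4 src points onto 4 dst points.

    Solves the 8x8 linear system obtained by fixing h33 = 1; assumes that
    no three of the four points are collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.zeros((8, 8))
    b = np.zeros(8)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        # u * (h31*x + h32*y + 1) = h11*x + h12*y + h13
        A[2 * i] = [x, y, 1, 0, 0, 0, -u * x, -u * y]
        b[2 * i] = u
        # v * (h31*x + h32*y + 1) = h21*x + h22*y + h23
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -v * x, -v * y]
        b[2 * i + 1] = v
    h = np.linalg.solve(A, b)
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, pts):
    """Map (N, 2) points through H, including the perspective division."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def warp_to_reference(image, H, out_w, out_h):
    """Crop the document by inverse-mapping every reference pixel into the
    captured image (nearest-neighbour sampling, clamped to the borders)."""
    H_inv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
    src = apply_homography(H_inv, grid)
    sx = np.clip(np.rint(src[:, 0]).astype(int), 0, image.shape[1] - 1)
    sy = np.clip(np.rint(src[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[sy, sx].reshape(out_h, out_w, *image.shape[2:])


# Hypothetical corner estimates (TL, TR, BR, BL) in a 640x480 capture.
quad = [(120, 80), (610, 110), (590, 420), (100, 380)]
ref = [(0, 0), (855, 0), (855, 539), (0, 539)]
H = homography_from_quad(quad, ref)
capture = np.random.default_rng(0).integers(0, 256, (480, 640, 3), dtype=np.uint8)
crop = warp_to_reference(capture, H, 856, 540)
print(crop.shape)  # (540, 856, 3)
```

Nearest-neighbour sampling keeps the sketch short; in practice, OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` perform the same two steps with proper interpolation.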


Notes

1. Know Your Customer [16]: a set of laws, certifications and regulations preventing criminals from either impersonating other people or forging false IDs.
2. References for local features are not exhaustively provided and can be found in [6].

References

[1] Abbas, S.A., ul Hussain, S.: Recovering homography from camera captured documents using convolutional neural networks. arXiv preprint arXiv:1709.03524 (2017)
[2] Arlazarov, V.V., et al.: MIDV-500: a dataset for identity documents analysis and recognition on mobile devices in video stream. CoRR (2018)
[3] Attivissimo, F., et al.: An automatic reader of identity documents. In: Systems, Man and Cybernetics (SMC). IEEE (2019)
[4] Awal, A.M., et al.: Complex document classification and localization application on identity document images. In: 14th IAPR International Conference on Document Analysis and Recognition, pp. 426–431 (2017)
[5] Bandyopadhyay, H., et al.: A gated and bifurcated stacked U-Net module for document image dewarping. arXiv preprint arXiv:2007.09824 (2020)
[6] Bojanić, D., et al.: On the comparison of classic and deep keypoint detector and descriptor methods. In: 11th International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 64–69. IEEE (2019)
[7] Bulatov, K., et al.: MIDV-2019: challenges of the modern mobile based document OCR. In: ICMV 2019, vol. 11433 (2020)
[8] Burie, J.-C., et al.: ICDAR2015 competition on smartphone document capture and OCR (SmartDoc). In: 13th International Conference on Document Analysis and Recognition, pp. 1161–1165. IEEE (2015)
[9] Castelblanco, A., Solano, J., Lopez, C., Rivera, E., Tengana, L., Ochoa, M.: Machine learning techniques for identity document verification in uncontrolled environments: a case study. In: Figueroa Mora, K.M., Anzurez Marín, J., Cerda, J., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera-López, J.A. (eds.) MCPR 2020. LNCS, vol. 12088, pp. 271–281. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49076-8_26
[10] DeTone, D., et al.: Deep image homography estimation. arXiv preprint arXiv:1606.03798 (2016)
[11] DeTone, D., et al.: SuperPoint: self-supervised interest point detection and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 224–236 (2018)
[12] Chiron, G., et al.: ID documents matching and localization with multi-hypothesis constraints. In: 25th International Conference on Pattern Recognition (ICPR). IEEE (2020)
[13] Ilg, E., et al.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2462–2470 (2017)
[14] Javed, K., Shafait, F.: Real-time document localization in natural images by recursive application of a CNN. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 105–110. IEEE (2017)
[15] das Neves Junior, R.B., et al.: A fast fully octave convolutional neural network for document image segmentation. arXiv preprint arXiv:2004.01317 (2020)
[16] Mullins, R.R., et al.: Know your customer: how salesperson perceptions of customer relationship quality form and influence account profitability. J. Mark. 78(6), 38–58 (2014)
[17] Nguyen, T., et al.: Unsupervised deep homography: a fast and robust homography estimation model. IEEE Rob. Autom. Lett. 3(3), 2346–2353 (2018)
[18] Puybareau, É., Géraud, T.: Real-time document detection in smartphone videos. In: 25th IEEE International Conference on Image Processing, pp. 1498–1502 (2018)
[19] Raguram, R., et al.: USAC: a universal framework for random sample consensus. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 2022–2038 (2012)
[20] Sarlin, P.-E., et al.: SuperGlue: learning feature matching with graph neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4938–4947 (2020)
[21] Shen, X., et al.: RANSAC-Flow: generic two-stage image alignment. arXiv preprint arXiv:2004.01526 (2020)
[22] Sheshkus, A., et al.: HoughEncoder: neural network architecture for document image semantic segmentation. In: IEEE International Conference on Image Processing (ICIP), pp. 1946–1950 (2020)
[23] Simon, M., et al.: Fine-grained classification of identity document types with only one example. In: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), pp. 126–129. IEEE (2015)
[24] Skoryukina, N., et al.: Fast method of ID documents location and type identification for mobile and server application. In: International Conference on Document Analysis and Recognition, pp. 850–857 (2019)
[25] Tan, M., et al.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
[26] Tan, M., et al.: MnasNet: platform-aware neural architecture search for mobile. In: IEEE/CVPR, pp. 2820–2828 (2019)
[27] Tropin, D.V., et al.: Approach for document detection by contours and contrasts. arXiv preprint arXiv:2008.02615 (2020)
[28] Truong, P., et al.: GLU-Net: global-local universal network for dense flow and correspondences. In: IEEE/CVPR (2020)
[29] Viet, H.T., et al.: A robust end-to-end information extraction system for Vietnamese identity cards. In: NAFOSTED (2019)
[30] Zhang, J., et al.: Content-aware unsupervised deep homography estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 653–669. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_38
[31] Zhou, Q., Li, X.: STN-Homography: estimate homography parameters directly. arXiv preprint arXiv:1906.02539 (2019)
[32] Zhu, A., Zhang, C., Li, Z., Xiong, S.: Coarse-to-fine document localization in natural scene image with regional attention and recursive corner refinement. Int. J. Doc. Anal. Recogn. (IJDAR) 22(3), 351–360 (2019). https://doi.org/10.1007/s10032-019-00341-0


Author information

Authors and Affiliations

Research Department AriadNEXT, Cesson-Sévigné, France
Guillaume Chiron, Florian Arrestier & Ahmad Montaser Awal


Corresponding author

Correspondence to Guillaume Chiron.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida


About this paper

Cite this paper

Chiron, G., Arrestier, F., Awal, A.M. (2021). Fast End-to-End Deep Learning Identity Document Detection, Classification and Cropping. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_23


DOI: https://doi.org/10.1007/978-3-030-86337-1_23
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86336-4
Online ISBN: 978-3-030-86337-1
eBook Packages: Computer Science, Computer Science (R0)


Societies and partnerships

The International Association for Pattern Recognition
