HCADecoder: A Hybrid CTC-Attention Decoder for Chinese Text Recognition

Cai, Siqi; Xue, Wenyuan; Li, Qingyong; Zhao, Peng

doi:10.1007/978-3-030-86334-0_12

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12823)

Included in the following conference series:

International Conference on Document Analysis and Recognition


Abstract

Text recognition has attracted much attention in recent years and achieved strong results on several commonly used public English datasets. However, most well-established methods, such as connectionist temporal classification (CTC)-based methods and attention-based methods, pay little attention to the challenges of Chinese text, especially long text sequences. In this paper, we exploit the characteristics of the Chinese word frequency distribution and propose a hybrid CTC-Attention decoder (HCADecoder) supervised with bigram mixture labels for Chinese text recognition. Specifically, we first add high-frequency bigram subwords to the original unigram labels to construct the bigram mixture labels, which shortens the decoding length. Then, in the decoding stage, the CTC module outputs a preliminary result in which confused predictions are replaced with bigram subwords. The attention module takes the preliminary result and outputs the final result. Experimental results on four Chinese datasets demonstrate the effectiveness of the proposed method for Chinese text recognition, especially for long texts. Code will be made publicly available at https://github.com/lukecsq/hybrid-CTC-Attention.
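To make the idea of bigram mixture labels concrete, the following is a minimal sketch and not the authors' implementation. It assumes that high-frequency bigrams are chosen by raw adjacent-pair counts over a small corpus and that a label is built by greedy left-to-right matching; the function names `top_bigrams` and `to_mixture_label`, the toy corpus, and the cutoff `k` are all hypothetical. The paper may select bigrams and segment labels differently.

```python
from collections import Counter


def top_bigrams(corpus, k):
    """Return the k most frequent adjacent character pairs in the corpus.

    Assumption: frequency is a raw count of adjacent pairs, which stands in
    for whatever word-frequency statistics the paper actually uses.
    """
    counts = Counter()
    for line in corpus:
        counts.update(line[i:i + 2] for i in range(len(line) - 1))
    return {bigram for bigram, _ in counts.most_common(k)}


def to_mixture_label(text, bigrams):
    """Greedy left-to-right segmentation (an assumed strategy).

    At each position, emit a two-character token if it is a known
    high-frequency bigram; otherwise emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in bigrams:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


# Toy corpus, purely illustrative.
corpus = ["我们在北京", "北京欢迎我们", "我们喜欢北京"]
vocab = top_bigrams(corpus, k=2)
label = to_mixture_label("我们在北京", vocab)
print(label)  # ['我们', '在', '北京']
```

In this toy case the five-character string becomes a three-token label, which illustrates how mixing bigram subwords into the unigram label set can shorten the sequence a decoder has to produce.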


Notes

1. One- and two-character words are defined by the Chinese semantics and have the same length as unigram and bigram words, respectively.
2. http://www.cncorpus.org/
3. https://dumps.wikimedia.org/zhwiki/latest/


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants U2034211 and 62006017, in part by the Fundamental Research Funds for the Central Universities under Grant 2020JBZD010, and in part by the Beijing Natural Science Foundation under Grant L191016.

Author information

Authors and Affiliations

Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
Siqi Cai, Wenyuan Xue, Qingyong Li & Peng Zhao


Corresponding author

Correspondence to Qingyong Li.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida


Cite this paper

Cai, S., Xue, W., Li, Q., Zhao, P. (2021). HCADecoder: A Hybrid CTC-Attention Decoder for Chinese Text Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_12


DOI: https://doi.org/10.1007/978-3-030-86334-0_12
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science, Computer Science (R0)
