Abstract
Sequence learning approaches require careful parameter tuning to succeed, and pre-trained sequence models consistently outperform randomly initialized ones. This work presents a sequence autoencoder (SAE)-based approach that pre-trains the decoder for sequence learning tasks, i.e., tasks whose output is a sequence. An SAE is first trained with the objective of reconstructing its input sequence; the weights of the pre-trained SAE are then used to initialize the decoder of a sequence model built on the encoder–decoder paradigm. The proposed pre-trained-decoder approach achieves superior performance compared to both pre-training the encoder alone and pre-training the encoder and decoder together. The behavior of the approach is examined under unsupervised pre-training, and the method is evaluated on neural machine translation and image caption generation. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed approach.
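The pre-training scheme described above can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration, not the paper's actual architecture or hyperparameters: the model class, dimensions, and training loop are illustrative assumptions. It shows the two steps the abstract names: (1) train a sequence autoencoder to reconstruct its input, and (2) copy only the pre-trained decoder weights into a fresh encoder–decoder task model.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder model; sizes are illustrative only."""
    def __init__(self, vocab_size=32, emb=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        _, state = self.encoder(self.embed(src))      # encode source sequence
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)

# Step 1: pre-train an SAE -- the reconstruction target is the input itself,
# so this step needs only unlabeled sequences.
sae = Seq2Seq()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x = torch.randint(0, 32, (8, 10))                     # toy unlabeled batch
for _ in range(3):                                    # a few reconstruction steps
    logits = sae(x, x)
    loss = loss_fn(logits.reshape(-1, 32), x.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2: initialize only the decoder of a fresh task model from the SAE;
# the encoder keeps its random initialization.
model = Seq2Seq()
model.decoder.load_state_dict(sae.decoder.state_dict())
# ... then fine-tune `model` on the supervised task (translation, captioning).
```

The key design point the sketch captures is that the transfer is selective: `load_state_dict` is applied to the decoder submodule only, which is what distinguishes this scheme from pre-training the encoder alone or the full encoder–decoder pair.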
Bodapati, J.D. SAE-PD-Seq: sequence autoencoder-based pre-training of decoder for sequence learning tasks. SIViP 15, 1453–1459 (2021). https://doi.org/10.1007/s11760-021-01877-7