
SAE-PD-Seq: sequence autoencoder-based pre-training of decoder for sequence learning tasks

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Sequence learning approaches require careful parameter tuning for their success. Pre-trained sequence models exhibit superior performance compared to randomly initialized sequence models. This work presents a sequence autoencoder-based pre-trained decoder approach for sequence learning, applicable to tasks in which the output is a sequence. In the proposed method, a sequence autoencoder (SAE) is trained with the objective of reconstructing the input sequence. The weights of the pre-trained SAE are then used to initialize the decoder of a sequence model built on the encoder–decoder paradigm. The proposed pre-trained decoder approach achieves superior performance compared to both the pre-trained encoder approach and the jointly pre-trained encoder–decoder approach. The behavior of the suggested approach is examined using unsupervised pre-training. The proposed method is evaluated on neural machine translation and image caption generation tasks, and the results of experimental studies on benchmark datasets indicate the effectiveness of the proposed approach.
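To make the idea concrete, the sketch below shows one plausible realization of the approach described in the abstract, written in PyTorch: a GRU sequence autoencoder is first trained to reconstruct its input, and its decoder weights are then copied into the decoder of an encoder–decoder model before supervised training. This is an assumption of the editor, not the authors' implementation; the module names, dimensions, and toy data are hypothetical.

```python
# Illustrative sketch only (assumed, not the paper's code); names, dimensions
# and toy data are hypothetical.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder over token sequences (no attention)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        # The final encoder state conditions the decoder.
        _, h = self.encoder(self.embed(src))
        dec_out, _ = self.decoder(self.embed(tgt), h)
        return self.out(dec_out)

vocab_size = 1000
loss_fn = nn.CrossEntropyLoss()

# Step 1: pre-train a sequence autoencoder (SAE) whose objective is to
# reconstruct the input sequence (the input doubles as the target).
sae = Seq2Seq(vocab_size)
opt = torch.optim.Adam(sae.parameters())
seqs = torch.randint(0, vocab_size, (32, 20))   # toy unlabeled batch (batch, time)
logits = sae(seqs, seqs)                        # reconstruction objective
loss = loss_fn(logits.reshape(-1, vocab_size), seqs.reshape(-1))
opt.zero_grad(); loss.backward(); opt.step()

# Step 2: initialize only the decoder of the task model from the pre-trained
# SAE; the encoder stays randomly initialized. The model is then trained on
# the supervised task (e.g. translation or caption generation).
model = Seq2Seq(vocab_size)
model.decoder.load_state_dict(sae.decoder.state_dict())
model.out.load_state_dict(sae.out.state_dict())
```

Details such as whether the output projection is also transferred, or whether source- or target-side data is used for pre-training, are implementation choices the abstract does not fix.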



Author information


Corresponding author

Correspondence to Jyostna Devi Bodapati.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Bodapati, J.D. SAE-PD-Seq: sequence autoencoder-based pre-training of decoder for sequence learning tasks. SIViP 15, 1453–1459 (2021). https://doi.org/10.1007/s11760-021-01877-7

