Abstract
Sequence learning approaches require careful parameter tuning to succeed, and pre-trained sequence models consistently outperform randomly initialized ones. This work presents a sequence autoencoder (SAE)-based approach that pre-trains the decoder for sequence learning tasks, i.e., tasks whose output is a sequence. An SAE is first trained with the objective of reconstructing its input sequence; the weights of the pre-trained SAE are then used to initialize the decoder of a sequence model built on the encoder–decoder paradigm. The proposed pre-trained-decoder approach achieves superior performance compared to both pre-training the encoder alone and pre-training the encoder and decoder together. The behavior of the approach is examined under unsupervised pre-training, and the method is evaluated on neural machine translation and image caption generation. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed approach.
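The pre-training scheme described above can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration, not the paper's actual architecture or hyperparameters: the model class, dimensions, and training loop are illustrative assumptions. It shows the two steps the abstract names: (1) train a sequence autoencoder to reconstruct its input, and (2) copy only the pre-trained decoder weights into a fresh encoder–decoder task model.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder model; sizes are illustrative only."""
    def __init__(self, vocab_size=32, emb=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        _, state = self.encoder(self.embed(src))      # encode source sequence
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)

# Step 1: pre-train an SAE -- the reconstruction target is the input itself,
# so this step needs only unlabeled sequences.
sae = Seq2Seq()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x = torch.randint(0, 32, (8, 10))                     # toy unlabeled batch
for _ in range(3):                                    # a few reconstruction steps
    logits = sae(x, x)
    loss = loss_fn(logits.reshape(-1, 32), x.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2: initialize only the decoder of a fresh task model from the SAE;
# the encoder keeps its random initialization.
model = Seq2Seq()
model.decoder.load_state_dict(sae.decoder.state_dict())
# ... then fine-tune `model` on the supervised task (translation, captioning).
```

The key design point the sketch captures is that the transfer is selective: `load_state_dict` is applied to the decoder submodule only, which is what distinguishes this scheme from pre-training the encoder alone or the full encoder–decoder pair.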
Bodapati, J.D. SAE-PD-Seq: sequence autoencoder-based pre-training of decoder for sequence learning tasks. SIViP 15, 1453–1459 (2021). https://doi.org/10.1007/s11760-021-01877-7