Neurocomputing

Volume 322, 17 December 2018, Pages 93-101

Using a stacked residual LSTM model for sentiment intensity prediction

https://doi.org/10.1016/j.neucom.2018.09.049

Abstract

The sentiment intensity of a text indicates the strength of its association with positive sentiment, expressed as a continuous real value between 0 and 1. Compared to polarity classification, predicting sentiment intensities for texts achieves more fine-grained sentiment analysis. By introducing word embedding techniques, recent studies that use deep neural models have outperformed existing lexicon- and regression-based methods for sentiment intensity prediction. To obtain better performance, a common practice is to add more layers to a neural network so that it can learn high-level features. However, as the depth increases, the network degrades and becomes more difficult to train, because errors accumulate between layers and gradients vanish. To address this problem, this paper proposes a stacked residual LSTM model to predict the sentiment intensity of a given text. By investigating the performance of shallow and deep architectures, we introduce a residual connection to every few LSTM layers to construct an 8-layer neural network. The residual connection can center layer gradients and propagated errors, which makes the deeper network easier to optimize. This approach enables us to successfully stack more LSTM layers for this task, improving on the prediction accuracy of existing methods. Experimental results show that the proposed method outperforms the lexicon-, regression-, and conventional NN-based methods proposed in previous studies.

Introduction

Online social networking services (SNSs), such as Twitter, Facebook and Weibo, enable users to share their thoughts, opinions, and emotions with others through texts that are informal and strongly subjective. Analyzing such user-generated information is very useful for understanding how sentiments spread from person to person on the Internet. Sentiment analysis techniques [1], [2], [3], [4], [5] provide a way to handle such affective information automatically. As an active research field in computational linguistics and affective computing [6], sentiment analysis can analyze, process, and draw inferences from subjective texts carrying affective information.

Most existing methods of sentiment analysis focus on polarity classification, which assigns target texts to several categories, e.g., positive or negative. Such methods mostly use classification models. They first extract features such as n-grams, bag-of-words (BoW) or part-of-speech (POS) tags [7], [8]. Next, support vector machines (SVM), naïve Bayes, maximum entropy, logistic regression or random forests are applied to these features to classify texts into either positive or negative classes [9].

Alternatively, sentiment intensity prediction offers another choice for sentiment analysis. More specifically, the sentiment intensity of a word, phrase or text indicates the strength of its association with positive sentiment, also known as its valence value [10], [11] or affective rating. It is a score between 0 and 1, where a score of 0 or 1 indicates, respectively, the least or the greatest association with positive sentiment (i.e., negative or positive). In contrast to the traditional approach, intensity prediction is usually formulated as a regression problem rather than a classification problem, since sentiment intensity is defined as a continuous real value. The following three movie reviews were rated in both polarity and sentiment intensity, as taken from the Stanford Sentiment Treebank (SST) [12] corpus:

(Text 1 negative, senti = 0.375) The movie is genial but never inspired, and little about it will stay with you.

(Text 2 negative, senti = 0.194) However, the movie does not really deliver for country music fans or for family audiences.

(Text 3 negative, senti = 0.083) Bears are even worse than I imagined a movie ever could be.

All three reviews were classified as negative. However, the third review, which was rated with a lower intensity (more negative) than the other two, deserves higher priority for attention. In addition, the intensity prediction approach can enable more intelligent and fine-grained sentiment applications, such as hotspot detection and forecasting [13], mental illness identification [14], financial news analysis [15], question answering [16], and blog post analysis [17].

A few studies have sought to predict continuous affective ratings of texts using lexicon- and regression-based methods. The lexicon-based methods share the underlying assumption that the intensity of a given text can be estimated by composing the intensities of its constituent words [18]. Another approach uses regression-based methods [17], [19], [20], which seek to learn the correlations between sentiment intensities and linguistic features of words, e.g., BoW and POS. However, the prediction performance of such methods is still low.
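
As a concrete illustration, the lexicon-based composition assumption can be sketched as follows. The lexicon values, the neutral default, and the simple averaging rule here are illustrative placeholders, not taken from any real affective resource or from the methods cited above:

```python
# Toy sketch of a lexicon-based intensity estimator: the text's intensity
# is composed (here, averaged) from the valence of its constituent words.
# All lexicon values below are made up for illustration.
TOY_LEXICON = {
    "good": 0.85, "genial": 0.70, "inspired": 0.80,
    "worse": 0.10, "never": 0.30, "not": 0.25,
}
DEFAULT = 0.5  # neutral prior for out-of-vocabulary words


def lexicon_intensity(text):
    """Estimate sentiment intensity in [0, 1] as the mean word valence."""
    words = text.lower().split()
    scores = [TOY_LEXICON.get(w, DEFAULT) for w in words]
    return sum(scores) / len(scores)
```

Real lexicon-based systems refine this composition with negation handling, intensifiers, and weighting schemes, but the core assumption is the same word-level decomposition.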

Recently, several classification methods have explored the use of deep neural networks and word embeddings, such as convolutional neural networks (CNN) [21], [22], recurrent neural networks (RNN) [23], [24] and long short-term memory (LSTM) [25], [26]. CNN [21], [22] can extract salient local n-gram features. Conversely, LSTM [25], [26] models the text sequentially; it focuses on past information and draws conclusions from the entire text. However, such NN methods have mainly been used as classifiers to distinguish whether a given text is positive or negative, and they have not been thoroughly investigated for sentiment intensity prediction.

In image recognition, several recently proposed models, such as VGG [27], InceptionNet [28], [29] and ResNet [30], have all exploited very "deep" architectures. The success of these models reveals that increasing the depth of a neural network can help improve the performance of learning models, as deeper networks learn better feature representations [31]. For language modeling tasks, a feasible way of applying a deep architecture is to use a stacked CNN [32] or LSTM [33], [34] model. This raises a natural question: is training a better prediction model as simple as stacking more layers? In fact, deeper networks easily suffer from the degradation problem: as more layers are added, prediction accuracy saturates and then degrades, because errors accumulate between layers and gradients vanish. To illustrate this phenomenon, we trained a conventional 2-layer LSTM and a stacked 8-layer plain LSTM on SST for sentiment intensity prediction. Fig. 1 shows the mean squared error and Pearson correlation coefficient of these two models on the training and testing sets. Unexpectedly, the deeper network has higher training and test error. Similar results on other corpora are shown in Fig. 4. This result indicates that the more layers are stacked, the harder the model is to optimize.
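
The vanishing-gradient intuition behind this degradation can be illustrated with a toy calculation. The sketch below backpropagates through stacked tanh layers with random weights, a generic stand-in rather than the paper's LSTM; the weight scale and width are arbitrary choices for illustration:

```python
import numpy as np

# Toy illustration of vanishing gradients in a plain stacked network:
# the backpropagated gradient norm shrinks as depth grows, because each
# layer multiplies it by a weight matrix and a tanh-derivative factor < 1.
rng = np.random.default_rng(0)


def grad_norm_through_depth(depth, width=32):
    """Norm of a gradient propagated backward through `depth` tanh layers."""
    g = np.ones(width)
    for _ in range(depth):
        W = rng.normal(scale=0.5 / np.sqrt(width), size=(width, width))
        h = rng.uniform(-1.0, 1.0, width)        # stand-in pre-activations
        g = W.T @ (g * (1.0 - np.tanh(h) ** 2))  # chain rule through tanh
    return np.linalg.norm(g)
```

Comparing a shallow and a deep stack (e.g., 2 vs. 8 layers) shows the gradient norm collapsing with depth, which is consistent with the higher training error of the plain 8-layer LSTM in Fig. 1.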

In this paper, we propose a stacked residual LSTM model to predict the sentiment intensity of a given text. To tackle the degradation problem, we introduce a residual connection to every few LSTM layers, inspired by ResNet [30]. The residual connection can center layer gradients and propagated errors, which makes the network easier to optimize. This approach enables us to successfully stack more LSTM layers for NLP tasks. Just as stacked deep convolutional networks extract features at different levels, from pixels to shapes and contours, in image processing tasks [28], [29], [30], the proposed stacked LSTM model can extract higher-level sequence features from lower-level n-gram features to form a hierarchical representation. These features correspond to linguistic function blocks, which could be words, phrases, clauses, sentences, or even paragraphs. Experiments were conducted on four English and Chinese corpora to evaluate the performance of the stacked residual LSTM model. We first investigate the degradation problem that arises in a stacked LSTM as the model is deepened. Next, the proposed model is compared with several previously proposed methods, including traditional lexicon- and regression-based methods and conventional deep neural network-based models such as CNN, LSTM and RNN. Besides sentiment intensity prediction, this stacked model with residual connections can also be used to build various time series prediction applications, such as short-term electrical load forecasting [35], solar irradiation forecasting [36], QoS estimation for stream services [37], [38] and video sequence recognition [39], [40].

The rest of this paper is organized as follows. Section 2 offers a brief review of related works. Section 3 describes the proposed neural network model and residual architecture. Section 4 summarizes the comparative results of different methods for sentiment intensity prediction. The study's conclusions are presented in Section 5.

Related work

The sentiment intensity of a word, phrase or sentence indicates the strength of its association with positive emotion. In this section, we present a brief review of sentiment intensity prediction for texts, covering lexicon-, regression- and conventional neural network-based methods.

Stacked residual LSTM model

This section presents the architecture of the proposed stacked residual LSTM. We first introduce a residual connection into every few LSTM layers to form a building block. Next, by stacking these LSTM building blocks, we turn the deep stacked LSTM model into its residual version. To output continuous affective ratings instead of discrete categories, we adopt a linear decoder in the output layer. The details of the proposed stacked residual LSTM are described in what follows.
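
The forward pass described above can be organized as in the following pure-NumPy sketch. The layer width, weight scales, number of layers per block, and the use of the last hidden state for decoding are assumptions for illustration, not the paper's exact configuration, and the weights are untrained random values:

```python
import numpy as np

rng = np.random.default_rng(1)


def lstm_layer(x, d):
    """One LSTM layer over a sequence x of shape (T, d), returning (T, d).

    Weights are drawn randomly per call: this is a forward-pass shape
    sketch, not a trainable implementation.
    """
    Wf, Wi, Wo, Wc = (rng.normal(scale=0.1, size=(d, 2 * d)) for _ in range(4))
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c, out = np.zeros(d), np.zeros(d), []
    for x_t in x:
        z = np.concatenate([x_t, h])             # input + previous hidden
        f, i, o = sigmoid(Wf @ z), sigmoid(Wi @ z), sigmoid(Wo @ z)
        c = f * c + i * np.tanh(Wc @ z)          # cell state update
        h = o * np.tanh(c)                       # hidden state
        out.append(h)
    return np.stack(out)


def residual_block(x, d, layers_per_block=2):
    """A few stacked LSTM layers with an identity shortcut: y = F(x) + x."""
    y = x
    for _ in range(layers_per_block):
        y = lstm_layer(y, d)
    return y + x  # residual connection


def stacked_residual_lstm(x, d, blocks=4):
    """4 residual blocks of 2 LSTM layers each = an 8-layer stacked LSTM."""
    for _ in range(blocks):
        x = residual_block(x, d)
    w = rng.normal(scale=0.1, size=d)
    # Linear decoder on the last hidden state: a continuous rating,
    # not a discrete class.
    return float(w @ x[-1])
```

Because the shortcut is the identity, each residual block adds no trainable parameters beyond its LSTM layers, which matches the paper's claim that residual connections do not increase model complexity.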

Experimental results

In this section, we first investigate the degradation problem that arises in the stacked architecture as the model is deepened. Then, we present the experimental results of the proposed stacked residual LSTM for sentiment intensity prediction against existing lexicon-, regression-, and conventional deep NN-based methods. We also investigate the training per.

Conclusions

In this paper, we presented a stacked residual LSTM model to predict the sentiment intensities of texts. By introducing residual connections to every few LSTM layers, we constructed an eight-layer stacked LSTM model. The residual connections allow the deep stacked model to avoid the degradation problem. In addition, the residual connection does not increase the complexity of the model, since it adds no trainable parameters to the stacked model.

In these experiments, we also provide comprehensive

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61702443 and 61762091, and in part by Educational Commission of Yunnan Province of China under Grant No. 2017ZZX030. The authors would like to thank the anonymous reviewers for their constructive comments.

Jin Wang is a lecturer in the School of Information Science and Engineering, Yunnan University, China. He received the Ph.D. degree in computer science and engineering from Yuan Ze University, Taoyuan, Taiwan and in communication and information systems from Yunnan University, Kunming, China. His research interests include natural language processing, text mining, and machine learning.

References (60)

  • P.D. Turney

    Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews

  • S. Wang et al.

    Baselines and bigrams: simple, good sentiment and topic classification

  • J.A. Russell

    A circumplex model of affect

    J. Pers. Soc. Psychol.

    (1980)
  • A.B. Warriner et al.

    Norms of valence, arousal, and dominance for 13,915 English lemmas

    Behav. Res. Methods

    (2013)
  • R. Socher et al.

    Recursive deep models for semantic compositionality over a sentiment treebank

  • T. Nguyen et al.

    Affective and content analysis of online depression communities

    IEEE Trans. Affect. Comput.

    (2014)
  • M.-C. de Marneffe et al.

    Was it good? It was provocative. Learning the meaning of scalar adjectives

  • G. Paltoglou et al.

    Seeing stars of valence and arousal in blog posts

    IEEE Trans. Affect. Comput.

    (2013)
  • G. Paltoglou et al.

    Predicting emotional responses to long informal text

    IEEE Trans. Affect. Comput.

    (2013)
  • D. Gökçay et al.

    Predicting the sentiment in sentences based on words: an exploratory study on ANEW and ANET

  • N. Malandrakis et al.

    Distributional semantic models for affective text analysis

    IEEE Trans. Audio Speech Lang. Process.

    (2013)
  • N. Kalchbrenner et al.

    A convolutional neural network for modelling sentences

  • Y. Kim

    Convolutional neural networks for sentence classification

  • A. Graves

    Supervised Sequence Labelling

    (2012)
  • O. Irsoy et al.

    Opinion mining with deep recurrent neural networks

  • K.S. Tai et al.

    Improved semantic representations from tree-structured long short-term memory networks

  • X. Wang et al.

    Predicting polarities of tweets by composing word embeddings with long short-term memory

  • K. Simonyan et al.

    Very deep convolutional networks for large-scale image recognition

  • C. Szegedy et al.

    Rethinking the inception architecture for computer vision

  • C. Szegedy, S. Ioffe, and V. Vanhoucke, “Inception-v4, inception-ResNet and the Impact of Residual Connections on...

    Bo Peng is a Ph.D. candidate at the School of Information Science and Engineering, Yunnan University, China. His research interests include natural language processing and machine learning.

    Xuejie Zhang is a professor in the School of Information Science and Engineering, and Director of High Performance Computing Center, Yunnan University, China. He received his Ph.D. in Computer Science and Engineering from Chinese University of Hong Kong in 1998. His research interests include high performance computing, cloud computing, and big data analytics.
