Using a stacked residual LSTM model for sentiment intensity prediction
Introduction
Online social networking services (SNSs), such as Twitter, Facebook and Weibo, enable users to share their thoughts, opinions, and emotions with others through texts that are informal and strongly subjective. Analyzing such user-generated information is very useful for understanding how sentiments spread from person to person on the Internet. Sentiment analysis techniques [1], [2], [3], [4], [5] provide a way to handle such affective information automatically. As an active research field in computational linguistics and affective computing [6], sentiment analysis can analyze, process, and draw inferences from such subjective texts with affective information.
Most existing methods of sentiment analysis focus on polarity classification, which assigns target texts to several categories, e.g., positive or negative. Such methods mostly use classification models: they first extract features such as n-grams, bag-of-words (BOW) or part-of-speech (POS) tags [7], [8]. Next, support vector machines (SVM), naïve Bayes, maximum entropy, logistic regression or random forests are applied to these features to classify texts as either positive or negative [9].
Alternatively, sentiment intensity prediction offers another approach to sentiment analysis. More specifically, the sentiment intensity of a word, phrase or text indicates the strength of its association with positive sentiment, also known as its valence [10], [11] or affective rating. It is a score between 0 and 1, where 0 indicates the least and 1 the strongest association with positive sentiment (i.e., negative or positive, respectively). In contrast to the traditional approach, intensity prediction is usually treated as a regression task rather than a classification task, since sentiment intensity is defined as a continuous real value. The following three movie reviews, taken from the Stanford Sentiment Treebank (SST) corpus [12], were rated in both polarity and sentiment intensity:
(Text 1 negative, senti = 0.375) The movie is genial but never inspired, and little about it will stay with you.
(Text 2 negative, senti = 0.194) However, the movie does not really deliver for country music fans or for family audiences.
(Text 3 negative, senti = 0.083) Bears are even worse than I imagined a movie ever could be.
All three reviews were classified as negative. However, the third review, which was rated with a lower intensity (more negative) than the other two, deserves higher priority for attention. In addition, the intensity prediction approach enables more intelligent and fine-grained sentiment applications, such as hotspot detection and forecasting [13], mental illness identification [14], financial news analysis [15], question answering [16], and blog post analysis [17].
Few studies have sought to predict continuous affective ratings of texts, using lexicon- and regression-based methods. Lexicon-based methods rest on the assumption, shared by most such algorithms, that the intensity of a given text can be estimated from the composed intensities of its constituent words [18]. Regression-based methods [17], [19], [20] instead seek to learn correlations between sentiment intensities and linguistic features of words, e.g., BOW and POS. However, the prediction performance of such methods remains low.
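The lexicon-based assumption above can be made concrete with a minimal sketch: estimate a text's intensity as the mean valence of its words under some valence lexicon. The tiny lexicon and the averaging rule below are hypothetical illustrations, not the method of any cited work.

```python
# Hypothetical toy valence lexicon (scores in [0, 1]; 0 = negative, 1 = positive).
LEXICON = {"genial": 0.70, "inspired": 0.80, "worse": 0.10, "deliver": 0.60}

def lexicon_intensity(text, lexicon=LEXICON, default=0.5):
    """Estimate text intensity as the mean valence of its words.

    Words missing from the lexicon fall back to a neutral score; real
    lexicon-based systems use richer composition rules (negation,
    intensifiers), which are omitted here.
    """
    words = text.lower().split()
    scores = [lexicon.get(w, default) for w in words]
    return sum(scores) / len(scores) if scores else default
```

For example, a sentence dominated by low-valence words such as "worse" receives a lower estimated intensity than one containing "inspired".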
Recently, several classification methods have explored deep neural networks and word embeddings, such as convolutional neural networks (CNN) [21], [22], recurrent neural networks (RNN) [23], [24] and long short-term memory (LSTM) [25], [26]. CNN [21], [22] can extract active local n-gram features, whereas LSTM [25], [26] models texts sequentially, focusing on past information and drawing conclusions from the entire text. However, such neural network methods have been used as classifiers to distinguish whether a given text is positive or negative; they have not been thoroughly investigated for sentiment intensity prediction.
In image recognition, several recently proposed models, such as VGG [27], InceptionNet [28], [29] and ResNet [30], have all exploited very "deep" architectures. The success of these models reveals that increasing the depth of a neural network can help improve the performance of learning models, as deeper networks learn better feature representations [31]. For language modeling tasks, a feasible way of applying a deep architecture is to use a stacked CNN [32] or LSTM [33], [34] model. An obvious question arises: is training a better prediction model as simple as stacking more layers? In fact, deeper networks readily suffer from the degradation problem: as more layers are added, prediction accuracy saturates and then never improves again, because errors accumulate between layers and gradients vanish. To illustrate this phenomenon, we trained a conventional 2-layer LSTM and a stacked 8-layer plain LSTM on SST for sentiment intensity prediction. Fig. 1 shows the mean squared error and Pearson correlation coefficient of these two models on the training and testing sets. Unexpectedly, the deeper network has higher training and test error. Similar results on other corpora are shown in Fig. 4. This result indicates that the more stacked layers there are, the harder the network is to optimize.
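The two evaluation measures reported in Fig. 1 are standard; as a reference, they can be computed in a few lines of pure Python (function names here are our own):

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between gold and predicted intensities (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(y_true, y_pred):
    """Pearson correlation coefficient r between two rating sequences (higher is better).

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)
```

A perfect linear fit yields r = 1, while predictions moving opposite to the gold ratings yield r = -1.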
In this paper, we propose a stacked residual LSTM model to predict the sentiment intensity of a given text. To tackle the degradation problem, we introduce a residual connection to every few LSTM layers, inspired by ResNet [30]. The residual connection can center layer gradients and propagated errors, making the network easier to optimize. This approach enables us to successfully stack more LSTM layers for NLP tasks. Just as stacked deep convolutional networks extract features at different levels, from pixels to shapes and contours, in image processing tasks [28], [29], [30], the proposed stacked LSTM model extracts higher-level sequence features from lower-level n-gram features to form a hierarchical representation. These features correspond to linguistic building blocks, which could be words, phrases, clauses, sentences, or even paragraphs. Experiments were conducted on four English and Chinese corpora to evaluate the performance of the stacked residual LSTM model. We first investigate the degradation problem in stacked LSTM as the model is deepened. Next, the proposed model is compared with several previously proposed methods, including traditional lexicon- and regression-based methods and conventional deep neural network-based models such as CNN, LSTM and RNN. Beyond sentiment intensity prediction, this stacked model with residual connections can also be used to build various time series prediction applications, such as short-term electrical load forecasting [35], solar irradiation forecasting [36], QoS estimation for streaming services [37], [38] and video sequence recognition [39], [40].
The rest of this paper is organized as follows. Section 2 offers a brief review of related work. Section 3 describes the proposed neural network model and residual architecture. Section 4 summarizes the comparative results of different methods for sentiment intensity prediction. The study's conclusions are presented in Section 5.
Section snippets
Related work
The sentiment intensity of a word, phrase or sentence indicates the strength of its association with positive sentiment. In this section, we present a brief review of sentiment intensity prediction for texts, covering lexicon-, regression- and conventional neural network-based methods.
Stacked residual LSTM model
This section presents the architecture of the proposed stacked residual LSTM. We first introduce a residual connection into every few LSTM layers to form a building block. Next, by stacking these LSTM building blocks, we turn the deep stacked LSTM model into its residual version. To output continuous affective ratings instead of discrete categories, we adopt a linear decoder in the output layer. The details of the proposed stacked residual LSTM are described in what follows.
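The stacking pattern described above can be sketched in NumPy. Here each LSTM layer is abstracted as a simple tanh transform (`toy_layer` is a stand-in for a real LSTM cell, and all names and dimensions are illustrative assumptions); what matters is the identity shortcut added around every few layers and the linear decoder at the output.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_layer(x, w):
    """Stand-in for one LSTM layer (a real model would use LSTM cells over time steps)."""
    return np.tanh(x @ w)

def stacked_residual_forward(x, weights, block_size=2):
    """Stack layers, adding a residual (identity) connection around every
    `block_size` layers: out = F(x) + x. The shortcut adds no trainable
    parameters, so model complexity is unchanged."""
    h = x
    for i in range(0, len(weights), block_size):
        shortcut = h
        for w in weights[i:i + block_size]:
            h = toy_layer(h, w)
        h = h + shortcut  # residual connection around the building block
    return h

def linear_decoder(h, w_out, b_out):
    """Linear output layer for continuous affective ratings (no softmax)."""
    return h @ w_out + b_out

# Eight stacked layers of hidden size 8 -> four residual building blocks.
d = 8
weights = [rng.standard_normal((d, d)) * 0.1 for _ in range(8)]
x = rng.standard_normal((1, d))
h = stacked_residual_forward(x, weights)
y = linear_decoder(h, rng.standard_normal((d, 1)) * 0.1, 0.0)
```

Because the shortcut is the identity, gradients can flow directly through the addition, which is the property that mitigates the degradation problem in the deeper stacked model.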
Experimental results
In this section, we first investigate the degradation problems in the stacked architecture when the model is deepened. Then, we present the experimental results of the proposed stacked residual LSTM for sentiment intensity prediction against existing lexicon-, regression-, and conventional deep NN-based methods. We also investigate the training per.
Conclusions
In this paper, we presented a stacked residual LSTM model to predict the sentiment intensities of texts. By introducing residual connections to every few LSTM layers, we constructed an eight-layer stacked LSTM model. The residual connections let the deep stacked model avoid degradation problems. In addition, the residual connections do not increase the complexity of the model, since they add no trainable parameters to the stacked model.
In these experiments, we also provide comprehensive
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61702443 and 61762091, and in part by Educational Commission of Yunnan Province of China under Grant No. 2017ZZX030. The authors would like to thank the anonymous reviewers for their constructive comments.
Jin Wang is a lecturer in the School of Information Science and Engineering, Yunnan University, China. He received the Ph.D. degree in computer science and engineering from Yuan Ze University, Taoyuan, Taiwan and in communication and information systems from Yunnan University, Kunming, China. His research interests include natural language processing, text mining, and machine learning.
References (60)
- et al., Using text mining and sentiment analysis for online forums hotspot detection and forecast, Decis. Support Syst. (2010)
- et al., Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news, Knowl.-Based Syst. (2013)
- et al., Hourly day-ahead solar irradiance prediction using weather forecasts by LSTM, Energy (2018)
- et al., Opinion mining and sentiment analysis, Found. Trends Inf. Retr. (2006)
- Sentiment analysis and opinion mining, Synth. Lect. Hum. Lang. Technol. (2012)
- Techniques and applications for sentiment analysis, Commun. ACM (2013)
- Sentiment analysis: detecting valence, emotions, and other affectual states from text (2015)
- et al., Affect detection: an interdisciplinary review of models, methods, and their applications, IEEE Trans. Affect. Comput. (2015)
- Affective Computing (1995)
- et al., Thumbs up? Sentiment classification using machine learning techniques
- Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews
- Baselines and bigrams: simple, good sentiment and topic classification
- A circumplex model of affect, J. Pers. Soc. Psychol.
- Norms of valence, arousal, and dominance for 13,915 English lemmas, Behav. Res. Methods
- Recursive deep models for semantic compositionality over a sentiment treebank
- Affective and content analysis of online depression communities, IEEE Trans. Affect. Comput.
- Was it good? It was provocative. Learning the meaning of scalar adjectives
- Seeing stars of valence and arousal in blog posts, IEEE Trans. Affect. Comput.
- Predicting emotional responses to long informal text, IEEE Trans. Affect. Comput.
- Predicting the sentiment in sentences based on words: an exploratory study on ANEW and ANET
- Distributional semantic models for affective text analysis, IEEE Trans. Audio Speech Lang. Process.
- A convolutional neural network for modelling sentences
- Convolutional neural networks for sentence classification
- Supervised Sequence Labelling
- Opinion mining with deep recurrent neural networks
- Improved semantic representations from tree-structured long short-term memory networks
- Predicting polarities of tweets by composing word embeddings with long short-term memory
- Very deep convolutional networks for large-scale image recognition
- Rethinking the inception architecture for computer vision
Bo Peng is a Ph.D. candidate at the School of Information Science and Engineering, Yunnan University, China. His research interests include natural language processing and machine learning.
Xuejie Zhang is a professor in the School of Information Science and Engineering, and Director of High Performance Computing Center, Yunnan University, China. He received his Ph.D. in Computer Science and Engineering from Chinese University of Hong Kong in 1998. His research interests include high performance computing, cloud computing, and big data analytics.