Elsevier

Expert Systems with Applications

Volume 118, 15 March 2019, Pages 65-79
Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees

https://doi.org/10.1016/j.eswa.2018.10.002

Highlights

  • Improves sentiment analysis with discourse trees from rhetorical structure theory.

  • Extracts salient passages based on the position and relation in the discourse tree.

  • Develops a tensor-based tree-structured neural network.

  • Tensor structure distinguishes hierarchy and relation types.

  • Overfitting is reduced by tree-based algorithms for data augmentation.

Abstract

Prominent applications of sentiment analysis are countless, covering areas such as marketing, customer service and communication. The conventional bag-of-words approach for measuring sentiment merely counts term frequencies; however, it neglects the position of the terms within the discourse. As a remedy, we develop a discourse-aware method that builds upon the discourse structure of documents. For this purpose, we utilize rhetorical structure theory to label (sub-)clauses according to their hierarchical relationships and then assign polarity scores to individual leaves. To learn from the resulting rhetorical structure, we propose a tensor-based, tree-structured deep neural network (named Discourse-LSTM) in order to process the complete discourse tree. The underlying tensors infer the salient passages of narrative materials. In addition, we suggest two algorithms for data augmentation (node reordering and artificial leaf insertion) that increase our training set and reduce overfitting. Our benchmarks demonstrate the superior performance of our approach. Moreover, our tensor structure reveals the salient text passages and thereby provides explanatory insights.

Introduction

Sentiment analysis reveals personal opinions towards entities such as products, services or events, which can benefit organizations and businesses in improving their marketing, communication, production and procurement. For this purpose, sentiment analysis quantifies the positivity or negativity of subjective information in narrative materials (Chen, Xu, He, & Wang, 2017; Feldman, 2013; Kratzwald, Ilic, Kraus, Feuerriegel, & Prendinger, 2018; Pang & Lee, 2008). Among the many applications of sentiment analysis are tracking customer opinions (Araque, Corcuera-Platas, Sánchez-Rada, & Iglesias, 2017; Bohanec, Kljajić Borštnar, & Robnik-Šikonja, 2017; Tanaka, 2010), mining user reviews (Kontopoulos, Berberidis, Dergiades, & Bassiliades, 2013; Mostafa, 2013; Ye, Zhang, & Law, 2009), trading upon financial news (Khadjeh Nassirtoussi, Aghabozorgi, Ying Wah, & Ngo, 2015; Kraus & Feuerriegel, 2017; Weng, Lu, Wang, Megahed, & Martinez, 2018), detecting social events (Yoo, Song, & Jeong, 2018) and predicting sales (Rui, Liu, & Whinston, 2013; Yu, Liu, Huang, & An, 2012).

Sentiment analysis traditionally utilizes bag-of-words approaches, which merely count the frequency of words (and tuples thereof) to obtain a mathematical representation of documents in matrix form (Dey, Jenamani, & Thakkar, 2018; Guzella & Caminhas, 2009; Manning & Schütze, 1999; Pang & Lee, 2008). As such, these approaches are not capable of taking into consideration semantic relationships between sections and sentences of a document. In naïve bag-of-words models, all clauses are assigned the same level of relevance, so certain subordinate clauses cannot be weighted more heavily than others when inferring the sentiment. Conversely, the objective of this paper is to develop a discourse-aware method for sentiment analysis that can recognize differences in salience between individual subordinate clauses, as well as discriminate the relevance of sentences based on their function (e.g., whether a sentence introduces a new fact or elaborates upon an existing one).

Let us, for instance, consider the two examples in Fig. 1, which express opposite polarities. By simply counting the frequency of positive and negative words, we cannot discriminate between the texts, as both contain the same number of polarity terms. To reliably analyze the sentiment, it is essential to account for the semantic structure and the variable importance across passages. That is, we can identify the main clauses and then infer the correct tone of the examples by looking at them. Similarly, RST trees can locate relevant parts in lengthy texts. For instance, the concluding section of a newspaper article is typically relevant as it reports the opinion of the author.
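The limitation described above can be illustrated with a minimal sketch. The two texts and the tiny lexicon below are invented for illustration and are not the examples from Fig. 1; a real study would rely on a full polarity dictionary.

```python
import re

# Hypothetical mini-lexicon (an assumption, not taken from the paper).
POSITIVE = {"great", "brilliant"}
NEGATIVE = {"boring", "weak"}


def bag_of_words_score(text):
    """Count positive minus negative terms, ignoring where they occur."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


# Invented texts: the main clause carries the opposite tone in each case,
# yet both contain exactly the same polarity terms.
text_a = "Although the plot was boring, the acting was brilliant."
text_b = "Although the acting was brilliant, the plot was boring."

score_a = bag_of_words_score(text_a)
score_b = bag_of_words_score(text_b)
print(score_a, score_b)
```

Both texts receive the same score because the count is blind to which clause is the main one, which is the gap a discourse-aware method aims to close.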

Our method is based on rhetorical structure theory (RST), which incorporates the discourse structures of natural language. RST structures documents hierarchically (Mann & Thompson, 1988) by splitting the content into (sub-)clauses called elementary discourse units (EDUs). The EDUs are then connected to form a binary discourse tree. Here RST discriminates between a nucleus, which conveys primary information, and a satellite, which conveys ancillary information. The formalization of nucleus/satellite can be loosely thought of as the main and subordinate parts of a clause. The edges are further labeled according to the type of discourse, for instance whether it is an elaboration or an argument. Hence, this labeling essentially derives the function of a text passage. Both concepts of the RST tree help in localizing essential information within documents. Accordingly, the goal of this work is to develop a novel approach that identifies salient passages in a document based on their position in the discourse tree and incorporates their importance in the form of weights when computing sentiment scores.
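To make the tree structure concrete, the following sketch shows one possible in-memory representation of a binary discourse tree. The class name, field names, relation label and example sentence are assumptions chosen for illustration; they are not the output format of any particular discourse parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node in a binary discourse tree (hypothetical representation)."""
    role: str                       # "nucleus", "satellite" or "root"
    relation: Optional[str] = None  # relation joining the children (internal nodes)
    text: Optional[str] = None      # EDU text (leaves only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self):
        return self.left is None and self.right is None


def edus(node):
    """Return the EDU texts in reading order."""
    if node.is_leaf():
        return [node.text]
    return edus(node.left) + edus(node.right)


def most_salient_edu(node):
    """Follow nucleus children from the root down to a single EDU.

    If both children are nuclei, this sketch simply takes the left one.
    """
    while not node.is_leaf():
        node = node.left if node.left.role == "nucleus" else node.right
    return node.text


# Invented example: a subordinate clause (satellite) and a main clause (nucleus).
tree = Node(
    role="root",
    relation="contrast",
    left=Node(role="satellite", text="Although the plot was boring,"),
    right=Node(role="nucleus", text="the acting was brilliant."),
)
print(most_salient_edu(tree))
```

Walking along nuclei is only one simple way to locate central content; the approach described here instead learns how much weight each position and relation should receive.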

Previous research has demonstrated that discourse-related information can improve the performance of sentiment analysis (see Section 2 for details). The work by Taboada, Voll, and Brooke (2008) is the first to combine rhetorical structure theory and sentiment analysis. In this work, the authors weigh adjectives in a nucleus more heavily than those in a satellite. Beyond that, one can reweigh the importance of passages based on their relation type (Hogenboom, Frasincar, de Jong, & Kaymak, 2015b) or depth (Märkle-Huß, Feuerriegel, & Prendinger, 2017) in the discourse tree. Some methods prune the discourse trees at certain thresholds to yield a tree of fixed depth, e.g., 2 or 4 levels (Märkle-Huß et al., 2017). Other approaches train machine learning classifiers based on the relation types as input features (Hogenboom, Frasincar, de Jong, & Kaymak, 2015a). What the previous references have in common is that they try to map the tree structure onto mathematically simpler representations, thereby dropping partial information from the tree.
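The idea of weighing nuclei more heavily than satellites can be sketched as a simple weighted sum over leaf polarities. The weight values and the flat list of leaves below are assumptions for illustration; they are not the settings used in any of the cited studies.

```python
def weighted_sentiment(leaves, nucleus_weight=1.0, satellite_weight=0.5):
    """Sum leaf polarities, scaling each by a role-dependent weight.

    `leaves` is a list of (polarity, role) pairs; the default weights are
    hypothetical and only serve to show the mechanism.
    """
    total = 0.0
    for polarity, role in leaves:
        weight = nucleus_weight if role == "nucleus" else satellite_weight
        total += weight * polarity
    return total


# Invented example: a negative satellite followed by a positive nucleus.
leaves = [(-1.0, "satellite"), (1.0, "nucleus")]

unweighted = sum(polarity for polarity, _ in leaves)
weighted = weighted_sentiment(leaves)
print(unweighted, weighted)
```

With fixed weights, the result depends only on each leaf's role, so the rest of the tree (relation types, depth, nesting) is discarded, which is the loss of information noted above.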

An alternative strategy is to apply tree-structured neural networks that traverse discourse trees for representation learning. When encountering a node, these networks combine the information from its children and pass it on to the next higher level, until reaching the root, at which point a prediction is made. Thereby, the approach merely adheres to the tree structure but does not account for either the relation type or whether a node is a nucleus or a satellite. To do so, one can extend the network to include different weights for each edge in the tree depending on, e.g., the relation type. This essentially introduces additional degrees of freedom that can weigh the different discourse units by their importance. The work by Fu, Liu, Xu, Yu, and Wang (2016) extends the network by such a mechanism with respect to the nucleus/satellite information but discards the relation type and merely applies the network to individual sentences instead of longer documents. The approach in Ji and Smith (2017) can only exploit the relation type and not the nucleus/satellite information. Furthermore, former approaches are based on traditional recursive neural networks, which are limited by the fact that they can persist information for only a few iterations (Bengio, Simard, & Frasconi, 1994). Therefore, these methods struggle with complex discourses, while we explicitly build upon tree-shaped long short-term memory models, since they are better equipped to handle very deep structures.
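The bottom-up traversal with edge-specific weights can be sketched as follows. This is a deliberately simplified recursive network with scalar states and hand-picked weights, not the tensor-based Discourse-LSTM proposed in this paper; the dictionary layout, relation labels and all numeric values are assumptions.

```python
import math

# Hypothetical parameters, one weight per (relation, role) pair. In a trained
# model these would be learned; the values here are made up for illustration.
WEIGHTS = {
    ("contrast", "nucleus"): 1.2,
    ("contrast", "satellite"): 0.3,
    ("elaboration", "nucleus"): 1.0,
    ("elaboration", "satellite"): 0.6,
}


def encode(node):
    """Bottom-up pass: a leaf returns its polarity, an internal node
    combines its children using weights selected by relation and role."""
    if "polarity" in node:
        return node["polarity"]
    relation = node["relation"]
    total = 0.0
    for role, child in node["children"]:
        total += WEIGHTS[(relation, role)] * encode(child)
    return math.tanh(total)


# Invented tree: a negative satellite contrasted with a positive nucleus,
# where the nucleus itself is elaborated by a mildly positive satellite.
tree = {
    "relation": "contrast",
    "children": [
        ("satellite", {"polarity": -1.0}),
        ("nucleus", {
            "relation": "elaboration",
            "children": [
                ("nucleus", {"polarity": 1.0}),
                ("satellite", {"polarity": 0.5}),
            ],
        }),
    ],
}

document_score = encode(tree)
print(document_score)
```

Because the weight lookup depends on both the relation and the role, the same leaf polarity can contribute differently depending on where it sits in the tree. A tree-shaped LSTM replaces the single `tanh` combination with gated memory cells so that information survives across many levels.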

We build upon the previous works and advance them by proposing a specific neural network, called Discourse-LSTM. The Discourse-LSTM utilizes multiple tensors to localize salient passages within documents by incorporating the full discourse structure including nucleus/satellite information and relation types. In brief, our approach is as follows: we utilize rhetorical structure theory to represent the semantic structure of a document in the form of a hierarchical discourse tree. We then obtain sentiment scores for each leaf by utilizing both polarity dictionaries and word embeddings. The resulting tree is subsequently traversed by the Discourse-LSTM, thereby aggregating the sentiment scores based on the discourse structure in order to compute a sentiment score for the document. This approach thus weighs the importance of (sub-)clauses based on their position and relation in the discourse tree, which is learned during the training phase. As a consequence, this allows us to enhance sentiment analysis with discourse information. Another key contribution is that we propose two techniques for data augmentation that facilitate training and yield higher predictive accuracy.

The remainder of this paper is structured as follows. Section 2 reviews discourse parsing and RST-based sentiment analysis. Section 3 then introduces our Discourse-LSTM, as well as our algorithms for data augmentation. Section 4 describes our experimental setup in order to evaluate the performance of our deep learning methods in comparison to common baselines (Section 5). Section 6 concludes with a summary and suggestions for future research.

Rhetorical structure theory

Rhetorical structure theory formalizes the discourse in narrative materials by organizing sub-clauses, sentences and paragraphs into a hierarchy (Mann & Thompson, 1988). The premise is that a document is split into elementary discourse units, which constitute the smallest, indivisible segments. These EDUs are then connected by one of 18 different relation types, which represent edges in the discourse tree; see Table 1 for a list. Each relation is further labeled by a hierarchy type, i.e., either

Discourse-based sentiment analysis with deep learning

This section introduces our discourse-based methodology, which infers sentiment scores from textual materials. Fig. 3 illustrates the underlying framework and divides the procedure into steps for discourse parsing, computing low-level polarity features, data augmentation and prediction. The prediction phase implements either of the baselines or our proposed Discourse-LSTM.

Datasets

We build upon earlier work and utilize three common datasets. The first consists of 2000 movie reviews from Rotten Tomatoes (Pang & Lee, 2004), for which we perform 10-fold cross-validation and then average the predictive performance across splits. The second dataset comprises 50000 reviews from the Internet Movie Database (IMDb), which are split evenly into 25000 reviews for training and 25000 for testing (Maas et al., 2011). It includes, at most, 30 reviews for any one movie, since reviews

Results

In this section, we evaluate the performance of our Discourse-LSTM and compare it to the previous baselines. In addition, we perform statistical significance tests on the receiver operating characteristics (ROC) (DeLong, DeLong, & Clarke-Pearson, 1988). The evaluation provides evidence that incorporating semantic structure into the task of sentiment analysis improves the predictive performance.
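As background for the ROC-based comparison, the sketch below computes the area under the ROC curve (AUC) as the probability that a randomly chosen positive document is scored above a randomly chosen negative one. It covers only the AUC itself, not the DeLong significance test, and the labels and scores are invented.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented predictions for four documents (1 = positive, 0 = negative).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))
```

The pairwise loop is quadratic in the number of documents, which is fine for a sketch; rank-based implementations are preferable for large test sets.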

Conclusion

Deep learning for natural language predominantly builds upon sequential models such as LSTMs. While these models usually achieve a high predictive power when applied to short texts, the complexity of linguistic discourse hampers performance for longer documents. As a remedy, our paper proposes an innovative, discourse-aware approach: we first parse the semantic structure based on rhetorical structure theory, thereby mapping the document onto a discourse tree that encodes its storyline. We then

Declarations of interest:

none

Acknowledgment

The valuable contribution of Ryan Grabowski is greatly appreciated.

References (49)

  • M. Kraus et al. Decision support from financial disclosures with deep neural networks and transfer learning. Decision Support Systems (2017)

  • S. Liu et al. Discovering sentiment sequence within email data through trajectory representation. Expert Systems with Applications (2018)

  • M.M. Mostafa. More than words: Social networks’ text mining for consumer brand sentiments. Expert Systems with Applications (2013)

  • H. Rui et al. Whose and what chatter matters? The effect of tweets on movie sales. Decision Support Systems (2013)

  • K. Tanaka. A sales forecasting model for new-released and nonlinear sales trend products. Expert Systems with Applications (2010)

  • B. Weng et al. Predicting short-term stock prices using ensemble methods and online data sources. Expert Systems with Applications (2018)

  • Q. Ye et al. Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Systems with Applications (2009)

  • S. Yoo et al. Social media contents based sentiment analysis and prediction system. Expert Systems with Applications (2018)

  • O. Abend et al. The state of the art in semantic representation. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL ’17) (2017)

  • S. Baccianella et al. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. Proceedings of the International Conference on Language Resources and Evaluation (LREC ’10) (2010)

  • Y. Bengio et al. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks (1994)

  • P. Bhatia et al. Better document-level sentiment analysis from RST discourse parsing. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’15) (2015)

  • E.R. DeLong et al. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics (1988)

  • R. Feldman. Techniques and applications for sentiment analysis. Communications of the ACM (2013)