Article

SocialTERM-Extractor: Identifying and Predicting Social-Problem-Specific Key Noun Terms from a Large Number of Online News Articles Using Text Mining and Machine Learning Techniques

Department of Management Information Systems, BERI, Gyeongsang National University, 501 Jinjudae-ro Jinju-si, Gyeongsangnam-do 52828, Korea
Sustainability 2019, 11(1), 196; https://doi.org/10.3390/su11010196
Submission received: 11 November 2018 / Revised: 23 December 2018 / Accepted: 25 December 2018 / Published: 2 January 2019
(This article belongs to the Section Economic and Business Aspects of Sustainability)

Abstract:
In the digital age, the abundance of unstructured data on the Internet, particularly online news articles, provides opportunities for identifying social problems and understanding social systems for sustainability. However, previous works have paid little attention to the social-problem-specific perspectives of such big data, and it remains unclear how information technologies can use these data to identify and manage ongoing social problems. In this context, this paper introduces and focuses on social-problem-specific key noun terms, namely SocialTERMs, which can be used not only to search the Internet for social-problem-related data, but also to monitor ongoing and future social-problem events. Moreover, to reduce the time-consuming human effort involved in identifying SocialTERMs, this paper designs and examines the SocialTERM-Extractor, an automatic approach that identifies the key noun terms of social-problem-related topics, namely SPRTs, in a large number of online news articles and predicts the SocialTERMs among the identified key noun terms. This paper is novel as the first attempt to identify and predict SocialTERMs from a large number of online news articles, and it contributes to the literature by proposing three types of text-mining-based features, namely temporal weight, sentiment, and complex network structural features, and by comparing the performances of these features with various machine learning techniques, including deep learning. In particular, when applied to a large number of online news articles published in South Korea over a 12-month period and mostly written in Korean, the experimental results showed that Boosting Decision Tree performed best with the full feature set, and that the proposed SocialTERM-Extractor can predict SocialTERMs with high performance.
Ultimately, this paper can benefit individuals or organizations who want to explore and use social-problem-related data systematically to understand and manage social problems, even if they are unfamiliar with ongoing social problems.

1. Introduction

1.1. Social Problems and Challenging Issues for Identifying Ongoing Social Problems

Facing social problems that challenge well-being and sustainability, e.g., the high suicide rate and fine-dust air pollution in South Korea, research and development (R&D) projects have been promoted, in addition to political and administrative measures, not only to solve social problems but also to improve the quality of people’s lives by doing so. Subsequently, technologies for solving social problems have attracted increased attention and driven the expansion of social entrepreneurship around the world. Thus, both the public and private sectors have started to place more emphasis than before on technologies related to social problems [1,2,3]. Moreover, technological knowledge shares, e.g., patent documents and open research articles, are now more easily accessible than ever before, and this provides opportunities for solving social problems through technologies rather than government policies alone. Therefore, it is necessary to facilitate the exploration of technological knowledge shares for solving social problems, and we should focus on identifying ongoing social problems because doing so is the starting point for linking social problems to state-of-the-art technologies as solutions.
At the same time, with the emergence of Web 2.0 and social media, the amount of unstructured, textual data on the Internet has grown tremendously, especially at the micro level, which involves human behaviors such as the tweets of an individual person, the textual expression in blog posts over a period of time, and furthermore the activities sensed by digital sensors in the so-called Internet of Things [4]. This abundance of publicly available textual data creates new opportunities for both qualitative and quantitative researchers in various areas related to data and information sciences. In particular, the big data available on the Internet provide opportunities for identifying social problems. A large amount of event-related textual data, e.g., online news articles and web forum posts, contains data and information that are useful for taking an overview of ongoing social problems, e.g., the types of ongoing social problems and their progress. In this respect, several information technologies can be applied, from information retrieval (IR) to text mining.
However, previous works have not paid attention to the social-problem-specific perspectives of big data, so it is currently unclear how information technologies can be used to identify and manage ongoing social problems from such data. In detail, the following challenging issues need to be resolved:
First, various social problems are in process simultaneously, and they occur in multiple streams of events. Therefore, it is a nontrivial task to determine the landscape of ongoing social problems from a large amount of the event-related textual data, e.g., online news articles and web forum posts. Particularly when individual persons are unfamiliar with the ongoing social problems, it is difficult for them to identify the ongoing social problems from such big data [5]. Therefore, it is necessary to identify the ongoing social problems and their key terms, which can represent the ongoing social problems, in an automatic way.
Second, most people use nouns as key terms to obtain data and information about ongoing social problems because nouns are relevant to topics, whereas the other types of key terms, such as verbs, adjectives, and adverbs, are more relevant to sentiments than topics [6,7,8]. In addition, for the same reason, it is harder for people to come up with key noun terms related to social problems than with the other types of key terms. This means that key noun terms are crucial for successfully obtaining data and information about ongoing social problems. Hence, we need to focus on key noun terms for finding social-problem-related data and information.
Third, some of the key noun terms can be more useful for figuring out the topics of ongoing social problems from big data because they play roles in categorizing those social-problem-related topics (SPRTs) into their corresponding social problems. For example, let us assume that two SPRTs were detected, and each SPRT was represented by five key noun terms, namely topic1 = {Suwon, Suicide, Student, Female, Police} and topic2 = {Daegu, Suicide, Student, Male, Violence}. Then, the key noun terms, such as Suicide, Student, Female/Male, and Violence, can be social-problem-specific key noun terms (hereinafter, SocialTERMs). Particularly, Suicide indicates that the two SPRTs can be grouped into the same type of social problems. In contrast, the city names, namely Suwon and Daegu, are event-specific key noun terms (hereinafter, EventTERMs), which specify that the two SPRTs occurred at different locations. Thus, the different roles of such key noun terms in labeling the identified SPRTs need to be considered.
Lastly, there have been no previous works on identifying the SocialTERMs from a large number of online news articles in an automatic way. Consequently, manual annotation after reading a large amount of textual data is unavoidable for now in extracting the SocialTERMs for the detected SPRTs. However, it requires significant human efforts, and it is labor-intensive, expensive, time-consuming, and often error-prone. In addition, to reflect the importance of key noun terms in the textual data at different levels, e.g., the document level and the detected topic level, several weighting schemes have been used in previous works, e.g., tf, idf, and tfidf [9]. However, it is unknown whether those weighting schemes can reflect the different roles of key noun terms in representing social problems over time.

1.2. Key Term Identification in the Previous Text Mining Applications

With the emergence of Web 2.0 and social media, the amount of unstructured data, most of which are textual and publicly available on the Internet, has increased massively, especially the amount of data on individual entities such as persons and companies. This big data creates new opportunities for both qualitative and quantitative researchers of data and information sciences. Thus, big data is essential not only for scientific research on social systems but also for businesses and individuals [10], and it is important to develop a method that helps people obtain the relevant data quickly and accurately from big data and analyze it. To address this need, text mining has been employed, with a focus on analyzing the statistical properties of terms [11]. In particular, key terms are essential for exploring the overall data set, and text mining uses them to inspect and process the obtained texts through typical preparation steps.
Table 1 shows recent studies (2014–2018) in which key terms were extracted and used for text mining applications. The previous works in Table 1 can be grouped according to the final application of their identified key terms: indexing, clustering, summarization, classification (or categorization), or mapping [12,13]. Indexing, in which textual data are represented with a set of extracted key terms, is a research goal on its own, and is also a step in most text mining applications, such as feature generation and text representation [14]. Clustering is to group textual data on the basis of their attributes to identify important themes, patterns, or trends [15], and it is employed for topic detection (TD) [16]. Summarization focuses on creating a summary that contains the most important points of the original documents [17]. Classification assigns textual data to two or more categories [18,19]. Mapping focuses on information visualization and supports effective and efficient searches of important subjects or topic areas, which are identified from textual data [15,20].
In addition, Table 1 provides three taxonomies for the key term identification of the previous works. First, the key terms can be discovered to best describe the textual data at different levels: the sentence level [21,22,23], the document level [24], and the topic level [6]. Second, the key term identifications used in the previous text mining applications can be divided into three categories: manual, automatic, and hybrid approaches. Third, particularly for the automatic approach, four types of techniques have been used: statistical, linguistic, machine-learning-based, and hybrid approaches [12,24,25]. While the statistical approaches do not require any learning mechanism and use statistical information of terms, e.g., tf, idf, and tfidf [26,27], the linguistic approaches use linguistic features of terms, e.g., parsing, sentiment analysis, and semantics [28,29]. Machine-learning-based approaches use key terms that are extracted from the collected textual data by means of a training process and apply them to a machine learning model to find key terms in new textual data [12,28]. The hybrid approaches combine two or more of these techniques [17,30].
According to Table 1, most prior text mining applications are either automatic or hybrid approaches, and they have adopted either statistical or hybrid techniques for extracting their key terms. Theoretically, under these taxonomies, this study can be categorized as classifying the topic-level key noun terms from online news articles into the SocialTERMs and the EventTERMs by adopting a hybrid technique, as highlighted in Table 1. Consequently, according to Table 1, the key theoretical contributions of this paper can be summarized as follows:
First, to the best of our knowledge, based on Table 1, there exists a research gap that no previous work has dealt with: the automatic classification of key noun terms that were identified by TD into the SocialTERMs and the EventTERMs. This paper contributes to addressing this research gap.
Second, to label the identified topics, most of the previous works in Table 1 used simple statistical approaches for characterizing key terms from clustered documents. On the other hand, this paper proposes and employs temporal weight features, sentiment features, and complex network structural features to represent key noun terms, which can be identified to label the detected SPRTs, after reviewing the features that were used in the previous works of Table 1.
Third, according to Table 1, no previous study has compared the performances of the state-of-the-art classification techniques, particularly deep learning, for distinguishing between the SocialTERMs and the EventTERMs among the key terms of the detected SPRTs. This paper extends the related literature by taking on such a challenging issue.

1.3. Purpose and Organization of This Paper

To resolve the abovementioned challenging issues, this paper proposes an automatic approach, namely SocialTERM-Extractor, which identifies the SPRTs from a large number of Korean online news articles and classifies the key noun terms of the detected SPRTs into the SocialTERMs and the EventTERMs.
To design and examine the proposed approach, three research questions can be formulated as below, and a research framework is constructed to answer those research questions:
  • RQ1. How well do the three types of features, namely temporal weight, sentiment, and complex network structural features, perform in distinguishing the SocialTERMs and the EventTERMs among the key noun terms of the detected SPRTs from a large number of online news articles by using different classification techniques? Moreover, which feature set and features give the best results?
  • RQ2. Which classification technique among the five base learners, namely Decision Tree (DT), Naïve Bayes (NB), Radial Basis Function Network (RBFN), Support Vector Machine (SVM), and Deep Belief Network (DBN), is best suited for differentiating the key noun terms of the detected SPRTs into the SocialTERMs and the EventTERMs?
  • RQ3. Which ensemble learning method gives the best results? Is there a single ensemble method that achieves the best performances for all feature sets with any given base learner?
The rest of the paper is organized as follows: Section 2 outlines the research framework proposed to design and examine the SocialTERM-Extractor and explains it in detail. Section 3 presents the results of applying the suggested research framework to online news articles collected from the best-known news portal site in South Korea. Section 4 discusses the application results in terms of designing an automatic system. Finally, Section 5 presents the conclusions of this paper with reflections on limitations and future works.

2. Materials and Methods

To answer the research questions in the previous section, a research framework is proposed, as summarized in Figure 1. First, online news articles reporting social-problem-related events are collected from a test-bed news portal site. Second, sentiment analysis selects the online news articles with negative sentiment from the collected data. Then, from the online news articles with negative sentiment, the SPRTs are detected and labelled by their key noun terms. Third, the three types of features regarding the key noun terms of the detected SPRTs are measured: temporal weight, sentiment, and complex network structural features. Fourth, the configurations of different feature sets and different classification techniques are evaluated with respect to three performance measures, namely accuracy, F-measure, and area under the curve (AUC). Comparative studies are then performed for the different feature sets and classification techniques. The following subsections explain the steps of the proposed research framework in detail.

2.1. Collect Data

In the first component of Figure 1, news sections related to society are targeted for data collection. Then, the data collection is performed mainly by two steps, crawling and parsing:
First, a distributed web-crawling program is developed to collect the online news articles from the Internet in a significantly reduced timespan. In detail, the distributed web-crawling program is based on the simple remote procedure call (SRPC) framework, in which two tasks from a master computer are delivered to slave computers with various hardware configurations, i.e., a uniform resource identifier (URI) to crawl and how to crawl the given URI. Consequently, a large number of online news articles, published in the chosen society-related news sections of a test-bed news portal service, are collected as raw HTML pages.
Second, the textual data in <title>…</title> and <content>…</content> of the collected online news articles are parsed out from the raw HTML pages, and are stored in a relational database. In addition, the publication date of each online news article is stored in the database for the TD. This results in NEWS0,t = {news | online news articles, published at time t and collected from the chosen society-related news sections of the test-bed news portal service} for time t = 1, …, T.
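The parse-and-store step above can be sketched in Python. This is a minimal illustration, not the actual pipeline: the `<title>`/`<content>` extraction uses regular expressions, and the table schema and sample markup are assumptions made for the example.

```python
import re
import sqlite3

def parse_article(raw_html):
    """Pull the <title>...</title> and <content>...</content> text out of a
    raw HTML page. A regular-expression sketch; a production parser for the
    portal's real markup would be more robust."""
    title = re.search(r"<title>(.*?)</title>", raw_html, re.S)
    content = re.search(r"<content>(.*?)</content>", raw_html, re.S)
    return (title.group(1).strip() if title else "",
            content.group(1).strip() if content else "")

def store_article(conn, title, content, pub_date):
    """Store one parsed article; the publication date is kept for the TD step."""
    conn.execute("INSERT INTO news (title, content, pub_date) VALUES (?, ?, ?)",
                 (title, content, pub_date))

# Hypothetical usage with an in-memory database and sample markup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (title TEXT, content TEXT, pub_date TEXT)")
raw = "<html><title>Fine dust warning</title><content>Seoul issued an alert.</content></html>"
title, content = parse_article(raw)
store_article(conn, title, content, "2018-03-02")
```

Each stored row then contributes one element of NEWS0,t for the publication time t recorded in pub_date.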

2.2. Detect Social-Problem-Related Topics (SPRTs)

2.2.1. Select Online News Articles with Negative Sentiment

In this step, online news articles with negative sentiment are selected in order to focus on the online news articles most related to social problems. Considering wide applicability to different languages, the sentiment of an online news article is obtained based on a multilingual sentiment feature set, built in the two main steps proposed in Dang et al. [43]: the extraction of an English sentiment feature set from SentiWordNet (http://sentiwordnet.isti.cnr.it/), and the construction of a multilingual sentiment feature set.
To explain, for the multilingual sentiment features in SentiWordNet, the average polarity score is calculated by using the prior–polarity formula, defined as
$$score(senti, pos, pol) = \frac{\sum_{synset \in SYNSET(senti,\,pos,\,pol)} swnscore(synset, pos, pol)}{n(SYNSET(senti,\,pos,\,pol))},$$
where senti is a sentiment feature in SentiWordNet, pos is a sentiment related part-of-speech (POS) sense, pos ∈ {verb, adverb, adjective}, pol is a type of polarity scores, pol ∈ {objective, positive, negative}, SYNSET(senti, pos, pol) is a set of synsets, i.e., synonyms, belonging to senti when pos and pol are given, and swnscore(synset, pos, pol) is the SentiWordNet score of synset with given pos and pol.
From score(senti, pos, pol), the final sentiment score is determined by the sentiment feature–calculation strategy. In the strategy, the sentiment features satisfying both score(senti, pos, pol = objective) < 0.5 and |score(senti, pos, pol = negative)| ≠ |score(senti, pos, pol = positive)| are taken into account, and the final negative sentiment score of a multilingual sentiment feature is calculated as
$$finalnegscore(senti, pos) = \begin{cases} 0 & \text{if } |score(senti, pos, pol{=}\text{negative})| < |score(senti, pos, pol{=}\text{positive})|, \\ |score(senti, pos, pol{=}\text{negative})| & \text{otherwise}. \end{cases}$$
Then, using the constructed multilingual sentiment feature set, the sentiment score of an online news article, news, is obtained by
$$newsnegscore(news) = \frac{1}{3} \sum_{pos} \frac{\sum_{senti \in NEWSSENTI(news)} finalnegscore(senti, pos)}{n(NEWSSENTI(news))},$$
where NEWSSENTI(news) is the set of multilingual sentiment features appearing in news ∈ NEWS0,t.
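The three formulas above can be sketched in Python. This is a hedged illustration: the input layout (a mapping from (senti, pos) pairs to negative/positive prior-polarity scores) is an assumption for the example, and the objectivity filter score(senti, pos, pol = objective) < 0.5 is assumed to have been applied when the feature set was built.

```python
def score(swn_scores):
    """Prior-polarity score: average the SentiWordNet scores over a
    sentiment feature's synsets for a given POS sense and polarity."""
    return sum(swn_scores) / len(swn_scores)

def final_neg_score(neg, pos_):
    """finalnegscore: 0 when |negative| < |positive|, else |negative|."""
    return 0.0 if abs(neg) < abs(pos_) else abs(neg)

def news_neg_score(features):
    """newsnegscore: sum finalnegscore over the (senti, pos) features found
    in one article, normalized by the feature count and the three POS senses."""
    if not features:
        return 0.0
    total = sum(final_neg_score(neg, pos_) for neg, pos_ in features.values())
    return total / (3 * len(features))

# Hypothetical features found in one article: (senti, pos) -> (neg, pos) scores.
found = {("terrible", "adjective"): (-0.75, 0.0),
         ("instantly", "adverb"): (0.0, 0.25)}
```

An article is then kept for the TD step whenever news_neg_score(found) > 0.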
In particular, this study uses the multilingual sentiment feature set of English and Korean constructed by Suh [8] for the following reasons: it is based on the commonly used approach for measuring multilingual sentiment proposed by Dang, Zhang and Chen [43]; it enables researchers of other language cultures to make use of this study’s research framework; Korean, selected for this study, is taken into account as the non-English language for the multilingual sentiment features; the sentiments of synonyms of an English sentiment feature are considered for the corresponding Korean sentiment feature; and additional Korean sentiment features are generated and included to take negation into account.
To explain briefly, the Korean sentiment feature, constructed by Suh [8], inherits the final sentiment score and POS sense of its corresponding English sentiment feature, generated by Dang, Zhang and Chen [43]. For instance, as shown in Table A1 of Appendix A, the sentiment value of ‘더럽히/pvg+다/ef’ is −0.7500 for pos = verb, and comes from the corresponding English sentiment feature, ‘soil’. If the English sentiment feature has synonyms, the final sentiment scores of the synonyms are averaged for the corresponding Korean sentiment feature. For example, the sentiment value of ‘즉시/mag’ is the average of sentiment values from five English sentiment features: ‘instantly’, ‘straight_away’, ‘right_away’, ‘at_once’, and ‘swiftly’. Moreover, if the morphological analysis splits the Korean sentiment feature into a stem and ending(s), the extended Korean sentiment features are generated by adding various endings to the stem in possible POS senses and tenses. For instance, the stem of ‘더럽히/pvg+다/ef’ is ‘더럽히/pvg’, and the extended Korean sentiment features of ‘더럽히/pvg+다/ef’ are listed in Table A2 of Appendix A. They inherit sentiment values from the original Korean sentiment feature, ‘더럽히/pvg+다/ef’, and, when negation is added to their endings, −1 is multiplied to their sentiment values.
As a consequence, to select the online news articles most concerned with social problems, the online news articles with newsnegscore(news) > 0 are chosen for the TD of the next step. This leads to NEWS1,t = {news | online news articles, published at time t and selected from NEWS0,t as having negative sentiment} for time t = 1, …, T.

2.2.2. Detect the SPRTs from the Collected and Negative Online News Articles

An event is defined as a real-world incident that is related to time(s) and location(s), e.g., the 9/11 attacks of 2001, Hurricane Katrina of 2005, and North Korea’s nuclear weapon test [44]. Due to the rapid growth and popularity of the Web, when an event occurs, a large number of event-related textual data are published online [45]. Generally, online news articles are starting points, and Web 2.0 has recently led to the tremendous distribution of online news articles through individuals on social media [46,47]. As a result, managing, interpreting, and analyzing such a huge volume of event-related online news articles has become a difficult task. To address this, many online news articles that are related to a set of events and interconnected with one another need to be grouped into the same topic [16]. Then, such topics and their changes can be identified over time by using TD methods [48]. Formally, a topic is a seminal event that is associated with all related events, that is, a set of related events [49].
Therefore, to detect the topics of this study’s interest, i.e., the SPRTs, this step clusters the online news articles that were collected and evaluated as having negative sentiment. First, noun terms are identified from the online news articles through a series of natural language processing (NLP) techniques, i.e., spacing, part-of-speech (POS) tagging, regular-expression-based noun extraction, and stop word removal. Only noun terms are used for the TD for the following reasons: first, the target key terms of this study, i.e., the SocialTERMs and the EventTERMs, are noun terms according to the introduction of this paper; second, the other types of key terms, i.e., verbs, adjectives, and adverbs, are more relevant to sentiments than topics [6,7,8]. Next, let news be an online news article in NEWS1,t, and let noun be a noun term in NEWSNOUN0(news) = {noun | all noun terms in news}. Then, the weight score of noun in news ∈ NEWS1,t is obtained by
$$w(noun, news) = tf(noun, news) \times idf_t(noun) \times ths(noun, news).$$
Here, tf(noun, news) is the normalized frequency of noun appearing in news, and it is defined as
$$tf(noun, news) = \frac{f(noun, news)}{\max_{noun' \in NOUN(news)} f(noun', news)},$$
where f(noun, news) is the frequency of noun in <content>…</content> of news. In addition, idft(noun) is the inverse document frequency of noun, defined as
$$idf_t(noun) = \log\!\left(\frac{H_t}{h_t(noun)}\right),$$
where ht(noun) is the number of online news articles containing noun among online news articles in NEWS1,t, and Ht is the number of online news articles in NEWS1,t. On the other hand, ths(noun, news) is the existence of noun in <title>…</title> of news, given by
$$ths(noun, news) = \begin{cases} 1 & \text{if } noun \text{ appears in } \texttt{<title>…</title>} \text{ of } news, \\ 0.5 & \text{otherwise}. \end{cases}$$
Using the obtained w(noun, news) values, the five noun terms with the highest weights are selected as the key noun terms for news. This results in NEWSNOUN1(news) = {noun | five key noun terms for news}, and news is represented by the vector of its five key noun terms because of its simplicity compared to the other textual representation models, e.g., the graph-based model and the fuzzy set model [50,51]. One may ask how to decide the number of key noun terms for an online news article; this study follows the number of key noun terms used to represent an online news article in the previous works, i.e., three to five keywords [6,8,52,53]. Therefore, in this study, the number of key noun terms for an online news article is set to five by default.
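The weighting and key-noun selection above can be sketched as follows, assuming each article has already been reduced by the NLP step to its title nouns and content nouns; the dictionary layout is an assumption for the example.

```python
import math
from collections import Counter

def key_noun_terms(articles, k=5):
    """Select the top-k noun terms per article by
    w(noun, news) = tf(noun, news) * idf_t(noun) * ths(noun, news).
    `articles` is a list of dicts with "title_nouns" (set) and
    "content_nouns" (list of noun tokens)."""
    n_docs = len(articles)
    df = Counter()
    for a in articles:
        df.update(set(a["content_nouns"]))
    keys = []
    for a in articles:
        freq = Counter(a["content_nouns"])
        max_f = max(freq.values())
        w = {}
        for noun, f in freq.items():
            tf = f / max_f                                   # normalized frequency
            idf = math.log(n_docs / df[noun])                # inverse document frequency
            ths = 1.0 if noun in a["title_nouns"] else 0.5   # title bonus
            w[noun] = tf * idf * ths
        keys.append([n for n, _ in sorted(w.items(), key=lambda x: -x[1])[:k]])
    return keys
```

Each returned list is one NEWSNOUN1(news), i.e., the weight vector that represents the article in the clustering step.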
Then, Algorithm 1 is adopted to cluster the online news articles in NEWS1,t for t = 1, …, T. Algorithm 1 is a modified version of the algorithm used in He, Chang, Lim and Banerjee [16] and Suh [8], which is widely used and known to be effective for TD because it overcomes the following drawbacks: while previous TD models are broadly classified into two types, i.e., non-probabilistic and probabilistic [16], non-probabilistic models do not provide the number of topic clusters, and existing probabilistic models, especially latent Dirichlet allocation (LDA), tend to be overly complex for TD problems.
Consequently, Algorithm 1 extracts the topics of similar online news articles from NEWS1,t for t = 1, …, T. Over the iterations of Algorithm 1, the centroid of each topic keeps at most α key noun terms while excluding less important key noun terms. As a result, Algorithm 1 yields TOPIC = {topic | SPRTs detected from NEWS1,t for t = 1, …, T}, TOPICNEWS(topic) = {news | online news articles classified to topic ∈ TOPIC}, and TOPICNOUN(topic) = {noun | key noun terms in the centroid of topic ∈ TOPIC}.
Algorithm 1 Detecting the SPRTs from online news articles in NEWS1,t (t = 1, …, T).
Input: Online news articles in NEWS1,t and their noun score vectors, and threshold ε
Output: TOPIC, TOPICNEWS(topic), and TOPICNOUN(topic)
1: for time t = 1 (i.e., the first publication date among online news articles of NEWS1,t) to t = T (i.e., the last publication date among online news articles of NEWS1,t) do
2:   select online news articles in NEWS1,t;
3:   if t = 1 and n(TOPIC) = 0 then
4:     create a topic, set the online news article as the centroid of the new topic, and announce it;
5:   else
6:     for each online news article of NEWS1,t do
7:       compute the cosine similarity of the online news article with the centroid of each topic in TOPIC, defined as
$$sim(\vec{v}_{centroid}, \vec{v}_{news}) = \frac{\vec{v}_{centroid} \cdot \vec{v}_{news}}{\|\vec{v}_{centroid}\|\,\|\vec{v}_{news}\|},$$
where $\vec{v}_i$ is a weight vector, $\vec{v}_i \cdot \vec{v}_j$ is the dot product of two weight vectors, and $\|\vec{v}_i\|$ is the magnitude of $\vec{v}_i$;
8:       if cosine similarity > threshold ε then
9:         assign the new article to the nearest topic and update the centroid of the nearest topic with the new article by averaging their weight vectors;
10:      else
11:        create a new topic, assign the online news article to the new topic, and announce it;
12:      end if
13:      if the number of noun terms in the updated centroid > α then
14:        keep only the top α noun terms with the highest weight scores as key noun terms;
15:      end if
16:    end for
17:  end if
18: end for
19: select topics with more than β online news articles, and define them as the SPRTs.
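Algorithm 1 can be sketched as a single-pass incremental clustering in Python. This is an illustrative reimplementation under stated assumptions, not the authors' code: articles arrive in publication order as sparse weight vectors, and eps, alpha, and beta stand in for the thresholds ε, α, and β.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse weight vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def detect_sprts(stream, eps, alpha, beta):
    """Single-pass TD sketch of Algorithm 1. `stream` yields
    (news_id, weight_vector) pairs in publication order."""
    topics = []  # each topic: {"centroid": dict, "news": [news ids]}
    for news_id, vec in stream:
        best, best_sim = None, 0.0
        for t in topics:
            s = cosine(t["centroid"], vec)
            if s > best_sim:
                best, best_sim = t, s
        if best is not None and best_sim > eps:
            # assign to the nearest topic; update its centroid by averaging
            c = best["centroid"]
            for k in set(c) | set(vec):
                c[k] = (c.get(k, 0.0) + vec.get(k, 0.0)) / 2
            if len(c) > alpha:  # keep only the top-alpha noun terms
                best["centroid"] = dict(sorted(c.items(), key=lambda x: -x[1])[:alpha])
            best["news"].append(news_id)
        else:
            topics.append({"centroid": dict(vec), "news": [news_id]})
    # topics with more than beta member articles are reported as SPRTs
    return [t for t in topics if len(t["news"]) > beta]
```

The surviving centroids correspond to TOPICNOUN(topic), and the member lists to TOPICNEWS(topic).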

2.3. Measure the Three Types of Features to Represent the Key Noun Terms of the SPRTs

To label the identified topics, most of the previous works in Table 1 used simple statistical approaches for characterizing key terms from clustered documents. In contrast, this paper proposes and employs temporal weight features, sentiment features, and complex network structural features to represent key noun terms, which can be identified to label the detected SPRTs, after reviewing the features that were used in the previous works of Table 1. Details of the proposed three features can be explained as follows:
Temporal weight features. Temporal IR attempts to consider not only relevance but also temporal correspondence based on the underlying temporal factor behind search intention. A relatively large number of key noun terms, i.e., queries, for information access have temporal information needs [54]. Hence, to represent the temporally changing importance of a key noun term in the identified topics, this study modifies the traditional weighting statistics, e.g., tf, idf, and tfidf, by taking time into account, which yields the temporal weight features. In addition, basic statistics such as the mean, variance, and |skewness| are measured for the temporal weight features to consider their distributional characteristics over the given time period. Here, the absolute value of skewness is used to measure the shape of skewness irrespective of whether the distribution is skewed to the left/negative or to the right/positive.
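The basic statistics named above (mean, variance, and |skewness|) can be computed for a term's weight series with the standard library; the population-moment form of skewness is an assumption, as the paper does not specify an estimator.

```python
import statistics

def temporal_stats(series):
    """Mean, variance, and |skewness| of a key noun term's temporal weight
    series; the absolute value keeps the shape of skewness regardless of
    its direction."""
    m = statistics.mean(series)
    var = statistics.pvariance(series)
    sd = var ** 0.5
    if sd == 0:
        return m, var, 0.0
    skew = sum((x - m) ** 3 for x in series) / (len(series) * sd ** 3)
    return m, var, abs(skew)
```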
Sentiment features. The sentiment features of a key noun term are measured by sentiment analysis on the large-scale online news articles. In general, sentiment analysis determines whether a textual data instance is objective or subjective and whether a subjective textual data instance contains positive or negative statements, and measures the sentiment value of a subjective textual data instance [55,56]. In this paper, the approach of Suh [8] that uses SentiWordNet as a lexicon is adopted to extract multilingual sentiment features and score their sentiment values mainly for two reasons: it enables researchers in the other countries to use the research framework of this paper by constructing sentiment features with their own languages; and it takes into account the negations. In addition, this study exploits the basic statistics of a key noun term’s sentiment features to represent the distributional characteristics over the news and topics that contain the key noun term.
Complex network structural features. Using the co-occurrence relationships of the key noun terms as links, which are called co-news and co-topic links, the complex networks of the key noun terms are constructed, and their complex network structural properties are measured by referring to the standard measures of node centrality, i.e., the degree, closeness, and betweenness centralities [57,58,59], and used as features for this study. In addition, after specifying a boundary, such as identified SPRTs and detected topical communities, to the complex networks of the key noun terms, the basic statistics are measured to represent the distributions of a key noun term’s in-boundary network properties over the different SPRTs and topical communities.
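A stdlib-only sketch of the co-occurrence network and two of the named centralities (degree and closeness); real analyses would typically rely on a network library, and the input layout (key noun terms grouped per SPRT) is an assumption for the example.

```python
from collections import defaultdict, deque

def build_cooccurrence(topic_nouns):
    """Undirected co-topic links: connect two key noun terms whenever they
    label the same SPRT. `topic_nouns` is a list of noun lists."""
    adj = defaultdict(set)
    for nouns in topic_nouns:
        for a in nouns:
            for b in nouns:
                if a != b:
                    adj[a].add(b)
    return adj

def degree_centrality(adj):
    """Degree of each node, normalized by the maximum possible degree."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}

def closeness_centrality(adj, v):
    """(reachable - 1) / sum of BFS shortest-path distances from v."""
    dist, queue = {v: 0}, deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```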
Thus, in Section 2.3, the proposed three types of features are measured for all the extracted key noun terms of the SPRTs, i.e., noun ∈ ∪_{topic∈TOPIC} TOPICNOUN(topic). The three types of features measured for a topic-level key noun term are used to decide automatically whether it is a SocialTERM or an EventTERM in Section 2.4.

2.3.1. Measure the Temporal Weight Features of the SPRTs’ Key Noun Terms

The temporal weight features of noun ∈ ∪_{topic∈TOPIC} TOPICNOUN(topic), namely F1, are measured in four respects: df, tf, ths, and idf. Moreover, these temporal weight features are measured at two different levels: the news level and the topic level.
First, the temporal weight features of noun at the news level are measured as follows: Given that NOUNNEWS_t(noun) is the set of online news articles that contain noun and were published at time t, dfscore_{1,t}(noun) is the normalized number of online news articles in NOUNNEWS_t(noun), given by

$$ dfscore_{1,t}(noun) = \frac{n(NOUNNEWS_t(noun))}{n\left(\bigcup_{noun} NOUNNEWS_t(noun)\right)}. \tag{9} $$

Given that tf(noun, news) is the frequency of noun in the content of news ∈ NOUNNEWS_t(noun), tfscore_{1,t}(noun) is obtained by normalizing tf(noun, news) by the number of online news articles in NOUNNEWS_t(noun), defined as

$$ tfscore_{1,t}(noun) = \frac{\sum_{news \in NOUNNEWS_t(noun)} tf(noun, news)}{n(NOUNNEWS_t(noun))}. \tag{10} $$

titlescore_{1,t}(noun) is the normalized ths(noun, news) over the online news articles in NOUNNEWS_t(noun), given by

$$ titlescore_{1,t}(noun) = \frac{\sum_{news \in NOUNNEWS_t(noun)} ths(noun, news)}{n\left(\bigcup_{noun} NOUNNEWS_t(noun)\right)}, \tag{11} $$

where ths(noun, news) is 2 if noun appears in the title of news, and 1 otherwise.
idfscore_{1,t}(noun) is the inverse of the number of online news articles containing noun at time t, defined as

$$ idfscore_{1,t}(noun) = \log\left(\frac{n\left(\bigcup_{topic} TOPICNEWS_t(topic)\right)}{n(NOUNNEWS_t(noun))}\right). \tag{12} $$

To represent the distribution of each of Equations (9)–(12) over time t = 1, …, T, the mean, variance, and |skewness| are measured and added as the news-level temporal weight features of noun to F1. As a consequence, 12 features are measured as the news-level temporal weight features of noun.
Second, the temporal weight features of noun at the topic level are obtained as follows: Given that NOUNTOPIC_t(noun) is the set of detected SPRTs that contain online news articles in NOUNNEWS_t(noun) and are thereby related to noun, dfscore_{2,t}(noun) is the normalized number of detected SPRTs in NOUNTOPIC_t(noun), defined as

$$ dfscore_{2,t}(noun) = \frac{n(NOUNTOPIC_t(noun))}{n\left(\bigcup_{noun} NOUNTOPIC_t(noun)\right)}. \tag{13} $$

tfscore_{2,t}(noun) is obtained by normalizing tfscore_{1,t}(noun) over the detected SPRTs related to noun, defined as

$$ tfscore_{2,t}(noun) = \frac{tfscore_{1,t}(noun)}{n(NOUNTOPIC_t(noun))}. \tag{14} $$

In the same way, titlescore_{2,t}(noun) is obtained by normalizing titlescore_{1,t}(noun) over the detected SPRTs related to noun, given by

$$ titlescore_{2,t}(noun) = \frac{titlescore_{1,t}(noun)}{n(NOUNTOPIC_t(noun))}, \tag{15} $$

and idfscore_{2,t}(noun) is idfscore_{1,t}(noun) normalized over the detected SPRTs related to noun, given by

$$ idfscore_{2,t}(noun) = \log\left(\frac{n(TOPIC_t)}{n(NOUNTOPIC_t(noun))}\right). \tag{16} $$

To represent the distribution of each of Equations (13)–(16) over time t = 1, …, T, 12 topic-level temporal weight features of noun are measured and added to F1. Consequently, Table 2 shows the 24 temporal weight features of noun, measured at both the news and topic levels.
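The news-level scores and their distributional summaries can be sketched as follows. This is a minimal illustration on a made-up toy corpus (article contents, time slices, and the noun "bullying" are all invented for the example), not the paper's pipeline:

```python
import math
from statistics import mean, pvariance

# Toy corpus: each article = (time slice, title tokens, body tokens).
# All data here are illustrative, not drawn from the study's corpus.
CORPUS = [
    (1, ["school", "bullying"], ["school", "bullying", "teen", "bullying"]),
    (1, ["economy"],            ["economy", "tax", "school"]),
    (2, ["bullying", "report"], ["bullying", "school", "police"]),
]

def news_at(t):
    return [a for a in CORPUS if a[0] == t]

def df_score(noun, t):
    """Eq. (9): share of time-t articles whose body contains noun."""
    slice_ = news_at(t)
    hits = [a for a in slice_ if noun in a[2]]
    return len(hits) / len(slice_)

def tf_score(noun, t):
    """Eq. (10): average within-article frequency of noun at time t."""
    hits = [a for a in news_at(t) if noun in a[2]]
    return sum(a[2].count(noun) for a in hits) / len(hits) if hits else 0.0

def idf_score(noun, t):
    """Eq. (12): log of total articles over articles containing noun."""
    slice_ = news_at(t)
    hits = [a for a in slice_ if noun in a[2]]
    return math.log(len(slice_) / len(hits)) if hits else 0.0

def abs_skewness(xs):
    """|skewness| of a sample, used to summarize a score over t = 1..T."""
    mu, var = mean(xs), pvariance(xs)
    if var == 0:
        return 0.0
    return abs(mean([(x - mu) ** 3 for x in xs]) / var ** 1.5)

# Distributional summary (mean, variance, |skewness|) of df over time:
series = [df_score("bullying", t) for t in (1, 2)]
summary = (mean(series), pvariance(series), abs_skewness(series))
```

The same mean/variance/|skewness| summary would then be computed for each of the four scores, at both the news and topic levels, to produce the 24 features of F1.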

2.3.2. Measure the Sentiment Features of the SPRTs’ Key Noun Terms

This component extracts features related to the sentiment of noun, namely F2. To do so, this paper adopts the multilingual sentiment feature set constructed by Suh [8] through two main steps: the extraction of an English sentiment feature set from SentiWordNet (http://sentiwordnet.isti.cnr.it/), and the construction of the multilingual sentiment feature set.
Let NOUNSENTI(noun) be the set of the constructed multilingual sentiment features that contain noun. Then, the sentiment score of the multilingual sentiment features including noun, for a given pos, is obtained by

$$ featuresentiscore(noun, pos) = \frac{\sum_{senti \in NOUNSENTI(noun)} finalscore(senti, pos)}{n(NOUNSENTI(noun))}. \tag{17} $$

In addition, the sentiment score of noun is defined as

$$ nounsentiscore(noun) = \frac{1}{3} \sum_{pos} featuresentiscore(noun, pos). \tag{18} $$
The sentiment score of noun at the news level is obtained by averaging the sentiment scores of the online news articles containing noun, and it is given by

$$ sentiscore_1(noun) = \frac{\sum_{news \in NOUNNEWS(noun)} newssentiscore(news)}{n(NOUNNEWS(noun))}, \tag{19} $$

where NOUNNEWS(noun) = NOUNNEWS_1(noun) ∪ … ∪ NOUNNEWS_T(noun) for time t = 1, …, T. Here, the sentiment score of an online news article, news, is given by

$$ newssentiscore(news) = \frac{1}{3} \sum_{pos} \frac{\sum_{senti \in NEWSSENTI(news)} finalscore(senti, pos)}{n(NEWSSENTI(news))}, \tag{20} $$

where NEWSSENTI(news) is the set of multilingual sentiment features appearing in news. In addition, to represent the distribution of the sentiment scores of the online news articles that contain noun, the variance and |skewness| of newssentiscore(news) are measured over news ∈ NOUNNEWS(noun), and they are added as sentiment features of noun to F2. Here, the mean value of newssentiscore(news) is equal to sentiscore_1(noun).
The sentiment score of noun at the topic level is defined as

$$ sentiscore_2(noun) = \frac{\sum_{topic \in NOUNTOPIC(noun)} topicsentiscore(topic)}{n(NOUNTOPIC(noun))}. \tag{21} $$

Here, the sentiment score of a detected topic, topic, is given by

$$ topicsentiscore(topic) = \frac{\sum_{news \in TOPICNEWS(topic)} newssentiscore(news)}{n(TOPICNEWS(topic))}. \tag{22} $$

In addition, to represent the distribution of the sentiment scores over the detected SPRTs whose online news articles contain noun, the variance and |skewness| of topicsentiscore(topic) are measured over topic ∈ NOUNTOPIC(noun), and they are added as sentiment features of noun to F2. Here, the mean value of topicsentiscore(topic) is equal to sentiscore_2(noun). Consequently, 10 sentiment features of noun are measured as shown in Table 3, and F22 and F23 are particularly measured to represent the distributions of the sentiment scores of noun over its news articles and topics.
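The news-level aggregation can be sketched as below. The lexicon entries, scores, and article contents are hypothetical stand-ins for SentiWordNet-derived multilingual features; only the averaging structure of Equations (19) and (20) is illustrated:

```python
from statistics import mean

# Hypothetical lexicon: finalscore[(feature, pos)] in [-1, 1].
# Features and values are invented for illustration.
FINALSCORE = {
    ("violent", "a"): -0.8, ("violent", "n"): -0.5, ("violent", "v"): -0.6,
    ("help",    "a"):  0.1, ("help",    "n"):  0.4, ("help",    "v"):  0.5,
}
POS_TAGS = ("a", "n", "v")  # adjective, noun, verb

def news_senti_score(features):
    """Eq. (20): average finalscore over an article's sentiment
    features, then average over the three POS tags."""
    per_pos = [mean(FINALSCORE[(f, pos)] for f in features) for pos in POS_TAGS]
    return sum(per_pos) / 3

def senti_score_1(articles):
    """Eq. (19): news-level sentiment of a noun = mean of
    newssentiscore over the articles that contain it."""
    return mean(news_senti_score(feats) for feats in articles)

# Two hypothetical articles containing the target noun:
articles = [["violent"], ["violent", "help"]]
score = senti_score_1(articles)
```

The topic-level score of Equation (21) is obtained by replacing the per-article average with a per-topic average over topicsentiscore values, following the same pattern.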

2.3.3. Measure the Complex Network Structural Features of the SPRTs’ Key Noun Terms

A network whose structure is irregular, complex, and dynamically evolving over time is defined as a complex network. The research on complex networks has resulted in the identification of a series of unifying principles and statistical properties that are common to most real networks [60]. For a given plain graph, approaches that are based on the structure-based patterns of complex networks can be grouped into feature-based and proximity-based approaches: feature-based approaches extract graph-centric features, e.g., node degree; proximity-based approaches quantify the closeness of nodes in the graph to identify associations, e.g., PageRank [61]. In particular, feature-based approaches compute various measures that are associated with the nodes, dyads, triads, egonets, communities, and global graph structure. Among these measures, this paper focuses on the nodes and communities because they both correspond to the node perspective.
Network properties characterize an individual node's position within a complex network. The three most widely investigated concepts for evaluating such network properties are the degree, closeness, and betweenness centralities [57,58,59]. These are the standard measures of node centrality, which were originally introduced to quantify the importance of an individual in a social network. Given an adjacency matrix M_{n×n} = (m_{ij}) of a network, where n ≥ 3, the three normalized network centralities can be respectively defined as follows:

$$ degree_i = \frac{1}{n-1} \sum_{j \ne i} m_{ij}, \tag{23} $$

where m_{ij} = 1 if node i is connected to node j. A high value of degree_i means that node i acts as a center in the network.

$$ closeness_i = (n-1) \left[ \sum_{j \ne i} d_{ij} \right]^{-1}, \tag{24} $$

where d_{ij} is the number of edges in the shortest path from node i to node j. closeness_i indicates the influence of node i on the other nodes.

$$ betweenness_i = \frac{1}{(n-1)(n-2)/2} \sum_{j \ne i \ne k} \frac{g_{jik}}{g_{jk}}, \tag{25} $$

where g_{jk} is the number of shortest paths between node j and node k, and g_{jik} is the number of shortest paths between node j and node k that pass through node i. A high betweenness_i value means that node i is located at the core of the network and has a higher momentum of transition.
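Equations (23)–(25) can be computed directly on a small unweighted co-occurrence network. The sketch below assumes a connected network and uses BFS to obtain hop distances and shortest-path counts; the toy key noun terms are invented for illustration:

```python
from collections import deque
from itertools import combinations

def bfs(adj, s):
    """Hop distances and shortest-path counts from s in an unweighted graph."""
    dist, sigma, q = {s: 0}, {s: 1}, deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w] = dist[v] + 1, 0
                q.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma

def centralities(adj):
    """Normalized degree, closeness, and betweenness (Eqs. (23)-(25)).
    Assumes a connected network with at least 3 nodes."""
    n = len(adj)
    paths = {s: bfs(adj, s) for s in adj}
    out = {}
    for i in adj:
        degree = len(adj[i]) / (n - 1)
        closeness = (n - 1) / sum(paths[i][0][j] for j in adj if j != i)
        between = 0.0
        for j, k in combinations([v for v in adj if v != i], 2):
            # shortest j-k paths pass through i iff distances add up
            if paths[j][0][i] + paths[i][0][k] == paths[j][0][k]:
                between += paths[j][1][i] * paths[i][1][k] / paths[j][1][k]
        out[i] = (degree, closeness, between / ((n - 1) * (n - 2) / 2))
    return out

# Toy co-occurrence network of key noun terms (illustrative only):
net = {"suicide": {"teen", "school"}, "teen": {"suicide", "school"},
       "school": {"suicide", "teen", "exam"}, "exam": {"school"}}
scores = centralities(net)
```

In this toy network, "school" attains maximal degree and closeness and the highest betweenness, matching its role as the hub that bridges "exam" to the rest of the network.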
A community is a densely connected subgroup, which is known to exist in many real-world networks, and community detection (CD) can help us understand networks more deeply and identify interesting properties that are shared by the nodes [62,63]. The fundamental idea behind most CD methods is to partition the nodes of the network into modules [64]. For the agglomerative methods of CD, there are two commonly used algorithms: first, Newman’s CD algorithm is a widely used agglomerative method that uses modularity to measure the goodness of the current partitioning; second, the recently developed Louvain method [65] is an agglomerative method and is commonly used because of its low computational complexity and high performance. When merging communities, the Louvain method considers not only the modularity but also the consolidation ratio [41]. Newman’s algorithm is effective but slow, whereas Louvain’s method is much more computationally efficient [66]. Therefore, this paper adopts the Louvain method for detecting topical communities from complex networks of key noun terms, which are used to label the detected SPRTs.
Based on the abovementioned definitions related to complex networks, this component extracts the complex network structural features regarding noun, namely F3, by constructing two types of the complex networks of the SPRTs’ key noun terms: cross-boundary networks and in-boundary networks. Figure 2 describes how the networks of the key noun terms are constructed respectively, and details are explained as follows:
The cross-boundary networks (CBNs) are constructed by using the key noun terms as nodes and setting edges by the co-occurrence relationships between the key noun terms in terms of news and topics. In other words, CBNco-news is constructed by taking the key noun terms as nodes and their co-occurrence frequencies in online news articles, i.e., co-news frequencies, as the corresponding link weights. Similarly, by setting the co-occurrence frequencies in the detected topics, i.e., co-topic frequencies, as the corresponding link weights, CBNco-topic is constructed.
In-boundary networks (IBNs) are built from the key noun terms in a particular boundary and their co-occurrence relationships within that boundary. For the IBNs, this study uses two types of boundaries: topics and communities. First, let ITNco-news(topic) be a kind of IBN, constructed by setting topic as the boundary and the co-news frequencies of the key noun terms as link weights. Second, the Louvain-method-based CD is performed on CBNco-topic to take into account the semantic relationships among the key noun terms in terms of their co-topic frequencies. Unlike the TD, the CD assigns noun to only one of the detected communities. For each detected community, community, an in-community network, i.e., ICNco-topic(community), is formed by setting the co-topic frequencies of the key noun terms within the boundary of community as the link weights.
To evaluate the network properties of noun in both CBNco-news and CBNco-topic, the degree, closeness, and betweenness are respectively measured as complex network structural features of noun. Relating to the IBNs, the network properties of noun in ITNco-news(topic) are degree(noun, ITNco-news(topic)), closeness(noun, ITNco-news(topic)), and betweenness(noun, ITNco-news(topic)). In particular, to represent the distribution of the three network centralities of noun over the detected SPRTs, the mean, variance, and |skewness| are measured for noun, and they are added as complex network structural features of noun to F3. Then, the structural properties of noun in its corresponding ICNco-topic(community) are obtained as degree(noun, ICNco-topic(community)), closeness(noun, ICNco-topic(community)), and betweenness(noun, ICNco-topic(community)). As a result, Table 4 shows the 18 complex network structural features of noun, measured on the constructed complex networks of the SPRTs' key noun terms.

2.4. Classify the Key Noun Terms of the SPRTs into the SocialTERMs and the EventTERMs

This subsection defines a target variable for classification, and introduces machine learning techniques used for classification in the previous text mining applications. In addition, it explains the experimental settings to generate configurations, which result from combining the different feature sets and different classification techniques.

2.4.1. Definition for a Target Variable

By referring to the examples mentioned in the introduction, SocialTERM and EventTERM can be defined as follows:
Definition 1.
(SocialTERM) Given social-problem-related topics (SPRTs) and their key noun terms, a SocialTERM of a SPRT is defined as a key noun term that is perceived as characterizing the SPRT as a social problem and as being a useful cue for identifying and monitoring the ongoing and future events of the social problem. SocialTERMs are independent of the event-specific characteristics of the SPRTs, e.g., when and where the events of the SPRT happened, but reflective of the social-problem-specific perspectives of the SPRTs, e.g., what social problems the SPRT includes and what causes underlie such social problems.
Definition 2.
(EventTERM) Given SPRTs and their key noun terms, an EventTERM of a SPRT is defined as a key noun term that is not perceived as a SocialTERM, because it explains not the social-problem-specific characteristics of the SPRT but the event-specific characteristics of the events that belong to the SPRT. Thus, the EventTERMs are considered not useful for identifying and monitoring the ongoing and future events of social problems.
For the key noun terms obtained from the detected topics, their target variables, y(noun), are manually identified by three professional and experienced social scientists, who were invited as inspectors. Defined as Equation (26), these are used as the true values to be compared with the estimated values.

$$ y(noun) = \begin{cases} \text{SocialTERM} & \text{if } noun \text{ is a social-problem-specific key noun term of the detected SPRTs}, \\ \text{EventTERM} & \text{otherwise}. \end{cases} \tag{26} $$
To assure the reliability of the manual investigation, Cohen's Kappa, k, is calculated for the inter-rater agreement between the three inspectors, and it is defined as

$$ k = 1 - \frac{1 - p_o}{1 - p_e}, \tag{27} $$

where p_o is the relative observed agreement among the three inspectors, and p_e is the hypothetical probability of chance agreement. Cohen's Kappa is a statistic that measures inter-rater agreement for categorical items, and it serves as evidence that the combination of several sources reduces the bias of individual sources [56,67,68]. For these reasons, it is adopted in this study to evaluate the consistency of the results annotated by the three inspectors.
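Equation (27) can be sketched for a pair of annotators as follows. The labels below are hypothetical; note that Cohen's Kappa is defined for two raters, so extending it to the three inspectors (e.g., by averaging pairwise kappas) is an assumption of this sketch, not a detail stated by the paper:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Eq. (27): k = 1 - (1 - p_o) / (1 - p_e) for two annotators,
    where p_o is observed agreement and p_e is chance agreement
    estimated from each annotator's label marginals."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    labels = set(r1) | set(r2)
    p_e = sum((c1[l] / n) * (c2[l] / n) for l in labels)
    return 1 - (1 - p_o) / (1 - p_e)

# Hypothetical labels from two inspectors over four key noun terms:
a = ["SocialTERM", "SocialTERM", "EventTERM", "EventTERM"]
b = ["SocialTERM", "EventTERM", "EventTERM", "EventTERM"]
k = cohens_kappa(a, b)
```

Here the two inspectors agree on 3 of 4 terms (p_o = 0.75) against a chance agreement of p_e = 0.5, giving k = 0.5.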

2.4.2. Machine Learning Techniques for Classification in the Previous Text Mining Applications

To distinguish between the SocialTERMs and the EventTERMs among the key noun terms of the detected SPRTs, this paper adopts supervised classification techniques, which have been extensively studied due to their high classification performance. Of the classification techniques used in the previous works in Table 1, four commonly used classification techniques and a recently proposed deep-learning-based technique are adopted as base learners for this study, namely C4.5 as Decision Tree (DT) [9,69], Naïve Bayes (NB) [70,71,72], Radial Basis Function Network (RBFN) [9], Support Vector Machine (SVM) [73,74], and Deep Belief Network (DBN) [75,76,77]. Each of them is explained in Section S.1 of the Supplementary Materials.
In addition to the five base learners, three types of ensemble methods are combined with each of the five base learners for this study. Ensemble learning is a machine learning paradigm in which multiple learners are trained to solve the same problem. In contrast to the base learners, which try to learn one hypothesis from the training data, the ensemble learning methods try to learn a set of hypotheses and combine them for use. In general, ensemble methods are divided into two categories: instance partitioning and feature partitioning. Bagging and Boosting are instance partitioning methods; RS is a feature partitioning method [78].
Particularly, the three ensemble methods, namely Bagging, Boosting, and RS, are summarized as follows: Bagging is one of the simplest ensemble methods but has surprisingly good performance. The combination strategy of base learners for Bagging is majority voting. This strategy reduces the variance when combined with the base learner generation strategies. Bagging is particularly appealing when the available data are of limited size [79]. Unlike Bagging, Boosting produces different base learners by sequentially giving instances that have been misclassified by the previous base learner larger weight in the next iteration of training. The final model that is obtained by Boosting is a linear combination of several base learners, which are weighted by their own performances. There are several Boosting algorithms; the most widely used is AdaBoost [78]. RS is an ensemble construction technique, which uses random subspaces to both construct and aggregate the base learners. If a dataset has many redundant or irrelevant features, base learners in random subspaces may be better than in the original feature space. The combined decision of such base learners may be superior to that of a single classifier that is constructed on the original training dataset in the complete feature sets.
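The three ensemble strategies can be illustrated with common scikit-learn estimators. The study itself uses WEKA modules; this is only a sketch on a synthetic dataset, where the random-subspace variant is approximated by Bagging over feature subsets rather than instance bootstraps:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the feature matrix (F1 + F2 + F3 in the study).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Bagging: majority vote over base learners fit on bootstrap resamples.
bagging = BaggingClassifier(n_estimators=25, random_state=0)
# Boosting: AdaBoost reweights misclassified instances each round.
boosting = AdaBoostClassifier(n_estimators=25, random_state=0)
# Random subspace: resample the feature space instead of the instances.
rs = BaggingClassifier(n_estimators=25, bootstrap=False,
                       max_features=0.5, random_state=0)

scores = {name: cross_val_score(clf, X, y, cv=10).mean()
          for name, clf in [("Bagging", bagging), ("Boosting", boosting),
                            ("RS", rs)]}
```

All three wrap a decision-tree base learner by default; swapping in another base learner mirrors the study's 15 ensemble configurations (3 methods × 5 base learners).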
To the best of our knowledge from Table 1, no previous study has compared the performances of the state-of-the-art classification techniques, particularly DBN, in distinguishing between the SocialTERMs and the EventTERMs among the key terms of the detected SPRTs. Hence, this study adopts the five base learners and their combinations with the three ensemble methods. Moreover, these classification techniques are compared in terms of their performances.

2.4.3. Experimental Settings on Features and Classification Techniques

In this paper, the experiments are performed with 60 configurations, which result from combining the three feature sets, namely F1, F1 + F2, and F1 + F2 + F3, and 20 classification techniques. Details on the experimental settings are as follows.
The three types of features, i.e., F1, F2, and F3, are obtained after the feature extraction of the Section 2.3. Based on these different types of features, three feature sets are constructed in an incremental way: feature set F1; feature set F1 + F2; and feature set F1 + F2 + F3. This incremental order implies the evolutionary sequence of features [19,80].
In addition, the three popular ensemble methods, i.e., Bagging, Boosting, and RS, are implemented respectively with the five base learners. Consequently, the paper uses 20 classification techniques to differentiate the SocialTERMs from the EventTERMs, as described in Table 5. For an experiment that uses one of the 20 classification techniques, 10-fold cross-validation is performed to train a classifier and evaluate it. Before performing the experiments, if the sample sizes of the two classes in y(noun) of the data set for an experiment are imbalanced, the imbalance has to be resolved because imbalanced datasets may suffer from problems such as small sample size, class overlapping or poor class separability, and small disjuncts [81]. Previous approaches for dealing with imbalanced datasets are grouped into four categories: algorithm-level, e.g., Hellinger Distance Decision Trees; data-level, e.g., random oversampling and the synthetic minority oversampling technique (SMOTE); cost-sensitive, e.g., AdaCost; and classifier ensembles, e.g., Bagging [82]. Among them, the SMOTE approach is known for its good performance when adopted with ensemble methods [81], and therefore it is used to deal with the imbalance problem of this study [83].
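The core idea of SMOTE, interpolating synthetic minority samples between a minority instance and one of its nearest minority neighbors, can be sketched in a few lines. This is a deliberately minimal illustration on made-up 2D points, not the full SMOTE algorithm the study would apply:

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: each synthetic sample interpolates between
    a minority instance and one of its k nearest minority neighbors.
    Illustrative only; a full implementation handles ties, sampling
    ratios, and larger feature spaces."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(base, nb)))
    return synthetic

# Hypothetical minority class (e.g., SocialTERM feature vectors in 2D):
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=4)
```

Because each synthetic point is a convex combination of two minority instances, the oversampled class stays inside the region spanned by the original minority samples.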
Among the 20 classification techniques, to implement the conventional 16 classification approaches based on DT, NB, RBFN, and SVM, the data mining toolkit WEKA (Waikato Environment for Knowledge Analysis) version 3.7.0 is used because it is the best-known open-source toolkit with a collection of various machine learning algorithms for solving data mining problems [19,78]. In detail, the following modules are used: for the base learners, the J48 module (WEKA's own version of C4.5) for DT, the RBFNetwork module for RBFN, the NaïveBayes module for NB, and the SMO module for SVM; for the ensemble methods, the Bagging module for Bagging, the AdaBoostM1 module for Boosting, and the RandomSubSpace module for RS. Moreover, for DBN and its ensemble learning methods, the Python-based deep learning tutorials from 'www.deeplearning.net' are used as references and modified. In implementing DBN, the number of hidden layers is set to two, and the dimension of each layer is set to 100 by default.

2.5. Evaluate Results with Comparisons

This component assesses the performance of the configurations of the three feature sets and 20 classification techniques for classifying the key noun terms of the SPRTs into the SocialTERMs and the EventTERMs. Among the standard metrics widely used in IR and text classification studies, this paper uses three performance measures, i.e., accuracy, F-measure, and AUC, to evaluate each configuration. In particular, the definition of accuracy can be explained with a confusion matrix as shown in Table 6, and it is defined as

$$ accuracy = \frac{TP + TN}{TP + FP + FN + TN}, \tag{28} $$

and F-measure is obtained by

$$ F\text{-}measure = \frac{2\,TP}{2\,TP + FP + FN}. \tag{29} $$
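The two measures above follow directly from the confusion-matrix counts. A quick check on hypothetical counts (the numbers are invented for illustration):

```python
def accuracy(tp, fp, fn, tn):
    """Share of correctly classified terms over all terms."""
    return (tp + tn) / (tp + fp + fn + tn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall, written in TP/FP/FN form."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical confusion matrix, SocialTERM as the positive class:
tp, fp, fn, tn = 80, 20, 40, 60
acc = accuracy(tp, fp, fn, tn)   # (80 + 60) / 200 = 0.7
f1 = f_measure(tp, fp, fn)       # 160 / 220 = 0.7272...
```

Note that F-measure ignores TN, which is why it complements accuracy when the negative class dominates.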
In addition, pairwise t-tests are used for the comparisons because they are simple statistical tests that are commonly used for comparing the performance of two algorithms. A pairwise t-test examines whether the average difference between two approaches is significantly different from 0 by repeating the same experiment many times, 50 times in this study [19]. In detail, the effect of adding one feature set on the three performance measures for a certain classification technique is investigated by conducting 60 individual pairwise t-tests, i.e., 60 = three feature set comparisons × 20 classification techniques. Moreover, the classification techniques for a certain feature set are compared in terms of the three performance measures by conducting 120 individual pairwise t-tests, which are composed as follows: 30 between the five BL classification techniques, i.e., 30 = 10 technique comparisons × three feature sets; 45 between the five BL classification techniques and the 15 ensemble learning methods, i.e., 45 = 15 technique comparisons × three feature sets; and 45 between the 15 ensemble learning methods, i.e., 45 = 15 technique comparisons × three feature sets.
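The test statistic behind such a comparison can be sketched as below. The accuracy lists are hypothetical (the study uses 50 repetitions; five keep the example small), and only the t statistic is computed, which would then be compared against the critical value of Student's t distribution with n − 1 degrees of freedom:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic of the paired differences between two matched lists
    of scores, e.g., accuracies of two configurations over repeated runs."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical accuracies of two configurations over 5 repeated runs:
acc_a = [0.84, 0.83, 0.85, 0.84, 0.86]
acc_b = [0.82, 0.82, 0.83, 0.83, 0.84]
t = paired_t_statistic(acc_a, acc_b)
```

With these numbers t ≈ 6.5, well above the two-sided 5% critical value of about 2.78 for 4 degrees of freedom, so the average difference would be judged significantly different from 0.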

3. Results

3.1. Test Bed for Data Collection: South Korea and Korean News Portal Site

Relating to Section 2.1, this paper selected South Korea as the test-bed country for three main reasons: first, it is an information and communication technology (ICT)-intensive nation, so many online news articles are available, and it is easier to identify the SPRTs from online news articles [8,9]. Second, it is known for its high prevalence of social problems, e.g., it has the highest suicide rate among OECD countries [84]. This means that South Korea needs to identify social problems more than other countries do, which corresponds to the intended application of this study. Third, it is a knowledge-intensive country, so, once identified, the SocialTERMs can be better used to explore technologies for solving social problems than in other countries [85].
By using a distributed web-crawling program, the online news articles were collected from NAVER.com, which is the best-known Korean news portal site. These articles had been published in the society-related news sections over the 365 days from May 2013 to June 2014, i.e., t = 1, …, 365. In total, 126,402 online news articles were collected from the targeted society-related sections, and the parsed data were stored in a relational database for the experiments.

3.2. Evaluation Results

Relating to Section 2.2, 43,711 online news articles with negative sentiment were selected from the 126,403 collected online news articles. Next, the thresholds ε = 0.3, α = 20, and β = 10 were determined based on a pre-topic analysis of 100 online news articles published in the first month, and 2961 topics were detected from the 43,711 online news articles by Algorithm 1. Among the 2961 detected topics, the 467 topics with more than 10 (=β) online news articles were chosen as the final detected topics, namely the SPRTs. Then, as explained in Section 2.3, the three types of features, namely the temporal weight, sentiment, and complex network structural features, were measured for the 1810 key noun terms extracted from the 467 detected SPRTs (see Table 7 for examples of the 1810 key noun terms). In particular, in measuring the complex network structural features, JUNG (http://jung.sourceforge.net/), a Java-based software library for network analysis, was used to obtain the network centralities, and Gephi (https://gephi.org/), an open-source graph visualization platform, was used to identify communities from the constructed co-news and co-topic key term networks. Table A3, Table A4, Table A5 and Table A6 in Appendix B show the descriptive statistics of the three types of features that represent the 1810 key noun terms.
The target variables, denoted as y(noun), of the 1810 key noun terms were manually identified by the three inspectors. The procedure yielded a Cohen's Kappa inter-rater reliability of 0.8678, indicating good agreement, i.e., k ≥ 0.8, according to Lombard et al. [86]. Disagreements among the three inspectors were jointly reviewed until a final agreement was reached. These labels were used as the true values to be compared with the estimated values. In addition, the 1810 key noun terms were imbalanced in terms of the classes of their target variables. To resolve this imbalance, SMOTE was applied to the 1810 key noun terms. By adding 502 new instances of y(noun) = SocialTERM, a balanced data set of 1156 SocialTERMs and 1156 EventTERMs was prepared for the following experiments.
Next, following Section 2.4, experiments were performed on the prepared data set, and Table 8 shows the experimental results on the three performance measures for the different feature sets and classification techniques. Consequently, the configuration of the full feature set F1 + F2 + F3 and the ensemble learning method Boosting DT gave the best accuracy, i.e., 83.8769%, which is 1.3264% better than the second-best configuration, i.e., F1 + F2 with Boosting DT. Moreover, Table 8 shows that, with F1 + F2 + F3, Boosting DT also gave the best performances in terms of F-measure (1.7112% better than with F1 + F2) and AUC (1.8174% better than with F1 + F2). Thus, the results in Table 8 answered the part of RQ1 concerning how well the three types of features perform with different classification techniques.
The possible reasons for the best performances of Boosting DT are as follows: DT could properly handle the numerical features of this study as categorical features; and DT with Boosting could reduce the multi-collinearity problems that may exist among the features [9,74,78].

4. Discussion

4.1. Comparisons of Feature Sets

Table A6 of Appendix C shows the comparison results of the pairwise t-tests, which were performed to evaluate the effects of the different feature sets on the performance of a classification technique in terms of the three performance measures. The comparison results answered the part of RQ1 about which feature set and features give the best results, and their details are as follows.
By summarizing the comparison results in Table A6, Figure 3 illustrates the ratio of agreement with the positive effect of adding a feature subset on increasing performance from different perspectives. One of its key findings is that, for most of the 20 classification techniques, adding F1, F2, and F3 individually increased performance with respect to the three performance measures. This indicates that each of the feature sets suggested by this study is useful for identifying the SocialTERMs from the SPRTs detected from online news articles. The sentiment feature set F2 led to better performances regardless of the classification technique. The effect of adding the complex network structural feature set F3 was smaller than those of adding F1 and F2.
Furthermore, using Boosting DT, which was shown to be the best classification technique in Table 8, this paper performed pairwise t-tests to compare the different feature subsets and investigated the effect of adding each feature subset on the classification performance. Table 9 shows that, for all three performance measures, the significant performance improvements by feature sets F1, F2, and F3 were attributed to feature subsets F11 and F12 for F1, F21 and F23 for F2, and F31 for F3, respectively. This indicates that these features are more useful in characterizing the relatedness of the key noun terms to social problems.

4.2. Comparisons on Classification Techniques

In addition, the classification techniques were compared in three ways: base learner vs. base learner (see Table A7 of Appendix C), base learner vs. ensemble learning method (see Table A8 of Appendix C), and ensemble method vs. ensemble method (see Table A9 of Appendix C). Table A7 shows the results of the pairwise t-tests performed to examine the effects of different base learners on the three performance measures for a specific feature set, and Figure 4 provides an overview of the results in Table A7. According to the results, the performance rankings of the five base learners differ according to the selected feature sets, which implies that there is no single best classification technique for all three performance measures.
Table A8 shows the results of the pairwise t-tests performed to examine the effect of combining an ensemble method on the three performance measures for a specific feature set, and Figure 5 summarizes the results in Table A8. Figure 5 shows that, in terms of all three performance measures, combining Bagging yielded better performances than the base learners alone in most configurations for all the incremental feature sets, while Boosting and RS did not perform as well as Bagging. The reason for the positive effect of Bagging may be that Bagging helps preserve important information better than the base learners by considering the features in their entirety, unlike the base learners, which only consider the average of the aggregated features. Overall, it is concluded that combining an ensemble learning method is appropriate for this study to identify the SocialTERMs from the detected SPRTs.
Table A9 shows the results of the pairwise t tests, which were performed to examine the effects of different ensemble methods on three performance measures for a specific feature set if a base learner is given. Figure 6, Figure 7 and Figure 8 explain the performance rankings of the ensemble methods, which are evaluated based on the results in Table A9.
Some interesting findings from Figure 6 are as follows: While Bagging ranked best among the three ensemble methods when combined with DT and DBN for F1, Boosting gave better accuracies with DT and DBN for F1 + F2 and F1 + F2 + F3, and with NB for all feature sets. The possible reasons for the superiority of Boosting with DT, NB, and DBN are as follows: The strategy of Boosting, which gives higher weights to misclassified examples during training, was effective for training DT, NB, and DBN models with more features; and Boosting's robustness against multi-collinearity among complex features could help DT, NB, and DBN achieve better accuracies. Moreover, for all feature sets, RS always achieved better accuracies than the other ensemble methods when used with RBFN, while no ensemble method achieved a clearly better accuracy with SVM. In Figure 7 and Figure 8, the same results as in Figure 6 were observed, except that for all feature sets, Boosting gave better AUCs when combined with SVM, followed by Bagging and RS in descending order, as shown in Figure 8d.
Thus, Figure 6, Figure 7 and Figure 8 indicate that the choice of an ensemble method for obtaining better performances depends on the feature sets and the base learners; it can hardly be said that a single ensemble method gave the best accuracy for all feature sets with any single base learner. However, as shown in Figure 6f, when the accuracy rankings of the ensemble methods were averaged over the different base learners for a given feature set, Boosting was a comparatively better choice of ensemble method for any base learner. Moreover, Figure 7f and Figure 8f, which average the F-measure rankings and the AUC rankings, respectively, demonstrate the same result as Figure 6f, i.e., the superiority of Boosting over Bagging and RS.
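The rank-averaging step behind such summary panels can be sketched as follows (the rank matrix is illustrative, not the paper's data):

```python
import numpy as np

methods = ["Bagging", "Boosting", "RS"]
# Rows: base learners (DT, NB, RBFN, SVM, DBN); columns: rank of each ensemble method
# (1 = best) for one feature set. Values are invented for illustration.
ranks = np.array([[2, 1, 3],
                  [2, 1, 3],
                  [3, 2, 1],   # e.g., RS ranked best with RBFN
                  [2, 1, 3],
                  [2, 1, 3]])

avg = ranks.mean(axis=0)             # average rank per ensemble method
best = methods[int(np.argmin(avg))]  # lower average rank = better overall
print(dict(zip(methods, avg.round(2))), "->", best)
```

Averaging the per-learner ranks yields a single comparative ordering of the ensemble methods even when no method wins for every base learner.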

5. Conclusions

This paper proposed and examined an automatic approach, namely the SocialTERM-Extractor, for distinguishing between the SocialTERMs and the EventTERMs among the key noun terms of the SPRTs detected from a large number of Korean online news articles. It aimed at resolving the challenging issues mentioned in Section 1. Using the best-known news portal site of South Korea as a test-bed, experiments were conducted following the proposed research framework explained in Section 2. The experimental results in Table 8 showed that the configuration of the full feature set, namely F1 + F2 + F3, with Boosting DT gave the best performance in terms of accuracy as well as F-measure and AUC. Its high performance, e.g., 83.8769% accuracy, implies that the proposed approach can identify the SocialTERMs automatically and reliably (RQ1 was partly answered).
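For reference, the three performance measures used throughout can be computed as follows (the labels, predictions, and probabilities are illustrative, not the paper's results):

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = SocialTERM, 0 = EventTERM
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]  # hard class predictions
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.7, 0.6]  # predicted P(SocialTERM)

acc = accuracy_score(y_true, y_pred)       # fraction of correct predictions
fmeasure = f1_score(y_true, y_pred)        # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_prob)        # ranking quality of the scores
print(f"accuracy = {acc:.4f}, F-measure = {fmeasure:.4f}, AUC = {auc:.4f}")
```

Note that accuracy and F-measure evaluate hard class decisions, whereas AUC evaluates the predicted probabilities, which is why the paper reports all three.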
Furthermore, according to Figure 3, the pairwise t tests on the three performance measures for adding a feature set in Table A6 indicated that most of the 20 classification techniques agreed that the three feature sets, namely F1, F2, and F3, contributed to improving the classification performance in a statistically significant way. In particular, all 20 classification techniques agreed that adding the sentiment feature set F2 improved the classification performance, unanimously so in terms of accuracy and AUC. When the best classification technique, namely Boosting DT, was used, Table 9 showed that the individual addition of the feature subsets F11, F12, F21, F23, and F31 indeed increased all three performance measures. This indicates that the significant improvements in the three performance measures from adding feature sets in Table A6 are attributable to these feature subsets (RQ1 was partly answered).
Regarding the comparisons of the classification techniques, Figure 4 (and Table A7) showed that the performance rankings of the five base learners differed according to the selected feature sets (RQ2 was answered). In addition, Figure 5 (and Table A8) revealed that in most of the 20 configurations the ensemble learning methods produced better performances than their base learners (RQ3 was answered). According to Figure 6, Figure 7 and Figure 8 (and Table A9), the ensemble method that obtains the best results depends on the feature sets and the base learners. Nevertheless, when the performance rankings of the ensemble methods for a feature set were averaged over all types of base learners, Boosting showed comparatively better results for all feature sets (RQ3 was answered).
Theoretically, this paper contributes to expanding the related literature by applying text mining and machine learning techniques to a large number of online news articles as big data. To the best of the author's knowledge, this study is the first to provide an automatic approach for identifying and predicting the SocialTERMs of the SPRTs detected from online news articles. Because the appropriate SocialTERMs can be identified automatically, anybody, even someone unfamiliar with the ongoing social problems, can benefit from this study's automatic approach, which enables everyone to grasp the landscape of the SPRTs from a large amount of event-related textual data without difficulty. In addition, this study has a significant impact on sustainability, since the SocialTERMs can be used as key noun terms in searching for technologies that are helpful for solving social problems and in monitoring the ongoing and future events associated with social problems. Eventually, the paper may facilitate innovations in our society by driving the development of technologies for ongoing and future social problems.
Practically, by answering RQ1~RQ3, this paper provided a reference and guidance for researchers, government officials, politicians, and companies that need to implement such a system. The paper investigated which kinds of feature sets are preferable, which kinds of classification techniques perform better, and how these two factors should be combined to obtain the best results. These results help determine the proper model for building a system with real-world large-scale data. In the suggested research framework, the paper proposed novel approaches for representing the key noun terms: temporal weight, sentiment, and complex network structural features. Moreover, the paper compared state-of-the-art techniques, including the recently proposed DBN, which is a deep-learning-based technique. It showed that the simpler conventional classification methods were better for this study, while the more complex DBN gave worse results. This indicates that a deep architecture is not a magic key for all kinds of machine learning problems, as deep architectures are known to work best on big data with many variables. However, as its results were not much worse than those of the other approaches, the deep architecture may still perform better in other applications.
Thus, if the automatic approach is implemented as a system, the system can automatically recommend the SocialTERMs, which are useful key noun terms for exploring technologies that can be used to solve social problems. The SocialTERMs can be applied to predicting future social problems and monitoring ongoing social problems from a large number of online news articles. This study thus helps generate new insights into how to identify ongoing and upcoming social problems from big data, thereby paving the way to big-data-driven social and technological innovations for the public good.
Further research can be conducted to overcome the limitations of this study. First, this study used only a large number of online news articles; in addition to online news articles, large-scale data from social media, e.g., YouTube, Twitter, and Facebook, may provide good sources for extracting temporal weight, sentiment, and complex network structural features for the key noun terms of the detected SPRTs. Second, the paper focused on three types of features, but other useful features may exist, and more sophisticated classification techniques can be considered to improve the classification performance.
In addition, as future work, a portal site that provides the proposed methodology can be planned so that the methodology becomes available to individuals and groups who need to identify the SPRTs and their SocialTERMs. Easier-to-use methods, e.g., k-means and latent Dirichlet allocation (LDA) for the TD approach, can also be considered in developing the portal site. If developed, the proposed methodology and system can be evaluated in terms of whether they help users not only in exploring technologies for solving social problems but also in monitoring ongoing and future social problems based on a large amount of event-related textual data.

Supplementary Materials

The following are available online at https://www.mdpi.com/2071-1050/11/1/196/s1, S.1: Reviews on the State-of-the-Art Machine Learning Techniques.

Funding

This study was supported by the National Research Foundation of Korea Grant (NRF-2017R1C1B1010065), funded by the Korean Government.

Acknowledgments

I am grateful to the inspectors, who manually coded the extracted key noun terms of the detected SPRTs into SocialTERMs and EventTERMs. I would like to thank the anonymous reviewers for their valuable comments that helped revise the original version of this paper.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Table A1. Examples of Korean sentiment features.
| English Sentiment Feature | POS | Final Sentiment Score | Translation | Morphological Analysis | Korean Sentiment Feature | POS | Sentiment Value |
|---|---|---|---|---|---|---|---|
| soil | Verb | −0.7500 | 더럽히다 | 더럽히/pvg+다/ef | 더럽히/pvg+다/ef | Verb | −0.7500 |
| disheartened | Adjective | −0.4250 | 속상한 | 속상/ncps+하/xsms+ㄴ/etm | 속상/ncps+하/xsms+ㄴ/etm | Adjective | −0.4250 |
| instantly | Adverb | −0.1875 | 즉시 | 즉시/mag | 즉시/mag | Adverb | −0.2000 |
| straight_away | Adverb | −0.3750 | 즉시 | 즉시/mag | 즉시/mag | Adverb | −0.2000 |
| right_away | Adverb | −0.5000 | 즉시 | 즉시/mag | 즉시/mag | Adverb | −0.2000 |
| at_once | Adverb | −0.1875 | 즉시 | 즉시/mag | 즉시/mag | Adverb | −0.2000 |
| swiftly | Adverb | 0.2500 | 즉시 | 즉시/mag | 즉시/mag | Adverb | −0.2000 |

Note: the five English adverbs map to the single Korean feature 즉시/mag, whose sentiment value −0.2000 is the mean of their final sentiment scores.
Table A2. Korean sentiment features, extended from the stem ‘더럽히/pvg’.
| Stem | Negation | Final Ending | Extended Korean Sentiment Feature | Surface Form(s) | POS | Tense | Sentiment Value |
|---|---|---|---|---|---|---|---|
| 더럽히/pvg | - | ㄴ다/ef | 더럽히/pvg+ㄴ다/ef | 더럽힌다 | Verb | Present | 0.7500 |
| 더럽히/pvg | - | 었다/ef | 더럽히/pvg+었다/ef | 더럽히었다, 더럽혔다 | Verb | Past | 0.7500 |
| 더럽히/pvg | - | ㄹ 것이다/ef | 더럽히/pvg+ㄹ 것이다/ef | 더럽힐 것이다 | Verb | Future | 0.7500 |
| 더럽히/pvg | - | 는/etm | 더럽히/pvg+는/etm | 더럽히는, 더럽힌 | Adjective | Present | 0.7500 |
| 더럽히/pvg | - | 었던/etm | 더럽히/pvg+었던/etm | 더럽히었던, 더럽혔던 | Adjective | Past | 0.7500 |
| 더럽히/pvg | - | ㄹ/etm | 더럽히/pvg+ㄹ/etm | 더럽힐 | Adjective | Future | 0.7500 |
| 더럽히/pvg | - | 게/ecs | 더럽히/pvg+게/ecs | 더럽히게 | Adverb | - | 0.7500 |
| 더럽히/pvg | 지/ecx 않/px | 는다/ef | 더럽히/pvg+지/ecx 않/px+는다/ef | 더럽히지 않는다 | Verb | Present | −0.7500 |
| 더럽히/pvg | 지/ecx 않/px | 었다/ef | 더럽히/pvg+지/ecx 않/px+었다/ef | 더럽히지 않았다 | Verb | Past | −0.7500 |
| 더럽히/pvg | 지/ecx 않/px | 을 것이다/ef | 더럽히/pvg+지/ecx 않/px+을 것이다/ef | 더럽히지 않을 것이다 | Verb | Future | −0.7500 |
| 더럽히/pvg | 지/ecx 않/px | 는/etm | 더럽히/pvg+지/ecx 않/px+는/etm | 더럽히지 않는 | Adjective | Present | −0.7500 |
| 더럽히/pvg | 지/ecx 않/px | 었던/etm | 더럽히/pvg+지/ecx 않/px+었/ep+던/etm | 더럽히지 않았던 | Adjective | Past | −0.7500 |
| 더럽히/pvg | 지/ecx 않/px | 을/etm | 더럽히/pvg+지/ecx 않/px+을/etm | 더럽히지 않을 | Adjective | Future | −0.7500 |
| 더럽히/pvg | 지/ecx 않/px | 게/ecs | 더럽히/pvg+지/ecx 않/px+게/ecs | 더럽히지 않게 | Adverb | - | −0.7500 |

Appendix B

Table A3. Descriptive statistics on the features F1 from the test bed.
The three Min/Max/Mean/SD column groups refer to SocialTERMs, EventTERMs, and SocialTERMs + EventTERMs, respectively.

| Type | Feature | Statistic | Min | Max | Mean | SD | Min | Max | Mean | SD | Min | Max | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F11 | dfscore1,t(noun) | mean | 0.0007 | 0.0611 | 0.0171 | 0.0123 | 0.0002 | 0.0776 | 0.0144 | 0.0116 | 0.0002 | 0.0776 | 0.0154 | 0.0119 |
| F11 | dfscore1,t(noun) | variance | 0.0000 | 0.0090 | 0.0009 | 0.0008 | 0.0000 | 0.0044 | 0.0007 | 0.0006 | 0.0000 | 0.0090 | 0.0008 | 0.0007 |
| F11 | dfscore1,t(noun) | \|skewness\| | 0.0768 | 12.3427 | 2.9182 | 1.8452 | 0.0035 | 14.7584 | 3.2080 | 2.0100 | 0.0035 | 14.7584 | 3.1033 | 1.9570 |
| F11 | dfscore2,t(noun) | mean | 0.0002 | 0.0273 | 0.0067 | 0.0052 | 0.0000 | 0.0314 | 0.0056 | 0.0049 | 0.0000 | 0.0314 | 0.0060 | 0.0050 |
| F11 | dfscore2,t(noun) | variance | 0.0000 | 0.0018 | 0.0004 | 0.0003 | 0.0000 | 0.0019 | 0.0003 | 0.0003 | 0.0000 | 0.0019 | 0.0003 | 0.0003 |
| F11 | dfscore2,t(noun) | \|skewness\| | 0.8417 | 19.1050 | 4.8521 | 2.6011 | 0.7918 | 19.1050 | 5.3541 | 2.9967 | 0.7918 | 19.1050 | 5.1727 | 2.8702 |
| F12 | tfscore1,t(noun) | mean | 0.0000 | 0.3335 | 0.0614 | 0.0528 | 0.0000 | 0.3345 | 0.0430 | 0.0469 | 0.0000 | 0.3345 | 0.0497 | 0.0499 |
| F12 | tfscore1,t(noun) | variance | 0.0000 | 0.1813 | 0.0427 | 0.0310 | 0.0000 | 0.1984 | 0.0308 | 0.0285 | 0.0000 | 0.1984 | 0.0351 | 0.0300 |
| F12 | tfscore1,t(noun) | \|skewness\| | 0.0000 | 19.1050 | 5.0052 | 3.1014 | 0.0000 | 19.1050 | 6.3110 | 4.4465 | 0.0000 | 19.1050 | 5.8392 | 4.0616 |
| F12 | tfscore2,t(noun) | mean | 0.0000 | 0.1013 | 0.0219 | 0.0198 | 0.0000 | 0.1245 | 0.0149 | 0.0168 | 0.0000 | 0.1245 | 0.0174 | 0.0183 |
| F12 | tfscore2,t(noun) | variance | 0.0000 | 0.0665 | 0.0145 | 0.0117 | 0.0000 | 0.0740 | 0.0100 | 0.0100 | 0.0000 | 0.0740 | 0.0116 | 0.0109 |
| F12 | tfscore2,t(noun) | \|skewness\| | 0.0000 | 19.1050 | 8.1998 | 4.2110 | 0.0000 | 19.1050 | 9.2439 | 5.3063 | 0.0000 | 19.1050 | 8.8666 | 4.9640 |
| F13 | titlescore1,t(noun) | mean | 0.0471 | 2.9845 | 1.2800 | 0.8035 | 0.0205 | 3.1060 | 1.1290 | 0.7854 | 0.0205 | 3.1060 | 1.1836 | 0.7953 |
| F13 | titlescore1,t(noun) | variance | 0.2059 | 5.5199 | 3.4616 | 1.3431 | 0.0885 | 5.4283 | 3.2218 | 1.4555 | 0.0885 | 5.5199 | 3.3084 | 1.4206 |
| F13 | titlescore1,t(noun) | \|skewness\| | 0.0004 | 9.7994 | 1.6360 | 1.3264 | 0.0014 | 15.9908 | 1.9602 | 1.6250 | 0.0004 | 15.9908 | 1.8430 | 1.5318 |
| F13 | titlescore2,t(noun) | mean | 0.0118 | 1.4455 | 0.5072 | 0.3544 | 0.0005 | 1.4574 | 0.4424 | 0.3421 | 0.0005 | 1.4574 | 0.4658 | 0.3480 |
| F13 | titlescore2,t(noun) | variance | 0.0508 | 3.6949 | 1.7454 | 0.9260 | 0.0001 | 3.6738 | 1.5632 | 0.9365 | 0.0001 | 3.6949 | 1.6290 | 0.9368 |
| F13 | titlescore2,t(noun) | \|skewness\| | 0.6546 | 19.1050 | 3.5004 | 2.2744 | 0.6383 | 19.1050 | 4.0408 | 2.7496 | 0.6383 | 19.1050 | 3.8455 | 2.6009 |
| F14 | idfscore1,t(noun) | mean | 0.0002 | 0.0702 | 0.0058 | 0.0072 | 0.0002 | 0.1146 | 0.0050 | 0.0080 | 0.0002 | 0.1146 | 0.0053 | 0.0077 |
| F14 | idfscore1,t(noun) | variance | 0.0000 | 0.0087 | 0.0002 | 0.0006 | 0.0000 | 0.0246 | 0.0002 | 0.0008 | 0.0000 | 0.0246 | 0.0002 | 0.0007 |
| F14 | idfscore1,t(noun) | \|skewness\| | 1.1309 | 18.2918 | 5.7815 | 3.0954 | 0.5637 | 19.0358 | 6.2418 | 3.4033 | 0.5637 | 19.0358 | 6.0755 | 3.3027 |
| F14 | idfscore2,t(noun) | mean | 0.0000 | 0.0309 | 0.0040 | 0.0044 | 0.0000 | 0.0422 | 0.0035 | 0.0045 | 0.0000 | 0.0422 | 0.0037 | 0.0045 |
| F14 | idfscore2,t(noun) | variance | 0.0000 | 0.0037 | 0.0002 | 0.0004 | 0.0000 | 0.0046 | 0.0002 | 0.0004 | 0.0000 | 0.0046 | 0.0002 | 0.0004 |
| F14 | idfscore2,t(noun) | \|skewness\| | 1.2120 | 19.1050 | 6.7066 | 3.3285 | 1.4104 | 19.1050 | 7.3342 | 3.6612 | 1.2120 | 19.1050 | 7.1074 | 3.5573 |
Table A4. Descriptive statistics on the features of F2 from the test bed.
The three Min/Max/Mean/SD column groups refer to SocialTERMs, EventTERMs, and SocialTERMs + EventTERMs, respectively.

| Type | Feature | Statistic | Min | Max | Mean | SD | Min | Max | Mean | SD | Min | Max | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F21 | featuresentiscore(noun, pos = verb) | - | 0.0000 | 0.8750 | 0.0186 | 0.0678 | 0.0000 | 0.6250 | 0.0054 | 0.0319 | 0.0000 | 0.8750 | 0.0102 | 0.0485 |
| F21 | featuresentiscore(noun, pos = adverb) | - | 0.0000 | 0.8125 | 0.0885 | 0.1647 | 0.0000 | 0.7500 | 0.0294 | 0.0936 | 0.0000 | 0.8125 | 0.0507 | 0.1273 |
| F21 | featuresentiscore(noun, pos = adjective) | - | 0.0000 | 0.2917 | 0.0370 | 0.0595 | 0.0000 | 0.2500 | 0.0146 | 0.0369 | 0.0000 | 0.2917 | 0.0227 | 0.0476 |
| F21 | nounsentiscore(noun) | - | 0.2754 | 0.9227 | 0.3576 | 0.0451 | 0.2037 | 0.6793 | 0.3535 | 0.0402 | 0.2037 | 0.9227 | 0.3550 | 0.0421 |
| F22 | sentiscore1(noun) | - | 0.0048 | 0.7668 | 0.0391 | 0.0531 | 0.0028 | 0.6672 | 0.0370 | 0.0442 | 0.0028 | 0.7668 | 0.0378 | 0.0476 |
| F22 | newssentiscore(news) | mean | 0.0335 | 18.6163 | 2.7072 | 1.8694 | 0.0042 | 17.8336 | 2.5756 | 1.8807 | 0.0042 | 18.6163 | 2.6231 | 1.8777 |
| F22 | newssentiscore(news) | variance | 0.2624 | 0.5257 | 0.3573 | 0.0325 | 0.2365 | 0.6223 | 0.3564 | 0.0381 | 0.2365 | 0.6223 | 0.3567 | 0.0361 |
| F22 | newssentiscore(news) | \|skewness\| | 0.0000 | 0.3164 | 0.0129 | 0.0215 | 0.0000 | 0.6035 | 0.0149 | 0.0346 | 0.0000 | 0.6035 | 0.0142 | 0.0305 |
| F23 | sentiscore2(noun) | - | 0.0000 | 8.5184 | 1.5585 | 1.2550 | 0.0000 | 9.6439 | 1.5191 | 1.2559 | 0.0000 | 9.6439 | 1.5333 | 1.2557 |
| F23 | topicsentiscore(topic) | mean | 0.0000 | 0.8750 | 0.0186 | 0.0678 | 0.0000 | 0.6250 | 0.0054 | 0.0319 | 0.0000 | 0.8750 | 0.0102 | 0.0485 |
| F23 | topicsentiscore(topic) | variance | 0.0000 | 0.8125 | 0.0885 | 0.1647 | 0.0000 | 0.7500 | 0.0294 | 0.0936 | 0.0000 | 0.8125 | 0.0507 | 0.1273 |
| F23 | topicsentiscore(topic) | \|skewness\| | 0.0000 | 0.2917 | 0.0370 | 0.0595 | 0.0000 | 0.2500 | 0.0146 | 0.0369 | 0.0000 | 0.2917 | 0.0227 | 0.0476 |
Table A5. Descriptive statistics on the features of F3 from the test bed.
The three Min/Max/Mean/SD column groups refer to SocialTERMs, EventTERMs, and SocialTERMs + EventTERMs, respectively.

| Type | Feature | Statistic | Min | Max | Mean | SD | Min | Max | Mean | SD | Min | Max | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F31 | degree(noun, CBNco-news) | - | 0.0000 | 0.3314 | 0.0115 | 0.0233 | 0.0000 | 0.3808 | 0.0094 | 0.0261 | 0.0000 | 0.3808 | 0.0102 | 0.0252 |
| F31 | closeness(noun, CBNco-news) | - | 0.0000 | 0.0536 | 0.0008 | 0.0028 | 0.0000 | 0.0848 | 0.0007 | 0.0044 | 0.0000 | 0.0848 | 0.0007 | 0.0039 |
| F31 | betweenness(noun, CBNco-news) | - | 0.0000 | 257.1000 | 65.9200 | 32.2415 | 0.0000 | 257.1000 | 60.7803 | 35.5006 | 0.0000 | 257.1000 | 62.6374 | 34.4473 |
| F32 | degree(noun, CBNco-topic) | - | 0.0011 | 0.0956 | 0.0090 | 0.0086 | 0.0011 | 0.1365 | 0.0077 | 0.0092 | 0.0011 | 0.1365 | 0.0082 | 0.0090 |
| F32 | closeness(noun, CBNco-topic) | - | 0.0000 | 0.0537 | 0.0013 | 0.0040 | 0.0000 | 0.1162 | 0.0011 | 0.0055 | 0.0000 | 0.1162 | 0.0012 | 0.0050 |
| F32 | betweenness(noun, CBNco-topic) | - | 430.7596 | 811.8286 | 600.6902 | 55.1418 | 430.7596 | 882.7842 | 587.6797 | 55.8419 | 430.7596 | 882.7842 | 592.3807 | 55.9402 |
| F33 | degree(noun, ITNco-news(topic)) | mean | 0.0000 | 0.5625 | 0.0783 | 0.1257 | 0.0000 | 0.5664 | 0.0670 | 0.1204 | 0.0000 | 0.5664 | 0.0711 | 0.1225 |
| F33 | degree(noun, ITNco-news(topic)) | variance | 0.0000 | 0.0693 | 0.0013 | 0.0064 | 0.0000 | 0.0578 | 0.0006 | 0.0036 | 0.0000 | 0.0693 | 0.0009 | 0.0048 |
| F33 | degree(noun, ITNco-news(topic)) | \|skewness\| | 0.0000 | 2.1494 | 0.0506 | 0.2658 | 0.0000 | 1.9938 | 0.0346 | 0.2127 | 0.0000 | 2.1494 | 0.0403 | 0.2334 |
| F33 | closeness(noun, ITNco-news(topic)) | mean | 0.0000 | 1.0000 | 0.0260 | 0.1069 | 0.0000 | 1.0000 | 0.0158 | 0.0878 | 0.0000 | 1.0000 | 0.0195 | 0.0953 |
| F33 | closeness(noun, ITNco-news(topic)) | variance | 0.0000 | 0.5000 | 0.0056 | 0.0360 | 0.0000 | 0.5000 | 0.0019 | 0.0213 | 0.0000 | 0.5000 | 0.0033 | 0.0276 |
| F33 | closeness(noun, ITNco-news(topic)) | \|skewness\| | 0.0000 | 3.5246 | 0.0718 | 0.3811 | 0.0000 | 3.1623 | 0.0400 | 0.2923 | 0.0000 | 3.5246 | 0.0515 | 0.3275 |
| F33 | betweenness(noun, ITNco-news(topic)) | mean | 0.0000 | 1.0000 | 0.2892 | 0.4211 | 0.0000 | 1.0000 | 0.2380 | 0.3982 | 0.0000 | 1.0000 | 0.2565 | 0.4074 |
| F33 | betweenness(noun, ITNco-news(topic)) | variance | 0.0000 | 0.2813 | 0.0046 | 0.0247 | 0.0000 | 0.2813 | 0.0036 | 0.0230 | 0.0000 | 0.2813 | 0.0040 | 0.0236 |
| F33 | betweenness(noun, ITNco-news(topic)) | \|skewness\| | 0.0000 | 2.6458 | 0.0429 | 0.2585 | 0.0000 | 2.6458 | 0.0366 | 0.2437 | 0.0000 | 2.6458 | 0.0389 | 0.2491 |
| F34 | degree(noun, ICNco-topic(community)) | - | 0.0089 | 0.6063 | 0.0855 | 0.0668 | 0.0048 | 0.6929 | 0.0753 | 0.0665 | 0.0048 | 0.6929 | 0.0790 | 0.0668 |
| F34 | closeness(noun, ICNco-topic(community)) | - | 0.0010 | 0.0154 | 0.0036 | 0.0022 | 0.0009 | 0.0145 | 0.0034 | 0.0020 | 0.0009 | 0.0154 | 0.0035 | 0.0021 |
| F34 | betweenness(noun, ICNco-topic(community)) | - | 0.0000 | 0.5131 | 0.0207 | 0.0547 | 0.0000 | 0.6866 | 0.0146 | 0.0507 | 0.0000 | 0.6866 | 0.0168 | 0.0522 |

Appendix C

Table A6. Pairwise t tests on three performance measures for adding a feature set.
(a) Performance measure = Accuracy

| Base Learner | Hypothesis | BL | Bagging | Boosting | RS |
|---|---|---|---|---|---|
| DT | F1 + F1(-) > F1(-) | 0.2235 (0.8240) | 6.2247 (0.0000) | 7.1741 (0.0000) | 9.7341 (0.0000) |
| DT | F2 + F2(-) > F2(-) | 30.8821 (0.0000) | 37.4364 (0.0000) | 21.4016 (0.0000) | 28.7831 (0.0000) |
| DT | F3 + F3(-) > F3(-) | 1.1166 (0.2688) | 3.2303 (0.0021) | 5.9678 (0.0000) | 18.2384 (0.0000) |
| NB | F1 + F1(-) > F1(-) | 25.1039 (0.0000) | 36.6398 (0.0000) | 7.6015 (0.0000) | 33.7182 (0.0000) |
| NB | F2 + F2(-) > F2(-) | 19.9001 (0.0000) | 42.3185 (0.0000) | 31.3028 (0.0000) | 47.0558 (0.0000) |
| NB | F3 + F3(-) > F3(-) | 3.0525 (0.0035) | −0.3495 (0.7280) | 4.2665 (0.0001) | −0.1539 (0.8783) |
| RBFN | F1 + F1(-) > F1(-) | 0.2235 (0.8240) | 4.5618 (0.0000) | 34.3960 (0.0000) | 4.4905 (0.0000) |
| RBFN | F2 + F2(-) > F2(-) | 30.8821 (0.0000) | 34.7608 (0.0000) | 59.4040 (0.0000) | 36.4525 (0.0000) |
| RBFN | F3 + F3(-) > F3(-) | 1.1166 (0.2688) | 1.1315 (0.2625) | 19.6223 (0.0000) | 1.0085 (0.3177) |
| SVM | F1 + F1(-) > F1(-) | 47.8430 (0.0000) | 26.1093 (0.0000) | 24.5362 (0.0000) | 39.7971 (0.0000) |
| SVM | F2 + F2(-) > F2(-) | 46.0717 (0.0000) | 62.9056 (0.0000) | 48.7427 (0.0000) | 56.2702 (0.0000) |
| SVM | F3 + F3(-) > F3(-) | 1.6093 (0.1131) | 12.8153 (0.0000) | 7.9037 (0.0000) | 11.2289 (0.0000) |
| DBN | F1 + F1(-) > F1(-) | 1.7887 (0.0791) | 4.1823 (0.0001) | 4.7798 (0.0000) | 2.0736 (0.0427) |
| DBN | F2 + F2(-) > F2(-) | 11.1502 (0.0000) | 19.1245 (0.0000) | 12.8451 (0.0000) | 20.2212 (0.0000) |
| DBN | F3 + F3(-) > F3(-) | 1.4964 (0.1401) | 4.3954 (0.0001) | 0.8260 (0.4123) | 1.9701 (0.0537) |

(b) Performance measure = F-measure

| Base Learner | Hypothesis | BL | Bagging | Boosting | RS |
|---|---|---|---|---|---|
| DT | F1 + F1(-) > F1(-) | −8.9487 (0.0000) | 4.4860 (0.0006) | 3.4322 (0.0043) | 10.3305 (0.0000) |
| DT | F2 + F2(-) > F2(-) | 18.8697 (0.0000) | 21.1906 (0.0000) | 12.2011 (0.0000) | 18.8014 (0.0000) |
| DT | F3 + F3(-) > F3(-) | 1.1519 (0.2644) | 0.4977 (0.6249) | 4.3996 (0.0003) | 11.6602 (0.0000) |
| NB | F1 + F1(-) > F1(-) | 20.3305 (0.0000) | 33.1659 (0.0000) | 3.9695 (0.0009) | 48.2566 (0.0000) |
| NB | F2 + F2(-) > F2(-) | −7.4550 (0.0000) | 24.3594 (0.0000) | 18.8862 (0.0000) | 27.9579 (0.0000) |
| NB | F3 + F3(-) > F3(-) | 0.9334 (0.3634) | −3.4550 (0.0031) | 2.2066 (0.0459) | −1.9266 (0.0702) |
| RBFN | F1 + F1(-) > F1(-) | −8.9487 (0.0000) | 2.7254 (0.0141) | 21.9769 (0.0000) | 2.6784 (0.0162) |
| RBFN | F2 + F2(-) > F2(-) | 18.8697 (0.0000) | 18.5574 (0.0000) | 39.4377 (0.0000) | 16.4852 (0.0000) |
| RBFN | F3 + F3(-) > F3(-) | 1.1519 (0.2644) | 0.3593 (0.7237) | 22.9424 (0.0000) | 0.5061 (0.6189) |
| SVM | F1 + F1(-) > F1(-) | 51.3223 (0.0000) | 20.3092 (0.0000) | 12.6380 (0.0000) | 28.9782 (0.0000) |
| SVM | F2 + F2(-) > F2(-) | 15.6709 (0.0000) | 26.6824 (0.0000) | 27.7334 (0.0000) | 35.0939 (0.0000) |
| SVM | F3 + F3(-) > F3(-) | 1.5515 (0.1400) | 7.4600 (0.0000) | 6.3750 (0.0000) | 9.5568 (0.0000) |
| DBN | F1 + F1(-) > F1(-) | 0.2507 (0.8049) | 1.7329 (0.1056) | 3.0737 (0.0066) | 1.1969 (0.2469) |
| DBN | F2 + F2(-) > F2(-) | 1.5755 (0.1357) | 2.0121 (0.0600) | 4.2951 (0.0004) | 6.9531 (0.0000) |
| DBN | F3 + F3(-) > F3(-) | 0.0878 (0.9310) | 2.5695 (0.0197) | 0.8361 (0.4143) | 0.9683 (0.3458) |

(c) Performance measure = AUC

| Base Learner | Hypothesis | BL | Bagging | Boosting | RS |
|---|---|---|---|---|---|
| DT | F1 + F1(-) > F1(-) | −5.2998 (0.0000) | 11.7273 (0.0000) | 8.3059 (0.0000) | 14.2223 (0.0000) |
| DT | F2 + F2(-) > F2(-) | 17.5129 (0.0000) | 35.0495 (0.0000) | 16.1747 (0.0000) | 31.3389 (0.0000) |
| DT | F3 + F3(-) > F3(-) | 4.3007 (0.0001) | 2.9957 (0.0041) | 6.6388 (0.0000) | 13.6499 (0.0000) |
| NB | F1 + F1(-) > F1(-) | 57.1072 (0.0000) | 11.3264 (0.0000) | 18.7506 (0.0000) | 5.5378 (0.0000) |
| NB | F2 + F2(-) > F2(-) | 68.7347 (0.0000) | 88.4192 (0.0000) | 64.7801 (0.0000) | 93.4364 (0.0000) |
| NB | F3 + F3(-) > F3(-) | 1.9929 (0.0510) | −2.7524 (0.0081) | 9.3195 (0.0000) | −8.1966 (0.0000) |
| RBFN | F1 + F1(-) > F1(-) | −5.2998 (0.0000) | 10.7145 (0.0000) | 43.9045 (0.0000) | 9.4867 (0.0000) |
| RBFN | F2 + F2(-) > F2(-) | 17.5129 (0.0000) | 42.0800 (0.0000) | 72.7003 (0.0000) | 36.3416 (0.0000) |
| RBFN | F3 + F3(-) > F3(-) | 4.3007 (0.0001) | 2.9594 (0.0045) | 24.0347 (0.0000) | 2.2664 (0.0272) |
| SVM | F1 + F1(-) > F1(-) | 50.6993 (0.0000) | 37.2924 (0.0000) | 32.2763 (0.0000) | 41.9019 (0.0000) |
| SVM | F2 + F2(-) > F2(-) | 40.2739 (0.0000) | 78.4504 (0.0000) | 52.7708 (0.0000) | 66.4601 (0.0000) |
| SVM | F3 + F3(-) > F3(-) | 1.8869 (0.0644) | 21.8909 (0.0000) | 11.0255 (0.0000) | 11.4046 (0.0000) |
| DBN | F1 + F1(-) > F1(-) | 1.7315 (0.0889) | 4.5510 (0.0000) | 5.0435 (0.0000) | 1.6871 (0.0970) |
| DBN | F2 + F2(-) > F2(-) | 11.7540 (0.0000) | 21.8630 (0.0000) | 12.5776 (0.0000) | 20.4907 (0.0000) |
| DBN | F3 + F3(-) > F3(-) | 1.4243 (0.1598) | 4.7451 (0.0000) | 0.9398 (0.3513) | 1.7289 (0.0892) |

Notes: F1(-) = F2 + F3, F2(-) = F1 + F3, and F3(-) = F1 + F2. Cells show the t (p) values of the pairwise t tests for feature set comparisons; results with p > 0.05 are not significant at the 5% level.
Table A7. Pairwise t tests on three performance measures for different base learners.
(a) Performance measure = Accuracy

| Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|
| BL NB > BL DT | 12.0713 (0.0000) | −69.5140 (0.0000) | −72.8854 (0.0000) |
| BL RBFN > BL DT | 0.0000 (1.0000) | 0.0000 (1.0000) | 0.0000 (1.0000) |
| BL SVM > BL DT | 3.4122 (0.0015) | −37.1666 (0.0000) | −36.7021 (0.0000) |
| BL DBN > BL DT | −16.3951 (0.0000) | −19.0950 (0.0000) | −20.0959 (0.0000) |
| BL RBFN > BL NB | −12.0713 (0.0000) | 69.5140 (0.0000) | 72.8854 (0.0000) |
| BL SVM > BL NB | −14.1094 (0.0000) | 45.4236 (0.0000) | 41.6482 (0.0000) |
| BL DBN > BL NB | −21.8254 (0.0000) | −1.0832 (0.2874) | 0.5031 (0.6186) |
| BL SVM > BL RBFN | 3.4122 (0.0015) | −37.1666 (0.0000) | −36.7021 (0.0000) |
| BL DBN > BL RBFN | −16.3951 (0.0000) | −19.0950 (0.0000) | −20.0959 (0.0000) |
| BL DBN > BL SVM | −18.4671 (0.0000) | −9.2654 (0.0000) | −8.9652 (0.0000) |

(b) Performance measure = F-measure

| Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|
| BL NB > BL DT | 17.3022 (0.0000) | −104.7576 (0.0000) | −84.9164 (0.0000) |
| BL RBFN > BL DT | 0.0000 (1.0000) | 0.0000 (1.0000) | 0.0000 (1.0000) |
| BL SVM > BL DT | 5.1433 (0.0000) | −49.4801 (0.0000) | −37.6746 (0.0000) |
| BL DBN > BL DT | −18.9402 (0.0000) | −19.1557 (0.0000) | −20.0682 (0.0000) |
| BL RBFN > BL NB | −17.3022 (0.0000) | 104.7576 (0.0000) | 84.9164 (0.0000) |
| BL SVM > BL NB | −25.8195 (0.0000) | 85.0502 (0.0000) | 68.8742 (0.0000) |
| BL DBN > BL NB | −23.0363 (0.0000) | −4.8859 (0.0000) | −4.1743 (0.0002) |
| BL SVM > BL RBFN | 5.1433 (0.0000) | −49.4801 (0.0000) | −37.6746 (0.0000) |
| BL DBN > BL RBFN | −18.9402 (0.0000) | −19.1557 (0.0000) | −20.0682 (0.0000) |
| BL DBN > BL SVM | −20.3950 (0.0000) | −12.7362 (0.0000) | −13.1209 (0.0000) |

(c) Performance measure = AUC

| Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|
| BL NB > BL DT | 21.2132 (0.0000) | 6.1123 (0.0000) | 1.9996 (0.0535) |
| BL RBFN > BL DT | 0.0000 (1.0000) | 0.0000 (1.0000) | 0.0000 (1.0000) |
| BL SVM > BL DT | −15.0229 (0.0000) | −35.9530 (0.0000) | −34.3491 (0.0000) |
| BL DBN > BL DT | −25.1656 (0.0000) | −24.8453 (0.0000) | −27.2766 (0.0000) |
| BL RBFN > BL NB | −21.2132 (0.0000) | −6.1123 (0.0000) | −1.9996 (0.0535) |
| BL SVM > BL NB | −80.0403 (0.0000) | −86.1265 (0.0000) | −99.1118 (0.0000) |
| BL DBN > BL NB | −38.3070 (0.0000) | −30.1059 (0.0000) | −33.1400 (0.0000) |
| BL SVM > BL RBFN | −15.0229 (0.0000) | −35.9530 (0.0000) | −34.3491 (0.0000) |
| BL DBN > BL RBFN | −25.1656 (0.0000) | −24.8453 (0.0000) | −27.2766 (0.0000) |
| BL DBN > BL SVM | −19.7493 (0.0000) | −9.7627 (0.0000) | −9.7155 (0.0000) |

Notes: Cells show the t (p) values of the pairwise t tests for classification technique comparisons; results with p > 0.05 are not significant at the 5% level.
Table A8. Pairwise t tests on three performance measures for base learner vs. ensemble learning methods.
(a) Performance measure = Accuracy

| Ensemble Method | Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|---|
| Bagging | Bagging DT > BL DT | 88.4498 (0.0000) | 106.0151 (0.0000) | 117.7650 (0.0000) |
| Bagging | Bagging NB > BL NB | −17.7052 (0.0000) | 27.1656 (0.0000) | 17.2375 (0.0000) |
| Bagging | Bagging RBFN > BL RBFN | 67.7853 (0.0000) | 81.6247 (0.0000) | 75.4374 (0.0000) |
| Bagging | Bagging SVM > BL SVM | 2.7899 (0.0074) | 15.6135 (0.0000) | 24.0745 (0.0000) |
| Bagging | Bagging DBN > BL DBN | 2.8758 (0.0056) | 3.7963 (0.0004) | 5.9088 (0.0000) |
| Boosting | Boosting DT > BL DT | 14.4893 (0.0000) | 71.1436 (0.0000) | 134.9044 (0.0000) |
| Boosting | Boosting NB > BL NB | −6.5743 (0.0000) | 30.5947 (0.0000) | 31.5820 (0.0000) |
| Boosting | Boosting RBFN > BL RBFN | 5.1906 (0.0000) | −3.8927 (0.0003) | 26.0311 (0.0000) |
| Boosting | Boosting SVM > BL SVM | 1.9504 (0.0562) | 19.7525 (0.0000) | 26.5113 (0.0000) |
| Boosting | Boosting DBN > BL DBN | 11.0154 (0.0000) | 3.7499 (0.0004) | 2.6360 (0.0108) |
| RS | RS DT > BL DT | 10.7463 (0.0000) | 20.2510 (0.0000) | 48.4088 (0.0000) |
| RS | RS NB > BL NB | −20.3202 (0.0000) | 32.3685 (0.0000) | 21.3447 (0.0000) |
| RS | RS RBFN > BL RBFN | 87.0035 (0.0000) | 89.3507 (0.0000) | 95.7749 (0.0000) |
| RS | RS SVM > BL SVM | 2.1645 (0.0347) | 22.0020 (0.0000) | 27.3903 (0.0000) |
| RS | RS DBN > BL DBN | −10.7311 (0.0000) | −0.3488 (0.7285) | −0.3114 (0.7566) |

(b) Performance measure = F-measure

| Ensemble Method | Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|---|
| Bagging | Bagging DT > BL DT | 75.9369 (0.0000) | 132.9017 (0.0000) | 102.1313 (0.0000) |
| Bagging | Bagging NB > BL NB | −21.5170 (0.0000) | 72.0526 (0.0000) | 65.4827 (0.0000) |
| Bagging | Bagging RBFN > BL RBFN | 66.4283 (0.0000) | 81.7851 (0.0000) | 76.5684 (0.0000) |
| Bagging | Bagging SVM > BL SVM | 2.5687 (0.0133) | 30.1221 (0.0000) | 38.6866 (0.0000) |
| Bagging | Bagging DBN > BL DBN | 3.8242 (0.0003) | 4.4323 (0.0001) | 6.6638 (0.0000) |
| Boosting | Boosting DT > BL DT | 18.3200 (0.0000) | 73.1555 (0.0000) | 120.7363 (0.0000) |
| Boosting | Boosting NB > BL NB | −8.7085 (0.0000) | 68.8445 (0.0000) | 57.0237 (0.0000) |
| Boosting | Boosting RBFN > BL RBFN | 2.9466 (0.0048) | −5.3816 (0.0000) | 27.4713 (0.0000) |
| Boosting | Boosting SVM > BL SVM | 1.7372 (0.0877) | 35.6982 (0.0000) | 42.4157 (0.0000) |
| Boosting | Boosting DBN > BL DBN | 9.5777 (0.0000) | 3.7196 (0.0005) | 3.0971 (0.0031) |
| RS | RS DT > BL DT | 6.7542 (0.0000) | 23.6999 (0.0000) | 42.5981 (0.0000) |
| RS | RS NB > BL NB | −24.8591 (0.0000) | 98.7848 (0.0000) | 68.5632 (0.0000) |
| RS | RS RBFN > BL RBFN | 68.4376 (0.0000) | 83.1582 (0.0000) | 95.2836 (0.0000) |
| RS | RS SVM > BL SVM | 1.8305 (0.0725) | 36.7534 (0.0000) | 43.5946 (0.0000) |
| RS | RS DBN > BL DBN | −13.2600 (0.0000) | −0.4610 (0.6466) | −0.4142 (0.6803) |

(c) Performance measure = AUC

| Ensemble Method | Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|---|
| Bagging | Bagging DT > BL DT | 120.5013 (0.0000) | 112.7957 (0.0000) | 98.4480 (0.0000) |
| Bagging | Bagging NB > BL NB | −29.5658 (0.0000) | −31.5824 (0.0000) | −37.8672 (0.0000) |
| Bagging | Bagging RBFN > BL RBFN | 95.6464 (0.0000) | 90.0383 (0.0000) | 78.4010 (0.0000) |
| Bagging | Bagging SVM > BL SVM | 28.8670 (0.0000) | 66.8031 (0.0000) | 78.5965 (0.0000) |
| Bagging | Bagging DBN > BL DBN | 4.1559 (0.0001) | 4.7537 (0.0000) | 6.8291 (0.0000) |
| Boosting | Boosting DT > BL DT | 25.4924 (0.0000) | 83.0114 (0.0000) | 111.5071 (0.0000) |
| Boosting | Boosting NB > BL NB | −15.7358 (0.0000) | −8.5868 (0.0000) | 1.3123 (0.1953) |
| Boosting | Boosting RBFN > BL RBFN | 11.5363 (0.0000) | 22.3686 (0.0000) | 26.1984 (0.0000) |
| Boosting | Boosting SVM > BL SVM | 29.5838 (0.0000) | 77.8730 (0.0000) | 73.1120 (0.0000) |
| Boosting | Boosting DBN > BL DBN | 7.9322 (0.0000) | 1.8372 (0.0713) | 1.0919 (0.2794) |
| RS | RS DT > BL DT | 20.8164 (0.0000) | 34.0770 (0.0000) | 46.4426 (0.0000) |
| RS | RS NB > BL NB | −28.0394 (0.0000) | −33.2338 (0.0000) | −43.8582 (0.0000) |
| RS | RS RBFN > BL RBFN | 100.5234 (0.0000) | 97.6323 (0.0000) | 83.2122 (0.0000) |
| RS | RS SVM > BL SVM | 2.0360 (0.0467) | 20.9832 (0.0000) | 29.3767 (0.0000) |
| RS | RS DBN > BL DBN | −12.5805 (0.0000) | −0.3533 (0.7251) | −0.4280 (0.6703) |

Notes: Cells show the t (p) values of the pairwise t tests for classification technique comparisons; results with p > 0.05 are not significant at the 5% level.
Table A9. Pairwise t tests on three performance measures for different ensemble methods, i.e., ensemble method vs. ensemble method.
(a) Performance measure = Accuracy

| Base Learner | Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|---|
| DT | Boosting DT > Bagging DT | −34.7363 (0.0000) | 5.7999 (0.0000) | 14.2809 (0.0000) |
| DT | RS DT > Bagging DT | −72.3451 (0.0000) | −43.9322 (0.0000) | −33.9780 (0.0000) |
| DT | RS DT > Boosting DT | −8.3594 (0.0000) | −39.6975 (0.0000) | −45.1553 (0.0000) |
| NB | Boosting NB > Bagging NB | 8.2097 (0.0000) | 12.8248 (0.0000) | 16.6518 (0.0000) |
| NB | RS NB > Bagging NB | −3.6472 (0.0006) | 1.7535 (0.0850) | 1.4520 (0.1520) |
| NB | RS NB > Boosting NB | −10.6185 (0.0000) | −12.1853 (0.0000) | −16.2986 (0.0000) |
| RBFN | Boosting RBFN > Bagging RBFN | −64.3608 (0.0000) | −72.3353 (0.0000) | −60.1161 (0.0000) |
| RBFN | RS RBFN > Bagging RBFN | 8.1792 (0.0000) | 9.4283 (0.0000) | 8.4886 (0.0000) |
| RBFN | RS RBFN > Boosting RBFN | 83.6084 (0.0000) | 79.5634 (0.0000) | 79.6889 (0.0000) |
| SVM | Boosting SVM > Bagging SVM | −0.9735 (0.3345) | 2.4655 (0.0167) | −1.0771 (0.2861) |
| SVM | RS SVM > Bagging SVM | −0.7949 (0.4300) | 1.4032 (0.1666) | −1.0371 (0.3043) |
| SVM | RS SVM > Boosting SVM | 0.1963 (0.8450) | −1.4398 (0.1557) | 0.0709 (0.9437) |
| DBN | Boosting DBN > Bagging DBN | 7.2925 (0.0000) | 0.3045 (0.7619) | −3.1414 (0.0028) |
| DBN | RS DBN > Bagging DBN | −13.8808 (0.0000) | −4.0702 (0.0002) | −6.2561 (0.0000) |
| DBN | RS DBN > Boosting DBN | −30.0740 (0.0000) | −4.0145 (0.0002) | −2.9552 (0.0045) |

(b) Performance measure = F-measure

| Base Learner | Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|---|
| DT | Boosting DT > Bagging DT | −32.4298 (0.0000) | 4.1017 (0.0002) | 11.4910 (0.0000) |
| DT | RS DT > Bagging DT | −57.8336 (0.0000) | −47.7845 (0.0000) | −26.9337 (0.0000) |
| DT | RS DT > Boosting DT | −12.0562 (0.0000) | −39.2425 (0.0000) | −36.0610 (0.0000) |
| NB | Boosting NB > Bagging NB | 8.2002 (0.0000) | 14.2373 (0.0000) | 20.1774 (0.0000) |
| NB | RS NB > Bagging NB | −1.9306 (0.0585) | 0.5775 (0.5665) | 2.0365 (0.0463) |
| NB | RS NB > Boosting NB | −9.9778 (0.0000) | −15.6158 (0.0000) | −19.1077 (0.0000) |
| RBFN | Boosting RBFN > Bagging RBFN | −76.5938 (0.0000) | −78.7143 (0.0000) | −56.7467 (0.0000) |
| RBFN | RS RBFN > Bagging RBFN | 8.0850 (0.0000) | 7.5814 (0.0000) | 8.1428 (0.0000) |
| RBFN | RS RBFN > Boosting RBFN | 76.6385 (0.0000) | 80.7990 (0.0000) | 73.9733 (0.0000) |
| SVM | Boosting SVM > Bagging SVM | −1.3708 (0.1772) | 1.8589 (0.0682) | −0.3744 (0.7095) |
| SVM | RS SVM > Bagging SVM | −1.3800 (0.1747) | −1.3300 (0.1893) | −1.0393 (0.3032) |
| SVM | RS SVM > Boosting SVM | 0.0280 (0.9778) | −3.6290 (0.0006) | −0.7164 (0.4767) |
| DBN | Boosting DBN > Bagging DBN | 5.6423 (0.0000) | −0.2732 (0.7858) | −3.6884 (0.0006) |
| DBN | RS DBN > Bagging DBN | −19.3905 (0.0000) | −4.8681 (0.0000) | −6.9053 (0.0000) |
| DBN | RS DBN > Boosting DBN | −31.3674 (0.0000) | −4.1374 (0.0001) | −3.4537 (0.0011) |

(c) Performance measure = AUC

| Base Learner | Hypothesis | F1 | F1 + F2 | F1 + F2 + F3 |
|---|---|---|---|---|
| DT | Boosting DT > Bagging DT | −34.6090 (0.0000) | 8.0364 (0.0000) | 22.5245 (0.0000) |
| DT | RS DT > Bagging DT | −78.7128 (0.0000) | −52.1042 (0.0000) | −53.0085 (0.0000) |
| DT | RS DT > Boosting DT | −12.0081 (0.0000) | −45.2120 (0.0000) | −69.0298 (0.0000) |
| NB | Boosting NB > Bagging NB | 0.8335 (0.4099) | 23.9824 (0.0000) | 29.9802 (0.0000) |
| NB | RS NB > Bagging NB | 0.6667 (0.5076) | −4.8801 (0.0000) | −6.5541 (0.0000) |
| NB | RS NB > Boosting NB | −0.4738 (0.6383) | −26.0268 (0.0000) | −34.4583 (0.0000) |
| RBFN | Boosting RBFN > Bagging RBFN | −98.5728 (0.0000) | −104.3890 (0.0000) | −94.4318 (0.0000) |
| RBFN | RS RBFN > Bagging RBFN | 10.1187 (0.0000) | 8.9604 (0.0000) | 7.0242 (0.0000) |
| RBFN | RS RBFN > Boosting RBFN | 103.1878 (0.0000) | 117.4996 (0.0000) | 102.8284 (0.0000) |
| SVM | Boosting SVM > Bagging SVM | 5.2714 (0.0000) | 9.4300 (0.0000) | 5.0855 (0.0000) |
| SVM | RS SVM > Bagging SVM | −29.6967 (0.0000) | −43.9077 (0.0000) | −47.8672 (0.0000) |
| SVM | RS SVM > Boosting SVM | −29.9190 (0.0000) | −54.0403 (0.0000) | −47.4904 (0.0000) |
| DBN | Boosting DBN > Bagging DBN | 3.7328 (0.0004) | −2.7483 (0.0084) | −5.5110 (0.0000) |
| DBN | RS DBN > Bagging DBN | −18.9113 (0.0000) | −4.9850 (0.0000) | −6.7879 (0.0000) |
| DBN | RS DBN > Boosting DBN | −25.4271 (0.0000) | −2.1488 (0.0359) | −1.4594 (0.1499) |

Notes: Cells show the t (p) values of the pairwise t tests for classification technique comparisons; results with p > 0.05 are not significant at the 5% level.

  16. He, Q.; Chang, K.; Lim, E.P.; Banerjee, A. Keep it simple with time: A reexamination of probabilistic topic detection models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1795–1808. [Google Scholar] [PubMed]
  17. Hu, Y.-H.; Chen, Y.-L.; Chou, H.-L. Opinion mining from online hotel reviews—A text summarization approach. Inf. Process. Manag. 2017, 53, 436–449. [Google Scholar] [CrossRef]
  18. Liu, Y.-H.; Chen, Y.-L.; Ho, W.-L. Predicting associated statutes for legal problems. Inf. Process. Manag. 2015, 51, 194–211. [Google Scholar] [CrossRef]
  19. Suh, J.H. Comparing writing style feature-based classification methods for estimating user reputations in social media. Springerplus 2016, 5, 261. [Google Scholar] [CrossRef]
  20. Nieminen, P.; Polonen, I.; Sipola, T. Research literature clustering using diffusion maps. J. Informetr. 2013, 7, 874–886. [Google Scholar] [CrossRef] [Green Version]
  21. Chen, K.-Y.; Luesukprasert, L.; Chou, S.-C.T. Hot Topic Extraction Based on Timeline Analysis and Multidimensional Sentence Modeling. IEEE Trans. Knowl. Data Eng. 2007, 19, 1016–1025. [Google Scholar] [CrossRef]
  22. Xu, H.; Zhang, F.; Wang, W. Implicit feature identification in Chinese reviews using explicit topic mining model. Knowl.-Based Syst. 2015, 76, 166–175. [Google Scholar] [CrossRef]
  23. Zheng, X.L.; Lin, Z.; Wang, X.W.; Lin, K.J.; Song, M.N. Incorporating appraisal expression patterns into topic modeling for aspect and sentiment word identification. Knowl.-Based Syst. 2014, 61, 29–47. [Google Scholar] [CrossRef]
  24. Rose, S.; Engel, D.; Cramer, N.; Cowley, W. Automatic keyword extraction from individual documents. Text Min. 2010, 1–20. [Google Scholar]
  25. Abilhoa, W.D.; de Castro, L.N. A keyword extraction method from twitter messages represented as graphs. Appl. Math. Comput. 2014, 240, 308–325. [Google Scholar] [CrossRef]
  26. Noh, H.; Jo, Y.; Lee, S. Keyword selection and processing strategy for applying text mining to patent analysis. Exp. Syst. Appl. 2015, 42, 4348–4360. [Google Scholar] [CrossRef]
  27. Piryani, R.; Madhavi, D.; Singh, V.K. Analytical mapping of opinion mining and sentiment analysis research during 2000–2015. Inf. Process. Manag. 2017, 53, 122–150. [Google Scholar] [CrossRef]
  28. Yang, S.; Han, R.; Wolfram, D.; Zhao, Y. Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis. J. Informetr. 2016, 10, 132–150. [Google Scholar] [CrossRef]
  29. Liu, Z.; Jansen, B.J. Questioner or question: Predicting the response rate in social question and answering on Sina Weibo. Inf. Process. Manag. 2018, 54, 159–174. [Google Scholar] [CrossRef]
  30. Peetz, M.-H.; de Rijke, M.; Kaptein, R. Estimating Reputation Polarity on Microblog Posts. Inf. Process. Manag. 2016, 52, 193–216. [Google Scholar] [CrossRef]
  31. Almeida, T.A.; Silva, T.P.; Santos, I.; Gómez Hidalgo, J.M. Text normalization and semantic indexing to enhance Instant Messaging and SMS spam filtering. Knowl.-Based Syst. 2016, 108 (Suppl. C), 25–32. [Google Scholar] [CrossRef]
  32. Rao, Y.; Li, Q.; Wu, Q.; Xie, H.; Wang, F.L.; Wang, T. A multi-relational term scheme for first story detection. Neurocomputing 2017, 254 (Suppl. C), 42–52. [Google Scholar] [CrossRef] [Green Version]
  33. Lin, D.; Li, L.; Cao, D.; Lv, Y.; Ke, X. Multi-modality weakly labeled sentiment learning based on Explicit Emotion Signal for Chinese microblog. Neurocomputing 2018, 272 (Suppl. C), 258–269. [Google Scholar] [CrossRef]
  34. Alruily, M.; Ayesh, A.; Zedan, H. Crime profiling for the Arabic language using computational linguistic techniques. Inf. Process. Manag. 2014, 50, 315–341. [Google Scholar] [CrossRef]
  35. Lo, S.L.; Chiong, R.; Cornforth, D. An unsupervised multilingual approach for online social media topic identification. Exp. Syst. Appl. 2017, 81 (Suppl. C), 282–298. [Google Scholar] [CrossRef]
  36. Pournarakis, D.E.; Sotiropoulos, D.N.; Giaglis, G.M. A computational model for mining consumer perceptions in social media. Decis. Support Syst. 2017, 93 (Suppl. C), 98–110. [Google Scholar] [CrossRef]
  37. Zhang, Y.; Porter, A.L.; Hu, Z.; Guo, Y.; Newman, N.C. “Term clumping” for technical intelligence: A case study on dye-sensitized solar cells. Technol. Forecast. Soc. Change 2014, 85, 26–39. [Google Scholar] [CrossRef] [Green Version]
  38. Li, Q.; Jin, Z.; Wang, C.; Zeng, D.D. Mining opinion summarizations using convolutional neural networks in Chinese microblogging systems. Knowl.-Based Syst. 2016, 107 (Suppl. C), 289–300. [Google Scholar] [CrossRef]
  39. Weichselbraun, A.; Gindl, S.; Scharl, A. Enriching semantic knowledge bases for opinion mining in big data applications. Knowl.-Based Syst. 2014, 69, 78–85. [Google Scholar] [CrossRef] [Green Version]
  40. Li, Q.; Liu, Y. Exploring the diversity of retweeting behavior patterns in Chinese microblogging platform. Inf. Process. Manag. 2017, 53, 945–962. [Google Scholar] [CrossRef]
  41. Jung, S.; Segev, A. Analyzing future communities in growing citation networks. Knowl.-Based Syst. 2014, 69, 34–44. [Google Scholar] [CrossRef]
  42. Lee, Y.; Kim, S.Y.; Song, I.; Park, Y.; Shin, J. Technology opportunity identification customized to the technological capability of SMEs through two-stage patent analysis. Scientometrics 2014, 100, 227–244. [Google Scholar] [CrossRef]
  43. Dang, Y.; Zhang, Y.; Chen, H. A Lexicon-Enhanced Method for Sentiment Classification: An Experiment on Online Product Reviews. IEEE Intell. Syst. 2010, 25, 46–53. [Google Scholar] [CrossRef]
  44. Chen, C.C.; Chen, Y.T.; Chen, M.C. An aging theory for event life-cycle modeling. IEEE Trans. Syst. Man Cybern. A 2007, 37, 237–248. [Google Scholar] [CrossRef]
  45. Zhu, X.S.; Oates, T. Finding story chains in newswire articles using random walks. Inform. Syst. Front 2014, 16, 753–769. [Google Scholar] [CrossRef]
  46. Cataldi, M.; Di Caro, L.; Schifanella, C. Emerging topic detection on Twitter based on temporal and social terms evaluation. In MDMKDD ’10, Proceedings of the Tenth International Workshop on Multimedia Data Mining; ACM: Washington, DC, USA, 2010; Volume 4, pp. 1–10. [Google Scholar]
  47. Vavliakis, K.N.; Symeonidis, A.L.; Mitkas, P.A. Event identification in web social media through named entity recognition and topic modeling. Data Knowl. Eng. 2013, 88, 1–24. [Google Scholar] [CrossRef]
  48. Gang, D.; Jun, G.; Weiran, X.; Zhen, Y. Maximizing the reliability of two-state automaton for burst feature detection in news streams. In IEEE International Conference on Progress in Informatics and Computing (PIC); IEEE CS Press: Shanghai, China, 2010; Volume 1, pp. 229–233. [Google Scholar]
  49. Yang, C.C.; Xiaodong, S.; Chih-Ping, W. Discovering Event Evolution Graphs from News Corpora. IEEE Trans. Syst. Man Cybern. A 2009, 39, 850–863. [Google Scholar] [CrossRef]
  50. Schumaker, R.P.; Chen, H. Textual analysis of stock market prediction using breaking financial news. ACM Trans. Infor. Syst. 2009, 27, 1–19. [Google Scholar] [CrossRef]
  51. Huang, H.-H.; Kuo, Y.-H. Cross-Lingual Document Representation and Semantic Similarity Measure: A Fuzzy Set and Rough Set Based Approach. IEEE Trans. Fuzzy Syst. 2010, 18, 1098–1111. [Google Scholar] [CrossRef]
  52. Spina, D.; Gonzalo, J.; Amigó, E. Discovering filter keywords for company name disambiguation in twitter. Exp. Syst. Appl. 2013, 40, 4986–5003. [Google Scholar] [CrossRef]
  53. Sheth, A.; Thomas, C.; Mehra, P. Continuous Semantics to Analyze Real-Time Data. IEEE Int. Comput. 2010, 14, 84–89. [Google Scholar] [CrossRef] [Green Version]
  54. Jatowt, A.; Au Yeung, C.M.; Tanaka, K. Generic method for detecting focus time of documents. Inf. Process. Manag. 2015, 51, 851–868. [Google Scholar] [CrossRef] [Green Version]
  55. Zhang, Y.L.; Dang, Y.; Chen, H.C. Research note: Examining gender emotional differences in Web forum communication. Decis. Support Syst. 2013, 55, 851–860. [Google Scholar] [CrossRef]
  56. Ji, X.; Chun, S.A.; Wei, Z.; Geller, J. Twitter sentiment classification for measuring public health concerns. Soc. Netw. Anal. Min. 2015, 5, 13. [Google Scholar] [CrossRef]
  57. Borgatti, S.P.; Mehra, A.; Brass, D.J.; Labianca, G. Network Analysis in the Social Sciences. Science 2009, 323, 892–895. [Google Scholar] [CrossRef] [Green Version]
  58. Yan, E.J.; Ding, Y. Applying Centrality Measures to Impact Analysis: A Coauthorship Network Analysis. J. Am. Soc. Inf. Sci. Technol. 2009, 60, 2107–2118. [Google Scholar] [CrossRef]
  59. Suh, J.H. Exploring the effect of structural patent indicators in forward patent citation networks on patent price from firm market value. Technol. Anal. Strateg. Manag. 2015, 27, 485–502. [Google Scholar] [CrossRef]
  60. Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D.U. Complex networks: Structure and dynamics. Phys. Rep. 2006, 424, 175–308. [Google Scholar] [CrossRef] [Green Version]
  61. Akoglu, L.; Tong, H.; Koutra, D. Graph based anomaly detection and description: A survey. Data Min. Knowl. Discov. 2014, 29, 626–688. [Google Scholar] [CrossRef]
  62. Steinhaeuser, K.; Chawla, N.V. Identifying and evaluating community structure in complex networks. Pattern Recognit. Lett. 2010, 31, 413–421. [Google Scholar] [CrossRef]
  63. Zhao, Z.; Feng, S.; Wang, Q.; Huang, J.Z.; Williams, G.J.; Fan, J. Topic oriented community detection through social objects and link analysis in social networks. Knowl.-Based Syst. 2012, 26, 164–173. [Google Scholar] [CrossRef]
  64. Expert, P.; Evans, T.S.; Blondel, V.D.; Lambiotte, R. Uncovering space-independent communities in spatial networks. Proc. Natl. Acad. Sci. USA 2011, 108, 7663–7668. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  65. Blondel, V.D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, 2008, P10008. [Google Scholar]
  66. Nettleton, D.F. Data mining of social networks represented as graphs. Comput. Sci. Rev. 2013, 7, 1–34. [Google Scholar] [CrossRef] [Green Version]
  67. Ku, Y.; Chiu, C.; Zhang, Y.; Chen, H.; Su, H. Text mining self-disclosing health information for public health service. J. Assoc. Infor. Sci. Technol. 2014, 65, 928–947. [Google Scholar] [CrossRef]
  68. Abbasi, A.; Chen, H.; Thoms, S.; Fu, T. Affect analysis of web forums and blogs using correlation ensembles. IEEE Trans. Knowl. Data Eng. 2008, 20, 1168–1180. [Google Scholar] [CrossRef]
  69. Oztekin, A.; Kong, Z.Y.J.; Delen, D. Development of a structural equation modeling-based decision tree methodology for the analysis of lung transplantations. Decis. Support Syst. 2011, 51, 155–166. [Google Scholar] [CrossRef]
  70. Yang, Y.M.; Slattery, S.; Ghani, R. A study of approaches to hypertext categorization. J. Intell. Inf. Syst. 2002, 18, 219–241. [Google Scholar] [CrossRef]
  71. Roy, B.V.; Yan, X. Manipulation Robustness of Collaborative Filtering. Manag. Sci. 2010, 56, 1911–1929. [Google Scholar] [Green Version]
  72. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques: Practical Machine Learning Tools and Techniques; Elsevier Science: New York, NY, USA, 2011. [Google Scholar]
  73. Farquad, M.A.H.; Ravi, V.; Raju, S.B. Churn prediction using comprehensible support vector machine: An analytical CRM application. Appl. Soft Comput. 2014, 19, 31–40. [Google Scholar] [CrossRef]
  74. Pinto, T.; Sousa, T.M.; Praça, I.; Vale, Z.; Morais, H. Support Vector Machines for decision support in electricity markets׳ strategic bidding. Neurocomputing 2016, 172, 438–445. [Google Scholar] [CrossRef]
  75. Abdel-Zaher, A.M.; Eldeib, A.M. Breast cancer classification using deep belief networks. Exp. Syst. Appl. 2016, 46, 139–144. [Google Scholar] [CrossRef]
  76. Bengio, Y.; Lamblin, P.; Popovici, D.; Larochelle, H. Greedy layer-wise training of deep networks. Adv. Neural Inf. Proc. Syst. 2007, 19, 153. [Google Scholar]
  77. Bengio, Y. Learning Deep Architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127. [Google Scholar] [CrossRef] [Green Version]
  78. Wang, G.; Sun, J.S.; Ma, J.; Xu, K.Q.; Gu, J.B. Sentiment classification: The contribution of ensemble learning. Decis. Support Syst. 2014, 57, 77–93. [Google Scholar] [CrossRef]
  79. Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms, 1st ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2012; p. 236. [Google Scholar]
  80. Abbasi, A.; Chen, H. Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Trans. Inform. Syst. 2008, 26, 7. [Google Scholar] [CrossRef]
  81. Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Trans. Syst. Man Cybern. C 2012, 42, 463–484. [Google Scholar] [CrossRef]
  82. Díez-Pastor, J.F.; Rodríguez, J.J.; García-Osorio, C.I.; Kuncheva, L.I. Diversity techniques improve the performance of the best imbalance learning ensembles. Inform. Sci. 2015, 325, 98–117. [Google Scholar] [CrossRef]
  83. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  84. WHO. Suicide Mortality Rate: Data Tables; World Health Statistics: Geneva, Switzerland, 2017. [Google Scholar]
  85. Suh, J.H.; Park, S.C. Service-oriented Technology Roadmap (SoTRM) using patent map for R&D strategy of service industry. Exp. Syst. Appl. 2009, 36, 6754–6772. [Google Scholar]
  86. Lombard, M.; Snyder-Duch, J.; Bracken, C.C. Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability. Hum. Commun. Res. 2002, 28, 587–604. [Google Scholar] [CrossRef]
Figure 1. Research framework, proposed by this study to design and examine the SocialTERM-Extractor.
Figure 2. Illustration on how to form cross-boundary networks (CBNs), i.e., CBNco-news and CBNco-topic, and in-boundary networks (IBNs), i.e., ITNs (topic) and ICNs (community).
Figure 3. Ratio of agreement that a feature set improved performance.
Figure 4. The performance rankings of base learners for a given feature set.
Figure 5. Ratio of agreement that an ensemble method improved performance for a given feature set.
Figure 6. The accuracy rankings of ensemble methods for a given feature set with different base learners.
Figure 7. The F-measure rankings of ensemble methods for a given feature set with different base learners.
Figure 8. The AUC rankings of ensemble methods for a given feature set with different base learners.
Table 1. Recent works (2014–2018) that extract and use key terms for text mining applications.
Final Goals of Using Key Terms | Previous Works with Description | Type of Data | Level of Textual Data Analysis 1 (L1, L2, L3); Category of Key Term Extraction 2 (C1, C2, C3); Technique Used for Key Term Extraction 3 (T1, T2, T3, T4)
Indexing | Abilhoa and de Castro [25] proposed a keyword extraction technique for Twitter messages in which tweets are represented as graphs. | Tweets | **
 | Noh, Jo and Lee [26] explored strategies for selecting and processing keywords for patent analysis purposes. | Patent documents |
 | Almeida et al. [31] proposed a method to normalize and expand original short and messy text messages. | Short message service (SMS) | *****
 | Rao et al. [32] presented a new term weighting scheme called LGT, which jointly models the Local element, Global element, and Topical association of each story. | Online news articles | * *
 | Lin et al. [33] proposed an Explicit Emotion Signal based cross-media sentiment learning approach. | Microblog (i.e., Sina Weibo) posts | *** *
Clustering | Jiang, Chen, Nunamaker and Zimbra [6] proposed a novel stakeholder-based event analysis framework that uses online stylometric analysis and partitions messages into different time periods of major firm events. | Web forum posts | ****
 | Alruily et al. [34] examined the crime domain in the Arabic language (unstructured text) using text mining techniques and presented its development and application. | Online news articles | *****
 | Lo et al. [35] presented an unsupervised multilingual approach for identifying highly relevant terms and topics in the mass of social media data. | Tweets | ***
 | Pournarakis et al. [36] devised a novel genetic algorithm to improve the clustering of tweets into semantically coherent groups. | Tweets | ***
Summarization | Zhang et al. [37] presented six term clumping steps that clean and consolidate topical content in text sources for tech mining. | Research articles | *
 | Zheng, Lin, Wang, Lin and Song [23] studied an approach to automatically extract product and service aspect words, as well as sentiment words, from reviews. | Reviews | *
 | Li et al. [38] proposed a convolutional neural network (CNN)-based opinion summarization method for Chinese microblogging systems. | Microblog (i.e., Sina Weibo) posts |
 | Hu, Chen and Chou [17] proposed a novel multi-text summarization technique for identifying the top-k most informative sentences of hotel reviews. | Reviews | *** *
Classification | Weichselbraun et al. [39] presented a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining, with a focus on Web intelligence platforms and other high-throughput big data applications. | Reviews | **
 | Xu, Zhang and Wang [22] proposed a support vector machine (SVM)-based approach to identify implicit features in Chinese customer reviews. | Reviews | *****
 | Peetz, de Rijke and Kaptein [30] proposed a feature-based model based on three dimensions: the source of the tweet, the contents of the tweet, and the reception of the tweet. | Tweets | *****
 | Li and Liu [40] established a classification model that predicts the temporal class of an original microblog’s retweeting time series using readily available social-influential, topical, and temporal factors. | Microblog (i.e., Sina Weibo) posts |
 | This study | Online news articles | ***
Mapping | Jung and Segev [41] proposed methods to analyze how communities change over time in the citation network graph, without additional external information, based on node and link prediction and community detection. | Research articles | **
 | Lee et al. [42] suggested a way of identifying technology opportunities that is customizable to the R&D capabilities of small and medium-sized enterprises (SMEs). | Patent documents | **
 | Yang, Han, Wolfram and Zhao [28] introduced the author keyword coupling analysis (AKCA) method to visualize the field of information science (2006–2015). | Research articles | *****
 | Piryani, Madhavi and Singh [27] presented a scientometric mapping of research work done on opinion mining and sentiment analysis (OMSA) during 2000–2016. | Research articles | **
Notes: 1 Level (L): sentence (L1), document (L2), and topic (L3). 2 Category of key term extraction (C): manual (C1), automatic (C2), and hybrid (C3); the asterisks mark the categories to which each work belongs, and works marked in multiple categories compose the hybrid type C3. 3 Technique used for key term extraction (T): statistical (T1), linguistic (T2), machine learning (T3), and hybrid (T4); the asterisks mark the technique(s) each work used, and works marked with multiple techniques compose the hybrid type T4.
Table 2. Temporal weight features of the detected SPRTs’ key noun terms, proposed for this study.
Feature Sub Set | Level | Temporal Weight Feature 1
F11 | news | mean, variance, and |skewness| of dfscore1,t(noun)
 | topic | mean, variance, and |skewness| of dfscore2,t(noun)
F12 | news | mean, variance, and |skewness| of tfscore1,t(noun)
 | topic | mean, variance, and |skewness| of tfscore2,t(noun)
F13 | news | mean, variance, and |skewness| of titlescore1,t(noun)
 | topic | mean, variance, and |skewness| of titlescore2,t(noun)
F14 | news | mean, variance, and |skewness| of idfscore1,t(noun)
 | topic | mean, variance, and |skewness| of idfscore2,t(noun)
Notes: 1 The statistics of temporal weights of noun are obtained over t = 1, …, T.
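The temporal weight features in Table 2 are summary statistics of a term's weight series over the T periods. A minimal sketch, assuming a hypothetical document-frequency weight series for one noun term:

```python
# A minimal sketch of the F11 features in Table 2: the mean, variance,
# and absolute skewness of a noun term's document-frequency weight
# over periods t = 1, ..., T. The weight series is hypothetical.
import statistics
from scipy.stats import skew

dfscore = [0.061, 0.058, 0.072, 0.055, 0.064, 0.060]  # one weight per period t

features = {
    "mean": statistics.mean(dfscore),
    "variance": statistics.pvariance(dfscore),  # variance over the T periods
    "abs_skewness": abs(float(skew(dfscore))),  # |skewness|, as in Table 2
}
print(features)
```

The same three statistics are computed for each of the other weight series (tfscore, titlescore, and idfscore) at both the news and topic levels.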
Table 3. Sentiment features of the detected SPRTs’ key noun terms, proposed for this study.
Feature Sub Set | Level | Sentiment Feature 1,2,3
F21 | - | featuresentiscore(noun, pos = verb)
 | | featuresentiscore(noun, pos = adverb)
 | | featuresentiscore(noun, pos = adjective)
 | | nounsentiscore(noun)
F22 | news | sentiscore1(noun)
 | | variance and |skewness| of newssentiscore(news)
F23 | topic | sentiscore2(noun)
 | | variance and |skewness| of topicsentiscore(topic)
Notes: 1 The statistics of newssentiscore(news) and topicsentiscore(topic) are obtained respectively over news ∈ NOUNNEWS(noun) and topic ∈ NOUNTOPIC(noun). 2 If n(NOUNNEWS(noun)) ≤ 1, the variance value of newssentiscore(news) is set as 0. If n(NOUNNEWS(noun)) ≤ 2, the skewness value of newssentiscore(news) is set as 0. 3 If n(NOUNTOPIC(noun)) ≤ 1, the variance value of topicsentiscore(topic) is set as 0. If n(NOUNTOPIC(noun)) ≤ 2, the skewness value of topicsentiscore(topic) is set as 0.
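The fallback rules in the notes above can be sketched directly. The helper below is hypothetical and uses invented sentiment scores; it only illustrates the variance and |skewness| computation for the news-level features with the n ≤ 1 and n ≤ 2 fallbacks:

```python
# A minimal sketch of the F22 features in Table 3: the variance and
# |skewness| of news-level sentiment scores over the articles containing
# a noun, with the fallback to 0 from the table notes when too few
# articles exist. The scores are hypothetical.
import numpy as np
from scipy.stats import skew

def news_sentiment_features(scores):
    n = len(scores)
    variance = float(np.var(scores)) if n > 1 else 0.0          # n <= 1 -> 0
    abs_skewness = abs(float(skew(scores))) if n > 2 else 0.0   # n <= 2 -> 0
    return variance, abs_skewness

print(news_sentiment_features([0.4, -0.2, 0.1, 0.3]))  # enough articles
print(news_sentiment_features([0.4]))  # too few articles -> (0.0, 0.0)
```

The topic-level features (F23) follow the same pattern over the topics containing the noun.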
Table 4. Complex network structural features of the SPRTs’ key noun terms, proposed for this study.
Feature Sub Set | Network Type | Boundary Type | Link Type | Complex Network Structure Feature 1,2
F31 | Cross-boundary | - | co-news | degree(noun, CBNco-news)
 | | | | closeness(noun, CBNco-news)
 | | | | betweenness(noun, CBNco-news)
F32 | Cross-boundary | - | co-topic | degree(noun, CBNco-topic)
 | | | | closeness(noun, CBNco-topic)
 | | | | betweenness(noun, CBNco-topic)
F33 | In-boundary | Given topic | co-news | mean, variance, and |skewness| of degree(noun, ITNco-news(topic))
 | | | | mean, variance, and |skewness| of closeness(noun, ITNco-news(topic))
 | | | | mean, variance, and |skewness| of betweenness(noun, ITNco-news(topic))
F34 | In-boundary | Given community | co-topic | degree(noun, ICNco-topic(community))
 | | | | closeness(noun, ICNco-topic(community))
 | | | | betweenness(noun, ICNco-topic(community))
Notes: 1 The statistics of degree, closeness, and betweenness of noun in ITNco-news(topic) are obtained over topic ∈ NOUNTOPIC(noun). 2 If n(NOUNTOPIC(noun)) ≤ 1, the variance values of degree, closeness, and betweenness of noun in ITNco-news(topic) are set as 0. If n(NOUNTOPIC(noun)) ≤ 2, the skewness values of degree, closeness, and betweenness of noun in ITNco-news(topic) are set as 0.
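The three centralities in Table 4 are standard complex network measures. A minimal sketch with networkx, using a toy co-occurrence network (the nodes and edges are illustrative, not drawn from the paper's data):

```python
# A minimal sketch of the F31 features in Table 4: degree, closeness,
# and betweenness centrality of a noun term in a cross-boundary
# co-occurrence network. The toy network below is illustrative only.
import networkx as nx

# CBN_co-news: nodes are noun terms, edges are co-occurrences in news
cbn_co_news = nx.Graph([("police", "school"), ("police", "hospital"),
                        ("school", "student"), ("hospital", "patient")])

noun = "police"
features = {
    "degree": nx.degree_centrality(cbn_co_news)[noun],
    "closeness": nx.closeness_centrality(cbn_co_news)[noun],
    "betweenness": nx.betweenness_centrality(cbn_co_news)[noun],
}
print(features)  # "police" bridges the two branches, so betweenness is high
```

For the in-boundary features (F33), the same measures would be computed once per topic-specific network and then summarized with mean, variance, and |skewness|.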
Table 5. Classification techniques, constructed for this study.
Classification Techniques | Base Learners (DT, NB, RBFN, SVM, DBN) | Ensemble Methods (BL 1, Bagging, Boosting, RS)
BL DT
Bagging DT
Boosting DT
RS DT
BL NB
Bagging NB
Boosting NB
RS NB
BL RBFN
Bagging RBFN
Boosting RBFN
RS RBFN
BL SVM
Bagging SVM
Boosting SVM
RS SVM
BL DBN
Bagging DBN
Boosting DBN
RS DBN
Notes: 1 BL is the abbreviation of baseline, meaning that no ensemble method is used.
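The grid in Table 5 crosses base learners with ensemble methods. A minimal sketch of how the decision tree (DT) column could be assembled with scikit-learn; the random subspace (RS) method is emulated here by bagging over feature subsets, and the hyperparameters and data are illustrative, not the paper's configuration:

```python
# A minimal sketch of the Table 5 grid for the DT base learner:
# baseline, Bagging, Boosting (AdaBoost), and a random-subspace variant.
# Hyperparameters and data are illustrative, not the paper's.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

models = {
    "BL DT": DecisionTreeClassifier(random_state=0),  # baseline: no ensemble
    "Bagging DT": BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                    random_state=0),
    "Boosting DT": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                      n_estimators=10, random_state=0),
    # Random subspace: resample features rather than training samples
    "RS DT": BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                               max_features=0.5, bootstrap=False, random_state=0),
}
for name, model in models.items():
    print(name, round(model.fit(X, y).score(X, y), 4))
```

The remaining rows of the table repeat this pattern with the other base learners (NB, RBFN, SVM, DBN) in place of the decision tree.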
Table 6. Confusion matrix for classification results.
 | Actual: SocialTERMs | Actual: EventTERMs
Predicted: SocialTERMs | True positive (TP) | False positive (FP)
Predicted: EventTERMs | False negative (FN) | True negative (TN)
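The evaluation measures follow from these four counts in the standard way. A minimal sketch with hypothetical counts, treating SocialTERMs as the positive class:

```python
# A minimal sketch of how accuracy and F-measure follow from the
# confusion matrix above, with SocialTERMs as the positive class.
# The counts are hypothetical.
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f_measure(tp, fp, fn):
    precision = tp / (tp + fp)  # predicted SocialTERMs that are correct
    recall = tp / (tp + fn)     # actual SocialTERMs that are found
    return 2 * precision * recall / (precision + recall)

tp, fp, fn, tn = 80, 20, 10, 90  # hypothetical counts
print(accuracy(tp, fp, fn, tn))  # -> 0.85
print(round(f_measure(tp, fp, fn), 4))
```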
Table 7. The list of the top 10 key noun terms for each class according to their document frequencies, dfscore1,t(noun).
Class | Key Noun Terms in Korean (English) | dfscore1,t(noun) | In-Class Rank | Overall Rank
SocialTERM | 여성 (female) | 0.0611 | 1 | 5
 | 대학 (university) | 0.0611 | 2 | 6
 | 병원 (hospital) | 0.0604 | 3 | 8
 | 아이 (child) | 0.0589 | 4 | 10
 | 학생 (student) | 0.0573 | 5 | 11
 | 환자 (patient) | 0.0565 | 6 | 12
 | 장애 (disability) | 0.0553 | 7 | 13
 | 남성 (male) | 0.0528 | 8 | 21
 | 선고 (sentence) | 0.0515 | 9 | 23
 | 검찰 (prosecution) | 0.0511 | 10 | 24
EventTERM | 경찰 (police) | 0.0776 | 1 | 1
 | 학교 (school) | 0.0645 | 2 | 2
 | 부산 (Busan) | 0.0642 | 3 | 3
 | 교육 (education) | 0.0614 | 4 | 4
 | 사람 (person) | 0.0609 | 5 | 7
 | 대구 (Daegu) | 0.0603 | 6 | 9
 | 수사 (investigation) | 0.0550 | 7 | 14
 | 회장 (president) | 0.0543 | 8 | 15
 | 발생 (outbreak) | 0.0543 | 9 | 16
 | 교수 (professor) | 0.0543 | 10 | 17
Table 8. Performances of different feature sets and different classification techniques.
(a) Performance Measure = Accuracy
Feature Set | BL DT | Bagging DT | Boosting DT | RS DT
F1 | 58.7788 ± 0.0059 | 74.3844 ± 0.0076 | 63.2497 ± 0.0158 | 60.6012 ± 0.0071
F1 + F2 | 66.3552 ± 0.0042 | 81.1332 ± 0.0064 | 82.5505 ± 0.0118 | 70.7771 ± 0.0112
F1 + F2 + F3 | 66.5297 ± 0.0039 | 81.7142 ± 0.0059 | 83.8769 ± 0.0059 | 75.1081 ± 0.0089
Feature Set | BL NB | Bagging NB | Boosting NB | RS NB
F1 | 60.2984 ± 0.0035 | 58.9720 ± 0.0022 | 59.6583 ± 0.0040 | 58.7659 ± 0.0022
F1 + F2 | 60.4700 ± 0.0020 | 62.1396 ± 0.0027 | 63.4530 ± 0.0049 | 62.2506 ± 0.0022
F1 + F2 + F3 | 60.6142 ± 0.0021 | 62.0401 ± 0.0040 | 64.1955 ± 0.0058 | 62.1799 ± 0.0034
Feature Set | BL RBFN | Bagging RBFN | Boosting RBFN | RS RBFN
F1 | 58.7788 ± 0.0059 | 70.9400 ± 0.0078 | 59.5603 ± 0.0057 | 72.4308 ± 0.0062
F1 + F2 | 66.3552 ± 0.0042 | 76.9867 ± 0.0058 | 65.8276 ± 0.0061 | 78.4386 ± 0.0061
F1 + F2 + F3 | 66.5297 ± 0.0039 | 77.1597 ± 0.0067 | 68.9821 ± 0.0034 | 78.5092 ± 0.0056
Feature Set | BL SVM | Bagging SVM | Boosting SVM | RS SVM
F1 | 59.1825 ± 0.0026 | 59.4132 ± 0.0037 | 59.3267 ± 0.0031 | 59.3426 ± 0.0031
F1 + F2 | 63.0926 ± 0.0024 | 64.4464 ± 0.0041 | 64.6958 ± 0.0037 | 64.5732 ± 0.0028
F1 + F2 + F3 | 63.2930 ± 0.0028 | 65.4743 ± 0.0041 | 65.3720 ± 0.0032 | 65.3777 ± 0.0031
Feature Set | BL DBN | Bagging DBN | Boosting DBN | RS DBN
F1 | 53.3990 ± 0.0170 | 54.6916 ± 0.0178 | 57.5148 ± 0.0115 | 49.6817 ± 0.0085
F1 + F2 | 60.1239 ± 0.0174 | 61.5670 ± 0.0115 | 61.6693 ± 0.0144 | 59.9632 ± 0.0183
F1 + F2 + F3 | 60.7555 ± 0.0152 | 62.6925 ± 0.0095 | 61.7397 ± 0.0136 | 60.6326 ± 0.0153
(b) Performance Measure = F-measure
Feature Set | BL DT | Bagging DT | Boosting DT | RS DT
F1 | 57.6430 ± 0.0082 | 74.3585 ± 0.0088 | 63.6145 ± 0.0159 | 59.3495 ± 0.0111
F1 + F2 | 65.4532 ± 0.0039 | 81.1585 ± 0.0052 | 82.1295 ± 0.0119 | 70.5122 ± 0.0110
F1 + F2 + F3 | 65.5590 ± 0.0048 | 81.7649 ± 0.0072 | 83.8407 ± 0.0068 | 75.1471 ± 0.0113
Feature Set | BL NB | Bagging NB | Boosting NB | RS NB
F1 | 60.4035 ± 0.0030 | 58.8007 ± 0.0028 | 59.5686 ± 0.0043 | 58.6722 ± 0.0024
F1 + F2 | 56.8606 ± 0.0023 | 62.1944 ± 0.0033 | 63.7492 ± 0.0050 | 62.2347 ± 0.0019
F1 + F2 + F3 | 57.0204 ± 0.0027 | 61.7321 ± 0.0029 | 64.3634 ± 0.0065 | 61.8825 ± 0.0028
Feature Set | BL RBFN | Bagging RBFN | Boosting RBFN | RS RBFN
F1 | 57.6430 ± 0.0082 | 70.6085 ± 0.0068 | 58.1797 ± 0.0057 | 72.1917 ± 0.0083
F1 + F2 | 65.4532 ± 0.0039 | 76.9010 ± 0.0066 | 64.8197 ± 0.0052 | 78.2892 ± 0.0075
F1 + F2 + F3 | 65.5590 ± 0.0048 | 77.2176 ± 0.0068 | 68.8268 ± 0.0044 | 78.5373 ± 0.0057
Feature Set | BL SVM | Bagging SVM | Boosting SVM | RS SVM
F1 | 58.4588 ± 0.0028 | 58.7076 ± 0.0045 | 58.5790 ± 0.0025 | 58.5808 ± 0.0023
F1 + F2 | 61.5544 ± 0.0019 | 63.9253 ± 0.0038 | 64.0992 ± 0.0034 | 63.8106 ± 0.0027
F1 + F2 + F3 | 61.7701 ± 0.0027 | 65.0890 ± 0.0039 | 65.0543 ± 0.0033 | 64.9956 ± 0.0030
Feature Set | BL DBN | Bagging DBN | Boosting DBN | RS DBN
F1 | 43.4786 ± 0.0401 | 47.3221 ± 0.0377 | 52.3530 ± 0.0311 | 33.3678 ± 0.0116
F1 + F2 | 53.9359 ± 0.0327 | 56.9606 ± 0.0181 | 56.8000 ± 0.0266 | 53.5389 ± 0.0340
F1 + F2 + F3 | 54.8043 ± 0.0290 | 58.7171 ± 0.0140 | 56.8986 ± 0.0231 | 54.4863 ± 0.0305
(c) Performance Measure = AUC
Feature Set | BL DT | Bagging DT | Boosting DT | RS DT
F1 | 61.3058 ± 0.0072 | 82.3743 ± 0.0064 | 70.3160 ± 0.0180 | 65.8481 ± 0.0096
F1 + F2 | 68.7706 ± 0.0082 | 88.0634 ± 0.0046 | 89.8433 ± 0.0112 | 77.0963 ± 0.0106
F1 + F2 + F3 | 69.5211 ± 0.0095 | 88.6718 ± 0.0049 | 91.6607 ± 0.0054 | 79.8530 ± 0.0077
Feature Set | BL NB | Bagging NB | Boosting NB | RS NB
F1 | 64.1659 ± 0.0018 | 62.8991 ± 0.0015 | 62.9606 ± 0.0038 | 62.9253 ± 0.0016
F1 + F2 | 69.7733 ± 0.0037 | 67.4650 ± 0.0015 | 69.0069 ± 0.0032 | 67.2505 ± 0.0019
F1 + F2 + F3 | 69.8820 ± 0.0029 | 67.3182 ± 0.0023 | 70.0062 ± 0.0043 | 66.9227 ± 0.0023
Feature Set | BL RBFN | Bagging RBFN | Boosting RBFN | RS RBFN
F1 | 61.3058 ± 0.0072 | 77.6983 ± 0.0061 | 63.1809 ± 0.0053 | 79.3792 ± 0.0068
F1 + F2 | 68.7706 ± 0.0082 | 84.4544 ± 0.0049 | 72.4777 ± 0.0039 | 85.5619 ± 0.0047
F1 + F2 + F3 | 69.5211 ± 0.0095 | 84.9973 ± 0.0052 | 74.3151 ± 0.0033 | 85.9469 ± 0.0052
Feature Set | BL SVM | Bagging SVM | Boosting SVM | RS SVM
F1 | 59.1926 ± 0.0029 | 61.7137 ± 0.0038 | 62.3229 ± 0.0050 | 59.3262 ± 0.0022
F1 + F2 | 63.2656 ± 0.0018 | 68.0528 ± 0.0035 | 68.9031 ± 0.0035 | 64.5211 ± 0.0027
F1 + F2 + F3 | 63.4443 ± 0.0021 | 69.4318 ± 0.0036 | 69.9604 ± 0.0044 | 65.3753 ± 0.0029
Feature Set | BL DBN | Bagging DBN | Boosting DBN | RS DBN
F1 | 53.7798 ± 0.0147 | 55.3225 ± 0.0140 | 56.6244 ± 0.0130 | 50.2384 ± 0.0045
F1 + F2 | 60.2221 ± 0.0170 | 61.9084 ± 0.0095 | 60.9947 ± 0.0156 | 60.0628 ± 0.0179
F1 + F2 + F3 | 60.8064 ± 0.0147 | 62.9222 ± 0.0084 | 61.2206 ± 0.0147 | 60.6341 ± 0.0164
Notes: ± values are standard deviations. For each base learner, the best result is italicized, and the best result over all configurations is additionally highlighted in red bold italics.
Table 9. Pairwise t tests on three performance measures for different feature subsets when the best classification technique, namely Boosting DT, was selected.
(a) Performance measure = Accuracy

| Feature set | Added feature subset | Hypothesis | t | p | Supported |
|---|---|---|---|---|---|
| F1 | F11 | F1(-) + F11 > F1(-) | 4.7454 | 0.0000 | Yes |
| | F12 | F1(-) + F11 + F12 > F1(-) + F11 | 4.0312 | 0.0002 | Yes |
| | F13 | F1(-) + F11 + F12 + F13 > F1(-) + F11 + F12 | −1.3335 | *0.1877* | No |
| | F14 | F1(-) + F11 + F12 + F13 + F14 > F1(-) + F11 + F12 + F13 | 0.0988 | *0.9216* | No |
| F2 | F21 | F2(-) + F21 > F2(-) | 21.4020 | 0.0000 | Yes |
| | F22 | F2(-) + F21 + F22 > F2(-) + F21 | −1.1473 | *0.2560* | No |
| | F23 | F2(-) + F21 + F22 + F23 > F2(-) + F21 + F22 | 2.2101 | 0.0312 | Yes |
| F3 | F31 | F3(-) + F31 > F3(-) | 3.2401 | 0.0020 | Yes |
| | F32 | F3(-) + F31 + F32 > F3(-) + F31 | 0.8737 | *0.3862* | No |
| | F33 | F3(-) + F31 + F32 + F33 > F3(-) + F31 + F32 | 1.8535 | *0.0689* | No |
| | F34 | F3(-) + F31 + F32 + F33 + F34 > F3(-) + F31 + F32 + F33 | 0.5195 | *0.6054* | No |
(b) Performance measure = F-measure

| Feature set | Added feature subset | Hypothesis | t | p | Supported |
|---|---|---|---|---|---|
| F1 | F11 | F1(-) + F11 > F1(-) | 3.8147 | 0.0004 | Yes |
| | F12 | F1(-) + F11 + F12 > F1(-) + F11 | 5.7016 | 0.0000 | Yes |
| | F13 | F1(-) + F11 + F12 + F13 > F1(-) + F11 + F12 | −0.7661 | *0.4467* | No |
| | F14 | F1(-) + F11 + F12 + F13 + F14 > F1(-) + F11 + F12 + F13 | 1.3614 | *0.1787* | No |
| F2 | F21 | F2(-) + F21 > F2(-) | 27.3757 | 0.0000 | Yes |
| | F22 | F2(-) + F21 + F22 > F2(-) + F21 | −0.5322 | *0.5966* | No |
| | F23 | F2(-) + F21 + F22 + F23 > F2(-) + F21 + F22 | 2.0480 | 0.0451 | Yes |
| F3 | F31 | F3(-) + F31 > F3(-) | 2.2752 | 0.0266 | Yes |
| | F32 | F3(-) + F31 + F32 > F3(-) + F31 | 1.6143 | *0.1128* | No |
| | F33 | F3(-) + F31 + F32 + F33 > F3(-) + F31 + F32 | 0.7423 | *0.4610* | No |
| | F34 | F3(-) + F31 + F32 + F33 + F34 > F3(-) + F31 + F32 + F33 | 1.4505 | *0.1523* | No |
(c) Performance measure = AUC

| Feature set | Added feature subset | Hypothesis | t | p | Supported |
|---|---|---|---|---|---|
| F1 | F11 | F1(-) + F11 > F1(-) | 4.3216 | 0.0001 | Yes |
| | F12 | F1(-) + F11 + F12 > F1(-) + F11 | 6.8192 | 0.0000 | Yes |
| | F13 | F1(-) + F11 + F12 + F13 > F1(-) + F11 + F12 | 0.8119 | *0.4202* | No |
| | F14 | F1(-) + F11 + F12 + F13 + F14 > F1(-) + F11 + F12 + F13 | 0.5489 | *0.5853* | No |
| F2 | F21 | F2(-) + F21 > F2(-) | 20.2631 | 0.0000 | Yes |
| | F22 | F2(-) + F21 + F22 > F2(-) + F21 | −1.0377 | *0.3038* | No |
| | F23 | F2(-) + F21 + F22 + F23 > F2(-) + F21 + F22 | 2.2935 | 0.0256 | Yes |
| F3 | F31 | F3(-) + F31 > F3(-) | 3.0933 | 0.0031 | Yes |
| | F32 | F3(-) + F31 + F32 > F3(-) + F31 | 2.7597 | 0.0082 | Yes |
| | F33 | F3(-) + F31 + F32 + F33 > F3(-) + F31 + F32 | 1.8379 | *0.0714* | No |
| | F34 | F3(-) + F31 + F32 + F33 + F34 > F3(-) + F31 + F32 + F33 | 1.1500 | *0.2550* | No |
Notes: F1(-) = F2 + F3, F2(-) = F1 + F3, and F3(-) = F1 + F2. The results are the t and p values of the t tests for the feature-set comparisons; p values that are not significant at the 5% level are italicized, and a hypothesis is marked as supported when the added feature subset yields a significant improvement at that level.

Share and Cite

MDPI and ACS Style

Suh, J.H. SocialTERM-Extractor: Identifying and Predicting Social-Problem-Specific Key Noun Terms from a Large Number of Online News Articles Using Text Mining and Machine Learning Techniques. Sustainability 2019, 11, 196. https://doi.org/10.3390/su11010196

