Abstract
Internet-delivered psychological treatments (IDPT) address mental health problems through Internet-based interaction. With the increase in such interaction caused by the COVID-19 pandemic, online tools have been widely used to provide evidence-based mental health services. This increase helps cover a larger population with fewer resources for mental health treatment. Adaptivity and customization of the treatment routine can help resolve mental health issues more quickly. In this research, we propose a fuzzy contrast-based model that uses an attention network over positionally weighted words and classifies patient-authored text into distinct symptoms. The trained embedding is then used to label mental health data, and the attention network expands its lexicons by applying transfer learning techniques. The proposed model uses similarity and contrast sets to classify the weighted attention words, and the fuzzy model uses these sets to classify the mental health data into distinct classes. Our method is compared with non-embedding and traditional techniques to demonstrate the proposed model. In the experiments, the feature vector achieves a ROC score of 0.82 on problems associated with nine symptoms.
1 INTRODUCTION
According to a World Health Organization report, the COVID-19 pandemic has disrupted mental health services in 93% of countries worldwide. At the same time, demand for mental health care has grown as a result of the lockdown of affected regions as a preventative measure. Psychological anxiety factors such as fear of illness and worry about the future increase during any confinement [37]. Isolation and the loss of school relationships and jobs add to mental stress and degrade psychiatric care for the community as a whole. The lack of protective equipment, social isolation, and a high-stress environment exacerbate anxiety and symptoms of depression in frontline health professionals. A significant degree of anxiety was observed during the pandemic lockdown [13]. Since much remains uncertain about what causes depression, research often associates it with several contributing elements. A wide and growing literature includes numerous studies on how anxiety is dealt with, but conflicting accounts make it difficult to extract meaningful information.
Depression is caused by a mix of current circumstances and long-term personal factors rather than by a single immediate problem or event [27]. It is not always possible to evaluate the cause or remedy it under harsh conditions [18]. It is imperative to recognize early signs and symptoms of depression and obtain assistance as quickly as possible. Many Internet forums and social media sites now allow individuals to interact anonymously to talk about their misery, bereavement, and future treatment options [19]. People from all around the world can freely express their opinions and feelings [26]. Online surveillance can be a proactive and promising way to identify those at high risk; it can prompt intervention and improve general well-being [31].
Anxiety is one of the world's most debilitating disorders, according to the World Health Organization.1 With more than 264 million individuals afflicted globally, anxiety has become a prevalent condition [12]. Untreated depression can worsen and cause lifelong suffering [22]. In the worst cases, anxiety can lead to suicidal thoughts: according to the World Health Organization, approximately 800,000 people die by suicide each year, and among those aged 15 to 29 years, suicide is the second most common cause of death. One reason is that between 76% and 85% of mentally ill people in low- and middle-income countries go untreated. Lack of financial assistance and support, a shortage of trained practitioners, inaccurate assessment, and societal stigma around mental illness are obstacles to good treatment [12]. Negative thoughts, shyness, and fear of disclosure are the primary obstacles preventing people from seeking treatment. People are frequently embarrassed, humiliated, and fearful of having their psychological anguish examined in depth [29]. Because of these factors, people may be hesitant to admit that they are depressed or to seek mental health treatment and therapy. The prevention and treatment of mental health disorders have become a global concern for healthcare systems.
The overburdened healthcare system is under economic and technological pressure to develop adaptive systems that reduce waiting times and deliver intervention at a lower cost. Internet-delivered psychological treatments (IDPT) can aid a broad population with physical and psychological suffering while using fewer resources [28]. Most solutions already in place are tunnel based, inflexible, and incompatible [27]. Existing models lack adaptive behavior, leading to lower user adherence and higher dropout [14]. Therapies should take into consideration the numerous ways in which consumers can receive appropriate treatment, and an IDPT implementation should take the user's behavior into account. Users express their preferences and requests according to their conditions and psychological symptoms [27].
This research aims to obtain depression-oriented data from the written language of the patient and to identify and visualize the results using a deep attention based approach. In the majority of cases, the dialogue expresses a patient's worries with regard to mental health. We then analyze the extraction of the factors causing symptoms of depression based on the statements of the patient. By employing interactive Internet technology, we aim to deliver contextual information and visualization for mental health; the designed model can thus also help deliver preventive steps. We further utilize NLP techniques and deep learning to extract symptoms of anxiety and sadness from mental health therapy. The semantic vectors are then used for synonym expansion to identify mental health issues. Our method helps generalize the learning system by minimizing data entry tasks. In the experiments, the proposed method reached a ROC score of 0.82, showing that semantic vectors for synonym expansion enhance training without sacrificing accuracy.
The rest of the article is organized as follows. Section 2 describes the related work. Section 3 outlines the fundamental strategy for the experimentation, data collection, and model development. Section 4 presents the contrast set, and Section 5 discusses the fuzzy inference system. Section 6 presents the outcomes and findings. Section 7 concludes by providing a summary and additional work recommendations.
2 RELATED WORK
Numerous attempts to enhance depression diagnosis using computer-aided techniques have been made. Fliege et al. [10] discussed how to measure depressive symptoms using an Item Response Theory (IRT) based Depression Computerized Adaptive Test (D-CAT). They developed an application to analyze depression symptoms using actual patient data, thus increasing measurement accuracy and minimizing response burden. Instead of a static questionnaire, an adaptive questionnaire was used to measure progress [16]. Earlier responses to queries were used to select the next best question. By asking the most pertinent questions for each patient, the CAT could include fewer items while achieving greater measurement precision throughout the whole construct range.
Another method is based on key linguistic text features, focusing on supervised classification methods that detect mental-affect states in a limited dataset of short texts [41]. This approach, which also has difficulty with binary classification, classifies brief sentences as distressed or not. At a finer level, there are four textual classes: severe distress, mild discomfort, response, and pleasure. In the annotated set of succinctly written messages, any post that indicates an active desire to hurt oneself or others is classified as high distress, whereas posts that only express negative sentiments are labeled as moderate. The researchers evaluated a dataset of 200 comments from numerous public online forums on mental well-being. This dataset was then used with machine learning methods including naive Bayes, maximum entropy, and decision trees.
Dinakar et al. [9] examined distress among teens in online communities using stacked generalization models. They trained a set of base models for predicting labels, including a linear-kernel SVM (SVM-L), a radial-basis-kernel SVM (SVM-R), and stochastic gradient boosted decision trees (GBDT). Text categorization was employed to form these models into categories of text across 23 different topics. For each code, a meta-feature set was coupled with SVM-L, SVM-R, and GBDT [9]. The features for the base classifiers were obtained by chi-squared feature selection together with hand-coded features that include unigrams, lexicons, and part-of-speech bigrams, among others. The decision-function scores of each prediction, along with the topic distribution from an L-LDA model, were then translated into meta-features for the meta-learners. They examined 7,147 individual stories posted by concerned teens on a prominent teenage aid website.
De Choudhury et al. [8] studied the behavior of Twitter users to determine whether or not they were depressed. The study sought to create a machine learning model identifying and predicting the onset of anxiety or sadness in specific individuals from a range of social media signals. The authors addressed the problem of generating ground truth: Amazon Mechanical Turk annotators were required to complete the Center for Epidemiological Studies Depression scale, along with other questions regarding their history and present, potentially distressing, circumstances. The Turk annotators who completed the questionnaire were asked for the Twitter handle used to pull their Twitter feed. A machine learning classifier was constructed on the depressed/non-depressed data utilizing both tweet and network features, including follower counts. When the classifier was applied to a large sample of U.S. geo-located Twitter data, a highly favorable correlation was found with Centers for Disease Control depression statistics. Related research examined more than 2 million tweets from 476 users to predict depression. The most effective results were obtained using SVM classification on a collection of behavioral properties of tweets, such as time and frequency of posting, and linguistic features such as pronouns, cursing, and sad phrases.
Another research work used public Twitter data to explore psychological problems [5]. The authors gathered data for several mental illnesses quickly and affordably, including indications of anxiety, bipolar disorder, and seasonal affective disorder. They used LIWC analysis to evaluate how much each disease group differs from a control group, replicated prior major-depression outcomes, and added new bipolar and PTSD results. Two language models were used: (1) a standard unigram LM over full words and (2) a character 5-gram LM over sequences of up to five characters. A classifier was built to separate each group from the control group, revealing the corresponding signal in the language of each group [23]. Throughout the analysis and classification, correlations were analyzed to identify connections and obtain insights into quantifiable and meaningful psychological signals on Twitter. Lin et al. [17] employed a deep neural network (DNN) to recognize stress and overcome the limitations of earlier methods. Data from four microblogs were studied, and the authors compared their proposed four-layered DNN against machine learning algorithms such as random forest, SVM, and naive Bayes. They evaluated three pooling techniques for each model: max pooling, mean-over-instance, and mean-over-time. Each model performed well or poorly depending on the pooling technique; the DNN with mean-over-time pooling, however, achieved the best results. Neuman et al. [30] developed an additional methodology named Pedesis, which employed NLP dependency parsing to crawl websites that mention anxiety and extract refined conceptual domains involved in metaphorical connections. The domain knowledge was then utilized to define words or sentences used in depression metaphors. Based on these findings, human experts developed a "depression lexicon," which contains first- and second-degree synonyms.
The lexicon is used to autonomously evaluate the degree of depression in a text and whether the content deals with the subject of depression; hidden patterns and large feature sets are often utilized to help a neural network build a unique representation of the domain [33]. The trained network then utilizes the learned features to predict the conditional distribution over the input vector. Indeed, several neural network topologies have been proposed for domain-specific applications. The multi-layer perceptron is one of the major architectures. In this network, every hidden layer computes its output from the inputs and weights of the previous layer, and a nonlinear activation function is applied at the final/output layer. Training adjusts the parameters through the gradient of the loss function. The network must lower the supervised learning loss, a nonlinear optimization problem, by modifying the weight and bias parameters to minimize the loss. Most approaches are based on gradient descent. Gradient-based techniques start from a random point, then perform several rounds over sets of cases (batches). For each batch, the trainer computes the loss value and its gradient under the nonlinear objective function, and the weights are then modified to decrease the loss function [33]. The loss is gradually reduced to a minimal level, the convergence point.
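The gradient-descent training loop described above can be sketched in a few lines; the linear model, squared-error loss, and learning rate below are illustrative choices, not details of the paper's network.

```python
# Minimal gradient-descent sketch: start from an initial point, compute the
# loss gradient on a batch, and step the weights against the gradient
# until the loss converges.

def grad_step(w, b, batch, lr=0.1):
    """One update of weight w and bias b on a batch of (x, y) pairs,
    minimizing the mean squared error of the linear model y_hat = w*x + b."""
    n = len(batch)
    dw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    db = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * dw, b - lr * db

def train(batches, epochs=200):
    w, b = 0.0, 0.0  # starting point (random in practice)
    for _ in range(epochs):
        for batch in batches:
            w, b = grad_step(w, b, batch)
    return w, b

# Toy data generated from y = 2x + 1; training should recover w ~ 2, b ~ 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train([data])
```

The same loop structure, with the gradient supplied by backpropagation, underlies the deep networks discussed in this section.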
Hidden layers and the architectural framework give neural networks their predictive capacity. The correct choice of the number of layers, architecture type, and hyperparameters helps in tuning the network. Training the tuned network yields a higher-order representation of the input feature vector [2, 7]. This higher representation of features is learned to generalize and enhance prediction ability. Modern neural network research selects the network with the lowest computational complexity and best predictive capability. The number of architectural ideas has increased over the previous two decades. Sze et al. [36] made the most important distinctions between the hidden layers, layer types, shapes, and linkages between the layers. Wainberg et al. [38] demonstrated how higher-dimensional features might be extracted from tabular data through machine learning techniques. In the convolutional neural network (CNN), pattern embeddings accumulate from image pixels; the pixel information and its variation increase the learning and prediction capabilities of the network, aided by translation invariance. The recurrent neural network (RNN) architecture was designed for sequential data in natural language processing, including machine translation, language generation, and time-series analysis [39]. The RNN model comprises an encoder and a decoder, with the encoder summarizing the input sequence into a fixed-length vector. The model uses separate gates, trained against the loss function, to process the input attributes. A challenge with the RNN encoder-decoder design is the alignment of input and output vectors, since each element of the sequence depends on the values of its neighbors. The attention mechanism is a further development of the RNN [2, 20].
The attention approach operates on the input vector by selectively allocating weights to selected inputs. The decoder may utilize the context vector and the related weights for a richer representation of the features, depending on the priority and position of the relevant information. For predictions, the weights of the RNN model, including the attention weights and the context vector, are learned along with the feature representation [20]. There are many variants of the attention mechanism, including soft, hard, local, and global designs. The soft attention paradigm was created to summarize contextual information [3]: the context vector is built as the weighted average of the model's hidden states. The approach helps in understanding how the input features are encoded and in decreasing the loss. Xu et al. [40] built the context vector under hard attention using hidden-state sampling; hard attention reduces the cost of computation but suffers from difficulties in architectural convergence. Local and global attention are further variants introduced by Luong et al. [21], with local attention a middle ground between soft and hard attention. The model chooses a focal point for each input batch, which contributes to quick convergence; in the local attention model, a prediction function learns the position of the attention vector. Domain-specific data analysis aids in the development of effective local and global attention architectures.
3 THE DESIGNED METHODOLOGY
This article proposes an embedding training approach for developing a depression symptom identification model (Figure 1). In this technique, we apply cosine similarity against the PHQ-9 symptom scores, as illustrated in Figures 1 and 2. To increase knowledge and embedded vocabulary size for similarity, a trained lexically enhanced approach is proposed. The suggested approach extracts depression symptoms from a patient's authored text. Sample datasets were taken from the work of Ahmed et al. [2] and Mukhiya et al. [28]. Data labeling is discussed in Section 3.5. One example, provided by an anonymous patient:
I am in a really poor spot right now. Even my melancholy and anxiety are severe, and I am unable to function or hold down a job or do anything else, so I spend my days eating junk food at home. Each day is tedious and difficult to get it through, yet I am unable to operate in society due to my anxiety and sadness.
It is difficult to identify mental health diseases with the ICD-10 classification [34]. The nature and intensity of symptoms change dynamically over the period a patient is treated for a given condition. Psychiatrists therefore listen to the patient and collect further essential information throughout the mental health evaluation procedure. The psychiatrist's method uses a standard analytical questionnaire such as the PHQ-9, together with a supported test, to examine the diagnostic reliability of each evaluation against the participant's mental health problems. The questionnaire's items consist of symptom categories whose frequency yields a score; the score is compared against a threshold to determine the intensity. For example, the nine separate questions each reflect one symptom, whose frequency is graded as light, moderate, or severe. This method is known as the Clinical Symptom Elicitation Process (CSEP) [34]. One of the major aims of this study is to automate this procedure by using active learning. Each set of symptoms is categorized according to the periodicity in the participant's text, and cumulative clinical anxiety and depression are determined.
3.1 Psychometric Questionnaire (PQ)
Several depression assessment questionnaires besides the PHQ-9 are available, but the PHQ-9, proposed by Kroenke et al. [15], is among the most frequently utilized. The proposed technique uses the standard PHQ-9 questionnaire on patient-authored content [1, 2, 27]. Assessing depressive symptoms with it is a frequent procedure: as part of standard CSEP practice, the clinician asks the inquiry for each category and evaluates the patient's response to add the frequency to the class. As shown in Table 1, the nine symptoms cover a variety of areas, including falling asleep, interest, concentration, and eating problems. The psychiatrist determines the evaluation score after completing all of the question-based assessments; this score indicates the patient's depression level.
Symptoms | PHQ-9 |
---|---|
S1 | Little interest or pleasure in doing things |
S2 | Feeling down, depressed, or hopeless |
S3 | Trouble falling or staying asleep or sleeping too much |
S4 | Feeling tired or having little energy |
S5 | Poor appetite or overeating |
S6 | Feeling bad about yourself or that you are a failure or have let yourself or your family down |
S7 | Trouble concentrating on things such as reading the newspaper or watching television |
S8 | Moving or speaking so slowly that other people may notice, or the opposite being so restless that you have been moving around a lot more than usual |
S9 | Thoughts that you would be better off dead or of hurting yourself |
3.2 Seed Term Generation
Throughout this study, seed terms are generated from key words found in the PHQ-9 questionnaire. This section describes how the depression lexicon (the word list of symptoms of depression) was formed. It largely comprises word forms of emotions, such as anxiety and sadness, as seen in Table 1. Seed lexicons are chosen manually, and the associated hypernyms, hyponyms, and antonyms are found using WordNet [25]. WordNet, an English lexical database, is created and maintained by Princeton University. The database stores nouns, verbs, adverbs, and adjectives; within each word category, words are grouped into synsets, each expressing a distinct concept. Synsets are linked by lexical and semantic relations; words that belong to the same synset are synonyms, for example. According to our empirical study, the top five terms are helpful and connected to the key symptom phrases. In addition, only the WordNet technique is used to expand the seed words in Table 1. Different classification systems have various lists of symptoms of depression [27]. These lists use clinical or informal complaint vocabulary depending on whether the survey is a patient or physician questionnaire. Major classification systems for depression, including the DSM-5 and that of the World Health Organization [34], are extensively used and have been integrated into a refined list of symptoms [27].
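The expansion step can be sketched as follows. A miniature hand-built relation map stands in for WordNet so the example is self-contained; a real implementation would query WordNet (e.g., through NLTK's wordnet interface) for synonyms, hypernyms, and hyponyms of each seed.

```python
# Hypothetical miniature "WordNet": seed -> related terms (synonyms,
# hypernyms, hyponyms), ordered by relevance. Real code would query WordNet.
RELATED = {
    "sleep": ["slumber", "rest", "nap", "doze", "repose", "kip"],
    "appetite": ["hunger", "craving", "desire", "stomach", "taste"],
}

def expand_seed(seed, relations, top_k=5):
    """Return up to top_k distinct expansions of a seed term, skipping
    duplicates and the seed itself (the paper keeps the top five terms)."""
    seen, out = {seed}, []
    for word in relations.get(seed, []):
        w = word.lower()
        if w not in seen:
            seen.add(w)
            out.append(w)
    return out[:top_k]

# Expand every seed term into its depression-lexicon entry.
lexicon = {seed: expand_seed(seed, RELATED) for seed in RELATED}
```

Truncating to the top five related terms per seed reflects the empirical choice reported above.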
3.3 Pre-processing Step
Pre-processing is required to structure the text. Each patient-authored text follows this procedure:
(1) All texts are processed and formatted by the UTF-8 encoding standard. This assists in the preservation of consistency.
(2) Modify each word's capitalization to lowercase.
(3) Eliminate any tabs or spaces that may have been used to separate text.
(4) Erase all non-valued unique characters (#, +, -, *, =, HTTP, HTTPS).
(5) Substitute contracted phrases with full words (e.g., can't with cannot).
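The five steps above can be collected into one function. This is a minimal sketch: the contraction map and the exact regular expressions are illustrative assumptions, not the paper's implementation.

```python
import re

# Hypothetical contraction map for step 5; a full map would be larger.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am"}

def preprocess(text):
    """Apply the five pre-processing steps to one patient-authored text."""
    text = text.encode("utf-8", "ignore").decode("utf-8")  # (1) UTF-8
    text = text.lower()                                    # (2) lowercase
    text = re.sub(r"\s+", " ", text).strip()               # (3) tabs/spaces
    text = re.sub(r"https?://\S+", "", text)               # (4) URLs
    text = re.sub(r"[#+\-*=]", "", text)                   # (4) special chars
    for short, full in CONTRACTIONS.items():               # (5) expand
        text = text.replace(short, full)
    return re.sub(r"\s+", " ", text).strip()

out = preprocess("I can't  sleep #tired https://example.com")
```

Removing whole URLs (rather than just the tokens HTTP/HTTPS) is a design choice in this sketch, since the remaining URL fragments carry no symptom signal.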
3.4 Lexicon Embedding
A wide range of strategies for detecting emotions has been documented in the extensive NLP literature. Emotional knowledge-based systems in particular have received much attention; they are made up of a vocabulary of phrase senses and a learned variety of context anchoring. Affective knowledge consists of words that convey context and emotions. We therefore provide an embedding strategy that uses contextually variable words from the depression lexicon (based on word sense) and emotional input from Internet forums. We used a 300-dimensional pre-trained Global Vectors for Word Representation (GloVe) model [35]. Word embedding helps create word-level tokens for the input patient texts, and the context is projected into vector space via GloVe-based vector embedding. Here, the embedding represents the learned sentence structure and captures the semantic structure of the text. Each word vector is distributed according to the notion that "you shall know a word by the company it keeps" [4]. Linguistic patterns are used to calculate the co-occurrence rates of terms in the vector representation, so that comparable words lie nearby. As a result, the psychological analysis does not require a separate pre-trained model: we expand the dataset by training a custom mental health model with a word sense model and a transfer learning approach. This matters because a large portion of the embedding is based on open-source data (Wikipedia texts) and sentiment expertise (Twitter data). Terms such as sad and joyful express emotions, but they also allude to particular mental states; word sense must therefore be used to broaden the embedding. The emotional lexicon, which is based on word sense, helps demonstrate potential consequences, and custom embedding for the categorization of various symptoms can be used to accomplish fine-grained classification.
Words are retrieved with part-of-speech tagging for nouns, verbs, adverbs, and adjectives. For each retrieved part of speech, we used WordNet to extract synonyms, hyponyms, morphemes, and lexical meaning from the corpus, which consists of a series of texts. As a consequence, we obtain emotional words for each document. The set W used to train the model is then utilized to build the vocabulary, and the learned vector has the word-vector dimension. For each of the nine symptoms from the PHQ-9 questionnaire, the lexicons are converted into a vector using the trained model. Cosine similarity is used to compute the similarity between patient-authored text embeddings and the symptoms listed in Table 1, yielding a similarity value between 0 and 1 for each of the nine symptoms. We use the trained model to turn the sentences into semantically aware vectors.
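The similarity step can be sketched with toy 3-dimensional vectors standing in for the 300-dimensional GloVe embeddings; the token embeddings and the two symptom vectors below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (in [0, 1] for
    the non-negative vectors used in this toy example)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def text_vector(tokens, embeddings):
    """Embed a text as the average of its known token vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 3-d embeddings and two hypothetical symptom vectors
# (S3: sleep problems, S5: appetite problems).
emb = {"tired": [0.9, 0.1, 0.0], "sleep": [0.8, 0.2, 0.1], "eat": [0.0, 0.9, 0.2]}
symptoms = {"S3": [0.9, 0.2, 0.0], "S5": [0.05, 0.95, 0.1]}
text = text_vector(["cannot", "sleep", "tired"], emb)
scores = {s: cosine(text, v) for s, v in symptoms.items()}
```

A text about sleeplessness and tiredness scores high against the sleep-related symptom vector and low against the appetite-related one, which is the behavior the per-symptom similarity values rely on.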
3.5 Dataset
The dataset was obtained from discussion webpages and social media platforms [27]. The first 500 texts were labeled using the Amazon Mechanical Turk service [27], and the remaining data were labeled with the proposed method. The labeling follows the PHQ-9 rating technique, as mentioned next:
(1) Score 0: Not at all
(2) Score 1: Several days
(3) Score 2: More than half the days
(4) Score 3: Nearly every day
For each symptom, we transform the annotation into a binary label: 0 if the symptom is absent (score 0) and 1 if the symptom is present (scores 1, 2, or 3).
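This mapping can be expressed directly; the symptom keys in the example call are illustrative.

```python
def binarize(scores):
    """Map per-symptom PHQ-9 frequency scores (0-3) to binary labels:
    0 -> symptom absent, 1-3 -> symptom present."""
    return {symptom: int(score > 0) for symptom, score in scores.items()}

labels = binarize({"S1": 0, "S2": 2, "S3": 1, "S9": 3})
```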
3.6 Deep Learning Methods
The LSTM network can retain the necessary information in its cell memory. In a unidirectional LSTM design, the last hidden state can be delivered to the output layer. During our empirical research, we noticed that element-wise averaging over all time steps was superior. We utilized a bidirectional LSTM design, in which token lists are processed by a forward LSTM from beginning to end and by a backward LSTM in reverse.
The proposed attention approach makes use of word significance in the text [42]. We therefore introduced the attention technique on top of the LSTM layer; this aids in the extraction of useful terms. The dropout layer receives the attention output vector as input. Supervised training of large networks conventionally requires a large labeled dataset, so we used transfer learning to expand the lexical analysis and label the dataset.
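The attention layer over the LSTM outputs can be sketched as attention pooling of per-token states. The tiny dimensions and the fixed query vector here are illustrative; in the actual model, the attention weights are learned during training.

```python
import math

def attention_pool(states, query):
    """Attention over per-token states (e.g., BiLSTM outputs): score each
    state against a query vector, softmax the scores into weights, and
    return the weights plus the weighted sum (the context vector)."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    exps = [math.exp(x) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]
    return weights, pooled

def mean_pool(states):
    """Element-wise average over time steps (the baseline mentioned above)."""
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]

# Three 2-d token states; the query makes the second token dominate.
states = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
weights, pooled = attention_pool(states, query=[2.0, 2.0])
baseline = mean_pool(states)
```

Compared with mean pooling, the attention-pooled vector is dominated by the highly scored token, which is how the model surfaces the useful terms mentioned above.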
4 CONTRAST SET
In this research, we build the lexicon by the preceding method. However, the extended lexicon alone is not enough for learning contextual information. We propose attention-based contrast sets to map the continuous vector to the associated labels (i.e., symptoms) for patient-authored texts. The contrast set helps map the context and each word to its labeled data points [32]. We use the concepts of support difference and co-occurrence context, and we use the attention weights of each labeled datum to classify the patient-authored texts. Attention learning methods are used for contrast set generation, and a fuzzy-based model then performs symptom extraction, detection, and classification.
For a given attention-based lexicon, a contrast-set pattern has a user-defined frequency corresponding to different labels across the dataset. To formally define an attention-based contrast set, we define the pattern, the support, and the support difference.
(Attention Lexicon Dataset).
Let \( \mathcal {L}=\left\lbrace l_{1}, l_{2}, \ldots , l_{n}\right\rbrace \) be a set of lexicons and \( {Class}=\left\lbrace c_{1}, c_{2}, \ldots , c_{m}\right\rbrace \) a set of distinct labels. An attention lexicon dataset \( \mathcal {D} \) contains the lexicon sets with positional weights for the distinct classes, \( {Instances}=\left\lbrace \left(\boldsymbol {I}_{w}, Class_{w}\right)\right\rbrace _{w=1}^{n} \), where \( \boldsymbol {I}_{w} \subseteq \mathcal {L} \) is an instance containing a set of attention lexicons, \( Class_{w} \in {Class} \) is the distinct label for \( \boldsymbol {I}_{w} \), and \( w \) indexes the positionally weighted words of the sentences in the patient-authored texts.
(Pattern).
A pattern X is a set of emotional lexicons \( X \subseteq \mathcal {L} \) that contains the positionally weighted lexicons for a distinct class (i.e., symptom).
(Support).
The support for the pattern X with respect to a distinct label Class is the percentage of the instances labeled Class that contain X: (1) \( \begin{equation} Sup (X, Class)=\frac{|\operatorname{Sup}(X, Class)|}{|\operatorname{Sup}(Class, \mathcal {L})|}, \end{equation} \) where \( \operatorname{Sup}(Class, \mathcal {L})=\left\lbrace \left(\boldsymbol {I}_{w}, Class_{w}\right) \in \mathcal {D} \mid Class_{w}=Class\right\rbrace \) is the set of instances labeled with Class, and \( \operatorname{Sup}(X, Class)=\left\lbrace \left(\boldsymbol {I}_{w}, Class_{w}\right) \in \operatorname{Sup}(Class, \mathcal {L}) \mid X \subseteq \boldsymbol {I}_{w}\right\rbrace \) is the set of instances containing both pattern X and label Class. The class-wise counts partition the dataset: \( \sum _{i=1}^{j}\left|\operatorname{Sup}\left(c_{i}, \mathcal {L}\right)\right|=|\mathcal {D}| \), where j is the number of distinct classes.
(Support Difference).
The support difference for the pattern X with respect to label Class is given in Equation (2). (2) \( \begin{equation} {Diff}(X, {Class})= \mathit {MAX} \left\lbrace Sup \left(X, c_{i}\right)\right\rbrace _{i=1}^{j}-\mathit {MIN} \left\lbrace Sup \left(X, c_{i}\right)\right\rbrace _{i=1}^{j} \end{equation} \)
Example: Consider the dataset mentioned in Table 2, which contains five instances and two classes (\( c_{1} \) and \( c_{2} \)). The pattern \( \lbrace unhappy, depressed\rbrace \) appears in three instances: \( I_{1} \), \( I_{2} \), and \( I_{4} \). Therefore, the support for \( \lbrace unhappy, depressed\rbrace \) in dataset \( \mathcal {L} \) is \( Sup(\lbrace unhappy, depressed\rbrace , \mathcal {L})= 3 / 5 = 0.6 \). The support for \( \lbrace unhappy, depressed\rbrace \) with respect to class \( c_{1} \) is \( Sup(\lbrace unhappy, depressed\rbrace , c_{1}) = 3 / 3 = 1 \), since all three instances with class \( c_{1} \) (i.e., \( I_{1} \), \( I_{2} \), \( I_{4} \)) contain \( \lbrace unhappy, depressed\rbrace \). In the same way, the support of \( \lbrace unhappy, depressed\rbrace \) with respect to the second class \( c_{2} \) is \( Sup(\lbrace unhappy, depressed\rbrace , c_{2}) = 0 / 2 = 0 \). The detailed calculation is given in Table 3, and the final contrast set later in Table 5.
Note: A dash (–) represents that the item is not presented in that instance.
For instance, if the support and difference thresholds are both set to \( \delta = 0.6 \), then with reference to Tables 3 and 4, the set \( \lbrace unhappy, depressed\rbrace \) qualifies as a contrast set, since its support equals 0.6 and its support difference equals 1.
As mentioned later in Table 5, consider the contrast set \( \lbrace unhappy\rbrace \), for which the similarity context is \( \mathcal {N}_{S}(b)=\lbrace unhappy, depressed\rbrace \), as both words share the emotional meaning of unhappy. The co-occurrence metric for the first instance is \( (sad, \ unhappy, (unhappy, depressed)) \), since unhappy co-occurs with sad in instance \( I_{1} \), unhappy co-occurs with depressed in instances \( I_{1} \), \( I_{2} \), and \( I_{4} \), and unhappy co-occurs with \( \lbrace depressed, \ helpful\rbrace \) in instances \( I_{2} \) and \( I_{4} \). Together, similarity and co-occurrence make the contrast sets more accurate, since each set is grounded in its co-occurrence statistics and the lexicon sets of individual instances.
Sorted Symptom | Description | Probability |
---|---|---|
s4 | Feeling tired or having little energy | 0.46 |
s8 | Moving or speaking so slowly that other people notice, or the opposite being so restless that you have been moving around a lot more than usual | 0.45 |
s2 | Feeling down, depressed, or hopeless | 0.39 |
s1 | Little interest or pleasure in doing things | 0.26 |
s5 | Poor appetite or overeating | 0.15 |
s3 | Trouble falling or staying asleep or sleeping too much | 0.13 |
s7 | Trouble concentrating on things such as reading the newspaper or watching television | 0.12 |
s6 | Feeling bad about yourself or that you are a failure or have let yourself or your family down | 0.02 |
s9 | Thoughts that you would be better off dead or of hurting yourself | 0.0097 |
With the preceding example, we described how the co-occurrence context helps capture the contrast sets. We now discuss the usefulness of the fuzzy sets mentioned in Table 4.
5 FUZZY INFERENCE SYSTEM
A fuzzy knowledge base supports the inference process: membership functions fuzzify the input, and the rule base maps the fuzzified values to output classes.
5.0.1 Fuzzy Rule Generation.
The rules derived from the contrast sets and fuzzification values are important because they help classify the attention positional weighted elements. After contrast set generation, the itemsets can be translated into linguistic rules. The lexicon and its assigned classes are used to generate the rule base.
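The rule-generation step can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the input layout (pattern paired with its per-class supports) and the rule field names are assumptions; each contrast set becomes an IF-THEN rule pointing at the class where its support peaks.

```python
# Hypothetical sketch: each mined contrast set becomes a linguistic IF-THEN
# rule targeting the class with the highest per-class support.
def generate_rules(contrast_sets):
    """contrast_sets: list of (pattern, class_supports) pairs, where
    class_supports maps each class label to the pattern's support there."""
    rules = []
    for pattern, class_supports in contrast_sets:
        best = max(class_supports, key=class_supports.get)
        rules.append({"if_contains": pattern,
                      "then_class": best,
                      "strength": class_supports[best]})
    return rules

rules = generate_rules([(frozenset({"unhappy", "depressed"}),
                         {"c1": 1.0, "c2": 0.0})])
print(rules[0]["then_class"])  # c1
```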
5.0.2 Defuzzification.
The testing data is fed to the fuzzification step. Based on the membership function values, the fuzzified input is matched against the inference rules. The inference rules, obtained from the linguistic values, are converted into a fuzzy score using the weighted method. From the fuzzy score, a classification decision is produced. The flow of the operation is presented in Algorithm 1.
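The weighted scoring and crisp decision described above can be sketched minimally. This is an assumption-laden sketch rather than the authors' exact procedure: each fired rule contributes its consequent class weighted by its firing strength, and the crisp decision is the class with the largest accumulated score.

```python
# Minimal sketch of weighted defuzzification for classification: accumulate
# firing strengths per consequent class, then take the argmax as the decision.
def defuzzify(fired_rules):
    """fired_rules: list of (class_label, firing_strength) pairs."""
    scores = {}
    for cls, strength in fired_rules:
        scores[cls] = scores.get(cls, 0.0) + strength
    if not scores:
        return None  # no rule fired
    return max(scores, key=scores.get)

print(defuzzify([("s4", 0.8), ("s2", 0.3), ("s4", 0.2)]))  # s4
```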
6 EXPERIMENTAL RESULT AND ANALYSIS
The patient-authored text was pre-processed and converted into an emotion-based lexicon. Then we trained different networks. We used the pre-trained GloVe embeddings for the transfer learning task, and the trained embedding was expanded with the new lexicon. The text model was then converted into the nine-symptom vector lexicon. After that, vectors for the patient questionnaire and patient-authored text were used to label the unlabeled data. The labeled data was then used to train and compare different architectures. We used the ROC curve, precision, recall, and F-measure as performance metrics, and the Adam optimizer to reduce the training loss. Figures 2 and 3 show the performance of the attention network [42] with a fuzzy contrast set, bidirectional LSTM [24], LSTM [11], and a feed-forward network [2]. Model tuning for a deep neural architecture requires changes to the design in terms of cell type, number of hidden layers, activation function, learning-rate handling, and error function. In addition to the LSTM layer, we used the contrast set with a fuzzy classifier to improve model performance. During empirical analysis, the models showed overfitting on the development and testing sets. To handle this, we trained the models for a longer time (i.e., 1,000 epochs), employed the early stopping method to save and tune model checkpoints, and used the clipping method to avoid gradient issues [6].
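The two training safeguards mentioned above, gradient clipping and early stopping, can be sketched independently of any framework. The sketch below in plain NumPy is illustrative only; the patience value and clipping threshold are assumptions, not the paper's settings.

```python
import numpy as np

# Gradient clipping by global norm: rescale the gradient whenever its
# L2 norm exceeds max_norm, leaving its direction unchanged.
def clip_by_norm(grad, max_norm):
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

# Patience-based early stopping: stop once the validation loss has failed
# to improve for `patience` consecutive epochs.
class EarlyStopping:
    def __init__(self, patience=10):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def step(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `clip_by_norm` would be applied to the gradients before each optimizer step, and `EarlyStopping.step` would be called once per epoch with the development-set loss.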
Figures 2 and 3 show the performance of the feed-forward network, where the training loss reached 0.41 and the testing loss reached 0.59. The model tends to overfit, and its ROC curve does not approach the upper left corner. The architecture did not perform well, as a simple network does not preserve the order of sequential data. Sequential models can handle such data and achieve better performance; accordingly, the LSTM network reached a ROC value of 0.78. The recall of the LSTM model is presented in Figure 4, and the precision in Figure 5. The model suffered from vanishing gradient issues, the cell became complex due to its gates, and the architecture required more tuning. The instances with symptoms related to the questionnaire performed well, as the bidirectional LSTM with the contrast fuzzy set reached a 0.82 F1 measure, as shown in Figure 6.
The bidirectional LSTM with the contrast fuzzy set and the bidirectional LSTM with attention both achieve better performance. These models use a two-directional approach with backward and forward passes, and both hidden states help preserve sequential order. The attention layer helps weight the positional words. As a result, the trained model yields the lowest error. A recall curve near the top corner indicates that the model has low false-positive and false-negative rates. Due to the use of the contrast fuzzy set, the bidirectional LSTM with the contrast fuzzy set outperforms the bidirectional LSTM with attention, achieving high performance with the lowest development-set error.
The model achieved a ROC of 0.91 on the training set and 0.82 on the development set. The high performance indicates that the model yields a high true-positive rate. The results support the existence of important words, which helps the contrast set generate distinct boundaries. The model recognizes the target words in the task and learned the symptoms from the patient-authored text.
As shown in Figure 7 and Table 5, the proposed model is able to visualize contrast set weights for the quoted words in sentences of patient-authored text. The visualized symptoms and probability scores [s4: 0.46, s8: 0.45, s2: 0.39, and s1: 0.26] reflect the context and the patient's triggering points. The patient is struggling with feeling tired or having little energy and moving or speaking so slowly that other people …. a lot more than usual, which indicates that the source of the contrast set is the words in dark highlight (i.e., tedious and difficult to get it through and my anxiety and sadness). Therefore, the model can find the contrast set words and highlight them according to their probability and symptom scores.
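The highlighting step described above can be sketched in a few lines. This is an illustrative assumption, not the paper's rendering code: words whose contrast-set weight crosses a threshold are marked, mimicking the dark-highlighted terms; the threshold value and bold markup are arbitrary choices for demonstration.

```python
# Illustrative sketch: mark tokens whose contrast-set weight exceeds a
# threshold, so that trigger words stand out in the visualized sentence.
def highlight(tokens, weights, threshold=0.5):
    return [f"**{t}**" if w >= threshold else t
            for t, w in zip(tokens, weights)]

print(highlight(["my", "anxiety", "and", "sadness"],
                [0.1, 0.9, 0.2, 0.8]))
# ['my', '**anxiety**', 'and', '**sadness**']
```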
7 CONCLUSION
Tools combining natural language processing and deep learning for healthcare intervention have been introduced recently. The pandemic era forced the treatment of psychological patients through online media. A limited number of studies work on mental health symptoms, and the adaptation of models specifically for mental health is not well discussed. This article used a contrast set with a fuzzy model to classify mental health patients' text into nine distinct classes. We proposed the support difference contrast set lexicon analysis, which the attention network uses to fuzzify the input. Contrast inference rules are then used to classify the mental health treatment text. The fuzzy rules can be used for the labeling and visualization tasks. This tool can help psychiatrists design a customized and appropriate remedy program. The computer-aided system highlights key words and supports adaptation and visualization. The bidirectional LSTM model with the attention and contrast set achieved the highest accuracy, reaching a 0.82 ROC and helping visualize the weighted words, which can aid in understanding the patient's issues. In the future, we will implement a more adaptive algorithm to classify text and reduce overfitting issues.
REFERENCES
- [1] 2021. Fuzzy explainable attention-based deep active learning on mental-health data. In Proceedings of the 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’21). IEEE, Los Alamitos, CA, 1–6.
- [2] 2021. Attention-based deep entropy active learning using lexical algorithm for mental health treatment. Frontiers in Psychology 12 (2021), 471.
- [3] 2015. Neural machine translation by jointly learning to align and translate. In The International Conference on Learning Representations. ICLR, 24–34.
- [4] 2000. Contextual correlates of meaning. Applied Psycholinguistics 21, 4 (2000), 505–524.
- [5] 2020. Tracking social media discourse about the COVID-19 pandemic: Development of a public coronavirus Twitter data set. JMIR Public Health and Surveillance 6, 2 (2020), e19273.
- [6] 2020. Understanding gradient clipping in private SGD: A geometric perspective. Advances in Neural Information Processing Systems 33 (2020), 1–10.
- [7] 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In The Conference on Empirical Methods in Natural Language Processing. EMNLP, 1724–1734.
- [8] 2013. Predicting depression via social media. In Proceedings of the 7th International Conference on Weblogs and Social Media. 23–34.
- [9] 2014. Stacked generalization learning to analyze teenage distress. In Proceedings of the 8th International Conference on Weblogs and Social Media. 23–34.
- [10] 2005. Development of a computer-adaptive test for depression (D-CAT). Quality of Life Research 14, 10 (2005), 2277.
- [11] 2016. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28, 10 (2016), 2222–2232.
- [12] 2018. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet 392, 10159 (2018), 1789–1858.
- [13] 2015. Screening internet forum participants for depression symptoms by assembling and enhancing multiple NLP methods. Computer Methods and Programs in Biomedicine 120, 1 (2015), 27–36.
- [14] 2015. Finding the adaptive sweet spot. In Proceedings of the 33rd Annual Conference on Human Factors in Computing Systems. ACM, New York, NY, 3829–3838.
- [15] 2001. The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine 16, 9 (2001), 606–613.
- [16] 2012. A hybrid system for online detection of emotional distress. In Intelligence and Security Informatics. Springer, 73–80.
- [17] 2014. User-level psychological stress detection from social media using deep neural network. In Proceedings of the 22nd ACM International Conference on Multimedia. ACM, New York, NY, 507–516.
- [18] 2018. Evaluating and improving lexical resources for detecting signs of depression in text. Language Resources and Evaluation 54, 1 (2018), 1–24.
- [19] 2020. Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on Reddit during COVID-19: Observational study. Journal of Medical Internet Research 22, 10 (2020), e22635.
- [20] 2016. Hierarchical question-image co-attention for visual question answering. In Advances in Neural Information Processing Systems 29. Curran Associates, 289–297.
- [21] 2015. Effective approaches to attention-based neural machine translation. In The Conference on Empirical Methods in Natural Language Processing. 1412–1421.
- [22] 2020. Anxiety and depression in COVID-19 survivors: Role of inflammatory and clinical predictors. Brain, Behavior, and Immunity 89 (2020), 594–600.
- [23] 2020. Identification of emotional expression with cancer survivors: Validation of Linguistic Inquiry and Word Count. JMIR Formative Research 4, 10 (2020), e18246.
- [24] 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. 51–61.
- [25] 2009. WordNet: An electronic lexical reference system based on theories of lexical memory. Revue Québécoise de Linguistique 17, 2 (2009), 181–212.
- [26] 2019. Online survey on the awareness of services for education, prevention, counseling, and aftercare in eating disorders [Online-Befragung zur Bekanntheit von Angeboten zur Aufklärung, Prävention, Beratung und Nachsorge bei Essstörungen]. Prävention und Gesundheitsförderung 15, 1 (2019), 73–79.
- [27] 2020. Adaptation of IDPT system based on patient-authored text data using NLP. In Proceedings of the IEEE International Symposium on Computer-Based Medical Systems. IEEE, Los Alamitos, CA, 27–36.
- [28] 2020. Adaptive systems for internet-delivered psychological treatments. IEEE Access 8 (2020), 112220–112236.
- [29] 2020. Adaptive elements in internet-delivered psychological treatment systems: Systematic review. Journal of Medical Internet Research 22, 11 (2020), e21066.
- [30] 2012. Proactive screening for depression through metaphorical and automatic text analysis. Artificial Intelligence in Medicine 56, 1 (2012), 19–25.
- [31] 2020. Natural language processing for rapid response to emergent diseases: Case study of calcium channel blockers and hypertension in the COVID-19 pandemic. Journal of Medical Internet Research 22, 8 (2020), e20773.
- [32] 2021. Con2Vec: Learning embedding representations for contrast sets. Knowledge-Based Systems 229 (2021), 107382.
- [33] 2019. Machine learning and deep learning frameworks and libraries for large-scale data mining: A survey. Artificial Intelligence Review 52, 1 (2019), 77–124.
- [34] 1993. The ICD-10 Classification of Mental and Behavioural Disorders: Diagnostic Criteria for Research. Vol. 2. World Health Organization.
- [35] 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1532–1543.
- [36] 2017. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE 105, 12 (2017), 2295–2329.
- [37] 2020. Are we facing a crashing wave of neuropsychiatric sequelae of COVID-19? Neuropsychiatric symptoms and potential immunologic mechanisms. Brain, Behavior, and Immunity 87 (2020), 34–39.
- [38] 2018. Deep learning in biomedicine. Nature Biotechnology 36, 9 (2018), 829–838.
- [39] 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016).
- [40] 2015. Show, attend and tell: Neural image caption generation with visual attention. In The International Conference on Machine Learning (JMLR Workshop and Conference Proceedings, Vol. 37). MLR, 2048–2057.
- [41] 2020. Development of computerized adaptive testing for emotion regulation. Frontiers in Psychology 11 (2020), 3340.
- [42] 2016. Hierarchical attention networks for document classification. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1480–1489.