Abstract
Internet-delivered psychological treatments (IDPT) address mental health problems through Internet-based interaction. With the increase in such interaction caused by the COVID-19 pandemic, online tools have been widely used to provide evidence-based mental health services. This increase helps cover a larger population with fewer resources for mental health treatment. Adaptivity and customization of the treatment routine can help resolve mental health issues more quickly. In this research, we propose a fuzzy contrast-based model that uses an attention network over positionally weighted words and classifies patient-authored text into distinct symptoms. The trained embedding is then used to label mental health data, and the attention network expands its lexicons by applying transfer learning techniques. The proposed model uses similarity and contrast sets to classify the weighted attention words, and the fuzzy model uses these sets to classify the mental health data into distinct classes. Our method is compared with non-embedding and traditional techniques to demonstrate the proposed model. In the experiments, the feature vector achieves a ROC score of 0.82 on problems associated with nine symptoms.
1 INTRODUCTION
According to a World Health Organization report, the COVID-19 pandemic has disrupted mental health services in 93% of countries worldwide. At the same time, demand for mental health care has grown as a result of the lockdown of affected regions as a preventative measure. Psychological anxiety factors such as fear of illness and worry about the future increase during any confinement [37]. Isolation and the loss of school relationships and jobs add to mental stress and degrade psychiatric care for the community as a whole. The lack of protective equipment, social isolation, and a high-stress environment exacerbate anxiety and symptoms of depression in frontline health professionals. A significant degree of anxiety was observed during the pandemic lockdown [13]. Since much remains uncertain about what causes depression, research often associates it with several contributing elements. A wide and growing literature includes numerous studies on how anxiety is dealt with, but conflicting accounts make it difficult to extract meaningful information.
Depression is caused by a mix of current circumstances and long-term personal factors rather than by a single immediate problem or event [27]. It is not always possible to evaluate the cause or remedy it under harsh conditions [18]. It is imperative to recognize early signs and symptoms of depression and obtain assistance as quickly as possible. Many Internet forums and social media sites now allow individuals to interact anonymously to talk about their misery, bereavement, and future treatment options [19]. People from all around the world can freely express their opinions and feelings [26]. Online surveillance can be a proactive and promising way to identify those at high risk; it can prompt intervention and improve general well-being [31].
Anxiety is one of the world's most debilitating disorders, according to the World Health Organization.1 With more than 264 million individuals afflicted globally, anxiety has become a prevalent condition [12]. Untreated depression can worsen and cause lifelong suffering [22]. In the worst cases, anxiety can lead to suicidal thoughts: according to the World Health Organization, approximately 800,000 people die by suicide each year, and among those aged 15 to 29 years, suicide is the second most common cause of death. One reason is that between 76% and 85% of mentally ill people in low- and middle-income countries go untreated. Lack of financial assistance and support, a shortage of trained practitioners, inaccurate assessment, and societal stigma around mental illness are obstacles to good treatment [12]. Negative thoughts, shyness, and fear of disclosure are the primary obstacles preventing people from seeking treatment. People are frequently embarrassed, humiliated, and fearful of having their psychological anguish examined in depth [29]. Because of these factors, people may be hesitant to admit that they are depressed or to seek mental health treatment and therapy. The prevention and treatment of mental health disorders have become a global concern for healthcare systems.
The overburdened healthcare system is under economic and technological pressure to develop adaptive systems that reduce waiting times and deliver intervention at a lower cost. Internet-delivered psychological treatments (IDPT) can aid a broad population with physical and psychological suffering while using fewer resources [28]. Most solutions already in place are tunnel based, inflexible, and incompatible [27]. Existing models lack adaptive behavior, leading to lower user adherence and higher dropout [14]. Therapies should take into consideration the numerous ways in which consumers can receive appropriate treatment, and an IDPT implementation should take the user's behavior into account. Users express their preferences and requests according to their conditions and psychological symptoms [27].
This research aims to obtain depression-oriented data from the written language of the patient and to identify and visualize the results using a deep attention based approach. In the majority of cases, the dialogue expresses a patient's worries with regard to mental health. We then analyze the extraction of the factors causing symptoms of depression based on the statements of the patient. By employing interactive Internet technology, we aim to deliver contextual information and visualization for mental health; the designed model can thus also help deliver preventive steps. We further utilize NLP techniques and deep learning to extract symptoms of anxiety and sadness from mental health therapy. The semantic vectors are then used for synonym expansion to identify mental health issues. Our method helps generalize the learning system by minimizing data entry tasks. In the experiments, the proposed method reached a ROC score of 0.82, showing that semantic vectors for synonym expansion enhance training without sacrificing accuracy.
The rest of the article is organized as follows. Section 2 describes the related work. Section 3 outlines the fundamental strategy for the experimentation, data collection, and model development. Section 4 presents the contrast set, and Section 5 discusses the fuzzy inference system. Section 6 presents the outcomes and findings. Section 7 concludes by providing a summary and additional work recommendations.
2 RELATED WORK
Numerous attempts to enhance depression diagnosis using computer-aided techniques have been made. Fliege et al. [10] discussed how to measure depressive symptoms using an Item Response Theory (IRT) based Depression Computerized Adaptive Test (D-CAT). They developed an application to analyze depression symptoms using actual patient data, thus increasing measurement accuracy and minimizing response burden. Instead of a static questionnaire, an adaptive questionnaire was used to measure progress [16]. Earlier responses to queries were used to select the next best question. By asking the most pertinent questions for each patient, the CAT could include fewer items while achieving greater measurement precision throughout the whole construct range.
Another method is based on key linguistic text features, focusing on supervised classification methods that detect mental-affect states in a limited dataset of short texts [41]. This approach, which also has difficulty with binary classification, classifies brief sentences as distressed or not. At a finer level, there are four textual classes: severe distress, mild discomfort, response, and pleasure. In the annotated set of succinctly written messages, any post that indicates an active desire to hurt oneself or others is classified as high distress, whereas posts that only express negative sentiments are labeled as moderate. The researchers evaluated a dataset of 200 comments from numerous public online forums on mental well-being. This dataset was then used with machine learning methods including naive Bayes, maximum entropy, and decision trees.
Dinakar et al. [9] examined distress among teens in online communities using stacked generalization models. They trained a set of base models for predicting labels, including a linear-kernel SVM (SVM-L), a radial-basis-kernel SVM (SVM-R), and stochastic gradient boosted decision trees (GBDT). Text categorization was employed to form these models into categories of text across 23 different topics. For each code, a meta-feature set was coupled with SVM-L, SVM-R, and GBDT [9]. The features for the base classifiers were obtained by chi-squared feature selection together with hand-coded features that include unigrams, lexicons, and part-of-speech bigrams, among others. The decision-function scores of each prediction, along with the topic distribution from an L-LDA model, were then translated into meta-features for the meta-learners. They examined 7,147 individual stories posted by concerned teens on a prominent teenage aid website.
De Choudhury et al. [8] studied the behavior of Twitter users to determine whether or not they were depressed. The study sought to create a machine learning model identifying and predicting the onset of anxiety or sadness in specific individuals from a range of social media signals. The authors addressed the problem of generating ground truth: Amazon Mechanical Turk annotators were required to complete the Center for Epidemiological Studies Depression scale, along with other questions regarding their history and present, potentially distressing, circumstances. The Turk annotators who completed the questionnaire were asked for the Twitter handle used to pull their Twitter feed. A machine learning classifier was constructed on the depressed/non-depressed data utilizing both tweet and network features, including follower counts. When the classifier was applied to a large sample of U.S. geo-located Twitter data, a highly favorable correlation was found with Centers for Disease Control depression statistics. Related research examined more than 2 million tweets from 476 users to predict depression. The most effective results were obtained using SVM classification on a collection of behavioral properties of tweets, such as time and frequency of posting, and linguistic features such as pronouns, cursing, and sad phrases.
Another research work used public Twitter data to explore psychological problems [5]. The authors gathered data for several mental illnesses quickly and affordably, including indications of anxiety, bipolar disorder, and seasonal affective disorder. They used LIWC analysis to evaluate how much each disease group differs from a control group, replicated prior major-depression outcomes, and added new bipolar and PTSD results. Two language models were used: (1) a standard unigram LM over full words and (2) a character 5-gram LM over sequences of up to five characters. A classifier was built to separate each group from the control group, revealing the corresponding signal in the language of each group [23]. Throughout the analysis and classification, correlations were analyzed to identify connections and obtain insights into quantifiable and meaningful psychological signals on Twitter. Lin et al. [17] employed a deep neural network (DNN) to recognize stress and overcome the limitations of earlier methods. Data from four microblogs were studied, and the authors compared their proposed four-layered DNN against machine learning algorithms such as random forest, SVM, and naive Bayes. They evaluated three pooling techniques for each model: max pooling, mean-over-instance, and mean-over-time. Each model performed well or poorly depending on the pooling technique; the DNN with mean-over-time pooling, however, achieved the best results. Neuman et al. [30] developed an additional methodology named Pedesis, which employed NLP dependency parsing to crawl websites that mention anxiety and extract refined conceptual domains involved in metaphorical connections. The domain knowledge was then utilized to define words or sentences used in depression metaphors. Based on these findings, human experts developed a "depression lexicon," which contains first- and second-degree synonyms.
The lexicon is used to autonomously evaluate the degree of depression in a text and whether the content deals with the subject of depression; hidden patterns and large feature sets are often utilized to help a neural network build a unique representation of the domain [33]. The trained network then utilizes the learned features to predict the conditional distribution over the input vector. Indeed, several neural network topologies have been proposed for domain-specific applications. The multi-layer perceptron is one of the major architectures. In this network, every hidden layer computes its output from the inputs and weights of the previous layer, and a nonlinear activation function is applied at the final/output layer. Training adjusts the parameters through the gradient of the loss function. The network must lower the supervised learning loss, a nonlinear optimization problem, by modifying the weight and bias parameters to minimize the loss. Most approaches are based on gradient descent. Gradient-based techniques start from a random point, then perform several rounds over sets of cases (batches). For each batch, the trainer computes the loss value and its gradient under the nonlinear objective function, and the weights are then modified to decrease the loss function [33]. The loss is gradually reduced to a minimal level, the convergence point.
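The gradient-descent training loop described above can be sketched in a few lines; the linear model, squared-error loss, and learning rate below are illustrative choices, not details of the paper's network.

```python
# Minimal gradient-descent sketch: start from an initial point, compute the
# loss gradient on a batch, and step the weights against the gradient
# until the loss converges.

def grad_step(w, b, batch, lr=0.1):
    """One update of weight w and bias b on a batch of (x, y) pairs,
    minimizing the mean squared error of the linear model y_hat = w*x + b."""
    n = len(batch)
    dw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    db = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * dw, b - lr * db

def train(batches, epochs=200):
    w, b = 0.0, 0.0  # starting point (random in practice)
    for _ in range(epochs):
        for batch in batches:
            w, b = grad_step(w, b, batch)
    return w, b

# Toy data generated from y = 2x + 1; training should recover w ~ 2, b ~ 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train([data])
```

The same loop structure, with the gradient supplied by backpropagation, underlies the deep networks discussed in this section.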
Hidden layers and the architectural framework give neural networks their predictive capacity. The correct choice of the number of layers, architecture type, and hyperparameters helps in tuning the network. Training the tuned network yields a higher-order representation of the input feature vector [2, 7]. This higher representation of features is learned to generalize and enhance prediction ability. Modern neural network research selects the network with the lowest computational complexity and best predictive capability. The number of architectural ideas has increased over the previous two decades. Sze et al. [36] made the most important distinctions between the hidden layers, layer types, shapes, and linkages between the layers. Wainberg et al. [38] demonstrated how higher-dimensional features might be extracted from tabular data through machine learning techniques. In the convolutional neural network (CNN), pattern embeddings accumulate from image pixels; the pixel information and its variation increase the learning and prediction capabilities of the network, aided by translation invariance. The recurrent neural network (RNN) architecture was designed for sequential data in natural language processing, including machine translation, language generation, and time-series analysis [39]. The RNN model comprises an encoder and a decoder, with the encoder summarizing the input sequence into a fixed-length vector. The model uses separate gates, trained against the loss function, to process the input attributes. A challenge with the RNN encoder-decoder design is the alignment of input and output vectors, since each element of the sequence depends on the values of its neighbors. The attention mechanism is a further development of the RNN [2, 20].
The attention approach operates on the input vector by selectively allocating weights to selected inputs. The decoder may utilize the context vector and the related weights for a richer representation of the features, depending on the priority and position of the relevant information. For predictions, the weights of the RNN model, including the attention weights and the context vector, are learned along with the feature representation [20]. There are many variants of the attention mechanism, including soft, hard, local, and global designs. The soft attention paradigm was created to summarize contextual information [3]: the context vector is built as the weighted average of the model's hidden states. The approach helps in understanding how the input features are encoded and in decreasing the loss. Xu et al. [40] built the context vector under hard attention using hidden-state sampling; hard attention reduces the cost of computation but suffers from difficulties in architectural convergence. Local and global attention are further variants introduced by Luong et al. [21], with local attention a middle ground between soft and hard attention. The model chooses a focal point for each input batch, which contributes to quick convergence; in the local attention model, a prediction function learns the position of the attention vector. Domain-specific data analysis aids in the development of effective local and global attention architectures.
3 THE DESIGNED METHODOLOGY
This article proposes an embedding training approach for developing a depression symptom identification model (Figure 1). In this technique, we apply cosine similarity against the PHQ-9 symptom scores, as illustrated in Figures 1 and 2. To increase knowledge and embedded vocabulary size for similarity, a trained lexically enhanced approach is proposed. The suggested approach extracts depression symptoms from a patient's authored text. Sample datasets were taken from the work of Ahmed et al. [2] and Mukhiya et al. [28]. Data labeling is discussed in Section 3.5. One example, provided by an anonymous patient:
I am in a really poor spot right now. Even my melancholy and anxiety are severe, and I am unable to function or hold down a job or do anything else, so I spend my days eating junk food at home. Each day is tedious and difficult to get it through, yet I am unable to operate in society due to my anxiety and sadness.
It is difficult to identify mental health diseases with the ICD-10 classification [34]. The nature and intensity of symptoms change dynamically over the period a patient is treated for a given condition. Psychiatrists therefore listen to the patient and collect further essential information throughout the mental health evaluation procedure. The psychiatrist's method uses a standard analytical questionnaire such as the PHQ-9, together with a supported test, to examine the diagnostic reliability of each evaluation against the participant's mental health problems. The questionnaire's items consist of symptom categories whose frequency yields a score; the score is compared against a threshold to determine the intensity. For example, the nine separate questions each reflect one symptom, whose frequency is graded as light, moderate, or severe. This method is known as the Clinical Symptom Elicitation Process (CSEP) [34]. One of the major aims of this study is to automate this procedure by using active learning. Each set of symptoms is categorized according to the periodicity in the participant's text, and cumulative clinical anxiety and depression are determined.
3.1 Psychometric Questionnaire (PQ)
Several depression assessment questionnaires besides the PHQ-9 are available, but the PHQ-9, proposed by Kroenke et al. [15], is among the most frequently utilized. The proposed technique uses the standard PHQ-9 questionnaire on patient-authored content [1, 2, 27]. Assessing depressive symptoms with it is a frequent procedure: as part of standard CSEP practice, the clinician asks the inquiry for each category and evaluates the patient's response to add the frequency to the class. As shown in Table 1, the nine symptoms cover a variety of areas, including falling asleep, interest, concentration, and eating problems. The psychiatrist determines the evaluation score after completing all of the question-based assessments; this score indicates the patient's depression level.
Symptoms | PHQ-9 |
---|---|
S1 | Little interest or pleasure in doing things |
S2 | Feeling down, depressed, or hopeless |
S3 | Trouble falling or staying asleep or sleeping too much |
S4 | Feeling tired or having little energy |
S5 | Poor appetite or overeating |
S6 | Feeling bad about yourself or that you are a failure or have let yourself or your family down |
S7 | Trouble concentrating on things such as reading the newspaper or watching television |
S8 | Moving or speaking so slowly that other people may notice, or the opposite being so restless that you have been moving around a lot more than usual |
S9 | Thoughts that you would be better off dead or of hurting yourself |
3.2 Seed Term Generation
Throughout this study, seed terms are generated from key words found in the PHQ-9 questionnaire. This section describes how the depression lexicon (the word list of symptoms of depression) was formed. It largely comprises word forms of emotions, such as anxiety and sadness, as seen in Table 1. Seed lexicons are chosen manually, and the associated hypernyms, hyponyms, and antonyms are found using WordNet [25]. WordNet, an English lexical database, is created and maintained by Princeton University. The database stores nouns, verbs, adverbs, and adjectives; within each word category, words are grouped into synsets, each expressing a distinct concept. Synsets are linked by lexical and semantic relations; words that belong to the same synset are synonyms, for example. According to our empirical study, the top five terms are helpful and connected to the key symptom phrases. In addition, only the WordNet technique is used to expand the seed words in Table 1. Different classification systems have various lists of symptoms of depression [27]. These lists use clinical or informal complaint vocabulary depending on whether the survey is a patient or physician questionnaire. Major classification systems for depression, including the DSM-5 and that of the World Health Organization [34], are extensively used and have been integrated into a refined list of symptoms [27].
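The expansion step can be sketched as follows. A miniature hand-built relation map stands in for WordNet so the example is self-contained; a real implementation would query WordNet (e.g., through NLTK's wordnet interface) for synonyms, hypernyms, and hyponyms of each seed.

```python
# Hypothetical miniature "WordNet": seed -> related terms (synonyms,
# hypernyms, hyponyms), ordered by relevance. Real code would query WordNet.
RELATED = {
    "sleep": ["slumber", "rest", "nap", "doze", "repose", "kip"],
    "appetite": ["hunger", "craving", "desire", "stomach", "taste"],
}

def expand_seed(seed, relations, top_k=5):
    """Return up to top_k distinct expansions of a seed term, skipping
    duplicates and the seed itself (the paper keeps the top five terms)."""
    seen, out = {seed}, []
    for word in relations.get(seed, []):
        w = word.lower()
        if w not in seen:
            seen.add(w)
            out.append(w)
    return out[:top_k]

# Expand every seed term into its depression-lexicon entry.
lexicon = {seed: expand_seed(seed, RELATED) for seed in RELATED}
```

Truncating to the top five related terms per seed reflects the empirical choice reported above.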
3.3 Pre-processing Step
Pre-processing is required to structure the text. Each patient-authored text follows this procedure:
(1) All texts are processed and formatted by the UTF-8 encoding standard. This assists in the preservation of consistency.
(2) Modify each word's capitalization to lowercase.
(3) Eliminate any tabs or spaces that may have been used to separate text.
(4) Erase all non-valued unique characters (#, +, -, *, =, HTTP, HTTPS).
(5) Substitute contracted phrases with full words (e.g., can't with cannot).
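The five steps above can be collected into one function. This is a minimal sketch: the contraction map and the exact regular expressions are illustrative assumptions, not the paper's implementation.

```python
import re

# Hypothetical contraction map for step 5; a full map would be larger.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am"}

def preprocess(text):
    """Apply the five pre-processing steps to one patient-authored text."""
    text = text.encode("utf-8", "ignore").decode("utf-8")  # (1) UTF-8
    text = text.lower()                                    # (2) lowercase
    text = re.sub(r"\s+", " ", text).strip()               # (3) tabs/spaces
    text = re.sub(r"https?://\S+", "", text)               # (4) URLs
    text = re.sub(r"[#+\-*=]", "", text)                   # (4) special chars
    for short, full in CONTRACTIONS.items():               # (5) expand
        text = text.replace(short, full)
    return re.sub(r"\s+", " ", text).strip()

out = preprocess("I can't  sleep #tired https://example.com")
```

Removing whole URLs (rather than just the tokens HTTP/HTTPS) is a design choice in this sketch, since the remaining URL fragments carry no symptom signal.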
3.4 Lexicon Embedding
A wide range of strategies for detecting emotions has been documented in the extensive NLP literature. Emotional knowledge-based systems in particular have received much attention; they are made up of a vocabulary of phrase senses and a learned variety of context anchoring. Affective knowledge consists of words that convey context and emotions. We therefore provide an embedding strategy that uses contextually variable words from the depression lexicon (based on word sense) and emotional input from Internet forums. We used a 300-dimensional pre-trained Global Vectors for Word Representation (GloVe) model [35]. Word embedding helps create word-level tokens for the input patient texts, and the context is projected into vector space via GloVe-based vector embedding. Here, the embedding represents the learned sentence structure and captures the semantic structure of the text. Each word vector is distributed according to the notion that "you shall know a word by the company it keeps" [4]. Linguistic patterns are used to calculate the co-occurrence rates of terms in the vector representation, so that comparable words lie nearby. As a result, the psychological analysis does not require a separate pre-trained model: we expand the dataset by training a custom mental health model with a word sense model and a transfer learning approach. This matters because a large portion of the embedding is based on open-source data (Wikipedia texts) and sentiment expertise (Twitter data). Terms such as sad and joyful express emotions, but they also allude to particular mental states; word sense must therefore be used to broaden the embedding. The emotional lexicon, which is based on word sense, helps demonstrate potential consequences, and custom embedding for the categorization of various symptoms can be used to accomplish fine-grained classification.
Words are retrieved with part-of-speech tagging for nouns, verbs, adverbs, and adjectives. For each retrieved part of speech, we used WordNet to extract synonyms, hyponyms, morphemes, and lexical meaning from the corpus, which consists of a series of texts. As a consequence, we obtain emotional words for each document. The set W used to train the model is then utilized to build the vocabulary, and the learned vector has the word-vector dimension. For each of the nine symptoms from the PHQ-9 questionnaire, the lexicons are converted into a vector using the trained model. Cosine similarity is used to compute the similarity between patient-authored text embeddings and the symptoms listed in Table 1, yielding a similarity value between 0 and 1 for each of the nine symptoms. We use the trained model to turn the sentences into semantically aware vectors.
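The similarity step can be sketched with toy 3-dimensional vectors standing in for the 300-dimensional GloVe embeddings; the token embeddings and the two symptom vectors below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (in [0, 1] for
    the non-negative vectors used in this toy example)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def text_vector(tokens, embeddings):
    """Embed a text as the average of its known token vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 3-d embeddings and two hypothetical symptom vectors
# (S3: sleep problems, S5: appetite problems).
emb = {"tired": [0.9, 0.1, 0.0], "sleep": [0.8, 0.2, 0.1], "eat": [0.0, 0.9, 0.2]}
symptoms = {"S3": [0.9, 0.2, 0.0], "S5": [0.05, 0.95, 0.1]}
text = text_vector(["cannot", "sleep", "tired"], emb)
scores = {s: cosine(text, v) for s, v in symptoms.items()}
```

A text about sleeplessness and tiredness scores high against the sleep-related symptom vector and low against the appetite-related one, which is the behavior the per-symptom similarity values rely on.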
3.5 Dataset
The dataset was obtained from discussion webpages and social media platforms [27]. The first 500 texts were labeled using the Amazon Mechanical Turk service [27], and the remaining data were labeled with the proposed method. The labeling follows the PHQ-9 rating technique, as mentioned next:
(1) Score 0: Not at all
(2) Score 1: Several days
(3) Score 2: More than half the days
(4) Score 3: Nearly every day
For each symptom, we transform the annotation into a binary label: 0 if the symptom is absent (score 0) and 1 if the symptom is present (scores 1, 2, or 3).
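This mapping can be expressed directly; the symptom keys in the example call are illustrative.

```python
def binarize(scores):
    """Map per-symptom PHQ-9 frequency scores (0-3) to binary labels:
    0 -> symptom absent, 1-3 -> symptom present."""
    return {symptom: int(score > 0) for symptom, score in scores.items()}

labels = binarize({"S1": 0, "S2": 2, "S3": 1, "S9": 3})
```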
3.6 Deep Learning Methods
The LSTM network can retain the necessary information in its cell memory. In a unidirectional LSTM design, the last hidden state can be delivered to the output layer. During our empirical research, we noticed that element-wise averaging over all time steps was superior. We utilized a bidirectional LSTM design, in which token lists are processed by a forward LSTM from beginning to end and by a backward LSTM in reverse.
The proposed attention approach makes use of word significance in the text [42]. We therefore introduced the attention technique on top of the LSTM layer; this aids in the extraction of useful terms. The dropout layer receives the attention output vector as input. Supervised training of large networks conventionally requires a large labeled dataset, so we used transfer learning to expand the lexical analysis and label the dataset.
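The attention layer over the LSTM outputs can be sketched as attention pooling of per-token states. The tiny dimensions and the fixed query vector here are illustrative; in the actual model, the attention weights are learned during training.

```python
import math

def attention_pool(states, query):
    """Attention over per-token states (e.g., BiLSTM outputs): score each
    state against a query vector, softmax the scores into weights, and
    return the weights plus the weighted sum (the context vector)."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    exps = [math.exp(x) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]
    return weights, pooled

def mean_pool(states):
    """Element-wise average over time steps (the baseline mentioned above)."""
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]

# Three 2-d token states; the query makes the second token dominate.
states = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
weights, pooled = attention_pool(states, query=[2.0, 2.0])
baseline = mean_pool(states)
```

Compared with mean pooling, the attention-pooled vector is dominated by the highly scored token, which is how the model surfaces the useful terms mentioned above.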
4 CONTRAST SET
In this research, we build the lexicon by the preceding method. However, the extended lexicon alone is not enough for learning contextual information. We propose attention-based contrast sets to map the continuous vector to the associated labels (i.e., symptoms) for patient-authored texts. The contrast set helps map the context and each word to its labeled data points [32]. We use the concepts of support difference and co-occurrence context, and we use the attention weights of each labeled datum to classify the patient-authored texts. Attention learning methods are used for contrast set generation, and a fuzzy-based model then performs symptom extraction, detection, and classification.
For a given attention-based lexicon, a contrast-set pattern has a user-defined frequency corresponding to different labels across the dataset. To formally define an attention-based contrast set, we define the pattern, the support, and the support difference.
(Attention Lexicon Dataset).
Let \( \mathcal {L}=\left\lbrace l_{1}, l_{2}, \ldots , l_{n}\right\rbrace \) be a set of lexicons and \( {Class}=\left\lbrace c_{1}, c_{2}, \ldots , c_{m}\right\rbrace \) a set of distinct labels. An attention lexicon dataset \( \mathcal {D} \) contains the lexicon sets with positional weights for the distinct classes, \( {Instances}=\left\lbrace \left(\boldsymbol {I}_{w}, Class_{w}\right)\right\rbrace _{w=1}^{n} \), where \( \boldsymbol {I}_{w} \subseteq \mathcal {L} \) is an instance containing a set of attention lexicons, \( Class_{w} \in {Class} \) is the distinct label for \( \boldsymbol {I}_{w} \), and \( w \) indexes the positionally weighted words of the sentences in the patient-authored texts.
(Pattern).
A pattern X is a set of emotional lexicons \( X \subseteq \mathcal {L} \) that contains the positionally weighted lexicons for a distinct class (i.e., symptom).
(Support).
The support for the pattern X with respect to a distinct label Class is the percentage of the instances labeled Class that contain X: (1) \( \begin{equation} Sup (X, Class)=\frac{|\operatorname{Sup}(X, Class)|}{|\operatorname{Sup}(Class, \mathcal {L})|}, \end{equation} \) where \( \operatorname{Sup}(Class, \mathcal {L})=\left\lbrace \left(\boldsymbol {I}_{w}, Class_{w}\right) \in \mathcal {D} \mid Class_{w}=Class\right\rbrace \) is the set of instances labeled with Class, and \( \operatorname{Sup}(X, Class)=\left\lbrace \left(\boldsymbol {I}_{w}, Class_{w}\right) \in \operatorname{Sup}(Class, \mathcal {L}) \mid X \subseteq \boldsymbol {I}_{w}\right\rbrace \) is the set of instances containing both pattern X and label Class. The class-wise counts partition the dataset: \( \sum _{i=1}^{j}\left|\operatorname{Sup}\left(c_{i}, \mathcal {L}\right)\right|=|\mathcal {D}| \), where j is the number of distinct classes.
(Support Difference).
The support difference for the pattern X with respect to label Class is given in Equation (2). (2) \( \begin{equation} {Diff}(X, {Class})= \mathit {MAX} \left\lbrace Sup \left(X, c_{i}\right)\right\rbrace _{i=1}^{j}-\mathit {MIN} \left\lbrace Sup \left(X, c_{i}\right)\right\rbrace _{i=1}^{j} \end{equation} \)
Example: Consider the dataset mentioned in Table 2, which contains five instances and two classes (\( c_{1} \) and \( c_{2} \)). The pattern \( \lbrace unhappy, depressed\rbrace \) appears in three instances: \( I_{1} \), \( I_{2} \), and \( I_{4} \). Therefore, the support for \( \lbrace unhappy, depressed\rbrace \) in dataset \( \mathcal {L} \) is \( Sup(\lbrace unhappy, depressed\rbrace , \mathcal {L})= 3 / 5 = 0.6 \). The support for \( \lbrace unhappy, depressed\rbrace \) with respect to class \( c_{1} \) is \( Sup(\lbrace unhappy, depressed\rbrace , c_{1}) = 3 / 3 = 1 \), since all three instances with class \( c_{1} \) (i.e., \( I_{1} \), \( I_{2} \), \( I_{4} \)) contain \( \lbrace unhappy, depressed\rbrace \). In the same way, the support of \( \lbrace unhappy, depressed\rbrace \) with respect to the second class \( c_{2} \) is \( Sup(\lbrace unhappy, depressed\rbrace , c_{2}) = 0 / 2 = 0 \). The detailed calculation is given in Table 3, and the final contrast set later in Table 5.
Note: A dash (–) represents that the item is not presented in that instance.
For instance, if the support and difference thresholds are both set to \( \delta = 0.6 \), then with reference to Tables 3 and 4, the set \( \lbrace unhappy, depressed\rbrace \) qualifies as a contrast set, since its support equals 0.6 and its support difference equals 1.
As mentioned later in Table 5, consider the contrast set \( \lbrace unhappy\rbrace \), for which the similarity context is \( \mathcal {N}_{S}(b)=\lbrace unhappy, depressed\rbrace \), as both words share the emotional meaning of unhappy. The co-occurrence metric for the first instance is \( (sad, \ unhappy, (unhappy, depressed)) \), since unhappy co-occurs with sad in instance \( I_{1} \), unhappy co-occurs with depressed in instances \( I_{1} \), \( I_{2} \), and \( I_{4} \), and unhappy co-occurs with \( \lbrace depressed, \ helpful\rbrace \) in instances \( I_{2} \) and \( I_{4} \). Together, similarity and co-occurrence make the contrast sets more accurate, since each set is grounded in its co-occurrence statistics and the lexicon sets of individual instances.
Sorted Symptom | Description | Probability |
---|---|---|
s4 | Feeling tired or having little energy | 0.46 |
s8 | Moving or speaking so slowly that other people notice, or the opposite being so restless that you have been moving around a lot more than usual | 0.45 |
s2 | Feeling down, depressed, or hopeless | 0.39 |
s1 | Little interest or pleasure in doing things | 0.26 |
s5 | Poor appetite or overeating | 0.15 |
s3 | Trouble falling or staying asleep or sleeping too much | 0.13 |
s7 | Trouble concentrating on things such as reading the newspaper or watching television | 0.12 |
s6 | Feeling bad about yourself or that you are a failure or have let yourself or your family down | 0.02 |
s9 | Thoughts that you would be better off dead or of hurting yourself | 0.0097 |
With the preceding example, we described how the co-occurrence context helps capture the contrast sets. We now discuss the usefulness of the fuzzy sets mentioned in Table 4.
5 FUZZY INFERENCE SYSTEM
A fuzzy knowledge base supports the inference process: membership functions fuzzify the input, and the rule base maps the fuzzified values to output classes.
5.0.1 Fuzzy Rule Generation.
The rules derived from the contrast sets and fuzzification values are important because they help classify the attention positional weighted elements. After contrast set generation, the itemsets can be translated into linguistic rules. The lexicon and its assigned classes are used to generate the rule base.
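The rule-generation step can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the input layout (pattern paired with its per-class supports) and the rule field names are assumptions; each contrast set becomes an IF-THEN rule pointing at the class where its support peaks.

```python
# Hypothetical sketch: each mined contrast set becomes a linguistic IF-THEN
# rule targeting the class with the highest per-class support.
def generate_rules(contrast_sets):
    """contrast_sets: list of (pattern, class_supports) pairs, where
    class_supports maps each class label to the pattern's support there."""
    rules = []
    for pattern, class_supports in contrast_sets:
        best = max(class_supports, key=class_supports.get)
        rules.append({"if_contains": pattern,
                      "then_class": best,
                      "strength": class_supports[best]})
    return rules

rules = generate_rules([(frozenset({"unhappy", "depressed"}),
                         {"c1": 1.0, "c2": 0.0})])
print(rules[0]["then_class"])  # c1
```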
5.0.2 Defuzzification.
The testing data is fed to the fuzzification step. Based on the membership function values, the fuzzified input is matched against the inference rules. The inference rules, obtained from the linguistic values, are converted into a fuzzy score using the weighted method. From the fuzzy score, a classification decision is produced. The flow of the operation is presented in Algorithm 1.
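The weighted scoring and crisp decision described above can be sketched minimally. This is an assumption-laden sketch rather than the authors' exact procedure: each fired rule contributes its consequent class weighted by its firing strength, and the crisp decision is the class with the largest accumulated score.

```python
# Minimal sketch of weighted defuzzification for classification: accumulate
# firing strengths per consequent class, then take the argmax as the decision.
def defuzzify(fired_rules):
    """fired_rules: list of (class_label, firing_strength) pairs."""
    scores = {}
    for cls, strength in fired_rules:
        scores[cls] = scores.get(cls, 0.0) + strength
    if not scores:
        return None  # no rule fired
    return max(scores, key=scores.get)

print(defuzzify([("s4", 0.8), ("s2", 0.3), ("s4", 0.2)]))  # s4
```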
6 EXPERIMENTAL RESULT AND ANALYSIS
The patient-authored text was pre-processed and converted into an emotion-based lexicon. Then we trained different networks. We used the pre-trained GloVe embeddings for the transfer learning task, and the trained embedding was expanded with the new lexicon. The text model was then converted into the nine-symptom vector lexicon. After that, vectors for the patient questionnaire and patient-authored text were used to label the unlabeled data. The labeled data was then used to train and compare different architectures. We used the ROC curve, precision, recall, and F-measure as performance metrics, and the Adam optimizer to reduce the training loss. Figures 2 and 3 show the performance of the attention network [42] with a fuzzy contrast set, bidirectional LSTM [24], LSTM [11], and a feed-forward network [2]. Model tuning for a deep neural architecture requires changes to the design in terms of cell type, number of hidden layers, activation function, learning-rate handling, and error function. In addition to the LSTM layer, we used the contrast set with a fuzzy classifier to improve model performance. During empirical analysis, the models showed overfitting on the development and testing sets. To handle this, we trained the models for a longer time (i.e., 1,000 epochs), employed the early stopping method to save and tune model checkpoints, and used the clipping method to avoid gradient issues [6].
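The two training safeguards mentioned above, gradient clipping and early stopping, can be sketched independently of any framework. The sketch below in plain NumPy is illustrative only; the patience value and clipping threshold are assumptions, not the paper's settings.

```python
import numpy as np

# Gradient clipping by global norm: rescale the gradient whenever its
# L2 norm exceeds max_norm, leaving its direction unchanged.
def clip_by_norm(grad, max_norm):
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

# Patience-based early stopping: stop once the validation loss has failed
# to improve for `patience` consecutive epochs.
class EarlyStopping:
    def __init__(self, patience=10):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def step(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `clip_by_norm` would be applied to the gradients before each optimizer step, and `EarlyStopping.step` would be called once per epoch with the development-set loss.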
Figures 2 and 3 show the performance of the feed-forward network, where the training loss reached 0.41 and the testing loss reached 0.59. The model tends to overfit, and its ROC curve does not approach the upper left corner. The architecture did not perform well, as a simple network does not preserve the order of sequential data. Sequential models can handle such data and achieve better performance; accordingly, the LSTM network reached a ROC value of 0.78. The recall of the LSTM model is presented in Figure 4, and the precision in Figure 5. The model suffered from vanishing gradient issues, the cell became complex due to its gates, and the architecture required more tuning. The instances with symptoms related to the questionnaire performed well, as the bidirectional LSTM with the contrast fuzzy set reached a 0.82 F1 measure, as shown in Figure 6.
The bidirectional LSTM with the contrast fuzzy set and the bidirectional LSTM with attention both achieve better performance. These models use a two-directional approach with backward and forward passes, and both hidden states help preserve sequential order. The attention layer helps weight the positional words. As a result, the trained model yields the lowest error. A recall curve near the top corner indicates that the model has low false-positive and false-negative rates. Due to the use of the contrast fuzzy set, the bidirectional LSTM with the contrast fuzzy set outperforms the bidirectional LSTM with attention, achieving high performance with the lowest development-set error.
The model achieved a ROC of 0.91 on the training set and 0.82 on the development set. The high performance indicates that the model yields a high true-positive rate. The results support the existence of important words, which helps the contrast set generate distinct boundaries. The model recognizes the target words in the task and learned the symptoms from the patient-authored text.
As shown in Figure 7 and Table 5, the proposed model is able to visualize contrast set weights for the quoted words in sentences of patient-authored text. The visualized symptoms and probability scores [s4: 0.46, s8: 0.45, s2: 0.39, and s1: 0.26] reflect the context and the patient's triggering points. The patient is struggling with feeling tired or having little energy and moving or speaking so slowly that other people …. a lot more than usual, which indicates that the source of the contrast set is the words in dark highlight (i.e., tedious and difficult to get it through and my anxiety and sadness). Therefore, the model can find the contrast set words and highlight them according to their probability and symptom scores.
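The highlighting step described above can be sketched in a few lines. This is an illustrative assumption, not the paper's rendering code: words whose contrast-set weight crosses a threshold are marked, mimicking the dark-highlighted terms; the threshold value and bold markup are arbitrary choices for demonstration.

```python
# Illustrative sketch: mark tokens whose contrast-set weight exceeds a
# threshold, so that trigger words stand out in the visualized sentence.
def highlight(tokens, weights, threshold=0.5):
    return [f"**{t}**" if w >= threshold else t
            for t, w in zip(tokens, weights)]

print(highlight(["my", "anxiety", "and", "sadness"],
                [0.1, 0.9, 0.2, 0.8]))
# ['my', '**anxiety**', 'and', '**sadness**']
```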
7 CONCLUSION
Tools combining natural language processing and deep learning for healthcare intervention have been introduced recently. The pandemic era forced the treatment of psychological patients through online media. A limited number of studies work on mental health symptoms, and the adaptation of models specifically for mental health is not well discussed. This article used a contrast set with a fuzzy model to classify mental health patients' text into nine distinct classes. We proposed the support difference contrast set lexicon analysis, which the attention network uses to fuzzify the input. Contrast inference rules are then used to classify the mental health treatment text. The fuzzy rules can be used for the labeling and visualization tasks. This tool can help psychiatrists design a customized and appropriate remedy program. The computer-aided system highlights key words and supports adaptation and visualization. The bidirectional LSTM model with the attention and contrast set achieved the highest accuracy, reaching a 0.82 ROC and helping visualize the weighted words, which can aid in understanding the patient's issues. In the future, we will implement a more adaptive algorithm to classify text and reduce overfitting issues.
REFERENCES
- [1] 2021. Fuzzy explainable attention-based deep active learning on mental-health data. In Proceedings of the 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’21). IEEE, Los Alamitos, CA, 1–6.
- [2] 2021. Attention-based deep entropy active learning using lexical algorithm for mental health treatment. Frontiers in Psychology 12 (2021), 471.
- [3] 2015. Neural machine translation by jointly learning to align and translate. In The International Conference on Learning Representations. ICLR, 24–34.
- [4] 2000. Contextual correlates of meaning. Applied Psycholinguistics 21, 4 (2000), 505–524.
- [5] 2020. Tracking social media discourse about the COVID-19 pandemic: Development of a public coronavirus Twitter data set. JMIR Public Health and Surveillance 6, 2 (2020), e19273.
- [6] 2020. Understanding gradient clipping in private SGD: A geometric perspective. Advances in Neural Information Processing Systems 33 (2020), 1–10.
- [7] 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In The Conference on Empirical Methods in Natural Language Processing. EMNLP, 1724–1734.
- [8] 2013. Predicting depression via social media. In Proceedings of the 7th International Conference on Weblogs and Social Media. 23–34.
- [9] 2014. Stacked generalization learning to analyze teenage distress. In Proceedings of the 8th International Conference on Weblogs and Social Media. 23–34.
- [10] 2005. Development of a computer-adaptive test for depression (D-CAT). Quality of Life Research 14, 10 (2005), 2277.
- [11] 2016. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28, 10 (2016), 2222–2232.
- [12] 2018. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet 392, 10159 (2018), 1789–1858.
- [13] 2015. Screening internet forum participants for depression symptoms by assembling and enhancing multiple NLP methods. Computer Methods and Programs in Biomedicine 120, 1 (2015), 27–36.
- [14] 2015. Finding the adaptive sweet spot. In Proceedings of the 33rd Annual Conference on Human Factors in Computing Systems. ACM, New York, NY, 3829–3838.
- [15] 2001. The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine 16, 9 (2001), 606–613.
- [16] 2012. A hybrid system for online detection of emotional distress. In Intelligence and Security Informatics. Springer, 73–80.
- [17] 2014. User-level psychological stress detection from social media using deep neural network. In Proceedings of the 22nd ACM International Conference on Multimedia. ACM, New York, NY, 507–516.
- [18] 2018. Evaluating and improving lexical resources for detecting signs of depression in text. Language Resources and Evaluation 54, 1 (2018), 1–24.
- [19] 2020. Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on Reddit during COVID-19: Observational study. Journal of Medical Internet Research 22, 10 (2020), e22635.
- [20] 2016. Hierarchical question-image co-attention for visual question answering. In Advances in Neural Information Processing Systems 29. Curran Associates, 289–297.
- [21] 2015. Effective approaches to attention-based neural machine translation. In The Conference on Empirical Methods in Natural Language Processing. 1412–1421.
- [22] 2020. Anxiety and depression in COVID-19 survivors: Role of inflammatory and clinical predictors. Brain, Behavior, and Immunity 89 (2020), 594–600.
- [23] 2020. Identification of emotional expression with cancer survivors: Validation of Linguistic Inquiry and Word Count. JMIR Formative Research 4, 10 (2020), e18246.
- [24] 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. 51–61.
- [25] 2009. WordNet: An electronic lexical reference system based on theories of lexical memory. Revue Québécoise de Linguistique 17, 2 (2009), 181–212.
- [26] 2019. Online survey on the awareness of services for education, prevention, counseling, and aftercare in eating disorders [Online-Befragung zur Bekanntheit von Angeboten zur Aufklärung, Prävention, Beratung und Nachsorge bei Essstörungen]. Prävention und Gesundheitsförderung 15, 1 (2019), 73–79.
- [27] 2020. Adaptation of IDPT system based on patient-authored text data using NLP. In Proceedings of the IEEE International Symposium on Computer-Based Medical Systems. IEEE, Los Alamitos, CA, 27–36.
- [28] 2020. Adaptive systems for internet-delivered psychological treatments. IEEE Access 8 (2020), 112220–112236.
- [29] 2020. Adaptive elements in internet-delivered psychological treatment systems: Systematic review. Journal of Medical Internet Research 22, 11 (2020), e21066.
- [30] 2012. Proactive screening for depression through metaphorical and automatic text analysis. Artificial Intelligence in Medicine 56, 1 (2012), 19–25.
- [31] 2020. Natural language processing for rapid response to emergent diseases: Case study of calcium channel blockers and hypertension in the COVID-19 pandemic. Journal of Medical Internet Research 22, 8 (2020), e20773.
- [32] 2021. Con2Vec: Learning embedding representations for contrast sets. Knowledge-Based Systems 229 (2021), 107382.
- [33] 2019. Machine learning and deep learning frameworks and libraries for large-scale data mining: A survey. Artificial Intelligence Review 52, 1 (2019), 77–124.
- [34] 1993. The ICD-10 Classification of Mental and Behavioural Disorders: Diagnostic Criteria for Research. Vol. 2. World Health Organization.
- [35] 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1532–1543.
- [36] 2017. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE 105, 12 (2017), 2295–2329.
- [37] 2020. Are we facing a crashing wave of neuropsychiatric sequelae of COVID-19? Neuropsychiatric symptoms and potential immunologic mechanisms. Brain, Behavior, and Immunity 87 (2020), 34–39.
- [38] 2018. Deep learning in biomedicine. Nature Biotechnology 36, 9 (2018), 829–838.
- [39] 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016).
- [40] 2015. Show, attend and tell: Neural image caption generation with visual attention. In The International Conference on Machine Learning (JMLR Workshop and Conference Proceedings, Vol. 37). MLR, 2048–2057.
- [41] 2020. Development of computerized adaptive testing for emotion regulation. Frontiers in Psychology 11 (2020), 3340.
- [42] 2016. Hierarchical attention networks for document classification. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1480–1489.