A hybrid approach of Poisson distribution LDA with deep Siamese Bi-LSTM and GRU model for semantic similarity prediction for text data

Viji, D.; Revathy, S.

doi:10.1007/s11042-023-15050-4

Published: 18 March 2023

Volume 82, pages 37221–37248, (2023)

Multimedia Tools and Applications

Abstract

Predicting the semantic similarity between texts is an open and challenging research problem in natural language processing (NLP). Traditional semantic text-similarity techniques capture lexical features but neglect the syntactic and semantic properties of text, and they produce high-dimensional feature vectors. To overcome these issues, the present study develops a hybrid approach that integrates a deep Siamese bidirectional long short-term memory (Bi-LSTM) network with a gated recurrent unit (GRU) neural network as the training model. Before the training phase, the proposed approach estimates the weights of the feature vectors and reduces their dimension. First, a pre-processing phase eliminates special characters from the text and converts it to feature vectors through vectorization; the weight values are then updated using weighted term frequency-inverse document frequency (TF-IDF), aided by a log-likelihood weight calculation method. The Poisson Normal LDA (linear discriminant analysis) technique then reduces the dimension of the feature vectors. The resulting embedded vectors, with their weight values, are fed into the training model, which estimates similarity scores for the input data and performs text classification using the deep Siamese Bi-LSTM and GRU classifiers. In the performance assessment, the proposed model attains a 19% higher accuracy rate on the STS dataset than existing methods. The model also shows better results on the other datasets. The higher accuracy and F1 score demonstrate the efficiency of the proposed framework.
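
The first two stages described in the abstract (removing special characters, then vectorizing with TF-IDF weights) can be illustrated with a minimal Python sketch. This is not the authors' method: it uses plain smoothed TF-IDF and cosine similarity as a stand-in baseline, and it omits the log-likelihood weighting, the Poisson Normal LDA reduction, and the Siamese Bi-LSTM and GRU model entirely. The function names, the smoothing formula, and the example sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text, replace special characters with spaces, and tokenize."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return cleaned.split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using smoothed TF-IDF weights."""
    tokenized = [preprocess(doc) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumption; the paper's weighted TF-IDF differs).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sentence pairs, not taken from the STS dataset.
sentences = [
    "A man is playing a guitar!",
    "A man plays the guitar.",
    "The stock market fell sharply today.",
]
vocab, vecs = tfidf_vectors(sentences)
sim_related = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
print(f"related pair:   {sim_related:.3f}")
print(f"unrelated pair: {sim_unrelated:.3f}")
```

A purely lexical baseline like this scores the unrelated pair at zero only because the two sentences share no tokens; it would treat "plays" and "playing" as unrelated words, which is the kind of limitation the abstract attributes to traditional techniques.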

Data availability

This manuscript has no associated data.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India
D. Viji
Department of Computing Technologies, SRM Institute of Technology, Kattankulathur, Chengalpattu, India
D. Viji
Department of Information Technology, Sathyabama Institute of Science and Technology, Chennai, India
S. Revathy

Corresponding author

Correspondence to D. Viji.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Viji, D., Revathy, S. A hybrid approach of Poisson distribution LDA with deep Siamese Bi-LSTM and GRU model for semantic similarity prediction for text data. Multimed Tools Appl 82, 37221–37248 (2023). https://doi.org/10.1007/s11042-023-15050-4

Received: 04 May 2022
Revised: 19 November 2022
Accepted: 27 February 2023
Published: 18 March 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11042-023-15050-4
