Document Retrieval and Cluster Based Indexing using Rider Spider Monkey Optimization Algorithm
Madhulika Yarlagadda1, K. Gangadhara Rao2, A. Srikrishna3

1Smt. Madhulika Yarlagadda, Assistant Professor, Department of Information Technology, R.V.R & J.C College of Engineering, Chowdavaram, Guntur, Andhra Pradesh, India.
2Dr. K. Gangadhara Rao, Professor, Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, Andhra Pradesh, India.
3Dr. A. Srikrishna, Professor and Head, Department of Information Technology, RVR & JC College of Engineering, Chowdavaram, Guntur, Andhra Pradesh, India.
Manuscript received on February 10, 2020. | Revised Manuscript received on February 20, 2020. | Manuscript published on March 30, 2020. | PP: 1318-1327 | Volume-8 Issue-6, March 2020. | Retrieval Number: F7508038620/2020©BEIESP | DOI: 10.35940/ijrte.F7508.038620

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Document retrieval is important to the research community for finding the documents most relevant to a user query. Although various document retrieval methods have been introduced, retrieving the exact document from an index remains a challenging task. This research therefore proposes an effective document retrieval algorithm named the Rider Spider Monkey Optimization Algorithm (RSOA). Initially, the documents are pre-processed by stop word elimination and stemming, and Term Frequency-Inverse Document Frequency (TF-IDF) features are extracted to find the keywords of the documents. The selected keywords are passed to the cluster-based indexing phase, where the cluster centroids are identified using the proposed Rider Spider Monkey Optimization Algorithm. Query matching is carried out at two levels: at the first level, the query is matched against all cluster centroids to find the appropriate centroid; at the second level, the query is matched against the records within the cluster of the matched centroid. Query matching uses the Bhattacharyya distance as the distance measure to retrieve the documents. Performance is analyzed on the 20 Newsgroups dataset using precision, F-measure, recall, and accuracy, with values of 90.141%, 91.876%, 91.178%, and 91.202%, respectively.
Keywords: Cluster Based Indexing, Cluster Centroid, Stop Word Removal, Holoentropy, Rider Spider Monkey Optimization.
Scope of the Article: Algorithm Engineering.
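
The two-level query matching outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cluster assignments are supplied by hand rather than found by the Rider Spider Monkey Optimization Algorithm, stemming is omitted, the stop word list is a tiny placeholder, and the smoothed IDF formula and the normalization of TF-IDF vectors to sum to 1 (so that the Bhattacharyya distance can be applied) are assumptions. All function names are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stop word list (assumption; a real system would use a full list).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "to", "for"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (stemming omitted)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def vectorize(tokens, vocab, idf):
    """TF-IDF weights over vocab, normalized to sum to 1 (assumption)."""
    counts = Counter(tokens)
    raw = [counts[w] * idf[w] for w in vocab]
    total = sum(raw)
    return [x / total for x in raw] if total > 0 else raw

def tfidf_vectors(docs):
    """Build the vocabulary, smoothed IDF weights, and one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    idf = {}
    for w in vocab:
        df = sum(1 for toks in tokenized if w in toks)
        idf[w] = math.log((1 + n) / (1 + df)) + 1  # smoothed IDF (assumed variant)
    return vocab, idf, [vectorize(toks, vocab, idf) for toks in tokenized]

def bhattacharyya_distance(p, q, eps=1e-12):
    """-ln of the Bhattacharyya coefficient; eps avoids log(0) when there is no overlap."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(max(bc, eps))

def centroid(vectors):
    """Component-wise mean of the member vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def retrieve(query, docs, clusters, top_k=1):
    """Level 1: pick the nearest centroid. Level 2: rank documents in that cluster."""
    vocab, idf, vectors = tfidf_vectors(docs)
    q = vectorize(tokenize(query), vocab, idf)
    centroids = {cid: centroid([vectors[i] for i in members])
                 for cid, members in clusters.items()}
    best = min(centroids, key=lambda cid: bhattacharyya_distance(q, centroids[cid]))
    ranked = sorted(clusters[best], key=lambda i: bhattacharyya_distance(q, vectors[i]))
    return best, ranked[:top_k]

# Toy corpus with hand-assigned clusters (hypothetical data, not from the paper).
docs = [
    "the rocket launch reached orbit",
    "nasa plans a new rocket engine",
    "the hockey team won the game",
    "a hockey player scored in the game",
]
clusters = {"space": [0, 1], "sport": [2, 3]}

print(retrieve("rocket orbit", docs, clusters))
```

Matching against centroids first means the query is compared with every document only inside the selected cluster, which is the motivation for cluster-based indexing; in the paper, the quality of that first step depends on the centroids produced by the proposed optimization algorithm, which this sketch does not attempt to reproduce.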