Abstract
To improve search performance over rich text content, this paper designs a cloud search engine system based on rich text content. The design extends a traditional search engine hardware system with several hardware devices, including a Solr index server, a collector, a Chinese word segmentation device, and a searcher, and adjusts the data interface. With this hardware and database support in place, the system uses the open-source Apache Tika framework to obtain the metadata of rich text documents, segments words according to the content and semantics of the rich text, and calculates the weight of each keyword. For the input search keywords, it builds a text index, uses the BM25 algorithm to calculate the similarity between the keywords and each text, and outputs the rich text search results according to the similarity scores. The experimental results show that the designed system achieves a high recall rate and high throughput, and that the index construction time for each data item across different files is short, which improves both search efficiency and search accuracy.
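The BM25 similarity step mentioned in the abstract can be illustrated with a minimal sketch. It is not the system's implementation: it assumes documents have already been segmented into token lists (for example by a Chinese word segmenter), it uses the commonly seen parameter defaults k1 = 1.2 and b = 0.75, and it uses a smoothed, non-negative IDF variant. The function name `bm25_scores` and the sample data are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per tokenized document in `docs`.

    Assumptions (not taken from the paper): `docs` is a list of token
    lists, and k1 / b take commonly used default values.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF; the "1 +" keeps the value non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical pre-segmented documents and query keywords.
    docs = [
        ["cloud", "search", "engine"],
        ["rich", "text", "index"],
        ["search", "rich", "text", "content", "search"],
    ]
    query = ["rich", "search"]
    ranked = sorted(enumerate(bm25_scores(query, docs)),
                    key=lambda pair: pair[1], reverse=True)
    for doc_id, score in ranked:
        print(doc_id, round(score, 3))
```

Sorting documents by descending score gives the ranked result list; in a Solr-based deployment this scoring would normally be delegated to the index server rather than computed by hand.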
Acknowledgements
This work was supported by the Major Projects of Science and Technology in Inner Mongolia (No. 2019ZD016).
About this article
Cite this article
Chan, Hp., Xu, L., Liu, Hh. et al. System Design of Cloud Search Engine Based on Rich Text Content. Mobile Netw Appl 26, 459–472 (2021). https://doi.org/10.1007/s11036-020-01676-3
DOI: https://doi.org/10.1007/s11036-020-01676-3