System Design of Cloud Search Engine Based on Rich Text Content

Mobile Networks and Applications

Abstract

To improve search performance over rich text content, this paper designs a cloud search engine system for rich text. The hardware extends a traditional search engine platform with a Solr index server, a collector, a Chinese word segmenter, and a searcher, and the data interfaces are adjusted accordingly. On top of this hardware and its database support, the system uses the open-source Apache Tika framework to extract metadata from rich text documents, segments the text into words according to its content and semantics, and calculates a weight for each keyword. Given the input search keywords, it builds a text index, uses the BM25 algorithm to compute the similarity between the keywords and each text, and outputs the rich text search results according to the computed similarity. Experimental results show that the designed system achieves a high recall rate and high throughput, and that the index construction time for each data item is short across different files, improving both search efficiency and search accuracy.
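The BM25 similarity step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the parameter values, the IDF variant, or the tokenizer, so `k1 = 1.2`, `b = 0.75`, the smoothed non-negative IDF form, and the pre-tokenized input lists are all assumptions, and the function names `bm25_scores` and `search` are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    `docs` is a list of token lists. k1 and b are common defaults,
    assumed here because the abstract does not state them.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF (the "+1" variant), which never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def search(query_terms, docs, top_k=3):
    """Return (doc_index, score) pairs with a positive score, best first."""
    scores = bm25_scores(query_terms, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k] if scores[i] > 0]


# Toy corpus; in the described system the tokens would come from the
# Chinese word segmenter applied to text extracted by Apache Tika.
docs = [
    ["cloud", "search", "engine"],
    ["rich", "text", "index"],
    ["cloud", "storage"],
]
results = search(["search", "engine"], docs)
```

Here only the first document contains the query terms, so `results` holds a single entry for index 0. A Solr deployment would normally rely on the BM25 similarity built into Lucene rather than hand-written scoring; the sketch only shows the shape of the calculation.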

Acknowledgements

This work was supported by the Major Science and Technology Projects of Inner Mongolia (No. 2019ZD016).

Author information

Corresponding author

Correspondence to Arun Kumar Sangaiah.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chan, Hp., Xu, L., Liu, Hh. et al. System Design of Cloud Search Engine Based on Rich Text Content. Mobile Netw Appl 26, 459–472 (2021). https://doi.org/10.1007/s11036-020-01676-3

  • DOI: https://doi.org/10.1007/s11036-020-01676-3
