Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data

Published: 01 January 2006

Abstract

This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy aimed at efficiently exploiting the temporal and spatial locality present in the stream of processed queries. SDC extracts the results of the most frequently submitted queries from historical usage data and stores them in a static, read-only portion of the cache. The remaining cache entries are managed dynamically according to a given replacement policy and serve those queries that cannot be satisfied by the static portion. Moreover, we improve the hit ratio of SDC with an adaptive prefetching strategy, which anticipates future requests while introducing only limited overhead on the back-end WSE. We experimentally demonstrate the superiority of SDC over purely static and purely dynamic policies by measuring the hit ratio achieved on three large query logs while varying the cache parameters and the replacement policy used to manage the dynamic part of the cache. Finally, we deploy a concurrent version of our caching system and measure its throughput. Our tests show that the SDC cache can be efficiently exploited by many threads concurrently serving the queries of different users.
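To make the static/dynamic split concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that the static set is chosen by a plain frequency count over a training log, and that the dynamic part uses LRU; the abstract says only that the replacement policy is a parameter. The names `SDCCache`, `static_fraction`, and `compute` (a stand-in for the back-end WSE) are hypothetical. Prefetching is left out.

```python
from collections import Counter, OrderedDict
from typing import Callable, Dict, Hashable, Iterable


class SDCCache:
    """Hypothetical sketch of a Static Dynamic Cache with an LRU dynamic part."""

    def __init__(self, capacity: int, static_fraction: float,
                 history: Iterable[Hashable],
                 compute: Callable[[Hashable], object]) -> None:
        if capacity < 1 or not 0.0 <= static_fraction <= 1.0:
            raise ValueError("invalid cache parameters")
        self.static_size = int(capacity * static_fraction)
        self.dynamic_size = capacity - self.static_size
        self.compute = compute  # assumed stand-in for the back-end WSE
        # Static part: results of the most frequent queries in the historical log,
        # filled once at start-up and never modified afterwards (read-only).
        top = Counter(history).most_common(self.static_size)
        self.static: Dict[Hashable, object] = {q: compute(q) for q, _ in top}
        # Dynamic part: LRU, ordered from least to most recently used.
        self.dynamic: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def get(self, query: Hashable) -> object:
        self.lookups += 1
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)  # mark as most recently used
            return self.dynamic[query]
        result = self.compute(query)  # miss: forward the query to the back-end
        if self.dynamic_size > 0:
            self.dynamic[query] = result
            if len(self.dynamic) > self.dynamic_size:
                self.dynamic.popitem(last=False)  # evict the LRU entry
        return result

    def hit_ratio(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0


# Train on a toy historical log, then replay a toy test stream.
history = ["a", "a", "a", "b", "b", "c"]
cache = SDCCache(capacity=2, static_fraction=0.5, history=history,
                 compute=lambda q: f"results({q})")
for q in ["a", "d", "d", "b", "a", "b"]:
    cache.get(q)
print(round(cache.hit_ratio(), 3))  # 0.667
```

Replaying a test stream and dividing hits by lookups mirrors how the hit ratio is measured on a query log. Varying `capacity` and `static_fraction`, or replacing the `OrderedDict` LRU with another policy, corresponds to the parameters the abstract says were explored.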

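Because the static portion is read-only, a concurrent deployment can let worker threads consult it without synchronization and lock only the mutable dynamic part. The sketch below illustrates that structure under our own assumptions; the authors' concurrent implementation may differ. `ConcurrentSDC` and its parameters are hypothetical names. In CPython the global interpreter lock limits parallel speedup, so this shows only the locking layout, not achievable throughput.

```python
import threading
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Tuple


class ConcurrentSDC:
    """Hypothetical thread-safe SDC sketch: lock-free static part, locked LRU part."""

    def __init__(self, static_entries: Dict[Hashable, object], dynamic_size: int,
                 compute: Callable[[Hashable], object]) -> None:
        # Built before any worker thread starts and never mutated afterwards,
        # so concurrent readers do not need to synchronize on it.
        self._static = dict(static_entries)
        self._dynamic: "OrderedDict[Hashable, object]" = OrderedDict()
        self._dynamic_size = dynamic_size
        self._compute = compute
        self._lock = threading.Lock()  # guards every access to self._dynamic

    def get(self, query: Hashable) -> Tuple[object, bool]:
        """Return (results, hit) for one query."""
        if query in self._static:  # read-only lookup, no lock taken
            return self._static[query], True
        with self._lock:
            if query in self._dynamic:
                self._dynamic.move_to_end(query)  # refresh LRU position
                return self._dynamic[query], True
        # Miss: call the back-end outside the lock so slow misses do not block
        # other threads (two threads may occasionally compute the same query).
        results = self._compute(query)
        with self._lock:
            self._dynamic[query] = results
            self._dynamic.move_to_end(query)
            while len(self._dynamic) > self._dynamic_size:
                self._dynamic.popitem(last=False)  # evict least recently used
        return results, False


cache = ConcurrentSDC({"a": "results(a)"}, dynamic_size=2,
                      compute=lambda q: f"results({q})")
stream = ["a", "b", "a", "c", "b", "a"] * 100
with ThreadPoolExecutor(max_workers=4) as pool:
    outcomes = list(pool.map(cache.get, stream))
hits = sum(hit for _, hit in outcomes)
print(f"hit ratio: {hits / len(stream):.2f}")
```

Calling the back-end outside the lock keeps the critical sections short. The trade-off is that concurrent misses on the same query may each be computed, with the last insertion winning.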
            Published in

              ACM Transactions on Information Systems, Volume 24, Issue 1
              January 2006
              143 pages
              ISSN:1046-8188
              EISSN:1558-2868
              DOI:10.1145/1125857

              Copyright © 2006 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States
