Abstract
This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy designed to exploit efficiently the temporal and spatial locality present in the stream of processed queries. SDC extracts from historical usage data the results of the most frequently submitted queries and stores them in a static, read-only portion of the cache. The remaining cache entries are managed dynamically according to a given replacement policy and serve those queries that the static portion cannot satisfy. Moreover, we improve the hit ratio of SDC with an adaptive prefetching strategy, which anticipates future requests while introducing only a limited overhead on the back-end WSE. We experimentally demonstrate the superiority of SDC over purely static and purely dynamic policies by measuring the hit ratio achieved on three large query logs while varying the cache parameters and the replacement policy used to manage the dynamic part of the cache. Finally, we deploy a concurrent version of our caching system and measure its throughput. Our tests show that the SDC cache can be efficiently exploited by many threads that concurrently serve the queries of different users.
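The two-part organization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SDCCache`, the `static_fraction` parameter, the `backend` callable, and the choice of LRU for the dynamic portion are all assumptions made for illustration (the abstract only says "a given replacement policy"). Prefetching and concurrency are omitted.

```python
from collections import Counter, OrderedDict


class SDCCache:
    """Illustrative static-dynamic cache; all names here are hypothetical."""

    def __init__(self, history, capacity, static_fraction, backend):
        # backend: callable that computes results for a query (assumed interface).
        self.backend = backend
        n_static = int(capacity * static_fraction)
        # Static portion: results of the most frequent queries in the
        # historical log, filled once and never modified afterwards.
        top_queries = Counter(history).most_common(n_static)
        self.static = {q: backend(q) for q, _ in top_queries}
        # Dynamic portion: the remaining entries, managed here with LRU
        # (an assumption; any replacement policy could be plugged in).
        self.dynamic_capacity = capacity - len(self.static)
        self.dynamic = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)  # mark as most recently used
            return self.dynamic[query]
        # Miss in both portions: ask the back end, then cache dynamically.
        self.misses += 1
        results = self.backend(query)
        if self.dynamic_capacity > 0:
            if len(self.dynamic) >= self.dynamic_capacity:
                self.dynamic.popitem(last=False)  # evict least recently used
            self.dynamic[query] = results
        return results

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    # Toy history and query stream, invented for this example.
    history = ["a", "a", "a", "b", "b", "c"]
    cache = SDCCache(history, capacity=4, static_fraction=0.5,
                     backend=lambda q: f"results for {q}")
    for q in ["a", "c", "d", "c", "e", "d"]:
        cache.get(q)
    print(cache.hit_ratio())
```

Because the static portion is never written after construction, lookups against it need no synchronization, which is one plausible reason a design of this kind suits many concurrent threads; only the dynamic portion would need locking in a multithreaded version.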
Index Terms
- Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data