ABSTRACT
We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts'), where each element of the time series is the number of times that query is issued on a given day. All of the methods we describe use sequences of this form and can be applied to time series data in general. Our primary goal is the discovery of semantically similar queries, which we achieve by identifying queries with similar demand patterns. Using the best Fourier coefficients and the energy of the omitted components, we improve upon the state of the art in time series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time series. Finally, we propose a simple but effective method for identifying bursts (long- or short-term). Using the burst information extracted from a sequence, we can efficiently perform 'query-by-burst' on the database of time series. We conclude with a description of a tool that uses these methods and serves as an interactive exploratory data discovery tool for the MSN query database.
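To make the three ingredients named above more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract gives no algorithmic detail, so everything below is an assumption made for illustration. The function names (`dft`, `best_coefficients`, `dominant_period`, `burst_days`), the choice of "best" as largest magnitude, the moving-average burst rule, its `window` and `num_std` parameters, and the synthetic `demand` series are all hypothetical.

```python
import cmath
import math
from statistics import mean, pstdev


def dft(series):
    """Naive O(n^2) discrete Fourier transform, scaled by 1/sqrt(n)."""
    n = len(series)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        for k in range(n)
    ]


def best_coefficients(series, k):
    """Keep the k largest-magnitude coefficients and the energy of the rest.

    Assumption: "best" means largest magnitude. Only indices 1..n/2 are
    ranked, since the DC term and the mirrored half carry no extra shape
    information for a real-valued series.
    """
    coeffs = dft(series)
    half = range(1, len(series) // 2 + 1)
    ranked = sorted(half, key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in ranked[:k]}
    omitted_energy = sum(abs(coeffs[i]) ** 2 for i in ranked[k:])
    return kept, omitted_energy


def dominant_period(series):
    """Period, in days, of the strongest non-DC frequency."""
    kept, _ = best_coefficients(series, 1)
    (index,) = kept
    return len(series) / index


def burst_days(series, window=3, num_std=2.0):
    """Days whose trailing moving average exceeds mean + num_std * std.

    Assumption: this thresholding rule is one simple way to flag bursts.
    A trailing window lags the raw spike slightly.
    """
    averages = [
        mean(series[max(0, t - window + 1): t + 1])
        for t in range(len(series))
    ]
    cutoff = mean(averages) + num_std * pstdev(averages)
    return [t for t, value in enumerate(averages) if value > cutoff]


# Synthetic daily demand: a weekly cycle over 8 weeks plus a spike on days 30-32.
days = 56
demand = [100 + 20 * math.sin(2 * math.pi * t / 7) for t in range(days)]
for t in (30, 31, 32):
    demand[t] += 150

kept, omitted = best_coefficients(demand, 3)
print("kept frequency indices:", sorted(kept))
print("omitted energy:", round(omitted, 2))
print("dominant period (days):", dominant_period(demand))
print("burst days:", burst_days(demand))
```

In this sketch, a query's time series is summarized by a handful of coefficients plus a single number for the energy left out. The detected burst days are the kind of feature a 'query-by-burst' lookup could compare across queries.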
- Identifying similarities, periodicities and bursts for online search queries