Article

Identifying similarities, periodicities and bursts for online search queries

Authors: Michail Vlachos (UC Riverside), Christopher Meek, Zografoula Vagena (UC Riverside), Dimitrios Gunopulos (UC Riverside)

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 2004, Pages 131–142, https://doi.org/10.1145/1007568.1007586

Published: 13 June 2004

ABSTRACT

We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts') where the elements of the time series are the number of times that a query is issued on a day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state-of-the-art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time-series. Finally we propose a simple but effective method for identification of bursts (long or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time-series. We conclude the presentation with the description of a tool that uses the described methods, and serves as an interactive exploratory data discovery tool for the MSN query database.
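The three ideas named in the abstract (keeping the best Fourier coefficients together with the energy of the omitted ones, finding important periods, and flagging bursts) can be illustrated with a minimal sketch. The abstract gives no algorithmic details, so everything below is an assumption made for illustration: the function names, the naive orthonormal DFT, the choice of "best" as largest magnitude, the strongest-peak rule for periods, and the moving-average threshold for bursts are hypothetical stand-ins, not the authors' implementation.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT, scaled by 1/sqrt(n) so that energy is preserved."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def best_coefficients(x, k):
    """Keep the k largest-magnitude coefficients (assumed meaning of 'best').

    Returns a dict {frequency index: coefficient} and the energy of the
    omitted coefficients. With the orthonormal scaling above, that energy
    equals the squared Euclidean error of reconstructing x from the kept ones.
    """
    coeffs = dft(x)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    omitted_energy = sum(abs(coeffs[i]) ** 2 for i in order[k:])
    return kept, omitted_energy


def dominant_period(x):
    """Period, in samples, of the strongest non-constant frequency (len(x) >= 2)."""
    n = len(x)
    coeffs = dft(x)
    k = max(range(1, n // 2 + 1), key=lambda i: abs(coeffs[i]))
    return n / k


def moving_average(x, window):
    """Trailing moving average; early positions use a shorter window."""
    out = []
    for i in range(len(x)):
        chunk = x[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def bursts(x, window, cutoff=1.5):
    """Indices where the moving average exceeds its mean + cutoff * std.

    The threshold rule and the default cutoff are illustrative assumptions.
    """
    ma = moving_average(x, window)
    mu = sum(ma) / len(ma)
    sd = math.sqrt(sum((v - mu) ** 2 for v in ma) / len(ma))
    return [i for i, v in enumerate(ma) if v > mu + cutoff * sd]


if __name__ == "__main__":
    # Synthetic daily counts with a weekly cycle (made-up data, 8 weeks).
    weekly = [100 + 50 * math.cos(2 * math.pi * t / 7) for t in range(56)]
    kept, omitted = best_coefficients(weekly, 3)
    print("kept frequency indices:", sorted(kept))
    print("omitted energy:", omitted)
    print("dominant period (days):", dominant_period(weekly))
```

Because the trailing window averages past days, a burst flagged this way lags the raw spike slightly; a centered window would trade that lag for needing future values. Storing the omitted energy next to the kept coefficients costs one extra number per sequence and records how much of the signal the compressed form leaves out.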

Index terms: Information systems → Information retrieval

Published in
SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN: 1581138598
DOI: 10.1145/1007568
Conference Chairs: Arnd Christian König (Microsoft Research), Stefan Dessloch (University of Kaiserslautern, Germany)
General Chair: Patrick Valduriez (INRIA, France)
Program Chair: Gerhard Weikum (University of the Saarland)
Copyright © 2004 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Overall Acceptance Rate: 785 of 4,003 submissions, 20%