DOI: 10.1145/1007568.1007586
Article

Identifying similarities, periodicities and bursts for online search queries

Published: 13 June 2004

ABSTRACT

We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts'), where each element of the time series is the number of times that the query is issued on a given day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries, which we achieve by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state of the art in time series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time series. Finally, we propose a simple but effective method for identifying bursts (long or short-term). Using the burst information extracted from a sequence, we can efficiently perform 'query-by-burst' on the database of time series. We conclude with a description of a tool that uses these methods and serves as an interactive exploratory data discovery tool for the MSN query database.
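The abstract names three building blocks (a compressed Fourier representation, period discovery, and burst identification) without giving their details. The following is a minimal sketch of one plausible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the naive DFT, the choice of "best" as highest power, the trailing moving average, and the `mean + num_std * std` burst cutoff. The daily counts are synthetic.

```python
import cmath
import math
from statistics import mean, pstdev


def dft(values):
    """Naive O(n^2) DFT, returning frequencies 0 .. n//2 only."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * f * t / n) for t, v in enumerate(values))
        for f in range(n // 2 + 1)
    ]


def best_coefficients(series, k):
    """Keep the k highest-power coefficients plus the energy of the omitted ones."""
    mu, sigma = mean(series), pstdev(series) or 1.0  # guard against a flat series
    spectrum = dft([(v - mu) / sigma for v in series])
    power = [abs(c) ** 2 for c in spectrum]
    keep = sorted(sorted(range(len(power)), key=power.__getitem__)[-k:])
    omitted = sum(power) - sum(power[f] for f in keep)
    return {f: spectrum[f] for f in keep}, omitted


def dominant_period(series):
    """Period, in days, of the strongest non-constant frequency."""
    mu = mean(series)
    power = [abs(c) ** 2 for c in dft([v - mu for v in series])]
    strongest = max(range(1, len(power)), key=power.__getitem__)
    return len(series) / strongest


def burst_days(series, window=7, num_std=1.5):
    """Days whose trailing moving average exceeds an assumed mean + num_std * std cutoff."""
    averages = [
        mean(series[max(0, day - window + 1): day + 1]) for day in range(len(series))
    ]
    cutoff = mean(averages) + num_std * pstdev(averages)
    return [day for day, avg in enumerate(averages) if avg > cutoff]


# Synthetic daily demand: a weekly cycle over 52 weeks with a 10-day surge.
demand = [100 + 20 * math.sin(2 * math.pi * d / 7) for d in range(364)]
for d in range(200, 210):
    demand[d] += 150

coefficients, omitted_energy = best_coefficients(demand, k=3)
period = dominant_period(demand)
bursts = burst_days(demand)
```

On this synthetic series the weekly cycle dominates the spectrum, so `period` comes out as 7 days, and the flagged days cluster around the injected surge. Because the moving average trails the raw counts, the flagged range lags the surge by a few days.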

Published in

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN: 1581138598
DOI: 10.1145/1007568

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
