ABSTRACT
Index trees created using distance-based indexing are difficult to maintain online because the distance function involved is often costly to compute. The problem is intensified when the database is frequently updated, as only limited time is available to perform the maintenance. In this paper, we propose a novel tree maintenance mechanism for the problem of answering approximate k-Nearest Neighbor queries with a probabilistic guarantee on time-series streams. When the underlying data change, we may choose to defer updating the tree as long as the probabilistic guarantee of answering queries remains high. To prolong such deferment, we present innovative techniques that maintain the utility of the tree by migrating its pivots and by partially reconstructing it. Once the probabilistic guarantee, which decays with time, falls below the minimum guarantee threshold, all of the deferred updates are performed. In essence, our work offers an elegant compromise between the accuracy guarantee of query results and the cost of providing them. With extensive empirical studies, we also show the flexibility and efficiency of our approach.
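The deferral policy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DeferredMaintainer`, the `apply_batch` hook, the update format, and in particular the multiplicative decay model are assumptions made for illustration only. The abstract states only that the guarantee decays over time and that deferred updates are applied once it crosses a minimum threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical update format: (stream id, new value).
Update = Tuple[int, float]


@dataclass
class DeferredMaintainer:
    """Buffers index updates until an estimated guarantee drops below a threshold."""

    min_guarantee: float                          # minimum acceptable probability
    apply_batch: Callable[[List[Update]], None]   # hypothetical hook that updates the tree
    decay_per_update: float = 0.02                # assumed decay model, not from the paper
    guarantee: float = 1.0                        # current estimated guarantee
    pending: List[Update] = field(default_factory=list)

    def on_update(self, update: Update) -> bool:
        """Record an update; return True if the deferred batch was flushed."""
        self.pending.append(update)
        # Assumption: each deferred update lowers the guarantee multiplicatively.
        self.guarantee *= 1.0 - self.decay_per_update
        if self.guarantee < self.min_guarantee:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Apply all deferred updates at once and reset the guarantee."""
        self.apply_batch(list(self.pending))
        self.pending.clear()
        self.guarantee = 1.0
```

With `min_guarantee=0.9` and `decay_per_update=0.05`, the first two updates are deferred and the third triggers a flush of all three. The pivot migration and partial reconstruction techniques mentioned in the abstract would act to slow the decay; they are not modeled here.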
Index Terms
- Querying time-series streams