Abstract
The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is a very expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin +, which supports efficient incremental computation of kNN join results with updates on high-dimensional data. As a by-product, our method also provides answers for the reverse kNN queries with very little overhead. We have performed an extensive experimental study. The results show the effectiveness of kNNJoin+ for processing high-dimensional kNN joins in dynamic workloads.
Similar content being viewed by others
Notes
In this paper, an update means an insertion, a deletion or a change to an existing data object.
Here kNN query only needs to find a new k-th NN.
References
The UCI KDD Archive (1999) KDD Cup 1999 Data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (http://kdd.ics.uci.edu)
Achtert E, Böhm C, Kröger P, Kunath P, Pryakhin A, Renz M (2006) Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In: SIGMOD’06: proceedings of the 2006 ACM SIGMOD international conference on management of data, pp 515–526
Berchtold S, Keim DA (1998) High-dimensional index structures database support for next decade’s applications (tutorial). In: SIGMOD ’98: proceedings of the 1998 ACM SIGMOD international conference on management of data, p 501
Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is nearest neighbor meaningful? In: Proceeding of the 7th international conference on database theory (ICDT), pp 217–235
Böhm C, Berchtold S, Keim DA (2001) Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput Surv 33(3):322–373
Böhm C, Krebs F (2004) The k-nearest neighbor join: turbo charging the kdd process. Knowl Inf Syst (KAIS) 6(6):728–749
Böhm C, Kriegel H-P (2000) Dynamically optimizing high-dimensional index structures. In: Proceedings of the 7th international conference on extending database technology (EDBT), pp 36–50
Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: VLDB ’97: proceedings of the 23rd international conference on very large data bases, pp 426–435
Dasarathy BV (1991) Nearest neighbor (nn) norms - nn pattern classification techniques. IEEE Computer Society, Silver Spring
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD ’84: proceedings of the 1984 ACM SIGMOD international conference on management of data, pp 47–57
Hartigan J, Wong M (1979) A K-means clustering algorithm. Appl Stat 28:100–108
Huang X, Jensen CS, Saltenis S (2006) Multiple k nearest neighbor query processing in spatial network databases. In: ADBIS ’06: proceedings of 10th East European conference of advances in databases and information systems, pp 266–281
Jagadish HV, Ooi BC, Tan K-L, Yu C, Zhang R (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst (TODS) 30(2):364–397
Korn F, Muthukrishnan S (2000) Influence sets based on reverse nearest neighbor queries. In: SIGMOD ’00: proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 201–212
Lin K-I, Jagadish HV, Faloutsos C (1994) The TV-tree: an index structure for high-dimensional data. VLDB J 3:517–542
Rafiei D, Mendelzon A (2000) Querying time series data based on similarity. IEEE Trans Knowl Data Eng 12(5):675–693
Tao Y, Papadias D, Lian X (2004) Reverse knn search in arbitrary dimensionality. In: VLDB ’04: Proceedings of the 30th international conference on very large data bases, pp 744–755
Tao Y, Yiu, ML, Mamoulis N (2006) Reverse nearest neighbor search in metric spaces. IEEE Trans Knowl Data Eng 18(9):1239–1252
Weber R, Schek H, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: proceedings of the 24rd international conference on very large data bases, pp 194–205
Wong R, Tao Y, Fu A, Xiao X, Pryakhin A, Renz M (2007) On efficient spatial matching. In: VLDB ’07: proceedings of the 33rd international conference on very large data bases, pp 579–590
Xia C, Lu H, Ooi BC, Hu J (2004) Gorder: an efficient method for knn join processing. In: VLDB ’04: proceedings of the 30th international conference on very large data bases, pp 756–767
Yang C, Lin K (2001) An index structure for efficient reverse nearest neighbor queries In: Proceedings of the 17th international conference on data engineering (ICDE), pp 485–492
Yu C, Cui B, Wang S, Su J (2007) Efficient index-based knn join processing for high-dimensional data. Inf Softw Technol 49(4):332–344
Yu C, Ooi BC, Tan K-L, Jagadish HV (2001) Indexing the distance: an efficient method to knn processing. In: VLDB ’01: proceedings of the 27th international conference on very large data bases, pp 166–174
Zhang R, Koudas N, Ooi BC, Srivastava D (2005) Multiple aggregations over data streams. In: SIGMOD ’05: proceedings of the 2005 ACM SIGMOD international conference on management of data, pp 299–310
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Yu, C., Zhang, R., Huang, Y. et al. High-dimensional kNN joins with incremental updates. Geoinformatica 14, 55–82 (2010). https://doi.org/10.1007/s10707-009-0076-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10707-009-0076-5