High-dimensional kNN joins with incremental updates

Yu, Cui; Zhang, Rui; Huang, Yaochun; Xiong, Hui

doi:10.1007/s10707-009-0076-5

Published: 06 February 2009

GeoInformatica, Volume 14, pages 55–82 (2010)

Abstract

The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is a very expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin⁺, which supports efficient incremental computation of kNN join results with updates on high-dimensional data. As a by-product, our method also provides answers for the reverse kNN queries with very little overhead. We have performed an extensive experimental study. The results show the effectiveness of kNNJoin⁺ for processing high-dimensional kNN joins in dynamic workloads.
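To make the definition above concrete, the following is a minimal brute-force sketch of a kNN join, together with the reverse kNN lists obtained by inverting the join result. It is not the kNNJoin⁺ method, whose details are not given in this abstract. The function names `knn_join` and `reverse_knn`, the use of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import heapq
import math


def knn_join(R, S, k):
    """Brute-force kNN join: map each index i in R to the indices of its k nearest points in S."""
    result = {}
    for i, r in enumerate(R):
        # Pairs are ordered by (distance, index), so ties break on the smaller index.
        nearest = heapq.nsmallest(k, ((math.dist(r, s), j) for j, s in enumerate(S)))
        result[i] = [j for _, j in nearest]
    return result


def reverse_knn(join_result, num_s):
    """Invert a kNN join result: for each j in S, list the i in R that have j among their kNN."""
    rev = {j: [] for j in range(num_s)}
    for i, neighbors in join_result.items():
        for j in neighbors:
            rev[j].append(i)
    return rev


# Hypothetical sample data (2-D for readability; the same code works in any dimension).
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (6.0, 5.0), (9.0, 9.0)]

joined = knn_join(R, S, k=2)
rknn = reverse_knn(joined, len(S))
print(joined)  # {0: [0, 1], 1: [2, 3]}
print(rknn)    # {0: [0], 1: [0], 2: [1], 3: [1]}
```

This sketch recomputes every distance from scratch, costing on the order of |R| · |S| distance evaluations per join, which illustrates why static recomputation after each update is expensive.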

Notes

In this paper, an update means an insertion, a deletion or a change to an existing data object.
Here, the kNN query only needs to find a new k-th NN.

Author information

Authors and Affiliations

Cui Yu: Monmouth University, West Long Branch, NJ, 07764, USA
Rui Zhang: University of Melbourne, Carlton, Victoria, 3053, Australia
Yaochun Huang: University of Texas - Dallas, Dallas, TX, 75080, USA
Hui Xiong: Rutgers, the State University of New Jersey, Newark, NJ, 07102, USA

Corresponding author

Correspondence to Cui Yu.

About this article

Cite this article

Yu, C., Zhang, R., Huang, Y. et al. High-dimensional kNN joins with incremental updates. Geoinformatica 14, 55–82 (2010). https://doi.org/10.1007/s10707-009-0076-5

Received: 02 July 2008
Revised: 01 December 2008
Accepted: 16 January 2009
Published: 06 February 2009
Issue Date: January 2010
DOI: https://doi.org/10.1007/s10707-009-0076-5
