Abstract
Efficient query processing is one of the basic needs of data mining algorithms. Clustering algorithms, association rule mining algorithms and OLAP tools all rely on efficient query processors that can deal with high-dimensional data. Inside such a query processor, multidimensional index structures are used as a basic technique. As implementing such an index structure is a difficult and time-consuming task, we propose a new approach: implementing the index structure on top of a commercial relational database system. In particular, we map the index structure to a relational database design and simulate the behavior of the index structure using triggers and stored procedures. This can easily be done for a very large class of multidimensional index structures. To demonstrate the feasibility and efficiency of the approach, we implemented an X-tree on top of Oracle8. We ran several experiments on large databases and recorded a performance improvement of up to a factor of 11.5 compared to a sequential scan of the database.
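To make the idea of mapping an index to relations concrete, the following is a minimal sketch, not the schema used in the article. It assumes SQLite (via Python's standard `sqlite3` module) instead of Oracle8, two dimensions, and a flat one-level directory of bounding boxes rather than a full X-tree. The table names `directory` and `data`, the trigger `grow_box`, and the helpers `insert_point` and `range_query` are all hypothetical names chosen for illustration, and the page assignment of each point is given by hand where a real index would compute it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directory (            -- one row per index page (assumed layout)
    page_id INTEGER PRIMARY KEY,
    lo0 REAL, hi0 REAL,             -- bounding box in dimension 0
    lo1 REAL, hi1 REAL              -- bounding box in dimension 1
);
CREATE TABLE data (                 -- one row per indexed point
    point_id INTEGER PRIMARY KEY,
    page_id  INTEGER REFERENCES directory(page_id),
    x0 REAL, x1 REAL
);
CREATE INDEX data_by_page ON data(page_id);

-- Trigger that imitates index maintenance: every inserted point
-- enlarges the bounding box of the page it was assigned to.
CREATE TRIGGER grow_box AFTER INSERT ON data
BEGIN
    UPDATE directory SET
        lo0 = MIN(lo0, NEW.x0), hi0 = MAX(hi0, NEW.x0),
        lo1 = MIN(lo1, NEW.x1), hi1 = MAX(hi1, NEW.x1)
    WHERE page_id = NEW.page_id;
END;
""")


def insert_point(conn, point_id, page_id, x0, x1):
    """Insert a point into a page chosen by the caller (hypothetical helper)."""
    # Create the page with a degenerate box if it does not exist yet.
    conn.execute(
        "INSERT OR IGNORE INTO directory VALUES (?, ?, ?, ?, ?)",
        (page_id, x0, x0, x1, x1),
    )
    conn.execute(
        "INSERT INTO data VALUES (?, ?, ?, ?)", (point_id, page_id, x0, x1)
    )


def range_query(conn, lo, hi):
    """Return ids of points inside the box [lo, hi], pruning pages by their box."""
    rows = conn.execute(
        """
        SELECT d.point_id
        FROM directory AS dir
        JOIN data AS d ON d.page_id = dir.page_id
        WHERE dir.lo0 <= ? AND dir.hi0 >= ?      -- page box overlaps query, dim 0
          AND dir.lo1 <= ? AND dir.hi1 >= ?      -- page box overlaps query, dim 1
          AND d.x0 BETWEEN ? AND ?               -- exact test on the point
          AND d.x1 BETWEEN ? AND ?
        ORDER BY d.point_id
        """,
        (hi[0], lo[0], hi[1], lo[1], lo[0], hi[0], lo[1], hi[1]),
    ).fetchall()
    return [r[0] for r in rows]


insert_point(conn, 1, 1, 0.1, 0.1)
insert_point(conn, 2, 1, 0.2, 0.3)
insert_point(conn, 3, 2, 0.8, 0.9)
insert_point(conn, 4, 2, 0.7, 0.6)

print(range_query(conn, (0.0, 0.0), (0.25, 0.35)))
print(range_query(conn, (0.15, 0.0), (0.75, 0.65)))
```

In this sketch the directory lookup and the point filter are expressed as a single join, so the database's own optimizer decides how to evaluate them; a hierarchical index would add a parent reference to `directory` and descend level by level, for example from a stored procedure.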
Cite this article
Böhm, C., Berchtold, S., Kriegel, HP. et al. Multidimensional Index Structures in Relational Databases. Journal of Intelligent Information Systems 15, 51–70 (2000). https://doi.org/10.1023/A:1008729828172
DOI: https://doi.org/10.1023/A:1008729828172