Abstract
The local intrinsic dimensionality (LID) model enables assessment of the complexity of the local neighbourhood around a specific query object of interest. In this paper, we study variations in the LID of a query, with respect to different subspaces and local neighbourhoods. We illustrate the surprising phenomenon of how the LID of a query can substantially decrease as further features are included in a dataset. We identify the role of two key feature properties in influencing the LID for feature combinations: correlation and dominance. Our investigation provides new insights into the impact of different feature combinations on local regions of the data.
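To make the idea of measuring LID on different feature subsets concrete, here is a minimal sketch, not the authors' code. It assumes the widely used maximum-likelihood (Hill-type) estimator, which computes LID from the k smallest distances r_1 ≤ … ≤ r_k to the query as -((1/k) Σ ln(r_i / r_k))^(-1). The function name `lid_mle`, its parameters, and the synthetic features X, Y and Z are all hypothetical and chosen only for illustration.

```python
import numpy as np

def lid_mle(query, data, features, k=200):
    """MLE (Hill-type) estimate of LID at `query`, restricted to `features`.

    `features` is a list of column indices defining the subspace (assumed name).
    """
    diffs = data[:, features] - query[features]
    dists = np.sort(np.linalg.norm(diffs, axis=1))
    dists = dists[dists > 0][:k]      # ignore points identical to the query
    r_k = dists[-1]                   # radius of the k-nearest-neighbour ball
    return -1.0 / np.mean(np.log(dists / r_k))

# Synthetic data (an assumption, not from the paper):
# X and Y are independent, Z is an exact copy of X (perfect correlation).
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1.0, 1.0, n)
y = rng.uniform(-1.0, 1.0, n)
data = np.column_stack([x, y, x])
query = np.zeros(3)

lid_x = lid_mle(query, data, [0])       # subspace {X}
lid_xy = lid_mle(query, data, [0, 1])   # subspace {X, Y}
lid_xz = lid_mle(query, data, [0, 2])   # subspace {X, Z}

print(f"LID on {{X}}:    {lid_x:.2f}")
print(f"LID on {{X, Y}}: {lid_xy:.2f}")
print(f"LID on {{X, Z}}: {lid_xz:.2f}")
```

In this toy setting, adding the independent feature Y should move the estimate from roughly 1 towards roughly 2, while adding the perfectly correlated feature Z only rescales every distance by the same factor and so leaves the estimate unchanged. The sketch only illustrates how correlation between features can affect the LID of a combination; it does not attempt to reproduce the decrease in LID or the dominance effect that the paper reports.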
Notes
- 3. Suppose \(q = 0 \in \mathbb{R}\) and \(x_1 = 2 \in X\) are one-dimensional data values. Then \(x_1\) directly gives the distance from \(q\) to \(x_1\) along the X axis.
- 4. In fact, our model allows \(F_X\) (or \(F_Y\)) to be a set of features rather than a single feature, but for simplicity we present each as a single feature.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hashem, T., Rashidi, L., Bailey, J., Kulik, L. (2019). Characteristics of Local Intrinsic Dimensionality (LID) in Subspaces: Local Neighbourhood Analysis. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds) Similarity Search and Applications. SISAP 2019. Lecture Notes in Computer Science, vol 11807. Springer, Cham. https://doi.org/10.1007/978-3-030-32047-8_22
DOI: https://doi.org/10.1007/978-3-030-32047-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32046-1
Online ISBN: 978-3-030-32047-8
eBook Packages: Computer Science (R0)